Setup

We provide an easy-to-use set of Java APIs and POJOs for interacting with Sensium. Let's start by setting up your dependencies with Maven, Gradle or a fat jar.

Maven

We currently only publish snapshot releases of our Java API to our own Maven repository. You'll have to add our repository to your pom.xml file:

<repositories>
   <repository>
      <id>knowminer</id>
      <releases>
         <enabled>true</enabled>
      </releases>
      <snapshots>
         <enabled>true</enabled>
         <updatePolicy>always</updatePolicy>
      </snapshots>
      <url>http://knowminer.know-center.tugraz.at/nexus/content/repositories/public/</url>
   </repository>
</repositories>

After adding the repository to your pom.xml, all that's left is to add the dependency on our Java SDK:

<dependencies>
   <dependency>
      <groupId>io.sensium</groupId>
      <artifactId>sensium-java-sdk</artifactId>
      <version>1.0-SNAPSHOT</version>
   </dependency>
</dependencies>

Gradle

To use the Sensium Java SDK with Gradle, first add the Sensium Maven repository to your build.gradle file:

repositories {
   mavenCentral()
   maven { url "http://knowminer.know-center.tugraz.at/nexus/content/repositories/public/" }
   mavenLocal()
}

Once the repository is added, we can add our dependency:

dependencies {
   implementation "io.sensium:sensium-java-sdk:1.0-SNAPSHOT"
}

Fat Jar

If you prefer a single fat jar with all dependencies included, you can download it here. The fat jar contains the Java SDK classes as well as the following third-party dependencies:

  • commons-codec 1.6
  • commons-logging 1.1.3
  • httpasyncclient 4.0
  • httpclient 4.3.2
  • httpcore 4.3.1
  • httpcore-nio 4.3
  • httpmime 4.3.2
  • jackson-annotations 2.1.4
  • jackson-core 2.1.4
  • jackson-databind 2.1.4
  • json 20131018
  • unirest-java 1.3.6

You are responsible for ensuring that these dependencies don't clash with your other third-party dependencies.


Basic Usage & Error Handling

To make a request, you need your API key (for which you have to sign up), an instance of the Sensium class and an ExtractionRequest object specifying what Sensium should do. The basic usage looks like this:

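// thread-safe, you only need one in your app
Sensium sensium = new Sensium("YOUR_API_KEY");
ExtractionRequest req = new ExtractionRequest();
req.text = "Some plain-text to be analyzed"; // a source is mandatory
try {
   ExtractionResponse resp = sensium.extract(req);
} catch (SensiumException e) { /* e.g. the request was ill-formed */ }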

A call to Sensium#extract() synchronously sends the request to Sensium. If an error occurs, for example because your request is ill-formed, extract() throws a checked SensiumException. Otherwise, the result of the call is returned as an ExtractionResponse.

At a minimum, you have to specify the source to be processed by Sensium. All other settings of an ExtractionRequest are optional.

As indicated in the code, you can instantiate a Sensium object once and use it from multiple threads within your application.


Preparing the Request

The ExtractionRequest object is your way of telling Sensium what to do. It specifies the source to be analyzed as well as the types of analysis to be performed.

Specifying the Source

When creating your ExtractionRequest, you have to specify the source to be analyzed. Sensium currently accepts either plain text or a URL. In the latter case, Sensium fetches the content from the URL and, if its mime-type is some form of HTML, tries to extract its main text. Here's how you specify either of the two source types:

ExtractionRequest req = new ExtractionRequest();
req.url = "http://someurl.com";
// or we specify the plain-text itself
// req.text = "Some plain-text to be analyzed";

If you specify both, the plain text will be used and the URL will be ignored.
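If you want to mirror this precedence rule on the client side, for example to fail fast before making a network call, a minimal sketch could look like this. SourceCheck and effectiveSource are hypothetical helpers, not part of the Sensium SDK, and the sketch assumes that a null field means "not set".

```java
// Hypothetical client-side helper, NOT part of the Sensium SDK.
// It mirrors the documented rule: text takes precedence over url,
// and at least one of the two must be set (null is assumed to mean "not set").
class SourceCheck {

    /** Returns which source Sensium would use: "text" or "url". */
    static String effectiveSource(String text, String url) {
        if (text != null) {
            return "text"; // plain text wins; the URL would be ignored
        }
        if (url != null) {
            return "url";
        }
        throw new IllegalArgumentException("Specify either text or url as the source");
    }

    public static void main(String[] args) {
        System.out.println(effectiveSource("Some plain-text", "http://someurl.com")); // text
        System.out.println(effectiveSource(null, "http://someurl.com"));              // url
    }
}
```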

Specifying Extractors

Extractors are responsible for extracting specific information from a source. By default, Sensium assumes you want all possible types of information extracted from the source. To reduce processing time, you can limit the extractors to be applied; Sensium will then only extract and return what you requested. Head over to the features page to see the different extractors in action. Here's how to request keyphrases and sentence-based summaries as well as named entities:

ExtractionRequest req = new ExtractionRequest();
req.text = "Some plain-text to be analyzed";
req.extractors = new Extractor[] { Extractor.Summary, Extractor.Entities };

The following extractors are available:

Extractor Name | Function | Language Support
Tokens | returns tokens, including their stemmed and normalized form | English, French, German, Spanish, Italian
Sentences | returns sentence boundaries | English, French, German, Spanish, Italian
PosTags | adds part-of-speech tags to the tokens | English, French, German, Spanish, Italian
Entities | returns named entities such as persons or locations | English, French, German, Spanish, Italian
TemporalEvents | returns dates | English, French, German, Spanish, Italian
Summary | returns keyphrases and a sentence-based summary | English, French, German, Spanish, Italian
Sentiment | returns the sentiment of sentences and the full text | English
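If your application lets users choose extractors, you may want to check the table above before sending a request, since Sentiment is the only English-only extractor. The following is a minimal sketch; ExtractorSupport and its nested enum are hypothetical stand-ins written for this page, not the SDK's Extractor type, and languages are given as the two-letter codes used by the lang field.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper mirroring the extractor table.
// The nested enum is a local stand-in, NOT the SDK's Extractor type.
class ExtractorSupport {

    enum Extractor { Tokens, Sentences, PosTags, Entities, TemporalEvents, Summary, Sentiment }

    private static final Set<String> ALL =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("en", "fr", "de", "es", "it")));
    private static final Set<String> ENGLISH_ONLY = Collections.singleton("en");

    private static final Map<Extractor, Set<String>> SUPPORT = new EnumMap<>(Extractor.class);
    static {
        for (Extractor e : Extractor.values()) {
            SUPPORT.put(e, ALL);
        }
        SUPPORT.put(Extractor.Sentiment, ENGLISH_ONLY); // sentiment is English-only
    }

    /** True if the table lists the given two-letter language code for the extractor. */
    static boolean supports(Extractor extractor, String lang) {
        return SUPPORT.get(extractor).contains(lang);
    }

    public static void main(String[] args) {
        System.out.println(supports(Extractor.Summary, "de"));   // true
        System.out.println(supports(Extractor.Sentiment, "de")); // false
    }
}
```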

Specifying Mime-Type and Language

Sensium will try to automatically detect the mime-type and language of the content you send to it. You can override this behaviour by specifying the mime-type and language in the ExtractionRequest manually:

ExtractionRequest req = new ExtractionRequest();
req.text = "Some plain-text to be analyzed";
req.extractors = new Extractor[] { Extractor.Summary, Extractor.Entities };
req.mimeType = "text/plain";
req.lang = "en";

The language is specified as a two-letter ISO 639-1 code, e.g. "en" for English. Sensium currently supports English, German, French, Spanish and Italian for all features except sentiment detection (English-only).
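If your application already works with java.util.Locale, Locale#getLanguage() returns the lower-case ISO 639 language code, which for the five supported languages is the two-letter ISO 639-1 code the lang field expects. A minimal sketch; LangCodes and toSensiumLang are hypothetical names, and returning null is one possible way to signal that you should leave req.lang unset and let Sensium detect the language.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: derives a value for ExtractionRequest#lang from a java.util.Locale.
class LangCodes {

    // Languages this page lists as supported.
    private static final Set<String> SUPPORTED =
            new HashSet<>(Arrays.asList("en", "de", "fr", "es", "it"));

    /** Returns the two-letter code, or null if the language is not listed as supported. */
    static String toSensiumLang(Locale locale) {
        String code = locale.getLanguage(); // lower-case ISO 639 code, e.g. "de"
        return SUPPORTED.contains(code) ? code : null;
    }

    public static void main(String[] args) {
        System.out.println(toSensiumLang(Locale.GERMANY));                  // de
        System.out.println(toSensiumLang(Locale.forLanguageTag("pt-BR")));  // null
    }
}
```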


Processing the Response

Depending on the extractors you specified in the request, Sensium will return different results in the ExtractionResponse. Let's walk through those results and their meaning.

Language and Text

No matter which extractors you specify, Sensium always returns the main plain text and its language as a two-letter ISO 639-1 code. You access them as follows:

Sensium sensium = new Sensium("YOUR_API_KEY");
ExtractionRequest req = new ExtractionRequest();
req.url = "http://someurl.com";
ExtractionResponse resp = sensium.extract(req);
System.out.println("Main text: " + resp.text);
System.out.println("Language: " + resp.lang);

All other information in a response refers to positions within this main text.

Tokens, Stems & POS-Tags

If you specified the Extractor.Tokens or Extractor.PosTags extractor, the ExtractionResponse#tokens field will contain a list of tokens. A Token exposes the following fields:

Token field | Meaning
start, end | start and (exclusive) end position of the token in the main text, zero-based
stem | the stemmed form
normalized | the normalized (language-specific, lower-cased) form
posTag | the part-of-speech tag, using the language's tag set
posTagUnified | the unified part-of-speech tag, using our small, language-independent tag set

These are the part-of-speech tag sets we currently support:

Language | Tag Set
English | Penn Treebank Tagset
German | Stuttgart-Tübingen Tagset (STTS)
Spanish | Parole Reduced Tagset
French | French Treebank Tagset
Italian | Tanl POS Tagset

The tags of these and future tagsets are all mapped to our language-independent unified tagset, which consists of these tags:

ADJECTIVE, ADVERB, CONJUNCTION, DETERMINER, NOUN, PROPER_NOUN, NUMBER, OTHER, PARTICLE, PRONOUN, PREPOSITION, PUNCTUATION, VERB, UNKNOWN

If you want to process part-of-speech tags in a language agnostic way, then this tagset is for you.
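To make the idea of a unified tagset concrete, here is a toy mapping from a few English Penn Treebank tags to unified tag names. It is a hypothetical, simplified illustration written for this page: Sensium performs the real mapping server-side and its exact table is not documented here, so do not rely on these pairs matching Sensium's output (for example, Penn's IN also covers subordinating conjunctions).

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a few Penn Treebank tags mapped onto unified tag names.
// This is NOT Sensium's mapping table; Sensium does the real mapping server-side.
class UnifiedTags {

    private static final Map<String, String> PENN_TO_UNIFIED = new HashMap<>();
    static {
        PENN_TO_UNIFIED.put("JJ", "ADJECTIVE");
        PENN_TO_UNIFIED.put("JJR", "ADJECTIVE");
        PENN_TO_UNIFIED.put("JJS", "ADJECTIVE");
        PENN_TO_UNIFIED.put("RB", "ADVERB");
        PENN_TO_UNIFIED.put("NN", "NOUN");
        PENN_TO_UNIFIED.put("NNS", "NOUN");
        PENN_TO_UNIFIED.put("NNP", "PROPER_NOUN");
        PENN_TO_UNIFIED.put("NNPS", "PROPER_NOUN");
        PENN_TO_UNIFIED.put("VB", "VERB");
        PENN_TO_UNIFIED.put("VBD", "VERB");
        PENN_TO_UNIFIED.put("CD", "NUMBER");
        PENN_TO_UNIFIED.put("DT", "DETERMINER");
        PENN_TO_UNIFIED.put("IN", "PREPOSITION"); // simplified
        PENN_TO_UNIFIED.put("CC", "CONJUNCTION");
        PENN_TO_UNIFIED.put("PRP", "PRONOUN");
    }

    /** Maps a Penn tag to a unified tag, falling back to UNKNOWN for tags not covered here. */
    static String unify(String pennTag) {
        return PENN_TO_UNIFIED.getOrDefault(pennTag, "UNKNOWN");
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"JJ", "NNP", "VBD", "XYZ"}) {
            System.out.println(tag + " -> " + unify(tag));
        }
    }
}
```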

Here's some sample code that walks through all tokens in a response and outputs all ADJECTIVEs in a language-agnostic way by using unified part-of-speech tags:

Sensium sensium = new Sensium("YOUR_API_KEY");
ExtractionRequest req = new ExtractionRequest();
req.url = "http://someurl.com";
req.extractors = new Extractor[] { Extractor.PosTags };
ExtractionResponse resp = sensium.extract(req);
for(Token token: resp.tokens) {   
   if("ADJECTIVE".equals(token.posTagUnified)) {
      System.out.println(token.getText(resp));
   }
}

The Token#getText method is a utility function that gets the token's text from the main text of the response, using the token's start and end fields.
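Because positions are zero-based and the end position is exclusive, they line up exactly with Java's String#substring(int, int). The sketch below assumes getText is equivalent to resp.text.substring(start, end); Span is a hypothetical stand-in for Token, Sentence or Occurence, not an SDK class.

```java
// Sketch of the offset convention: zero-based start, exclusive end,
// i.e. exactly the contract of String#substring(int, int).
// Span is a hypothetical stand-in, NOT an SDK class.
class SpanText {

    static final class Span {
        final int start;
        final int end; // exclusive

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        /** Assumed to be equivalent to what Token#getText(resp) does with resp.text. */
        String getText(String mainText) {
            return mainText.substring(start, end);
        }
    }

    public static void main(String[] args) {
        String mainText = "Sensium extracts keyphrases.";
        Span token = new Span(8, 16);
        System.out.println(token.getText(mainText)); // extracts
    }
}
```

The length of a span is simply end - start, so no off-by-one adjustment is needed.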

Sentences

If you specified the Extractor.Sentences extractor, the ExtractionResponse#sentences field will contain a list of sentences found in the source. A Sentence has the following fields:

Sentence field | Meaning
start, end | start and (exclusive) end position of the sentence in the main text, zero-based

Here's a code snippet that extracts and displays all sentences of a text:

Sensium sensium = new Sensium("YOUR_API_KEY");
ExtractionRequest req = new ExtractionRequest();
req.url = "http://someurl.com";
req.extractors = new Extractor[] { Extractor.Sentences };
ExtractionResponse resp = sensium.extract(req);
for(Sentence sentence: resp.sentences) {      
   System.out.println(sentence.getText(resp));   
}

The Sentence#getText method is a utility function that gets the sentence's text from the main text of the response, using the sentence's start and end fields.
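Since tokens and sentences both refer to positions in the same main text, you can relate them with plain offset arithmetic, for example to count the tokens in each sentence when you request both Extractor.Tokens and Extractor.Sentences. A minimal sketch; TokensPerSentence is a hypothetical helper, int[] {start, end} pairs stand in for the SDK's Token and Sentence objects, and the nested loop is chosen for clarity rather than speed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: relates token spans to sentence spans using only offsets.
// int[] {start, end} pairs stand in for the SDK's Token and Sentence objects.
class TokensPerSentence {

    /** Returns, for each sentence, the number of tokens lying fully inside it. */
    static List<Integer> countTokens(int[][] sentences, int[][] tokens) {
        List<Integer> counts = new ArrayList<>();
        for (int[] s : sentences) {
            int n = 0;
            for (int[] t : tokens) {
                // a token belongs to a sentence if its span lies within the sentence span
                if (s[0] <= t[0] && t[1] <= s[1]) {
                    n++;
                }
            }
            counts.add(n);
        }
        return counts;
    }

    public static void main(String[] args) {
        // main text: "Hi there. Bye."
        int[][] sentences = {{0, 9}, {10, 14}};
        int[][] tokens = {{0, 2}, {3, 8}, {8, 9}, {10, 13}, {13, 14}};
        System.out.println(countTokens(sentences, tokens)); // [3, 2]
    }
}
```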

Named Entities

If you specified the Extractor.Entities extractor, the ExtractionResponse#entities field will contain a list of the named entities, such as persons, locations or organizations, found in the source. A NamedEntity has the following fields:

NamedEntity field | Meaning
type | type of the entity, one of "Location", "Person", "Organization"
link | link to a linked (open) data resource on DBpedia; may be null
normalized | normalized surface form, e.g. "Obama" and "B. Obama" would both become "Barack Obama"
occurences | list of Occurence instances marking the spans in the main text where this entity can be found

Here's a code snippet that extracts and displays all named entities found in the text:

Sensium sensium = new Sensium("YOUR_API_KEY");
ExtractionRequest req = new ExtractionRequest();
req.url = "http://someurl.com";
req.extractors = new Extractor[] { Extractor.Entities };
ExtractionResponse resp = sensium.extract(req);
for(NamedEntity entity: resp.entities) {      
   System.out.println(entity.normalized + ", type: " + entity.type + ", url: " + entity.link);
   for(Occurence occurence: entity.occurences) {
      System.out.println(occurence.getText(resp));
   }   
}

The Occurence#getText method is a shorthand that uses the occurrence's start and end fields to take the corresponding substring of the main text in the response.
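A common next step is to rank entities by how often they are mentioned, which only needs the type field and the size of the occurences list. A minimal sketch; EntityRanking and its nested Entity class are hypothetical stand-ins modeling only the NamedEntity fields used here, with occurrenceCount standing in for entity.occurences.size().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: ranks entities of one type by number of mentions.
// Entity is a stand-in for NamedEntity, NOT an SDK class.
class EntityRanking {

    static final class Entity {
        final String normalized;
        final String type;
        final int occurrenceCount; // stands in for entity.occurences.size()

        Entity(String normalized, String type, int occurrenceCount) {
            this.normalized = normalized;
            this.type = type;
            this.occurrenceCount = occurrenceCount;
        }
    }

    /** Returns the normalized names of entities of the given type, most mentioned first. */
    static List<String> rank(List<Entity> entities, String type) {
        List<Entity> filtered = new ArrayList<>();
        for (Entity e : entities) {
            if (type.equals(e.type)) {
                filtered.add(e);
            }
        }
        // List#sort is stable, so entities with equal counts keep their original order
        filtered.sort(Comparator.comparingInt((Entity e) -> e.occurrenceCount).reversed());
        List<String> names = new ArrayList<>();
        for (Entity e : filtered) {
            names.add(e.normalized);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Entity> entities = new ArrayList<>();
        entities.add(new Entity("Graz", "Location", 4));
        entities.add(new Entity("Barack Obama", "Person", 3));
        entities.add(new Entity("Angela Merkel", "Person", 5));
        System.out.println(rank(entities, "Person")); // [Angela Merkel, Barack Obama]
    }
}
```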

Temporal Events

If you specified the Extractor.TemporalEvents extractor, the ExtractionResponse#temporalEvents field will contain a list of TemporalEvent instances. A TemporalEvent has the following fields:

TemporalEvent field | Meaning
start, end | start and (exclusive) end position of the event's surface form in the main text, zero-based
timestamp | number of milliseconds since the standard base time known as "the epoch", namely January 1, 1970, 00:00:00 GMT

Here's a code snippet that extracts and displays temporal events of a given text:

Sensium sensium = new Sensium("YOUR_API_KEY");
ExtractionRequest req = new ExtractionRequest();
req.url = "http://someurl.com";
req.extractors = new Extractor[] { Extractor.TemporalEvents };
ExtractionResponse resp = sensium.extract(req);
for(TemporalEvent event: resp.temporalEvents) {
   System.out.println("surface form: " + event.getText(resp) + ", date: " + new Date(event.timestamp)); // java.util.Date
}

The TemporalEvent#getText method is a shorthand that uses the event's start and end fields to take the corresponding substring of the main text in the response.
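Printing new Date(event.timestamp) renders the date in the JVM's default time zone. If you need a stable, zone-independent representation, the java.time API (Java 8+) can format the same timestamp as ISO-8601 in UTC. A minimal sketch; EventDates is a hypothetical helper, and the timestamp is treated exactly as documented above (milliseconds since the epoch).

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: formats TemporalEvent-style timestamps
// (milliseconds since the epoch) as ISO-8601 strings in UTC.
class EventDates {

    /** Formats epoch milliseconds as an ISO-8601 instant, e.g. 1970-01-01T00:00:00Z. */
    static String toIsoInstant(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    /** Formats epoch milliseconds as a calendar date in UTC, e.g. 1970-01-01. */
    static String toUtcDate(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis)
                .atOffset(ZoneOffset.UTC)
                .format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    public static void main(String[] args) {
        long timestamp = 86_400_000L; // one day after the epoch
        System.out.println(toIsoInstant(timestamp)); // 1970-01-02T00:00:00Z
        System.out.println(toUtcDate(timestamp));    // 1970-01-02
    }
}
```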

Keyphrases & Summaries

If you specified the Extractor.Summary extractor, the ExtractionResponse#summary field will be set to an instance of Summary, which has the following fields:

Summary field | Meaning
text | summary of the text, consisting of the 5 most important sentences in the text
keySentences | list of Occurence instances delimiting the sentences that make up the summary, ranked by decreasing importance
keyPhrases | list of KeyPhrase instances describing the 10 most important keyphrases, ranked by decreasing importance

The KeyPhrase class has the following fields:

KeyPhrase field | Meaning
text | textual representation of the keyphrase
score | relative score of the keyphrase, specifying its importance relative to the other keyphrases
occurences | list of Occurence instances marking the occurrences of the keyphrase in the main text

Here's a code snippet that extracts and displays the most important sentence and keyphrase of a given text:

Sensium sensium = new Sensium("YOUR_API_KEY");
ExtractionRequest req = new ExtractionRequest();
req.url = "http://someurl.com";
req.extractors = new Extractor[] { Extractor.Summary };
ExtractionResponse resp = sensium.extract(req);
System.out.println(resp.summary.keySentences[0].getText(resp));
System.out.println(resp.summary.keyPhrases[0].text);
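Because keyphrase scores are only meaningful relative to each other, one option is to keep every keyphrase that scores within some fraction of the top one; checking for an empty array first also avoids the out-of-bounds access that indexing [0] would cause if a summary ever contained no keyphrases. A sketch under assumptions: KeyPhraseFilter and nearTop are hypothetical, the parallel phrases and scores arrays (of equal length) stand in for keyPhrases[i].text and keyPhrases[i].score, scores are treated as non-negative doubles, and the 0.5 cut-off is an arbitrary example value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-processing step, NOT part of the SDK.
// phrases[i] and scores[i] stand in for keyPhrases[i].text and keyPhrases[i].score.
class KeyPhraseFilter {

    /** Returns the phrases whose score is at least fraction * top score; empty input yields an empty list. */
    static List<String> nearTop(String[] phrases, double[] scores, double fraction) {
        List<String> kept = new ArrayList<>();
        if (phrases.length == 0) {
            return kept; // no keyphrases, nothing to index
        }
        double top = scores[0];
        for (double s : scores) {
            top = Math.max(top, s);
        }
        for (int i = 0; i < phrases.length; i++) {
            if (scores[i] >= fraction * top) {
                kept.add(phrases[i]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] phrases = {"named entities", "keyphrases", "summary"};
        double[] scores = {0.8, 0.5, 0.2};
        System.out.println(nearTop(phrases, scores, 0.5)); // [named entities, keyphrases]
    }
}
```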

Sentiment

If you specified the Extractor.Sentiment extractor, the ExtractionResponse#polarity and ExtractionResponse#objectivity fields will be set. Polarity describes whether a text talks positively or negatively about a given topic or subject. Objectivity describes whether a text is written objectively (e.g. a news article) or subjectively (e.g. an editorial). Both fields are of type Sentiment, which has the following fields:

Sentiment field | Meaning
score | the document-wide sentiment score, between -1 and 1
occurrences | list of SentimentOccurrance instances delimiting spans in the main text that carry sentiment

The SentimentOccurrance class has the following fields:

SentimentOccurrance field | Meaning
text | textual representation of the span
score | score of the occurrence, between -1 and 1

In the case of polarity, a negative score means negative polarity (e.g. a bad review), while a positive score indicates positive polarity (a good review). In the case of objectivity, a negative score indicates subjectivity and a positive score indicates objectivity. The closer a score is to 0, the more neutral it is.

Here's a code snippet that displays the polarity and objectivity of a text:

Sensium sensium = new Sensium("YOUR_API_KEY");
ExtractionRequest req = new ExtractionRequest();
req.url = "http://someurl.com";
req.extractors = new Extractor[] { Extractor.Sentiment };
ExtractionResponse resp = sensium.extract(req);
System.out.println(resp.polarity.score < 0? "negative": (resp.polarity.score > 0 ? "positive": "neutral"));
System.out.println(resp.objectivity.score < 0? "subjective": (resp.objectivity.score > 0 ? "objective": "neutral"));
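The snippet above only reports "neutral" when a score is exactly 0, which is rare for a continuous score. Since the closer a score is to 0 the more neutral it is, you may prefer a small dead band around 0. A minimal sketch; SentimentLabels is a hypothetical helper and the 0.1 band is an arbitrary example value, not a Sensium recommendation.

```java
// Hypothetical helper: maps a sentiment score in [-1, 1] to a label,
// treating scores within a symmetric band around 0 as neutral.
class SentimentLabels {

    /** Returns negLabel, posLabel or "neutral" depending on where the score falls. */
    static String label(double score, double neutralBand, String negLabel, String posLabel) {
        if (score < -neutralBand) {
            return negLabel;
        }
        if (score > neutralBand) {
            return posLabel;
        }
        return "neutral";
    }

    public static void main(String[] args) {
        double band = 0.1; // arbitrary example threshold, tune for your data
        System.out.println(label(-0.6, band, "negative", "positive"));    // negative
        System.out.println(label(0.05, band, "subjective", "objective")); // neutral
        System.out.println(label(0.7, band, "subjective", "objective"));  // objective
    }
}
```

With neutralBand set to 0, the helper behaves like the snippet above.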