IV - Recognizing Parts Of Speech Using Apache OpenNLP

Updated: Nov 23, 2021

In this post, we will discuss recognizing parts of speech in a given sentence using the Apache OpenNLP library. This is part of the series on learning Natural Language Processing with Apache OpenNLP.

As in the earlier posts, where we detected sentences, found names, or tokenized a given string, this post also uses one of the pre-trained models provided by the Apache OpenNLP library.

You can find the other posts, which cover detecting sentences, finding names with named entity recognition, and tokenizing a given string, here: Finding Sentences, Finding Names using NamedEntityRecognition and Tokenization of the Strings.

We will be using the en-pos-maxent.bin model, which detects the part of speech of each token and tags it with a short name. The table below lists the parts of speech and their short names. Be sure to use this model; a different model may tag the tokens with a different set of short names.

Parts of Speech    Meaning of parts of speech

NN                 Noun, singular or mass
VB                 Verb, base form
VBD                Verb, past tense
VBZ                Verb, third person singular present
IN                 Preposition or subordinating conjunction
NNP                Proper noun, singular
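The short names used by this model follow the Penn Treebank tag set. As a small sketch, the meanings listed above can be kept in a lookup map so that a tag can be turned back into a readable description. The class name PosTagLookup and the method describe are hypothetical names chosen for this example, and the map covers only the tags listed in this post, not the full tag set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps a few Penn Treebank tags to their meanings.
// It covers only the tags listed in this post, not the whole tag set.
class PosTagLookup {
    private static final Map<String, String> MEANINGS = new LinkedHashMap<>();

    static {
        MEANINGS.put("NN", "Noun, singular or mass");
        MEANINGS.put("VB", "Verb, base form");
        MEANINGS.put("VBD", "Verb, past tense");
        MEANINGS.put("VBZ", "Verb, third person singular present");
        MEANINGS.put("IN", "Preposition or subordinating conjunction");
        MEANINGS.put("NNP", "Proper noun, singular");
    }

    // Returns the meaning of a tag, or a fallback text for tags not in the map.
    static String describe(String tag) {
        return MEANINGS.getOrDefault(tag, "Unknown tag: " + tag);
    }

    public static void main(String[] args) {
        System.out.println("NNP -> " + describe("NNP"));
        System.out.println("VBD -> " + describe("VBD"));
    }
}
```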





As with the other models, we follow the same steps here:

  1. Load the model.

  2. Tokenize the given string.

  3. Tag the tokens with the relevant part-of-speech short name.

Let's go through the code for each step:

Load the Model

We will use a FileInputStream to read the model file, and pass that stream to the POSModel constructor to create a POSModel object.

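import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.postag.POSModel;

public POSModel loadModel() {
    // try-with-resources closes the stream once the model has been read
    try (InputStream is = new FileInputStream("src\\main\\resources\\models\\en-pos-maxent.bin")) {
        return new POSModel(is);
    } catch (IOException e) {
        // FileNotFoundException is a subclass of IOException, so one catch covers both
        e.printStackTrace();
        return null;
    }
}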
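The path above uses Windows-style backslashes, so it may not resolve on Linux or macOS. A minimal sketch of one alternative is to build the path from its segments and let the JDK pick the separator. The class name ModelPath is a hypothetical name for this example, and the folder layout is simply the one used in the snippet above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: builds the model path in an OS-independent way.
class ModelPath {
    // Joins the segments with the separator of the host operating system.
    static Path modelPath() {
        return Paths.get("src", "main", "resources", "models", "en-pos-maxent.bin");
    }

    public static void main(String[] args) {
        Path path = modelPath();
        System.out.println("Model path: " + path);
        // Checking first gives a clearer message than a failed stream open
        System.out.println("Exists: " + Files.exists(path));
    }
}
```

Files.newInputStream(path) then returns an InputStream that can be passed to the POSModel constructor in the same way as the FileInputStream above.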

Tokenize Given String

Using the POSModel object created above, we create a POSTaggerME object and then tokenize the given string.

POSModel model = loadModel();
POSTaggerME posObj = new POSTaggerME(model);
WhitespaceTokenizer whitespaceTokenizer = WhitespaceTokenizer.INSTANCE;
String[] tokens = whitespaceTokenizer.tokenize(sentence);
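A whitespace tokenizer splits only on whitespace, so punctuation stays attached to the neighbouring word. The sketch below illustrates this with plain JDK string splitting on the sample sentence used in this post. It is a stand-in for illustration, not OpenNLP's implementation, and the class name WhitespaceSplit is hypothetical.

```java
import java.util.Arrays;

// Hypothetical stand-in for whitespace tokenization using only the JDK.
class WhitespaceSplit {
    // Splits on runs of whitespace; an empty or blank input yields no tokens.
    static String[] tokenize(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String input = "Welcome to DynamicallyBluntTech, for our class of natural language processing.";
        String[] tokens = tokenize(input);
        // Note that the comma and the full stop remain part of their tokens
        System.out.println(tokens.length + " tokens: " + Arrays.toString(tokens));
    }
}
```

Because "DynamicallyBluntTech," and "processing." keep their punctuation, the tagger sees them as single tokens. If that matters, a tokenizer that separates punctuation may be a better fit.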

Tagging Tokens

Next, the tokens are tagged and a POSSample object is created. The POSSample holds the tokens together with their part-of-speech short names. For an input like this:

String input = "Welcome to DynamicallyBluntTech, for our class of natural language processing.";

We will get an output where each word is tagged with its part-of-speech short name (given in the table above).
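With OpenNLP, the tags come from posObj.tag(tokens), which returns one tag per token, and new POSSample(tokens, tags) pairs the two arrays. Printing a POSSample gives each token joined to its tag as token_TAG. The sketch below reproduces only that formatting with plain Java so the shape is easy to see. The class name TaggedSentence is hypothetical, and the tags in it are hand-assigned for illustration, not output from the model.

```java
// Hypothetical helper: formats tokens and tags in the token_TAG style.
class TaggedSentence {
    // Joins each token with its tag, separated by spaces.
    static String format(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("Each token needs exactly one tag");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(tokens[i]).append('_').append(tags[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hand-assigned tags for illustration only (not produced by en-pos-maxent.bin)
        String[] tokens = {"John", "visited", "Paris"};
        String[] tags = {"NNP", "VBD", "NNP"};
        System.out.println(format(tokens, tags));
    }
}
```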


We can also monitor the performance using the PerformanceMonitor class.

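public void performanceMonitor(POSSample sample){
    PerformanceMonitor monitor = new PerformanceMonitor(System.err, "uploaded");
    // Assumed completion, consistent with the metrics shown below
    monitor.start();
    monitor.incrementCounter();          // count one processed sample
    monitor.stopAndPrintFinalResult();   // print average, total and runtime
}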

This prints the following performance metrics:

Average: 0.0 uploaded/s
Total: 1 uploaded
Runtime: 0.0s
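The average of 0.0 here is a side effect of the run being so short: with a runtime of 0.0s there is no meaningful rate to compute. The sketch below shows that calculation in plain Java. It assumes that a zero elapsed time is reported as a zero rate. That is an assumption made for this example rather than something taken from OpenNLP's source, and the class name Throughput is hypothetical.

```java
// Hypothetical helper: computes items per second from a count and elapsed time.
class Throughput {
    // Returns 0.0 when no measurable time has elapsed, to avoid dividing by zero.
    static double average(long count, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return count / (elapsedMillis / 1000.0);
    }

    public static void main(String[] args) {
        // One item in an unmeasurably short run, as in the metrics above
        System.out.println("Average: " + average(1, 0) + " uploaded/s");
        // A longer run gives a meaningful rate
        System.out.println("Average: " + average(50, 2000) + " uploaded/s");
    }
}
```

To get a useful number from the monitor, tag many sentences in a loop and increment the counter once per sentence.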

You can find the entire code for this post, as well as for the others, in my GitHub repository.

This concludes the series on the Apache OpenNLP library. OpenNLP also provides a command-line interface that can be launched directly from the command prompt or terminal to run specific commands, but we will conclude our discussion of the library here.

Please do suggest more content topics of your choice and share your feedback. Also subscribe and appreciate the blog if you like it.
