Update (2014, January 3): Links and/or samples in this post might be outdated. The latest version of the samples is available on the new Stanford.NLP.NET site.
All code samples from this post are available on GitHub.
Continuing the theme of porting Stanford NLP libraries to .NET, I am glad to introduce one more library – the Stanford Log-linear Part-Of-Speech Tagger.
To compile stanford-postagger.jar to a .NET assembly you need nothing special; just follow the steps from my previous post “NLP: Stanford Parser with F# (.NET)”. You can also download an already compiled version from GitHub.
A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns a part of speech, such as noun, verb, or adjective, to each word (and other token). Computational applications generally use more fine-grained POS tags, like ‘noun-plural’.
Read more about Part-of-speech tagging on Wikipedia.
I was really surprised by the performance of the .NET version of the Stanford POS Tagger. It is fast enough! If you do not need advanced syntactic dependencies between the words and part-of-speech information is enough, then do not use the Stanford Parser; the Stanford POS Tagger is just what you need.
module TaggerDemo

open java.io
open java.util
open edu.stanford.nlp.ling
open edu.stanford.nlp.tagger.maxent
open IKVM.FSharp

// Path to the trained tagger model, relative to the build output folder.
let model = @"..\..\..\..\StanfordNLPLibraries\stanford-postagger\models\wsj-0-18-left3words.tagger"

// Tokenize the reader's text into sentences, then tag and print each one.
let tagReader (reader:Reader) =
    let tagger = MaxentTagger(model)
    MaxentTagger.tokenizeText(reader).iterator()
    |> Collections.toSeq
    |> Seq.iter (fun sentence ->
        let tSentence = tagger.tagSentence(sentence :?> List)
        printfn "%O" (Sentence.listToString(tSentence, false))
    )

// Tag the contents of a text file.
let tagFile (fileName:string) =
    tagReader (new BufferedReader(new FileReader(fileName)))

// Tag a string held in memory.
let tagText (text:string) =
    tagReader (new StringReader(text))
As you can see, it is really simple to use. We instantiate MaxentTagger and initialize it with the wsj-0-18-left3words.tagger model. After that we load the text, tokenize it into sentences, and tag the sentences one by one.
Let’s test the tagger on the F# Software Foundation Mission Statement =).
The mission of the F# Software Foundation is to promote, protect, and advance the F# programming language, and to support and facilitate the growth of a diverse and international community of F# programmers.
Mission/NNP Statement/NNP The/NNP mission/NN of/IN the/DT F/NN #/# Software/NNP Foundation/NNP is/VBZ to/TO promote/VB ,/, protect/VB ,/, and/CC advance/NN the/DT F/NN #/# programming/VBG language/NN ,/, and/CC to/TO support/VB and/CC facilitate/VB the/DT growth/NN of/IN a/DT diverse/JJ and/CC international/JJ community/NN of/IN F/NN #/# programmers/NNS ./.
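The printed result is plain text in word/TAG form, so it is easy to turn back into data. Below is a minimal sketch in plain F# (no Stanford or IKVM references) that splits such a line into (word, tag) pairs and keeps the nouns. The names parseToken and parseTagged are my own illustration and are not part of the tagger API, and the input is a fragment copied from the output above.

```fsharp
// Split one "word/TAG" token at its LAST slash, so a word that itself
// contains '/' still keeps its tag intact.
let parseToken (token: string) =
    match token.LastIndexOf('/') with
    | -1 -> (token, "")   // no slash found: keep the word, leave the tag empty
    | i -> (token.Substring(0, i), token.Substring(i + 1))

// Turn a whole tagged line into a list of (word, tag) pairs.
let parseTagged (line: string) =
    line.Split([| ' ' |], System.StringSplitOptions.RemoveEmptyEntries)
    |> Array.map parseToken
    |> Array.toList

// A fragment of the tagger output shown above.
let tagged = "the/DT F/NN #/# programming/VBG language/NN ,/, and/CC"

// Keep only the words whose tag starts with "NN" (the noun tags).
let nouns =
    parseTagged tagged
    |> List.filter (fun (_, tag) -> tag.StartsWith("NN"))
    |> List.map fst

printfn "%A" nouns   // prints ["F"; "language"]
```

Note that the tokenizer splits F# into two tokens, F and #, so a consumer of this output would have to rejoin them if it cares about the language name.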
You can find descriptions of the POS tags here.
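For quick reference, here is a small lookup sketch covering only the tags that appear in the output above. It assumes the model uses the Penn Treebank tag set, which the wsj prefix in the model file name suggests. The table is deliberately partial, and tagDescriptions and describe are hypothetical helper names, not part of any library.

```fsharp
// Partial table: Penn Treebank tags seen in the sample output (assumed tag set).
let tagDescriptions =
    Map.ofList [
        "CC",  "coordinating conjunction"
        "DT",  "determiner"
        "IN",  "preposition or subordinating conjunction"
        "JJ",  "adjective"
        "NN",  "noun, singular or mass"
        "NNP", "proper noun, singular"
        "NNS", "noun, plural"
        "TO",  "to"
        "VB",  "verb, base form"
        "VBG", "verb, gerund or present participle"
        "VBZ", "verb, 3rd person singular present"
    ]

// Look a tag up, falling back to a marker for tags missing from the table
// (punctuation tokens such as "," or "#" are tagged with themselves).
let describe tag =
    match Map.tryFind tag tagDescriptions with
    | Some text -> text
    | None -> "(not in this partial table)"

[ "NNP"; "VBZ"; "#" ]
|> List.iter (fun tag -> printfn "%-4s %s" tag (describe tag))
```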
14 thoughts on “NLP: Stanford POS Tagger with F# (.NET)”
This is awesome! I just discovered this.
Would you be willing to provide C# examples as well?
Yeesss, but still does not have enough time for this… I have a issue for this – https://github.com/sergey-tihon/Stanford.NLP.NET/issues/1 but some samples are already ready and available here https://github.com/sergey-tihon/FSharp.NLP.Stanford/tree/master/StanfordSoftware/Samples
This looks interesting, but can you provide a step by step of how to get started with C#? Ultimately I will be porting to VB.NET. I am an expert programmer but very little of this makes sense. What would I download and how do I go about installing it? The code above looks easy, but getting started is cryptic.
Everything is mentioned here (http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordPOSTagger.html) : you need NuGet package (Stanford.NLP.POSTagger) and trained models from zip archive http://www-nlp.stanford.edu/software/stanford-postagger-full-2014-01-04.zip
That is not helpful. I asked if you could be more articulate and express yourself better. I have nuget installed and I have the zip archive, and I downloaded ikvm.net. The instructions lead to nowhere for the installation of ikvm. This is a complete mess.
You does not need to install IKVM. Stanford.NLP.POSTagger is already recompiled and ready to use.
If you already installed Stanford.NLP.POSTagger from NuGet – you are ready to write you code.
Ok I see part of the problem. English is not your native language. This is making it hard for you to understand me, and for me to understand you. I tried Ikvm only because the other path did not work. You gave me a link to a page that only tells me what a POS tagger is. I already know that, so “everything are mentioned here” was wrong. Also you should have said, everything IS mentioned here. Also you said “You does need” rather than “You do not need”. Are you using a translator? I noticed other people having the same problem getting information from you, and when they ask you to provide a step by step, you basically ignored them. I have spent several hours on this, and it should only take a few minutes. I might seem a little angry, and you can see why.
I have Visual Studio 10, and nugget installed as the extension. I am not seeing the same kind of options as some of the pictures. Please tell me how you would open nugget and install Stanford.NLP.POSTagger from nugget.
Please do not be vague.
Don’t you think that asking for help and blaming the person at the same time are leading to nowhere? Write step-by-step what you are doing and what you can not understand. Being an expert in English and programming, it should not be difficult I guess. What kind of options you don’t see? The article is marked as outdated, so there definitely can be some differences. After that Sergey or anybody else would have a chance to help you. But personally I am not willing to help such a person. For me it seems that you don’t want help. You want somebody to do everything for you. Strange behaviour for an expert…
P. S. English is not my native language, so you can spellcheck mine post too.
It’s time to improve your expertise and to learn NuGet (http://docs.nuget.org/docs/start-here/managing-nuget-packages-using-the-dialog ).
Your problems are not related to Stanford.NLP.NET, IKVM.NET or other stuff mentioned in this post.
Thank you. That link is showing the pictures I was talking about. It seems the problem there lies with Visual Studio and nugget. I read the other people having the same problem. It is not my expertise brother. VS versions are different. I want you to know that I did not blame you personally. I was blaming the language barrier between us. I echoed that sentiment with the question “are you using a translator”. This implies a polite explanation.
Volhav, no I never want anyone to do everything for me. haha That is funny. In fact it is quite the opposite. I always help others in forums and I write succinct high level articles that explain things in a way that is very understandable by western English standards. If you look at the post, I said getting started was vague and cryptic, the rest is easy. If you are offended, then I did not mean to.
Could you please let us know “I am not seeing the same kind of options as some of the pictures.” What pictures and what options are you talking about? Could you be a little bit more precise? BTW, the link that Sergey provided to you is quite helpful, if you go to the home page http://sergey-tihon.github.io/Stanford.NLP.NET/index.html. There you can find the steps.
UPDATED: Oh, now I got it. You are unable to install nuget extension. So Sergey is right, we are not able to help you with the environment.
Well, nugget extension is installed, but not the same at all on VS express. Certain versions of VS make this a nightmare, but VS 2012 works on my other machine. I don’t like to use 2012, but I will have to work with it for now. Please note that some people make also have the same environment problem.