Stanford Log-linear Part-Of-Speech Tagger is available on NuGet

14/07/2013 | 25/02/2021 | F#, Machine Learning and NLP | C#, F#, IKVM.NET, NuGet, Stanford NLP | 35 Comments

Update (2014, January 3): Links and/or samples in this post might be outdated. The latest versions of the samples are available on the new Stanford.NLP.NET site.

One more tool became available on NuGet today: the Stanford Log-linear Part-Of-Speech Tagger. This is the third Stanford NuGet package I have published; the previous ones were “Stanford Parser” and “Stanford Named Entity Recognizer (NER)”. I have already posted about this tool, with guidance on how to recompile it and use it from F# (see “NLP: Stanford POS Tagger with F# (.NET)”). Follow these steps to get started:

1. Run Install-Package Stanford.NLP.POSTagger in the NuGet Package Manager Console.
2. Download the models from The Stanford NLP Group site.
3. Extract the models from the ‘models’ folder.
4. You are ready to start.

F# Sample

For more details, see the source code on GitHub.

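open java.io
open java.util
open edu.stanford.nlp.ling
open edu.stanford.nlp.tagger.maxent

// Path to a tagger model extracted from the downloaded 'models' folder
let model = @"..\..\..\..\temp\stanford-postagger-2013-06-20\models\wsj-0-18-bidirectional-nodistsim.tagger"

let tagReader (reader:Reader) =
    // Loading the model is expensive; reuse the tagger when tagging many texts
    let tagger = MaxentTagger(model)
    // tokenizeText splits the input into sentences, each a java.util.List of tokens
    MaxentTagger.tokenizeText(reader).toArray()
    |> Seq.cast<List>
    |> Seq.iter (fun sentence ->
        let tSentence = tagger.tagSentence(sentence)
        // Prints the sentence in word/TAG form
        printfn "%O" (Sentence.listToString(tSentence, false)))

let tagFile (fileName:string) =
    tagReader (new BufferedReader(new FileReader(fileName)))

let tagText (text:string) =
    tagReader (new StringReader(text))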

C# Sample

For more details, see the source code on GitHub.

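using java.io;
using java.util;
using edu.stanford.nlp.ling;
using edu.stanford.nlp.tagger.maxent;

public static class TaggerDemo
{
    // Path to a tagger model extracted from the downloaded 'models' folder
    public const string Model =
        @"..\..\..\..\temp\stanford-postagger-2013-06-20\models\wsj-0-18-bidirectional-nodistsim.tagger";

    private static void TagReader(Reader reader)
    {
        // Loading the model is expensive; reuse the tagger when tagging many texts
        var tagger = new MaxentTagger(Model);
        // tokenizeText splits the input into sentences, each a java.util.List of tokens
        foreach (List sentence in MaxentTagger.tokenizeText(reader).toArray())
        {
            var tSentence = tagger.tagSentence(sentence);
            // Prints the sentence in word/TAG form
            System.Console.WriteLine(Sentence.listToString(tSentence, false));
        }
    }

    public static void TagFile(string fileName)
    {
        TagReader(new BufferedReader(new FileReader(fileName)));
    }

    public static void TagText(string text)
    {
        TagReader(new StringReader(text));
    }
}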
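Both samples build the tagger from a relative model path, so whether the model is found depends on the directory the program is started from. When the file is missing, the problem typically surfaces as a Java exception thrown from inside the tagger. A minimal sketch of an early check, assuming you prefer a plain .NET FileNotFoundException that shows the full path that was tried (ModelPath and ResolveOrThrow are hypothetical names, not part of the Stanford.NLP.POSTagger package):

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of Stanford.NLP.POSTagger): resolves a model
// path against the current directory and fails early with a clear message.
public static class ModelPath
{
    public static string ResolveOrThrow(string modelPath)
    {
        // Relative paths are resolved against Directory.GetCurrentDirectory()
        string fullPath = Path.GetFullPath(modelPath);
        if (!File.Exists(fullPath))
        {
            throw new FileNotFoundException(
                $"Tagger model not found. Current directory: {Directory.GetCurrentDirectory()}",
                fullPath);
        }
        return fullPath;
    }
}
```

You could then construct the tagger with new MaxentTagger(ModelPath.ResolveOrThrow(Model)), so the tagger always receives an absolute path.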

Both samples produce the same output. For example, if you start the program with these parameters:

1 text "A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads 
text in some language and assigns parts of speech to each word (and other token), 
such as noun, verb, adjective, etc., although generally computational 
applications use more fine-grained POS tags like 'noun-plural'."

Then you will see the following on your screen:

A/DT Part-Of-Speech/NNP Tagger/NNP -LRB-/-LRB- POS/NNP Tagger/NNP -RRB-/-RRB- 
is/VBZ a/DT piece/NN of/IN software/NN that/WDT reads/VBZ text/NN in/IN some/DT 
language/NN and/CC assigns/VBZ parts/NNS of/IN speech/NN to/TO each/DT word/NN 
-LRB-/-LRB- and/CC other/JJ token/JJ -RRB-/-RRB- ,/, such/JJ as/IN noun/JJ ,/, 
verb/JJ ,/, adjective/JJ ,/, etc./FW ,/, although/IN generally/RB computational/JJ 
applications/NNS use/VBP more/RBR fine-grained/JJ POS/NNP tags/NNS like/IN `/`` 
noun-plural/JJ '/'' ./.
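Each token in this output is printed as word/TAG. If you want to post-process the result in .NET, a minimal sketch that splits such a line into (word, tag) pairs could look like this. It assumes tags never contain '/' (true for the tags above), so it splits at the last slash; TaggedText.Parse is a hypothetical helper, not part of the package.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits "word/TAG word/TAG ..." text, as printed by
// Sentence.listToString(tSentence, false), into (word, tag) pairs.
// Assumption: tags never contain '/', so we split at the LAST slash.
public static class TaggedText
{
    public static List<(string Word, string Tag)> Parse(string taggedText)
    {
        var result = new List<(string Word, string Tag)>();
        var tokens = taggedText.Split(new[] { ' ', '\t', '\r', '\n' },
                                      StringSplitOptions.RemoveEmptyEntries);
        foreach (var token in tokens)
        {
            int slash = token.LastIndexOf('/');
            // Reject tokens with no word part or no tag part
            if (slash <= 0 || slash == token.Length - 1)
                throw new FormatException($"Not a word/TAG token: '{token}'");
            result.Add((token.Substring(0, slash), token.Substring(slash + 1)));
        }
        return result;
    }
}
```

For example, TaggedText.Parse("A/DT Part-Of-Speech/NNP") yields ("A", "DT") and ("Part-Of-Speech", "NNP").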


Published by Sergey Tihon 🦔🦀🦋

Father. Husband. Developer. Microsoft MVP. Likes 🦔, 🦀 and OSS.

35 thoughts on “Stanford Log-linear Part-Of-Speech Tagger is available on NuGet”

Pingback: F# Weekly #28 2013 | Sergey Tihon's Blog
Pingback: Stanford Word Segmenter is available on NuGet | Sergey Tihon's Blog
Anonymous ;) says:

11/10/2013 at 15:52

Thanks for your effort! Although it took me about 3 hours to get your examples working, I’m very amazed by IKVM and your POS Tagger port.

1. Sergey Tihon says:
  
  11/10/2013 at 18:19
  
  What was the most difficult part of getting it to work?
  
  1. Anonymous ;) says:
    
    11/10/2013 at 20:48
    
    Downloading the project from GitHub and downloading the correct zip from the Stanford page.
    
    In hindsight, all that I’ve done is very easy: just downloading from GitHub and NuGet, setting reference paths, and updating the path to the correct directory. I’m totally new to the Stanford parser, POS tagger and tokenizer; I guess I was just confused between these three.
Anonymous ;) says:

17/10/2013 at 17:08

Is it possible to use features of Stanford CoreNLP with one of your ports? So far I’ve managed to get your NER and POS Tagger ports working. I’m thinking about using the Stanford sentence splitter as well. I think I need the CoreNLP.jar for this. I tried converting it via “ivkm stanford-core-nlp-3.2.0.jar”, but that gave me a java.lang.ClassNotFoundException. My arguments are probably wrong and I need to include more jar files.

What do I have to do to get the code from http://nlp.stanford.edu/software/corenlp.shtml running under C#?

1. Sergey Tihon says:
  
  17/10/2013 at 23:06
  
  The CoreNLP package is almost ready. Sorry, but I have not had time to prepare F#/C# samples and republish it to the release channel. Please try this one: https://www.nuget.org/packages/Stanford.NLP.CoreNLP/
  
  1. Anonymous ;) says:
    
    18/10/2013 at 14:06
    
    I’m currently getting an error trying to download that NuGet package:
    
    Attempting to resolve dependency ‘IKVM (≥ 7.3.4830.0)’.
    The remote server returned an error: (404) Not Found.
    
    This may be caused by the NuGet outage earlier today. Adding IKVM via NuGet manually didn’t solve this problem. I’ll try again later; maybe it’ll work then.
    
    Thanks for your effort and fast responses 😉
  2. Anonymous ;) says:
    
    18/10/2013 at 18:30
    
    NuGet seems to be working again. I downloaded your package. Thanks for providing it!
    
    During my attempts to port the Java code from http://nlp.stanford.edu/software/corenlp.shtml to C#, I got stuck on some errors. I stumbled across http://www.bitwjg.org/2012/11/16/transplant-the-stanford-corenlp-suite-from-java-to-c/
    which gave me a huge bump towards working code.
    
    I managed to correct some using statements and stripped down the code to something that compiles.
    Because of some runtime errors, I thought of adding some references to model files, like you did in https://sergeytihon.wordpress.com/2013/09/09/stanford-word-segmenter-is-available-on-nuget/
    
    I was able to get “tokenize, ssplit, pos, lemma” working, by adding “pos.model” and “ner.model”.
    
    I hope this code will help others: http://pastebin.com/yUznNyxd
    
    However, if I add “ner”, I get an unhandled RuntimeException, “Error initializing binder 1”, when instantiating StanfordCoreNLP.
    I hope I’ll find a way to get “ner, parse, dcoref” running next week. Any suggestions for more ‘props.put’?
  3. Sergey Tihon says:
    
    20/10/2013 at 15:00
    
    Thanks for your effort. You can look at the following sample https://gist.github.com/casperOne/11b7bc6ff39c58d3aaa0 prepared by @OneFrameLink (https://twitter.com/OneFrameLink/status/388014050245738496).
  4. Anonymous ;) says:
    
    21/10/2013 at 17:51
    
    I was able to get “ner” working as well. I’m currently stuck at “dcoref”, but I won’t pursue it further at the moment.
    
    Here is the code that works for me: http://pastebin.com/6eu6N0TK
  5. Sergey Tihon says:
    
    26/10/2013 at 02:08
    
    I have finished what you started. Thank you. https://sergeytihon.wordpress.com/2013/10/26/stanford-corenlp-is-available-on-nuget-for-fc-devs/
Pingback: Stanford CoreNLP is available on NuGet for F#/C# devs | Sergey Tihon's Blog
uma says:

07/01/2015 at 09:19

How can we get the parts of speech for hundreds of sentences? What is the way to connect to a database that holds our input data?

1. Sergey Tihon says:
  
  07/01/2015 at 12:35
  
  How to connect to the database really depends on your DB. If you have a large text, you need to split it into sentences and then find the POS for each word.
  
Sneha says:

28/04/2017 at 13:27

I have some C# code (copied, though), and I’m getting an error at this statement: var pipeline = new StanfordCoreNLP(props); (An unhandled exception of type ‘java.lang.RuntimeException’ occurred in stanford-corenlp-3.7.0.dll

Additional information: edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file))

My models and CoreNLP are the same version: stanford-corenlp-3.7.0-models.jar and stanford-corenlp-3.7.0.jar.

Any help would be greatly appreciated!

1. Sergey Tihon says:
  
  28/04/2017 at 13:54
  
  Please use the sample from here http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordPOSTagger.html and check that the jarRoot folder (the unzipped jar file) contains the subfolder with the tagger model.
  
Sneha says:

28/04/2017 at 14:57

Please see, this is how my folder looks as of now:
D:\SmartPrj\IntentenetMessage\SampleAPP\IntenetMessage\stanford-postagger-full-2016-10-31\models
and it contains all the tagger and props files, but no jar files.

1. Sergey Tihon says:
  
  28/04/2017 at 14:58
  
  Does it contain all the files from inside the jar, with the same relative paths?
  
Sneha says:

28/04/2017 at 15:32

The models folder contains tagger files and props files for various languages, as downloaded from the link you gave,
and I have placed it directly in my project.

1. Sergey Tihon says:
  
  28/04/2017 at 15:43
  
  Stanford NLP has the file paths to its models hardcoded, so if you do not specify a model path manually, it uses the default hardcoded values. In the Java world, these relative paths are resolved inside the *.jar; we do the same trick by setting CurrentDirectory to the unzipped jar folder (without modifying the internal structure of its files).
  
  Otherwise, you have to specify the paths to all models manually: https://github.com/sergey-tihon/Stanford.NLP.NET/blob/master/tests/Stanford.NLP.CoreNLP.FSharp.Tests/CoreNLP.fs#L96-L110
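A minimal C# sketch of the CurrentDirectory trick described above, assuming jarRoot is the folder where the models jar was unzipped with its internal structure intact. CurrentDirectoryScope is a hypothetical helper, not part of Stanford.NLP.NET: it switches the process's current directory and restores the previous one when disposed. The current directory is process-wide, so this is not safe if other threads rely on it at the same time.

```csharp
using System;
using System.IO;

// Hypothetical helper: while the scope is alive, relative paths resolve
// against jarRoot; the previous current directory is restored on Dispose.
public sealed class CurrentDirectoryScope : IDisposable
{
    private readonly string _previous;

    public CurrentDirectoryScope(string jarRoot)
    {
        _previous = Directory.GetCurrentDirectory();
        Directory.SetCurrentDirectory(jarRoot);
    }

    public void Dispose() => Directory.SetCurrentDirectory(_previous);
}
```

For example, create the pipeline inside using (new CurrentDirectoryScope(jarRoot)) { pipeline = new StanfordCoreNLP(props); }, where jarRoot and props come from your own setup.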
  
  1. Sneha says:
    
    28/04/2017 at 16:32
    
    I cannot find demonyms.txt in the downloaded folder,
    as mentioned in https://github.com/sergey-tihon/Stanford.NLP.NET/issues/52
    The models screenshot shown in that issue is totally different from what I downloaded; my models folder just has tagger and props files.
  2. Sneha says:
    
    28/04/2017 at 16:32
    
    I used WinZip to extract it. Is it because of that?
  3. Sergey Tihon says:
    
    28/04/2017 at 16:35
    
    Please create a new issue here https://github.com/sergey-tihon/Stanford.NLP.NET/issues/new and provide your source code, a link to the zip that you downloaded, and a screenshot of the unzipped jar (the content of the folder).
Sneha says:

28/04/2017 at 16:52

Sergey, I have done it. Kindly check.

Pratik Sharma says:

25/05/2018 at 13:42

Can someone provide me a link to working code? I just can’t get it to run. I have even tried the individual solutions mentioned here. Right now I am using .NET Core. Can I have a full project solution on GitHub? Thanks

1. Sergey Tihon says:
  
  25/05/2018 at 15:07
  
  Sorry, there is no support for .NET Core; if it works, it is just a coincidence.
  
Devender Singh says:

04/03/2019 at 16:20

Can we get the tagging results in XML, like we get in Stanford CoreNLP using classifiers?

1. Sergey Tihon says:
  
  04/03/2019 at 17:32
  
  You should be able to do it. There should be a 1:1 mapping from the original Java code samples.
  
  1. Devender Singh says:
    
    05/03/2019 at 09:03
    
    Thanks for reply. Can you post a link or example here?
  2. Sergey Tihon says:
    
    05/03/2019 at 12:00
    
    Latest samples are here http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordPOSTagger.html and here https://github.com/sergey-tihon/Stanford.NLP.NET/blob/master/samples/Stanford.NLP.POSTagger.CSharp/Program.cs
  3. Devender Singh says:
    
    05/03/2019 at 13:52
    
    I need output in XML tags, like Microsoft, as provided by the classifier.inlineXML method in Stanford NER for C#. Is it possible?
  4. Sergey Tihon says:
    
    05/03/2019 at 14:54
    
    Sorry, I cannot say with confidence what Stanford CoreNLP can do and what it cannot.
    The best place for such questions is SO – https://stackoverflow.com/questions/tagged/stanford-nlp
    I can only help translate a working Java sample to C# if you have difficulties.
riwajifyismine says:

11/10/2019 at 13:56

How do I identify the part of speech? I am getting output like JJ and DT next to the words. What does that mean?

1. Sergey Tihon says:
  
  11/10/2019 at 14:02
  
  They are POS tags – start your reading here – https://nlp.stanford.edu/software/tagger.shtml
  “Part-of-speech name abbreviations: The English taggers use the Penn Treebank tag set. Here are some links to documentation of the Penn Treebank English POS tag set …”
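To turn the abbreviations into readable names, a small lookup table is enough. The sketch below is deliberately partial: it covers only the tags that appear in the sample output of this post (descriptions follow the Penn Treebank tag set, plus the bracket, quote and punctuation tokens that appear there). PennTags.Describe is a hypothetical helper, not part of the package.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, partial Penn Treebank lookup: only the tags seen in the
// sample output of this post. Consult the full tag set for anything else.
public static class PennTags
{
    private static readonly Dictionary<string, string> Descriptions =
        new Dictionary<string, string>
        {
            ["CC"] = "Coordinating conjunction",
            ["DT"] = "Determiner",
            ["FW"] = "Foreign word",
            ["IN"] = "Preposition or subordinating conjunction",
            ["JJ"] = "Adjective",
            ["NN"] = "Noun, singular or mass",
            ["NNS"] = "Noun, plural",
            ["NNP"] = "Proper noun, singular",
            ["RB"] = "Adverb",
            ["RBR"] = "Adverb, comparative",
            ["TO"] = "to",
            ["VBP"] = "Verb, non-3rd person singular present",
            ["VBZ"] = "Verb, 3rd person singular present",
            ["WDT"] = "Wh-determiner",
            ["-LRB-"] = "Left round bracket",
            ["-RRB-"] = "Right round bracket",
            ["``"] = "Opening quotation mark",
            ["''"] = "Closing quotation mark",
            [","] = "Comma",
            ["."] = "Sentence-final punctuation",
        };

    // Returns a readable name, or a fallback for tags missing from this table
    public static string Describe(string tag) =>
        Descriptions.TryGetValue(tag, out var description)
            ? description
            : "Unknown tag (not in this partial table)";
}
```

For example, PennTags.Describe("JJ") returns "Adjective" and PennTags.Describe("DT") returns "Determiner".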
  