NLP: Stanford Parser with F# (.NET)

2013-04-22T21:26:31+02:00

It’s the end of my day, but before I finish I am reading this wonderful article to increase my knowledge.

2013-04-28T07:52:12+02:00

Your work looks very promising. Unfortunately, I have not yet made time to become familiar with F#. Do you have any pointers on working with the Stanford objects in C#?
Maybe a quick snippet showing construction of the parser and getting some simple POS tags?

Best,
B.M.


2013-04-28T12:58:28+02:00

You can port it to C# quite straightforwardly:

	using java.io;

	using edu.stanford.nlp.process;
	using edu.stanford.nlp.ling;
	using edu.stanford.nlp.trees;
	using edu.stanford.nlp.parser.lexparser;

	namespace Stanford_Parser
	{
		class Program
		{
			static void demoAPI(LexicalizedParser lp)
			{
				// This option shows parsing a list of correctly tokenized words
				var sent = new[] { "This", "is", "an", "easy", "sentence", "." };
				var rawWords = Sentence.toCoreLabelList(sent);
				var parse = lp.apply(rawWords);
				parse.pennPrint();

				// This option shows loading and using an explicit tokenizer
				var sent2 = "This is another sentence.";
				var tokenizerFactory = PTBTokenizer.factory(new CoreLabelTokenFactory(), "");
				var sent2Reader = new StringReader(sent2);
				var rawWords2 = tokenizerFactory.getTokenizer(sent2Reader).tokenize();
				parse = lp.apply(rawWords2);

				var tlp = new PennTreebankLanguagePack();
				var gsf = tlp.grammaticalStructureFactory();
				var gs = gsf.newGrammaticalStructure(parse);
				var tdl = gs.typedDependenciesCCprocessed();

				System.Console.WriteLine();
				for (var it = tdl.iterator(); it.hasNext();)
					System.Console.WriteLine("{0}", it.next());
				System.Console.WriteLine();

				var tp = new TreePrint("penn,typedDependenciesCollapsed");
				tp.printTree(parse);
			}

			static void Main(string[] args)
			{
				var lp = LexicalizedParser.loadModel(@"..\..\..\..\StanfordNLPLibraries\stanford-parser\stanford-parser-2.0.4-models\englishPCFG.ser.gz");
				demoAPI(lp);
			}
		}
	}


2014-10-30T09:47:02+01:00

Sergey Tihon,
Will you please send me the complete code with packages?

2013-04-28T18:36:31+02:00

Excellent, thanks!
I was just about to decompile your demo DLLs to see the C# methods generated.

It’s mainly this construct that threw me for a loop:
let demoAPI (lp:LexicalizedParser)

I really need to start working with F#, as your original code seems much more elegant than the C# version.

Thank you very much for your time, now I can start experimenting with the parser in my C# project!

-B.M.


Pingback: FSharp.NLP.Stanford.Parser available on NuGet | Sergey Tihon's Blog

Pingback: Stanford Parser is available on NuGet | Sergey Tihon's Blog

2013-07-11T13:38:55+02:00

I’m also trying to reuse your work in a C# project, but I am having trouble building your project: IKVM.Fsharp.dll can’t be built because of some errors in “Collections.fs”. In fact, Visual Studio can’t interpret “open java.util” in this file. I assume this is normal, as IKVM is supposed to be the library that actually defines it, so I don’t get why this import is needed here.
Maybe I missed some step of the process. Do you have any idea where it could come from?


2013-07-11T14:02:21+02:00

Please try NuGet Package https://nuget.org/packages/Stanford.NLP.Parser/ and try this sample https://sergeytihon.wordpress.com/2013/07/11/stanford-parser-is-available-on-nuget/


2013-07-11T15:32:25+02:00

Well, the NuGet package is working fine. In fact, I was mainly interested in your NER implementation; the parser itself works fine if I import it into another project.

2013-07-11T16:51:40+02:00

Please wait a bit, I will publish NER very soon. It may even happen today.


2013-07-11T18:08:55+02:00

Oh, okay, thanks again for your great work 🙂

Pingback: Stanford Named Entity Recognizer (NER) is available on NuGet | Sergey Tihon's Blog

Pingback: let runFAKE = Download >> Unzip >> IKVMCompile >> Sign >> NuGet | Sergey Tihon's Blog

2013-09-08T17:51:55+02:00

I tried the same with the Stanford Segmenter (using C#). The main drawback is that the emitted files have no generics. For instance, I can’t write CRFClassifier with a type argument; I can only write the non-generic CRFClassifier. Whenever I run my code I get a RuntimeException, and I guess it is related to this problem.


2013-09-09T23:01:17+02:00

That is strange, because it works for me. I made a NuGet package at your request: https://www.nuget.org/packages/Stanford.NLP.Segmenter/3.2.0.0 . Details of how it works can be found in the post https://sergeytihon.wordpress.com/2013/09/09/stanford-word-segmenter-is-available-on-nuget/


2013-09-11T17:52:52+02:00

Many thanks, it worked fine. I just had some mistakes with classifier flags. Thank you for your help.

2013-12-14T13:36:43+01:00

Hello, I appreciate your sharing this, but I don’t see how to get your example code to work. I installed the NuGet packages (all of them; there are several), but how do you actually get a workable F# file? What exactly do you do to import the Stanford NLP code? I tried “open FSharp.NLP.Stanford.Parse”.
What is that “lp.” in this line: let demoAPI (lp:LexicalizedParser) =
And finally, my code is not recognizing PTBTokenizer (nor a lot of other things in this example).

Any pointers would be appreciated. How can I get your illustrative sample-code to run?


2013-12-14T22:42:13+01:00

What do you mean by an F# file? Are you trying to use it from an *.fsx file (F# script) or to compile an *.fs file?
If you need to compile your code, you can look at the full code sample on GitHub: https://github.com/sergey-tihon/fsharp-stanford-nlp-samples/tree/master/fsharp-stanford-nlp-samples/StanfordParser.Samples . If you need to do it from an fsx script, you need to load the required assemblies in FSI (#r "…").
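
As a rough sketch of the script route, an .fsx file might start like the one below. The DLL locations and the model path are placeholders (assumptions, not taken from the post); `stanford-parser.dll` is the assembly that `ikvmc.exe stanford-parser.jar` produces. The body of `demoAPI` mirrors the first part of the C# port earlier in this thread. It also shows that `lp` is just a parameter name and `: LexicalizedParser` is its type annotation.

```fsharp
// Load the IKVM runtime and the compiled Stanford Parser into F# Interactive.
// Both paths are placeholders: adjust them to your machine, and add further
// #r lines for any other IKVM assemblies the parser turns out to need.
#r @"C:\libs\IKVM.OpenJDK.Core.dll"
#r @"C:\libs\stanford-parser.dll"

open edu.stanford.nlp.ling
open edu.stanford.nlp.parser.lexparser

// `lp` is an ordinary parameter; `: LexicalizedParser` annotates its type.
let demoAPI (lp: LexicalizedParser) =
    // Parse a sentence that is already split into tokens.
    let sent = [| "This"; "is"; "an"; "easy"; "sentence"; "." |]
    let rawWords = Sentence.toCoreLabelList(sent)
    let parse = lp.apply(rawWords)
    parse.pennPrint()

// Placeholder path to the model file extracted from the models jar.
let parser = LexicalizedParser.loadModel(@"C:\models\englishPCFG.ser.gz")
demoAPI parser
```

In a compiled *.fs project the `#r` lines are replaced by ordinary assembly references (for example, the ones the NuGet package adds), and the `open` lines stay the same.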


2013-12-16T05:58:13+01:00

Thank you for responding, Sergey. Sorry, I meant an .fs file that I want to compile into my Visual Studio solution (most of which is C#). Yes, I see the samples now. Which leads to the next question, if you don’t mind: how do I get it to build? I downloaded it from that GitHub project as a zip file, unzipped it, and loaded the solution file into VS 2013. I get 10 errors, possibly related to 204 warnings such as: “Could not located the assembly “IKVM.OpenJDK….
I’m thinking there is probably an important setup step that I’m missing. Under “How to use it” I do see these instructions: “Download models from..” (but I don’t see how to tell the Visual Studio project where those models are placed), and “Extract models from ‘stanford-parse-3.2.0-models.jar (just unzip it).”, again with no indication of how to tell the project where to find them.

2013-12-16T07:27:26+01:00

1) “Could not located the assembly IKVM.OpenJDK….” This means that you need to restore the NuGet dependencies: right-click on the solution, click ‘Manage NuGet Packages’, then click `Restore`.
2) Find the lines of code that mention the path to ‘englishPCFG.ser.gz’. This file is actually packed inside `stanford-parser-3.3.0-models.jar`. Update the path to the correct one (where you extracted it).

2013-12-16T12:39:37+01:00

Hi Sergey, thank you for trying to help. I don’t see a “Restore” option. Right-clicking on the solution in VS 2013, I see the option “Manage NuGet Packages for Solution…”. Within the resulting dialog, I checked “Installed packages”, “All”, “Online”, and “Updates”. Under “Installed packages” I do see one package, named “IKVM.NET”, and it has a “Manage” button. Clicking on that brings up a “Select Projects” dialog, with all of the several projects already selected. I do see that when I open the source file Collections.fs, on the line with “open java.util”, “java” is underlined in red, as are the words “Iterator” and “ArrayList”.

And in the IKVM.FSharp project, looking in References – there’s a whole mess of references not found – all starting with “IKVM.” Looks like something is not in the right place?

Sorry to be a whiner. I’m rather excited at the prospect of exploring this! Thanks for your advice,

jh


2013-12-16T12:55:32+01:00

It is very strange… Could you try to re-install IKVM.NET from NuGet (remove and install again)? It looks like the simplest way …


2013-12-16T13:11:53+01:00

Done. Now I get 75 errors. Does “The namespace or module ‘edu’ is not defined” sound familiar? Which version of Visual Studio are you using?

2013-12-16T23:33:50+01:00

VS2013 or VS2012. It is not important.
The ‘edu’ error looks like you are not referencing the Stanford.NLP.Parser NuGet package.

2013-12-18T07:06:49+01:00

Evidently, the NuGet packaging is what is not working. I downloaded IKVM.NET separately and uncompressed it, and that does have the DLL files that this solution complained were missing. So I added a reference to ikvm-7.2.4630.5/bin-x64/JVM.DLL, and now those references do show up within (for example) the StanfordNamedEntityRecognizerSamples project. It still gives an error when I try to run it, though, raising a FileLoadException because IKVM.OpenJDK.Core, 7.3.4830.0, does not match the manifest. Is that perhaps because the manifest calls for a different version? Has anyone ever gotten this to run? I have a fresh virtual machine with VS 2010 to use, to try again from scratch. But a more explicit set of steps would probably help, so that I don’t waste another day trying every possible combination.


2013-12-18T15:11:24+01:00

Hello, what is the actual sequence of steps required to get your project working? All I need help with, I think, is just to get to the point of having one working sample. I believe I can take it from there.

I used git clone https://github.com/sergey-tihon/fsharp-stanford-nlp-samples.git
to get your repository onto a fresh virtual machine (vm), with Windows 7 x64, and Visual Studio 2010 Ultimate.

I see that it created a folder fsharp-stanford-nlp-samples, and there is a Visual Studio (VS) solution within it (which, by the way, I’m not able to open with VS 2010; I had to shift over to another VM that I’d installed VS 2012 on). So I opened that solution file and tried to build: 8 errors. You have to check “Allow NuGet to download missing packages during build.” under Tools/Options/Package Manager. Tried to build again: 7 errors.

Could not resolve this reference. Could not locate the assembly ‘IKVM.OpenJDK.SwingAWT, ..
Ok – trying now to use NuGet to bring in dependencies. Opening “Manage NuGet Packages”, I do a search for “Stanford”, and see six different packages.

Stanford.NLP.NER
Stanford.NLP.Parser
Stanford.NLP.POSTagger
Stanford.NLP.CoreNLP
FSharp.NLP.Stanford.Parser
Stanford.NLP.Segmenter

I wonder – which of these needs to be installed? What is the minimum needed, to start? I tried getting just Stanford.NLP.Parser. Building the solution now yields 107 Warnings, 10 Errors.

So then I tried installing all six of those packages, checking the checkboxes to ensure they were installed for every project.

Now a build of the solution yields: 107 Warnings, 9 Errors. The first warning is the same as shown above.

I am thinking that, perhaps, it could be useful to have some steps explicitly laid out for people to use this. Unless (not unlikely) I am totally missing something obvious?

Thank you for your help Sergey,
James Hurst


2013-12-26T13:16:09+01:00

Hi, I think I have partially reproduced this case. You should not reference all available packages. They may conflict with each other (the same types in the same namespaces). First decide which one you need, and then reference it from NuGet (read more about the packages on the Stanford NLP website http://nlp.stanford.edu/software/index.shtml).
CoreNLP is meant to be an umbrella project: almost all available features should be inside.


2014-01-22T22:16:41+01:00

Hello Sergey-

Is there a link or a tutorial like this one for getting started with this parser in Java?


2014-01-22T23:13:16+01:00

Originally, it is a Java parser. Instructions are available on the original site – http://www-nlp.stanford.edu/software/lex-parser.shtml


2014-02-02T08:40:41+01:00

Thank you Sergey !

2014-03-26T15:37:46+01:00

Can you please tell me exactly how to make a DLL from a .jar file? I am unable to do that.
I have used your line of code:
ikvmc.exe stanford-parser.jar
After downloading, I have two different folders: one is ikvmbin-7.2.4630.5 and the second is stanford-parser-2012-11-12.

Regards,
Rohit


2014-03-26T22:17:39+01:00

Hi, you do not need to do it yourself. You can download the recompiled version from NuGet: https://www.nuget.org/packages/Stanford.NLP.Parser/ .
Up-to-date samples are available here: http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordParser.html


2014-03-27T09:41:48+01:00

Where can I find the english.all.3class.distsim.crf.ser.gz file?
It’s throwing a ‘TypeInitializationException’.
I am referencing the code from here:
http://www.stewh.com/2013/11/extracting-named-entities-in-c-using-the-stanford-nlp-parser/
Thanks


2014-03-27T11:10:51+01:00

Here it is http://www-nlp.stanford.edu/software/stanford-parser-full-2013-11-12.zip
Big blue button on http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordParser.html page.


2014-04-03T12:54:49+02:00

Hi Sergey,

I found the link below, where they use some txt files for states, names, etc.

http://grepcode.com/file/repo1.maven.org/maven2/edu.stanford.nlp/stanford-corenlp/1.2.0/edu/stanford/nlp/models/dcoref/state-abbreviations.txt?av=f

So, my question is: how do I include these files and use them in C#/.NET code?

Regards,
Rohit


2014-04-04T09:12:07+02:00

All files are packed in a zip archive (referenced from the page http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordCoreNLP.html ). You need to download the zip, unpack it, and use the files inside. There are two options: temporarily change the current directory, or manually specify paths to all required files, as in https://github.com/sergey-tihon/Stanford.NLP.NET/blob/master/tests/Stanford.NLP.CoreNLP.FSharp.Tests/CoreNLP.fs
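
A minimal sketch of the first option (temporarily changing the current directory), assuming the project already references the Stanford.NLP.CoreNLP package. The folder `modelsRoot` and the short annotator list are illustrative assumptions, not values from this thread.

```fsharp
open System
open System.IO
open java.util
open edu.stanford.nlp.pipeline

// Hypothetical folder: wherever the models jar from the CoreNLP zip was unpacked.
let modelsRoot = @"C:\stanford-corenlp-models"

let props = Properties()
// Illustrative annotator list; pick the annotators your task needs.
props.setProperty("annotators", "tokenize, ssplit, pos") |> ignore

// Option 1: switch the current directory while the pipeline loads its models,
// so that the default relative model paths resolve, then switch back.
let originalDir = Environment.CurrentDirectory
Directory.SetCurrentDirectory modelsRoot
let pipeline =
    try
        StanfordCoreNLP(props)
    finally
        Directory.SetCurrentDirectory originalDir

// Option 2 (alternative): leave the directory alone and set each model path
// explicitly through properties such as "pos.model" before building the pipeline.
```

The `try ... finally` is a design choice: it restores the original directory even if loading a model throws.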


2014-04-04T15:34:21+02:00

Hey Sergey,

Thanks for the help. Your inputs are always helpful.

But I wanted to ask one thing: can I make a resume parser using the Stanford library?
If yes, where should I start? I can’t work out the exact approach.

I have seen all of the examples/demos for this library.

Can you please help me with that?

Thanks,
Rohit


2014-04-07T11:21:03+02:00

Hi, it depends on multiple things:
– What is your goal? What do you want to extract from your resume?
– What is the format of your resume? Is it structured?


2014-04-07T12:25:33+02:00

Hey Sergey,

1) What is your goal? What do you want to extract from your resume?
My aim is to extract candidate information from resumes and store it in the database. I want to extract almost every piece of information, such as
(first name, last name, email ID, mobile number, projects, experience, personal info, academic records, awards, achievements, skills, qualifications, etc.).
Currently I am concentrating only on doc, docx, and text files.
This information will be useful when searching for a suitable candidate for a job.

2) What is the format of your resume? Is it structured?
It is unstructured; every candidate will have a different type of resume.


2014-05-24T18:09:34+02:00

I suggest trying RegEx; you can use Expresso initially to test your regular expressions. I’m saying so because resumes usually have label-like strings such as “Name” before the candidate writes his name, and likewise for other details.
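
To make the label idea concrete, here is a small sketch using the standard .NET `Regex` class. The sample text, the labels, and the helper name `tryField` are all made up for illustration; real resumes will need many more patterns.

```fsharp
open System.Text.RegularExpressions

// Hypothetical resume fragment; real resumes vary widely in layout.
let resumeText = "Name: Jane Doe\nEmail: jane.doe@example.com\nMobile: +1 555 0100"

// Return the text that follows "<label>:" at the start of a line, if any.
let tryField (label: string) (text: string) =
    let pattern = "^" + Regex.Escape label + @"\s*:\s*(.+)$"
    let m = Regex.Match(text, pattern, RegexOptions.Multiline ||| RegexOptions.IgnoreCase)
    if m.Success then Some (m.Groups.[1].Value.Trim()) else None

printfn "%A" (tryField "Name" resumeText)   // Some "Jane Doe"
printfn "%A" (tryField "Skills" resumeText) // None
```

Returning an option makes a missing label explicit, which matters because unstructured resumes will often lack some fields.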


2014-05-24T17:46:17+02:00

I’m using the Stanford Dependency Parser to resolve dependencies in one of my projects. I have the following problem, and I hope you can help me.
When I analyze dependencies in a review text, it works great when the sentence is short, but for long sentences it does not give all the required dependencies. For example, when I try to find the dependencies in the following sentence,
“The Navigation is better.” there is an nsubj dependency that groups “Navigation” and “better”, telling me that the review regarding navigation is positive.

But when the review sentence is longer, like
“Navigation system is better then the Jeeps and as good as my husbands Audi A-8 system.”

I don’t get any dependency relations grouping Navigation with better or Navigation with good. I tried using both basic and collapsed dependencies. I went through the Stanford Dependencies Manual, but couldn’t find much that would help here. I just want whatever aspect the user is talking about to be grouped with its adjective and adverb.
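
For inspecting which relations the parser actually produces, a helper along these lines may be useful. It is only a sketch: `printRelations` is a hypothetical name, `parse` is assumed to be the `Tree` returned by `lp.apply(...)` as in the C# port earlier in this thread, and the project is assumed to reference the Stanford.NLP.Parser package.

```fsharp
open edu.stanford.nlp.trees

// Print every typed dependency of one relation type, e.g. "nsubj" or "amod".
let printRelations (relName: string) (parse: Tree) =
    let tlp = PennTreebankLanguagePack()
    let gsf = tlp.grammaticalStructureFactory()
    let gs = gsf.newGrammaticalStructure(parse)
    // The IKVM-compiled API exposes a non-generic java.util.List, so each
    // element has to be cast back to TypedDependency.
    let it = gs.typedDependenciesCCprocessed().iterator()
    while it.hasNext() do
        let td = it.next() :?> TypedDependency
        if td.reln().getShortName() = relName then
            printfn "%s(%s, %s)" relName (td.gov().value()) (td.dep().value())
```

In a multi-word subject such as “Navigation system”, the subject relation may attach to the head noun (“system”) rather than to “Navigation”, so it can help to print the modifier relations as well before concluding that a dependency is missing.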


2014-05-24T17:48:11+02:00

I’m trying with CCprocessed dependencies…


2014-05-24T18:38:21+02:00

Well, there is an update: I tried using all the dependency methods available in Stanford.NLP.NET, viz. .typedDependenciesCCprocessed(true); .typedDependenciesCollapsed(true); typedDependencies(true); typedDependenciesCollapsedTree(); allTypedDependencies();


2014-05-24T19:05:15+02:00

Hello, please ask this question on SO http://stackoverflow.com/questions/tagged/stanford-nlp

2014-10-30T09:44:31+01:00

I am unable to find the file stanford-parser\stanford-parser-2.0.4-models\englishPCFG.ser.gz.
Please help me.


2014-10-30T09:45:33+01:00

It’s very urgent.


2014-11-01T16:06:59+01:00

Models are inside the `*models.jar` in this zip: http://nlp.stanford.edu/software/stanford-parser-full-2014-06-16.zip


2015-01-12T23:29:38+01:00

I have followed your instructions up to “IKVM .jar to .dll compilation”. I think I was successful up to there. I then created an F# project in Visual Studio 2013 and put your code into it. VS does not have a definition of LexicalizedParser. I understand C++ and C# but I do not understand F#. I assume we must add a reference but I do not know what to reference. Is there more to the F# program that does the equivalent of a “using” in C#? Am I correct that the F# program needs a little bit more such as that?

I also used the tangiblesoftwaresolutions.com converter to convert the stanford-parser ParserDemo.java sample to C# but obviously that also needs a reference.

I apologize for not being able to figure this out, but if you can help me to understand what to reference then I will appreciate it.

I have seen your samples in your “Stanford Parser is available on NuGet for F# and C#” but the C# sample source also does not show what to reference and such. If that question is easily answered when I install what you have from that article then I should do that. Are the answers there?


2015-01-13T00:23:18+01:00

Okay I installed Stanford.NLP.Parser using the Package Manager Console. In the C# program (converted from the stanford-parser ParserDemo.java sample to C#) I managed to get:

using edu.stanford.nlp.process;
using edu.stanford.nlp.ling;
using edu.stanford.nlp.trees;
using edu.stanford.nlp.parser.lexparser;

And that seems to work except there is one error that is outside the scope of here. So I tried using the following for your F# sample here:

open edu.stanford.nlp.process
open edu.stanford.nlp.ling
open edu.stanford.nlp.trees
open edu.stanford.nlp.parser.lexparser

However VS says that “process” is reserved.
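
One possible way around this, offered as a sketch rather than a confirmed fix for this particular setup: `process` is a word that F# reserves, and F# lets reserved words be used as identifiers when they are wrapped in double backticks. Assuming the project already references the Stanford.NLP.Parser package, the opens would then look like this:

```fsharp
// `process` is reserved in F#, so the namespace segment is quoted with
// double backticks; the other namespaces need no quoting.
open edu.stanford.nlp.``process``
open edu.stanford.nlp.ling
open edu.stanford.nlp.trees
open edu.stanford.nlp.parser.lexparser
```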


2015-01-13T17:48:13+01:00

Hi, could you please have a look at the C# sample here: http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordParser.html . Does it help?

2015-06-05T18:20:36+02:00

I copied the C# code you have here, but the line “var gs = gsf.newGrammaticalStructure(tree);” is causing an error:

“A first chance exception of type ‘edu.stanford.nlp.trees.tregex.TregexParser.LookaheadSuccess’ occurred in stanford-corenlp-3.5.2.dll”

Any ideas?


2015-06-05T22:57:09+02:00

Hi, try this one https://github.com/sergey-tihon/Stanford.NLP.NET/issues/19#issuecomment-109420786


2015-06-05T23:26:27+02:00

Unfortunately, I’m still getting the same error. It’s actually a stack overflow error, but the output window keeps printing “A first chance exception of type ‘edu.stanford.nlp.trees.tregex.TregexParser.LookaheadSuccess’” until the overflow occurs.

2015-06-06T16:37:36+02:00

Actually, this sample https://github.com/sergey-tihon/Stanford.NLP.NET/blob/master/samples/Stanford.NLP.Parser.CSharp/Program.cs works on my machine. Which NuGet package did you reference?

2015-06-08T16:41:54+02:00

My code is almost identical to that example but does not work. I believe I have the most recent package from NuGet (3.5.2). I downloaded by typing “Install-Package Stanford.NLP.Parser” in the PM console as instructed.

2015-06-09T20:53:40+02:00

Any ideas?

2015-06-14T00:09:18+02:00

Sorry, no. It should work with the latest NuGet package and the latest model from the Stanford site.


Published by Sergey Tihon 🦔🦀🦋
