This post is intended primarily for F# developers and aims to show the big picture of The World of F# Type Providers. Here you can find a list of articles and posts about building type providers, a list of existing type providers (many of which are probably waiting for your help) and a list of open opportunities.
Materials that can be useful if you want to create a new type provider:
Update (3 February 2014): the PowerShell Type Provider has been merged into FSharp.Management.
I am happy to share the first version of the PowerShell Type Provider with you. The last few days were really hot, but the initial version is finally published.
I went through lots of different emotions during this work =). The Type Provider API turned out to be much harder than I thought: after reading the books it looked easy, but reality was different. The type provider runtime is tricky.
To get started, download the source code and build it; there is no NuGet package yet. I want to collect some feedback first and then publish a more consistent version to NuGet.
Note also that it is built on the PowerShell 3.0 runtime and .NET 4.0/4.5, which means that you can use only PowerShell 3.0 snap-ins.
#r @"C:\WINDOWS\Microsoft.Net\assembly\GAC_MSIL\System.Management.Automation\v4.0_3.0.0.0__31bf3856ad364e35\System.Management.Automation.dll"
#r @"C:\WINDOWS\Microsoft.NET\assembly\GAC_MSIL\Microsoft.PowerShell.Commands.Utility\v4.0_3.0.0.0__31bf3856ad364e35\Microsoft.Powershell.Commands.Utility.dll"
#r @"d:\GitHub\PowerShellTypeProvider\PowerShellTypeProvider\bin\Debug\PowerShellTypeProvider.dll"
type PS = FSharp.PowerShell.PowerShellTypeProvider<PSSnapIns="WDeploySnapin3.0">
As you can see in the sample, PowerShellTypeProvider has a single mandatory static parameter, PSSnapIns, which contains a semicolon-separated list of the snap-ins you want to import into PowerShell. If you want only the default ones, leave the string empty. You can list the snap-ins on your machine with the Get-PSSnapin cmdlet (Get-PSSnapin -Registered also shows registered snap-ins that are not loaded into the current session).
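Because PSSnapIns is a single string, it has to be split into individual snap-in names somewhere. The following is a minimal sketch of how such a semicolon-separated value could be interpreted. `parseSnapIns` is a hypothetical helper for illustration, not a function from PowerShellTypeProvider. It assumes that blank entries and surrounding whitespace should be ignored, and "SomeOther.Snapin" is a made-up name.

```fsharp
open System

/// Hypothetical helper (not part of PowerShellTypeProvider): splits a
/// PSSnapIns-style value into snap-in names. An empty or blank string
/// yields an empty array, meaning "use only the default snap-ins".
let parseSnapIns (value: string) : string[] =
    if String.IsNullOrWhiteSpace value then
        [||]
    else
        value.Split([| ';' |], StringSplitOptions.RemoveEmptyEntries)
        |> Array.map (fun name -> name.Trim())
        |> Array.filter (fun name -> name <> "")

// Usage: one snap-in, two snap-ins (the second name is made up), and the defaults-only case.
printfn "%A" (parseSnapIns "WDeploySnapin3.0")
printfn "%A" (parseSnapIns "WDeploySnapin3.0; SomeOther.Snapin")
printfn "%A" (parseSnapIns "")
```

Treating an empty string as "no extra snap-ins" mirrors the rule described above for using only the default ones.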
Enjoy! I will be happy to hear your feedback (as well as comments on the type provider source code from TP gurus).
A few weeks ago I announced FSharp.NLP.Stanford.Parser, and now I want to clarify the goals of this project and show a usage example.
First of all, this is not an attempt to reimplement any functionality of the Stanford Parser. It is just a thin layer that aims to simplify interaction with Java collections (especially the Iterable interface) and to bring the power of F# constructs (such as pattern matching and discriminated unions) to code that deals with tagging results.
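To make the idea of such a layer concrete, here is a minimal, self-contained sketch that does not use IKVM or the real library. `JavaIterator` is a hypothetical stand-in for java.util.Iterator. `PennTag` and the `Tag` active pattern are hypothetical names that only illustrate how string tags can become discriminated-union cases for pattern matching. The actual FSharp.NLP.Stanford.Parser API may look different.

```fsharp
/// Hypothetical stand-in for java.util.Iterator, so the sketch runs without IKVM.
type JavaIterator<'T> =
    abstract hasNext : unit -> bool
    abstract next : unit -> 'T

/// Builds an iterator over an F# list (test helper for the sketch).
let iteratorOf (items: 'T list) =
    let rest = ref items
    { new JavaIterator<'T> with
        member __.hasNext() = not (List.isEmpty rest.Value)
        member __.next() =
            match rest.Value with
            | x :: xs ->
                rest.Value <- xs
                x
            | [] -> failwith "no more elements" }

/// Adapts the iterator to a lazy F# sequence. hasNext() is checked before
/// every next(), so an empty iterator gives an empty sequence.
let toSeq (iter: JavaIterator<'T>) =
    seq {
        while iter.hasNext() do
            yield iter.next()
    }

/// A few Penn Treebank tags as a discriminated union (hypothetical subset).
type PennTag =
    | NN | NNS | NNP | NNPS | NP
    | Other of string

/// Total active pattern that classifies a raw tag string.
let (|Tag|) (label: string) =
    match label with
    | "NN" -> NN
    | "NNS" -> NNS
    | "NNP" -> NNP
    | "NNPS" -> NNPS
    | "NP" -> NP
    | other -> Other other

/// Matching on union cases instead of comparing raw strings.
let isNoun = function
    | Tag (NN | NNS | NNP | NNPS) -> true
    | _ -> false

// Usage: keep only the noun tags from an iterator of raw tag strings.
iteratorOf [ "DT"; "NN"; "IN"; "NNP" ]
|> toSeq
|> Seq.filter isNoun
|> Seq.toList
|> printfn "%A"
```

The iterator adapter and the active pattern are independent. The first removes Java-style iteration boilerplate, and the second lets the compiler check that only known tag cases are used in patterns.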
Task
Let’s start with a sample NLP task: we want to show related questions before a user asks a new one (as StackOverflow does). There are many possible solutions for this task. Let’s look at one that first extracts the key phrases that identify the question and then runs a search using them.
Approach
First, let’s pick some real questions from StackOverflow to analyze:
Now we can use the Stanford Parser GUI to visualize the structure of these questions:
As you can see, this question is about “F# project” and “object browser”.
This question is about “WebSharper”, “Mono 3.0” and “Mac”.
This one is about “extra methods”, “type extensions” and “F#”.
The last one is about “MonoDevelop” and “F# projects”.
Notice that all the phrases we selected are parts of noun phrases (NP). As a first solution, we can analyze the tags in the tree and select every NP node that has a direct child with a word-level noun tag (NN, NNS, NNP or NNPS).
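Before involving the real parser, the selection rule can be tried out on a hand-built tree. This is an illustration under stated assumptions. `ParseNode` is a hypothetical record standing in for edu.stanford.nlp.trees.Tree. The small tree is written by hand in Penn Treebank style rather than produced by the Stanford Parser. The code walks the tree depth-first and keeps every NP node with at least one direct noun-tagged child.

```fsharp
/// Hypothetical minimal parse tree: a label plus children (leaves have none).
type ParseNode = { Label: string; Children: ParseNode list }

let leaf word = { Label = word; Children = [] }
let node label children = { Label = label; Children = children }

/// Word-level noun tags that make an NP interesting.
let nounTags = set [ "NN"; "NNS"; "NNP"; "NNPS" ]

/// True for NP nodes that have at least one direct child with a noun tag.
let isNPwithNNx (n: ParseNode) =
    n.Label = "NP" && n.Children |> List.exists (fun c -> nounTags.Contains c.Label)

/// Collects matching NP nodes depth-first, a parent before its descendants.
let rec keyPhrases (n: ParseNode) : ParseNode list =
    let fromChildren = n.Children |> List.collect keyPhrases
    if isNPwithNNx n then n :: fromChildren else fromChildren

/// The words under a node, left to right.
let rec leaves (n: ParseNode) : string list =
    if List.isEmpty n.Children then [ n.Label ]
    else n.Children |> List.collect leaves

// Hand-built tree for the fragment "with the object browser".
let np = node "NP" [ node "DT" [ leaf "the" ]; node "NN" [ leaf "object" ]; node "NN" [ leaf "browser" ] ]
let sample = node "PP" [ node "IN" [ leaf "with" ]; np ]

keyPhrases sample
|> List.map leaves
|> List.iter (printfn "%A")
```

Note that an NP whose only children are other phrases (for example NP over NP and PP) is not selected itself, because the rule looks only at direct children.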
Solution
#r @"..\packages\IKVM.7.3.4830.0\lib\IKVM.Runtime.dll"
#r @"..\packages\IKVM.7.3.4830.0\lib\IKVM.OpenJDK.Core.dll"
#r @"..\packages\Stanford.NLP.Parser.3.2.0.0\lib\ejml-0.19-nogui.dll"
#r @"..\packages\Stanford.NLP.Parser.3.2.0.0\lib\stanford-parser.dll"
open edu.stanford.nlp.parser.lexparser
open edu.stanford.nlp.trees
open System
let model = @"d:\englishPCFG.ser.gz"
let options = [|"-maxLength"; "500"; "-retainTmpSubcategories"; "-MAX_ITEMS"; "500000"; "-outputFormat"; "penn,typedDependenciesCollapsed"|]
let lp = LexicalizedParser.loadModel(model, options)
let tlp = PennTreebankLanguagePack()
let gsf = tlp.grammaticalStructureFactory()
open java.util
// Adapts a java.util.Iterator to an F# sequence. hasNext() is checked
// before each next(), so an empty iterator yields an empty sequence.
let toSeq (iter: Iterator) =
    seq {
        while iter.hasNext() do
            yield iter.next()
    }
let getTree question =
    let toke = tlp.getTokenizerFactory().getTokenizer(new java.io.StringReader(question))
    let sentence = toke.tokenize()
    lp.apply(sentence)
let getKeyPhrases (tree: Tree) =
    let isNPwithNNx (node: Tree) =
        if node.label().value() <> "NP" then false
        else
            node.getChildrenAsList().iterator()
            |> toSeq
            |> Seq.cast<Tree>
            |> Seq.exists (fun x ->
                let y = x.label().value()
                y = "NN" || y = "NNS" || y = "NNP" || y = "NNPS")
    let rec foldTree acc (node: Tree) =
        let acc =
            if node.isLeaf() then acc
            else
                node.getChildrenAsList().iterator()
                |> toSeq
                |> Seq.cast<Tree>
                |> Seq.fold (fun state x -> foldTree state x) acc
        if isNPwithNNx node then node :: acc
        else acc
    foldTree [] tree
let questions =
    [|"How to make an F# project work with the object browser";
      "How can I build WebSharper on Mono 3.0 on Mac?";
      "Adding extra methods as type extensions in F#";
      "How to get MonoDevelop to compile F# projects?"|]
questions
|> Seq.iter (fun question ->
    printfn "Question : %s" question
    question
    |> getTree
    |> getKeyPhrases
    |> List.rev
    |> List.iter (fun p ->
        p.getLeaves().iterator()
        |> toSeq
        |> Seq.cast<Tree>
        |> Seq.map (fun x -> x.label().value())
        |> Seq.toArray
        |> printfn "\t%A"))
If you run this script, you will see the following:
Question : How to make an F# project work with the object browser
    [|"an"; "F"; "#"; "project"; "work"|]
    [|"the"; "object"; "browser"|]
Question : How can I build WebSharper on Mono 3.0 on Mac?
    [|"WebSharper"|]
    [|"Mono"; "3.0"|]
    [|"Mac"|]
Question : Adding extra methods as type extensions in F#
    [|"extra"; "methods"|]
    [|"type"; "extensions"|]
    [|"F"; "#"|]
Question : How to get MonoDevelop to compile F# projects?
    [|"MonoDevelop"|]
    [|"F"; "#"; "projects"|]
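The output also shows that the tokenizer splits “F#” into two tokens, “F” and “#”. If the key phrases are used as search terms, a small post-processing step can glue such tokens back together. The sketch below is one possible approach and is not part of the original script. `joinTokens` is a hypothetical helper, and it assumes that a "#" token should always be attached to the token before it.

```fsharp
/// Hypothetical helper: joins phrase tokens with spaces, but attaches a "#"
/// token to the token before it, so "F"; "#" becomes "F#".
let joinTokens (tokens: string[]) : string =
    let folder (acc: string list) (token: string) =
        match acc, token with
        | previous :: rest, "#" -> (previous + "#") :: rest
        | _ -> token :: acc
    tokens
    |> Array.fold folder []
    |> List.rev
    |> String.concat " "

// Usage with token arrays taken from the output above.
[ [| "F"; "#"; "projects" |]
  [| "the"; "object"; "browser" |]
  [| "Mono"; "3.0" |] ]
|> List.map joinTokens
|> List.iter (printfn "%s")
```

The fold builds the result in reverse so that gluing "#" only touches the head of the list. A lone "#" with nothing before it is kept as is.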
The output is almost what we expected. The results are good enough for a first attempt, but we can simplify the code and make it more readable with FSharp.NLP.Stanford.Parser.
#r @"..\packages\IKVM.7.3.4830.0\lib\IKVM.Runtime.dll"
#r @"..\packages\IKVM.7.3.4830.0\lib\IKVM.OpenJDK.Core.dll"
#r @"..\packages\Stanford.NLP.Parser.3.2.0.0\lib\ejml-0.19-nogui.dll"
#r @"..\packages\Stanford.NLP.Parser.3.2.0.0\lib\stanford-parser.dll"
#r @"..\packages\FSharp.NLP.Stanford.Parser.0.0.3\lib\FSharp.NLP.Stanford.Parser.dll"
open edu.stanford.nlp.parser.lexparser
open edu.stanford.nlp.trees
open System
open FSharp.IKVM.Util
open FSharp.NLP.Stanford.Parser
let model = @"d:\englishPCFG.ser.gz"
let options = [|"-maxLength"; "500"; "-retainTmpSubcategories"; "-MAX_ITEMS"; "500000"; "-outputFormat"; "penn,typedDependenciesCollapsed"|]
let lp = LexicalizedParser.loadModel(model, options)
let tlp = PennTreebankLanguagePack()
let gsf = tlp.grammaticalStructureFactory()
let getTree question =
    let toke = tlp.getTokenizerFactory().getTokenizer(new java.io.StringReader(question))
    let sentence = toke.tokenize()
    lp.apply(sentence)
let getKeyPhrases (tree: Tree) =
    let isNNx = function
        | Label NN | Label NNS | Label NNP | Label NNPS -> true
        | _ -> false
    let isNPwithNNx = function
        | Label NP as node
            when node.getChildrenAsList() |> Iterable.castToSeq<Tree> |> Seq.exists isNNx -> true
        | _ -> false
    let rec foldTree acc (node: Tree) =
        let acc =
            if node.isLeaf() then acc
            else
                node.getChildrenAsList()
                |> Iterable.castToSeq<Tree>
                |> Seq.fold (fun state x -> foldTree state x) acc
        if isNPwithNNx node then node :: acc
        else acc
    foldTree [] tree
let questions =
    [|"How to make an F# project work with the object browser";
      "How can I build WebSharper on Mono 3.0 on Mac?";
      "Adding extra methods as type extensions in F#";
      "How to get MonoDevelop to compile F# projects?"|]
questions
|> Seq.iter (fun question ->
    printfn "Question : %s" question
    question
    |> getTree
    |> getKeyPhrases
    |> List.rev
    |> List.iter (fun p ->
        p.getLeaves()
        |> Iterable.castToArray<Tree>
        |> Array.map (fun x -> x.label().value())
        |> printfn "\t%A"))
Look more carefully at the getKeyPhrases function. All tags are now strongly typed, so typos in tag names are much less likely, and the code is more readable and self-explanatory: