F# Weekly #32 2013

Welcome to F# Weekly,

A roundup of F# content from this past week:

News

  • VegaHub was presented (A SignalR Hub Utility for data scientists to push Vega charts from F# Interactive).
  • New beta of TsunamiIDE is available now. Tons of improvements, major performance fixes etc.
  • New TypeProviders are inside Tsunami (NuGetTypeProvider, S3TypeProvider, FacebookTypeProvider, DocumentTypeProvider) and much more.
  • Taha Hachana presented “Google Visualization Line Chart”.
  • PowerShell Type Provider was updated: VS IntelliSense now works for 64-bit PowerShell snap-ins, such as the SharePoint 2013 one.
  • F# Outlining – for those who still do not use it.
  • Canopy 0.7.9 was released.

Videos/Presentations

Blogs

That’s all for now.  Have a great week.

Previous F# Weekly edition – #31

F# Type Providers: News from the battlefields

All your types are belong to us

Don Syme

This post is intended, first of all, for F# developers and aims to show the big picture of the world of F# Type Providers. Here you can find a list of articles and posts about building type providers, a list of existing type providers (which are probably waiting for your help), and a list of open opportunities.

List of materials that can be useful if you want to create a new one:

List of available type providers:

Open opportunities:

Please let me know if I missed something.

Update 1: Built-in Tsunami type providers were added.

Update 2: SqlCommand and Azure were added.

F# Weekly #31 2013

Welcome to F# Weekly,

A roundup of F# content from this past week:

News

Videos/Presentations

Blogs

That’s all for now.  Have a great week.

Previous F# Weekly edition – #30

PowerShell Type Provider

Update (3 February 2014): PowerShell Type Provider merged into FSharp.Management.

I am happy to share with you the first version of the PowerShell Type Provider. The last few days were really intense, but the initial version is finally published.

Lots of different emotions visited me during the work =). Actually, Type Provider API is much harder than I thought. After reading books, it looked easier than it turned out in reality. Type Providers runtime is crafty.

To start, you need to download the source code and build it; there is no NuGet package for now. I want to gather some feedback first and then publish a more consistent version to NuGet.

You also need to know that it is developed using the PowerShell 3.0 runtime and .NET 4.0/4.5. This means that you can use only PowerShell 3.0 snap-ins.

#r @"C:\WINDOWS\Microsoft.Net\assembly\GAC_MSIL\System.Management.Automation\v4.0_3.0.0.0__31bf3856ad364e35\System.Management.Automation.dll"
#r @"C:\WINDOWS\Microsoft.NET\assembly\GAC_MSIL\Microsoft.PowerShell.Commands.Utility\v4.0_3.0.0.0__31bf3856ad364e35\Microsoft.Powershell.Commands.Utility.dll"
#r @"d:\GitHub\PowerShellTypeProvider\PowerShellTypeProvider\bin\Debug\PowerShellTypeProvider.dll"

type PS = FSharp.PowerShell.PowerShellTypeProvider<PSSnapIns="WDeploySnapin3.0">

As you can see in the sample, PowerShellTypeProvider has a single mandatory static parameter, PSSnapIns, which contains a semicolon-separated list of the snap-ins that you want to import into PowerShell. If you want to use only the default ones, leave the string empty.

PowerShellIntellisense

You can find the list of snap-ins registered on your machine using the Get-PSSnapin cmdlet.

PS-Get-PSSnapIns

Enjoy it. I will be happy to hear feedback (as well as comments about the type provider source code from TP gurus).

F# Weekly #30 2013

Welcome to F# Weekly,

A roundup of F# content from this past week:

News

Videos/Presentations

Blogs

That’s all for now.  Have a great week.

Previous F# Weekly edition – #29

F# Weekly #29 2013

Welcome to F# Weekly,

A roundup of F# content from this past week:

News

Videos/Presentations

Blogs

That’s all for now.  Have a great week.

Previous F# Weekly edition – #28

FSharp.NLP.Stanford.Parser justification or StackOverflow questions understanding.

Some weeks ago, I announced FSharp.NLP.Stanford.Parser and now I want to clarify the goals of this project and show an example of usage.

First of all, this is not an attempt to re-implement any functionality of Stanford Parser. It is just a thin layer that aims to simplify interaction with Java collections (especially the Iterable interface) and bring the power of F# constructs (like pattern matching and discriminated unions) to code that deals with tagging results.

Task

Let’s start with a sample NLP task: we want to show related questions before a user asks a new one (as it works on StackOverflow). There are many possible solutions for this task. Let’s look at one that first tries to identify the key phrases of the question and then runs a search using them.

Approach

First of all, let’s choose some real questions from StackOverflow to analyze them:

Now we can use Stanford Parser GUI to visualize the structure of these questions:

q1
As you can see this question is about “F# project” and “object browser”
This question is about “WebSharper”, “Mono 3.0” and “Mac”
This one is about “extra methods”, “type providers” and “F#”
The last one is about “MonoDevelop” and “F# projects”.

We can notice that all the phrases we have selected are parts of noun phrases (NP). As a first solution, we can analyze the tags in the tree and select each NP that contains word-level noun tags (NN, NNS, NNP, NNPS).

Solution

#r @"..\packages\IKVM.7.3.4830.0\lib\IKVM.Runtime.dll"
#r @"..\packages\IKVM.7.3.4830.0\lib\IKVM.OpenJDK.Core.dll"
#r @"..\packages\Stanford.NLP.Parser.3.2.0.0\lib\ejml-0.19-nogui.dll"
#r @"..\packages\Stanford.NLP.Parser.3.2.0.0\lib\stanford-parser.dll"

open edu.stanford.nlp.parser.lexparser
open edu.stanford.nlp.trees
open System

let model = @"d:\englishPCFG.ser.gz";

let options = [|"-maxLength"; "500";"-retainTmpSubcategories"; "-MAX_ITEMS"; "500000";"-outputFormat"; "penn,typedDependenciesCollapsed"|]
let lp = LexicalizedParser.loadModel(model, options)

let tlp = PennTreebankLanguagePack();
let gsf = tlp.grammaticalStructureFactory();

open java.util
// Wraps a Java Iterator as an F# sequence.
// hasNext() is checked before every next(), so an empty iterator yields an empty sequence.
let toSeq (iter:Iterator) =
    seq {
        while iter.hasNext() do
            yield iter.next()
    }

let getTree question = 
    let toke = tlp.getTokenizerFactory().getTokenizer(new java.io.StringReader(question));
    let sentence = toke.tokenize();
    lp.apply(sentence)

let getKeyPhrases (tree:Tree) = 
    let isNPwithNNx (node:Tree)= 
        if (node.label().value() <> "NP") then false
        else node.getChildrenAsList().iterator()
             |> toSeq 
             |> Seq.cast<Tree>
             |> Seq.exists (fun x-> 
                let y = x.label().value()
                y= "NN" || y = "NNS" || y = "NNP" || y = "NNPS")
    let rec foldTree acc (node:Tree) = 
        let acc = 
            if (node.isLeaf()) then acc
            else node.getChildrenAsList().iterator()
                 |> toSeq 
                 |> Seq.cast<Tree>
                 |> Seq.fold 
                    (fun state x -> foldTree state x)
                    acc
        if isNPwithNNx node 
          then node :: acc
          else acc
    foldTree [] tree

let questions = 
    [|"How to make an F# project work with the object browser";
      "How can I build WebSharper on Mono 3.0 on Mac?";
      "Adding extra methods as type extensions in F#";
      "How to get MonoDevelop to compile F# projects?"|]

questions
|> Seq.iter (fun question ->
    printfn "Question : %s" question
    question 
    |> getTree 
    |> getKeyPhrases
    |> List.rev
    |> List.iter (fun p ->
        p.getLeaves().iterator() 
        |> toSeq 
        |> Seq.cast<Tree> 
        |> Seq.map(fun x-> x.label().value()) 
        |> Seq.toArray
        |> printfn "\t%A")
)

If you run this script, you will see the following:

Question : How to make an F# project work with the object browser
    [|"an"; "F"; "#"; "project"; "work"|]
    [|"the"; "object"; "browser"|]
Question : How can I build WebSharper on Mono 3.0 on Mac?
    [|"WebSharper"|]
    [|"Mono"; "3.0"|]
    [|"Mac"|]
Question : Adding extra methods as type extensions in F#
    [|"extra"; "methods"|]
    [|"type"; "extensions"|]
    [|"F"; "#"|]
Question : How to get MonoDevelop to compile F# projects?
    [|"MonoDevelop"|]
    [|"F"; "#"; "projects"|]

This is almost what we expected. The results are good enough, but we can simplify the code and make it more readable using FSharp.NLP.Stanford.Parser.

#r @"..\packages\IKVM.7.3.4830.0\lib\IKVM.Runtime.dll"
#r @"..\packages\IKVM.7.3.4830.0\lib\IKVM.OpenJDK.Core.dll"
#r @"..\packages\Stanford.NLP.Parser.3.2.0.0\lib\ejml-0.19-nogui.dll"
#r @"..\packages\Stanford.NLP.Parser.3.2.0.0\lib\stanford-parser.dll"
#r @"..\packages\FSharp.NLP.Stanford.Parser.0.0.3\lib\FSharp.NLP.Stanford.Parser.dll"

open edu.stanford.nlp.parser.lexparser
open edu.stanford.nlp.trees
open System
open FSharp.IKVM.Util
open FSharp.NLP.Stanford.Parser

let model = @"d:\englishPCFG.ser.gz";

let options = [|"-maxLength"; "500";"-retainTmpSubcategories"; "-MAX_ITEMS"; "500000";"-outputFormat"; "penn,typedDependenciesCollapsed"|]
let lp = LexicalizedParser.loadModel(model, options)

let tlp = PennTreebankLanguagePack();
let gsf = tlp.grammaticalStructureFactory();

let getTree question = 
    let toke = tlp.getTokenizerFactory().getTokenizer(new java.io.StringReader(question));
    let sentence = toke.tokenize();
    lp.apply(sentence)

let getKeyPhrases (tree:Tree) = 
    let isNNx = function
        | Label NN | Label NNS | Label NNP | Label NNPS -> true
        | _ -> false
    let isNPwithNNx = function
        | Label NP as node 
            when node.getChildrenAsList() |> Iterable.castToSeq<Tree> |> Seq.exists isNNx
            -> true
        | _ -> false
    let rec foldTree acc (node:Tree) = 
        let acc = 
            if (node.isLeaf()) then acc
            else node.getChildrenAsList()
                 |> Iterable.castToSeq<Tree>
                 |> Seq.fold 
                    (fun state x -> foldTree state x)
                    acc
        if isNPwithNNx node 
          then node :: acc
          else acc
    foldTree [] tree

let questions = 
    [|"How to make an F# project work with the object browser";
      "How can I build WebSharper on Mono 3.0 on Mac?";
      "Adding extra methods as type extensions in F#";
      "How to get MonoDevelop to compile F# projects?"|]

questions
|> Seq.iter (fun question ->
    printfn "Question : %s" question
    question 
    |> getTree 
    |> getKeyPhrases
    |> List.rev
    |> List.iter (fun p ->
        p.getLeaves()
        |> Iterable.castToArray<Tree>
        |> Array.map(fun x-> x.label().value()) 
        |> printfn "\t%A")
)

Look more carefully at the getKeyPhrases function. All tags are strongly typed now, so you can be sure that you will never make a typo, and the code is more readable and self-explanatory:

STTags
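
To illustrate how tags can become strongly typed, here is a minimal sketch over a toy tree type. The `Node` record, the `Tag` union and the `Label` active pattern below are assumptions made up for this illustration. They are not the actual definitions from FSharp.NLP.Stanford.Parser, which works on the Stanford `Tree` class and covers many more tags.

```fsharp
// Toy stand-in for a parse tree (hypothetical; the real code uses edu.stanford.nlp.trees.Tree).
type Node = { Value: string; Children: Node list }

// A closed set of tags: a misspelled tag is now a compile-time error.
type Tag = NP | NN | NNS | NNP | NNPS

// Partial active pattern: succeeds only for labels that map to a known tag.
let (|Label|_|) (node: Node) =
    match node.Value with
    | "NP" -> Some NP
    | "NN" -> Some NN
    | "NNS" -> Some NNS
    | "NNP" -> Some NNP
    | "NNPS" -> Some NNPS
    | _ -> None

let isNNx = function
    | Label NN | Label NNS | Label NNP | Label NNPS -> true
    | _ -> false

let isNPwithNNx = function
    | Label NP as node when node.Children |> List.exists isNNx -> true
    | _ -> false

let leaf v = { Value = v; Children = [] }
let np = { Value = "NP"; Children = [ { Value = "NNP"; Children = [ leaf "Mac" ] } ] }
let vp = { Value = "VP"; Children = [ { Value = "VB"; Children = [ leaf "build" ] } ] }

printfn "%b %b" (isNPwithNNx np) (isNPwithNNx vp)
```

With this shape, writing `Label NNX` by mistake would not compile, whereas a misspelled string literal such as "NNX" would silently never match.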

let runFAKE = Download >> Unzip >> IKVMCompile >> Sign >> NuGet

This post is about one more FAKE use case. It is an unusual but, I hope, useful script.

The problem I faced is the recompilation of Stanford NLP products to .NET using IKVM.NET. I am sick of doing it manually. I posted instructions on how to do it, but I think that not many people have tried. I believe that I can automate it end to end, from downloading *.jar files to building NuGet packages. Of course, I have chosen FAKE for this task (thanks to Steffen Forkmann for help with building NuGet packages).

The build scenario is the following:

  1. Download the zip archive with *.jar files and trained models from the Stanford NLP site (they can be large, up to 200 MB for Stanford Parser, and I do not want to store all of this in my repository).
  2. Download IKVM.NET compiler as a zip archive. (It is not distributed with NuGet package and is not referenced from IKVM.NET site. It is really tricky to find it for the first time)
  3. Unzip all downloaded archives.
  4. Carefully recompile all required *.jar files considering all references.
  5. Sign all compiled assemblies to be able to deploy them to the GAC if needed.
  6. Compile NuGet package.
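
The title of this post already hints at the shape of the script: each stage takes over where the previous one stopped, so the whole build can be read as a function composition. Here is a minimal sketch of that idea, with stub stages that only record what they would do. The `BuildState` type and the `stage` helper are hypothetical and are not part of FAKE.

```fsharp
// Hypothetical state threaded through the build; each stage prepends a log entry.
type BuildState = { Log: string list }

// Helper that turns a stage name into a stub stage function.
let stage name (state: BuildState) =
    { state with Log = name :: state.Log }

let download    = stage "Download"
let unzip       = stage "Unzip"
let ikvmCompile = stage "IKVMCompile"
let sign        = stage "Sign"
let nuGet       = stage "NuGet"

// Stages compose left to right with >>.
let runFAKE = download >> unzip >> ikvmCompile >> sign >> nuGet

let result = runFAKE { Log = [] }
result.Log |> List.rev |> String.concat " -> " |> printfn "%s"
```

In an actual FAKE script the stages would more likely be declared as targets with dependencies between them, rather than composed directly like this.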

Steps 1-5 are not covered by FAKE's out-of-the-box tasks, so I needed to implement them myself. Since I wanted to use F# 3.0 features and .NET 4.5 capabilities (like System.IO.Compression.ZipFile from the System.IO.Compression.FileSystem assembly for unzipping), I chose the pre-release version of FAKE 2 that uses the .NET 4 runtime. The pre-release version of FAKE can be restored from NuGet as follows:

"nuget.exe" "install" "FAKE" "-Pre" "-OutputDirectory" "..\build" "-ExcludeVersion"

Download manager

Requirements: I certainly do not want to download files from the Internet during each build. Before downloading a file, I want to check whether it is already present on the file system, and start downloading only if it is missing. During the download, I want to see the progress to be sure that everything works. The code that does it:

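#r "System.IO.Compression.FileSystem.dll"
// The @@ path-combine operator used below comes from FAKE (FakeLib.dll is assumed to be referenced already).
open System
open System.IO

let downloadDir = @".\Download\"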

let restoreFile url =
    let downloadFile file url =
        printfn "Downloading file '%s' to '%s'..." url file
        let BUFFER_SIZE = 16*1024
        use outputFileStream = File.Create(file, BUFFER_SIZE)
        let req = System.Net.WebRequest.Create(url)
        use response = req.GetResponse()
        use responseStream = response.GetResponseStream()
        let printStep = 100L*1024L
        let buffer = Array.create<byte> BUFFER_SIZE 0uy
        let rec download downloadedBytes =
            let bytesRead = responseStream.Read(buffer, 0, BUFFER_SIZE)
            if bytesRead > 0 then
                outputFileStream.Write(buffer, 0, bytesRead)
                let total = downloadedBytes + int64 bytesRead
                // Report progress each time another printStep boundary is crossed.
                if total / printStep <> downloadedBytes / printStep then
                    printfn "\tDownloaded '%d' bytes" total
                download total
        download 0L
    let file = downloadDir @@ System.IO.Path.GetFileName(url)
    if (not <| File.Exists(file))
        then url |> downloadFile file
    file
let unZipTo toDir file =
    printfn "Unzipping file '%s' to '%s'" file toDir
    Compression.ZipFile.ExtractToDirectory(file, toDir)
let restoreFolderFromUrl folder url =
    if not <| Directory.Exists folder
        then url |> restoreFile |> unZipTo (folder @@ @"..\")

let restoreFolderFromFile folder zipFile =
    if not <| Directory.Exists folder
        then zipFile |> unZipTo (folder @@ @"..\")

IKVM.NET Compiler

The compiler should be able to rebuild any number of *.jar files with predefined dependencies and sign the resulting *.dll files if required.

let ikvmc =
    restoreFolderFromUrl @".\temp\ikvm-7.3.4830.0" "http://www.frijters.net/ikvmbin-7.3.4830.0.zip"
    @".\temp\ikvm-7.3.4830.0\bin\ikvmc.exe"
let ildasm = @"c:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Bin\x64\ildasm.exe"
let ilasm = @"c:\Windows\Microsoft.NET\Framework64\v2.0.50727\ilasm.exe"
type IKVMcTask(jar:string) =
    member val JarFile = jar
    member val Version = "" with get, set
    member val Dependencies = List.empty<IKVMcTask> with get, set
let timeOut = TimeSpan.FromSeconds(120.0)
let IKVMCompile workingDirectory keyFile tasks =
    let getNewFileName newExtension (fileName:string) =
        Path.GetFileName(fileName).Replace(Path.GetExtension(fileName), newExtension)
    let startProcess fileName args =
        let result =
            ExecProcess
                (fun info ->
                    info.FileName <- fileName
                    info.WorkingDirectory <- FullName workingDirectory
                    info.Arguments <- args)
                timeOut
        if result<> 0 then
            failwithf "Process '%s' failed with exit code '%d'" fileName result
    let newKeyFile =
        let file = workingDirectory @@ (Path.GetFileName(keyFile))
        File.Copy(keyFile, file, true)
        Path.GetFileName(file)
    let rec compile (task:IKVMcTask) =
        let getIKVMCommandLineArgs() =
            let sb = Text.StringBuilder()
            task.Dependencies |> Seq.iter
               (fun x ->
                   compile x
                   x.JarFile |> getNewFileName ".dll" |> bprintf sb " -r:%s")
            if not <| String.IsNullOrEmpty(task.Version)
                then task.Version |> bprintf sb " -version:%s"
            bprintf sb " %s -out:%s"
                (task.JarFile |> getNewFileName ".jar")
                (task.JarFile |> getNewFileName ".dll")
            sb.ToString()
        File.Copy(task.JarFile, workingDirectory @@ (Path.GetFileName(task.JarFile)) ,true)
        startProcess ikvmc (getIKVMCommandLineArgs())

        if (File.Exists(keyFile)) then
            let dllFile = task.JarFile |> getNewFileName ".dll"
            let ilFile = task.JarFile |> getNewFileName ".il"
            startProcess ildasm (sprintf " /all /out=%s %s" ilFile dllFile)
            File.Delete(dllFile)
            startProcess ilasm (sprintf " /dll /key=%s %s" (newKeyFile) ilFile)
    tasks |> Seq.iter compile
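
To make the argument-building step easier to follow, here is a simplified, side-effect-free sketch of the kind of command line that getIKVMCommandLineArgs assembles. The `JarTask` record, the `withExtension` helper and the `buildArgs` function are illustrative names that do not appear in the script above, and the version numbers are example values. Only the `-r:`, `-version:` and `-out:` switches are taken from it. Unlike the real function, this sketch does not compile the dependencies, it only references them.

```fsharp
open System
open System.IO

// Simplified stand-in for IKVMcTask (hypothetical record used only in this sketch).
type JarTask = { JarFile: string; Version: string; Dependencies: JarTask list }

// Swap the extension of the file name, dropping any directory part.
let withExtension (ext: string) (fileName: string) =
    Path.ChangeExtension(Path.GetFileName fileName, ext)

// Build the argument string: references first, then the version, then input and output.
let buildArgs (task: JarTask) =
    [ for dep in task.Dependencies do
          yield sprintf "-r:%s" (withExtension ".dll" dep.JarFile)
      if not (String.IsNullOrEmpty task.Version) then
          yield sprintf "-version:%s" task.Version
      yield Path.GetFileName task.JarFile
      yield sprintf "-out:%s" (withExtension ".dll" task.JarFile) ]
    |> String.concat " "

// Example values only.
let ejml = { JarFile = "ejml-0.19-nogui.jar"; Version = "0.19.0.0"; Dependencies = [] }
let parser = { JarFile = "stanford-parser.jar"; Version = "3.2.0.0"; Dependencies = [ ejml ] }

printfn "%s" (buildArgs parser)
```

Keeping the string construction separate from process execution makes it easy to inspect the exact arguments before handing them to ikvmc.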

Results

Using this helper function, build scripts come out pretty straightforward and easy. For example, recompilation of Stanford Parser looks as follows:

Target "RunIKVMCompiler" (fun _ ->
    restoreFolderFromUrl
        @".\temp\stanford-parser-full-2013-06-20"
        "http://nlp.stanford.edu/software/stanford-parser-full-2013-06-20.zip"
    restoreFolderFromFile
        @".\temp\stanford-parser-full-2013-06-20\edu"
        @".\temp\stanford-parser-full-2013-06-20\stanford-parser-3.2.0-models.jar"

    [IKVMcTask(@"temp\stanford-parser-full-2013-06-20\stanford-parser.jar",
        Version=version,
        Dependencies =
            [IKVMcTask(@"temp\stanford-parser-full-2013-06-20\ejml-0.19-nogui.jar",
                       Version="0.19.0.0")])]
    |> IKVMCompile ikvmDir @".\Stanford.NLP.snk"
)

All source code is available on GitHub.

Rattle for F# devs

Strangely, Rattle is an awesome tool, yet it is not as well known among devs as it should be. We definitely need to fix this.

Rattle (the R Analytical Tool To Learn Easily) presents statistical and visual summaries of data, transforms data into forms that can be readily modelled, builds both unsupervised and supervised models from the data, presents the performance of models graphically, and scores new datasets.

First, we need to install the package from CRAN. To do so, just open the R console and type the following:

install.packages("rattle")

Next, make sure that you have RProvider installed.

Install-Package RProvider

Now we are ready to start.

#I @"..\packages\RProvider.1.0.0\lib"
#r "RDotNet.dll"
#r "RProvider.dll"

open RProvider.rattle
R.rattle() |> ignore

Execute this short snippet and you should see the Rattle start screen, similar to the following:

rattle_start

You are ready to study your data without a single line of code.

Load your data from a wide range of sources:

rattle_load

Explore your data using powerful statistical techniques:

rattle_explore

Test the nature of your data:

rattle_test

Transform your data:

rattle_transform

Cluster your data:

rattle_cluster

Identify relationships or affinities:

rattle_associate

Experiment with different models on your data, before implementing any of them in your favorite language:

rattle_model

Evaluate the quality of your model:

rattle_evaluate

Learn your data!

Upd: If you are interested in it, then I can recommend the following book.