Sergey Tihon 🦔🦀🦋

I’m very happy to do some “guest” blogging for my good friend Leo and continue diving into various search-related topics. In this and upcoming posts, I’d like to jump right into something that interests me very much, and that is taking a look at what makes some documents more relevant than others as well as what factors influence rank score calculations.

Since SharePoint 2013 is already out, I’d like to touch upon a question that comes up often when someone is considering moving from FAST ESP or FAST for SharePoint 2010 to SharePoint 2013: “So how are rank scores calculated in SharePoint 2013 Search as opposed to previous FAST versions?”

In upcoming posts, I will go more into the “internals” of the current SharePoint 2013 ranking model as well as introduce the basics of relevancy calculation concepts that apply across many search engines and are not necessarily specific to FAST…


F# Weekly #49, 2013

09/12/2013 | F# Weekly | 1 Comment

A roundup of F# content from this past week:

News

First-class F# ASP.NET web templates are now available in Visual Studio.
F# web templates for NancyFx via SideWaffle are now available!
LinqOptimizer 0.3.8 is on NuGet.
Intellisense on symbols ~~is working~~ is coming soon in Xamarin Studio.
Pre-Alpha version of Vega Chart VS plugin is available here.
Which F# community project has the best logo?
25 years of CPU work done in 3 days on medical research gene mapping using the power of Azure & F#.
Support for untyped results (Map<string,obj> as a row type) in SqlCommandProvider was released.
Foq 1.4 (mocking library for F#) was published on NuGet.

Video/Presentations

“Understanding the World with F#” by Tomas Petricek.
“F# in social gaming” by Yan Cui.
“Domain Driven Design with the F# type System — NDC London 2013” by Scott Wlaschin.
“モナドハンズオン前座” by bleis tift.

Blogs

Daniel Mohl posted “New F# Web App Item Templates”.
Jamie Dixon wrote about “F# and SignalR Stock Ticker: Part 2”.
Daniel Mohl wrote “A New F# ASP.NET MVC 5 and Web API 2 Project Template”.
Matthew Moloney blogged “Inter Exchange Bitcoin Arbitrage (Excel Dashboard with FCell)”.
gab_km shared “MailboxProcessor に処理をさせよう”.
Steve Gilham posted “An introduction to Functional Programming in F# — Part 1 : Functional Programming via circuits”.
JetBrains blogged “Getting to Know Dmitri Nesteruk: Quant Finance and Developer Tools”.
Richard Minerich shared “IPython notebook backed by an F# kernel”.
Michael Newton posted “Type Providers From the Ground Up”.
Simon Dickson wrote about “Owin: The Myth, the Monad”.
Scott Wlaschin posted “Domain Driven Design”.
The Swedish Coder shared “One hour a day 118: Podcast on F#”.
Don Syme blogged “Slides for “Making Magic with F# Type Providers” at NDC London”.
Keith Bloom wrote about “F# in Finance conference”.

That’s all for now. Have a great week.

Previous F# Weekly edition – #48

F# Weekly #48, 2013

02/12/2013 | 09/12/2013 | F# Weekly | 1 Comment

Welcome to F# Weekly,

A roundup of F# content from this past week:

News

F# 3.1 Compiler/Library Code Drop was announced!
Don Syme presented Xenomorph Type Provider.
FsProjects (F# Community Space for incubating open community projects) was formed.
New release of SqlCommandProvider and docs.
F# binding auto-complete running in SublimeText 3.
All existing StanfordNLP NuGet packages were updated up to v3.3.0.
FSharp.Owin has a new name – Dyfrig.
Use F# in Excel without Tsunami! Code Editor is now part of FCell!
Simple web servers with Suave.
New F# introductory programming class to be taught at Bellevue College starting 1/25.
New cross-functional (F#, Haskell, …) meetup group was formed in Copenhagen.
Ryan Riley is working on a big change for Frank-fs and would appreciate any feedback.
Which F# community project has the best logo? Vote by replying to this tweet.

Video/Presentations

“Reasonable Code with F#” by Mike Falanga.
“Simpler Data Access and Smarter Calculations” by Paulmichael Blasucci.
“Intro to F# pattern matching” by Stephen Olsen.
“Using F# to change the way we work” by Jon Harrop.
“F# in Finance Tour” by Phil Trelford.
“F# and Financial Data Making Data Analysis Simple” by Tomas Petricek.
“F# in the cloud” by Yan Cui.
“Calling and extending the F# compiler” by Don Syme & Tomas Petricek.

Blogs

Xenomorph blogged “Putting the F# in Finance with TimeScape”.
Cyan By Fuchsia posted “FsUnit and Visual Studio 2013 “Could not load file or assembly ‘FSharp.Core, Version=4.0.0.0 …’”.
Jamie Dixon wrote about “F# and SignalR Stock Ticker Example”.
Vasily Kalugin wrote “Silverlight F# (Fsharp + XAML) application. Access to WebCam using only F#”.
Don Syme wrote “Announcing the F# 3.1 Compiler/Library Code Drop (from the Visual F# Tools Team at Microsoft)”.
Don Syme blogged “Putting the F# in Finance with Xenomorph TimeScape: A World of Financial Data at your Fingertips, Strongly Tooled and Strongly Typed”.
Don Syme blogged “How to contribute to the F# support in Xamarin Studio, Emacs and more”.
Don Syme blogged “Microsoft’s “F# in Finance” in London – Initial Report”.
Christoph Rüegg wrote “Test Your C# or F# Library on Mono With Vagrant”.
Daniel Mohl posted “Adding New Items to a Pure F# ASP.NET MVC/Web API Project”.
http://www.geekswithblogs.net published “Running Session Scripts with F#”.
Antonio Parata wrote about “Words permutation for passwords generation”.

That’s all for now. Have a great week.

Previous F# Weekly edition – #47

F# Weekly #47, 2013

25/11/2013 | F# Weekly | 2 Comments

US debt in % of GDP, colored by presidents. Using Deedle data analysis & Vega visualization.

Welcome to F# Weekly,

A roundup of F# content from this past week:

News

FAKE 2.2 is finally released!
LinqOptimizer.FSharp was released on NuGet.
F# formatting with inline LaTeX for typesetting math was released.
The best quantum computer simulator in the world is written in F#.
Nice to remember that F# has Statically Resolved Type Parameters.
Learn how Microsoft uses F# to prototype critical machine learning algorithms.
Azure Type Provider for F# now supports downloading entire folders.
ExtCore 0.8.37 is out! Now includes AsyncSeq, agents, lazyList workflow, and simple futures.
Resources for F# on Curah.
MEAP Update: “Succeeding with functional-first languages in the industry” new to F# Deep Dives.
FSharp.Data is now in “Up For Grabs”.
FsPickler 0.7 offers experimental support for structural, non-cryptographic, hashcode generation.
Flash cards for F# were posted on GitHub.
FSharp.Owin was published on NuGet.

Video/Presentations

“FQuake3 – F# Script Proof of Concept” by Will Smith.
Agents and Actor Models in F# 3.0 with Rachel Reese.
“Language-Integrated Quantum Operations: A Software Architecture for Quantum Computing” by Dave Wecker.
“All your types are belong to us!” by Phil Trelford.

Blogs

Sergey Tihon blogged “F# Neural Networks with RProvider & Deedle”.
Jamie Dixon posted “F#, Chain Of Responsibility, And High Ordered Functions”.
Iris Classon wrote “A simple C# and F# example with IoC and unit tests”.
Krzysztof Cieślak wrote about “C# vs F#: Rosetta Code”.
Daniel Mohl shared “F# Web Programming Session at Progressive F# Tutorials 2013”.
Steffen Forkmann blogged “FAKE 2.2 released”.
Nicolas Rolland wrote about “Huffman coding”.
Mike posted “On NULL”.
Jack Fox posted “Great Ideas Made Real in F#”.
Colin Bull blogged “Introducing Java type provider”.
Anthony Brown wrote “DunDDD F# for fun and games slides and demos”.

That’s all for now. Have a great week.

Previous F# Weekly edition – #46

R-Fiddle: An online playground for R code

22/11/2013 | 25/02/2021 | Machine Learning and NLP | 1 Comment

www.R-fiddle.org is an early-stage beta that provides you with a free and powerful environment to write, run and share R code right inside your browser. It even offers the option to include packages. Over the past couple of days it has been gaining more and more traction, and it was mentioned on the front page of Hacker News.

We designed it for those situations where you have code that you need to prototype quickly and then possibly share with others for feedback. All this without needing a user account, or any scrap projects or files! We even included a very easy-to-use ‘embed’ function for blogs and websites, so your visitors can edit and run R code on your own website or blog. This is the first version of R-fiddle, so do not hesitate to give us feedback.

Working together with the help of R-fiddle

You can use R-fiddle to share code snippets with colleagues…


F# Neural Networks with FsLab

18/11/2013 | 25/02/2021 | F#, Machine Learning and NLP | 21 Comments

nn_preview

Neural networks are a very powerful tool, but it is not easy to use all of their power. Now we are one step closer to doing that from F# and .NET. We will delegate model training to R using R Provider, and we will use Deedle (announced a few days ago) for handy data manipulation.

Prerequisites:

Install R, for more information please see RProvider prerequisites.
Install the FsLab NuGet package (read more about FsLab on fslab.org).

Learning from Data:

First of all, we need to load the required assemblies into our FSI session. This is pretty easy with FsLab because the package has a bootstrapping script.

#load "..\packages\FsLab.0.1.4\FsLab.fsx"

The next step is to download and install the missing R packages. For this demo, we need neuralnet for training the neural network model and making predictions, and caret for data visualization.

open RProvider.utils
R.install_packages("MASS")
R.install_packages("pbkrtest")
R.install_packages("lattice")
R.install_packages("Matrix")
R.install_packages("mgcv")
R.install_packages("grid")
R.install_packages("neuralnet")
R.install_packages("caret")
R.install_packages("zoo")

Now we are ready to start work. We need to open namespaces and load a data set. For this demo, we have chosen the iris data set, a classic in many demos.

open Deedle
open RDotNet
open RProvider
open RProvider.``base``
open RProvider.datasets
open RProvider.neuralnet
open RProvider.caret

let iris : Frame<int, string> = R.iris.GetValue()

To better understand what we are going to do, let’s plot this data set. First of all, split the data into two parts: features (Sepal.Length; Sepal.Width; Petal.Length; Petal.Width) and a target variable (Species). After that, plot every pair of features against each other (different colors represent different Species).

let features =
    iris
    |> Frame.filterCols (fun c _ -> c <> "Species")
    |> Frame.mapColValues (fun c -> c.As<double>())
let targets =
    R.as_factor(iris.Columns.["Species"])

R.featurePlot(x = features, y = targets, plot = "pairs")

nn_features

As you can see, our task is not trivial: we have 3 classes instead of 2 (so this is not the classic binary case), and the classes are not clearly separable. Nevertheless, let’s try! First of all, we need to split our data into 2 parts: training and testing data sets (70% vs 30%). The first part will be sent to the neural network for learning; the second will be used to measure model quality. Let’s also shuffle the data so that the evaluation is fair.

iris.ReplaceColumn("Species", targets.AsNumeric())
let range = [1..iris.RowCount]
let trainingIdxs : int[] = R.sample(range, iris.RowCount*7/10).GetValue()
let testingIdxs : int[] = R.setdiff(range, trainingIdxs).GetValue()
let trainingSet = iris.Rows.[trainingIdxs]
let testingSet = iris.Rows.[testingIdxs]
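
To make the shuffling and the 70/30 split explicit, here is a minimal sketch in plain F# that does not call R at all. The helper name splitIndices, the fixed seed and the percentage parameter are assumptions introduced for illustration only; the script in this post relies on R.sample and R.setdiff instead.

```fsharp
// Hypothetical helper (not part of FsLab, Deedle or RProvider):
// shuffle the 1-based row indices and cut them into two parts.
let splitIndices (seed: int) (rowCount: int) (trainPercent: int) =
    let rng = System.Random(seed)
    let idxs = [| 1 .. rowCount |]
    // Fisher-Yates shuffle, done in place
    for i in rowCount - 1 .. -1 .. 1 do
        let j = rng.Next(i + 1)
        let tmp = idxs.[i]
        idxs.[i] <- idxs.[j]
        idxs.[j] <- tmp
    // Integer arithmetic, same idea as iris.RowCount*7/10 in the script
    let trainCount = rowCount * trainPercent / 100
    Array.sub idxs 0 trainCount,
    Array.sub idxs trainCount (rowCount - trainCount)

// 150 rows, 70% for training
let trainIdx, testIdx = splitIndices 42 150 70
printfn "%d training rows, %d testing rows" trainIdx.Length testIdx.Length
```

Every row index ends up in exactly one of the two arrays, which is the property the R.sample and R.setdiff pair provides in the script.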

Now we are ready to train a neural network. All we need to do is provide a formula that specifies the input and the output of our model (“Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width”), provide a data set, and specify the structure of the hidden layers. In the following example, we will train the network with two layers of hidden nodes: the first layer with 3 nodes and the second layer with 2 nodes.

let nn =
    R.neuralnet(
        "Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width",
        data = trainingSet, hidden = R.c(3,2),
        err_fct = "ce", linear_output = true)

// Plot the resulting neural network with coefficients
R.eval(R.parse(text="library(grid)"))
R.plot_nn nn

nn_network

Cool! Look how simple it is. To be able to measure the quality of the classification, we need to split our testing set into features and targets.

let testingFeatures =
    testingSet
    |> Frame.filterCols (fun c _ -> c <> "Species")
    |> Frame.mapColValues (fun c -> c.As<double>())
let testingTargets =
    testingSet.Columns.["Species"].As<int>().Values

To execute the neural network on new data (apply our classification), we call the R.compute method and pass the testing features to it.

let prediction =
    R.compute(nn, testingFeatures)
        .AsList().["net.result"].AsVector()
    |> Seq.cast<double>
    |> Seq.map (round >> int)
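
The round >> int step is what turns the continuous output of the network into a class label. Here is a small sketch of that idea in plain F#. The output values are made up for illustration, and the clamping to the range 1..3 is an addition of this sketch, not something the script in this post does.

```fsharp
// Made-up network outputs, for illustration only
let rawOutputs = [ 0.97; 2.08; 2.61; 3.4; 0.2 ]

// Round to the nearest integer label, then clamp into the valid range 1..3.
// Clamping is an assumption of this sketch: it keeps an output such as 0.2
// or 3.6 from producing a label that does not exist.
let toLabel (x: float) = x |> round |> int |> max 1 |> min 3

let labels = rawOutputs |> List.map toLabel
printfn "%A" labels
```

Without clamping, an output far outside the label range would simply be counted as a misclassification, which is also a reasonable choice.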

Finally, let’s compare prediction results with testing values:

let misclassified =
    Seq.zip prediction testingTargets
    |> Seq.filter (fun (a,b) -> a<>b)
    |> Seq.length

printfn "Misclassified irises '%d' of '%d'" misclassified (testingSet.RowCount)

If you execute all these steps one by one, you will see that there are only about 3 misclassifications out of 45 samples. That is pretty good quality.
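
A single misclassification count hides which classes get confused with each other. The following sketch in plain F# counts every (actual, predicted) pair, which gives a flat confusion matrix, and derives the accuracy from it. The two label lists are invented for illustration and are not results from this post.

```fsharp
// Invented labels, for illustration only (not results from the post)
let actual    = [ 1; 2; 3; 3; 3; 1 ]
let predicted = [ 1; 2; 2; 3; 3; 1 ]

// Count each (actual, predicted) pair: a flat confusion matrix
let confusion =
    Seq.zip actual predicted
    |> Seq.countBy id
    |> Seq.sortBy fst
    |> Seq.toList

// Pairs where the two labels differ are the misclassified samples
let wrong =
    confusion
    |> List.filter (fun ((a, p), _) -> a <> p)
    |> List.sumBy snd

let accuracy = 1.0 - float wrong / float (List.length actual)

for ((a, p), n) in confusion do
    printfn "actual %d, predicted %d: %d" a p n
printfn "accuracy: %.3f" accuracy
```

Applied to the prediction and testingTargets sequences from the script, the same pipeline would show whether the errors are concentrated between two particular species.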

Full script:

#load "..\packages\FsLab.0.1.4\FsLab.fsx"

// You need to install 'neuralnet' and 'caret' packages if you do not have them
open RProvider.utils
R.install_packages("MASS")
R.install_packages("pbkrtest")
R.install_packages("lattice")
R.install_packages("Matrix")
R.install_packages("mgcv")
R.install_packages("grid")
R.install_packages("neuralnet")
R.install_packages("caret")
R.install_packages("zoo")

open Deedle
open RDotNet
open RProvider
open RProvider.``base``
open RProvider.datasets
open RProvider.neuralnet
open RProvider.caret

// Load data from R to Deedle frame
let iris : Frame<int, string> = R.iris.GetValue()

// Observe iris data set
let features =
    iris
    |> Frame.filterCols (fun c _ -> c <> "Species")
    |> Frame.mapColValues (fun c -> c.As<double>())
let targets =
    R.as_factor(iris.Columns.["Species"])

R.featurePlot(x = features, y = targets, plot = "pairs")

iris.ReplaceColumn("Species", targets.AsNumeric())
// Split data to training and testing sets (70% vs 30%)
let range = [1..iris.RowCount]
let trainingIdxs : int[] = R.sample(range, iris.RowCount*7/10).GetValue()
let testingIdxs : int[] = R.setdiff(range, trainingIdxs).GetValue()
let trainingSet = iris.Rows.[trainingIdxs]
let testingSet = iris.Rows.[testingIdxs]

// Train neural network
let nn =
    R.neuralnet(
        "Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width",
        data = trainingSet, hidden = R.c(3,2),
        err_fct = "ce", linear_output = true)

// Plot the resulting neural network with coefficients
R.eval(R.parse(text="library(grid)"))
R.plot_nn nn

// Split testing set into features and targets
let testingFeatures =
    testingSet
    |> Frame.filterCols (fun c _ -> c <> "Species")
    |> Frame.mapColValues (fun c -> c.As<double>())
let testingTargets =
    testingSet.Columns.["Species"].As<int>().Values

// Predict `Species` for testingFeatures with neural network
let prediction =
    R.compute(nn, testingFeatures)
        .AsList().["net.result"].AsVector()
    |> Seq.cast<double>
    |> Seq.map (round >> int)

// Calculate number of misclassified irises
let misclassified =
    Seq.zip prediction testingTargets
    |> Seq.filter (fun (a,b) -> a<>b)
    |> Seq.length

printfn "Misclassified irises '%d' of '%d'" misclassified (testingSet.RowCount)

P.S.

Note: if you have problems with bootstrapping RProvider and/or converting R data frames to Deedle data frames, verify that all assemblies were copied to RProvider’s lib sub-folder during installation of the NuGet packages (see the following picture).

deedle_rprovider