RProvider – Sergey Tihon's Blog

setwd("C:/Git/R4DotNet")

# y = x1 + x2 + x3 + E
# y is what you are trying to explain
# x1, x2, x3 are the variables that cause/influence y
# E is things that we are not measuring/using for calculations

fuel.efficiency <- read.csv("C:/Git/R4DotNet/Data/FuelEfficiency.csv")
summary(fuel.efficiency)

# MPG = Miles per gallon
# GPM = Gallons per 100 miles
# WT = Weight of car in 1000 lbs
# DIS = Displacement in cubic inches
# NC = Number of cylinders
# HP = Horsepower
# ACC = Acceleration in seconds from 0-60
# ET = Engine Type 0 = V, 1 = Straight

plot(GPM~WT,data=fuel.efficiency)
plot(GPM~DIS,data=fuel.efficiency)

fuel.efficiency$NC <- factor(fuel.efficiency$NC)
fuel.efficiency$ET <-

View original post 333 more words

Twitter Followers Map with RProvider

26/12/2013 | 04/01/2014 | F# | 1 Comment

Today @oppenheimmd re-tweeted a nice tweet about building a Twitter Followers Map with R. Naturally, I decided to build my own map, and here it is:

The total number of followers on the map is smaller than the number shown in Twitter. I think this happens because not everyone specifies a location in their account settings. Note that to be able to execute this script, you need to specify a location in your own Twitter account.

As you probably guessed from the title, I made this picture using RProvider instead of running the existing R code directly. Using twitterMap.R is actually pretty simple, apart from the difficulties with Twitter authorization and SSL certificate validation (this part is a bit ugly).

For this demo we need two R packages, twitteR and RCurl (with all their dependencies). Please install them:

#I @"..\packages\RProvider.1.0.5\"
#load "RProvider.fsx"

//open RProvider.utils
//R.install_packages("twitteR")
//R.install_packages("RCurl")

open RDotNet
open RProvider
open RProvider.utils
open RProvider.``base``
open RProvider.twitteR
open RProvider.RCurl
open RProvider.ROAuth

I am a bit too lazy to fight with RProvider syntax in some places. Actually, I do not even know whether it is possible to rewrite such R code using RProvider… So I decided to cheat a bit and define a function that takes an R expression as a string and evaluates it.

let eval (text:string) =
    namedParams ["text", box text] |> R.parse |> R.eval

You need the consumerKey and consumerSecret of your registered Twitter application. If you do not have them yet, please follow the steps from this article, which helps you register a new Twitter application. The following code is intended to authenticate you with Twitter:

let twitCred =
    namedParams [
        // TODO: insert your consumerKey and consumerSecret
        "consumerKey", box "xxxxxxxxxxxxxxxxxxxxxx"
        "consumerSecret", box "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
        "requestURL", box "https://api.twitter.com/oauth/request_token"
        "accessURL", box "http://api.twitter.com/oauth/access_token"
        "authURL", box "http://api.twitter.com/oauth/authorize" ]
    |> R.OAuthFactory

R.download_file(url = "http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
R.assign("twitCred", twitCred)
eval """twitCred$handshake(cainfo="cacert.pem")"""

Here you need to do some manual work. These are the last authentication steps:

1. Copy the URL from the FSI window and paste it into your browser.
2. Allow your Twitter app to access your account data.
3. Copy the authorization code from the browser.
4. Paste the code into FSI and press Enter.

After that, you need to save your authorization data and set the SSL certificate to be used globally, which allows twitterMap.R to communicate with Twitter under your account.

R.registerTwitterOAuth(twitCred)
eval """options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))"""

The last step is to run the script and plot your own map:

R.source("http://biostat.jhsph.edu/~jleek/code/twitterMap.R")
// TODO: do not forget to specify your twitter login and increase nMax if you have more than 5000 followers
eval """twitterMap("your_login", fileName="d:\\TwitterMap.pdf", nMax=5000)"""

P.S. If you rewrite this script without eval please post a link in comments 😉

R-Fiddle: An online playground for R code

22/11/2013 | 25/02/2021 | Machine Learning and NLP | 1 Comment

www.R-fiddle.org is an early stage beta that provides you with a free and powerful environment to write, run and share R code right inside your browser. It even offers the option to include packages. Over the last couple of days it has been gaining more and more traction, and it was mentioned on the front page of Hacker News.

We designed it for those situations where you have code that you need to prototype quickly and then possibly share with others for feedback. All this without needing a user account, or any scrap projects or files! We even included a very easy-to-use 'embed' function for blogs and websites, so your visitors can edit and run R code on your own website or blog. This is the first version of R-fiddle, so do not hesitate to give us feedback.

Working together with the help of R-fiddle

You can use R-fiddle to share code snippets with colleagues…

View original post 526 more words

F# Neural Networks with FsLab

18/11/2013 | 25/02/2021 | F#, Machine Learning and NLP | 21 Comments

Neural networks are a very powerful tool, and at the same time it is not easy to use all of their power. Now we are one step closer to it from F# and .NET. We will delegate model training to R using R Provider. We will also use Deedle (announced a few days ago) for handy data manipulation.

Prerequisites:

Install R, for more information please see RProvider prerequisites.
Install the FsLab NuGet package (read more about FsLab on fslab.org).

Learning from Data:

First of all, we need to load the required assemblies into our FSI session. This is pretty easy with FsLab because the package has a bootstrapping script.

#load "..\packages\FsLab.0.1.4\FsLab.fsx"

The next step is to download and install the missing R packages. For this demo, we need neuralnet for training the neural network model and making predictions, and caret for data visualization.

open RProvider.utils
R.install_packages("MASS")
R.install_packages("pbkrtest")
R.install_packages("lattice")
R.install_packages("Matrix")
R.install_packages("mgcv")
R.install_packages("grid")
R.install_packages("neuralnet")
R.install_packages("caret")
R.install_packages("zoo")

Now we are ready to start. We need to open namespaces and load a data set. For this demo, we have chosen the iris data set, a classic used in lots of demos.

open Deedle
open RDotNet
open RProvider
open RProvider.``base``
open RProvider.datasets
open RProvider.neuralnet
open RProvider.caret

let iris : Frame<int, string> = R.iris.GetValue()

To better understand what we are going to do, let's plot this data set. First of all, split the data into two parts: features (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) and a target variable (Species). After that, plot the features pairwise against each other (different colors represent different Species).

let features =
    iris
    |> Frame.filterCols (fun c _ -> c <> "Species")
    |> Frame.mapColValues (fun c -> c.As<double>())
let targets =
    R.as_factor(iris.Columns.["Species"])

R.featurePlot(x = features, y = targets, plot = "pairs")

[Figure: nn_features, pairwise feature plot of the iris data set]

As you can see, our task is not trivial: we have 3 classes instead of 2 (which is not the classic situation), and the classes are not clearly separable. Nevertheless, let's try! First of all, we need to split our data into 2 parts, training and testing data sets (70% vs 30%). The first part will be sent to the neural network for learning; the second one will be used for measuring model quality. Let's also shuffle the data to keep the evaluation honest.

iris.ReplaceColumn("Species", targets.AsNumeric())
let range = [1..iris.RowCount]
let trainingIdxs : int[] = R.sample(range, iris.RowCount*7/10).GetValue()
let testingIdxs : int[] = R.setdiff(range, trainingIdxs).GetValue()
let trainingSet = iris.Rows.[trainingIdxs]
let testingSet = iris.Rows.[testingIdxs]
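
The shuffle-and-split step can also be sketched in plain F#, without calling R.sample and R.setdiff. This is only an illustration under stated assumptions: the helper name splitIndices, the seed 42 and the hard-coded row count of 150 (the size of the iris data set) are hypothetical and not part of the original script.

```fsharp
// Hypothetical helper (not part of the original script): shuffle the 1-based
// row indices with a Fisher-Yates shuffle and split them 70/30.
let splitIndices (seed: int) (rowCount: int) =
    let rng = System.Random(seed)
    let idxs = [| 1 .. rowCount |]
    // Walk from the last position down to the second, swapping each element
    // with a randomly chosen element at or before it.
    for i in idxs.Length - 1 .. -1 .. 1 do
        let j = rng.Next(i + 1)
        let tmp = idxs.[i]
        idxs.[i] <- idxs.[j]
        idxs.[j] <- tmp
    // Same integer arithmetic as in the script above: 70% go to training.
    let trainingCount = rowCount * 7 / 10
    let training = Array.sub idxs 0 trainingCount
    let testing = Array.sub idxs trainingCount (rowCount - trainingCount)
    training, testing

// Assumed values: seed 42, 150 rows (the size of the iris data set).
let trainIdx, testIdx = splitIndices 42 150
printfn "%d training rows, %d testing rows" trainIdx.Length testIdx.Length
```

With 150 rows this yields 105 training and 45 testing indices. Fixing the seed makes the split reproducible between runs, which the R.sample call above does not guarantee unless a seed is set on the R side.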

Now we are ready to train a neural network. All we need is to provide a formula that specifies the inputs and the output of our model, “Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width”, provide a data set, and specify the structure of the hidden layers. In the following example, we train the network with two hidden layers, the first with 3 nodes and the second with 2 nodes.

let nn =
    R.neuralnet(
        "Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width",
        data = trainingSet, hidden = R.c(3,2),
        err_fct = "ce", linear_output = true)

// Plot the resulting neural network with coefficients
R.eval(R.parse(text="library(grid)"))
R.plot_nn nn

[Figure: nn_network, the trained neural network with its coefficients]

Cool! How simple it is. To be able to measure the quality of the classification, we need to split our testing set into features and targets.

let testingFeatures =
    testingSet
    |> Frame.filterCols (fun c _ -> c <> "Species")
    |> Frame.mapColValues (fun c -> c.As<double>())
let testingTargets =
    testingSet.Columns.["Species"].As<int>().Values

To run the neural network on new data (apply our classification), we call the R.compute method and pass the testing features to it.

let prediction =
    R.compute(nn, testingFeatures)
     .AsList().["net.result"].AsVector()
    |> Seq.cast<double>
    |> Seq.map (round >> int)
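
Because the network output is a real number, rounding it can in principle produce a value outside the valid class labels 1 to 3. The snippet below is a minimal sketch of one way to guard against that by clamping the rounded value. The helper name toClass and the sample outputs are hypothetical; the original script does not do this and simply rounds.

```fsharp
// Hypothetical helper (not in the original script): round a raw network
// output to the nearest class label and clamp it into the range 1..classCount.
let toClass (classCount: int) (output: float) =
    output |> round |> int |> max 1 |> min classCount

// Made-up sample outputs, for illustration only.
let rawOutputs = [ 0.97; 2.04; 2.61; 3.48; -0.2 ]
let classes = rawOutputs |> List.map (toClass 3)
printfn "%A" classes
```

For these sample values the result is [1; 2; 3; 3; 1]: the last output rounds to 0, which is not a valid label, and is clamped to 1.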

Finally, let’s compare prediction results with testing values:

let misclassified =
    Seq.zip prediction testingTargets
    |> Seq.filter (fun (a,b) -> a<>b)
    |> Seq.length

printfn "Misclassified irises '%d' of '%d'" misclassified (testingSet.RowCount)

If you execute all these steps one by one, you will see that there are only about 3 misclassified irises out of 45 samples. Pretty good quality.
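
A single misclassification count hides which species get confused with each other. As a rough sketch, the same zip-and-compare idea can be extended to count every (actual, predicted) pair. The data below is made up for illustration and is not output from the network; species are assumed to be encoded as 1, 2 and 3, as in the script above.

```fsharp
// Made-up example labels (not real network output); species encoded as 1, 2, 3.
let predicted = [ 1; 1; 2; 3; 3; 2; 1 ]
let actual    = [ 1; 1; 2; 2; 3; 3; 1 ]

// Count how often each (actual, predicted) pair occurs.
let confusion =
    List.zip actual predicted
    |> List.countBy id
    |> List.sortBy fst

// Pairs where the labels differ are the misclassified samples.
let errors =
    confusion
    |> List.filter (fun ((a, p), _) -> a <> p)
    |> List.sumBy snd

for ((a, p), n) in confusion do
    printfn "actual %d, predicted %d: %d" a p n
printfn "Misclassified %d of %d" errors actual.Length
```

For this sample data the last line prints "Misclassified 2 of 7": one sample of class 2 was predicted as 3, and one of class 3 as 2.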

Full script:

#load "..\packages\FsLab.0.1.4\FsLab.fsx"

// You need to install 'neuralnet' and 'caret' packages if you do not have them
open RProvider.utils
R.install_packages("MASS")
R.install_packages("pbkrtest")
R.install_packages("lattice")
R.install_packages("Matrix")
R.install_packages("mgcv")
R.install_packages("grid")
R.install_packages("neuralnet")
R.install_packages("caret")
R.install_packages("zoo")

open Deedle
open RDotNet
open RProvider
open RProvider.``base``
open RProvider.datasets
open RProvider.neuralnet
open RProvider.caret

// Load data from R to Deedle frame
let iris : Frame<int, string> = R.iris.GetValue()

// Observe iris data set
let features =
    iris
    |> Frame.filterCols (fun c _ -> c <> "Species")
    |> Frame.mapColValues (fun c -> c.As<double>())
let targets =
    R.as_factor(iris.Columns.["Species"])

R.featurePlot(x = features, y = targets, plot = "pairs")

iris.ReplaceColumn("Species", targets.AsNumeric())
// Split data to training and testing sets (70% vs 30%)
let range = [1..iris.RowCount]
let trainingIdxs : int[] = R.sample(range, iris.RowCount*7/10).GetValue()
let testingIdxs : int[] = R.setdiff(range, trainingIdxs).GetValue()
let trainingSet = iris.Rows.[trainingIdxs]
let testingSet = iris.Rows.[testingIdxs]

// Train neural network
let nn =
    R.neuralnet(
        "Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width",
        data = trainingSet, hidden = R.c(3,2),
        err_fct = "ce", linear_output = true)

// Plot the resulting neural network with coefficients
R.eval(R.parse(text="library(grid)"))
R.plot_nn nn

// Split testing set into features and targets
let testingFeatures =
    testingSet
    |> Frame.filterCols (fun c _ -> c <> "Species")
    |> Frame.mapColValues (fun c -> c.As<double>())
let testingTargets =
    testingSet.Columns.["Species"].As<int>().Values

// Predict `Species` for testingFeatures with neural network
let prediction =
    R.compute(nn, testingFeatures)
     .AsList().["net.result"].AsVector()
    |> Seq.cast<double>
    |> Seq.map (round >> int)

// Calculate number of misclassified irises
let misclassified =
    Seq.zip prediction testingTargets
    |> Seq.filter (fun (a,b) -> a<>b)
    |> Seq.length

printfn "Misclassified irises '%d' of '%d'" misclassified (testingSet.RowCount)

P.S.

Note: if you have problems with bootstrapping RProvider and/or converting R data frames to Deedle data frames, verify that during installation of the NuGet packages all assemblies have been copied to RProvider's lib sub-folder (see the following picture).

[Figure: deedle_rprovider, contents of RProvider's lib sub-folder]

Rattle for F# devs

16/07/2013 | 25/02/2021 | F#, Machine Learning and NLP | 1 Comment

Strangely, Rattle is an awesome tool, but it is not as well known among devs as it should be. We definitely need to fix this.

Rattle (the R Analytical Tool To Learn Easily) presents statistical and visual summaries of data, transforms data into forms that can be readily modelled, builds both unsupervised and supervised models from the data, presents the performance of models graphically, and scores new datasets.

First, we need to install a new package from CRAN. To do so, just open the R console and type the following:

install.packages("rattle")

Next, check that you have RProvider installed:

Install-Package RProvider

Now we are ready to start.

#I @"..\packages\RProvider.1.0.0\lib"
#r "RDotNet.dll"
#r "RProvider.dll"

open RProvider.rattle
R.rattle() |> ignore

Execute this short snippet and you should see the Rattle start screen. You are now ready to study your data without writing a single line of code.