Twitter Followers Map with RProvider

Today @oppenheimmd re-tweeted a nice tweet about building a Twitter followers map with R. Naturally, I decided to build my own map, and here it is:

[Image: twitterMap — my Twitter followers map]

The total number of followers on the picture is smaller than on Twitter. I think this happens because not all people have specified a location in their account settings. Note that to be able to execute this script, you need to specify a location in your own Twitter account.

As you probably guessed from the title, I built this picture using RProvider instead of executing the existing R code directly. Actually, using twitterMap.R is pretty simple, as long as you do not count the difficulties with Twitter authorization and SSL certificate validation (this part is a bit ugly).

For this demo we need two R packages, twitteR and RCurl (with all their dependencies). Please install them:

#I @"..\packages\RProvider.1.0.5\"
#load "RProvider.fsx"

//open RProvider.utils
//R.install_packages("twitteR")
//R.install_packages("RCurl")

open RDotNet
open RProvider
open RProvider.utils
open RProvider.``base``
open RProvider.twitteR
open RProvider.RCurl
open RProvider.ROAuth

I am a bit too lazy to fight with the RProvider syntax in some places. Actually, I do not even know whether it is possible to rewrite such R code using RProvider at all… I decided to cheat a bit and define a function that takes an R expression as a string and evaluates it.

let eval (text:string) =
    namedParams ["text", box text] |> R.parse |> R.eval

You need a consumerKey and consumerSecret from your registered Twitter application. If you do not have them yet, please follow the steps from this article, which will help you register a new Twitter application. The following code authenticates you with Twitter:

let twitCred =
    namedParams [
        // TODO: insert your consumerKey and consumerSecret
        "consumerKey", box "xxxxxxxxxxxxxxxxxxxxxx"
        "consumerSecret", box "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
        "requestURL", box "https://api.twitter.com/oauth/request_token"
        "accessURL", box "http://api.twitter.com/oauth/access_token"
        "authURL", box "http://api.twitter.com/oauth/authorize" ]
    |> R.OAuthFactory

R.download_file(url = "http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
R.assign("twitCred", twitCred)
eval """twitCred$handshake(cainfo="cacert.pem")"""

Here you need to do some manual work. These are the last authentication steps:

  1. Copy the URL from the FSI window and paste it into your browser
  2. Allow your Twitter app to access your account data
  3. Copy the authorization PIN code from the browser
  4. Paste the code into FSI and press Enter

After that, you need to save your authorization data and set the SSL certificate to be used globally, which allows twitterMap.R to communicate with Twitter under your account.

R.registerTwitterOAuth(twitCred)
eval """options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))"""

The last step is to run the script and plot your own map:

R.source("http://biostat.jhsph.edu/~jleek/code/twitterMap.R")
// TODO: do not forget to specify your twitter login and increase nMax if you have more than 5000 followers
eval """twitterMap("your_login", fileName="d:\\TwitterMap.pdf", nMax=5000)"""

P.S. If you rewrite this script without eval, please post a link in the comments 😉

New Twitter API or “F# Weekly” v1.1

Good news for Twitter and not so good for developers:

Today (2013-06-11), we (Twitter) are retiring API v1 and fully transitioning to API v1.1.

What does it all mean? It means that all the old services are no longer available. Twitter switched to new ones with mandatory OAuth authentication. From now on, to work with Twitter services we must register apps and use OAuth.

It also means that the Twitterizer library, which my old “F# Weekly” script relied on (see the post below), no longer works with the new API.

As far as I know, there are two alternatives to Twitterizer:

  • Tweetsharp (TweetSharp is a fast, clean wrapper around the Twitter API.)
  • LINQ to Twitter (An open source 3rd party LINQ Provider for the Twitter micro-blogging service.)

I have chosen TweetSharp because its API is similar to Twitterizer's. Here is the new “F# Weekly” under the hood script:

#r "Newtonsoft.Json.dll"
#r "Hammock.ClientProfile.dll"
#r "TweetSharp.dll"

open TweetSharp
open System
open System.Net
open System.Text.RegularExpressions

let service = new TwitterService(_consumerKey, _consumerSecret)
service.AuthenticateWith(_accessToken, _accessTokenSecret)

let getTweets query =
    let rec collect maxId =
        let options = SearchOptions(Q = query, Count = Nullable(100), MaxId = Nullable(maxId),
                                    Resulttype = Nullable(TwitterSearchResultType.Recent))
        printfn "Loading %s under id %d" query maxId
        let results = service.Search(options).Statuses |> Seq.toList
        printfn "\t Loaded %d tweets" results.Length
        if (results.Length = 0)
            then List.empty
            else
                let lastTweet = results |> List.rev |> List.head
                if (lastTweet.Id < maxId)
                    then results |> List.append (collect (lastTweet.Id))
                    else results
    collect (Int64.MaxValue) |> List.rev

let urlRegexp = Regex("http://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?", RegexOptions.IgnoreCase);

let filterUniqLinks (tweets: TwitterStatus list) =
    let hash = new System.Collections.Generic.HashSet<string>();
    tweets |> List.fold
        (fun acc t ->
             let matches = urlRegexp.Matches(t.Text)
             if (matches.Count = 0) then acc
             else let urls =
                     [0 .. (matches.Count-1)]
                     |> List.map (fun i -> matches.[i].Value)
                     |> List.filter (fun url -> not(hash.Contains(url)))
                  if (List.isEmpty urls) then acc
                  else urls |> List.iter(fun url -> hash.Add(url) |> ignore)
                       t :: acc)
        [] |> List.rev

let tweets =
    ["#fsharp";"#fsharpx";"@dsyme";"#websharper";"@c4fsharp"]
    |> List.map getTweets
    |> List.concat
    |> List.sortBy (fun t -> t.CreatedDate)
    |> filterUniqLinks

let printTweetsInHtml filename (tweets: TwitterStatus list) =
    let formatTweet (text:string) =
        let matches = urlRegexp.Matches(text)
        seq {0 .. (matches.Count-1)}
            |> Seq.fold (
                fun (t:string) i ->
                    let url = matches.[i].Value
                    t.Replace(url, (sprintf "<a href=\"%s\" target=\"_blank\">%s</a>" url url)))
                text
    let rows =
      tweets
        |> List.mapi (fun i t ->
            let id = (tweets.Length - i)
            let text = formatTweet(t.Text)
            sprintf "<table id=\"%d\"><tr><td rowspan=\"2\" width=\"30\">%d</td><td rowspan=\"2\" width=\"80\"><a href=\"javascript:remove('%d')\">Remove</a></td><td rowspan=\"2\"><a href=\"https://twitter.com/%s\" target=\"_blank\"><img src=\"%s\"/></a></td><td><b>%s</b></td></tr><tr><td>Created : %s</td></tr></table>"
                     id id id t.Author.ScreenName t.Author.ProfileImageUrl text (t.CreatedDate.ToString()))
        |> List.fold (fun s r -> s+" "+r) ""
    let html = sprintf "<html><head><script type=\"text/javascript\">function remove(id){return (elem=document.getElementById(id)).parentNode.removeChild(elem);}</script></head><body>%s</body></html>" rows
    System.IO.File.WriteAllText(filename, html)

printTweetsInHtml "d:\\tweets.html" tweets

“F# Weekly” under the hood

Under the hood of the F# Weekly news preparation lies a simple F# script.

This script uses Twitterizer2 – one of the simplest Twitter client libraries for .NET. The source code is available on GitHub, and binaries are available through NuGet.

The script logic is relatively simple. First of all, it collects a list of search queries for Twitter.

    let tweets = ["#fsharp";"#fsharpx";"@dsyme";"#websharper";"#fsharpweekly"]

Then it makes a call to the Twitter Search API for each query, concatenates the results for the last week and sorts all the tweets by creation date.

                |> List.map (getTweets (DateTime.Now - TimeSpan.FromDays(7.0)))
                |> List.concat
                |> List.sortBy (fun t -> t.CreatedDate)

Then it keeps only ‘en’ tweets and filters out tweets without links as well as retweets, leaving only the first occurrence of each unique link.

                |> List.filter (fun t -> t.Language = "en")
                |> filterUniqLinks

Also, in the source code below you can find a console printing function for verifying the results and an HTML printing function for further manual review.

Feel free to use it in your own social research.

#r "Twitterizer2.dll"

open Twitterizer
open Twitterizer.Entities
open System
open System.Net
open System.Text.RegularExpressions;

let getTweets (sinceDate:DateTime) query =
    let rec collect pageNum =
        let options = SearchOptions(NumberPerPage = 100, SinceDate = sinceDate, PageNumber = pageNum);
        printfn "Loading %d-%d" (pageNum*options.NumberPerPage) ((pageNum+1)*options.NumberPerPage)
        let result = TwitterSearch.Search(query, options);
        if (result.Result <> RequestResult.Success || result.ResponseObject.Count = 0)
            then List.empty
            else result.ResponseObject |> List.ofSeq |> List.append (collect (pageNum+1))
    collect 1 |> List.rev

let urlRegexp = Regex("http://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?", RegexOptions.IgnoreCase);

let filterUniqLinks (tweets: TwitterSearchResult list) =
    let hash = new System.Collections.Generic.HashSet<string>();
    tweets |> List.fold
        (fun acc t ->
            let matches = urlRegexp.Matches(t.Text)
            if (matches.Count = 0) then acc
            else let urls =
                   [0 .. (matches.Count-1)]
                       |> List.map (fun i -> matches.[i].Value)
                       |> List.filter (fun url -> not(hash.Contains(url)))
                 if (List.isEmpty urls) then acc
                 else urls |> List.iter(fun url -> hash.Add(url) |> ignore)
                      t :: acc)
        [] |> List.rev

let printTweets (tweets: TwitterSearchResult list) =
    tweets |> List.iter (fun t ->
        printfn "%15s : %s : %s" t.FromUserScreenName (t.CreatedDate.ToShortDateString()) t.Text)

let tweets = ["#fsharp";"#fsharpx";"@dsyme";"#websharper";"#fsharpweekly"]
                |> List.map (getTweets (DateTime.Now - TimeSpan.FromDays(7.0)))
                |> List.concat
                |> List.sortBy (fun t -> t.CreatedDate)
                |> List.filter (fun t -> t.Language = "en")
                |> filterUniqLinks
printfn "Tweets count : %d" tweets.Length
printTweets tweets

let printTweetsInHtml filename (tweets: TwitterSearchResult list) =
    let formatTweet (text:string) =
        let matches = urlRegexp.Matches(text)
        seq {0 .. (matches.Count-1)}
            |> Seq.fold (
                fun (t:string) i ->
                    let url = matches.[i].Value
                    t.Replace(url, (sprintf "<a href=\"%s\" target=\"_blank\">%s</a>" url url)))
                text
    let rows =
      tweets
        |> List.mapi (fun i t ->
            let id = (tweets.Length - i)
            let text = formatTweet(t.Text)
            sprintf "<table id=\"%d\"><tr><td rowspan=\"2\" width=\"30\">%d</td><td rowspan=\"2\" width=\"80\"><a href=\"javascript:remove('%d')\">Remove</a><td rowspan=\"2\"><a href=\"https://twitter.com/%s\" target=\"_blank\"><img src=\"%s\"/></a></td><td><b>%s</b></td></tr><tr><td>Created : %s <br></td></tr></table>"
                     id id id t.FromUserScreenName t.ProfileImageLocation text (t.CreatedDate.ToString()))
        |> List.fold (fun s r -> s+"&nbsp;"+r) ""
    let html = sprintf "<html><head><script>function remove(id){return (elem=document.getElementById(id)).parentNode.removeChild(elem);}</script></head><body>%s</body></html>" rows
    System.IO.File.WriteAllText(filename, html)

printTweetsInHtml "d:\\tweets.html" tweets