Twitter Pulse #fsharp 2014

For the past two years, at the beginning of each year, I have written a post that sums up the year's results and looks for interesting facts and stats in tweets:

This year was really awesome, and a lot of things happened:

So many things occurred that I need your help to find everything that happened or changed this year.

For this purpose, I prepared a data set of tweets starting from Jan 1, 2013, which is available here, and I ask you to help me analyze it.

How to start:

  1. Download the data set and unzip the archive.
  2. Download the latest version of FSharp.Data.
  3. Copy and paste the following code snippet:
#r @"..\packages\FSharp.Data.2.1.1\lib\net40\FSharp.Data.dll"
open FSharp.Data

type Tweets = CsvProvider<"fsharp_2013-2014.csv">
let tweets = Tweets.GetSample()

//TODO: Your awesome analytics here

My sample analysis:

As an example of what you can do with the data, I prepared a calculation of user activity that shows who tweeted the most this year.

tweets.Rows
|> Seq.filter (fun x -> x.CreatedDate.Year = 2014)
|> Seq.groupBy (fun x -> x.FromUserScreenName)
|> Seq.map (fun (group, items) ->
    (group, Seq.length items))
|> Seq.sortBy (fun (_, cnt) -> -cnt)
|> Seq.take 100
|> Seq.iter (fun (group, cnt) ->
    printfn "%s:%d" group cnt)
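
The same counting can be written a little more compactly with Seq.countBy. The sketch below is only an illustration: it runs on a small made-up in-memory sample instead of the CSV file, and the Tweet record and the topUsers helper are hypothetical names that do not come from the data set. It also uses Seq.truncate, because Seq.take raises an exception when fewer items are available than requested.

```fsharp
open System

// Hypothetical stand-in for a CSV row; only the two columns used above.
type Tweet = { FromUserScreenName : string; CreatedDate : DateTime }

// Made-up sample data, for illustration only.
let sample =
    [ { FromUserScreenName = "alice"; CreatedDate = DateTime(2014, 3, 1) }
      { FromUserScreenName = "bob";   CreatedDate = DateTime(2014, 5, 9) }
      { FromUserScreenName = "alice"; CreatedDate = DateTime(2014, 7, 2) }
      { FromUserScreenName = "alice"; CreatedDate = DateTime(2013, 12, 31) } ]

// Count tweets per user for one year, most active users first.
let topUsers year n (rows : seq<Tweet>) =
    rows
    |> Seq.filter (fun t -> t.CreatedDate.Year = year)
    |> Seq.countBy (fun t -> t.FromUserScreenName)
    |> Seq.sortBy (fun (_, cnt) -> -cnt)
    |> Seq.truncate n   // returns at most n items, never throws on short input
    |> Seq.toList

// Print in the same "name:count" format as the snippet above.
topUsers 2014 100 sample
|> List.iter (fun (user, cnt) -> printfn "%s:%d" user cnt)
```

With the real data, tweets.Rows could be passed through the same pipeline in place of the sample list.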

When you execute these lines in FSI, you will get a list of user names with the number of tweets for each. You can copy this output to wordle.net and play with the settings to visualize it nicely:
[Image: 2015-01-02_1007]
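
Per-user counts are only one possible slice of the data. As another starting point, here is a minimal sketch that counts tweets per month. It assumes only that each row has a creation date, as CreatedDate does in the snippet above; the createdDates list is made-up sample data and tweetsPerMonth is a hypothetical helper name.

```fsharp
open System

// Made-up stand-in for the CreatedDate column of the CSV rows.
let createdDates =
    [ DateTime(2013, 12, 30); DateTime(2014, 1, 5)
      DateTime(2014, 1, 20);  DateTime(2014, 3, 2) ]

// Group dates by (year, month) and count them, in chronological order.
let tweetsPerMonth (dates : seq<DateTime>) =
    dates
    |> Seq.countBy (fun d -> (d.Year, d.Month))
    |> Seq.sortBy fst
    |> Seq.toList

// Print one "yyyy-MM: count" line per month that has at least one tweet.
tweetsPerMonth createdDates
|> List.iter (fun ((y, m), cnt) -> printfn "%d-%02d: %d" y m cnt)
```

With the real data, the input could be built with tweets.Rows |> Seq.map (fun t -> t.CreatedDate).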

Please help me explore the data set and share your results with me on Twitter (@sergey-tihon). I will include your plots and charts at the end of this post. Thank you!

7 thoughts on “Twitter Pulse #fsharp 2014”

  1. #r @"..\packages\FSharp.Data.2.1.1\lib\net40\FSharp.Data.dll"
    #load @"..\packages/FSharp.Charting.0.90.9/FSharp.Charting.fsx"

    open FSharp.Charting
    open FSharp.Data

    open System.Text.RegularExpressions

    // Active pattern: match str against regex and return the captured groups
    let (|ParseRegex|_|) regex str =
        let m = Regex(regex).Match(str)
        if m.Success
        then Some (List.tail [ for x in m.Groups -> x.Value ])
        else None

    // Same CSV file as in the post's snippet
    type Tweets = CsvProvider<"fsharp_2013-2014.csv">
    let tweets = Tweets.GetSample()

    // Tweets that are not retweets, de-duplicated by text
    let distinctTweets =
        tweets.Rows
        |> Seq.choose (fun t -> if t.Text.StartsWith("RT @") then None else Some(t))
        |> Seq.distinctBy (fun t -> t.Text)

    let numberOfTweets =
        tweets.Rows
        |> Seq.length

    let numberOfDistinctTweets =
        distinctTweets
        |> Seq.length

    let groupsOfTweets =
        tweets.Rows
        |> Seq.groupBy (fun t -> t.Text)

    // (text, number of occurrences), most frequent first
    let cgroupsOfTweets =
        groupsOfTweets
        |> Seq.choose (fun (txt, t) -> Some(txt, (t |> Seq.length)))
        |> Seq.sortBy (fun (txt, no) -> -no)

    let printit format seq =
        seq |> Seq.iter (printfn format)

    // Extract the retweeted user's name from text of the form "RT @name: ..."
    let getReTweetUser (s:string) =
        match s with
        | ParseRegex @"^RT \@(.*?)\:.*?" [name] -> Some(name)
        | _ -> None

    // (user name, total number of retweets), most retweeted first
    let mostReTweetedUser =
        cgroupsOfTweets
        |> Seq.choose (fun (txt, no) ->
            let u = getReTweetUser txt
            if u.IsSome then
                Some(u, no)
            else
                None)
        |> Seq.groupBy (fun (name, no) -> name)
        |> Seq.choose (fun (name1, s) -> Some(name1.Value, (s |> Seq.sumBy (fun (name, no) -> no))))
        |> Seq.sortBy (fun (name, no) -> -no)

    Chart.Column(mostReTweetedUser |> Seq.take(20), Title="@Tweeters with most retweets")
    |> Chart.WithXAxis(LabelStyle=ChartTypes.LabelStyle.Create(Angle=45, TruncatedLabels=false, Interval=1.0))

    let totalTweets = [("Total tweets", numberOfTweets); ("Retweets", numberOfTweets-numberOfDistinctTweets); ("Unique tweets", numberOfDistinctTweets)]

    Chart.Column(totalTweets, Title="Number of tweets")
