For the past two years, at the beginning of each year, I have written a post summing up the year's results and digging out interesting facts and stats from tweets.
This year was really awesome; a lot of things happened:
- Real cooperation between the Visual F# team and the F# open source community
- A legal entity for the F# Software Foundation
- A new F# logo and a brand-new website
- The awesome F# Advent Calendar at the end of the year 😉
- …and much more
So many things happened that I need your help to find everything that happened or changed this year.
For this purpose, I prepared a data set of tweets starting from Jan 1, 2013, which is available here, and I am asking you to help me analyze it.
How to start:
- Download the data set and unzip the archive.
- Download the latest version of FSharp.Data.
- Copy and paste the following code snippet:
#r @"..\packages\FSharp.Data.2.1.1\lib\net40\FSharp.Data.dll"
open FSharp.Data

type Tweets = CsvProvider<"fsharp_2013-2014.csv">
let tweets = Tweets.GetSample()

//TODO: Your awesome analytics here
My sample analysis:
As an example of what you can do with the data, I prepared a calculation of user activity that shows who tweeted the most this year.
tweets.Rows
|> Seq.filter (fun x -> x.CreatedDate.Year = 2014)
|> Seq.groupBy (fun x -> x.FromUserScreenName)
|> Seq.map (fun (group, items) -> (group, Seq.length items))
|> Seq.sortBy (fun (_, cnt) -> -cnt)
|> Seq.take 100
|> Seq.iter (fun (group, cnt) -> printfn "%s:%d" group cnt)
When you execute these lines, FSI prints the statistics as user names with their tweet counts. You can copy this output to wordle.net and play with the settings to visualize it nicely.
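The per-user count can also be written more compactly with `Seq.countBy`. Below is a minimal sketch that runs without the CSV file. The `Tweet` record, its `Year` field, the `topUsers` helper and the sample rows are all hypothetical stand-ins for the CsvProvider rows. It uses `Seq.truncate` instead of `Seq.take`, so it does not throw when fewer than `n` users are present.

```fsharp
// Hypothetical stand-in for a CSV row (the real rows expose CreatedDate, FromUserScreenName, ...)
type Tweet = { FromUserScreenName: string; Year: int }

// Hypothetical sample data
let sample =
    [ { FromUserScreenName = "alice"; Year = 2014 }
      { FromUserScreenName = "bob";   Year = 2014 }
      { FromUserScreenName = "alice"; Year = 2014 }
      { FromUserScreenName = "bob";   Year = 2013 } ]

/// Top `n` users by number of tweets in `year`, most active first.
let topUsers year n (rows: seq<Tweet>) =
    rows
    |> Seq.filter (fun t -> t.Year = year)
    |> Seq.countBy (fun t -> t.FromUserScreenName)  // (name, count) pairs
    |> Seq.sortBy (fun (_, cnt) -> -cnt)            // descending by count
    |> Seq.truncate n                               // at most n, never throws
    |> List.ofSeq

let top = topUsers 2014 100 sample

// Prints "name:count" lines, the same shape as the snippet above
top |> List.iter (fun (name, cnt) -> printfn "%s:%d" name cnt)
```

The "name:count" lines have the same shape as the output of the original snippet.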
Please help me explore the data set and share your results with me on Twitter (@sergey_tihon). I will include your plots and charts at the end of this post. Thank you!
#r @"..\packages\FSharp.Data.2.1.1\lib\net40\FSharp.Data.dll"
#load @"..\packages/FSharp.Charting.0.90.9/FSharp.Charting.fsx"

open FSharp.Charting
open FSharp.Data
open System.Text.RegularExpressions

// Active pattern: on a match, returns the captured groups (without the whole-match group)
let (|ParseRegex|_|) regex str =
    let m = Regex(regex).Match(str)
    if m.Success
    then Some (List.tail [ for x in m.Groups -> x.Value ])
    else None

type Tweets = CsvProvider<"fsharp_2013-2014.csv">
let tweets = Tweets.GetSample()

// Original tweets only: drop retweets ("RT @...") and duplicate texts
let distinctTweets =
    tweets.Rows
    |> Seq.filter (fun t -> not (t.Text.StartsWith("RT @")))
    |> Seq.distinctBy (fun t -> t.Text)

let numberOfTweets =
    tweets.Rows
    |> Seq.length

let numberOfDistinctTweets =
    distinctTweets
    |> Seq.length

let groupsOfTweets =
    tweets.Rows
    |> Seq.groupBy (fun t -> t.Text)

// (text, number of occurrences), most frequent first
let cgroupsOfTweets =
    groupsOfTweets
    |> Seq.map (fun (txt, t) -> txt, Seq.length t)
    |> Seq.sortBy (fun (_, no) -> -no)

let printit format seq =
    seq |> Seq.iter (printfn format)

// Screen name of the retweeted author for texts like "RT @name: ..."
let getReTweetUser (s: string) =
    match s with
    | ParseRegex @"^RT @(.*?):" [name] -> Some name
    | _ -> None

// Total retweets per original author, most retweeted first
let mostReTweetedUser =
    cgroupsOfTweets
    |> Seq.choose (fun (txt, no) ->
        getReTweetUser txt |> Option.map (fun name -> name, no))
    |> Seq.groupBy fst
    |> Seq.map (fun (name, s) -> name, Seq.sumBy snd s)
    |> Seq.sortBy (fun (_, no) -> -no)

Chart.Column(mostReTweetedUser |> Seq.take 20, Title = "@Tweeters with most retweets")
|> Chart.WithXAxis(LabelStyle = ChartTypes.LabelStyle.Create(Angle = 45, TruncatedLabels = false, Interval = 1.0))

// "Retweets" = total minus distinct non-RT texts, so repeated non-RT texts are counted here too
let totalTweets =
    [ ("Total tweets", numberOfTweets)
      ("Retweets", numberOfTweets - numberOfDistinctTweets)
      ("Unique tweets", numberOfDistinctTweets) ]

Chart.Column(totalTweets, Title = "Number of tweets")
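The retweet attribution above relies on the "RT @name:" prefix convention. The following sketch runs in FSI without the CSV file or charting. It uses hypothetical sample texts and a hypothetical helper, `tryRetweetedUser`, to show two things. First, it extracts retweeted authors with the same regex idea. Second, it shows how counting retweets by prefix differs from "total minus distinct non-RT texts" when a non-RT text is repeated.

```fsharp
open System.Text.RegularExpressions

// Hypothetical helper: retweeted author's screen name for "RT @name: ..." texts
let tryRetweetedUser (text: string) =
    let m = Regex.Match(text, @"^RT @(.*?):")
    if m.Success then Some(m.Groups.[1].Value) else None

// Hypothetical sample texts
let texts =
    [ "RT @alice: F# rocks"
      "RT @alice: F# rocks"
      "RT @bob: New FSharp.Data release"
      "Hello F# world"
      "Hello F# world" ]

// Retweets per original author, most retweeted first
let retweetsByUser =
    texts
    |> List.choose tryRetweetedUser
    |> List.countBy id
    |> List.sortBy (fun (_, n) -> -n)

// Retweets counted directly by prefix
let retweetCount =
    texts |> List.filter (fun t -> t.StartsWith("RT @")) |> List.length

// "Total minus distinct non-RT texts" also counts the repeated "Hello F# world"
let distinctOriginals =
    texts |> List.filter (fun t -> not (t.StartsWith("RT @"))) |> List.distinct |> List.length
let totalMinusDistinct = List.length texts - distinctOriginals
```

Here `retweetCount` is 3 while `totalMinusDistinct` is 4, because one non-RT text appears twice.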
Oh no… I can't be in 1st place. Maybe there's a bug somewhere? 😉
Thanks for the charts!
That's why I posted the code too 😉