Blog – Page 10 – Sergey Tihon's Blog

Confluence/Jira communication from F# and C#

24/06/2013 | F# | 3 Comments

Atlassian products are becoming more and more popular, and many companies and teams use Jira and Confluence for project management. It would be good to be able to communicate with these services from .NET. As you probably know, Jira and Confluence are pure Java applications, and both provide SOAP and REST services. REST is the new target for Atlassian: they are focused on it and no longer touch SOAP. So the SOAP services live on with all their bugs and are even deprecated in JIRA 6.0.

This seems a bit strange to me. I can understand the benefits of REST, but it is a step backward for developer convenience. Each programming language has to maintain its own client library, and any change in the REST API can break all client tools. REST causes a lot of headaches for service users. IMHO, REST should be done in the OData manner to simplify life for API users. But that is a bit off topic.

I tried to use the Jira and Confluence REST services some time ago; they were harder to use and more limited than the existing SOAP ones. I have not checked the latest versions, but you can try them if you wish: Confluence REST API documentation and JIRA REST API Tutorials. As far as I know, there are no mature client libraries for .NET.

I have already tried to make the F# WSDL Type Provider work with the Confluence SOAP service, but it does not work because the Confluence SOAP endpoint is not compatible with WCF. There is a known bug, “Creating Service Reference from JIRA WSDL in Visual Studio 2010 generates all methods void”, that will not be fixed.

Workaround for C# guys

The workaround is to create a Web Reference instead of a Service Reference. You can find the details on StackOverflow: “Web Reference vs. Service Reference”. But I want to repeat the steps here:

Click on ‘Add Service Reference’.
Click on ‘Advanced’ button.
Click on ‘Add Web Reference’ button.
Paste the URL of the Confluence SOAP service WSDL and click on ‘Go to …’. (https://developer.atlassian.com/rpc/soap-axis/confluenceservice-v2?WSDL)
Click on ‘Add reference’ button.
Repeat the same steps for the Jira SOAP service. (https://jira.atlassian.com/rpc/soap/jirasoapservice-v2?WSDL)
That is all you need to start working with Jira and Confluence. As a result, you should see two web references in your project.

Workaround for F# guys

The easiest way to do it from F# is to build a proxy library in C# and reference it from F#. I have already done this, and you can download it from GitHub if you wish. There is one issue with this solution: the function parameters have unreadable names like arg0, arg1 and so on. To understand what you actually should pass to the service, you need to check the real parameter names in the documentation: JiraSoapService and ConfluenceSoapService.

Confluence sample script:

#r @"..\Altassian.Proxy\bin\Release\Altassian.Proxy.dll"
open Altassian.Proxy.com.atlassian.confluence

let service = new ConfluenceSoapServiceService(Url = @"https://SERVER_NAME/rpc/soap-axis/confluenceservice-v2?WSDL")
let token = service.login("LOGIN","PASSWORD")

service.getSpaces(token)
|> Seq.iter (fun x-> printfn "%s" x.name )

service.Dispose()

Jira sample script:

#r @"..\Altassian.Proxy\bin\Release\Altassian.Proxy.dll"
open Altassian.Proxy.com.atlassian.jira

let service = new JiraSoapServiceService(Url = @"https://SERVER_NAME/rpc/soap/jirasoapservice-v2?wsdl")
let token = service.login("LOGIN","PASSWORD")

service.getIssuesFromJqlSearch(token, "status = open", 10)
|> Seq.iter (fun x-> printfn "%s" x.summary )

service.Dispose()

All source code is available on GitHub.

Selective crawling in SharePoint 2010 (with F# & Selenium)

24/06/2013 | Enterprise Search, F# | 1 Comment

SharePoint Search Service Applications have two modes for crawling content:

Full Crawl that re-crawls all documents from Content Source.
Incremental Crawl that crawls documents modified since the previous one.

But this is really not enough if you are working on search-driven apps. (You can read more about SharePoint crawling in Brian Pendergrass's “SP2010 Search *Explained: Crawling” post.)

Search applications are a special kind of application that forces you to be iterative. Generally, you work with a large amount of data and you cannot afford to do a full crawl often, because it is a slow process. There is another reason why it is slow: more intelligent search requires more indexing time. We cannot add computation at query time, because it directly affects user satisfaction. Crawl time is the only place for intelligence.

Custom document processing pipeline stages are a bit tricky. In a corpus of hundreds of thousands or millions of documents, you can usually find some that failed on your custom stage or were processed in the wrong way. This may happen for any reason (wrong URL format, corrupted file, locked document, lost connection, unusual encoding, too large a file, memory issue, BSOD on the crawling node, power outage and even a bug in the source code 🙂 ). Assume you were lucky enough to find the documents where your customizations go wrong, and even to fix the code. The question is how to test your latest changes. Do you want to wait several days to check whether it works on these files or not? I think not… You probably want to be able to re-crawl selected items and verify your changes.

Incremental crawl does not solve the problem. It is really hard to find all the files that you want to re-crawl and modify them somehow. Sometimes modification is not possible at all. What can you do in such a situation?

Search Service Applications have a UI for high-level monitoring of index health (see the picture below). There you can check the crawl status of a document by URL and even re-crawl an individual item.

SharePoint does not provide an API to do this from code. All we have is a single ASP.NET form in Central Administration. If you research further and capture the call using Fiddler, you can find the target code that processes the request. You can decompile the SharePoint assemblies and find that some mysterious SQL Server stored procedure is called to add your document to the processing queue (read more about this in Mikael Svenson’s answer on the FAST Search for SharePoint forum).

Ahh… It is already hard enough, just pain and no fun. Even if we find out where to get or how to calculate all the parameters of the stored procedure, it does not solve all our problems. We also need a way to collect the URLs of all the buggy documents that we want to re-crawl. It is possible to do so using SharePoint web services; I have already posted about that (see “F# and FAST Search for SharePoint 2010”). If you like this approach, please continue the research. I am tired at this point.

Canopy magic

Why should I go so far into SharePoint internals for such a ‘simple’ task? Actually, we can automate this task through the UI. We have Canopy, a good Selenium-based UI automation wrapper for F#. All we need is to write a few lines of code that start a browser, open the page and click some buttons many times. This solution certainly has some disadvantages:

You should be a bit familiar with Selenium, but this one is easy to fix.
It will be slow. It works for hundreds of documents, maybe for thousands, but no more. (I think that if you need to re-crawl millions of documents, you can run a full crawl.)

Such an approach also has some benefits:

It is easy to code and to use.
It is flexible.
It solves another problem: you can use Canopy to grab document URLs directly from the search results page or from any other page.

All you need to start with Canopy is to download the NuGet package and a web driver for your favorite browser (Chrome WebDriver, IE WebDriver). The next steps are pretty straightforward: reference three assemblies, configure the web driver location if it differs from the default ‘c:\’, and start the browser:

#r @"..\packages\Selenium.Support.2.33.0\lib\net40\WebDriver.Support.dll"
#r @"..\packages\Selenium.WebDriver.2.33.0\lib\net40\WebDriver.dll"
#r @"..\packages\canopy.0.7.7\lib\canopy.dll"

open canopy

configuration.chromeDir <- @"d:\"
start chrome

Be careful: Selenium, Canopy and the web drivers are very actively developed projects, so the newest versions may differ from the ones mentioned above. Now we are ready to automate the behavior, but there is a little trick. To show the menu, we need to click on the area marked red on the screenshot below, but we should not touch the link inside this area. To click on the element at a specific position, we need to use Selenium's advanced user interactions capabilities.

let sendToReCrawl url =
    let encode (s:string) = s.Replace(" ","%20")
    try
        let encodedUrl = encode url
        click "#ctl00_PlaceHolderMain_UseAsExactMatch" // Select "Exact Match"
        "#ctl00_PlaceHolderMain_UrlSearchTextBox" << encodedUrl
        click "#ctl00_PlaceHolderMain_ButtonFilter" // Click "Search" Button

        elements "#ctl00_PlaceHolderMain_UrlLogSummaryGridView tr .ms-unselectedtitle"
        |> Seq.iter (fun result ->
            OpenQA.Selenium.Interactions.Actions(browser)
                  .MoveToElement(result, result.Size.Width-7, 7)
                  .Click().Perform() |> ignore
            sleep 0.05
            match someElement "#mp1_0_2_Anchor" with
            | Some(el) -> click el
            | _ -> failwith "Menu item was not found."
        )
    with
    | ex -> printfn "%s" ex.Message

let recrawlDocuments logViewerUrl pageUrls =
    url logViewerUrl // Open LogViewer page
    click "#ctl00_PlaceHolderMain_RadioButton1" // Select "Url or Host name"
    pageUrls |> Seq.iteri (fun i x ->
        printfn "Processing item #%d" i;
        sendToReCrawl x)

That is all. I think all the other parts should be easy to understand. CSS selectors are used here to specify the elements to interact with.

Another interesting part is grabbing URLs from the search results page. It can be useful and it is easy to automate, so let’s do it.

let grabSearchResults pageUrl =
    url pageUrl
    let rec collectUrls() =
        let urls =
            elements ".srch-Title3 a"
            |> List.map (fun el -> el.GetAttribute("href"))
        printfn "Loaded '%d' urls" (urls.Length)
        match someElement "#SRP_NextImg" with
        | None -> urls
        | Some(el) ->
            click el
            urls @ (collectUrls())
    collectUrls()
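
The recursion above appends lists with `@` after the recursive call returns, so it is not a tail call. For the few result pages discussed here that is fine. As a minimal sketch of the same paging idea with an accumulator, here is a browser-free version. The `Page` record is a hypothetical stand-in for a results page and is not part of Canopy:

```fsharp
// Hypothetical stand-in for a results page: the links it shows and the next page, if any.
type Page = { Links : string list; Next : Page option }

// Walk the pages with an accumulator so that the recursive call is a tail call.
let collectAll (first : Page) =
    let rec loop acc page =
        // Prepend this page's links in reverse; one final List.rev restores the order.
        let acc' = List.rev page.Links @ acc
        match page.Next with
        | None -> List.rev acc'
        | Some next -> loop acc' next
    loop [] first

// Two fake pages chained together.
let page2 = { Links = ["c"; "d"]; Next = None }
let page1 = { Links = ["a"; "b"]; Next = Some page2 }
printfn "%A" (collectAll page1)
```

In the real script, `page.Links` would correspond to the `elements ".srch-Title3 a"` query and `page.Next` to the `someElement "#SRP_NextImg"` check.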

Finally, we are ready to execute all this stuff. We need to specify two URLs: the first one points to the page with search results from which we get all URLs, and the second one points to the logviewer page of your Search Service Application in Central Administration (do not forget to replace them in the sample below). Almost all SharePoint web applications require authentication; you can pass your login and password directly in the URL, as is done in the sample below.

grabSearchResults "http://LOGIN:PASSWORD@SERVER_NAME/Pages/results.aspx?dupid=1025426827030739029&start1=1"
|> recrawlDocuments "http://LOGIN:PASSWORD@SERVER_NAME:CA_PORT/_admin/search/logviewer.aspx?appid={5095676a-12ec-4c68-a3aa-5b82677ca9e0}"

New Twitter API or “F# Weekly” v1.1

16/06/2013 | 17/07/2013 | F#, F# Weekly | 2 Comments

Good news for Twitter and not so good for developers:

Today (2013-06-11), we (Twitter) are retiring API v1 and fully transitioning to API v1.1.

What does it all mean? It means that all the old services are no longer available. Twitter has switched to new ones with mandatory OAuth authentication. From now on, to work with Twitter services we must register new apps and use OAuth.

Also, it means that:

The script from ““F# Weekly” under the hood” no longer works. Twitterizer is a dead project and will not be updated to Twitter API v1.1.
The beautiful script from “A Twitter search client in 10 lines of code with F# and the JSON type provider” does not work, because it uses the v1 service.

As far as I know, there are two alternatives to Twitterizer:

Tweetsharp (TweetSharp is a fast, clean wrapper around the Twitter API.)
LINQ to Twitter (An open source 3rd party LINQ Provider for the Twitter micro-blogging service.)

I have chosen TweetSharp because its API is similar to Twitterizer's. Here is the new “F# Weekly” under-the-hood script:

#r "Newtonsoft.Json.dll"
#r "Hammock.ClientProfile.dll"
#r "TweetSharp.dll"

open TweetSharp
open System
open System.Net
open System.Text.RegularExpressions

let service = new TwitterService(_consumerKey, _consumerSecret)
service.AuthenticateWith(_accessToken, _accessTokenSecret)

let getTweets query =
    let rec collect maxId =
        let options = SearchOptions(Q = query, Count =Nullable(100), MaxId = Nullable(maxId),
                                    Resulttype = Nullable(TwitterSearchResultType.Recent))
        printfn "Loading %s under id %d" query maxId
        let results = service.Search(options).Statuses |> Seq.toList
        printfn "\t Loaded %d tweets" results.Length
        if (results.Length = 0)
            then List.empty
            else
                let lastTweet = results |> List.rev |> List.head
                if (lastTweet.Id < maxId)
                    then results |> List.append (collect (lastTweet.Id))
                    else results
    collect (Int64.MaxValue) |> List.rev

let urlRegexp = Regex("http://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?", RegexOptions.IgnoreCase);

let filterUniqLinks (tweets: TwitterStatus list) =
    let hash = new System.Collections.Generic.HashSet<string>()
    tweets |> List.fold
        (fun acc t ->
             let mathces = urlRegexp.Matches(t.Text)
             if (mathces.Count = 0) then acc
             else let urls =
                     [0 .. (mathces.Count-1)]
                     |> List.map (fun i -> mathces.[i].Value)
                     |> List.filter (fun url -> not(hash.Contains(url)))
                  if (List.isEmpty urls) then acc
                  else urls |> List.iter(fun url -> hash.Add(url) |> ignore)
                       t :: acc)
        [] |> List.rev

let tweets =
    ["#fsharp";"#fsharpx";"@dsyme";"#websharper";"@c4fsharp"]
    |> List.map getTweets
    |> List.concat
    |> List.sortBy (fun t -> t.CreatedDate)
    |> filterUniqLinks

let printTweetsInHtml filename (tweets: TwitterStatus list) =
    let formatTweet (text:string) =
        let matches = urlRegexp.Matches(text)
        seq {0 .. (matches.Count-1)}
            |> Seq.fold (
                fun (t:string) i ->
                    let url = matches.[i].Value
                    t.Replace(url, (sprintf "<a href=\"%s\" target=\"_blank\">%s</a>" url url)))
                text
    let rows =
      tweets
        |> List.mapi (fun i t ->
            let id = (tweets.Length - i)
            let text = formatTweet(t.Text)
            sprintf "<table id=\"%d\">
<tbody>
<tr>
<td rowspan=\"2\" width=\"30\">%d</td>
<td rowspan=\"2\" width=\"80\"><a href=\"javascript:remove('%d')\">Remove</a></td>
<td rowspan=\"2\"><a href=\"https://twitter.com/%s\" target=\"_blank\"><img src=\"%s\"/></a></td>
<td><b>%s</b></td>
</tr>
<tr>
<td>Created : %s</td>
</tr>
</tbody>
</table>"
                id id id t.Author.ScreenName t.Author.ProfileImageUrl text (t.CreatedDate.ToString()))
        |> List.fold (fun s r -> s+" "+r) ""
    let html = sprintf "<script type=\"text/javascript\">function remove(id){return (elem=document.getElementById(id)).parentNode.removeChild(elem);}</script>%s" rows
    System.IO.File.WriteAllText(filename, html)

printTweetsInHtml "d:\\tweets.html" tweets
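
A side note on the paging in `getTweets`: as far as I know, Twitter's `max_id` parameter is inclusive, so the oldest tweet of one page is returned again at the top of the next page. The script still works because duplicates are filtered later by link. Here is a minimal sketch of the same paging idea that asks for ids strictly below the oldest one already seen. The `fakeSearch` function and the id list are hypothetical stand-ins and are not part of TweetSharp:

```fsharp
// Hypothetical in-memory "timeline": tweet ids, newest first.
let allIds = [ 50L; 40L; 30L; 20L; 10L ]

// Stand-in for a search call: up to `count` ids that are <= maxId (inclusive, like max_id).
let fakeSearch (count : int) (maxId : int64) =
    allIds |> List.filter (fun id -> id <= maxId) |> List.truncate count

// Page backwards, asking each time for ids strictly below the oldest one already seen.
let collectIds pageSize =
    let rec loop acc maxId =
        match fakeSearch pageSize maxId with
        | [] -> List.rev acc
        | page ->
            let oldest = List.last page
            loop (List.rev page @ acc) (oldest - 1L)
    loop [] System.Int64.MaxValue

printfn "%A" (collectIds 2)
```

With the real service, `fakeSearch` would be replaced by the `service.Search(options)` call and the ids by `TwitterStatus` values.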

My 100th anniversary blog post.

02/06/2013 | Uncategorized | 4 Comments

This is my 100th blog post. I want to say thank you to all of you who read my blog and follow me on Twitter. It is thanks to your interest that new posts appear in this blog.

15 Principles for Data Scientists

02/06/2013 | Uncategorized | Leave a Comment

Open Source Research

I have developed 15 principles for my daily work as a data scientist. These are the principles that I personally follow:

1- Do not lie with data and do not bullshit: Be honest and frank about empirical evidence. And most importantly do not lie to yourself with data

2- Build everlasting tools and share them with others: Spend a portion of your daily work building tools that make someone’s life easier. We are freaking humans, we are supposed to be tool builders!

3- Educate yourself continuously: you are a scientist for Buddha’s sake. Read hardcore math and stats from graduate level textbooks. Never settle down for shitty explanations of a method that you receive from a coworker in the hallway. Learn fundamentals and you can do magic. Read recent papers, go to conferences, publish, and review papers. There is no shortcut for this.

4- Sharpen your skills: learn one language well…


Three easy ways to create simple Web Server with F#

18/05/2013 | F# | 5 Comments

I have tried to find the easiest ways to create a simple web server with F#. Here are the three simplest ones.

The goal is to create a simple web service that maps request URLs to files in the site folder. If a file with the requested name exists, its content is returned as HTML. Assume that all HTML files are located in ‘D:\mySite\’.

HttpListener

The first and probably the most promising option was created by Julian Kay and described in his post “Creating a simple HTTP Server with F#”. I slightly modified the source code to satisfy my initial goal. You can find a detailed description of how it works in Julian’s post. (Works from FSI)

open System
open System.Net
open System.Text
open System.IO

let siteRoot = @"D:\mySite\"
let host = "http://localhost:8080/"

let listener (handler:(HttpListenerRequest->HttpListenerResponse->Async<unit>)) =
    let hl = new HttpListener()
    hl.Prefixes.Add host
    hl.Start()
    let task = Async.FromBeginEnd(hl.BeginGetContext, hl.EndGetContext)
    async {
        while true do
            let! context = task
            Async.Start(handler context.Request context.Response)
    } |> Async.Start

let output (req:HttpListenerRequest) =
    let file = Path.Combine(siteRoot,
                            Uri(host).MakeRelativeUri(req.Url).OriginalString)
    printfn "Requested : '%s'" file
    if (File.Exists file)
        then File.ReadAllText(file)
        else "File does not exist!"

listener (fun req resp ->
    async {
        let txt = Encoding.ASCII.GetBytes(output req)
        resp.ContentType <- "text/html"
        resp.OutputStream.Write(txt, 0, txt.Length)
        resp.OutputStream.Close()
    })
// TODO: add your code here
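
One thing to keep in mind when mapping request URLs to files: a request path containing `..` segments could point outside the site folder. Here is a minimal sketch of a check that could be placed in front of the file read. The `tryResolve` helper is hypothetical and is not part of Julian's code; the case-insensitive comparison assumes a Windows-style file system such as the ‘D:\mySite\’ folder used here:

```fsharp
open System
open System.IO

// Resolve a request path under siteRoot; return None when it would escape the root.
let tryResolve (siteRoot : string) (relativePath : string) =
    // Normalize the root so that it always ends with exactly one directory separator.
    let root =
        Path.GetFullPath(siteRoot).TrimEnd(Path.DirectorySeparatorChar)
        + string Path.DirectorySeparatorChar
    // GetFullPath collapses any ".." segments in the combined path.
    let full = Path.GetFullPath(Path.Combine(root, relativePath))
    // Assumption: case-insensitive comparison, as on a Windows file system.
    if full.StartsWith(root, StringComparison.OrdinalIgnoreCase) then Some full else None

// A throwaway folder name under the temp directory, used only for the demo.
let demoRoot = Path.Combine(Path.GetTempPath(), "mySite")
printfn "%A" (tryResolve demoRoot "index.html")
printfn "%A" (tryResolve demoRoot "../secret.txt")
```

The same idea would apply to the `Path.Combine(siteRoot, file)` calls in the other two servers in this post.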

Self-hosted WCF service

The second option is a tuned self-hosted WCF service. This approach was proposed by Brian McNamara as an answer to the StackOverflow question “F# web server library“. (Works from FSI)

#r "System.ServiceModel.dll"
#r "System.ServiceModel.Web.dll"

open System
open System.IO

open System.ServiceModel
open System.ServiceModel.Web

let siteRoot = @"D:\mySite\"

[<ServiceContract>]
type MyContract() =
    [<OperationContract>]
    [<WebGet(UriTemplate="{file}")>]
    member this.Get(file:string) : Stream =
        printfn "Requested : '%s'" file
        WebOperationContext.Current.OutgoingResponse.ContentType <- "text/html"
        let bytes = File.ReadAllBytes(Path.Combine(siteRoot, file))
        upcast new MemoryStream(bytes)

let startAt address =
    let host = new WebServiceHost(typeof<MyContract>, new Uri(address))
    host.AddServiceEndpoint(typeof<MyContract>, new WebHttpBinding(), "")
      |> ignore
    host.Open()
    host

let server = startAt "http://localhost:8080/"
// TODO: add your code here
server.Close()

NancyFx

The third one is based on NancyFx, a lightweight, low-ceremony framework for building HTTP-based services on .NET and Mono. Nancy is a popular framework in the C# world, but it does not have natural support for F#, so the F# code is not as easy and simple as it could be. To make it work, you need to create a console application and install the Nancy and Nancy.Hosting.Self NuGet packages.

module WebServers

open System
open System.IO
open Nancy
open Nancy.Hosting.Self
open Nancy.Conventions

let (?) (this : obj) (prop : string) : obj =
    (this :?> DynamicDictionary).[prop]

let siteRoot = @"d:\mySite\"

type WebServerModule() as this =
    inherit NancyModule()
    do this.Get.["{file}"] <-
         fun parameters ->
              new Nancy.Responses.HtmlResponse(
                  HttpStatusCode.OK,
                  (fun (s:Stream) ->
                      let file = (parameters?file).ToString()
                      printfn "Requested : '%s'" file
                      let bytes = File.ReadAllBytes(Path.Combine(siteRoot, file))
                      s.Write(bytes,0,bytes.Length)
              )) |> box

let startAt host =
    let nancyHost = new NancyHost(new Uri(host))
    nancyHost.Start()
    nancyHost

let server = startAt "http://localhost:8080/"
printfn "Press [Enter] to exit."
Console.ReadKey() |> ignore
server.Stop()

A Twitter search client in 10 lines of code with F# and the JSON type provider

18/05/2013 | 20/05/2013 | F# | 1 Comment

Json.Net vs ServiceStack.Text

12/05/2013 | 28/10/2015 | F# | 2 Comments

I cannot understand why JSON.NET is so popular (http://www.servicestack.net/mythz_blog/?p=344)

Or maybe I am wrong and the picture has changed?

@sergey_tihon Lastest results for json.net using the most recent versions off NuGet twitter.com/JamesNK/status…

— James Newton-King ♔ (@JamesNK) May 12, 2013

Need to test it!

Mike Falanga speaks about Discriminated Unions at Cleveland F# SIG

28/04/2013 | F# | Leave a Comment

do the needful, write about it, simple

I was able to record the Cleveland F# SIG, where Mike Falanga spoke about F# language’s Discriminated Unions feature (http://msdn.microsoft.com/en-us/library/dd233226.aspx).

Here are the videos I took of the event for all attendees that weren’t able to make it:
