SharePoint 2013: Content Enrichment and performance of 3rd party services

People quite often write a Content Enrichment service for SharePoint 2013 to integrate it with some 3rd party tagging/enrichment API. Such integration very often leads to severe crawl time degradation, and that is exactly what happened in my case too.

The first thought was to measure the time that CPES (Content Processing Enrichment Service) spends on calls. I wrapped all my calls to the external system in timers and ran a Full Crawl again. The results were awful: on average, calls were 10 to 30 times slower from CPES than from a POC console app. The code was the same and, of course, perfect, so probably my 3rd party service was simply not able to handle the incredible load generated by my super-fast SharePoint Farm.
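For reference, here is a minimal sketch of the kind of timing wrapper I mean, in plain C#; the endpoint URL is a made-up placeholder, and in reality the calls went to the 3rd party enrichment API:

 using System;
 using System.Diagnostics;
 using System.IO;
 using System.Net;

 internal static class TimedCalls
 {
     // Hypothetical endpoint; substitute the real 3rd party API URL.
     private const string EnrichmentUrl = "http://enrichment.example.com/api/tag";

     public static string CallWithTiming()
     {
         var stopwatch = Stopwatch.StartNew();
         try
         {
             var request = (HttpWebRequest)WebRequest.Create(EnrichmentUrl);
             using (var response = (HttpWebResponse)request.GetResponse())
             using (var reader = new StreamReader(response.GetResponseStream()))
             {
                 return reader.ReadToEnd();
             }
         }
         finally
         {
             stopwatch.Stop();
             // One line per outbound call in the trace log on the CPES box.
             Trace.WriteLine(string.Format("3rd party call took {0} ms", stopwatch.ElapsedMilliseconds));
         }
     }
 }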

If you are in the same situation, with a fast SharePoint, an awesome CPES implementation and a bad 3rd party service, you should probably try the async integration approach described in “Advanced Content Enrichment in SharePoint 2013 Search” by Richard diZerega.

But what if the problem is in my source code… what if the execution environment affects the execution time of REST calls… Some time later, I found a nice post, “Quick Ways to Boost Performance and Scalability of ASP.NET, WCF and Desktop Clients” by Omar Al Zabir. This article describes the connectionManagement setting:

By default, system.net has two concurrent connections per IP allowed. This means on a webserver, if it’s calling WCF service on another server or making any outbound call to a particular server, it will only be able to make two concurrent calls. When you have a distributed application and your web servers need to make frequent service calls to another server, this becomes the greatest bottleneck

WHOA! That means it does not matter how fast our SharePoint Search service is: our CPES executes no more than two calls to the 3rd party service at the same time and queues the rest. Let’s fix this unpleasant injustice in web.config:

 <system.net>
   <connectionManagement>
     <add address="*" maxconnection="128"/>
   </connectionManagement>
 </system.net>
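If you prefer to be explicit in code instead of (or in addition to) web.config, the same limit can be raised programmatically; a minimal sketch, assuming it runs once at service startup:

 using System.Net;

 internal static class ConnectionTuning
 {
     // Call once at startup; equivalent to <add address="*" maxconnection="128"/> above.
     public static void RaiseOutboundConnectionLimit()
     {
         ServicePointManager.DefaultConnectionLimit = 128;
     }
 }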

Now the performance degradation should be fixed, and if your 3rd party service is good enough, your crawl time will be reasonable; otherwise you need to go back to the advice from Richard diZerega.

SharePoint 2013: Content Enrichment for Large Files

There are a couple of guides on how to write Content Enrichment services for SharePoint 2013. One of them is the official MSDN article “How to: Use the Content Enrichment web service callout for SharePoint Server”.

This article advises two configuration steps to adjust the maximum size of documents that will be processed by CPES (Content Processing Enrichment Service).

  1. Modify web.config to accept messages up to 8 MB, and configure readerQuotas to be a sufficiently large value.
    <bindings>
      <basicHttpBinding>
        <!-- The service will accept a maximum blob of 8 MB. -->
        <binding maxReceivedMessageSize="8388608">
          <readerQuotas maxDepth="32"
            maxStringContentLength="2147483647"
            maxArrayLength="2147483647"
            maxBytesPerRead="2147483647"
            maxNameTableCharCount="2147483647" />
          <security mode="None" />
        </binding>
      </basicHttpBinding>
    </bindings>
    
    
  2. Modify SPEnterpriseSearchContentEnrichmentConfiguration.
    $ssa = Get-SPEnterpriseSearchServiceApplication
    $config = New-SPEnterpriseSearchContentEnrichmentConfiguration
    $config.Endpoint = "http://Site_URL/ContentEnrichmentService.svc"
    $config.InputProperties = "Author", "Filename"
    $config.OutputProperties = "Author"
    $config.SendRawData = $True
    # MaxRawDataSize is specified in kilobytes, so 8192 matches the 8 MB binding above
    $config.MaxRawDataSize = 8192
    Set-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa -ContentEnrichmentConfiguration $config
    

The concept is generally good, but what if you need to process files larger than 8 MB? Let’s try to increase this limit to, for example, 300 MB (ideally, I think, this limit should be no less than the maximum file size allowed for your web applications). Since maxReceivedMessageSize is specified in bytes and MaxRawDataSize in kilobytes, 300 MB translates to maxReceivedMessageSize="314572800" and $config.MaxRawDataSize = 307200.

Let’s change both values and run a full crawl of a SharePoint site. After that, if you are lucky, you will see something like this in your “Error Breakdown”:

[screenshot: the Error Breakdown report showing content processing errors]

WAT? Something went wrong, but what was it? Let’s investigate the ULS logs on the machine that runs the Search Service. After a couple of unforgettable minutes of reading ULS logs, I found the following error message:

[Microsoft.CrawlerFlow-cb9134ec-91c6-4bac-89f9-a0cc9fe1e481] Microsoft.Ceres.Evaluation.Engine.ErrorHandling.HandleExceptionHelper : Evaluation failure detected: Operator : ContentEnrichment Operator type : ContentEnrichmentClient Error id : 3206 Correlation id : 60ef1afd-038a-4f64-8230-2b2493923f80 Partition id : 0c37852b-34d0-418e-91c6-2ac25af4be5b Message : Failed to send the item to the content processing enrichment service. 49691C90-7E17-101A-A91C-08002B2ECDA9:#9: https://mysite.com/MyDoc.pptx id : ssic://780174 System.ServiceModel.EndpointNotFoundException: There was no endpoint listening
at http://MyServer:8081/ContentProcessingEnrichmentService.svc that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details. ---> System.Net.WebException: The remote server returned an error: (404) Not Found.
at System.Net.HttpWebRequest.GetResponse()
at System.ServiceModel.Channels.HttpChannelFactory`1.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout)

WAT? “The remote server returned an error: (404) Not Found”. Does the service just not exist sometimes? How could that be? Let’s go to the IIS logs (on the machine where your CPES is installed). The path to the IIS logs should look similar to c:\inetpub\logs\LogFiles\W3SVC3\.

[screenshot: IIS log entries with 404.13 status for ContentProcessingEnrichmentService.svc]

It is true: CPES sometimes returns the 404.13 status. Let’s google what this status code means.

404.13 – Content length too large. The request contains a Content-Length header. The value of the Content-Length header is larger than the limit that is allowed for the server.

It seems that IIS is not yet ready to receive our 300 MB files. There is one more parameter in web.config that should be tweaked to handle really large files: maxAllowedContentLength (the default value is 30000000 bytes, i.e. roughly 30 MB). Let’s change it in web.config:

 <system.webServer>
   <security>
     <requestFiltering>
       <requestLimits maxAllowedContentLength="314572800" />
     </requestFiltering>
   </security>
 </system.webServer>

Recrawl your content once again and, voila, the strange errors are gone! Enjoy your content enrichment!

SharePoint 2013 Search Ranking and Relevancy Part 1: Let’s compare to FS14

Search Unleashed

I’m very happy to do some “guest” blogging for my good friend Leo and continue diving into various search-related topics.  In this and upcoming posts, I’d like to jump right into something that interests me very much, and that is taking a look at what makes some documents more relevant than others as well as what factors influence rank score calculations.

Since SharePoint 2013 is already out, I’d like to touch upon a question that comes up often when someone is considering moving from FAST ESP or FAST for SharePoint 2010 to SharePoint 2013: “So how are rank scores calculated in SharePoint 2013 Search as opposed to previous FAST versions?”

In upcoming posts, I will go more into the “internals” of the current SharePoint 2013 ranking model, as well as introduce the basics of relevancy calculation concepts that apply across many search engines and are not necessarily specific to FAST…

View original post 686 more words

Declarative authorization in REST services in SharePoint with F# and ServiceStack

This post is a short overview of my talk at the Belarus SharePoint User Group on 2013/06/27.

The primary goals were to find an efficient declarative way to specify authorization on a REST service (one aware of SharePoint’s built-in security model) and to try F# in SharePoint on a real-life problem. ServiceStack was selected over ASP.NET Web API because I wanted a solution that runs in SharePoint 2010 on .NET 3.5.

Microsoft Patents related to SharePoint 2013 Search

Extremely useful to understand what is under the hood of SharePoint 2013 Search.

Insights into search black magic

While investigating relevancy calculation in the new SharePoint 2013, I did some research, and it turned out that there are a lot of publicly available patents covering search in SharePoint; judging by the names of the inventors, most of them were done in the scope of Microsoft Research programs. Although there is no direct evidence that it was implemented exactly as described in the patents, I created an Excel spreadsheet which mimics the logic described in the patents with actual values from SharePoint – and the numbers match!

I hope the most curious of you will find it helpful to deep-dive into Enterprise Search relevancy and better understand what happens behind the curtain.

Enterprise relevancy ranking using a neural network

Internal ranking model representation schema

Techniques to perform relative ranking for search results

Ranking and providing search results based in part on a number of click-through features

Document length as…

View original post 28 more words

Explain rank in SharePoint 2013 Search


Insights into search black magic

The default ranking model in SharePoint 2013 is completely different from what we’ve seen in FS4SP and is definitely a step forward compared to SP2010. In a nutshell, it uses multiple features (taking into account query terms and their proximity, document click-through, authority, anchor text, URL depth, etc.), which are mixed with the help of a neural network as a final step. Details can be found in the patent http://www.google.com/patents/US8296292.
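To make “mixed with the help of a neural network” a bit more concrete, here is a toy sketch of how a final rank score could be produced from a few features by a tiny two-layer network; the feature names, weights and activation are made up for illustration and are not the actual SharePoint model:

 using System;

 internal static class ToyRankModel
 {
     // Hypothetical weights; the real model is trained, these are illustrative only.
     private static readonly double[,] HiddenWeights =
     {
         { 0.8, -0.3 },  // text relevance feature (e.g. BM25)
         { 0.5,  0.9 },  // click-through feature
         { 0.2,  0.4 },  // URL depth feature
     };
     private static readonly double[] OutputWeights = { 1.2, 0.7 };

     // Combine rank features through one hidden layer with a tanh activation.
     public static double Score(double textScore, double clickThrough, double urlDepth)
     {
         double[] features = { textScore, clickThrough, urlDepth };
         double score = 0.0;
         for (int h = 0; h < OutputWeights.Length; h++)
         {
             double sum = 0.0;
             for (int f = 0; f < features.Length; f++)
             {
                 sum += features[f] * HiddenWeights[f, h];
             }
             score += OutputWeights[h] * Math.Tanh(sum);
         }
         return score;
     }
 }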

Hopefully, there is a way to bring more light into this black magic. I’ve modified the default Display Template and added an “Explain Rank” link to each item. This link redirects the user to the ExplainRank page hosted under the {search_center_url}/_layouts/15/ folder.

[screenshot: search results for the “star wars” query with an Explain Rank link on each item]

I used (the pieces combine into the URL sketched below):

  • d=ctx.CurrentItem.Path
  • q=ctx.DataProvider.$2_3.k (found this hidden property by trial & error)
  • another option is to extract the value for q= from the QueryText attribute of ctx.ListData.Properties.SerializedQuery, whose value was <Query Culture="en-US" EnableStemming="True" EnablePhonetic="False" EnableNicknames="False" IgnoreAllNoiseQuery="True" SummaryLength="180" MaxSnippetLength="180" DesiredSnippetLength="90" KeywordInclusion="0" QueryText="star wars" QueryTemplate="{searchboxquery}" TrimDuplicates="True"…
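Putting those pieces together, the link target has roughly the following shape. A sketch in C# for readability (the display template itself assembles the same string in JavaScript); the page name ExplainRank.aspx is my assumption based on the folder above, and all argument values are placeholders:

 using System;

 internal static class ExplainRankLink
 {
     // Builds the ExplainRank page URL for a given document and query;
     // d= and q= are the parameters listed above.
     public static string Build(string searchCenterUrl, string docPath, string queryText)
     {
         return string.Format(
             "{0}/_layouts/15/ExplainRank.aspx?q={1}&d={2}",
             searchCenterUrl.TrimEnd('/'),
             Uri.EscapeDataString(queryText),
             Uri.EscapeDataString(docPath));
     }
 }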

View original post 150 more words

SharePoint 2013 Development Environment

Talbott Crowell's Software Development Blog

In this post, I’m going to describe my setup and some of the software and system requirements I needed for getting a SharePoint 2013 development environment up and running.  I’ll try to update this post as new releases of the Microsoft Office Developer Tools for Visual Studio 2012 are released or I make major changes or discoveries as time progresses.


Development Environment Setup

I’ve been using CloudShare for some of my SharePoint 2013 development ever since I purchased my new MacBook Air 11 inch which is maxed out at 8 GB and running Parallels so I can run Windows 7.  But recently I’ve decided to pull out my old Dell Precision M6500 laptop which has 16 GB to get a SharePoint 2013 development environment set up locally.  This beast is a great laptop for SharePoint development, but it is very heavy (the power supply is heavier than my MacBook Air). …

View original post 1,930 more words

Seven SharePoint Videos from //build/ 2012 Conference