SharePoint 2013: Content Enrichment for Large Files

There are a couple of guides on how to write Content Enrichment services for SharePoint 2013. One of them is the official MSDN article “How to: Use the Content Enrichment web service callout for SharePoint Server”.

This article suggests two configuration steps to adjust the maximum size of documents processed by CPES (Content Processing Enrichment Service).

  1. Modify web.config to accept messages up to 8 MB, and configure readerQuotas to be a sufficiently large value.
    <bindings>
     <basicHttpBinding>
       <!-- The service will accept a maximum blob of 8 MB. -->
       <binding maxReceivedMessageSize = "8388608">
       <readerQuotas maxDepth="32"
         maxStringContentLength="2147483647"
         maxArrayLength="2147483647" 
         maxBytesPerRead="2147483647" 
         maxNameTableCharCount="2147483647" /> 
       <security mode="None" />
       </binding>
     </basicHttpBinding>
    </bindings>
    
  2. Modify SPEnterpriseSearchContentEnrichmentConfiguration.
    $ssa = Get-SPEnterpriseSearchServiceApplication
    $config = New-SPEnterpriseSearchContentEnrichmentConfiguration
    # The endpoint URL must be a quoted string; Site_URL is a placeholder
    $config.Endpoint = "http://Site_URL/ContentEnrichmentService.svc"
    $config.InputProperties = "Author", "Filename"
    $config.OutputProperties = "Author"
    $config.SendRawData = $True
    # MaxRawDataSize is in KB: 8192 KB = 8 MB
    $config.MaxRawDataSize = 8192
    Set-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa -ContentEnrichmentConfiguration $config
    

The approach works, but what if you need to process files larger than 8 MB? Let’s try to increase the limit to, for example, 300 MB (ideally, I think, this limit should be no less than the maximum file size allowed for your web apps).
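The two settings use different units: maxReceivedMessageSize in web.config is in bytes, while MaxRawDataSize appears to be in kilobytes (8192 for 8 MB in the MSDN sample). A minimal Python sketch that derives both values from one target size, assuming 1 MB = 1024 * 1024 bytes as in the 8388608 value above; the function name cpes_limits is hypothetical:

```python
# Sketch: derive the two CPES size settings from a single target file size.
# Assumptions: 1 MB = 1024 * 1024 bytes (as in maxReceivedMessageSize="8388608"),
# and MaxRawDataSize is expressed in kilobytes (as in the MSDN value 8192).

BYTES_PER_KB = 1024
BYTES_PER_MB = 1024 * 1024


def cpes_limits(max_file_mb):
    """Return the web.config and PowerShell values for a maximum file size in MB."""
    max_bytes = max_file_mb * BYTES_PER_MB
    return {
        # <binding maxReceivedMessageSize="..."> in web.config, in bytes
        "maxReceivedMessageSize": max_bytes,
        # $config.MaxRawDataSize in the search configuration, in kilobytes
        "MaxRawDataSize": max_bytes // BYTES_PER_KB,
    }


if __name__ == "__main__":
    print(cpes_limits(8))    # {'maxReceivedMessageSize': 8388608, 'MaxRawDataSize': 8192}
    print(cpes_limits(300))  # {'maxReceivedMessageSize': 314572800, 'MaxRawDataSize': 307200}
```

cpes_limits(8) reproduces the MSDN values, and cpes_limits(300) gives 314572800 bytes and 307200 KB for the 300 MB case.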

Let’s change both values and run a full crawl of the SharePoint site. After that, if you are lucky, you will see something like this in your “Error Breakdown”:

[Screenshot: crawl errors in the Error Breakdown]

WAT? Something went wrong, but what was it? Let’s investigate the ULS logs on the machine running the Search Service. After a couple of unforgettable minutes of reading ULS logs, I found the following error message:

[Microsoft.CrawlerFlow-cb9134ec-91c6-4bac-89f9-a0cc9fe1e481] Microsoft.Ceres.Evaluation.Engine.ErrorHandling.HandleExceptionHelper : Evaluation failure detected: Operator : ContentEnrichment Operator type : ContentEnrichmentClient Error id : 3206 Correlation id : 60ef1afd-038a-4f64-8230-2b2493923f80 Partition id : 0c37852b-34d0-418e-91c6-2ac25af4be5b Message : Failed to send the item to the content processing enrichment service. 49691C90-7E17-101A-A91C-08002B2ECDA9:#9: https://mysite.com/MyDoc.pptx id : ssic://780174 System.ServiceModel.EndpointNotFoundException: There was no endpoint listening at http://MyServer:8081/ContentProcessingEnrichmentService.svc that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details. ---> System.Net.WebException: The remote server returned an error: (404) Not Found.
at System.Net.HttpWebRequest.GetResponse()
at System.ServiceModel.Channels.HttpChannelFactory`1.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout) –

WAT? “The remote server returned an error: (404) Not Found”. Does the service sometimes not exist? How can that be? Let’s look at the IIS logs (on the machine where your CPES is installed). The path to the IIS logs should look similar to c:\inetpub\logs\LogFiles\W3SVC3\.

[Screenshot: IIS log entries for CPES]

It is true: CPES sometimes returns a 404.13 status. Let’s google what this status code means.
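Scrolling through IIS logs by hand gets tedious. A small Python sketch that filters a W3C-format IIS log for 404 responses with substatus 13. It assumes the log has a "#Fields:" directive that includes the sc-status and sc-substatus columns (part of the default IIS field set, as far as I know) and that W3SVC3 is the ID of your CPES site; find_404_13 is a hypothetical helper name:

```python
# Sketch: find requests that IIS answered with 404.13 in a W3C-format log.
# Assumption: a "#Fields:" directive names the columns, including
# sc-status and sc-substatus (IIS encodes spaces inside values as '+').


def find_404_13(lines):
    """Yield one dict per log entry whose status is 404 with substatus 13."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The directive lists the column names separated by spaces.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#") or not fields:
            # Skip other directives (#Software, #Date, ...) and data before #Fields.
            continue
        entry = dict(zip(fields, line.split()))
        if entry.get("sc-status") == "404" and entry.get("sc-substatus") == "13":
            yield entry


if __name__ == "__main__":
    import glob

    # Hypothetical location; adjust W3SVC3 to the site ID of your CPES site.
    for path in glob.glob(r"c:\inetpub\logs\LogFiles\W3SVC3\*.log"):
        with open(path, encoding="utf-8", errors="replace") as log:
            for hit in find_404_13(log):
                print(hit.get("date"), hit.get("time"), hit.get("cs-uri-stem"))
```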

404.13 – Content length too large. The request contains a Content-Length header. The value of the Content-Length header is larger than the limit that is allowed for the server.

It seems that IIS is not yet ready to receive our 300 MB files. There is one more web.config parameter that must be tweaked to handle really large files: maxAllowedContentLength (its default value is 30000000 bytes, about 28.6 MB). Let’s change it in web.config:

 <system.webServer>
   <security>
     <requestFiltering>
       <requestLimits maxAllowedContentLength="314572800" />
     </requestFiltering>
   </security>
 </system.webServer>
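With three size limits spread over two places, it is easy to miss one. A hedged Python sketch that parses a web.config and reports which limit is still below a target file size. It assumes the layout of the snippets above and, when an attribute is missing, uses the defaults of 30000000 bytes for maxAllowedContentLength (as stated above) and 65536 bytes for WCF’s maxReceivedMessageSize. MaxRawDataSize lives in the search configuration rather than web.config, so it is not checked here; too_small_limits is a hypothetical name:

```python
# Sketch: report web.config size limits that are smaller than a target file size.
# Both limits are in bytes. The defaults apply when the setting is absent.

import xml.etree.ElementTree as ET

IIS_DEFAULT_MAX_ALLOWED = 30000000   # IIS requestLimits default
WCF_DEFAULT_MAX_RECEIVED = 65536     # WCF maxReceivedMessageSize default


def too_small_limits(web_config_xml, max_file_bytes):
    """Return (setting, value) pairs in web.config that are below max_file_bytes."""
    root = ET.fromstring(web_config_xml)
    problems = []

    # IIS request filtering: larger requests are rejected with 404.13.
    limits = root.find(".//requestFiltering/requestLimits")
    iis_value = IIS_DEFAULT_MAX_ALLOWED
    if limits is not None and limits.get("maxAllowedContentLength") is not None:
        iis_value = int(limits.get("maxAllowedContentLength"))
    if iis_value < max_file_bytes:
        problems.append(("maxAllowedContentLength", iis_value))

    # WCF basicHttpBinding: each binding carries its own message size limit.
    for binding in root.findall(".//basicHttpBinding/binding"):
        wcf_value = int(binding.get("maxReceivedMessageSize", WCF_DEFAULT_MAX_RECEIVED))
        if wcf_value < max_file_bytes:
            problems.append(("maxReceivedMessageSize", wcf_value))

    return problems


SAMPLE = """<configuration>
  <system.serviceModel>
    <bindings>
      <basicHttpBinding>
        <binding maxReceivedMessageSize="8388608" />
      </basicHttpBinding>
    </bindings>
  </system.serviceModel>
  <system.webServer>
    <security>
      <requestFiltering>
        <requestLimits maxAllowedContentLength="314572800" />
      </requestFiltering>
    </security>
  </system.webServer>
</configuration>"""

if __name__ == "__main__":
    # IIS already allows 300 MB, but the WCF binding still stops at 8 MB.
    print(too_small_limits(SAMPLE, 300 * 1024 * 1024))
    # [('maxReceivedMessageSize', 8388608)]
```

Passing this check is necessary but may not be sufficient: the SOAP envelope around the raw data makes each message somewhat larger than the file itself.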

Recrawl your content once again and voilà, the strange errors are gone! Enjoy your content enrichment!

F# Weekly #40, 2015

Welcome to F# Weekly,

A roundup of F# content from this past week:

News

Videos/Presentations

Blogs

F# vNext News

New releases

That’s all for now. Have a great week.

Previous F# Weekly edition – #39

F# Weekly #39, 2015

Welcome to F# Weekly,

A roundup of F# content from this past week:

News

Videos/Presentations

Blogs

F# vNext News

New releases

That’s all for now. Have a great week.

Previous F# Weekly edition – #38

F# Weekly #38, 2015

Welcome to F# Weekly,

A roundup of F# content from this past week:

News

Videos/Presentations

Blogs

F# vNext News

New releases

That’s all for now. Have a great week.

Previous F# Weekly edition – #37

F# Weekly #37, 2015

Welcome to F# Weekly,

A roundup of F# content from this past week:

News

Videos/Presentations

Blogs

F# vNext News

New releases

That’s all for now. Have a great week.

Previous F# Weekly edition – #34, #35, #36

F# Weekly #34,#35,#36, 2015

Welcome to F# Weekly,

A roundup of F# content from this past week:

News

Videos/Presentations

Blogs

F# vNext News

New releases

That’s all for now. Have a great week.

Previous F# Weekly edition – #33

Swagger for F# Web Apps

Swagger is a simple yet powerful representation of your RESTful API. With the largest ecosystem of API tooling on the planet, thousands of developers are supporting Swagger in almost every modern programming language and deployment environment. With a Swagger-enabled API, you get interactive documentation, client SDK generation and discoverability.

Swagger is a very powerful framework that can generate a schema and a rich UI for your RESTful API. As far as I know, Swagger is especially popular outside the .NET world. You have probably already seen Swagger UI (like this) on several sites.

It turns out that it is not hard to use Swagger in .NET F# apps. There is a project called Swashbuckle, which adds Swagger to Web API projects.

First of all, you need to create an F# ASP.NET Web API project. You can do it using “F# Web Application templates (MVC 5 and Web API 2.2) by Ryan Riley and Daniel Mohl”. Choose the “Web API 2.2 and Katana 3.0” option in the project creation wizard.

Now you have an F# web app with a RESTful API: CarsController with two services, ‘/api/cars’ and ‘/api/car/{id}’. Everything is awesome, but we do not have a UI that shows the list of available services, their parameters and return types. Swashbuckle will help us here: we need to install the ‘Swashbuckle.Core’ package into our web app.

Install-Package Swashbuckle.Core

The last step is to update the HttpConfiguration in Startup.fs to register Swagger. Add the following three lines to the end of the RegisterWebApi method.

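open System.Web.Http
open Swashbuckle.Application

type Startup() =
    static member RegisterWebApi(config: HttpConfiguration) =
        // ...
        // Swagger configuration: describe the API as version "v1" and serve Swagger UI.
        // SingleApiVersion returns a builder we do not need, hence |> ignore.
        config
          .EnableSwagger(fun c -> c.SingleApiVersion("v1", "My API") |> ignore)
          .EnableSwaggerUi()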

That’s all! When you start your web application and open the ‘/swagger/ui/index’ URI, you will see beautiful documentation for your RESTful API.
[Screenshot: Swagger UI]
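Swagger UI is driven by a Swagger 2.0 JSON document that Swashbuckle generates from your controllers. With the configuration above it should be served at ‘/swagger/docs/v1’; that route is an assumption based on Swashbuckle’s defaults and the "v1" passed to SingleApiVersion. Because the document is plain JSON, other tools can consume it too. A minimal Python sketch that lists the operations in such a document; list_operations is a hypothetical name, and the sample is a hypothetical, trimmed document, not real Swashbuckle output:

```python
# Sketch: list the operations described by a Swagger 2.0 document.
# Assumption: the standard Swagger 2.0 layout
# {"paths": {"/route": {"get": {...}, "post": {...}}}}.

import json

HTTP_METHODS = ("get", "put", "post", "delete", "options", "head", "patch")


def list_operations(swagger_doc):
    """Return sorted "METHOD /path" strings for every operation in the document."""
    operations = []
    for path, path_item in swagger_doc.get("paths", {}).items():
        for key in path_item:
            # Path items may also hold non-operation keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append(f"{key.upper()} {path}")
    return sorted(operations)


if __name__ == "__main__":
    # Hypothetical, trimmed document for the CarsController example.
    doc = json.loads("""
    {
      "swagger": "2.0",
      "info": {"version": "v1", "title": "My API"},
      "paths": {
        "/api/cars": {"get": {"responses": {"200": {"description": "OK"}}}},
        "/api/car/{id}": {"get": {"responses": {"200": {"description": "OK"}}}}
      }
    }
    """)
    for op in list_operations(doc):
        print(op)
```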

Real-time analytics with Apache Storm – now in F#

By Eugene Tolmachev (I think, therefore I spam.)

Over the past several months I’ve been prototyping various aspects of an IoT platform, or more specifically, exploring the concerns of “soft” real-time handling of communications with potentially hundreds of thousands of devices.

Up to this point, being in the .NET ecosystem, I’ve been building distributed solutions with a most excellent lightweight ESB, MassTransit, but for IoT we wanted to be a little closer to the wire. Starting with a clean slate, I discovered Apache Storm and Nathan’s presentation and realized that Storm addresses exactly the challenges we have.

It appears to be the ultimate reactive microservices platform for lambda architecture: it is fairly simple, fault tolerant overall, yet embracing fire-n-forget and “let it fail” on the component level.

While Storm favours the JDK for development, has extensive component support for Java developers and heavily optimizes the execution of JRE components, it also supports “shell” components via its multilang protocol. Which is what, unlike Spark…


F# Weekly #33, 2015

Welcome to F# Weekly,

Note that F# Weekly is going on summer holiday under the gentle Spanish sun. The next edition of F# Weekly will be published on the first of September.

A roundup of F# content from this past week:

News

Videos/Presentations

Blogs

F# vNext News

New releases

That’s all for now. Have a great week.

Previous F# Weekly edition – #32