How to enumerate a large document library

In this post, by the size of a library I mean the total size of the documents in it, not the item count.

This matters when you need to enumerate all documents in a library to process them, but the total size of the library is greater than the amount of RAM on the SharePoint machine.

If you do this using SPListItemCollection or ContentIterator and try to process all items as a single batch, you will get an out-of-memory exception. This happens because the SharePoint object model downloads all binaries into the worker process (before or during enumeration).

This problem can be solved with content paging: split the library content into small pages and process it page by page. Before processing a page, release all resources allocated for the previous one. There is also an approach that relies on the human-made folder structure of the content. We can assume that the documents in a single folder are not large in total and can be processed as a single batch. This processing order also has advantages over simple paging.
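For comparison, the simple paging approach could look roughly like the sketch below. It is only an illustration, not part of the original solution: the class name LibraryPaging, the method name ProcessInPages and the pageSize parameter are made up for this example. It relies on SPQuery.RowLimit and ListItemCollectionPosition to walk the library one page at a time, and assumes that each page becomes eligible for garbage collection once the next one is requested.

```csharp
using System;
using Microsoft.SharePoint;

public static class LibraryPaging // hypothetical helper class
{
    // pageSize is an assumed tuning parameter: pick it so that one page fits in memory.
    public static void ProcessInPages(SPList library, uint pageSize, Action<SPListItem> processAction)
    {
        var query = new SPQuery
        {
            RowLimit = pageSize,
            // Return files from all folders of the library, not only the root folder.
            ViewAttributes = "Scope=\"Recursive\""
        };

        do
        {
            // Fetch the next page; the previous page is no longer referenced.
            SPListItemCollection page = library.GetItems(query);

            foreach (SPListItem item in page)
                processAction(item);

            // A null position means there are no more pages.
            query.ListItemCollectionPosition = page.ListItemCollectionPosition;
        }
        while (query.ListItemCollectionPosition != null);
    }
}
```

With paging, the order of items depends on the query rather than on the folder layout, which is one reason the folder-by-folder order can be more convenient.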

Below is a C# example of this processing:

using System;
using Microsoft.Office.Server.Utilities;
using Microsoft.SharePoint;

public static void EnumerateFolder(SPFolder root, Action<SPListItem> processAction, Action<SPListItem, Exception> exceptionAction)
{
  // Process the subfolders first, one folder at a time.
  foreach (SPFolder folder in root.SubFolders)
    EnumerateFolder(folder, processAction, exceptionAction);

  // Then process the files that are directly in this folder (non-recursive).
  var contentIterator = new ContentIterator();
  contentIterator.ProcessFilesInFolder(root, false,
      (file) => { processAction(file.Item); },
      (file, exception) =>
      {
         exceptionAction(file.Item, exception);
         // false: do not stop processing after an exception
         return false;
      });
}

The EnumerateFolder method enumerates all files in the provided SPFolder and all of its subfolders, and executes processAction on each one. The last parameter of ProcessFilesInFolder is an error handler that is executed after each exception thrown during item processing. Returning false from it means that we do not stop document processing after an exception. More details about the ProcessFilesInFolder method can be found here.
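A call to EnumerateFolder might look like the following sketch. The site URL and the library title are placeholders invented for this example, and the class LibraryProcessing is hypothetical; the sketch assumes that the EnumerateFolder method shown earlier is declared in the same class.

```csharp
using System;
using Microsoft.SharePoint;

public static partial class LibraryProcessing // hypothetical class that also contains EnumerateFolder
{
    public static void Run()
    {
        // Placeholder URL: replace with the address of your own site.
        using (var site = new SPSite("http://sharepoint.example/sites/docs"))
        using (SPWeb web = site.OpenWeb())
        {
            // Placeholder library title.
            SPList library = web.Lists["Documents"];

            EnumerateFolder(
                library.RootFolder,
                // Called for every document in the library.
                item => Console.WriteLine("Processed: " + item.Url),
                // Called when processing of a document throws.
                (item, ex) => Console.WriteLine("Failed: " + item.Url + " (" + ex.Message + ")"));
        }
    }
}
```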

Below is the same example in F#:


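open Microsoft.SharePoint
open Microsoft.Office.Server.Utilities

let rec enumerate (root:SPFolder) processAction exceptionAction =
  // Process the subfolders first, one folder at a time.
  for folder in root.SubFolders do
    enumerate folder processAction exceptionAction
  // Then process the files that are directly in this folder (non-recursive).
  ContentIterator().ProcessFilesInFolder(root, false,
    (fun file -> processAction(file.Item)),
    // false: do not stop processing after an exception
    (fun file ex -> exceptionAction(file.Item, ex); false))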

P.S. To use ContentIterator, add a reference to Microsoft.Office.Server to the project.
