Implementing LINQ Where for SharePoint

First, a huge thanks to Waldek Mastykarz for running with my suggestion to run some performance tests on list item filtering. In short, CAML wins by a factor of 300, which I expect would be even more pronounced on larger lists and under load.

In his test, Waldek implements Where() as follows:

public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    List<SPListItem> list = new List<SPListItem>();
    foreach (SPListItem item in items)
        if (predicate(item))
            list.Add(item);
    return list;
}

This works as expected, but allocates a secondary data structure to store the filtered items. The preferred approach is to use the yield return syntax:

public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    foreach (SPListItem item in items)
        if (predicate(item))
            yield return item;
}

The actual IL this generates is too complex to go into here, but I highly suggest checking it out in Reflector. In short, the compiler creates a private class that provides a filtered enumerator without actually building an intermediate data structure, instead filtering in MoveNext(). Using yield also defers execution until the collection is actually enumerated, though I can’t think of a SharePoint example where this would actually matter.

Another alternative, which also defers execution, is to leverage LINQ’s Cast<T>() operator and LINQ’s IEnumerable<T>.Where():

public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    return items.Cast<SPListItem>().Where(predicate);
}

I imagine the compiler would optimize the yield-generated class in much the same way it optimizes LINQ’s internal implementation, but I will leave that research for a later date. It would also be interesting to compare the performance between the different implementations, though in a SharePoint context I expect the difference would be insignificant compared with the more expensive operations needed to retrieve the data.

The Problem with IDisposable

In my previous post, I suggested a Dispose-safe implementation of SPWebCollection.ForEach(), which Waldek leveraged for his Where implementation. Presumably because he was concerned about leaking SPWebs, his Where() implementation just returns a list of the web IDs. While avoiding leaks is smart, an ID isn’t nearly as useful as the full SPWeb and opening a new SPWeb from the ID is an expensive operation. What if I wanted a filtered enumeration of the SPWeb objects?

Well if we use one of the patterns described above, we should be safe if we call Dispose() for each when we’re done, right? I probably wouldn’t bother asking if there weren’t a catch, so I’ll answer my question with another question: When would Dispose() be called on the webs for which the predicate is false? It wouldn’t! To prevent these leaks, we need to be a bit more sophisticated:

public static IEnumerable<SPWeb> Where(this SPWebCollection webs,
                                       Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs)
        if (predicate(web))
            yield return web;
        else
            web.Dispose();
}

Or using Cast<T>():

public static IEnumerable<SPWeb> Where(this SPWebCollection webs,
                                        Func<SPWeb, bool> predicate)
{
    return webs.Cast<SPWeb>().Where(w =>
    {
        bool r = predicate(w);
        if (!r)
            w.Dispose();
        return r;
    });
}

Again, a detailed IL investigation would likely prove one preferable to the other, but the principle is the same.

Finally, since caller-dependent disposal is unreliable and delegates are fun, I figure we could use a Dispose-safe filtered ForEach:

public static void ForEach(this SPWebCollection webs,
                           Action<SPWeb> action,
                           Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs.Where(predicate))
    {
        action(web);
        web.Dispose();
    }
}

Which would let us do something like this to print all publishing sites in a collection:

site.AllWebs.ForEach(
    w => { Console.WriteLine(w.Title); },
    w => { return w.Features.Contains(FeatureIds.OfficePublishingWeb); }
);

That is, if we define yet another useful, if simple, extension method:

public static bool Contains(this SPFeatureCollection features, Guid featureID)
{
    return features[featureID] != null;
}

And a final note: why did the LINQ team use Func<T, bool> instead of Predicate<T>, which has existed since .NET 2.0?

Advertisements