Implementing LINQ Select for SPWebCollection

If I’m going to suggest that others use yield return, I suppose I should take my own advice. In particular, my original implementation of SPWebCollection.Select can be improved using the same techniques I used for Where:

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs,
                                                   Func<SPWeb, TResult> selector)
{
    foreach (SPWeb web in webs)
    {
        TResult ret = selector(web);
        web.Dispose();
        yield return ret;
    }
}

And we might as well support a filtered Select as well:

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs,
                                                   Func<SPWeb, TResult> selector,
                                                   Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs.Where(predicate))
    {
        TResult ret = selector(web);
        web.Dispose();
        yield return ret;
    }
}

Implementations using Cast<T>() are left as an exercise for the reader.

The function itself isn’t particularly interesting, but I did stumble on something I found rather surprising. When I first wrote up my function, I typed the following:

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs,
                                                   Func<SPWeb, TResult> selector)
{
    foreach (SPWeb web in webs)
    {
        yield return selector(web);
        web.Dispose();
    }
}

I’m really not sure why I typed it that way…obviously you can’t keep going after you return, right? Well it turns out you can. The generated class just waits to call Dispose() until the next call to MoveNext(), effectively picking up where it left off. Here’s what that looks like in Reflector:

switch (this.<>1__state)
{
    case 0:
        this.<>1__state = -1;
        this.<>7__wrap1a = this.webs.GetEnumerator();
        this.<>1__state = 1;
        while (this.<>7__wrap1a.MoveNext())
        {
            this.<web>5__19 = (SPWeb) this.<>7__wrap1a.Current;
            this.<>2__current = this.selector(this.<web>5__19);
            this.<>1__state = 2;
            return true;
        Label_0080:
            this.<>1__state = 1;
            this.<web>5__19.Dispose();
        }
        this.<>m__Finally1c();
        break;

    case 2:
        goto Label_0080;
}
return false;

As Select will almost always be used in a foreach, with back-to-back calls of MoveNext(), the distinction is mostly academic. Still, I prefer to know that the web will be disposed immediately after the selector is done with it.

Advertisement

Implementing LINQ Where for SharePoint

First, a huge thanks to Waldek Mastykarz for running with my suggestion to run some performance tests on list item filtering. In short, CAML wins by a factor of 300, which I expect would be even more pronounced on larger lists and under load.

In his test, Waldek implements Where() as follows:

public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    List<SPListItem> list = new List<SPListItem>();
    foreach (SPListItem item in items)
        if (predicate(item))
            list.Add(item);
    return list;
}

This works as expected, but allocates a secondary data structure to store the filtered items. The preferred approach is to use the yield return syntax:

public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    foreach (SPListItem item in items)
        if (predicate(item))
            yield return item;
}

The actual IL this generates is too complex to go into here, but I highly suggest checking it out in Reflector. In short, the compiler creates a private class that provides a filtered enumerator without actually building an intermediate data structure, instead filtering in MoveNext(). Using yield also defers execution until the collection is actually enumerated, though I can’t think of a SharePoint example where this would actually matter.

Another alternative, which also defers execution, is to leverage LINQ’s Cast<T>() operator and LINQ’s IEnumerable<T>.Where():

public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    return items.Cast<SPListItem>().Where(predicate);
}

I imagine the compiler would optimize the yield-generated class in much the same way it optimizes LINQ’s internal implementation, but I will leave that research for a later date. It would also be interesting to compare the performance between the different implementations, though in a SharePoint context I expect the difference would be insignificant compared with the more expensive operations needed to retrieve the data.

The Problem with IDisposable

In my previous post, I suggested a Dispose-safe implementation of SPWebCollection.ForEach(), which Waldek leveraged for his Where implementation. Presumably because he was concerned about leaking SPWebs, his Where() implementation just returns a list of the web IDs. While avoiding leaks is smart, an ID isn’t nearly as useful as the full SPWeb and opening a new SPWeb from the ID is an expensive operation. What if I wanted a filtered enumeration of the SPWeb objects?

Well if we use one of the patterns described above, we should be safe if we call Dispose() for each when we’re done, right? I probably wouldn’t bother asking if there weren’t a catch, so I’ll answer my question with another question: When would Dispose() be called on the webs for which the predicate is false? It wouldn’t! To prevent these leaks, we need to be a bit more sophisticated:

public static IEnumerable<SPWeb> Where(this SPWebCollection webs,
                                       Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs)
        if (predicate(web))
            yield return web;
        else
            web.Dispose();
}

Or using Cast<T>():

public static IEnumerable<SPWeb> Where(this SPWebCollection webs,
                                        Func<SPWeb, bool> predicate)
{
    return webs.Cast<SPWeb>().Where(w =>
    {
        bool r = predicate(w);
        if (!r)
            w.Dispose();
        return r;
    });
}

Again, a detailed IL investigation would likely prove one preferable to the other, but the principle is the same.

Finally, since caller-dependent disposal is unreliable and delegates are fun, I figure we could use a Dispose-safe filtered ForEach:

public static void ForEach(this SPWebCollection webs,
                           Action<SPWeb> action,
                           Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs.Where(predicate))
    {
        action(web);
        web.Dispose();
    }
}

Which would let us do something like this to print all publishing sites in a collection:

site.AllWebs.ForEach(
    w => { Console.WriteLine(w.Title); },
    w => { return w.Features.Contains(FeatureIds.OfficePublishingWeb); }
);

That is, if we define yet another useful, if simple, extension method:

public static bool Contains(this SPFeatureCollection features, Guid featureID)
{
    return features[featureID] != null;
}

And a final note: why did the LINQ team use Func<T, bool> instead of Predicate<T>, which has existed since .NET 2.0?