LINQ for SPWebCollection Revisited: AsSafeEnumerable

I have previously discussed implementations of some core LINQ extension methods for SPWebCollection, a process complicated by the fact that SPWeb is IDisposable. The core logic is correct, but without try...finally any exception thrown while a web is in use will leak that SPWeb. Since this affects any enumeration of the collection, not just LINQ-specific operations, I started to look into making the enumerator smarter. This turned out to be much simpler than I expected:

public static IEnumerable<SPWeb> AsSafeEnumerable(this SPWebCollection webs)
{
  foreach (SPWeb web in webs)
  {
    try
    {
      yield return web;
    }
    finally
    {
      web.Dispose();
    }
  }
}

Internally, this logic is wrapped in a generated class (disassembled at the end of this post) that implements IEnumerable<SPWeb> and IEnumerator<SPWeb>. The finally block is compiled into its own method, which is called in two places: in MoveNext(), before moving to the next item, and in Dispose(). An often-overlooked tidbit about foreach is that the compiler will generate a try...finally block if the enumerator is (or might be) IDisposable, which ours is (inherited from IEnumerator<T>). So whether the loop advances normally or exits early (exceptions, breaks, etc.), the current SPWeb will be disposed properly.
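To check this behavior without a SharePoint farm, here is a minimal sketch that swaps SPWeb for a hypothetical FakeWeb stand-in (my own illustration, not a SharePoint type) and applies the same try...finally iterator to a plain sequence. The helper method names are also made up for the demo. Both helpers should return true, suggesting that the current item is disposed on a break and on an exception, while the item that was never reached is left alone:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SPWeb; it only records whether Dispose() ran.
public sealed class FakeWeb : IDisposable
{
    public FakeWeb(string title) { Title = title; }
    public string Title { get; private set; }
    public bool IsDisposed { get; private set; }
    public void Dispose() { IsDisposed = true; }
}

public static class SafeEnumerableDemo
{
    // Same shape as AsSafeEnumerable, but over a plain sequence of FakeWeb.
    public static IEnumerable<FakeWeb> AsSafeEnumerable(this IEnumerable<FakeWeb> webs)
    {
        foreach (FakeWeb web in webs)
        {
            try { yield return web; }
            finally { web.Dispose(); }
        }
    }

    // Leaves the loop on the first item via break.
    public static bool DisposedAfterBreak()
    {
        var webs = new List<FakeWeb> { new FakeWeb("A"), new FakeWeb("B") };
        foreach (FakeWeb web in webs.AsSafeEnumerable())
            break; // foreach disposes the enumerator, which runs the finally block
        return webs[0].IsDisposed && !webs[1].IsDisposed;
    }

    // Leaves the loop on the first item via an exception.
    public static bool DisposedAfterException()
    {
        var webs = new List<FakeWeb> { new FakeWeb("A"), new FakeWeb("B") };
        try
        {
            foreach (FakeWeb web in webs.AsSafeEnumerable())
                throw new InvalidOperationException("simulated failure");
        }
        catch (InvalidOperationException)
        {
            // Swallowed on purpose; we only care about the disposal state.
        }
        return webs[0].IsDisposed && !webs[1].IsDisposed;
    }
}
```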

The bonus in handling disposal through the enumerator is that our LINQ extension methods can be delegated to the framework:

public static IEnumerable<SPWeb> Where(this SPWebCollection webs,
                                       Func<SPWeb, bool> predicate)
{
  return webs.AsSafeEnumerable().Where(predicate);
}

public static void ForEach(this SPWebCollection webs,
                           Action<SPWeb> action,
                           Func<SPWeb, bool> predicate)
{
  foreach (SPWeb web in webs.Where(predicate))
    action(web);
}

public static void ForEach(this SPWebCollection webs, Action<SPWeb> action)
{
  webs.ForEach(action, w => true);
}

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs,
                                                   Func<SPWeb, TResult> selector,
                                                   Func<SPWeb, bool> predicate)
{
  return webs.Where(predicate).Select(selector);
}

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs,
                                                   Func<SPWeb, TResult> selector)
{
  return webs.AsSafeEnumerable().Select(selector);
}

This approach was always possible for collections of normal objects using Cast<T>(), but is now available for SPWebCollection and SPSiteCollection as well.
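For comparison, this is roughly what the Cast<T>() approach looks like for a collection of ordinary objects. The sketch uses an ArrayList of strings as an example of a non-generic collection; the CastDemo class and ShortNames method are hypothetical names chosen for illustration:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class CastDemo
{
    // ArrayList only implements the non-generic IEnumerable, so Cast<T>() is
    // needed before the standard LINQ operators become available.
    public static List<string> ShortNames(ArrayList names)
    {
        return names.Cast<string>()
                    .Where(n => n.Length <= 3)
                    .ToList();
    }
}
```

Calling CastDemo.ShortNames(new ArrayList { "Ann", "Robert", "Bo" }) returns a list containing "Ann" and "Bo". Nothing needs disposing here, which is exactly why this shortcut is safe for strings but not for SPWeb.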

Using with LINQ

I’ve talked a lot about LINQ-enabling SharePoint collections, but I’ve never posted actual LINQ code using it. With the above extensions defined, we can safely use LINQ against the original collection:

var webs = from w in site.AllWebs
           where string.IsNullOrEmpty(w.Theme)
           orderby w.Title
           select w.Title;

Note in this example that we can use orderby, even though we haven’t defined OrderBy() on SPWebCollection, because the call is chained from the IEnumerable<SPWeb> returned by Where(). However, this won’t compile:

var webs = from w in site.AllWebs
            orderby w.Title
            select w.Title;

Giving the following error:

Could not find an implementation of the query pattern for source type ‘Microsoft.SharePoint.SPWebCollection’.  ‘OrderBy’ not found.  Consider explicitly specifying the type of the range variable ‘w’.

We also don’t want to take the suggestion to specify a type for ‘w’:

var webs = from SPWeb w in site.AllWebs
            orderby w.Title
            select w.Title;

As this inserts a Cast, leaking every SPWeb:

IEnumerable<string> webs2 = site.AllWebs.Cast<SPWeb>().OrderBy<SPWeb, string>(delegate (SPWeb w) {
  return w.Title;
}).Select<SPWeb, string>(delegate (SPWeb w) {
  return w.Title;
});

For operators we haven’t defined, we can always call AsSafeEnumerable() to get an IEnumerable<SPWeb> that has everything:

var lists = from w in site.AllWebs.AsSafeEnumerable()
            from SPList l in w.Lists
            where !l.Hidden && !w.IsRootWeb
            select new { WebTitle = w.Title, ListTitle = l.Title };
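
One caveat seems worth checking before leaning on this too heavily: each item is disposed as soon as the enumerator moves past it, so anything that holds on to the items themselves (ToList(), for instance, and as far as I can tell buffering operators such as OrderBy() as well) ends up with objects that have already been disposed. A minimal sketch, using a hypothetical FakeWeb stand-in rather than a real SPWeb, with projecting to plain values before buffering as one possible way around it:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SPWeb; it only records whether Dispose() ran.
public sealed class FakeWeb : IDisposable
{
    public FakeWeb(string title) { Title = title; }
    public string Title { get; private set; }
    public bool IsDisposed { get; private set; }
    public void Dispose() { IsDisposed = true; }
}

public static class BufferingDemo
{
    // Same shape as AsSafeEnumerable, but over a plain sequence of FakeWeb.
    public static IEnumerable<FakeWeb> AsSafeEnumerable(this IEnumerable<FakeWeb> webs)
    {
        foreach (FakeWeb web in webs)
        {
            try { yield return web; }
            finally { web.Dispose(); }
        }
    }

    // Buffers the items themselves: by the time ToList() returns,
    // the enumerator has moved past (and disposed) every element.
    public static bool BufferedItemsAreDisposed()
    {
        var source = new List<FakeWeb> { new FakeWeb("B"), new FakeWeb("A") };
        List<FakeWeb> buffered = source.AsSafeEnumerable().ToList();
        return buffered.All(w => w.IsDisposed);
    }

    // Projects each item to a string while it is still the current item,
    // then sorts the strings instead of the disposable objects.
    public static List<string> SortedTitles()
    {
        var source = new List<FakeWeb> { new FakeWeb("B"), new FakeWeb("A") };
        return source.AsSafeEnumerable()
                     .Select(w => w.IsDisposed ? "(disposed)" : w.Title)
                     .OrderBy(t => t)
                     .ToList();
    }
}
```

BufferedItemsAreDisposed() returns true, while SortedTitles() returns "A" then "B" because each title is read before its item is disposed. If this carries over to real SPWeb objects, selecting the values you need before sorting or materializing looks like the safer ordering.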

AsSafeEnumerable Disassembled

[CompilerGenerated]
private sealed class <AsSafeEnumerable>d__0 : IEnumerable<SPWeb>, IEnumerable,
                                              IEnumerator<SPWeb>, IEnumerator, IDisposable
{
    // Fields
    private int <>1__state;
    private SPWeb <>2__current;
    public SPWebCollection <>3__webs;
    public IEnumerator <>7__wrap2;
    public IDisposable <>7__wrap3;
    private int <>l__initialThreadId;
    public SPWeb <web>5__1;
    public SPWebCollection webs;

    // Methods
    [DebuggerHidden]
    public <AsSafeEnumerable>d__0(int <>1__state)
    {
        this.<>1__state = <>1__state;
        this.<>l__initialThreadId = Thread.CurrentThread.ManagedThreadId;
    }

    private void <>m__Finally4()
    {
        this.<>1__state = -1;
        this.<>7__wrap3 = this.<>7__wrap2 as IDisposable;
        if (this.<>7__wrap3 != null)
        {
            this.<>7__wrap3.Dispose();
        }
    }

    private void <>m__Finally5()
    {
        this.<>1__state = 1;
        this.<web>5__1.Dispose();
    }

    private bool MoveNext()
    {
        bool flag;
        try
        {
            int num = this.<>1__state;
            if (num != 0)
            {
                if (num != 3)
                {
                    goto Label_009A;
                }
                goto Label_0073;
            }
            this.<>1__state = -1;
            this.<>7__wrap2 = this.webs.GetEnumerator();
            this.<>1__state = 1;
            while (this.<>7__wrap2.MoveNext())
            {
                this.<web>5__1 = (SPWeb) this.<>7__wrap2.Current;
                this.<>1__state = 2;
                this.<>2__current = this.<web>5__1;
                this.<>1__state = 3;
                return true;
            Label_0073:
                this.<>1__state = 2;
                this.<>m__Finally5();
            }
            this.<>m__Finally4();
        Label_009A:
            flag = false;
        }
        fault
        {
            this.System.IDisposable.Dispose();
        }
        return flag;
    }

    [DebuggerHidden]
    IEnumerator<SPWeb> IEnumerable<SPWeb>.GetEnumerator()
    {
        Enumerable.<AsSafeEnumerable>d__0 d__;
        if ((Thread.CurrentThread.ManagedThreadId == this.<>l__initialThreadId) && (this.<>1__state == -2))
        {
            this.<>1__state = 0;
            d__ = this;
        }
        else
        {
            d__ = new Enumerable.<AsSafeEnumerable>d__0(0);
        }
        d__.webs = this.<>3__webs;
        return d__;
    }

    [DebuggerHidden]
    IEnumerator IEnumerable.GetEnumerator()
    {
        return this.System.Collections.Generic.IEnumerable<Microsoft.SharePoint.SPWeb>.GetEnumerator();
    }

    [DebuggerHidden]
    void IEnumerator.Reset()
    {
        throw new NotSupportedException();
    }

    void IDisposable.Dispose()
    {
        switch (this.<>1__state)
        {
            case 1:
            case 2:
            case 3:
                try
                {
                    switch (this.<>1__state)
                    {
                    }
                    break;
                    try
                    {
                    }
                    finally
                    {
                        this.<>m__Finally5();
                    }
                }
                finally
                {
                    this.<>m__Finally4();
                }
                break;
        }
    }

    // Properties
    SPWeb IEnumerator<SPWeb>.Current
    {
        [DebuggerHidden]
        get
        {
            return this.<>2__current;
        }
    }

    object IEnumerator.Current
    {
        [DebuggerHidden]
        get
        {
            return this.<>2__current;
        }
    }
}

Implementing LINQ Select for SPWebCollection

If I’m going to suggest that others use yield return, I suppose I should take my own advice. In particular, my original implementation of SPWebCollection.Select can be improved using the same techniques I used for Where:

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs,
                                                   Func<SPWeb, TResult> selector)
{
    foreach (SPWeb web in webs)
    {
        TResult ret = selector(web);
        web.Dispose();
        yield return ret;
    }
}

And we might as well support a filtered Select as well:

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs,
                                                   Func<SPWeb, TResult> selector,
                                                   Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs.Where(predicate))
    {
        TResult ret = selector(web);
        web.Dispose();
        yield return ret;
    }
}

Implementations using Cast<T>() are left as an exercise for the reader.

The function itself isn’t particularly interesting, but I did stumble on something I found rather surprising. When I first wrote up my function, I typed the following:

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs,
                                                   Func<SPWeb, TResult> selector)
{
    foreach (SPWeb web in webs)
    {
        yield return selector(web);
        web.Dispose();
    }
}

I’m really not sure why I typed it that way…obviously you can’t keep going after you return, right? Well it turns out you can. The generated class just waits to call Dispose() until the next call to MoveNext(), effectively picking up where it left off. Here’s what that looks like in Reflector:

switch (this.<>1__state)
{
    case 0:
        this.<>1__state = -1;
        this.<>7__wrap1a = this.webs.GetEnumerator();
        this.<>1__state = 1;
        while (this.<>7__wrap1a.MoveNext())
        {
            this.<web>5__19 = (SPWeb) this.<>7__wrap1a.Current;
            this.<>2__current = this.selector(this.<web>5__19);
            this.<>1__state = 2;
            return true;
        Label_0080:
            this.<>1__state = 1;
            this.<web>5__19.Dispose();
        }
        this.<>m__Finally1c();
        break;

    case 2:
        goto Label_0080;
}
return false;

As Select will almost always be used in a foreach, with back-to-back calls of MoveNext(), the distinction is mostly academic. Still, I prefer to know that the web will be disposed immediately after the selector is done with it.
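
One case where the ordering does seem to matter is an early exit: if the caller breaks out after the first result, MoveNext() is never called again, so a Dispose() placed after yield return never runs. The sketch below compares the two orderings using a hypothetical FakeWeb stand-in for SPWeb; the method names are invented for the demo:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SPWeb; it only records whether Dispose() ran.
public sealed class FakeWeb : IDisposable
{
    public FakeWeb(string title) { Title = title; }
    public string Title { get; private set; }
    public bool IsDisposed { get; private set; }
    public void Dispose() { IsDisposed = true; }
}

public static class YieldOrderDemo
{
    // Dispose() after yield return: it runs only on the next MoveNext().
    public static IEnumerable<TResult> SelectDisposeAfter<TResult>(
        this IEnumerable<FakeWeb> webs, Func<FakeWeb, TResult> selector)
    {
        foreach (FakeWeb web in webs)
        {
            yield return selector(web);
            web.Dispose();
        }
    }

    // Dispose() before yield return: it runs as soon as the selector is done.
    public static IEnumerable<TResult> SelectDisposeBefore<TResult>(
        this IEnumerable<FakeWeb> webs, Func<FakeWeb, TResult> selector)
    {
        foreach (FakeWeb web in webs)
        {
            TResult ret = selector(web);
            web.Dispose();
            yield return ret;
        }
    }

    // Takes one result, breaks, and reports whether the first item was disposed.
    public static bool FirstDisposedAfterBreak(bool disposeBeforeYield)
    {
        var webs = new List<FakeWeb> { new FakeWeb("A"), new FakeWeb("B") };
        IEnumerable<string> titles = disposeBeforeYield
            ? webs.SelectDisposeBefore(w => w.Title)
            : webs.SelectDisposeAfter(w => w.Title);
        foreach (string title in titles)
            break; // abandon the enumeration after the first result
        return webs[0].IsDisposed;
    }
}
```

FirstDisposedAfterBreak(true) returns true and FirstDisposedAfterBreak(false) returns false, so the dispose-before-yield form is the one that survives an abandoned enumeration.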

Implementing LINQ Where for SharePoint

First, a huge thanks to Waldek Mastykarz for running with my suggestion to run some performance tests on list item filtering. In short, CAML wins by a factor of 300, which I expect would be even more pronounced on larger lists and under load.

In his test, Waldek implements Where() as follows:

public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    List<SPListItem> list = new List<SPListItem>();
    foreach (SPListItem item in items)
        if (predicate(item))
            list.Add(item);
    return list;
}

This works as expected, but allocates a secondary data structure to store the filtered items. The preferred approach is to use the yield return syntax:

public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    foreach (SPListItem item in items)
        if (predicate(item))
            yield return item;
}

The actual IL this generates is too complex to go into here, but I highly suggest checking it out in Reflector. In short, the compiler creates a private class that provides a filtered enumerator without actually building an intermediate data structure, instead filtering in MoveNext(). Using yield also defers execution until the collection is actually enumerated, though I can’t think of a SharePoint example where this would actually matter.
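
Deferred execution is easy to observe outside SharePoint. This sketch runs the list-based and yield-based shapes over a plain int array (the DeferredDemo class and its method names are hypothetical) and counts how many times the predicate has been called before anyone enumerates the result:

```csharp
using System;
using System.Collections.Generic;

public static class DeferredDemo
{
    // Eager: the predicate runs for every item before the method returns.
    public static IEnumerable<int> WhereEager(IEnumerable<int> items, Func<int, bool> predicate)
    {
        List<int> list = new List<int>();
        foreach (int item in items)
            if (predicate(item))
                list.Add(item);
        return list;
    }

    // Deferred: nothing runs until the caller starts enumerating.
    public static IEnumerable<int> WhereDeferred(IEnumerable<int> items, Func<int, bool> predicate)
    {
        foreach (int item in items)
            if (predicate(item))
                yield return item;
    }

    // Returns the number of predicate calls made before any enumeration.
    public static int CallsBeforeEnumeration(bool deferred)
    {
        int calls = 0;
        Func<int, bool> isEven = n => { calls++; return n % 2 == 0; };
        int[] items = { 1, 2, 3, 4 };
        IEnumerable<int> result = deferred
            ? WhereDeferred(items, isEven)
            : WhereEager(items, isEven);
        return calls; // result has deliberately not been enumerated
    }
}
```

CallsBeforeEnumeration(false) returns 4 and CallsBeforeEnumeration(true) returns 0: the yield-based version does no work at all until the first MoveNext().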

Another alternative, which also defers execution, is to leverage LINQ’s Cast<T>() operator and LINQ’s IEnumerable<T>.Where():

public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    return items.Cast<SPListItem>().Where(predicate);
}

I imagine the compiler would optimize the yield-generated class in much the same way it optimizes LINQ’s internal implementation, but I will leave that research for a later date. It would also be interesting to compare the performance between the different implementations, though in a SharePoint context I expect the difference would be insignificant compared with the more expensive operations needed to retrieve the data.

The Problem with IDisposable

In my previous post, I suggested a Dispose-safe implementation of SPWebCollection.ForEach(), which Waldek leveraged for his Where implementation. Presumably because he was concerned about leaking SPWebs, his Where() implementation just returns a list of the web IDs. While avoiding leaks is smart, an ID isn’t nearly as useful as the full SPWeb and opening a new SPWeb from the ID is an expensive operation. What if I wanted a filtered enumeration of the SPWeb objects?

Well if we use one of the patterns described above, we should be safe if we call Dispose() for each when we’re done, right? I probably wouldn’t bother asking if there weren’t a catch, so I’ll answer my question with another question: When would Dispose() be called on the webs for which the predicate is false? It wouldn’t! To prevent these leaks, we need to be a bit more sophisticated:

public static IEnumerable<SPWeb> Where(this SPWebCollection webs,
                                       Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs)
        if (predicate(web))
            yield return web;
        else
            web.Dispose();
}

Or using Cast<T>():

public static IEnumerable<SPWeb> Where(this SPWebCollection webs,
                                        Func<SPWeb, bool> predicate)
{
    return webs.Cast<SPWeb>().Where(w =>
    {
        bool r = predicate(w);
        if (!r)
            w.Dispose();
        return r;
    });
}

Again, a detailed IL investigation would likely prove one preferable to the other, but the principle is the same.
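
The principle can be checked without SharePoint. Below is a minimal sketch that applies the yield-based filter to a hypothetical FakeWeb stand-in for SPWeb; the method is named WhereOrDispose only to avoid clashing with LINQ's own Where(). It suggests that rejected items are disposed inside the filter, while the matches are handed to the caller still open:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SPWeb; it only records whether Dispose() ran.
public sealed class FakeWeb : IDisposable
{
    public FakeWeb(string title) { Title = title; }
    public string Title { get; private set; }
    public bool IsDisposed { get; private set; }
    public void Dispose() { IsDisposed = true; }
}

public static class FilterDemo
{
    // Same logic as the yield-based Where above: items that fail the
    // predicate are disposed here, because the caller never sees them.
    public static IEnumerable<FakeWeb> WhereOrDispose(
        this IEnumerable<FakeWeb> webs, Func<FakeWeb, bool> predicate)
    {
        foreach (FakeWeb web in webs)
        {
            if (predicate(web))
                yield return web;
            else
                web.Dispose();
        }
    }

    // Filters three items and checks which ones ended up disposed.
    public static bool Run()
    {
        var webs = new List<FakeWeb>
        {
            new FakeWeb("Home"), new FakeWeb("Blog"), new FakeWeb("Help")
        };
        var kept = new List<FakeWeb>(webs.WhereOrDispose(w => w.Title[0] == 'H'));

        bool ok = kept.Count == 2
               && !webs[0].IsDisposed   // "Home" matched, still open
               && webs[1].IsDisposed    // "Blog" rejected, disposed by the filter
               && !webs[2].IsDisposed;  // "Help" matched, still open

        // The matches remain the caller's responsibility.
        foreach (FakeWeb web in kept)
            web.Dispose();
        return ok;
    }
}
```

FilterDemo.Run() returns true. Note that the matching items still depend on the caller to dispose them, which is the weakness the ForEach below is meant to address.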

Finally, since caller-dependent disposal is unreliable and delegates are fun, I figure we could use a Dispose-safe filtered ForEach:

public static void ForEach(this SPWebCollection webs,
                           Action<SPWeb> action,
                           Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs.Where(predicate))
    {
        action(web);
        web.Dispose();
    }
}

Which would let us do something like this to print all publishing sites in a collection:

site.AllWebs.ForEach(
    w => { Console.WriteLine(w.Title); },
    w => { return w.Features.Contains(FeatureIds.OfficePublishingWeb); }
);

That is, if we define yet another useful, if simple, extension method:

public static bool Contains(this SPFeatureCollection features, Guid featureID)
{
    return features[featureID] != null;
}

And a final note: why did the LINQ team use Func<T, bool> instead of Predicate<T>, which has existed since .NET 2.0?

Safely Process SPSite.AllWebs

Probably the ugliest scenario for properly disposing SPWeb objects is cleaning up when you enumerate SPSite.AllWebs. To simplify that task, I present a pair of LINQ-inspired extension methods:

public static void ForEach(this SPWebCollection webs, Action<SPWeb> action)
{
    foreach (SPWeb web in webs)
    {
        action(web);
        web.Dispose();
    }
}

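// Original version: builds the full result list up front.
public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs, Func<SPWeb, TResult> action)
{
    List<TResult> res = new List<TResult>(webs.Count);
    webs.ForEach(w => res.Add(action(w)));
    return res;
}

// Updated version (see the update note below). It replaces the original above;
// the two cannot coexist because their signatures are identical.
public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs, Func<SPWeb, TResult> selector)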
{
    foreach (SPWeb web in webs)
    {
        TResult ret = selector(web);
        web.Dispose();
        yield return ret;
    }
}

Combined with lambda expressions, we can cleanly handle tasks that would ordinarily require more code and explicit disposal. Want a list of your web URLs?

var urls = site.AllWebs.Select(w => { return w.Url; });

How about updating a property on every web?

site.AllWebs.ForEach(w =>
{
    w.Properties["MyProp"] = DateTime.Now.ToString();
    w.Properties.Update();
});

We can even leverage anonymous types:

var props = site.AllWebs.Select(w =>
{
    return new
    {
        w.Title,
        w.Url,
        MyProp = w.Properties["MyProp"]
    };
});

Speaking of LINQ and SharePoint, check out Adam Buenz’s post on using LINQ’s IEnumerable.Cast<T> with SharePoint collections to get IQueryable support. And while using LINQ for filtering and such may be prettier, resist the urge to skip CAML altogether: there is definitely a performance advantage in filtering your SPListItemCollection with an SPQuery, especially for large lists. I can’t seem to find any hard data on this, so I nominate Waldek Mastykarz to investigate – his analyses of other performance topics were great.

Update 12/10/2008: New, improved Select! Discussed here.