Implementing LINQ Select for SPWebCollection

If I’m going to suggest that others use yield return, I suppose I should take my own advice. In particular, my original implementation of SPWebCollection.Select can be improved using the same techniques I used for Where:

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs,
                                                   Func<SPWeb, TResult> selector)
{
    foreach (SPWeb web in webs)
    {
        TResult ret = selector(web);
        web.Dispose();
        yield return ret;
    }
}

And we might as well support a filtered Select as well:

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs,
                                                   Func<SPWeb, TResult> selector,
                                                   Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs.Where(predicate))
    {
        TResult ret = selector(web);
        web.Dispose();
        yield return ret;
    }
}

Implementations using Cast<T>() are left as an exercise for the reader.

The function itself isn’t particularly interesting, but I did stumble on something I found rather surprising. When I first wrote up my function, I typed the following:

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs,
                                                   Func<SPWeb, TResult> selector)
{
    foreach (SPWeb web in webs)
    {
        yield return selector(web);
        web.Dispose();
    }
}

I’m really not sure why I typed it that way…obviously you can’t keep going after you return, right? Well it turns out you can. The generated class just waits to call Dispose() until the next call to MoveNext(), effectively picking up where it left off. Here’s what that looks like in Reflector:

switch (this.<>1__state)
{
    case 0:
        this.<>1__state = -1;
        this.<>7__wrap1a = this.webs.GetEnumerator();
        this.<>1__state = 1;
        while (this.<>7__wrap1a.MoveNext())
        {
            this.<web>5__19 = (SPWeb) this.<>7__wrap1a.Current;
            this.<>2__current = this.selector(this.<web>5__19);
            this.<>1__state = 2;
            return true;
        Label_0080:
            this.<>1__state = 1;
            this.<web>5__19.Dispose();
        }
        this.<>m__Finally1c();
        break;

    case 2:
        goto Label_0080;
}
return false;

As Select will almost always be used in a foreach, with back-to-back calls of MoveNext(), the distinction is mostly academic. Still, I prefer to know that the web will be disposed immediately after the selector is done with it.

Advertisements

Implementing LINQ Where for SharePoint

First, a huge thanks to Waldek Mastykarz for running with my suggestion to run some performance tests on list item filtering. In short, CAML wins by a factor of 300, which I expect would be even more pronounced on larger lists and under load.

In his test, Waldek implements Where() as follows:

public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    List<SPListItem> list = new List<SPListItem>();
    foreach (SPListItem item in items)
        if (predicate(item))
            list.Add(item);
    return list;
}

This works as expected, but allocates a secondary data structure to store the filtered items. The preferred approach is to use the yield return syntax:

public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    foreach (SPListItem item in items)
        if (predicate(item))
            yield return item;
}

The actual IL this generates is too complex to go into here, but I highly suggest checking it out in Reflector. In short, the compiler creates a private class that provides a filtered enumerator without actually building an intermediate data structure, instead filtering in MoveNext(). Using yield also defers execution until the collection is actually enumerated, though I can’t think of a SharePoint example where this would actually matter.

Another alternative, which also defers execution, is to leverage LINQ’s Cast<T>() operator and LINQ’s IEnumerable<T>.Where():

public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    return items.Cast<SPListItem>().Where(predicate);
}

I imagine the compiler would optimize the yield-generated class in much the same way it optimizes LINQ’s internal implementation, but I will leave that research for a later date. It would also be interesting to compare the performance between the different implementations, though in a SharePoint context I expect the difference would be insignificant compared with the more expensive operations needed to retrieve the data.

The Problem with IDisposable

In my previous post, I suggested a Dispose-safe implementation of SPWebCollection.ForEach(), which Waldek leveraged for his Where implementation. Presumably because he was concerned about leaking SPWebs, his Where() implementation just returns a list of the web IDs. While avoiding leaks is smart, an ID isn’t nearly as useful as the full SPWeb and opening a new SPWeb from the ID is an expensive operation. What if I wanted a filtered enumeration of the SPWeb objects?

Well if we use one of the patterns described above, we should be safe if we call Dispose() for each when we’re done, right? I probably wouldn’t bother asking if there weren’t a catch, so I’ll answer my question with another question: When would Dispose() be called on the webs for which the predicate is false? It wouldn’t! To prevent these leaks, we need to be a bit more sophisticated:

public static IEnumerable<SPWeb> Where(this SPWebCollection webs,
                                       Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs)
        if (predicate(web))
            yield return web;
        else
            web.Dispose();
}

Or using Cast<T>():

public static IEnumerable<SPWeb> Where(this SPWebCollection webs,
                                        Func<SPWeb, bool> predicate)
{
    return webs.Cast<SPWeb>().Where(w =>
    {
        bool r = predicate(w);
        if (!r)
            w.Dispose();
        return r;
    });
}

Again, a detailed IL investigation would likely prove one preferable to the other, but the principle is the same.

Finally, since caller-dependent disposal is unreliable and delegates are fun, I figure we could use a Dispose-safe filtered ForEach:

public static void ForEach(this SPWebCollection webs,
                           Action<SPWeb> action,
                           Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs.Where(predicate))
    {
        action(web);
        web.Dispose();
    }
}

Which would let us do something like this to print all publishing sites in a collection:

site.AllWebs.ForEach(
    w => { Console.WriteLine(w.Title); },
    w => { return w.Features.Contains(FeatureIds.OfficePublishingWeb); }
);

That is, if we define yet another useful, if simple, extension method:

public static bool Contains(this SPFeatureCollection features, Guid featureID)
{
    return features[featureID] != null;
}

And a final note: why did the LINQ team use Func<T, bool> instead of Predicate<T>, which has existed since .NET 2.0?

Safely Process SPSite.AllWebs

Probably the ugliest scenario for properly disposing SPWeb objects is cleaning up when you enumerate SPSite.AllWebs. To simplify that task, I present a pair of LINQ-inspired extension methods:

public static void ForEach(this SPWebCollection webs, Action<SPWeb> action)
{
    foreach (SPWeb web in webs)
    {
        action(web);
        web.Dispose();
    }
}

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs, Func<SPWeb, TResult> action)
{
    List<TResult> res = new List<TResult>(webs.Count);
    webs.ForEach(w => res.Add(action(w)));
    return res;
}

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs, Func<SPWeb, TResult> selector)
{
    foreach (SPWeb web in webs)
    {
        TResult ret = selector(web);
        web.Dispose();
        yield return ret;
    }
}

Combined with lambda expressions, we can cleanly handle tasks that would ordinarily require more code and explicit disposal. Want a list of your web URLs?

var urls = site.AllWebs.Select(w => { return w.Url; });

How about updating a property on every web?

site.AllWebs.ForEach(w =>
{
    w.Properties["MyProp"] = DateTime.Now.ToString();
    w.Properties.Update();
});

We can even leverage anonymous types:

var props = site.AllWebs.Select(w =>
{
    return new
    {
        w.Title,
        w.Url,
        MyProp = w.Properties["MyProp"]
    };
});

Speaking of LINQ and SharePoint, check out Adam Buenz‘s post on using LINQ’s IEnumerable.Cast<T> with SharePoint collections to get IQueryable support. And while using LINQ for filtering and such may be prettier, resist the urge to skip CAML altogether: there is definitely a performance advantage in filtering your SPListItemCollection with an SPQuery, especially for large lists. I can’t seem to find any hard data on this, so I nominate Waldek Mastykarz to investigate – his analyses of other performance topics were great.

Update 12/10/2008: New, improved Select! Discussed here.

The New Definitive SPSite/SPWeb Disposal Article

As I’ve posted before, properly cleaning up your SPSite and SPWeb objects is an important yet confusing consideration in writing good SharePoint object model code. There are some decent articles out there already, but Stefan Goßner just posted the best guidance – by far – I’ve seen on the subject: Disposing SPWeb and SPSite objects. Seriously, go read it.

In my view, Stefan hit on four key points that bear repeating:

1. Ensure that you only dispose SPSite and SPWeb objects that your code owns.

Examples that you don’t own, most of which I have seen incorrectly disposed in various examples online:

  • SPContext.Current.Web
  • SPContext.Current.Site
  • SPContext.Current.Site.RootWeb
  • SPFeatureReceiverProperties.Feature.Parent for a Site- or Web-scoped feature receiver
  • SPWebEventProperties.Web
  • SPListEventProperties.Web
  • SPListEventProperties.List.Web
  • SPItemEventProperties.ListItem.Web
  • UnsecuredLayoutsPage.Web (and LayoutsPageBase.Web)
  • SPWebProvisioningProperties.Web
  • SPControl.GetContextWeb()
  • SPControl.GetContextSite()

In general, there are three common cases where you do own the object:

  • When you new up your own SPSite object:
    SPSite site = new SPSite(url);
  • When you explicitly call SPSite.OpenWeb():
    SPWeb web = site.OpenWeb(url);
  • When you iterate through SPSite.AllWebs – this is an extremely expensive operation, so avoid if possible:
    foreach(SPWeb web in site.AllWebs) {
    // …
    web.Dispose();
    }

If you think of any other examples I should include in these lists, leave a comment and I’ll update the post.

2. An SPWeb or SPSite object you own should be disposed as soon as it is no longer needed.

Stefan’s post has a great explanation, which he neatly summarizes:

You should dispose a SPWeb or SPSite object after the last access to a child object of this object. Be aware that a sub site (SPWeb) is not a child object. But (e.g.) a list or a folder or list item is a child object for this scenario.

3. Using a disposed SPWeb or SPSite object (like might happen if you Dispose something that doesn’t belong to you) can cause exceptions.

While it’s great to Dispose properly, it’s dangerous to Dispose improperly: when in doubt, in my opinion, err on the side of caution and let SharePoint clean up the mess. For webs, that happens when the parent site is disposed. For sites, that happens when the thread ends. If you do have memory problems, check out Stefan’s post on Troubleshooting SPSite/SPWeb leaks in WSS v3 and MOSS 2007.

4. SPSite and SPWeb objects should be disposed in the same method they get allocated.

And as a corollary to this, I agree with Chris O’Brien‘s conclusion, almost as an afterthought, in a recent post:

However, if the caller has to provide the IDisposable objects, it can then be responsible for calling Dispose() because it knows when they are no longer needed. Most often the caller will simply be passing a reference to SPContext.Current.Web, but if the caller happens to be a Feature receiver then SPFeatureProperties.ListItem.Parent.Web would be passed, or if no existing SPWeb object was available to the caller (e.g. console app) a new SPWeb object would have to be instantiated and passed. Disposals then become much simpler because they can always happen in the same place as any instantiations.

If the logic that utilizes the SPSite/SPWeb is encapsulated within a function that receives it as an argument, there is no risk of leaks. Most simply, there is little risk when the code looks like this:

using (SPWeb web = SPContext.Current.Site.OpenWeb()) {
  DoSomethingWith(web);
}

Now, some developers like to encapsulate the SPSite/SPWeb retrieval logic in dedicated methods. A safe approach is to leverage delegates. Unfortunately, variations of an unsafe alternative are much more common (in this case, copied from an otherwise brilliant MVP’s blog):

public static SPWeb ReturnSPWeb(string url) {
  using (var site = new SPSite(url)) {
    using (SPWeb web = site.OpenWeb()) {
      return web;
    }
  }
}

When you return from within a using block, the compiler will insert a Dispose on the used object before returning. Open your assembly in ildasm if you don’t believe me. So the above code is effectively equivalent to the following:

public static SPWeb ReturnSPWeb(string url) {
  var site = new SPSite(url);
  SPWeb web = site.OpenWeb();
  web.Dispose();
  site.Dispose();

  return web;
}

Oops! Before they even got a chance to do anything, the objects were cleaned up, and more importantly any code that depends on the returned SPWeb is completely unsafe! The same would apply for child objects (SPList, etc) returned from within a using block around the parent object. Even if the code works most of the time, the bugs it could introduce would be nearly impossible to track down.

Thanks again to Stefan for the excellent post!

Last updated 12/8/2008

SPSite/SPWeb Leaks Revisited

A while back I posted a rather clumsy technique to mitigate an SPWeb leak discussed here. I knew there had to be a better way, and Rob Garrett‘s use of delegates seems to have potential.

But first, I should point out a rather subtle leak in Chris’s and my code:

        list = currentContext.Site.AllWebs["MyWeb"].Lists["MyList"];

See it? How about now:

        SPWeb web = currentContext.Site.AllWebs["MyWeb"];
        list = web.Lists["MyList"];

The context SPSite shouldn’t be disposed, but AllWebs returns an SPWeb that should (according to Roger Lamb):

        using (SPWeb web = currentContext.Site.AllWebs["MyWeb"]) {
          list = web.Lists["MyList"];
        }

That Chris and I (and our readers) can overlook this in posts about SPWeb leaks is a testament to how tricky this stuff can be.

Super Delegates?

So how would Rob’s technique be used to DoSomething? Well none of his helpers take advantage of SPContext, so first let’s add a helper for that:

public static void GetContextWebByTitle(string title, Action<SPSite, SPWeb> action) {
  if (String.IsNullOrEmpty(title))
    throw new ArgumentNullException("title");
  if (null == action)
    throw new ArgumentNullException("action");

  SPContext currentContext = SPContext.Current;
  if (null == currentContext)
    throw new SPException("Context is null!");

  SPSite site = currentContext.Site;
  using (SPWeb web = site.AllWebs[title]) {
    if (null == web)
      throw new SPException("Web not found");
    action(site, web);
  }
}

Now we can refactor into a method that matches the delegate:

public void DoSomething(SPSite site, SPWeb web) {
  SPList list = web.Lists["MyList"];

  // do something with list..
  foreach (SPListItem item in list.Items) {
    processItem(item);
  }
}

And our original DoSomething() just determines which helper to call:

public void DoSomething() {
  if (SPContext.Current != null)
    SPHelper.GetContextWebByTitle("MyWeb", DoSomething);
  else
    SPHelper.GetWebByTitle("http://litwaredemo", "MyWeb", DoSomething);
}

Pretty simple. But we’re still thinking a bit too much – couldn’t GetWebByTitle check context for us? Of course:

  if (SPContext.Current != null)
    GetContextWebByTitle(title, action);
  else
    using (var site = new SPSite(url)) {
      ...

So we don’t have to think at all:

public void DoSomething() {
  SPHelper.GetWebByTitle("http://litwaredemo", "MyWeb", DoSomething);
}

I still need to try this technique in some real code, but I like the theory. It certainly makes sense to separate the logic to create and dispose SPSite/SPWeb objects from the code to manipulate them, and it’s even better to standardize that logic. But even with an arsenal of slick helpers, a leaky delegate can take us back to where we started:

public void DoSomething(SPSite site) {
  foreach (SPWeb web in site.AllWebs)
    processWeb(web);
}

Not that this diminishes the value of Rob’s solution, it just reinforces the need for developers to know the disposal patterns even with help.

Disposing list’s SPSite/SPWeb without ParentWeb

Update 12/10/2008: Don’t use this technique; use the guidance here instead: The New Definitive SPSite/SPWeb Disposal Article

Chris O’Brien’s recent post discusses the need to dispose of SPSite and SPWeb objects, but only if they didn’t come from SPContext. To do this he depends on a list’s ParentWeb property, which currently returns the same instance as the original SPWeb. However, he notes that this strategy depends on the internal implementation of ParentWeb, which (however unlikely) might change.

As a future-proof alternative, I propose capturing the SPWeb as an out parameter if it will need to be disposed:

public void DoSomething() {
  SPWeb webToDispose = null;
  SPList list = getList(out webToDispose);

  // do something with list..
  foreach (SPListItem item in list.Items) {
    processItem(item);
  }

  // ** PROBLEM - how do we now dispose of the SPSite/SPWeb objects we created earlier? **
  // ** SOLUTION - if we didn't use context, webToDispose has reference
  if (webToDispose != null) {
    webToDispose.Dispose();
    webToDispose.Site.Dispose();
  }
}

private SPList getList(out SPWeb webToDispose) {
  webToDispose = null;

  SPContext currentContext = SPContext.Current;
  SPList list = null;

  if (currentContext != null) {
    list = currentContext.Site.AllWebs["MyWeb"].Lists["MyList"];
  }
  else {
    SPSite site = new SPSite("http://solutionizing.net");
    webToDispose = site.OpenWeb("/MyWeb");
    list = webToDispose.Lists["MyList"];
  }

  return list;
}

This code should be functionally equivalent and perhaps a bit easier to read, but without the dependency on ParentWeb. Thoughts?

Also, to add to Chris’s list of required reading, check out Roger Lamb’s excellent SharePoint 2007 and WSS 3.0 Dispose Patterns by Example.

Update 7/23/2008: How ironic…Chris and I have a leak! If you can’t spot it, I explain here.