Using IDisposables with LINQ

Objects that implement IDisposable are everywhere. The interface even gets its own language features (C#, VB, F#). However, LINQ throws a few wrenches into things:

  1. LINQ’s query syntax depends on expressions; using blocks are statements.
  2. When querying a sequence of IDisposable objects, there’s no easy way to ensure disposal after each element has been consumed.
  3. Returning deferred queries from within a using statement is often desired, but fails spectacularly.

There are possible work-arounds for each issue…

  1. Put the using statement in a method (named or anonymous) that is called from the query. See also: Thinking Functional: Using.
  2. Use a method that creates a dispose-safe iterator of the sequence, like AsSafeEnumerable().
  3. Refactor the method to inject the IDisposable dependency, as shown in the first part of Marc’s answer here.

But, as you might have guessed, I would like to propose a better solution. The code is really complex, so bear with me:

public static IEnumerable<T> Use<T>(this T obj) where T : IDisposable
{
    try
    {
        yield return obj;
    }
    finally
    {
        if (obj != null)
            obj.Dispose();
    }
}

That’s it. We’re turning our IDisposable object into a single-element sequence. The trick is that the C# compiler will build an iterator for us that properly handles the finally clause, ensuring that our object will be disposed. It might be helpful to set a breakpoint on the finally clause to get a better idea what’s happening.
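If a breakpoint isn't handy, the same behavior can be observed in code. This is a minimal sketch: TrackedResource and Observe() are hypothetical names made up for the illustration, and Use() is repeated only so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical resource that records whether Dispose() has been called.
public sealed class TrackedResource : IDisposable
{
    public bool IsDisposed { get; private set; }
    public void Dispose() { IsDisposed = true; }
}

public static class UseDemo
{
    // Same shape as the Use() method above.
    public static IEnumerable<T> Use<T>(this T obj) where T : IDisposable
    {
        try { yield return obj; }
        finally { if (obj != null) obj.Dispose(); }
    }

    // Returns the disposal state seen inside the loop and after the loop.
    public static bool[] Observe()
    {
        var resource = new TrackedResource();
        bool during = false;
        foreach (var r in resource.Use())
            during = r.IsDisposed;   // the finally block has not run yet
        // The second MoveNext() call ran the finally block before the loop ended.
        return new[] { during, resource.IsDisposed };
    }
}
```

Observe() should report false for the value captured inside the loop and true afterwards: the object stays alive while it is the current element and is disposed as soon as the enumerator moves past it.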

So how can this simple method solve all our problems? First up: “using” a FileStream object created in a LINQ query:

var lengths = from path in myFiles
              from fs in File.OpenRead(path).Use()
              select new { path, fs.Length };

Since the result of Use() is a single-element sequence, we can think of from fs in something.Use() as an assignment of that single value, something, to fs. In fact, it’s really quite similar to an F# use binding in that it will automatically clean itself up when it goes out of scope (by its enumerator calling MoveNext()).

Next, disposing elements from a collection. I’ll use the same SharePoint problem that AsSafeEnumerable() solves:

var webs = from notDisposed in site.AllWebs
           from web in notDisposed.Use()
           select web.Title;

I find this syntax rather clumsy compared with AsSafeEnumerable(), but it’s there if you need it.

Finally, let’s defer disposal of a LINQ to SQL DataContext until after the deferred query is executed, as an answer to the previously-linked Stack Overflow question:

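// Use() returns IEnumerable<T>, so the composed query is LINQ to Objects rather than IQueryable<MyType>.
IEnumerable<MyType> MyFunc(string myValue)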
{
    return from dc in new MyDataContext().Use()
           from row in dc.MyTable
           where row.MyField == myValue
           select row;
}

void UsingFunc()
{
    var result = MyFunc("MyValue").OrderBy(row => row.SortOrder);
    foreach(var row in result)
    {
        //Do something
    }
}

The result of MyFunc now owns its destiny completely. It doesn’t depend on some potentially disposed DataContext – it just creates one that it will dispose when it’s done. There are probably situations where you would want to share a DataContext rather than create one on demand (I don’t use LINQ to SQL, I just blog about it), but again it’s there if you need it.
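To see when disposal actually happens, here is a small sketch that swaps the DataContext for a hypothetical FakeContext (it is not LINQ to SQL, and every name in it is made up for the example). The context is passed in only so the caller can inspect it afterwards, and Use() is written as a plain static method to keep the snippet self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a data context; it refuses to serve rows once disposed.
public sealed class FakeContext : IDisposable
{
    public bool IsDisposed { get; private set; }

    public IEnumerable<int> Rows
    {
        get
        {
            if (IsDisposed) throw new ObjectDisposedException("FakeContext");
            return new[] { 3, 1, 2 };
        }
    }

    public void Dispose() { IsDisposed = true; }
}

public static class DeferredDemo
{
    // Same logic as Use(), declared without 'this' for this sketch.
    public static IEnumerable<T> Use<T>(T obj) where T : IDisposable
    {
        try { yield return obj; }
        finally { if (obj != null) obj.Dispose(); }
    }

    // Mirrors MyFunc: the query is returned without being enumerated.
    public static IEnumerable<int> Query(FakeContext context, int minimum)
    {
        return from dc in Use(context)
               from row in dc.Rows
               where row >= minimum
               select row;
    }
}
```

With this sketch, the context is still alive after Query() returns and is disposed once the caller finishes enumerating, even with an OrderBy() chained on. One thing it also makes visible: the context object itself is created when the query is built, not when it is enumerated, so enumerating the same query a second time would hit an already-disposed context.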

I’ve only started using this approach recently, so if you have any problems with it please share.

Improve Your Code Golf Game with LINQ

I always enjoy a good coding challenge, and variations of code golf are the most common. For the uninitiated, code golf poses a problem with the objective of solving it in the fewest keystrokes or lines. While production code certainly deserves more white space than games tend to afford, there are still some lessons we can learn from the experience.

This particular post comes on the heels of Scott Hanselman’s casual challenge to clean up some of his code in as few lines as possible. The general requirements are as follows:

  1. Accept a path to a project
  2. For each of a few files in each project…
  3. Back up the file
  4. Perform a few string replacements in the file
  5. Save the updated version of the file

As I was looking over his code for some chances to optimize, it quickly became clear that the bulk of the “hard” stuff could be solved through some LINQ-supported functional programming. I posted a first draft, upon which others iterated, but his spam filter ate this new version so I thought it might be educational to walk through it here instead.

First, let’s define a few arrays that will specify the entirety of our configuration, along with the input array of project paths:

static void Main(string[] args)
{
    if (args.Length == 0) Console.WriteLine("Usage: ASPNETMVCUpgrader pathToProject1 [pathToProject2] [pathToProject3]");

    var configFiles = new[] { "web.config", @"Views\web.config" };
    var changes = new[] {
        new { Regex = new Regex(@"(?<1>System.Web.Mvc, Version=)1.0(?<2>.0.0,)", RegexOptions.Compiled), Replacement = "${1}1.1${2}"},
        new { Regex = new Regex(@"(?<1>System.Web.Routing, Version=)3.5(?<2>.0.0,)", RegexOptions.Compiled), Replacement = "${1}4.0${2}"} };

The regular expressions are based on those provided by commenter Dušan Radovanović. Next, we can use some LINQ to build a list of all our files to update:

    var filesToUpdate = from file in configFiles
                        from projectPath in args
                        let path = Path.Combine(projectPath, file)
                        where File.Exists(path)
                        select new { Path = path, Content = File.ReadAllText(path) };

If you’re not familiar with C# 3.0, line by line, this does the following:

  1. Let file be the current item in configFiles.
  2. Let projectPath be the current item in args.
  3. Let path be the combined value of projectPath and file.
  4. Only include path values for files that exist.
  5. Create new anonymous objects with Path and Content properties set to the path and file contents, respectively.

As with most LINQ operations, execution of this code will be deferred until filesToUpdate is enumerated.
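A quick way to convince yourself of the deferral is to count how often the filter runs. This is only a sketch: the exists delegate is a hypothetical stand-in for File.Exists so the example doesn't touch the file system.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns: filter calls before enumeration, filter calls after, result count.
    public static int[] CountFilterCalls()
    {
        int calls = 0;
        var names = new[] { "web.config", "missing.config", "app.config" };

        // Hypothetical replacement for File.Exists that counts its invocations.
        Func<string, bool> exists = name => { calls++; return name != "missing.config"; };

        var query = from name in names
                    where exists(name)
                    select name.ToUpperInvariant();

        int before = calls;             // building the query runs nothing
        var results = query.ToList();   // enumerating runs the filter once per item
        return new[] { before, calls, results.Count };
    }
}
```

The flip side is that every new enumeration repeats the work, so in the real query each pass over filesToUpdate would check and read the files again.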

Now we’re ready to update our files. First, I’ll define a sequence of our possible backup file names, which will add “.backup_XX” to the file name.* Since the sequence is lazily evaluated, we can just call LINQ’s First() to find an available backup file name. Note that First() would throw an InvalidOperationException if all 100 files existed, as the backupFileNames sequence would be empty.

    foreach (var file in filesToUpdate)
    {
        var backupFileNames = from n in Enumerable.Range(0, 100)
                              let backupPath = string.Format("{0}.backup_{1:00}", file.Path, n)
                              where !File.Exists(backupPath)
                              select backupPath;

        File.Move(file.Path, backupFileNames.First());

Finally, we need to actually update the file content. To do that, we’ll use LINQ’s Aggregate operator:

        string newContent = changes.Aggregate(file.Content, (s, c) => c.Regex.Replace(s, c.Replacement));
        File.WriteAllText(file.Path, newContent);
        Console.WriteLine("Done converting: {0}", file.Path);
    }
}

Aggregate takes two parameters: a seed value and a function that defines the aggregation. In our case, the seed value is of type string and the function is of type Func<string, 'a, string>, where 'a is our anonymous type with Regex and Replacement properties. In practice, this call is going to take our original content and apply each of our changes in succession, using the result of one replacement as the input to the next. In functional terminology, Aggregate is known as a left fold; for more on Aggregate and folds, see this awesome post by language guru Bart de Smet.
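To make the fold concrete, here is a sketch that places Aggregate next to the loop it replaces. It uses plain string replacements in place of the Regex pairs, and FoldDemo and its methods are names invented for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FoldDemo
{
    // Applies each (old, new) pair in order; each result feeds the next step.
    public static string ApplyWithAggregate(string input,
        IEnumerable<KeyValuePair<string, string>> changes)
    {
        return changes.Aggregate(input, (s, c) => s.Replace(c.Key, c.Value));
    }

    // The same left fold written as an explicit loop.
    public static string ApplyWithLoop(string input,
        IEnumerable<KeyValuePair<string, string>> changes)
    {
        string s = input;
        foreach (var c in changes)
            s = s.Replace(c.Key, c.Value);
        return s;
    }
}
```

Because each step sees the output of the previous one, order matters: with the changes "1.0" to "1.1" followed by "1.1" to "2.0", the input "Version=1.0" ends up as "Version=2.0" in both versions.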

What strikes me about this code is that it’s both terse and expressive. And for the purposes of the challenge, we can rewrite some of the queries in extension method syntax:

static void Main(string[] args)
{
  if (args.Length == 0) Console.WriteLine("Usage: ASPNETMVCUpgrader pathToProject1 [pathToProject2] [pathToProject3]");

  var configFiles = new[] { "web.config", @"Views\web.config" };
  var changes = new[] {
    new { Regex = new Regex(@"(?<1>System.Web.Mvc, Version=)1.0(?<2>.0.0,)", RegexOptions.Compiled), Replacement = "${1}1.1${2}"},
    new { Regex = new Regex(@"(?<1>System.Web.Routing, Version=)3.5(?<2>.0.0,)", RegexOptions.Compiled), Replacement = "${1}4.0${2}"} };

  var files = from path in configFiles.SelectMany(file => args, (file, arg) => Path.Combine(arg, file))
              where File.Exists(path) select new { Path = path, Content = File.ReadAllText(path) };

  foreach (var file in files)
    try
    {
      File.Move(file.Path, Enumerable.Range(0, 100).Select(n => string.Format("{0}.backup_{1:00}", file.Path, n)).First(p => !File.Exists(p)));
      File.WriteAllText(file.Path, changes.Aggregate(file.Content, (s, c) => c.Regex.Replace(s, c.Replacement)));
      Console.WriteLine("Done converting: {0}", file.Path);
    }
    catch (Exception ex) { Console.WriteLine("Error with: {0}" + Environment.NewLine + "Exception: {1}", file.Path, ex.Message); }
}

* The original code had the most recent backup with extension .mvc10backup, with the next oldest backup called .mvc10backup2. My original version extended this concept to “unlimited” backups with old backups continuously incremented so the lower values were more recent. It could probably be improved, but I thought I’d include the adapted code here for completeness:

  foreach (var file in files)
    try
    {
      var backupPaths = Enumerable.Repeat<int?>(null, 1)
            .Concat(Enumerable.Range(2, int.MaxValue - 2).Select(i => (int?)i))
            .Select(i => Path.ChangeExtension(file.Path, ".mvc10backup" + i));
      string toCopy = file.Path;
      foreach (var f in backupPaths.TakeWhile(_ => toCopy != null))
      {
          string temp = null;
          if (File.Exists(f))
              File.Move(f, temp = f + "TEMP");
          File.Move(toCopy, f);
          toCopy = temp;
      }
      File.WriteAllText(file.Path, changes.Aggregate(file.Content, (s, c) => c.Regex.Replace(s, c.Replacement)));
      Console.WriteLine("Done converting: {0}", file.Path);
    }
    catch (Exception ex) { Console.WriteLine("Error with: {0}" + Environment.NewLine + "Exception: {1}", file.Path, ex.Message); }
}

Refactoring with LINQ & Iterators: FindDescendantControl and GetDescendantControls

A while back I put together a quick and dirty implementation of a FindControl extension method:

public static T FindControl<T>(this Control root, string id) where T : Control
{
    Control c = root;
    Queue<Control> q = new Queue<Control>();

    if (c == null || c.ID == id)
        return c as T;
    do
    {
        foreach (Control child in c.Controls)
        {
            if (child.ID == id)
                return child as T;
            if (child.HasControls())
                q.Enqueue(child);
        }
        c = q.Dequeue();
    } while (c != null);
    return null;
}

It got the job done (if the control exists!), but I think we can do better.

Refactoring with Iterators

My first concern is that the method is doing too much. Rather than searching for the provided ID, the majority of the code is devoted to navigating the control’s descendants. Let’s factor out that logic into its own method:

public static IEnumerable<Control> GetDescendantControls(this Control root)
{
    var q = new Queue<Control>();

    var current = root;
    while (true)
    {
        if (current != null && current.HasControls())
            foreach (Control child in current.Controls)
                q.Enqueue(child);

        if (q.Count == 0)
            yield break;

        current = q.Dequeue();
        yield return current;
    }
}

The new method is almost as long as the old one, but now satisfies the Single Responsibility Principle. I also added a check to prevent calling Dequeue() on an empty queue. For those that have studied algorithms, note that this is a breadth-first tree traversal.
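The breadth-first order is easy to check outside ASP.NET. The sketch below uses a hypothetical Node class in place of Control (just a name and a list of children) and keeps the same queue-based loop.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Control: a name plus child nodes.
public sealed class Node
{
    public string Name;
    public List<Node> Children = new List<Node>();

    public Node(string name, params Node[] children)
    {
        Name = name;
        Children.AddRange(children);
    }
}

public static class TraversalDemo
{
    // Same queue-based shape as GetDescendantControls.
    public static IEnumerable<Node> GetDescendants(this Node root)
    {
        var q = new Queue<Node>();
        var current = root;
        while (true)
        {
            if (current != null)
                foreach (var child in current.Children)
                    q.Enqueue(child);

            if (q.Count == 0)
                yield break;

            current = q.Dequeue();
            yield return current;
        }
    }

    // Builds a small tree and returns the visit order as a comma-separated string.
    public static string Order()
    {
        var tree = new Node("root",
            new Node("a", new Node("a1"), new Node("a2")),
            new Node("b", new Node("b1")));
        return string.Join(",", tree.GetDescendants().Select(n => n.Name).ToArray());
    }
}
```

For this tree the visit order is a, b, a1, a2, b1: both children of the root come out before any grandchild, which is what marks the traversal as breadth-first.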

Now we can update FindControl:

public static T FindControl<T>(this Control root, string id) where T : Control
{
    Control c = root;

    if (c == null || c.ID == id)
        return c as T;

    foreach (Control child in c.GetDescendantControls())
    {
        if (child.ID == id)
            return child as T;
    }
    return null;
}

With the control tree traversal logic extracted, this updated version is already starting to smell better. But we’re not done yet.

DRY? Don’t Repeat Someone Else, Either

My second concern is how we’re checking for the ID in question. It’s not that the equality operator is a bad choice, as it will work in many scenarios, but rather that it’s not consistent with the existing FindControl method. In particular, the existing FindControl understands naming containers (IDs that contain ‘$’ or ‘:’). Rather than implement our own comparison logic, we should just leverage the framework’s existing implementation:

public static T FindControl<T>(this Control root, string id) where T : Control
{
    if (id == null)
        throw new ArgumentNullException("id");

    if (root == null)
        return null;

    Control c = root.FindControl(id);
    if (c != null)
        return c as T;

    foreach (Control child in root.GetDescendantControls())
    {
        c = child.FindControl(id);
        if (c != null)
            return c as T;
    }
    return null;
}

Fun fact: FindControl will throw a NullReferenceException if id is null.

Refactoring with LINQ

So we have extracted the descendant logic and leaned on the framework for finding the controls, but I’m still not quite satisfied. The method just feels too…procedural. Let’s break down what we’re really trying to do:

  1. Look at the current control and all its descendants.
  2. Use FindControl on each with the specified ID.
  3. When we find the control, return it as type T.

As the subheading might suggest, we can express these steps quite nicely with LINQ:

  1. var controls = root.AsSingleton().Concat(root.GetDescendantControls());
  2. var foundControls = from c in controls
                        let found = c.FindControl(id)
                        where found != null
                        select found;
  3. return foundControls.FirstOrDefault() as T;

Behind the scenes, this is how I might have thought through this code:

  1. We use AsSingleton() (my new preferred name, to align with F#’s Seq.singleton, for AsEnumerable(), which I introduced here) and Concat() to prepend root to the list of its descendants, returned as a lazy enumeration.
  2. We use a query over those controls to retrieve matches from FindControl(), again returned as a lazy enumeration.
  3. We grab the first control found, or null if none match, and return it as T.

Because all our enumerations are lazy, we put off traversal of the entire control tree until we know we need to. In fact, if our ID is found in the root control, the body of GetDescendantControls() won’t even run: calling an iterator method only creates the enumerable, and Concat() never asks it for an element. Through just a bit of refactoring, we have both an efficient and readable solution.
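The laziness claim can be checked with a counter. In this sketch integers stand in for controls, and Descendants(), FindFirst() and the counter array are hypothetical names used only to record whether the iterator body ever starts.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazyConcatDemo
{
    // Iterator whose body records that it started running.
    static IEnumerable<int> Descendants(int[] counter)
    {
        counter[0]++;        // reached only when an element is requested
        yield return 2;
        yield return 3;
    }

    static IEnumerable<T> AsSingleton<T>(this T obj)
    {
        yield return obj;
    }

    // Returns the first match (or 0) and how many times the iterator body started.
    public static int[] FindFirst(int wanted)
    {
        var counter = new int[1];
        int root = 1;
        var all = root.AsSingleton().Concat(Descendants(counter));
        int found = (from n in all where n == wanted select n).FirstOrDefault();
        return new[] { found, counter[0] };
    }
}
```

Searching for 1 finds it in the singleton and leaves the counter at zero; searching for 3 has to walk into the second sequence, so the counter becomes one.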

For completeness, here’s the final version with a more descriptive name to contrast with the existing FindControl():

public static T FindDescendantControl<T>(this Control root, string id) where T : Control
{
    if (id == null)
        throw new ArgumentNullException("id");

    if (root == null)
        return null;

    var controls = root.AsSingleton().Concat(root.GetDescendantControls());

    var foundControls = from c in controls
                        let found = c.FindControl(id)
                        where found != null
                        select found;

    return foundControls.FirstOrDefault() as T;
}

I have added these methods, along with AsSingleton() and a host of others, to the SharePoint Extensions Lib project. Check it out!

Generic Method Invocation with Expression Trees

It’s against the rules and completely unsupported, but sometimes it’s just so much easier to use a base class’s private/internal members. Reflection has always been an option, but performance is less than ideal. Lightweight Code Generation is an option, but emitting IL isn’t for everyone. Since .NET 3.5 came out, there have been several discussions of using expression trees as a developer-friendly yet efficient alternative. There is an up-front cost to compile the expression into IL, but the resulting delegate can be reused with performance very close to direct invocation.

Alkampfer provides a great overview of expression tree method invocation in this article, which inspired this more general solution.

First, let’s set up a cache to store our compiled delegates. I didn’t put much effort into making it efficiently thread-safe, but suggestions are certainly welcome.

private static Dictionary<string, Delegate> accessors = new Dictionary<string, Delegate>();
private static object accessorLock = new object();
private static D GetCachedAccessor<D>(string key)
                 where D : class // Constraint cannot be special class 'System.Delegate'
{
    D result = null;
    Delegate cachedDelegate;
    lock (accessorLock)
    {
        if (accessors.TryGetValue(key, out cachedDelegate))
        {
            Debug.WriteLine("Found cache entry for " + key);
            result = cachedDelegate as D;
        }
    }
    return result;
}
private static void SetCachedAccessor(string key, Delegate value)
{
    if (value != null)
        lock (accessorLock)
        {
            accessors[key] = value;
        }
}

GetFieldAccessor

Now we can dive into our expression trees. As a warm-up, here’s a relatively simple cached field accessor, inspired by Roger Alsing’s great post:

public static Func<T, R> GetFieldAccessor<T, R>(string fieldName)
{
    Type typeT = typeof(T);

    string key = string.Format("{0}.{1}", typeT.FullName, fieldName);
    Func<T, R> result = GetCachedAccessor<Func<T, R>>(key);

    if (result == null)
    {
        var param = Expression.Parameter(typeT, "obj");
        var member = Expression.PropertyOrField(param, fieldName);
        var lambda = Expression.Lambda<Func<T, R>>(member, param);

        Debug.WriteLine("Caching " + key + " : " + lambda.Body);
        result = lambda.Compile();
        SetCachedAccessor(key, result);
    }
    return result;
}

The method returns a function that will accept an object of type T and return the value of its fieldName field (or property) as type R. For example, we can wrap this in an extension method to check if an SPWeb has been disposed:

public static bool GetIsClosed(this SPWeb web)
{
    return GetFieldAccessor<SPWeb, bool>("m_closed")(web);
}

Because the delegate is cached, successive calls of GetFieldAccessor() will immediately return the necessary delegate without recompilation.
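For anyone without SharePoint at hand, the same idea can be tried on any class with a private field. This is a stripped-down sketch without the cache; Connection and BuildFieldAccessor are hypothetical names for the example.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical type with a private field, standing in for SPWeb.
public sealed class Connection
{
    private bool m_closed;

    public void Close() { m_closed = true; }

    public override string ToString() { return m_closed ? "closed" : "open"; }
}

public static class FieldAccessorDemo
{
    // Builds obj => obj.<fieldName> and compiles it into a delegate (no caching here).
    public static Func<T, R> BuildFieldAccessor<T, R>(string fieldName)
    {
        ParameterExpression param = Expression.Parameter(typeof(T), "obj");
        MemberExpression member = Expression.PropertyOrField(param, fieldName);
        return Expression.Lambda<Func<T, R>>(member, param).Compile();
    }
}
```

The compiled delegate reads m_closed directly, so BuildFieldAccessor<Connection, bool>("m_closed") returns false for a fresh Connection and true after Close() has been called.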

GetMethodAccessor

Building a method accessor is a bit trickier because of the various combinations of parameter and return types. One option is to explicitly define overloads for various method signatures, as seen in the article referenced earlier. Instead, I figure we can let the caller specify the desired delegate signature and figure out the intended method based on that.

public static D GetMethodAccessor<D>(string methodName, BindingFlags bindingAttr)
                where D : class // Constraint cannot be special class 'System.Delegate'
{
    Type[] args = typeof(D).GetGenericArguments();
    Type objType = args[0];

    Type[] argTypes = args.Skip(1).ToArray();
    string[] argTypesArray = argTypes.Select(t => t.Name).ToArray();
    string key = string.Format("{0}.{1}({2})", objType.FullName, methodName, string.Join(",", argTypesArray));

    D result = GetCachedAccessor<D>(key);
    if (result == null)
    {
        MethodInfo mi = objType.GetMethod(methodName, bindingAttr, null, argTypes, null);

        if (mi == null || mi.ReturnType != typeof(void))
        {
            argTypes = argTypes.Take(argTypesArray.Length - 1).ToArray();
            mi = objType.GetMethod(methodName, bindingAttr, null, argTypes, null);
        }

        if (mi == null)
            throw new ArgumentException("Could not find appropriate overload.", methodName);

        var param = Expression.Parameter(objType, "obj");
        var arguments = argTypes.Select((t, i) => Expression.Parameter(t, "p" + i)).ToArray();
        var invoke = Expression.Call(param, mi, arguments);
        var lambda = Expression.Lambda<D>(invoke, param.AsEnumerable().Concat(arguments));

        Debug.WriteLine("Caching " + key + " : " + lambda.Body);
        result = lambda.Compile();
        SetCachedAccessor(key, result as Delegate);
    }
    return result;
}

As you can see, we depend heavily on the generic arguments of the delegate type. This means passing a non-generic delegate type to this function won’t work, since it has no generic arguments to inspect – it needs to be Func, Action, or something compatible with the expected argument structure. So what is that structure? The processing logic works like this:

  1. Take the first generic argument as the type whose method we are going to invoke.
  2. Fetch an array, skipping the first argument, that we pass to GetMethod as the argument types.
  3. If GetMethod can’t find an appropriate overload, or if the method’s return type is not void, then we shouldn’t have used all of the arguments as parameters.
  4. Redefine our parameter array without the last argument – this is our delegate’s non-void return type.
  5. Try GetMethod again with the trimmed array; throw if we still don’t find a match.

Once we have the details of our method, we can build the expression tree. I use the mapi-style Select overload to build an array of typed parameters named p0, p1, etc., which is then passed to Expression.Call to represent the method invocation. Finally, Expression.Lambda expects a list of all parameters including the instance param. Rather than allocate an intermediate data structure, I use a trick I picked up from Keith Rimington:

public static IEnumerable<T> AsEnumerable<T>(this T obj)
{
    yield return obj;
}

By turning param into a single-element IEnumerable<ParameterExpression>, we can simply Concat the rest of the arguments. Beautiful.

GetMethodAccessor Usage

The usage is a bit more complex than GetFieldAccessor, but still quite manageable:

public static bool SetBoolValue(this SPField field, string attrName, bool attrValue)
{
    Func<SPField, string, bool, bool> lambda =
        GetMethodAccessor<Func<SPField, string, bool, bool>>("SetFieldBoolValue",
                                                             BindingFlags.Instance | BindingFlags.NonPublic);
    return lambda(field, attrName, attrValue);
}

The intermediate variable is unnecessary, but makes it easier to see what’s going on. SPField.SetFieldBoolValue is of type Func<string, bool, bool>, so our delegate needs to be Func<SPField, string, bool, bool> to accept the instance variable first. The parameters for GetMethodAccessor are identical to what we would pass to field.GetType().GetMethod() if we were using normal reflection. Then we invoke lambda to effectively call field.SetFieldBoolValue(attrName, attrValue).

For methods that return void, we just pass an Action type instead:

public static void SetHidden(this SPField field, bool value)
{
    GetMethodAccessor<Action<SPField, bool>>("SetHidden", BindingFlags.Instance | BindingFlags.NonPublic)(field, value);
}

And these can be used like any other extension methods:

SPField field = GetField();
field.SetBoolValue("CanToggleHidden", !field.CanToggleHidden);
field.SetBoolValue("CanBeDeleted", !field.CanBeDeleted);
field.SetHidden(!field.Hidden);
field.Update();

Which will show the following in DebugView:

Caching Microsoft.SharePoint.SPField.SetFieldBoolValue(String,Boolean,Boolean) : obj.SetFieldBoolValue(p0, p1)
Found cache entry for Microsoft.SharePoint.SPField.SetFieldBoolValue(String,Boolean,Boolean)
Caching Microsoft.SharePoint.SPField.SetHidden(Boolean) : obj.SetHidden(p0)

Because of the penalty for compilation, this technique is not right for all situations. But for frequent access to inaccessible members, it might be worth a try.

LINQ Tip: Enumerable.OfType

In the past I’ve mentioned LINQ’s Cast<T>() as an efficient way to convert a SharePoint collection into an IEnumerable<T> that has access to LINQ’s various extension methods. Fundamentally, Cast<T>() is implemented like this:

public static IEnumerable<T> Cast<T>(this IEnumerable source)
{
  foreach(object o in source)
    yield return (T) o;
}

Using an explicit cast performs well, but will result in an InvalidCastException if the cast fails. A less efficient yet useful variation on this idea is OfType<T>():

public static IEnumerable<T> OfType<T>(this IEnumerable source)
{
  foreach(object o in source)
    if(o is T)
      yield return (T) o;
}

The returned enumeration will only include elements that can safely be cast to the specified type. Why would this be useful?
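Before the SharePoint examples, the difference is easy to see on a plain non-generic collection. This sketch uses an ArrayList with mixed contents; OfTypeDemo and its members are names made up for the illustration.

```csharp
using System;
using System.Collections;
using System.Linq;

public static class OfTypeDemo
{
    // A non-generic collection holding strings and numbers.
    static readonly ArrayList Mixed = new ArrayList { "web", 42, "list", 3.5 };

    // OfType<T>() silently skips elements that are not of type T.
    public static string[] StringsOnly()
    {
        return Mixed.OfType<string>().ToArray();
    }

    // Cast<T>() fails as soon as it reaches an element that is not a T.
    public static bool CastThrows()
    {
        try
        {
            Mixed.Cast<string>().ToArray();
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

StringsOnly() yields "web" and "list", while the Cast<string>() call throws when it reaches 42.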

Example 1: SPWindowsServiceInstance

SharePoint, especially with MOSS, has several different services that can run on the various servers in a farm. We know where our web services are running, but where are the various Windows services running?

var winsvc = from svr in SPFarm.Local.Servers
             from inst in svr.ServiceInstances.OfType<SPWindowsServiceInstance>()
             select new
             {
                 Server = svr.Name,
                 ID = inst.Id,
                 ServiceType = inst.Service.GetType().Name
             };

Example 2: SPDocumentLibrary

SharePoint provides a few special subclasses of SPList for specific kinds of lists. These include SPDocumentLibrary, SPPictureLibrary and the essentially obsolete SPIssueList. We can use OfType() to retrieve only lists of a certain type, like this LINQified MSDN sample that enumerates all files in a site collection’s libraries, excluding catalogs and form libraries:

SPSite site = SPContext.Current.Site;
var docs = from web in site.AllWebs.AsSafeEnumerable()
           from lib in web.Lists.OfType<SPDocumentLibrary>()
           from SPListItem doc in lib.Items
           where !lib.IsCatalog && lib.BaseTemplate != SPListTemplateType.XMLForm
           select new { WebTitle = web.Title, ListTitle = lib.Title,
                        ItemTitle = doc.Fields.ContainsField("Title") ? doc.Title : "" };

foreach (var doc in docs)
  Label1.Text += SPEncode.HtmlEncode(doc.WebTitle) + " -- " +
                 SPEncode.HtmlEncode(doc.ListTitle) + " -- " +
                 SPEncode.HtmlEncode(doc.ItemTitle) + "<BR>";

Example 3: SPFieldUser

Finally, let’s pull a list of all user fields attached to lists in the root web. This could also be used to easily find instances of a custom field type.

var userFields = from SPList list in site.RootWeb.Lists
                 from fld in list.Fields.OfType<SPFieldUser>()
                 select new
                 {
                     ListTitle = list.Title,
                     FieldTitle = fld.Title,
                     InternalName = fld.InternalName,
                     PresenceEnabled = fld.Presence
                 };

Contrived examples, perhaps, but potentially useful nonetheless.

LINQ for SPWebCollection Revisited: AsSafeEnumerable

I have previously discussed implementations of some core LINQ extension methods for SPWebCollection, a process complicated by the fact that SPWeb is IDisposable. The core logic is correct, but the lack of try...finally means exceptions will cause leaks. Since this affects any enumeration of the list, not just LINQ-specific operations, I started to look into making the enumerator smarter. This turned out to be much simpler than I expected:

public static IEnumerable<SPWeb> AsSafeEnumerable(this SPWebCollection webs)
{
  foreach (SPWeb web in webs)
  {
    try
    {
      yield return web;
    }
    finally
    {
      web.Dispose();
    }
  }
}

Internally, this logic is wrapped in a generated class (disassembled at the end of this post) that implements IEnumerable<SPWeb> and IEnumerator<SPWeb>. The finally block is compiled into its own method which is called in two places: MoveNext() before moving to the next item, and Dispose(). An often-overlooked tidbit about foreach is that the compiler will generate a try...finally block if the enumerator is (or might be) IDisposable, which ours is (inherited from IEnumerator<T>). So whether the loop advances normally or exits early (exceptions, breaks, etc), the current SPWeb will be disposed properly.
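The early-exit case is the one worth seeing in action. The sketch below uses a hypothetical FakeWeb class and an array instead of SPWebCollection, so unlike the real collection the items all exist up front; only the disposal behavior is being illustrated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SPWeb that records disposal.
public sealed class FakeWeb : IDisposable
{
    public readonly string Title;
    public bool IsDisposed { get; private set; }

    public FakeWeb(string title) { Title = title; }

    public void Dispose() { IsDisposed = true; }
}

public static class SafeEnumerableDemo
{
    // Same try/finally shape as AsSafeEnumerable above.
    public static IEnumerable<FakeWeb> AsSafeEnumerable(this IEnumerable<FakeWeb> webs)
    {
        foreach (FakeWeb web in webs)
        {
            try { yield return web; }
            finally { web.Dispose(); }
        }
    }

    // Stops at "b" and reports which items were disposed.
    public static bool[] BreakEarly()
    {
        var webs = new[] { new FakeWeb("a"), new FakeWeb("b"), new FakeWeb("c") };
        foreach (var web in webs.AsSafeEnumerable())
        {
            if (web.Title == "b")
                break;   // foreach disposes the enumerator, which runs the finally block
        }
        return webs.Select(w => w.IsDisposed).ToArray();
    }
}
```

After the break, "a" has been disposed by the normal advance, "b" by the enumerator's Dispose(), and "c" is untouched because the loop never reached it.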

The bonus in handling disposal through the enumerator is that our LINQ extension methods can be delegated to the framework:

public static IEnumerable<SPWeb> Where(this SPWebCollection webs,
                                       Func<SPWeb, bool> predicate)
{
  return webs.AsSafeEnumerable().Where(predicate);
}
public static void ForEach(this SPWebCollection webs,
                           Action<SPWeb> action,
                           Func<SPWeb, bool> predicate)
{
  foreach (SPWeb web in webs.Where(predicate))
    action(web);
}

public static void ForEach(this SPWebCollection webs, Action<SPWeb> action)
{
  webs.ForEach(action, w => true);
}

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs,
                                                   Func<SPWeb, TResult> selector,
                                                   Func<SPWeb, bool> predicate)
{
  return webs.Where(predicate).Select(selector);
}

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs,
                                                   Func<SPWeb, TResult> selector)
{
  return webs.AsSafeEnumerable().Select(selector);
}

This approach was always possible for collections of normal objects using Cast<T>(), but is now available for SPWebCollection and SPSiteCollection as well.

Using with LINQ

I’ve talked a lot about LINQ-enabling SharePoint collections, but I’ve never posted actual LINQ code using it. With the above extensions defined, we can safely use LINQ against the original collection:

var webs = from w in site.AllWebs
           where string.IsNullOrEmpty(w.Theme)
           orderby w.Title
           select w.Title;

Note in this example that we can use orderby, even though we haven’t defined OrderBy() on SPWebCollection, because the call is chained from the IEnumerable<SPWeb> returned by Where(). However, this won’t compile:

var webs = from w in site.AllWebs
            orderby w.Title
            select w.Title;

Giving the following error:

Could not find an implementation of the query pattern for source type ‘Microsoft.SharePoint.SPWebCollection’.  ‘OrderBy’ not found.  Consider explicitly specifying the type of the range variable ‘w’.

We also don’t want to take the suggestion to specify a type for ‘w’:

var webs = from SPWeb w in site.AllWebs
            orderby w.Title
            select w.Title;

As this inserts a Cast, leaking every SPWeb:

IEnumerable<string> webs2 = site.AllWebs.Cast<SPWeb>().OrderBy<SPWeb, string>(delegate (SPWeb w) {
  return w.Title;
}).Select<SPWeb, string>(delegate (SPWeb w) {
  return w.Title;
});

For operators we haven’t defined, we can always call AsSafeEnumerable() to get an IEnumerable<SPWeb> that has everything:

var lists = from w in site.AllWebs.AsSafeEnumerable()
            from SPList l in w.Lists
            where !l.Hidden && !w.IsRootWeb
            select new { WebTitle = w.Title, ListTitle = l.Title };

AsSafeEnumerable Disassembled

[CompilerGenerated]
private sealed class <AsSafeEnumerable>d__0 : IEnumerable<SPWeb>, IEnumerable,
                                              IEnumerator<SPWeb>, IEnumerator, IDisposable
{
    // Fields
    private int <>1__state;
    private SPWeb <>2__current;
    public SPWebCollection <>3__webs;
    public IEnumerator <>7__wrap2;
    public IDisposable <>7__wrap3;
    private int <>l__initialThreadId;
    public SPWeb <web>5__1;
    public SPWebCollection webs;

    // Methods
    [DebuggerHidden]
    public <AsSafeEnumerable>d__0(int <>1__state)
    {
        this.<>1__state = <>1__state;
        this.<>l__initialThreadId = Thread.CurrentThread.ManagedThreadId;
    }

    private void <>m__Finally4()
    {
        this.<>1__state = -1;
        this.<>7__wrap3 = this.<>7__wrap2 as IDisposable;
        if (this.<>7__wrap3 != null)
        {
            this.<>7__wrap3.Dispose();
        }
    }

    private void <>m__Finally5()
    {
        this.<>1__state = 1;
        this.<web>5__1.Dispose();
    }

    private bool MoveNext()
    {
        bool flag;
        try
        {
            int num = this.<>1__state;
            if (num != 0)
            {
                if (num != 3)
                {
                    goto Label_009A;
                }
                goto Label_0073;
            }
            this.<>1__state = -1;
            this.<>7__wrap2 = this.webs.GetEnumerator();
            this.<>1__state = 1;
            while (this.<>7__wrap2.MoveNext())
            {
                this.<web>5__1 = (SPWeb) this.<>7__wrap2.Current;
                this.<>1__state = 2;
                this.<>2__current = this.<web>5__1;
                this.<>1__state = 3;
                return true;
            Label_0073:
                this.<>1__state = 2;
                this.<>m__Finally5();
            }
            this.<>m__Finally4();
        Label_009A:
            flag = false;
        }
        fault
        {
            this.System.IDisposable.Dispose();
        }
        return flag;
    }

    [DebuggerHidden]
    IEnumerator<SPWeb> IEnumerable<SPWeb>.GetEnumerator()
    {
        Enumerable.<AsSafeEnumerable>d__0 d__;
        if ((Thread.CurrentThread.ManagedThreadId == this.<>l__initialThreadId) && (this.<>1__state == -2))
        {
            this.<>1__state = 0;
            d__ = this;
        }
        else
        {
            d__ = new Enumerable.<AsSafeEnumerable>d__0(0);
        }
        d__.webs = this.<>3__webs;
        return d__;
    }

    [DebuggerHidden]
    IEnumerator IEnumerable.GetEnumerator()
    {
        return this.System.Collections.Generic.IEnumerable<Microsoft.SharePoint.SPWeb>.GetEnumerator();
    }

    [DebuggerHidden]
    void IEnumerator.Reset()
    {
        throw new NotSupportedException();
    }

    void IDisposable.Dispose()
    {
        switch (this.<>1__state)
        {
            case 1:
            case 2:
            case 3:
                try
                {
                    switch (this.<>1__state)
                    {
                    }
                    break;
                    try
                    {
                    }
                    finally
                    {
                        this.<>m__Finally5();
                    }
                }
                finally
                {
                    this.<>m__Finally4();
                }
                break;
        }
    }

    // Properties
    SPWeb IEnumerator<SPWeb>.Current
    {
        [DebuggerHidden]
        get
        {
            return this.<>2__current;
        }
    }

    object IEnumerator.Current
    {
        [DebuggerHidden]
        get
        {
            return this.<>2__current;
        }
    }
}
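For comparison, here is a sketch of the kind of iterator that compiles down to a state machine of this shape: a foreach with a try/finally wrapped around the yield return. It is written against a hypothetical FakeWeb stand-in and a plain non-generic IEnumerable source so that it runs without SharePoint; treat it as an illustration of the pattern rather than the actual extension.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for SPWeb so the sketch runs without SharePoint.
public sealed class FakeWeb : IDisposable
{
    public string Title { get; private set; }
    public bool Disposed { get; private set; }
    public FakeWeb(string title) { Title = title; }
    public void Dispose() { Disposed = true; }
}

public static class SafeEnumerableSketch
{
    // Takes a non-generic IEnumerable, mirroring a collection that does not
    // implement IEnumerable<T>.
    public static IEnumerable<FakeWeb> AsSafeEnumerable(this IEnumerable webs)
    {
        foreach (FakeWeb web in webs)
        {
            try
            {
                yield return web;
            }
            finally
            {
                // Runs when the consumer calls MoveNext() again, or when the
                // enumerator is disposed early (for example by a break).
                web.Dispose();
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var webs = new ArrayList { new FakeWeb("A"), new FakeWeb("B"), new FakeWeb("C") };

        foreach (FakeWeb web in webs.AsSafeEnumerable())
        {
            Console.WriteLine(web.Title);
            if (web.Title == "B") break; // early exit still disposes B
        }

        // A and B were handed out and disposed; C was never reached.
        foreach (FakeWeb web in webs)
            Console.WriteLine(web.Title + " disposed: " + web.Disposed);
    }
}
```

The inner finally corresponds to <>m__Finally5() in the disassembly, and the outer one (generated for the foreach itself) to <>m__Finally4().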

Implementing LINQ Select for SPWebCollection

If I’m going to suggest that others use yield return, I suppose I should take my own advice. In particular, my original implementation of SPWebCollection.Select can be improved using the same techniques I used for Where:

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs,
                                                   Func<SPWeb, TResult> selector)
{
    foreach (SPWeb web in webs)
    {
        TResult ret = selector(web);
        web.Dispose();
        yield return ret;
    }
}

And we might as well support a filtered Select as well:

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs,
                                                   Func<SPWeb, TResult> selector,
                                                   Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs.Where(predicate))
    {
        TResult ret = selector(web);
        web.Dispose();
        yield return ret;
    }
}

Implementations using Cast<T>() are left as an exercise for the reader.

The function itself isn’t particularly interesting, but I did stumble on something I found rather surprising. When I first wrote up my function, I typed the following:

public static IEnumerable<TResult> Select<TResult>(this SPWebCollection webs,
                                                   Func<SPWeb, TResult> selector)
{
    foreach (SPWeb web in webs)
    {
        yield return selector(web);
        web.Dispose();
    }
}

I’m really not sure why I typed it that way… obviously you can’t keep going after you return, right? Well, it turns out you can. The generated class just waits to call Dispose() until the next call to MoveNext(), effectively picking up where it left off. Here’s what that looks like in Reflector:

switch (this.<>1__state)
{
    case 0:
        this.<>1__state = -1;
        this.<>7__wrap1a = this.webs.GetEnumerator();
        this.<>1__state = 1;
        while (this.<>7__wrap1a.MoveNext())
        {
            this.<web>5__19 = (SPWeb) this.<>7__wrap1a.Current;
            this.<>2__current = this.selector(this.<web>5__19);
            this.<>1__state = 2;
            return true;
        Label_0080:
            this.<>1__state = 1;
            this.<web>5__19.Dispose();
        }
        this.<>m__Finally1c();
        break;

    case 2:
        goto Label_0080;
}
return false;

As Select will almost always be used in a foreach, with back-to-back calls of MoveNext(), the distinction is mostly academic. Still, I prefer to know that the web will be disposed immediately after the selector is done with it.
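To make the timing difference visible, the sketch below logs when each item is disposed relative to when the consumer sees its result. Item, YieldThenDispose, DisposeThenYield and Trace are hypothetical names for illustration, with Item standing in for SPWeb.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SPWeb that records its disposal in a shared log.
public sealed class Item : IDisposable
{
    private readonly List<string> _log;
    public string Name { get; private set; }
    public Item(string name, List<string> log) { Name = name; _log = log; }
    public void Dispose() { _log.Add("dispose " + Name); }
}

public static class TimingSketch
{
    // Dispose() runs only when the consumer asks for the next element.
    public static IEnumerable<string> YieldThenDispose(IEnumerable<Item> items)
    {
        foreach (Item item in items)
        {
            yield return item.Name;
            item.Dispose();
        }
    }

    // Dispose() runs before the consumer ever sees the result.
    public static IEnumerable<string> DisposeThenYield(IEnumerable<Item> items)
    {
        foreach (Item item in items)
        {
            string name = item.Name;
            item.Dispose();
            yield return name;
        }
    }

    // Enumerates a two-item sequence and returns the interleaved event log.
    public static List<string> Trace(bool yieldFirst)
    {
        var log = new List<string>();
        var items = new List<Item> { new Item("a", log), new Item("b", log) };
        IEnumerable<string> names = yieldFirst ? YieldThenDispose(items) : DisposeThenYield(items);
        foreach (string name in names)
            log.Add("consume " + name);
        return log;
    }
}

public static class Program
{
    public static void Main()
    {
        // consume a, dispose a, consume b, dispose b
        Console.WriteLine(string.Join(", ", TimingSketch.Trace(true)));
        // dispose a, consume a, dispose b, consume b
        Console.WriteLine(string.Join(", ", TimingSketch.Trace(false)));
    }
}
```

One case where the distinction stops being academic: if the consumer stops early (a break, or a call like First()), the yield-first version never reaches the Dispose() call for the last item it handed out.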

Implementing LINQ Where for SharePoint

First, a huge thanks to Waldek Mastykarz for running with my suggestion to run some performance tests on list item filtering. In short, CAML wins by a factor of 300, which I expect would be even more pronounced on larger lists and under load.

In his test, Waldek implements Where() as follows:

public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    List<SPListItem> list = new List<SPListItem>();
    foreach (SPListItem item in items)
        if (predicate(item))
            list.Add(item);
    return list;
}

This works as expected, but allocates a secondary data structure to store the filtered items. The preferred approach is to use the yield return syntax:

public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    foreach (SPListItem item in items)
        if (predicate(item))
            yield return item;
}

The actual IL this generates is too complex to go into here, but I highly suggest checking it out in Reflector. In short, the compiler creates a private class that provides a filtered enumerator without actually building an intermediate data structure, instead filtering in MoveNext(). Using yield also defers execution until the collection is actually enumerated, though I can’t think of a SharePoint example where this would actually matter.
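A small sketch of that deferral, using plain integers instead of SPListItem so it runs anywhere. WhereEager and WhereLazy are hypothetical names for the list-building and yield-based versions; the counter shows when the predicate actually runs.

```csharp
using System;
using System.Collections.Generic;

public static class DeferralSketch
{
    // List-building version: the predicate runs for every item before returning.
    public static IEnumerable<int> WhereEager(IEnumerable<int> items, Func<int, bool> predicate)
    {
        var list = new List<int>();
        foreach (int item in items)
            if (predicate(item))
                list.Add(item);
        return list;
    }

    // Iterator version: nothing runs until the result is enumerated.
    public static IEnumerable<int> WhereLazy(IEnumerable<int> items, Func<int, bool> predicate)
    {
        foreach (int item in items)
            if (predicate(item))
                yield return item;
    }
}

public static class Program
{
    public static void Main()
    {
        int calls = 0;
        Func<int, bool> isEven = n => { calls++; return n % 2 == 0; };
        int[] source = { 1, 2, 3, 4 };

        var eager = DeferralSketch.WhereEager(source, isEven);
        Console.WriteLine(calls); // 4: filtering already happened

        calls = 0;
        var lazy = DeferralSketch.WhereLazy(source, isEven);
        Console.WriteLine(calls); // 0: nothing has run yet

        foreach (int n in lazy)
        {
            Console.WriteLine(n); // 2
            break;                // stop after the first match
        }
        Console.WriteLine(calls); // 2: only 1 and 2 were tested
    }
}
```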

Another alternative, which also defers execution, is to combine LINQ’s Cast<T>() operator with its standard Where() for IEnumerable<T>:

public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    return items.Cast<SPListItem>().Where(predicate);
}

I imagine the compiler would optimize the yield-generated class in much the same way it optimizes LINQ’s internal implementation, but I will leave that research for a later date. It would also be interesting to compare the performance between the different implementations, though in a SharePoint context I expect the difference would be insignificant compared with the more expensive operations needed to retrieve the data.

The Problem with IDisposable

In my previous post, I suggested a Dispose-safe implementation of SPWebCollection.ForEach(), which Waldek leveraged for his Where implementation. Presumably because he was concerned about leaking SPWebs, his Where() implementation just returns a list of the web IDs. While avoiding leaks is smart, an ID isn’t nearly as useful as the full SPWeb and opening a new SPWeb from the ID is an expensive operation. What if I wanted a filtered enumeration of the SPWeb objects?

Well, if we use one of the patterns described above, we should be safe as long as we call Dispose() on each web when we’re done, right? I probably wouldn’t bother asking if there weren’t a catch, so I’ll answer my question with another question: When would Dispose() be called on the webs for which the predicate is false? It wouldn’t! To prevent these leaks, we need to be a bit more sophisticated:

public static IEnumerable<SPWeb> Where(this SPWebCollection webs,
                                       Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs)
        if (predicate(web))
            yield return web;
        else
            web.Dispose();
}

Or using Cast<T>():

public static IEnumerable<SPWeb> Where(this SPWebCollection webs,
                                       Func<SPWeb, bool> predicate)
{
    return webs.Cast<SPWeb>().Where(w =>
    {
        bool r = predicate(w);
        if (!r)
            w.Dispose();
        return r;
    });
}

Again, a detailed IL investigation would likely prove one preferable to the other, but the principle is the same.
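The disposal behaviour is easy to check without a SharePoint farm. The sketch below uses a hypothetical FakeWeb stand-in for SPWeb and a hypothetical WhereSafe extension over a non-generic IEnumerable; it follows the same shape as the yield-based version and shows that rejected items are disposed by the filter while accepted ones remain the caller’s responsibility.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for SPWeb so the sketch runs without SharePoint.
public sealed class FakeWeb : IDisposable
{
    public string Title { get; private set; }
    public bool Disposed { get; private set; }
    public FakeWeb(string title) { Title = title; }
    public void Dispose() { Disposed = true; }
}

public static class WhereSketch
{
    // Same shape as the yield-based Where, over a non-generic source.
    public static IEnumerable<FakeWeb> WhereSafe(this IEnumerable webs, Func<FakeWeb, bool> predicate)
    {
        foreach (FakeWeb web in webs)
            if (predicate(web))
                yield return web;   // accepted: the caller must dispose it
            else
                web.Dispose();      // rejected: nobody else will ever see it
    }
}

public static class Program
{
    public static void Main()
    {
        var webs = new ArrayList { new FakeWeb("Home"), new FakeWeb("Archive"), new FakeWeb("Blog") };

        foreach (FakeWeb web in webs.WhereSafe(w => w.Title != "Archive"))
        {
            // Accepted webs arrive undisposed, so the caller cleans up.
            Console.WriteLine(web.Title + " disposed inside loop: " + web.Disposed);
            web.Dispose();
        }

        // By now every web, including the rejected "Archive", is disposed.
        foreach (FakeWeb web in webs)
            Console.WriteLine(web.Title + " disposed at end: " + web.Disposed);
    }
}
```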

Finally, since caller-dependent disposal is unreliable and delegates are fun, I figure we could use a Dispose-safe filtered ForEach:

public static void ForEach(this SPWebCollection webs,
                           Action<SPWeb> action,
                           Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs.Where(predicate))
    {
        try
        {
            action(web);
        }
        finally
        {
            // Dispose even if the action throws.
            web.Dispose();
        }
    }
}

Which would let us do something like this to print all publishing sites in a collection:

site.AllWebs.ForEach(
    w => Console.WriteLine(w.Title),
    w => w.Features.Contains(FeatureIds.OfficePublishingWeb)
);

That is, if we define yet another useful, if simple, extension method:

public static bool Contains(this SPFeatureCollection features, Guid featureID)
{
    return features[featureID] != null;
}

And a final note: why did the LINQ team use Func<T, bool> instead of Predicate<T>, which has existed since .NET 2.0?