First, a huge thanks to Waldek Mastykarz for taking up my suggestion to run some performance tests on list item filtering. In short, CAML wins by a factor of 300, and I expect the gap would be even more pronounced on larger lists and under load.
In his test, Waldek implements Where() as follows:
public static IEnumerable<SPListItem> Where(this SPListItemCollection items, Func<SPListItem, bool> predicate)
{
    List<SPListItem> list = new List<SPListItem>();
    foreach (SPListItem item in items)
        if (predicate(item))
            list.Add(item);
    return list;
}
This works as expected, but it allocates a secondary data structure to store the filtered items. The preferred approach is to use the yield return syntax:
public static IEnumerable<SPListItem> Where(this SPListItemCollection items, Func<SPListItem, bool> predicate)
{
    foreach (SPListItem item in items)
        if (predicate(item))
            yield return item;
}
The actual IL this generates is too complex to go into here, but I highly suggest checking it out in Reflector. In short, the compiler creates a private class that provides a filtered enumerator without actually building an intermediate data structure, instead filtering in MoveNext(). Using yield also defers execution until the collection is actually enumerated, though I can’t think of a SharePoint example where this would actually matter.
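To make the deferred-execution point concrete, here is a minimal sketch that needs no SharePoint at all. The names DeferredDemo, WhereEager, WhereLazy and CallsBeforeEnumeration are made up for illustration; the two filters mirror the list-based and yield-based versions above, but over a plain int array.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredDemo
{
    // Eager: runs the predicate on every item before returning.
    public static IEnumerable<T> WhereEager<T>(IEnumerable<T> items, Func<T, bool> predicate)
    {
        List<T> list = new List<T>();
        foreach (T item in items)
            if (predicate(item))
                list.Add(item);
        return list;
    }

    // Lazy: the body does not start running until the caller enumerates.
    public static IEnumerable<T> WhereLazy<T>(IEnumerable<T> items, Func<T, bool> predicate)
    {
        foreach (T item in items)
            if (predicate(item))
                yield return item;
    }

    // Returns how many times the predicate ran before any enumeration took place.
    public static int CallsBeforeEnumeration(bool lazy)
    {
        int calls = 0;
        Func<int, bool> isEven = n => { calls++; return n % 2 == 0; };
        int[] source = { 1, 2, 3, 4 };

        // The filtered sequence is built here but intentionally never enumerated.
        IEnumerable<int> result = lazy ? WhereLazy(source, isEven) : WhereEager(source, isEven);
        return calls;
    }
}
```

Calling DeferredDemo.CallsBeforeEnumeration(false) should report four predicate calls, while passing true should report none, because the yield-based version does no work until something iterates over it.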
Another alternative, which also defers execution, is to combine LINQ’s Cast<T>() and Where() operators, both of which are extension methods defined on System.Linq.Enumerable:
public static IEnumerable<SPListItem> Where(this SPListItemCollection items, Func<SPListItem, bool> predicate)
{
    return items.Cast<SPListItem>().Where(predicate);
}
I imagine the compiler would optimize the yield-generated class in much the same way it optimizes LINQ’s internal implementation, but I will leave that research for a later date. It would also be interesting to compare the performance between the different implementations, though in a SharePoint context I expect the difference would be insignificant compared with the more expensive operations needed to retrieve the data.
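If you want to try that comparison yourself, something like the following could serve as a starting point. It is only a sketch: FilterBench and its members are hypothetical names, and a non-generic collection of ints stands in for SPListItemCollection, so the timings say nothing about SharePoint itself.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class FilterBench
{
    // Variant 1: copy the matches into an intermediate list.
    public static IEnumerable<int> WithList(IEnumerable items, Func<int, bool> predicate)
    {
        List<int> list = new List<int>();
        foreach (int item in items)
            if (predicate(item))
                list.Add(item);
        return list;
    }

    // Variant 2: compiler-generated iterator via yield return.
    public static IEnumerable<int> WithYield(IEnumerable items, Func<int, bool> predicate)
    {
        foreach (int item in items)
            if (predicate(item))
                yield return item;
    }

    // Variant 3: LINQ's Cast<T>() followed by Where().
    public static IEnumerable<int> WithCast(IEnumerable items, Func<int, bool> predicate)
    {
        return items.Cast<int>().Where(predicate);
    }

    // Fully enumerates the filtered sequence, returning the match count
    // and reporting the elapsed time through the out parameter.
    public static int Measure(
        Func<IEnumerable, Func<int, bool>, IEnumerable<int>> filter,
        IEnumerable source,
        out TimeSpan elapsed)
    {
        Stopwatch watch = Stopwatch.StartNew();
        int matches = filter(source, n => n % 3 == 0).Count();
        watch.Stop();
        elapsed = watch.Elapsed;
        return matches;
    }
}
```

Passing FilterBench.WithList, FilterBench.WithYield and FilterBench.WithCast to Measure with the same ArrayList should give identical match counts; the elapsed times are only meaningful over many iterations and a warmed-up JIT.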
The Problem with IDisposable
In my previous post, I suggested a Dispose-safe implementation of SPWebCollection.ForEach(), which Waldek leveraged for his Where implementation. Presumably because he was concerned about leaking SPWebs, his Where() implementation just returns a list of the web IDs. While avoiding leaks is smart, an ID isn’t nearly as useful as the full SPWeb, and opening a new SPWeb from the ID is an expensive operation. What if I wanted a filtered enumeration of the SPWeb objects?
Well, if we use one of the patterns described above, we should be safe if we call Dispose() for each when we’re done, right? I probably wouldn’t bother asking if there weren’t a catch, so I’ll answer my question with another question: When would Dispose() be called on the webs for which the predicate is false? It wouldn’t! To prevent these leaks, we need to be a bit more sophisticated:
public static IEnumerable<SPWeb> Where(this SPWebCollection webs, Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs)
        if (predicate(web))
            yield return web;
        else
            web.Dispose();
}
Or using Cast<T>():
public static IEnumerable<SPWeb> Where(this SPWebCollection webs, Func<SPWeb, bool> predicate)
{
    return webs.Cast<SPWeb>().Where(w =>
    {
        bool r = predicate(w);
        if (!r)
            w.Dispose();
        return r;
    });
}
Again, a detailed IL investigation would likely prove one preferable to the other, but the principle is the same.
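The disposal behavior is easy to check without a SharePoint farm. The sketch below uses a hypothetical FakeWeb class as a stand-in for SPWeb, and WhereDisposing and KeptTitles are likewise names I made up; the filter itself follows the same yield-or-Dispose pattern.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SPWeb that just records whether Dispose() was called.
public sealed class FakeWeb : IDisposable
{
    public string Title { get; private set; }
    public bool IsDisposed { get; private set; }

    public FakeWeb(string title) { Title = title; }

    public void Dispose() { IsDisposed = true; }
}

public static class FakeWebExtensions
{
    // Yields the webs that match and disposes the ones that do not.
    public static IEnumerable<FakeWeb> WhereDisposing(this IEnumerable<FakeWeb> webs, Func<FakeWeb, bool> predicate)
    {
        foreach (FakeWeb web in webs)
        {
            if (predicate(web))
                yield return web;
            else
                web.Dispose();
        }
    }
}

public static class DisposeDemo
{
    // Enumerates the filtered webs and collects their titles.
    // The matching webs are deliberately NOT disposed here.
    public static List<string> KeptTitles(IEnumerable<FakeWeb> webs)
    {
        List<string> titles = new List<string>();
        foreach (FakeWeb web in webs.WhereDisposing(w => w.Title.StartsWith("B")))
            titles.Add(web.Title);
        return titles;
    }
}
```

Run against webs titled "Home", "Blog" and "Wiki", the rejected "Home" and "Wiki" end up disposed, while "Blog" stays open until the caller disposes it. The filter only cleans up what it rejects; everything it yields is still the caller's responsibility.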
Finally, since caller-dependent disposal is unreliable and delegates are fun, I figure we could use a Dispose-safe filtered ForEach:
public static void ForEach(this SPWebCollection webs, Action<SPWeb> action, Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs.Where(predicate))
    {
        try
        {
            action(web);
        }
        finally
        {
            // Dispose even if the action throws, so a failure doesn't leak the web.
            web.Dispose();
        }
    }
}
Which would let us do something like this to print all publishing sites in a collection:
site.AllWebs.ForEach(
    w => { Console.WriteLine(w.Title); },
    w => { return w.Features.Contains(FeatureIds.OfficePublishingWeb); }
);
That is, if we define yet another useful, if simple, extension method:
public static bool Contains(this SPFeatureCollection features, Guid featureID)
{
    return features[featureID] != null;
}
And a final note: why did the LINQ team use Func<T, bool> instead of Predicate<T>, which has existed since .NET 2.0?
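I don't have the answer, but the practical consequence is easy to show: the two delegate types have the same shape yet are not interchangeable. In this small sketch (DelegateDemo and Run are illustrative names), List<T>.FindAll() takes a Predicate<T>, Enumerable.Where() takes a Func<T, bool>, and moving between them requires an explicit re-wrap.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DelegateDemo
{
    public static int[] Run()
    {
        Predicate<int> isEven = n => n % 2 == 0;
        List<int> numbers = new List<int> { 1, 2, 3, 4 };

        // List<T>.FindAll() accepts the .NET 2.0 Predicate<T> directly.
        List<int> viaFindAll = numbers.FindAll(isEven);

        // numbers.Where(isEven) would not compile: there is no conversion
        // from Predicate<int> to Func<int, bool>, so the delegate is
        // re-wrapped through its Invoke method group.
        Func<int, bool> asFunc = isEven.Invoke;
        int[] viaWhere = numbers.Where(asFunc).ToArray();

        // Both routes select the same items; return one of them.
        return viaFindAll.SequenceEqual(viaWhere) ? viaWhere : new int[0];
    }
}
```

Both routes pick out the same even numbers; the only difference is which delegate type each API insists on.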