First, a huge thanks to Waldek Mastykarz for taking up my suggestion to run some performance tests on list item filtering. In short, CAML wins by a factor of 300, a gap I expect would be even more pronounced on larger lists and under load.
In his test, Waldek implements Where() as follows:
public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    List<SPListItem> list = new List<SPListItem>();
    foreach (SPListItem item in items)
        if (predicate(item))
            list.Add(item);
    return list;
}
This works as expected, but allocates a secondary data structure to store the filtered items. The preferred approach is to use the yield return syntax:
public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    foreach (SPListItem item in items)
        if (predicate(item))
            yield return item;
}
The actual IL this generates is too complex to go into here, but I highly suggest checking it out in Reflector. In short, the compiler creates a private class that provides a filtered enumerator without actually building an intermediate data structure, instead filtering in MoveNext(). Using yield also defers execution until the collection is actually enumerated, though I can’t think of a SharePoint example where this would actually matter.
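To make the deferred-execution point concrete without a SharePoint farm, here is a minimal sketch that runs the two approaches over plain integers. The names FilterDemo, WhereEager and WhereLazy are made up for illustration; the method bodies mirror the list-building and yield-based versions above.

```csharp
using System;
using System.Collections.Generic;

public static class FilterDemo
{
    // Eager: runs the predicate over every item and builds a list
    // before returning anything to the caller.
    public static IEnumerable<int> WhereEager(IEnumerable<int> items,
                                              Func<int, bool> predicate)
    {
        List<int> list = new List<int>();
        foreach (int item in items)
            if (predicate(item))
                list.Add(item);
        return list;
    }

    // Lazy: nothing in this body runs until the result is enumerated;
    // the filtering then happens step by step in the generated MoveNext().
    public static IEnumerable<int> WhereLazy(IEnumerable<int> items,
                                             Func<int, bool> predicate)
    {
        foreach (int item in items)
            if (predicate(item))
                yield return item;
    }

    public static void Main()
    {
        int calls = 0;
        Func<int, bool> isEven = n => { calls++; return n % 2 == 0; };
        int[] source = { 1, 2, 3, 4 };

        IEnumerable<int> eager = WhereEager(source, isEven);
        Console.WriteLine("After eager call: {0}", calls);        // 4

        calls = 0;
        IEnumerable<int> lazy = WhereLazy(source, isEven);
        Console.WriteLine("After lazy call: {0}", calls);         // 0

        foreach (int n in lazy) { }
        Console.WriteLine("After enumerating lazy: {0}", calls);  // 4
    }
}
```

The predicate count is the only observable difference here: the eager version pays for the whole pass up front, while the lazy version pays only when (and if) someone enumerates the result.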
Another alternative, which also defers execution, is to combine LINQ’s Cast<T>() operator with its Where() operator for IEnumerable<T>:
public static IEnumerable<SPListItem> Where(this SPListItemCollection items,
                                            Func<SPListItem, bool> predicate)
{
    return items.Cast<SPListItem>().Where(predicate);
}
I imagine the compiler would optimize the yield-generated class in much the same way it optimizes LINQ’s internal implementation, but I will leave that research for a later date. It would also be interesting to compare the performance between the different implementations, though in a SharePoint context I expect the difference would be insignificant compared with the more expensive operations needed to retrieve the data.
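For anyone who wants to poke at the Cast<T>() version outside SharePoint, the following is a rough sketch using a hypothetical LegacyCollection as a stand-in for a collection that only implements the non-generic IEnumerable. The class, its EnumerationCount counter and the sample strings are all assumptions made for the demo; only Cast<T>(), Where() and ToList() are real LINQ operators.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a pre-generics collection.
// It counts how often an enumerator is requested.
public class LegacyCollection : IEnumerable
{
    private readonly ArrayList items = new ArrayList();

    public int EnumerationCount { get; private set; }

    public void Add(string item) { items.Add(item); }

    public IEnumerator GetEnumerator()
    {
        EnumerationCount++;
        return items.GetEnumerator();
    }
}

public static class LegacyCollectionExtensions
{
    // Cast<string>() bridges the non-generic IEnumerable to IEnumerable<string>,
    // which is what LINQ's Where() needs.
    public static IEnumerable<string> Where(this LegacyCollection items,
                                            Func<string, bool> predicate)
    {
        return items.Cast<string>().Where(predicate);
    }
}

public static class Program
{
    public static void Main()
    {
        LegacyCollection titles = new LegacyCollection { "Home", "News", "Help" };

        IEnumerable<string> filtered =
            titles.Where(t => t.StartsWith("H", StringComparison.Ordinal));
        Console.WriteLine(titles.EnumerationCount);               // 0: nothing enumerated yet

        List<string> result = filtered.ToList();
        Console.WriteLine(titles.EnumerationCount);               // 1: a single pass
        Console.WriteLine(string.Join(", ", result.ToArray()));   // Home, Help
    }
}
```

The counter confirms that this version is deferred as well, and that the Cast<T>() and Where() stages share a single pass over the underlying collection.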
The Problem with IDisposable
In my previous post, I suggested a Dispose-safe implementation of SPWebCollection.ForEach(), which Waldek leveraged for his Where implementation. Presumably because he was concerned about leaking SPWebs, his Where() implementation just returns a list of the web IDs. While avoiding leaks is smart, an ID isn’t nearly as useful as the full SPWeb, and opening a new SPWeb from the ID is an expensive operation. What if I wanted a filtered enumeration of the SPWeb objects?
Well, if we use one of the patterns described above, we should be safe as long as we call Dispose() on each web when we’re done, right? I probably wouldn’t bother asking if there weren’t a catch, so I’ll answer my question with another question: When would Dispose() be called on the webs for which the predicate is false? It wouldn’t! To prevent these leaks, we need to be a bit more sophisticated:
public static IEnumerable<SPWeb> Where(this SPWebCollection webs,
                                       Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs)
        if (predicate(web))
            yield return web;
        else
            web.Dispose();
}
Or using Cast<T>():
public static IEnumerable<SPWeb> Where(this SPWebCollection webs,
                                       Func<SPWeb, bool> predicate)
{
    return webs.Cast<SPWeb>().Where(w =>
    {
        bool r = predicate(w);
        if (!r)
            w.Dispose();
        return r;
    });
}
Again, a detailed IL investigation would likely prove one preferable to the other, but the principle is the same.
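The disposal behavior is easy to check with a fake. Below is a minimal sketch in which FakeWeb is a hypothetical stand-in for SPWeb that simply records whether Dispose() was called, and WhereOrDispose is an invented name for the same yield-based pattern shown above. None of this touches the real SharePoint object model.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SPWeb: remembers whether it was disposed.
public sealed class FakeWeb : IDisposable
{
    public FakeWeb(string title, bool isPublishing)
    {
        Title = title;
        IsPublishing = isPublishing;
    }

    public string Title { get; private set; }
    public bool IsPublishing { get; private set; }
    public bool IsDisposed { get; private set; }

    public void Dispose() { IsDisposed = true; }
}

public static class FakeWebExtensions
{
    // Same shape as the SPWebCollection version: rejected items are disposed
    // inside the iterator, accepted items are handed to the caller undisposed.
    public static IEnumerable<FakeWeb> WhereOrDispose(this IEnumerable<FakeWeb> webs,
                                                      Func<FakeWeb, bool> predicate)
    {
        foreach (FakeWeb web in webs)
            if (predicate(web))
                yield return web;
            else
                web.Dispose();
    }
}

public static class Program
{
    public static void Main()
    {
        List<FakeWeb> webs = new List<FakeWeb>
        {
            new FakeWeb("Home", true),
            new FakeWeb("Team", false),
            new FakeWeb("News", true)
        };

        foreach (FakeWeb web in webs.WhereOrDispose(w => w.IsPublishing))
        {
            Console.WriteLine(web.Title);   // prints Home, then News
            web.Dispose();                  // matches are still the caller's job
        }

        foreach (FakeWeb web in webs)
            Console.WriteLine("{0}: disposed = {1}", web.Title, web.IsDisposed);
    }
}
```

Note what the sketch does not cover: if the caller forgets the Dispose() call inside the loop, the matching items leak, which is exactly the caller-dependent weakness discussed next.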
Finally, since caller-dependent disposal is unreliable and delegates are fun, I figure we could use a Dispose-safe filtered ForEach:
public static void ForEach(this SPWebCollection webs,
                           Action<SPWeb> action,
                           Func<SPWeb, bool> predicate)
{
    foreach (SPWeb web in webs.Where(predicate))
    {
        action(web);
        web.Dispose();
    }
}
Which would let us do something like this to print all publishing sites in a collection:
site.AllWebs.ForEach(
    w => Console.WriteLine(w.Title),
    w => w.Features.Contains(FeatureIds.OfficePublishingWeb)
);
That is, if we define yet another useful, if simple, extension method:
public static bool Contains(this SPFeatureCollection features, Guid featureID)
{
    return features[featureID] != null;
}
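One gap worth considering in the filtered ForEach: if the action throws, the web being processed never reaches its Dispose() call. A possible hardening, sketched here against a hypothetical TrackedResource stand-in rather than the real SPWeb, is to fold the predicate into the loop and dispose in a finally block. The names TrackedResource and ForEachThenDispose are invented for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SPWeb: remembers whether it was disposed.
public sealed class TrackedResource : IDisposable
{
    public TrackedResource(string name) { Name = name; }

    public string Name { get; private set; }
    public bool IsDisposed { get; private set; }

    public void Dispose() { IsDisposed = true; }
}

public static class TrackedResourceExtensions
{
    // Every item visited is disposed, whether it matched, was rejected,
    // or caused the predicate or the action to throw.
    public static void ForEachThenDispose(this IEnumerable<TrackedResource> items,
                                          Action<TrackedResource> action,
                                          Func<TrackedResource, bool> predicate)
    {
        foreach (TrackedResource item in items)
        {
            try
            {
                if (predicate(item))
                    action(item);
            }
            finally
            {
                item.Dispose();
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        List<TrackedResource> items = new List<TrackedResource>
        {
            new TrackedResource("A"),
            new TrackedResource("skip"),
            new TrackedResource("boom")
        };

        try
        {
            items.ForEachThenDispose(
                r => { if (r.Name == "boom") throw new InvalidOperationException(); },
                r => r.Name != "skip");
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("action threw");
        }

        foreach (TrackedResource r in items)
            Console.WriteLine("{0}: disposed = {1}", r.Name, r.IsDisposed);
    }
}
```

The exception still propagates to the caller, so items after the failing one are never visited; the finally block only guarantees cleanup for the items the loop actually reached.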
And a final note: why did the LINQ team use Func<T, bool> instead of Predicate<T>, which has existed since .NET 2.0?