Join SharePoint Lists with LINQ

I just read yet another post by Adam Buenz that got me thinking, this time about querying multiple SharePoint lists. Here’s the code he came up with:

var resultSet  = list1.Items.Cast<SPListItem>()
.Where(i => Equals (String.Compare(i["Property To Match #1"].ToString(), "Example String Literal"), 0))
.SelectMany(x => list2.Items.Cast<SPListItem>()
    .Where(i => Equals(String.Compare(new SPFieldLookupValue(x["Client"].ToString()).LookupValue, (string) i["Property To Match #2"]), 0)));

My first thought was that we could make it more readable with LINQ syntax:

var res = from SPListItem pi in list1.Items
          where pi["Property To Match #1"] as string == "Example String Literal"
          from SPListItem ci in list2.Items
          where new SPFieldLookupValue(ci["Client"] as string).LookupValue == pi["Property To Match #2"]
          select new { Parent = pi, Child = ci };

Behind the scenes, this will translate into equivalent extension method calls. The other adjustments are based on personal preference: ToString() can cause null reference exceptions, as string will not; and String.Compare() != String.Equals().

Next, let’s optimize the actual SharePoint queries. As a general rule we should always specify the desired ViewFields to eliminate unused data, and our first where clause should be handled with CAML if possible [see also, Is it a good idea to use lambda expressions for querying SharePoint data?].

var pItems = list1.GetItems(new SPQuery() {
    Query = "... ['Property To Match #1'] == 'Example String Literal'...",
    ViewFields = "..."
});
var cItems = list2.GetItems(new SPQuery() {
    ViewFields = "..."
});
var res = from SPListItem pi in pItems
          from SPListItem ci in cItems
          where new SPFieldLookupValue(ci["Client"] as string).LookupValue == pi["Property To Match #2"]
          select new { Parent = pi, Child = ci };

Now that we’re getting our data as efficiently as possible, we can look at what LINQ is doing with them. Behind the scenes, SelectMany is essentially implemented like this:

public static IEnumerable<TResult> SelectMany<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, IEnumerable<TResult>> selector)
{
    foreach(TSource element in source)
        foreach(TResult childElement in selector(element))
            yield return childElement;
}

For each item in our parent collection (source), the entire child collection is enumerated in search of items that match the predicate. This seems rather inefficient since we’re comparing the same values each time. Conveniently, LINQ provides a join operator for this purpose:

var res = from SPListItem pi in pItems
          join SPListItem ci in cItems
              on pi["Property To Match #2"]
              equals new SPFieldLookupValue(ci["Client"] as string).LookupValue
          select new { Parent = pi, Child = ci };

Behind the scenes, this translates into a call to the Join method:

var res = pItems.Cast<SPListItem>().Join(cItems.Cast<SPListItem>(),
              pi => pi["Property To Match #2"],
              ci => new SPFieldLookupValue(ci["Client"] as string).LookupValue,
              (pi, ci) => new { Parent = pi, Child = ci }
          );

Note that the left- and right-hand sides of the equals keyword are treated separately. The left-hand side operates on the first collection, the right-hand side operates on the second collection, and obviously both expressions must return the same type. This might be easier to see from an implementation of Join:

public static IEnumerable<TResult> Join<TOuter, TInner, TKey, TResult>(
    this IEnumerable<TOuter> outer,
    IEnumerable<TInner> inner,
    Func<TOuter, TKey> outerKeySelector,
    Func<TInner, TKey> innerKeySelector,
    Func<TOuter, TInner, TResult> resultSelector)
{
    ILookup<TKey, TInner> lookup = inner.ToLookup(innerKeySelector);
    return from outerItem in outer
           from innerItem in lookup[outerKeySelector(outerItem)]
           select resultSelector(outerItem, innerItem);
}

So in our case, Join will build a lookup of all child items based on the lookup value, and then perform a SelectMany to cross join the parent items with the child items found from a lookup by the matched property. This dictionary lookup will almost certainly perform better than a full enumeration of the list, especially for larger lists and more complex keys.