Hacking LINQ Expressions: Join With Comparer

In this installment of my Hacking LINQ series we’ll take a look at providing an IEqualityComparer for use in a LINQ join clause.

The Problem

Many of the Standard Query Operators require comparing sequence elements and the default query providers are kind enough to give us overloads that accept a suitable comparer. Among these operators, Join and GroupJoin have perhaps the most useful query syntax:

 var res = from s in States
          join a in AreaCodes
            on s.Abbr equals a.StateAbbr
          select new { s.Name, a.AreaCode };

While a bit more verbose, I find the intent much easier to read then the method equivalent:

var res = States.Join(AreaCodes,
                      s => s.Abbr, a => a.StateAbbr,
                      (s, a) => new { s.Name, a.AreaCode });

Or maybe I’ve just spent too much time in SQL. Either way, I thought it would be useful to support joins by a comparer.

The Goal

We will use another extension method to specify how the join should be performed:

var res = from s in States
          join a in AreaCodes.WithComparer(StringComparer.OrdinalIgnoreCase)
            on s.Abbr equals a.StateAbbr
          select new { s.Name, a.AreaCode };

We can also support the same syntax for group joins:

var res = from s in States
          join a in AreaCodes.WithComparer(StringComparer.OrdinalIgnoreCase)
            on s.Abbr equals a.StateAbbr into j
          select new { s.Name, Count = j.Count() };

The Hack

As with most LINQ hacks, we’re going to use the result of WithComparer to call a specialized version of Join or GroupJoin, in this case by providing a replacement for the join’s inner sequence:

var res = States.Join(AreaCodes.WithComparer(StringComparer.OrdinalIgnoreCase),
                      s => s.Abbr, a => a.StateAbbr,
                      (s, a) => new { s.Name, a.AreaCode });

Eventually leading to this method call:

var res = States.Join(AreaCodes,
                      s => s.Abbr, a => a.StateAbbr,
                      (s, a) => new { s.Name, a.AreaCode },
                      StringComparer.OrdinalIgnoreCase);

Since we need both the inner collection we’re extending and the comparer, we can guess our extension method will be implemented something like this:

public static JoinComparerProvider<T, TKey> WithComparer<T, TKey>(
    this IEnumerable<T> inner, IEqualityComparer<TKey> comparer)
{
    return new JoinComparerProvider<T, TKey>(inner, comparer);
}

With a trivial provider implementation:

public sealed class JoinComparerProvider<T, TKey>
{
    internal JoinComparerProvider(IEnumerable<T> inner, IEqualityComparer<TKey> comparer)
    {
        Inner = inner;
        Comparer = comparer;
    }

    public IEqualityComparer<TKey> Comparer { get; private set; }
    public IEnumerable<T> Inner { get; private set; }
}

The final piece is our Join overload:

public static IEnumerable<TResult> Join<TOuter, TInner, TKey, TResult>(
    this IEnumerable<TOuter> outer,
    JoinComparerProvider<TInner, TKey> inner,
    Func<TOuter, TKey> outerKeySelector,
    Func<TInner, TKey> innerKeySelector,
    Func<TOuter, TInner, TResult> resultSelector)
{
    return outer.Join(inner.Inner, outerKeySelector, innerKeySelector,
                      resultSelector, inner.Comparer);
}

Implementations of GroupJoin and their IQueryable counterparts are similarly trivial.

Posted in .NET, LINQ. Tags: , , . Comments Off on Hacking LINQ Expressions: Join With Comparer

Join SharePoint Lists with LINQ

I just read yet another post by Adam Buenz that got me thinking, this time about querying multiple SharePoint lists. Here’s the code he came up with:

var resultSet  = list1.Items.Cast<SPListItem>()
.Where(i => Equals (String.Compare(i["Property To Match #1"].ToString(), "Example String Literal"), 0))
.SelectMany(x => list2.Items.Cast<SPListItem>()
    .Where(i => Equals(String.Compare(new SPFieldLookupValue(x["Client"].ToString()).LookupValue, (string) i["Property To Match #2"]), 0)));

My first thought was that we could make it more readable with LINQ syntax:

var res = from SPListItem pi in list1.Items
          where pi["Property To Match #1"] as string == "Example String Literal"
          from SPListItem ci in list2.Items
          where new SPFieldLookupValue(ci["Client"] as string).LookupValue == pi["Property To Match #2"]
          select new { Parent = pi, Child = ci };

Behind the scenes, this will translate into equivalent extension method calls. The other adjustments are based on personal preference: ToString() can cause null reference exceptions, as string will not; and String.Compare() != String.Equals().

Next, let’s optimize the actual SharePoint queries. As a general rule we should always specify the desired ViewFields to eliminate unused data, and our first where clause should be handled with CAML if possible [see also, Is it a good idea to use lambda expressions for querying SharePoint data?].

var pItems = list1.GetItems(new SPQuery() {
    Query = "... ['Property To Match #1'] == 'Example String Literal'...",
    ViewFields = "..."
});
var cItems = list2.GetItems(new SPQuery() {
    ViewFields = "..."
});
var res = from SPListItem pi in pItems
          from SPListItem ci in cItems
          where new SPFieldLookupValue(ci["Client"] as string).LookupValue == pi["Property To Match #2"]
          select new { Parent = pi, Child = ci };

Now that we’re getting our data as efficiently as possible, we can look at what LINQ is doing with them. Behind the scenes, SelectMany is essentially implemented like this:

public static IEnumerable<TResult> SelectMany<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, IEnumerable<TResult>> selector)
{
    foreach(TSource element in source)
        foreach(TResult childElement in selector(element))
            yield return childElement;
}

For each item in our parent collection (source), the entire child collection is enumerated in search of items that match the predicate. This seems rather inefficient since we’re comparing the same values each time. Conveniently, LINQ provides a join operator for this purpose:

var res = from SPListItem pi in pItems
          join SPListItem ci in cItems
              on pi["Property To Match #2"]
              equals new SPFieldLookupValue(ci["Client"] as string).LookupValue
          select new { Parent = pi, Child = ci };

Behind the scenes, this translates into a call to the Join method:

var res = pItems.Cast<SPListItem>().Join(cItems.Cast<SPListItem>(),
              pi => pi["Property To Match #2"],
              ci => new SPFieldLookupValue(ci["Client"] as string).LookupValue,
              (pi, ci) => new { Parent = pi, Child = ci }
          );

Note that the left- and right-hand sides of the equals keyword are treated separately. The left-hand side operates on the first collection, the right-hand side operates on the second collection, and obviously both expressions must return the same type. This might be easier to see from an implementation of Join:

public static IEnumerable<TResult> Join<TOuter, TInner, TKey, TResult>(
    this IEnumerable<TOuter> outer,
    IEnumerable<TInner> inner,
    Func<TOuter, TKey> outerKeySelector,
    Func<TInner, TKey> innerKeySelector,
    Func<TOuter, TInner, TResult> resultSelector)
{
    ILookup<TKey, TInner> lookup = inner.ToLookup(innerKeySelector);
    return from outerItem in outer
           from innerItem in lookup[outerKeySelector(outerItem)]
           select resultSelector(outerItem, innerItem);
}

So in our case, Join will build a lookup of all child items based on the lookup value, and then perform a SelectMany to cross join the parent items with the child items found from a lookup by the matched property. This dictionary lookup will almost certainly perform better than a full enumeration of the list, especially for larger lists and more complex keys.