Selecting Static Results with Dynamic LINQ

Dynamic LINQ (DLINQ) is a LINQ extension provided in the VS 2008 Samples. Scott Guthrie provides a good overview in Dynamic LINQ (Part 1: Using the LINQ Dynamic Query Library), but the executive summary is that it implements certain query operations on IQueryable (the non-generic variety), with filtering, grouping and projection specified with strings rather than statically-typed expressions.

I’ve never had a use for it, but a question on Stack Overflow caused me to take a second look…

…the selected groupbyvalue (Group) will always be a string, and the sum will always be a double, so I want to be able to cast into something like a List<Result>, where Result is an object with properties Group (string) and TotalValue (double).

Before we can solve the problem, let’s take a closer look at why it is being asked…

DynamicExpression.CreateClass

We can use the simplest of dynamic queries to explore a bit:

[Test]
public void DLINQ_IdentityProjection_ReturnsDynamicClass()
{
    IQueryable nums = Enumerable.Range(1, 5).AsQueryable();
    IQueryable q = nums.Select("new (it as Value)");
    Type elementType = q.ElementType;

    Assert.AreEqual("DynamicClass1", elementType.Name);
    CollectionAssert.AreEqual(new[] { typeof(int) },
        elementType.GetProperties().Select(p => p.PropertyType).ToArray());
}

DLINQ defines a special expression syntax for projection that specifies which values should be returned and how. Within that syntax, the keyword it refers to the current element, which in our case is an int.

The result in question comes from DynamicQueryable.Select():

public static IQueryable Select(this IQueryable source, string selector, params object[] values)
{
    LambdaExpression lambda = DynamicExpression.ParseLambda(source.ElementType, null, selector, values);
    return source.Provider.CreateQuery(
        Expression.Call(
            typeof(Queryable), "Select",
            new Type[] { source.ElementType, lambda.Body.Type },
            source.Expression, Expression.Quote(lambda)));
}

The non-generic return type suggests that the type of the values returned is unknown at compile time. If we check an element’s type at runtime, we’ll see something like DynamicClass1. Tracing down the stack from DynamicExpression.ParseLambda(), we eventually find that DynamicClass1 is generated by a call to DynamicExpression.CreateClass() in ExpressionParser.ParseNew(). CreateClass() in turn delegates to a static ClassFactory, which manages a dynamic assembly in the current AppDomain to hold the new classes, each generated by Reflection.Emit. The resulting type is then used to generate the MemberInit expression that constructs the object.
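To make the last step concrete, here is a minimal sketch of the kind of MemberInit expression that ParseNew() ends up building, written by hand against the standard System.Linq.Expressions API. ValueHolder is a hypothetical stand-in for the generated DynamicClass1, and MemberInitSketch is a name invented for this illustration; neither is part of DLINQ.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical stand-in for the class DLINQ would emit at runtime.
public class ValueHolder
{
    public int Value { get; set; }
}

public static class MemberInitSketch
{
    // Builds and compiles the lambda: it => new ValueHolder { Value = it }
    public static Func<int, ValueHolder> BuildProjection()
    {
        ParameterExpression it = Expression.Parameter(typeof(int), "it");

        // One binding per projected property, as in the loop in ParseNew().
        MemberBinding binding = Expression.Bind(
            typeof(ValueHolder).GetProperty("Value"), it);

        Expression body = Expression.MemberInit(
            Expression.New(typeof(ValueHolder)), binding);

        return Expression.Lambda<Func<int, ValueHolder>>(body, it).Compile();
    }

    public static void Main()
    {
        Func<int, ValueHolder> project = BuildProjection();
        Console.WriteLine(project(5).Value); // prints 5
    }
}
```

The only difference in DLINQ is that the target type is created on the fly rather than declared in source.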

Dynamic to Static

While dynamic objects are useful in some situations (thus support in C# 4), in this case we want to use static typing. Let’s specify our result type with a generic method:

IQueryable<TResult> Select<TResult>(this IQueryable source, string selector, params object[] values);

We just need a mechanism to insert our result type into DLINQ to supersede the dynamic result. This is surprisingly easy to implement, as ParseLambda() already accepts a resultType argument. We just need to capture it…

private Type resultType;
public Expression Parse(Type resultType)
{
    this.resultType = resultType;
    int exprPos = token.pos;
    // ...

…and then update ParseNew() to use the specified type:

Expression ParseNew()
{
    // ...
    NextToken();
    Type type = this.resultType ?? DynamicExpression.CreateClass(properties);
    MemberBinding[] bindings = new MemberBinding[properties.Count];
    for (int i = 0; i < bindings.Length; i++)
        bindings[i] = Expression.Bind(type.GetProperty(properties[i].Name), expressions[i]);
    return Expression.MemberInit(Expression.New(type), bindings);
}

If resultType is null, as it is in the existing Select() implementation, a DynamicClass is used instead.

The generic Select<TResult> is then completed by referencing TResult as appropriate:

public static IQueryable<TResult> Select<TResult>(this IQueryable source, string selector, params object[] values)
{
    LambdaExpression lambda = DynamicExpression.ParseLambda(source.ElementType, typeof(TResult), selector, values);
    return source.Provider.CreateQuery<TResult>(
        Expression.Call(
            typeof(Queryable), "Select",
            new Type[] { source.ElementType, typeof(TResult) },
            source.Expression, Expression.Quote(lambda)));
}

With the following usage:

public class ValueClass { public int Value { get; set; } }

[Test]
public void DLINQ_IdentityProjection_ReturnsStaticClass()
{
    IQueryable nums = Enumerable.Range(1, 5).AsQueryable();
    IQueryable<ValueClass> q = nums.Select<ValueClass>("new (it as Value)");
    Type elementType = q.ElementType;

    Assert.AreEqual("ValueClass", elementType.Name);
    CollectionAssert.AreEqual(nums.ToArray(), q.Select(v => v.Value).ToArray());
}

Note that the property names in TResult must match those in the Select query exactly. Changing the query to “new (it as value)” results in an unhandled ArgumentNullException in the Expression.Bind() call seen in the for loop of ParseNew() above, as the “value” property cannot be found.
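The failure is easy to reproduce outside of DLINQ. The sketch below assumes only the ValueClass shape shown above; BindLookupSketch and BindThrowsFor are hypothetical names used for illustration. It shows that GetProperty() is case-sensitive by default and that Expression.Bind() rejects the resulting null.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public class ValueClass { public int Value { get; set; } }

public static class BindLookupSketch
{
    // Returns true if binding to the named property throws ArgumentNullException.
    public static bool BindThrowsFor(string propertyName)
    {
        // Null when no public property has exactly this name.
        PropertyInfo property = typeof(ValueClass).GetProperty(propertyName);
        try
        {
            Expression.Bind(property, Expression.Constant(1));
            return false;
        }
        catch (ArgumentNullException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(BindThrowsFor("Value")); // prints False
        Console.WriteLine(BindThrowsFor("value")); // prints True
    }
}
```

A friendlier ParseNew() could check for the null itself and report which property name was not found.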

Selecting Anonymous Types

So we can select dynamic types or existing named types, but what if we want the benefits of static typing without having to declare a dedicated ValueClass, as we can with anonymous types and normal static LINQ? As a variation on techniques used elsewhere, let’s define an overload of Select() that accepts an instance of the anonymous type. We ignore the instance’s values and use only its type to infer the desired return type. The overload is trivial:

public static IQueryable<TResult> Select<TResult>(this IQueryable source, TResult template, string selector, params object[] values)
{
    return source.Select<TResult>(selector, values);
}

With usage looking like this (note the required switch to var q):

[Test]
public void DLINQ_IdentityProjection_ReturnsAnonymousType()
{
    IQueryable nums = Enumerable.Range(1, 5).AsQueryable();
    var q = nums.Select(new { Value = 0 }, "new (it as Value)");
    Type elementType = q.ElementType;

    Assert.IsTrue(elementType.Name.Contains("AnonymousType"));
    CollectionAssert.AreEqual(nums.ToArray(), q.Select(v => v.Value).ToArray());
}

However, if we try the above we encounter an unfortunate error:

The property ‘Int32 Value’ has no ‘set’ accessor

As you may or may not know, anonymous types in C# are immutable (modulo changes to objects they reference), with their values set through a compiler-generated constructor. (I’m not sure if this is true in VB.) With this knowledge in hand, we can update ParseNew() to check if resultType has such a constructor that we could use instead:

    // ...
    Type type = this.resultType ?? DynamicExpression.CreateClass(properties);

    var propertyTypes = type.GetProperties().Select(p => p.PropertyType).ToArray();
    var ctor = type.GetConstructor(propertyTypes);
    if (ctor != null)
        return Expression.New(ctor, expressions);

    MemberBinding[] bindings = new MemberBinding[properties.Count];
    for (int i = 0; i < bindings.Length; i++)
        bindings[i] = Expression.Bind(type.GetProperty(properties[i].Name), expressions[i]);
    return Expression.MemberInit(Expression.New(type), bindings);
}
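The constructor lookup can be tried in isolation. This is a sketch rather than the DLINQ change itself: AnonymousCtorSketch and BuildProjection are hypothetical names, and the projection is assumed to have a single int property so the example stays short. Like the code above, it relies on the property order matching the constructor’s parameter order.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public static class AnonymousCtorSketch
{
    // The template argument is used only to infer T, as in the Select() overload.
    public static Func<int, T> BuildProjection<T>(T template)
    {
        Type type = typeof(T);

        // Anonymous types expose one constructor parameter per property.
        Type[] propertyTypes = type.GetProperties()
                                   .Select(p => p.PropertyType)
                                   .ToArray();
        ConstructorInfo ctor = type.GetConstructor(propertyTypes);
        if (ctor == null)
            throw new InvalidOperationException("No constructor matches the property types.");

        // it => new T(it), with no property setters involved.
        ParameterExpression it = Expression.Parameter(typeof(int), "it");
        return Expression.Lambda<Func<int, T>>(Expression.New(ctor, it), it).Compile();
    }

    public static void Main()
    {
        var project = BuildProjection(new { Value = 0 });
        Console.WriteLine(project(5).Value); // prints 5
    }
}
```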

And with that we can now project from a dynamic query onto static types, both named and anonymous, with a reasonably natural interface.

Due to licensing I can’t post the full example, but if you’re at all curious about Reflection.Emit or how DLINQ works I would encourage you to dive in and let us know what else you come up with. Things will get even more interesting with the combination of LINQ, the DLR and C# 4’s dynamic in the coming months.

HTTP Error Codes in WatiN 1.3

One of the biggest surprises when I started working with WatiN was the omission of a mechanism to check for error conditions. A partial solution using a subclass has been posted before, but it doesn’t quite cover all the bases. Specifically, it’s missing a mechanism to attach existing Internet Explorer instances to objects of the enhanced subtype. Depending on the site under test’s use of pop-ups, this could be a rather severe limitation. So let’s see how we can fix it.

As WatiN is open source, one option is to just patch the existing implementation to include the desired behavior. I’ve uploaded a patch with tests here, but the gist of the patch is quite similar to the solution referenced above:

protected void AttachEventHandlers()
{
    ie.BeforeNavigate2 += (object pDisp, ref object URL, ref object Flags, ref object TargetFrameName, ref object PostData, ref object Headers, ref bool Cancel) =>
    {
        ErrorCode = null;
    };
    ie.NavigateError += (object pDisp, ref object URL, ref object Frame, ref object StatusCode, ref bool Cancel) =>
    {
        ErrorCode = (HttpStatusCode)StatusCode;
    };
}

/// <summary>
/// HTTP Status Code of last error, or null if the last request was successful
/// </summary>
public HttpStatusCode? ErrorCode
{
    get;
    private set;
}

Before every request we clear out the error code, with errors captured as an enum value borrowed from System.Net.

We complete the patch by placing calls to our AttachEventHandlers() method in two places:

  1. The constructor that accepts an existing SHDocVw.InternetExplorer handle.
  2. The CreateNewIEAndGoToUri() method used by every other constructor.

At this point we can now assert success:

using (IE ie = new IE("https://solutionizing.net/"))
{
    Assert.That(ie.ErrorCode, Is.Null);
}

Or specific kinds of failure:

using (IE ie = new IE("https://solutionizing.net/4040404040404"))
{
    Assert.That(ie.ErrorCode, Is.EqualTo(HttpStatusCode.NotFound));
}

See the patch above for a more complete set of example tests.

Private Strikes Again

It’s wonderful that we have the option to make our own patched build with the desired behavior, but what if we would rather use the binary distribution? Well, through the magic of inheritance we can get most of the way there pretty easily:

public class MyIE : IE
{
    public MyIE()
    {
        Initialize();
    }
    public MyIE(object shDocVwInternetExplorer)
        : base(shDocVwInternetExplorer)
    {
        Initialize();
    }
    public MyIE(string url)
        : base(url)
    {
        Initialize();
    }

    // Remaining c'tors left as an exercise

    // Property named ie for consistency with the private field in the parent
    protected InternetExplorer ie
    {
        get { return (InternetExplorer)InternetExplorer; }
    }

    protected void Initialize()
    {
        AttachEventHandlers();
    }

    // AttachEventHandlers() and ErrorCode as defined above
}

But as I suggested before, this is where we run into a bit of a snag. The IE class also provides a set of static AttachToIE() methods that, as their name suggests, return an IE object for an existing Internet Explorer window. These static methods have the downside that they are hard-coded to return objects of type IE, not our enhanced MyIE type. And because all the relevant helper methods are private and not designed for reuse, we have no choice but to pull them into our subclass in their entirety:

    public new static MyIE AttachToIE(BaseConstraint findBy)
    {
        return findIE(findBy, Settings.AttachToIETimeOut, true);
    }
    public new static MyIE AttachToIE(BaseConstraint findBy, int timeout)
    {
        return findIE(findBy, timeout, true);
    }
    public new static MyIE AttachToIENoWait(BaseConstraint findBy)
    {
        return findIE(findBy, Settings.AttachToIETimeOut, false);
    }
    public new static MyIE AttachToIENoWait(BaseConstraint findBy, int timeout)
    {
        return findIE(findBy, timeout, false);
    }

    private static MyIE findIE(BaseConstraint findBy, int timeout, bool waitForComplete)
    {
        SHDocVw.InternetExplorer internetExplorer = findInternetExplorer(findBy, timeout);

        if (internetExplorer != null)
        {
            MyIE ie = new MyIE(internetExplorer);
            if (waitForComplete)
            {
                ie.WaitForComplete();
            }

            return ie;
        }

        throw new IENotFoundException(findBy.ConstraintToString(), timeout);
    }

    protected static SHDocVw.InternetExplorer findInternetExplorer(BaseConstraint findBy, int timeout)
    {
        Logger.LogAction("Busy finding Internet Explorer matching constriant " + findBy.ConstraintToString());

        SimpleTimer timeoutTimer = new SimpleTimer(timeout);

        do
        {
            Thread.Sleep(500);

            SHDocVw.InternetExplorer internetExplorer = findInternetExplorer(findBy);

            if (internetExplorer != null)
            {
                return internetExplorer;
            }
        } while (!timeoutTimer.Elapsed);

        return null;
    }

    private static SHDocVw.InternetExplorer findInternetExplorer(BaseConstraint findBy)
    {
        ShellWindows allBrowsers = new ShellWindows();

        int browserCount = allBrowsers.Count;
        int browserCounter = 0;

        IEAttributeBag attributeBag = new IEAttributeBag();

        while (browserCounter < browserCount)
        {
            attributeBag.InternetExplorer = (SHDocVw.InternetExplorer) allBrowsers.Item(browserCounter);

            if (findBy.Compare(attributeBag))
            {
                return attributeBag.InternetExplorer;
            }

            browserCounter++;
        }

        return null;
    }

The original version of the first findInternetExplorer() is private. Were it protected instead, we would only have had to implement our own findIE() to wrap the found InternetExplorer object in our subtype.

I won’t go so far as to say private methods are a code smell, but they certainly can make the O in OCP more difficult to achieve.

So there you have it, two different techniques for accessing HTTP error codes in WatiN 1.3. At some point I’ll look at adding similar functionality to 2.0, if it’s not already there. And if someone on the project team sees this, feel free to run with it.


Script to Enable HTTP Compression (Gzip/Deflate) in IIS 6

One of the easiest ways to improve web site performance is to enable HTTP compression (often referred to as GZIP compression), which trades CPU time to compress content for a reduced payload delivered over the wire. In the vast majority of cases, the trade-off is a good one.

When implementing HTTP compression, your content will break down into three categories:

  1. Content that should not be compressed because it is already compressed: images, PDF files, audio, video, etc.
  2. Static content that can be compressed once and cached for later.
  3. Dynamic content that needs to be compressed for every request.

Excluding already-compressed content will need to be considered regardless of the techniques used to compress categories 2 and 3.
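To see why category 1 matters, here is a small sketch using GZipStream from the .NET base class library. GzipSketch is a hypothetical name, and the pseudo-random bytes are only a stand-in for already-compressed content such as a JPEG. It compares the compressed size of repetitive markup against the compressed size of data with no redundancy left to remove.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipSketch
{
    // Compresses a buffer in memory and returns the gzip bytes.
    public static byte[] Gzip(byte[] input)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(input, 0, input.Length);
            }
            // ToArray() is still valid after the stream has been closed.
            return output.ToArray();
        }
    }

    public static void Main()
    {
        // Repetitive markup, typical of static text content.
        var sb = new StringBuilder();
        for (int i = 0; i < 500; i++)
            sb.Append("<li class=\"item\">Hello, world</li>\n");
        byte[] text = Encoding.UTF8.GetBytes(sb.ToString());

        // Pseudo-random bytes approximate content that is already compressed.
        byte[] noise = new byte[text.Length];
        new Random(42).NextBytes(noise);

        Console.WriteLine("text:  {0} -> {1} bytes", text.Length, Gzip(text).Length);
        Console.WriteLine("noise: {0} -> {1} bytes", noise.Length, Gzip(noise).Length);
    }
}
```

The text shrinks considerably, while the noise comes out slightly larger than it went in because of the gzip header and trailer.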

Since version 5, IIS has included support for both kinds of HTTP compression. This can be enabled through the management interface, but you will almost certainly want to tweak the default configuration in the metabase (see script below). While IIS works great for compressing static files, its extension-based configuration is rather limited when serving up dynamic content, especially if you don’t use extensions (as with most ASP.NET MVC routes) or you serve dynamic content that should not be compressed. A better solution is provided in HttpCompress by Ben Lowery, a configurable HttpModule that allows content to be excluded from compression by MIME type. A standard configuration might look something like this:

<configuration>
  ...
  <blowery.web>
    <httpCompress preferredAlgorithm="gzip" compressionLevel="normal">
      <excludedMimeTypes>
        <add type="image/jpeg" />
        <add type="image/png" />
        <add type="image/gif" />
        <add type="application/pdf" />
      </excludedMimeTypes>
      <excludedPaths></excludedPaths>
    </httpCompress>
  </blowery.web>
  ...
</configuration>

To supplement the compressed dynamic content, you should also enable static compression for the rest of your not-already-compressed content. The script should be pretty self-explanatory, but I’ll draw attention to a few things:

  • The tcfpath variable at the top is currently set to IIS’s default location, which you are free to change.
  • The extlist variable accepts a space-delimited list of file extensions that should be compressed. Again, only include file types that are not already compressed, as recompressing a file wastes cycles and can actually make some files larger.
  • There are a few other metabase properties that can also be set, including compression level, but these are the bare minimum.
  • I have been told repeatedly that IISRESET should be sufficient to apply the metabase changes, but I could not get it to work as consistently as manually restarting the IIS Admin Service — YMMV.
  • If all goes well, the nice arrow at the end will point to True.

If you have anything else to add, or have problems with the script, please let me know.

@echo off
set adsutil=C:\Inetpub\AdminScripts\adsutil.vbs
set tcfpath=%windir%\IIS Temporary Compressed Files
set extlist=css htm html js txt xml

mkdir "%tcfpath%"

echo Ensure IIS_WPG has Full Control on %tcfpath%

explorer "%tcfpath%\.."
pause

cscript.exe %adsutil% set w3svc/Filters/Compression/Parameters/HcDoStaticCompression true
cscript.exe %adsutil% set w3svc/Filters/Compression/Parameters/HcCompressionDirectory "%tcfpath%"
cscript.exe %adsutil% set w3svc/Filters/Compression/DEFLATE/HcFileExtensions %extlist%
cscript.exe %adsutil% set w3svc/Filters/Compression/GZIP/HcFileExtensions %extlist%

echo Restart IIS Admin Service - IISRESET does not seem to work
pause

echo Close Services to continue...
Services.msc

cscript.exe %adsutil% get w3svc/Filters/Compression/Parameters/HcDoStaticCompression

echo Should be True -----------------------------^^

pause

Is Functional Abstraction Too Clever?

I received a rather interesting comment on a recent Stack Overflow answer:

This code seems too clever by half. Is it art? – PeterAllenWebb

The code in question was a functional solution to an algorithm described approximately as follows:

Draw n−1 numbers at random, in the range 1 to m−1. Add 0 and m to the list and order these numbers. The difference between each two consecutive numbers gives you a return value.

Which I solved like this, with n = slots and m = max:

static int[] GetSlots(int slots, int max)
{
    return new Random().Values(1, max)
                       .Take(slots - 1)
                       .Append(0, max)
                       .OrderBy(i => i)
                       .Pairwise((x, y) => y - x)
                       .ToArray();
}

Using a few extension methods:

  • Values() returns an infinite sequence of random values within the specified range.
  • Append() takes a params array and appends its arguments to the original sequence.
  • Pairwise() generates a sequence from calculations on pairs of consecutive elements in the original sequence.

I can see how one would think the code is clever; however, I’m not sure what would qualify it as too clever. Every method call has a well-defined purpose and maps directly to part of the original algorithm:

  1. From random numbers in the range 1 to m−1…
  2. …draw n−1.
  3. Add 0 and m to the list…
  4. …and order these numbers.
  5. The difference between each two consecutive numbers…
  6. …gives you a return value [in the array].

As far as I’m concerned, a solution couldn’t get much clearer than this, but that’s easy enough for me to say—what do you think? Is there a better way to express the algorithm? Would an imperative solution with shared state be more readable? How about maintainable?

For example, one could add the requirement that the random numbers not be repeated so that the difference between adjacent numbers is always nonzero. Updating the functional solution is as simple as adding a Distinct() call:

    return new Random().Values(1, max)
                       .Distinct()
                       .Take(slots - 1)
                       ...

To me, this is the value proposition of functional programming. By expressing the algorithm in terms of common operations, we’re able to spend more time thinking about the problem than the details of the solution. A similar change in an imperative implementation would almost certainly have been more involved and prone to error.
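For comparison, here is one possible imperative version of the original algorithm. It is only a sketch: ImperativeSlots is a hypothetical name, and the Random instance is passed in as a parameter purely to make the method easy to test.

```csharp
using System;
using System.Collections.Generic;

public static class ImperativeSlots
{
    public static int[] GetSlots(int slots, int max, Random random)
    {
        // Draw n-1 numbers in the range 1 to m-1 (the upper bound of Next is exclusive).
        var points = new List<int>();
        for (int i = 0; i < slots - 1; i++)
            points.Add(random.Next(1, max));

        // Add 0 and m, then order the list.
        points.Add(0);
        points.Add(max);
        points.Sort();

        // The difference between consecutive points is the size of each slot.
        var result = new int[slots];
        for (int i = 0; i < slots; i++)
            result[i] = points[i + 1] - points[i];
        return result;
    }

    public static void Main()
    {
        int[] result = GetSlots(5, 100, new Random());
        Console.WriteLine(string.Join(", ", Array.ConvertAll(result, x => x.ToString())));
    }
}
```

Supporting the no-repeats requirement here means replacing the counted for loop with a while loop that tracks the values already drawn, which touches more of the method than the single Distinct() call does.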

For completeness, here are the implementations of the extension methods used:

public static IEnumerable<int> Values(this Random random, int minValue, int maxValue)
{
    while (true)
        yield return random.Next(minValue, maxValue);
}

public static IEnumerable<TResult> Pairwise<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, TSource, TResult> resultSelector)
{
    TSource previous = default(TSource);

    using (var it = source.GetEnumerator())
    {
        if (it.MoveNext())
            previous = it.Current;

        while (it.MoveNext())
            yield return resultSelector(previous, previous = it.Current);
    }
}

public static IEnumerable<T> Append<T>(this IEnumerable<T> source, params T[] args)
{
    return source.Concat(args);
}

This also reminds me that Jimmy posted a similar Append() method as part of his latest post on missing LINQ operators. I used to use a version similar to his, but have found the params version to be more flexible (and easier to implement). Its Prepend() counterpart is similarly trivial:

public static IEnumerable<T> Prepend<T>(this IEnumerable<T> source, params T[] args)
{
    return args.Concat(source);
}
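A quick way to check how the pieces fit together is to run them on fixed input. The sketch below repeats the three helpers so it compiles on its own; SequenceExtensions and PairwiseDemo are hypothetical names. Newer versions of .NET ship their own single-element Enumerable.Append() and Enumerable.Prepend(), so the calls here pass two arguments each to make sure the params versions are the ones being exercised.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceExtensions
{
    public static IEnumerable<T> Append<T>(this IEnumerable<T> source, params T[] args)
    {
        return source.Concat(args);
    }

    public static IEnumerable<T> Prepend<T>(this IEnumerable<T> source, params T[] args)
    {
        return args.Concat(source);
    }

    public static IEnumerable<TResult> Pairwise<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, TSource, TResult> resultSelector)
    {
        using (var it = source.GetEnumerator())
        {
            if (!it.MoveNext())
                yield break;

            TSource previous = it.Current;
            while (it.MoveNext())
            {
                TSource current = it.Current;
                yield return resultSelector(previous, current);
                previous = current;
            }
        }
    }
}

public static class PairwiseDemo
{
    public static void Main()
    {
        // 0, 1, 4, 9, 16, 25 -> differences of consecutive squares
        var diffs = new[] { 4, 9 }.Prepend(0, 1)
                                  .Append(16, 25)
                                  .Pairwise((x, y) => y - x);

        foreach (int d in diffs)
            Console.WriteLine(d); // prints 1, 3, 5, 7, 9 on separate lines
    }
}
```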

Refactoring with Iterators: Prime Factors

Andrew Woodward recently posted a comparison of his test-driven Prime Factors solution to one written by Uncle Bob. In the comments, someone suggested that Andrew use an iterator instead so I thought I’d give it a try.

First, let’s repost the original code:

private const int SMALLEST_PRIME = 2;

public List<int> Generate(int i)
{
    List<int> primes = new List<int>();
    int divider = SMALLEST_PRIME;
    while (HasPrimes(i))
    {
        while (IsDivisable(i, divider))
        {
            i = AddPrimeToProductsAndReduce(i, primes, divider);
        }
        divider++;
    }
    return primes;
}

private bool IsDivisable(int i, int divider)
{
    return i%divider == 0;
}

private bool HasPrimes(int i)
{
    return i >= SMALLEST_PRIME;
}

private int AddPrimeToProductsAndReduce(int i, List<int> primes, int prime)
{
    primes.Add(prime);
    i /= prime;
    return i;
}

By switching our method to return IEnumerable<int>, we can replace the primes list with an iterator. We will also remove the AddPrimeToProducts functionality from that helper method since we don’t have the list any more:

public IEnumerable<int> Generate(int i)
{
    int divider = SMALLEST_PRIME;
    while (HasPrimes(i))
    {
        while (IsDivisable(i, divider))
        {
            yield return divider;
            i = Reduce(i, divider);
        }
        divider++;
    }
}

private int Reduce(int i, int prime)
{
    return i / prime;
}

I think this is a good change for three reasons:

  1. There’s nothing about the problem that requires a List<int> be returned; we just want a sequence of the factors.
  2. AddPrimeToProductsAndReduce suggested that it had a side effect, but exactly what wasn’t immediately obvious.
  3. It’s much easier to see what values are being included in the result.

That said, I think we can clean this up even more with a second iterator. Specifically, I think we should break out the logic for our candidate factors:

private IEnumerable<int> Divisors
{
    get
    {
        int x = SMALLEST_PRIME;
        while (true)
            yield return x++;
    }
}

Which allows us to separate the logic for generating a divider from the code that consumes it:

public IEnumerable<int> Generate(int toFactor)
{
    foreach (var divider in Divisors)
    {
        if (!HasPrimes(toFactor))
            break;

        while (IsDivisable(toFactor, divider))
        {
            yield return divider;
            toFactor = Reduce(toFactor, divider);
        }
    }
}

We should also eliminate the negation by flipping HasPrimes to become IsFactored:

public IEnumerable<int> Generate(int toFactor)
{
    foreach (var divider in Divisors)
    {
        if (IsFactored(toFactor))
            break;

        while (IsDivisable(toFactor, divider))
        {
            yield return divider;
            toFactor = Reduce(toFactor, divider);
        }
    }
}

private bool IsFactored(int i)
{
    return i <= 1;
}

This does introduce a (very) minor inefficiency in that the Divisors enumerator will MoveNext() one extra time before breaking out of the loop, which could be mitigated by checking IsFactored both before the foreach and after the while loop. Less readable, insignificantly more efficient…take your pick.

The other advantage to breaking out the logic to generate Divisors is that we can easily pick smarter candidates. One option is to skip even numbers greater than 2. An even better optimization takes advantage of the fact that all primes greater than 3 are of the form x±1 where x is a multiple of 6:

private IEnumerable<int> Divisors
{
    get
    {
        yield return 2;
        yield return 3;
        int i = 6;
        while (true)
        {
            yield return i - 1;
            yield return i + 1;
            i += 6;
        }
    }
}

Implementing this sort of logic in the original version would have been much more difficult, both in terms of correctness and readability.
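Putting the final pieces together gives something like the following self-contained sketch. PrimeFactorsSketch is a hypothetical name, and the helper methods are inlined and made static to keep the listing short. The candidate sequence includes some composites such as 25 and 35, but they can never divide the remaining value because their prime factors have already been removed by then.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrimeFactorsSketch
{
    // 2, 3, then every 6k-1 and 6k+1: a superset of the primes.
    private static IEnumerable<int> Divisors()
    {
        yield return 2;
        yield return 3;
        int i = 6;
        while (true)
        {
            yield return i - 1;
            yield return i + 1;
            i += 6;
        }
    }

    public static IEnumerable<int> Generate(int toFactor)
    {
        foreach (int divider in Divisors())
        {
            if (toFactor <= 1)
                break;

            while (toFactor % divider == 0)
            {
                yield return divider;
                toFactor /= divider;
            }
        }
    }

    public static void Main()
    {
        string[] factors = Generate(360).Select(f => f.ToString()).ToArray();
        Console.WriteLine(string.Join(" ", factors)); // prints 2 2 2 3 3 5
    }
}
```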


Hacking LINQ Expressions: Join With Comparer

In this installment of my Hacking LINQ series we’ll take a look at providing an IEqualityComparer for use in a LINQ join clause.

The Problem

Many of the Standard Query Operators require comparing sequence elements and the default query providers are kind enough to give us overloads that accept a suitable comparer. Among these operators, Join and GroupJoin have perhaps the most useful query syntax:

 var res = from s in States
          join a in AreaCodes
            on s.Abbr equals a.StateAbbr
          select new { s.Name, a.AreaCode };

While a bit more verbose, I find the intent much easier to read than the method equivalent:

var res = States.Join(AreaCodes,
                      s => s.Abbr, a => a.StateAbbr,
                      (s, a) => new { s.Name, a.AreaCode });

Or maybe I’ve just spent too much time in SQL. Either way, I thought it would be useful to support joins by a comparer.

The Goal

We will use another extension method to specify how the join should be performed:

var res = from s in States
          join a in AreaCodes.WithComparer(StringComparer.OrdinalIgnoreCase)
            on s.Abbr equals a.StateAbbr
          select new { s.Name, a.AreaCode };

We can also support the same syntax for group joins:

var res = from s in States
          join a in AreaCodes.WithComparer(StringComparer.OrdinalIgnoreCase)
            on s.Abbr equals a.StateAbbr into j
          select new { s.Name, Count = j.Count() };

The Hack

As with most LINQ hacks, we’re going to use the result of WithComparer to call a specialized version of Join or GroupJoin, in this case by providing a replacement for the join’s inner sequence:

var res = States.Join(AreaCodes.WithComparer(StringComparer.OrdinalIgnoreCase),
                      s => s.Abbr, a => a.StateAbbr,
                      (s, a) => new { s.Name, a.AreaCode });

Eventually leading to this method call:

var res = States.Join(AreaCodes,
                      s => s.Abbr, a => a.StateAbbr,
                      (s, a) => new { s.Name, a.AreaCode },
                      StringComparer.OrdinalIgnoreCase);

Since we need both the inner collection we’re extending and the comparer, we can guess our extension method will be implemented something like this:

public static JoinComparerProvider<T, TKey> WithComparer<T, TKey>(
    this IEnumerable<T> inner, IEqualityComparer<TKey> comparer)
{
    return new JoinComparerProvider<T, TKey>(inner, comparer);
}

With a trivial provider implementation:

public sealed class JoinComparerProvider<T, TKey>
{
    internal JoinComparerProvider(IEnumerable<T> inner, IEqualityComparer<TKey> comparer)
    {
        Inner = inner;
        Comparer = comparer;
    }

    public IEqualityComparer<TKey> Comparer { get; private set; }
    public IEnumerable<T> Inner { get; private set; }
}

The final piece is our Join overload:

public static IEnumerable<TResult> Join<TOuter, TInner, TKey, TResult>(
    this IEnumerable<TOuter> outer,
    JoinComparerProvider<TInner, TKey> inner,
    Func<TOuter, TKey> outerKeySelector,
    Func<TInner, TKey> innerKeySelector,
    Func<TOuter, TInner, TResult> resultSelector)
{
    return outer.Join(inner.Inner, outerKeySelector, innerKeySelector,
                      resultSelector, inner.Comparer);
}

Implementations of GroupJoin and their IQueryable counterparts are similarly trivial.
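For illustration, a GroupJoin overload along those lines might look like the sketch below. The provider and WithComparer() are repeated so that the listing compiles on its own; the sample data, JoinComparerExtensions, and GroupJoinDemo are hypothetical names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class JoinComparerProvider<T, TKey>
{
    internal JoinComparerProvider(IEnumerable<T> inner, IEqualityComparer<TKey> comparer)
    {
        Inner = inner;
        Comparer = comparer;
    }

    public IEqualityComparer<TKey> Comparer { get; private set; }
    public IEnumerable<T> Inner { get; private set; }
}

public static class JoinComparerExtensions
{
    public static JoinComparerProvider<T, TKey> WithComparer<T, TKey>(
        this IEnumerable<T> inner, IEqualityComparer<TKey> comparer)
    {
        return new JoinComparerProvider<T, TKey>(inner, comparer);
    }

    // Selected by the compiler when the inner sequence is a JoinComparerProvider.
    public static IEnumerable<TResult> GroupJoin<TOuter, TInner, TKey, TResult>(
        this IEnumerable<TOuter> outer,
        JoinComparerProvider<TInner, TKey> inner,
        Func<TOuter, TKey> outerKeySelector,
        Func<TInner, TKey> innerKeySelector,
        Func<TOuter, IEnumerable<TInner>, TResult> resultSelector)
    {
        // Forwards to the standard overload that accepts a comparer.
        return outer.GroupJoin(inner.Inner, outerKeySelector, innerKeySelector,
                               resultSelector, inner.Comparer);
    }
}

public static class GroupJoinDemo
{
    public static void Main()
    {
        var states = new[] {
            new { Abbr = "WA", Name = "Washington" },
            new { Abbr = "OR", Name = "Oregon" }
        };
        var areaCodes = new[] {
            new { StateAbbr = "wa", AreaCode = 206 },
            new { StateAbbr = "Wa", AreaCode = 509 },
            new { StateAbbr = "or", AreaCode = 503 }
        };

        var res = from s in states
                  join a in areaCodes.WithComparer(StringComparer.OrdinalIgnoreCase)
                    on s.Abbr equals a.StateAbbr into j
                  select new { s.Name, Count = j.Count() };

        foreach (var r in res)
            Console.WriteLine(r.Name + ": " + r.Count);
        // prints "Washington: 2" then "Oregon: 1"
    }
}
```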


Hacking LINQ Expressions: Select With Index

First, a point of clarification: I use LINQ Expressions to mean (Language-INtegrated) Query Expressions (the language feature) rather than Expression Trees (the .NET 3.5 library in System.Linq.Expressions).

So what do I mean by “Hacking LINQ Expressions”? Quite simply, I’m not content with the rather limited set of operations that query expressions allow me to represent. By understanding how queries are translated, we can use various techniques to broaden our expressive reach. I have already documented one such hack for managing IDisposable objects with LINQ, so I guess we can call this the second in an unbounded series.

The Problem

In thinking over use cases for functional construction of web control trees, I paused to think through how I would express alternate row styling. My mind immediately jumped to the overload of Select() that exposes the current element’s index:

Controls.Add(
    new Table().WithControls(
        data.Select((x, i) =>
            new TableRow() {
                CssClass = i % 2 == 0 ? "" : "alt"
            }.WithControls(
                new TableCell().WithControls(x)
            )
        )
    )
);

This works fine for simple cases, but breaks down for more complex queries:

Controls.Add(
    new Table().WithControls((
        from x in Xs
        join y in Ys on x.Key equals y.Key
        select new { x, y }
        ).Select((z, i) =>
            new TableRow() {
                CssClass = i % 2 == 0 ? "" : "alt"
            }.WithControls(
                new TableCell().WithControls(z.x.ValueX, z.y.ValueY)
            )
        )
    )
);

The Goal

Instead, I propose a simple extension method to retrieve an index at arbitrary points in a query:

var res = from x in data
          from i in x.GetIndex()
          select new { x, i };

Or our control examples:

Controls.Add(
    new Table().WithControls(
        from x in data
        from i in x.GetIndex()
        select new TableRow() {
            CssClass = i % 2 == 0 ? "" : "alt"
        }.WithControls(
            new TableCell().WithControls(x)
        )
    )
);

Controls.Add(
    new Table().WithControls(
        from x in Xs
        join y in Ys on x.Key equals y.Key
        from i in y.GetIndex()
        select new TableRow() {
            CssClass = i % 2 == 0 ? "" : "alt"
        }.WithControls(
            new TableCell().WithControls(x.ValueX, y.ValueY)
        )
    )
);

Much like in the IDisposable solution, we use a from clause to act as an intermediate assignment. But in this case our hack is a bit trickier than a simple iterator.

The Hack

For this solution we’re going to take advantage of how multiple from clauses are translated:

var res = data.SelectMany(x => x.GetIndex(), (x, i) => new { x, i });

Looking at the parameter list, we see that our collectionSelector should return the result of x.GetIndex() and our resultSelector’s second argument needs to be an int:

public static IEnumerable<TResult> SelectMany<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, SelectIndexProvider> collectionSelector,
    Func<TSource, int, TResult> resultSelector)

The astute observer will notice that the signature of this resultSelector exactly matches the selector used by Select’s with-index overload, trivializing the method implementation:

{
    return source.Select(resultSelector);
}

Note that we’re not even using collectionSelector! We’re just using its return type as a flag to force the compiler to use this version of SelectMany(). The rest of the pieces are incredibly simple now that we know the actual SelectIndexProvider value is never used:

public sealed class SelectIndexProvider
{
    private SelectIndexProvider() { }
}

public static SelectIndexProvider GetIndex<T>(this T element)
{
    return null;
}

And for good measure, an equivalent version to extend IQueryable<>:

public static IQueryable<TResult> SelectMany<TSource, TResult>(
    this IQueryable<TSource> source,
    Expression<Func<TSource, SelectIndexProvider>> collectionSelector,
    Expression<Func<TSource, int, TResult>> resultSelector)
{
    return source.Select(resultSelector);
}

Because we’re just calling Select(), the query expression isn’t even aware of the call to GetIndex():

System.Linq.Enumerable+<RangeIterator>d__b1.Select((x, i) => (x * i))

We’re essentially providing our own syntactic sugar over the sugar already provided by query expressions. Pretty sweet, eh?
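Here are the IEnumerable pieces assembled into one runnable sketch, with a small query to show the index in action. SelectIndexExtensions, SelectIndexDemo, and the sample names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class SelectIndexProvider
{
    private SelectIndexProvider() { }
}

public static class SelectIndexExtensions
{
    // Never produces a value; its return type only steers overload resolution.
    public static SelectIndexProvider GetIndex<T>(this T element)
    {
        return null;
    }

    // Chosen for "from i in x.GetIndex()" because of the SelectIndexProvider return type.
    public static IEnumerable<TResult> SelectMany<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, SelectIndexProvider> collectionSelector,
        Func<TSource, int, TResult> resultSelector)
    {
        // collectionSelector is intentionally ignored.
        return source.Select(resultSelector);
    }
}

public static class SelectIndexDemo
{
    public static void Main()
    {
        var rows = from name in new[] { "Ann", "Bob", "Cy" }
                   from i in name.GetIndex()
                   select i + ":" + name;

        foreach (string row in rows)
            Console.WriteLine(row);
        // prints "0:Ann", "1:Bob", "2:Cy" on separate lines
    }
}
```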

As a final exercise for the reader, what would this print?

var res = from x in Enumerable.Range(1, 5)
          from i in x.GetIndex()
          from y in Enumerable.Repeat(i, x)
          where y % 2 == 1
          from j in 0.GetIndex()
          select i+j;

foreach (var r in res)
    Console.WriteLine(r);