Is Functional Abstraction Too Clever?

I received a rather interesting comment on a recent Stack Overflow answer:

This code seems too clever by half. Is it art? – PeterAllenWebb

The code in question was a functional solution to an algorithm described approximately as follows:

Draw n−1 numbers at random, in the range 1 to m−1. Add 0 and m to the list and order these numbers. The difference between each two consecutive numbers gives you a return value.

Which I solved like this, with n = slots and m = max:

static int[] GetSlots(int slots, int max)
{
    return new Random().Values(1, max)
                       .Take(slots - 1)
                       .Append(0, max)
                       .OrderBy(i => i)
                       .Pairwise((x, y) => y - x)
                       .ToArray();
}

Using a few extension methods:

  • Values() returns an infinite sequence of random values within the specified range.
  • Append() takes a params array and appends its arguments to the original sequence.
  • Pairwise() generates a sequence from calculations on pairs of consecutive elements in the original sequence.

I can see how one would think the code is clever; however, I’m not sure what would qualify it as too clever. Every method call has a well-defined purpose and maps directly to part of the original algorithm:

  1. From random numbers in the range 1 to m−1…
  2. …draw n−1.
  3. Add 0 and m to the list…
  4. …and order these numbers.
  5. The difference between each two consecutive numbers…
  6. …gives you a return value [in the array].

As far as I’m concerned, a solution couldn’t get much clearer than this, but that’s easy enough for me to say—what do you think? Is there a better way to express the algorithm? Would an imperative solution with shared state be more readable? How about maintainable?

For example, one could add the requirement that the random numbers not be repeated so that the difference between adjacent numbers is always nonzero. Updating the functional solution is as simple as adding a Distinct() call:

    return new Random().Values(1, max)
                       .Distinct()
                       .Take(slots - 1)
                       ...

To me, this is the value proposition of functional programming. By expressing the algorithm in terms of common operations, we’re able to spend more time thinking about the problem than the details of the solution. A similar change in an imperative implementation would almost certainly have been more involved and prone to error.

For completeness, here are the implementations of the extension methods used:

public static IEnumerable<int> Values(this Random random, int minValue, int maxValue)
{
    while (true)
        yield return random.Next(minValue, maxValue);
}

public static IEnumerable<TResult> Pairwise<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, TSource, TResult> resultSelector)
{
    TSource previous = default(TSource);

    using (var it = source.GetEnumerator())
    {
        if (it.MoveNext())
            previous = it.Current;

        while (it.MoveNext())
            yield return resultSelector(previous, previous = it.Current);
    }
}

public static IEnumerable<T> Append<T>(this IEnumerable<T> source, params T[] args)
{
    return source.Concat(args);
}

This also reminds me that Jimmy posted a similar Append() method as part of his latest post on missing LINQ operators. I used to use a version similar to his, but have found the params version to be more flexible (and easier to implement). Its Prepend() counterpart is similarly trivial:

public static IEnumerable<T> Prepend<T>(this IEnumerable<T> source, params T[] args)
{
    return args.Concat(source);
}
Advertisement
Posted in .NET. Tags: . Comments Off on Is Functional Abstraction Too Clever?

Refactoring with Iterators: Prime Factors

Andrew Woodward recently posted a comparison of his test-driven Prime Factors solution to one written by Uncle Bob. In the comments, someone suggested that Andrew use an iterator instead so I thought I’d give it a try.

First, let’s repost the original code:

private const int SMALLEST_PRIME = 2;

public List<int> Generate(int i)
{
    List<int> primes = new List<int>();
    int divider = SMALLEST_PRIME;
    while (HasPrimes(i))
    {
        while (IsDivisable(i, divider))
        {
            i = AddPrimeToProductsAndReduce(i, primes, divider);
        }
        divider++;
    }
    return primes;
}

private bool IsDivisable(int i, int divider)
{
    return i%divider == 0;
}

private bool HasPrimes(int i)
{
    return i >= SMALLEST_PRIME;
}

private int AddPrimeToProductsAndReduce(int i, List<int> primes, int prime)
{
    primes.Add(prime);
    i /= prime;
    return i;
}

By switching our method to return IEnumerable<int>, we can replace the primes list with an iterator. We will also remove the AddPrimeToProducts functionality from that helper method since we don’t have the list any more:

public IEnumerable<int> Generate(int i)
{
    int divider = SMALLEST_PRIME;
    while (HasPrimes(i))
    {
        while (IsDivisable(i, divider))
        {
            yield return divider;
            i = Reduce(i, divider);
        }
        divider++;
    }
}

private int Reduce(int i, int prime)
{
    return i / prime;
}

I think this is a good change for three reasons:

  1. There’s nothing about the problem that requires a List<int> be returned, we just want a sequence of the factors.
  2. AddPrimeToProductsAndReduce suggested that it had a side effect, but exactly what wasn’t immediately obvious.
  3. It’s much easier to see what values are being included in the result.

That said, I think we can clean this up even more with a second iterator. Specifically, I think we should break out the logic for our candidate factors:

private IEnumerable<int> Divisors
{
    get
    {
        int x = SMALLEST_PRIME;
        while (true)
            yield return x++;
    }
}

Which allows us to separate the logic for generating a divider from the code that consumes it:

public IEnumerable<int> Generate(int toFactor)
{
    foreach (var divider in Divisors)
    {
        if (!HasPrimes(toFactor))
            break;

        while (IsDivisable(toFactor, divider))
        {
            yield return divider;
            toFactor = Reduce(toFactor, divider);
        }
    }
}

We should also eliminate the negation by flipping HasPrimes to become IsFactored:

public IEnumerable<int> Generate(int toFactor)
{
    foreach (var divider in Divisors)
    {
        if (IsFactored(toFactor))
            break;

        while (IsDivisable(toFactor, divider))
        {
            yield return divider;
            toFactor = Reduce(toFactor, divider);
        }
    }
}

private bool IsFactored(int i)
{
    return i <= 1;
}

This does introduce a (very) minor inefficiency in that the Divisors enumerator will MoveNext() one extra time before breaking out of the loop, which could be mitigated by checking IsFactored both before the foreach and after the while loop. Less readable, insignificantly more efficient…take your pick.

The other advantage to breaking out the logic to generate Divisors is that we can easily pick smarter candidates. One option is to skip even numbers greater than 2. An even better optimization takes advantage of the fact that all primes greater than 3 are of the form x±1 where x is a multiple of 6:

private IEnumerable<int> Divisors
{
    get
    {
        yield return 2;
        yield return 3;
        int i = 6;
        while (true)
        {
            yield return i - 1;
            yield return i + 1;
            i += 6;
        }
    }
}

Implementing this sort of logic in the original version would have been much more difficult, both in terms of correctness and readability.

Posted in .NET. Tags: , . Comments Off on Refactoring with Iterators: Prime Factors

Hacking LINQ Expressions: Join With Comparer

In this installment of my Hacking LINQ series we’ll take a look at providing an IEqualityComparer for use in a LINQ join clause.

The Problem

Many of the Standard Query Operators require comparing sequence elements and the default query providers are kind enough to give us overloads that accept a suitable comparer. Among these operators, Join and GroupJoin have perhaps the most useful query syntax:

 var res = from s in States
          join a in AreaCodes
            on s.Abbr equals a.StateAbbr
          select new { s.Name, a.AreaCode };

While a bit more verbose, I find the intent much easier to read then the method equivalent:

var res = States.Join(AreaCodes,
                      s => s.Abbr, a => a.StateAbbr,
                      (s, a) => new { s.Name, a.AreaCode });

Or maybe I’ve just spent too much time in SQL. Either way, I thought it would be useful to support joins by a comparer.

The Goal

We will use another extension method to specify how the join should be performed:

var res = from s in States
          join a in AreaCodes.WithComparer(StringComparer.OrdinalIgnoreCase)
            on s.Abbr equals a.StateAbbr
          select new { s.Name, a.AreaCode };

We can also support the same syntax for group joins:

var res = from s in States
          join a in AreaCodes.WithComparer(StringComparer.OrdinalIgnoreCase)
            on s.Abbr equals a.StateAbbr into j
          select new { s.Name, Count = j.Count() };

The Hack

As with most LINQ hacks, we’re going to use the result of WithComparer to call a specialized version of Join or GroupJoin, in this case by providing a replacement for the join’s inner sequence:

var res = States.Join(AreaCodes.WithComparer(StringComparer.OrdinalIgnoreCase),
                      s => s.Abbr, a => a.StateAbbr,
                      (s, a) => new { s.Name, a.AreaCode });

Eventually leading to this method call:

var res = States.Join(AreaCodes,
                      s => s.Abbr, a => a.StateAbbr,
                      (s, a) => new { s.Name, a.AreaCode },
                      StringComparer.OrdinalIgnoreCase);

Since we need both the inner collection we’re extending and the comparer, we can guess our extension method will be implemented something like this:

public static JoinComparerProvider<T, TKey> WithComparer<T, TKey>(
    this IEnumerable<T> inner, IEqualityComparer<TKey> comparer)
{
    return new JoinComparerProvider<T, TKey>(inner, comparer);
}

With a trivial provider implementation:

public sealed class JoinComparerProvider<T, TKey>
{
    internal JoinComparerProvider(IEnumerable<T> inner, IEqualityComparer<TKey> comparer)
    {
        Inner = inner;
        Comparer = comparer;
    }

    public IEqualityComparer<TKey> Comparer { get; private set; }
    public IEnumerable<T> Inner { get; private set; }
}

The final piece is our Join overload:

public static IEnumerable<TResult> Join<TOuter, TInner, TKey, TResult>(
    this IEnumerable<TOuter> outer,
    JoinComparerProvider<TInner, TKey> inner,
    Func<TOuter, TKey> outerKeySelector,
    Func<TInner, TKey> innerKeySelector,
    Func<TOuter, TInner, TResult> resultSelector)
{
    return outer.Join(inner.Inner, outerKeySelector, innerKeySelector,
                      resultSelector, inner.Comparer);
}

Implementations of GroupJoin and their IQueryable counterparts are similarly trivial.

Posted in .NET, LINQ. Tags: , , . Comments Off on Hacking LINQ Expressions: Join With Comparer

Hacking LINQ Expressions: Select With Index

First, a point of clarification: I use LINQ Expressions to mean (Language-INtegrated) Query Expressions (the language feature) rather than Expression Trees (the .NET 3.5 library in System.Linq.Expressions).

So what do I mean by “Hacking LINQ Expressions”? Quite simply, I’m not content with the rather limited set of operations that query expressions allow me to represent. By understanding how queries are translated, we can use various techniques to broaden our expressive reach. I have already documented one such hack for managing IDisposable objects with LINQ, so I guess we can call this the second in an unbounded series.

The Problem

In thinking over use cases for functional construction of web control trees, I paused to think through how I would express alternate row styling. My mind immediately jumped to the overload of Select() that exposes the current element’s index:

Controls.Add(
    new Table().WithControls(
        data.Select((x, i) =>
            new TableRow() {
                CssClass = i % 2 == 0 ? "" : "alt"
            }.WithControls(
                new TableCell().WithControls(x)
            )
        )
    )
);

This works fine for simple cases, but breaks down for more complex queries:

Controls.Add(
    new Table().WithControls((
        from x in Xs
        join y in Ys on x.Key equals y.Key
        select new { x, y }
        ).Select((z, i) =>
            new TableRow() {
                CssClass = i % 2 == 0 ? "" : "alt"
            }.WithControls(
                new TableCell().WithControls(z.x.ValueX, z.y.ValueY)
            )
        )
    )
);

The Goal

Instead, I propose a simple extension method to retrieve an index at arbitrary points in a query:

var res = from x in data
          from i in x.GetIndex()
          select new { x, i };

Or our control examples:

Controls.Add(
    new Table().WithControls(
        from x in data
        from i in x.GetIndex()
        select new TableRow() {
            CssClass = i % 2 == 0 ? "" : "alt"
        }.WithControls(
            new TableCell().WithControls(x)
        )
    )
);

Controls.Add(
    new Table().WithControls(
        from x in Xs
        join y in Ys on x.Key equals y.Key
        from i in y.GetIndex()
        select new TableRow() {
            CssClass = i % 2 == 0 ? "" : "alt"
        }.WithControls(
            new TableCell().WithControls(x.ValueX, y.ValueY)
        )
    )
);

Much like in the IDisposable solution, we use a from clause to act as an intermediate assignment. But in this case our hack is a bit trickier than a simple iterator.

The Hack

For this solution we’re going to take advantage of how multiple from clauses are translated:

var res = data.SelectMany(x => x.GetIndex(), (x, i) => new { x, i });

Looking at the parameter list, we see that our collectionSelector should return the result of x.GetIndex() and our resultSelector‘s second argument needs to be an int:

public static IEnumerable<TResult> SelectMany<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, SelectIndexProvider> collectionSelector,
    Func<TSource, int, TResult> resultSelector)

The astute observer will notice that the signature of this resultSelector exactly matches the selector used by Select‘s with-index overload, trivializing the method implementation:

{
    return source.Select(resultSelector);
}

Note that we’re not even using collectionSelector! We’re just using its return type as a flag to force the compiler to use this version of SelectMany(). The rest of the pieces are incredibly simple now that we know the actual SelectIndexProvider value is never used:

public sealed class SelectIndexProvider
{
    private SelectIndexProvider() { }
}

public static SelectIndexProvider GetIndex<T>(this T element)
{
    return null;
}

And for good measure, an equivalent version to extend IQueryable<>:

public static IQueryable<TResult> SelectMany<TSource, TResult>(
    this IQueryable<TSource> source,
    Expression<Func<TSource, SelectIndexProvider>> collectionSelector,
    Expression<Func<TSource, int, TResult>> resultSelector)
{
    return source.Select(resultSelector);
}

Because we’re just calling Select(), the query expression isn’t even aware of the call to GetIndex():

System.Linq.Enumerable+<RangeIterator>d__b1.Select((x, i) => (x * i))

We’re essentially providing our own syntactic sugar over the sugar already provided by query expressions. Pretty sweet, eh?

As a final exercise for the reader, what would this print?

var res = from x in Enumerable.Range(1, 5)
          from i in x.GetIndex()
          from y in Enumerable.Repeat(i, x)
          where y % 2 == 1
          from j in 0.GetIndex()
          select i+j;

foreach (var r in res)
    Console.WriteLine(r);
Posted in .NET, LINQ. Tags: . 2 Comments »

Functional Construction for ASP.NET Web Forms

System.Xml.Linq (a.k.a. LINQ to XML) introduces a nifty approach to creating XML elements called functional construction. I’m not entirely sure why they call it functional given that constructing an object graph is a decidedly non-functional task in the traditional sense of the word, but I digress.

Functional construction has three key features:

  1. Constructors accept arguments of various types, handling them appropriately.
  2. Constructors accept a params array of type Object to enable creation of complex objects.
  3. If an argument implements IEnumerable, the objects within the sequence are added.

If you haven’t seen it in action, I encourage you to take a look at the examples on MSDN and elsewhere—it really is pretty slick. This post will show how a similar technique can be used to build control trees in ASP.NET web forms (and probably WinForms with minimal adjustment).

Basic functional construction can be implemented using two relatively simple extension methods:

public static void Add(this ControlCollection @this, object content)
{
    if (content is Control)
        @this.Add((Control)content);
    else if (content is IEnumerable)
        foreach (object c in (IEnumerable)content)
            @this.Add(c);
    else if (content != null)
        @this.Add(new LiteralControl(content.ToString()));
}

public static void Add(this ControlCollection @this, params object[] args)
{
    @this.Add((IEnumerable)args);
}

We handle four cases:

  1. Control? Add it.
  2. Sequence? Add each.
  3. Other value? Add literal.
  4. Null? Ignore.

And our params overload just calls its arguments a sequence and defers to the other.

In the time-honored tradition of contrived examples:

Controls.Add(
    new Label() { Text = "Nums:" },
    "&nbsp;",
    from i in Enumerable.Range(1, 6)
    group i by i % 2
);

This would render “Nums: 135246”. Note that the result of that LINQ expression is a sequence of sequences, which is flattened automatically and converted into literals. For comparison, here’s an equivalent set of statements:

Controls.Add(new Label() { Text = "Nums:" });
Controls.Add(new LiteralControl("&nbsp;"));
foreach (var g in from i in Enumerable.Range(1, 6)
                  group i by i % 2)
    foreach (var i in g)
        Controls.Add(new LiteralControl(i.ToString()));

Hopefully seeing them side by side makes it clear why this new method of construction might have merit. But we’re not done yet.

Expressions, Expressions, Expressions

Many language features introduced in C# 3.0 and Visual Basic 9 make expressions increasingly important. By expressions I mean a single “line” of code that returns a value. For example, an object initializer is a single expression…

var tb = new TextBox()
{
    ID = "textBox1",
    Text = "Text"
};

… that represents several statements …

var tb = new TextBox()
tb.ID = "textBox1";
tb.Text = "Text";

That single TextBox expression can then be used in a number of places that its statement equivalent can’t: in another object initializer, in a collection initializer, as a parameter to a method, in a .NET 3.5 expression tree, the list goes on. Unfortunately, many older APIs simply aren’t built to work in an expression-based world. In particular, initializing subcollections is a considerable pain. However, we can extend the API to handle this nicely:

public static T WithControls<T>(this T @this, params object[] content) where T : Control
{
    if(@this != null)
        @this.Controls.Add(content);
    return @this;
}

The key is the return value: Control in, Control out. We can now construct and populate a container control with a single expression. For example, we could build a dictionary list (remember those?) from our groups:

Controls.Add(
    new HtmlGenericControl("dl")
    .WithControls(
        from i in Enumerable.Range(1, 6)
        group i by i % 2 into g
        select new [] {
            new HtmlGenericControl("dt")
            { InnerText = g.Key == 0 ? "Even" : "Odd" },
            new HtmlGenericControl("dd")
            .WithControls(g)
        }
    )
);

Which would render this:

Odd
135
Even
246

Without the ability to add controls within an expression, this result would require nested loops with local variables to store references to the containers. The actual code produced by the compiler would be nearly identical, but I find the expressions much easier to work with. Similarly, we can easily populate tables. Let’s build a cell per number:

Controls.Add(
    new Table().WithControls(
        from i in Enumerable.Range(1, 6)
        group i by i % 2 into g
        select new TableRow().WithControls(
            new TableCell()
            { Text = g.Key == 0 ? "Even" : "Odd" },
            g.Select(n => new TableCell().WithControls(n))
        )
    )
);

In a future post I’ll look at some other extensions we can use to streamline the construction and initialization of control hierarchies.

Simplifying LazyLinq

This is the fourth in a series of posts on LazyLinq, a wrapper to support lazy initialization and deferred disposal of a LINQ query context:

  1. Introducing LazyLinq: Overview
  2. Introducing LazyLinq: Internals
  3. Introducing LazyLinq: Queryability
  4. Simplifying LazyLinq
  5. Introducing LazyLinq: Lazy DataContext

As I was iterating on the proof of concept for LazyLinq, I always wanted to get rid of the TQuery type parameter. I thought I needed it to distinguish between ordered and unordered wrapped queries, but it just felt messy. The underlying provider mechanism didn’t need it, so why should I?

Well after taking a closer look at the SQL query provider, I figured out how to eliminate it. The object of inspiration is System.Data.Linq.DataQuery<T>, defined as follows:

internal sealed class DataQuery<T> :
    IOrderedQueryable<T>, IQueryable<T>,
    IOrderedQueryable, IQueryable,
    IEnumerable<T>, IEnumerable,
    IQueryProvider,
    IListSource

The key was realizing that IOrderedQueryable<> and ILazyOrderedQueryable<> don’t actually do anything. Implementation-wise, they’re just IQueryable<> or ILazyQueryable<> with an extra interface on top. It’s only on the design side that it actually matters, essentially providing a hook for additional ordering with ThenBy. In LINQ to SQL’s case, that means supporting orderability is as simple as specifying that the query object is both IQueryable<> and IOrderedQueryable<>.

So how does this revelation simplify Lazy LINQ? First, it allows us to remove TQuery from the interfaces:

    public interface ILazyContext<TContext> : IDisposable
    {
        TContext Context { get; }
        ILazyQueryable<TContext, TResult>
            CreateQuery<TResult>(Func<TContext, IQueryable<TResult>> queryBuilder);
        TResult Execute<TResult>(Func<TContext, TResult> action);
    }

    public interface ILazyQueryable<TContext, TSource> : IQueryable<TSource>
    {
        ILazyContext<TContext> Context { get; }
        Func<TContext, IQueryable<TSource>> QueryBuilder { get; }
    }

    public interface ILazyOrderedQueryable<TContext, TSource>
        : ILazyQueryable<TContext, TSource>, IOrderedQueryable<TSource>
    { }

Note that we can also eliminate ILazyContext.CreateOrderedQuery(), instead assuming that CreateQuery() will return something that can be treated as ILazyOrderedQueryable<> as necessary.

For the concrete implementations, we take the cue from DataQuery<T>, letting LazyQueryableImpl implement ILazyOrderedQueryable<> so we can eliminate LazyOrderedQueryableImpl:

    class LazyQueryableImpl<TContext, TSource>
        : ILazyQueryable<TContext, TSource>, ILazyOrderedQueryable<TContext, TSource>
    {
        // Implementation doesn't change
    }

Finally, our sorting query operations will look more like their counterparts in System.Linq.Queryable, casting the result of CreateQuery() to ILazyOrderedQueryable<>. To keep things readable, we’ll split our CreateOrderedQuery<> helper into separate versions for OrderBy and ThenBy. Note how the types of queryOperation map to the usage of OrderBy (unordered to ordered) and ThenBy (ordered to ordered):

        private static ILazyOrderedQueryable<TContext, TResult>
            CreateOrderByQuery<TSource, TContext, TResult>(
                this ILazyQueryable<TContext, TSource> source,
                Func<IQueryable<TSource>, IOrderedQueryable<TResult>> queryOperation
            )
        {
            return (ILazyOrderedQueryable<TContext, TResult>) source.Context.CreateQuery<TResult>(
               context => queryOperation(source.QueryBuilder(context)));
        }

        private static ILazyOrderedQueryable<TContext, TResult>
            CreateThenByQuery<TSource, TContext, TResult>(
                this ILazyQueryable<TContext, TSource> source,
                Func<IOrderedQueryable<TSource>, IOrderedQueryable<TResult>> queryOperation
            )
        {
            return (ILazyOrderedQueryable<TContext, TResult>) source.Context.CreateQuery<TResult>(
               context => queryOperation((IOrderedQueryable<TSource>) source.QueryBuilder(context)));
        }

Removing TQuery from the query operators is left as an exercise for the reader. Or you can just get the source on CodePlex.

A Note on IOrderedEnumerable<>

Having taken advantage of how LINQ to IQueryable handles orderability, it’s worth pointing out that LINQ to Objects uses a different approach, specifying new behavior in IOrderedEnumerable<> that is used to support multiple sort criteria:

public interface IOrderedEnumerable<TElement> : IEnumerable<TElement>, IEnumerable
{
    IOrderedEnumerable<TElement>
        CreateOrderedEnumerable<TKey>(
            Func<TElement, TKey> keySelector,
            IComparer<TKey> comparer, bool descending);
}

Introducing LazyLinq: Queryability

This is the third in a series of posts on LazyLinq, a wrapper to support lazy initialization and deferred disposal of a LINQ query context:

  1. Introducing LazyLinq: Overview
  2. Introducing LazyLinq: Internals
  3. Introducing LazyLinq: Queryability
  4. Simplifying LazyLinq
  5. Introducing LazyLinq: Lazy DataContext

Having defined the LazyLinq interfaces and provided concrete implementations, we’re left to provide support for the standard query operators.

Learning from Queryable

Before we try to query ILazyQueryable, it’s instructive to look at how System.Linq.Queryable works. There are essentially three types of operators on IQueryable<>:

  • Deferred queries returning IQueryable<>: Select, Where, etc.
  • Deferred query returning IOrderedQueryable<>: OrderBy, OrderByDescending, ThenBy, ThenByDescending
  • Everything else: Aggregate, Count, First, etc.

Reflecting Queryable.Select, modulo error checking, we see the following:

public static IQueryable<TResult> Select<TSource, TResult>(
    this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector)
{
    return source.Provider
        .CreateQuery<TResult>(
            Expression.Call(null,
                ((MethodInfo) MethodBase.GetCurrentMethod())
                    .MakeGenericMethod(new Type[] { typeof(TSource), typeof(TResult) }),
                new Expression[] { source.Expression, Expression.Quote(selector) }));
}

The source‘s Provider is used to construct a new query whose expression includes the call to Select with the given parameters. An ordered query follows a similar pattern, trusting that the query provider will return an IOrderedQueryable<> as appropriate:

public static IOrderedQueryable<TSource> OrderBy<TSource, TKey>(
    this IQueryable<TSource> source, Expression<Func<TSource, TKey>> keySelector)
{
    return (IOrderedQueryable<TSource>) source.Provider
        .CreateQuery<TSource>(
            Expression.Call(null,
                ((MethodInfo) MethodBase.GetCurrentMethod())
                    .MakeGenericMethod(new Type[] { typeof(TSource), typeof(TKey) }),
                new Expression[] { source.Expression, Expression.Quote(keySelector) }));
}

And finally, everything that’s not a query is handled by the provider’s Execute method:

public static TSource First<TSource>(this IQueryable<TSource> source)
{
    return source.Provider
        .Execute<TSource>(
            Expression.Call(null,
                ((MethodInfo) MethodBase.GetCurrentMethod())
                    .MakeGenericMethod(new Type[] { typeof(TSource) }),
                new Expression[] { source.Expression }));
}

Querying ILazyQueryable

You may have noticed that the above scenarios map rather closely to the methods provided by ILazyContext:

  • ILazyQueryable<TContext, TResult, TQuery>
        CreateQuery<TResult, TQuery>(Func<TContext, TQuery> queryBuilder)
        where TQuery : IQueryable<TResult>;
  • ILazyOrderedQueryable<TContext, TResult, TQuery>
        CreateOrderedQuery<TResult, TQuery>(Func<TContext, TQuery> queryBuilder)
        where TQuery : IOrderedQueryable<TResult>;
  • TResult Execute<TResult>(Func<TContext, TResult> action);

However, rather than expression trees we’re pushing around delegates. Execute seems pretty simple, so let’s start there:

public static TSource First<TSource, TContext, TQuery>(
        this ILazyQueryable<TContext, TSource, TQuery> source
    ) where TQuery : IQueryable<TSource>
{
    Func<TContext, TResult> action = context => ???;
    return source.Context.Execute(action);
}

So our source has a Context, which knows how to Execute an action from context to some result. To find that result, we need to leverage the other property of source: QueryBuilder. Recalling that QueryBuilder is a function from TContext to TQuery, and that TQuery is an IQueryable<TSource>, we see something on which we can execute Queryable.First:

public static TSource First<TSource, TContext, TQuery>(
        this ILazyQueryable<TContext, TSource, TQuery> source
    ) where TQuery : IQueryable<TSource>
{
    Func<TContext, TResult> action =
        context => source.QueryBuilder(context).First();
    return source.Context.Execute(action);
}

Now seeing as we have dozens of methods to implement like this, it seems an opportune time for a bit of eager refactoring. Recognizing that the only variance is the method call on the IQueryable, let’s extract an extension method that does everything else:

private static TResult Execute<TSource, TContext, TResult, TQuery>(
        this ILazyQueryable<TContext, TSource, TQuery> source,
        Func<TQuery, TResult> queryOperation
    ) where TQuery : IQueryable<TSource>
{
    return source.Context.Execute(
        context => queryOperation(source.QueryBuilder(context)));
}

From there, additional lazy operators are just a lambda expression away:

public static TSource First<TSource, TContext, TQuery>(
        this ILazyQueryable<TContext, TSource, TQuery> source
    ) where TQuery : IQueryable<TSource>
{
    return source.Execute(q => q.First());
}

public static TAccumulate Aggregate<TContext, TSource, TQuery, TAccumulate>(
        this ILazyQueryable<TContext, TSource, TQuery> source,
        TAccumulate seed, Expression<Func<TAccumulate, TSource, TAccumulate>> func
    ) where TQuery : IQueryable<TSource>
{
    return source.Execute(q => q.Aggregate(seed, func));
}

And now having done the hard part (finding an IQueryable), we can translate that understanding to make similar helpers for queries:

private static ILazyQueryable<TContext, TResult, IQueryable<TResult>>
    CreateQuery<TSource, TContext, TQuery, TResult>(
        this ILazyQueryable<TContext, TSource, TQuery> source,
        Func<TQuery, IQueryable<TResult>> queryOperation
    ) where TQuery : IQueryable<TSource>
{
    return source.Context.CreateQuery<TResult, IQueryable<TResult>>(
        context => queryOperation(source.QueryBuilder(context)));
}

private static ILazyOrderedQueryable<TContext, TResult, IOrderedQueryable<TResult>>
    CreateOrderedQuery<TSource, TContext, TQuery, TResult>(
        this ILazyQueryable<TContext, TSource, TQuery> source,
        Func<TQuery, IOrderedQueryable<TResult>> queryOperation
    ) where TQuery : IQueryable<TSource>
{
    return source.Context.CreateOrderedQuery<TResult, IOrderedQueryable<TResult>>(
        context => queryOperation(source.QueryBuilder(context)));
}

With similarly trivial query operator implementations:

public static ILazyQueryable<TContext, TResult, IQueryable<TResult>>
    Select<TContext, TSource, TQuery, TResult>(
        this ILazyQueryable<TContext, TSource, TQuery> source,
        Expression<Func<TSource, TResult>> selector
    ) where TQuery : IQueryable<TSource>
{
    return source.CreateQuery(q => q.Select(selector));
}

public static ILazyOrderedQueryable<TContext, TSource, IOrderedQueryable<TSource>>
    OrderBy<TContext, TSource, TQuery, TKey>(
        this ILazyQueryable<TContext, TSource, TQuery> source,
        Expression<Func<TSource, TKey>> keySelector
    ) where TQuery : IQueryable<TSource>
{
    return source.CreateOrderedQuery(q => q.OrderBy(keySelector));
}

And the end result:

LazyLinq.First

Introducing LazyLinq: Internals

This is the second in a series of posts on LazyLinq, a wrapper to support lazy initialization and deferred disposal of a LINQ query context:

  1. Introducing LazyLinq: Overview
  2. Introducing LazyLinq: Internals
  3. Introducing LazyLinq: Queryability
  4. Simplifying LazyLinq
  5. Introducing LazyLinq: Lazy DataContext

My first post introduced the three interfaces that LazyLinq provides. Next, we get to implement them.

Implementing ILazyQueryable

First, the interface:

    public interface ILazyQueryable<TContext, TSource, TQuery>
        : IQueryable<TSource>
        where TQuery : IQueryable<TSource>
    {
        ILazyContext<TContext> Context { get; }
        Func<TContext, TQuery> QueryBuilder { get; }
    }

We’ll start simple with an implicit implementation of the interface and a trivial constructor:

    class LazyQueryableImpl<TContext, TSource, TQuery>
            : ILazyQueryable<TContext, TSource, TQuery>
            where TQuery : IQueryable<TSource>
    {
            public ILazyContext<TContext> Context { get; private set; }
            public Func<TContext, TQuery> QueryBuilder { get; private set; }

            internal LazyQueryableImpl(ILazyContext<TContext> deferredContext, Func<TContext, TQuery> queryBuilder)
            {
                if (deferredContext == null) throw new ArgumentNullException("deferredContext");
                if (queryBuilder == null) throw new ArgumentNullException("queryBuilder");

                Context = deferredContext;
                QueryBuilder = queryBuilder;
            }

Next, a lazy-loaded query built from our lazy context:

            protected TQuery Query
            {
                get
                {
                    if (query == null)
                    {
                        query = QueryBuilder(Context.Context);
                        if (query == null)
                            throw new InvalidOperationException("Query built as null.");
                    }
                    return query;
                }
            }

And the internals of managing Context, which implements IDisposable:

            private void Dispose()
            {
                Context.Dispose();
                query = default(TQuery);
            }

            private IEnumerator<TSource> GetEnumerator()
            {
                try
                {
                    foreach (var i in Query)
                        yield return i;
                }
                finally
                {
                    Dispose();
                }
            }

Since Query depends on Context, once Context is disposed we need to reset Query so a new one can be built (if possible). Note that we use an iterator here to return an IEnumerator<TSource>, rather than the usual IEnumerable<>.

Finally, we’ll close out by explicitly implementing IQueryable:

            IEnumerator<TSource> IEnumerable<TSource>.GetEnumerator()
            {
                return GetEnumerator();
            }

            IEnumerator IEnumerable.GetEnumerator()
            {
                return GetEnumerator();
            }

            Type IQueryable.ElementType
            {
                get { return Query.ElementType; }
            }

            Expression IQueryable.Expression
            {
                get { return Query.Expression; }
            }

            IQueryProvider IQueryable.Provider
            {
                get { return Query.Provider; }
            }
        }

If this seemed relatively simple, you’re right. We’re just building a lazy-loaded Query proxy, with a bit of plumbing to clean up our Context.

Implementing ILazyOrderedQueryable

Not very exciting, but for completeness:

        public interface ILazyOrderedQueryable<TContext, TSource, TQuery>
            : ILazyQueryable<TContext, TSource, TQuery>, IOrderedQueryable<TSource>
            where TQuery : IOrderedQueryable<TSource>
        { }

        class LazyOrderedQueryableImpl<TContext, TSource, TQuery>
                : LazyQueryableImpl<TContext, TSource, TQuery>, ILazyOrderedQueryable<TContext, TSource, TQuery>
                where TQuery : IOrderedQueryable<TSource>
        {
            internal LazyOrderedQueryableImpl(ILazyContext<TContext> lazyContext, Func<TContext, TQuery> queryBuilder)
                : base(lazyContext, queryBuilder)
            {
            }
        }

LazyQueryable Factory

Consumers of this API should never need to know about these implementation details, so we can hide them behind a factory class:

        public static class LazyQueryable
        {
            public static ILazyQueryable<TContext, TResult, TQuery> CreateQuery<TContext, TResult, TQuery>(
                ILazyContext<TContext> context, Func<TContext, TQuery> queryBuilder)
                where TQuery : IQueryable<TResult>
            {
                return new LazyQueryableImpl<TContext, TResult, TQuery>(context, queryBuilder);
            }
            public static ILazyOrderedQueryable<TContext, TResult, TQuery> CreateOrderedQuery<TContext, TResult, TQuery>(
                ILazyContext<TContext> context, Func<TContext, TQuery> queryBuilder)
                where TQuery : IOrderedQueryable<TResult>
            {
                return new LazyOrderedQueryableImpl<TContext, TResult, TQuery>(context, queryBuilder);
            }
        }

Implementing ILazyContext

Again, we’ll start with the interface:

        public interface ILazyContext<TContext> : IDisposable
        {
            TContext Context { get; }

            ILazyQueryable<TContext, TResult, TQuery>
                CreateQuery<TResult, TQuery>(Func<TContext, TQuery> queryBuilder)
                where TQuery : IQueryable<TResult>;

            ILazyOrderedQueryable<TContext, TResult, TQuery>
                CreateOrderedQuery<TResult, TQuery>(Func<TContext, TQuery> queryBuilder)
                where TQuery : IOrderedQueryable<TResult>;

            TResult Execute<TResult>(Func<TContext, TResult> action);
        }

Now we can start fulfilling our requirements:

1. Lazily expose the Context.

        class LazyContextImpl<TContext> : ILazyContext<TContext>, IDisposable
        {
            public Func<TContext> ContextBuilder { get; private set; }

            private TContext context;
            public TContext Context
            {
                get
                {
                    if (context == null)
                    {
                        context = ContextBuilder();
                        if (context == null)
                            throw new InvalidOperationException("Context built as null.");
                    }
                    return context;
                }
            }

2. Produce lazy wrappers to represent queries retrieved from a context by a delegate.

            public ILazyQueryable<TContext, TResult, TQuery> CreateQuery<TResult, TQuery>(
                Func<TContext, TQuery> queryBuilder)
                where TQuery : IQueryable<TResult>
            {
                return LazyQueryable.CreateQuery<TContext, TResult, TQuery>(this, queryBuilder);
            }

            public ILazyOrderedQueryable<TContext, TResult, TQuery> CreateOrderedQuery<TResult, TQuery>(
                Func<TContext, TQuery> queryBuilder)
                where TQuery : IOrderedQueryable<TResult>
            {
                return LazyQueryable.CreateOrderedQuery<TContext, TResult, TQuery>(this, queryBuilder);
            }

3. Execute an action on the context.

There are two ways to “complete” a query, and we need to clean up context after each. The first was after enumeration, implemented above. The second is on execute, implemented here:

            public TResult Execute<TResult>(Func<TContext, TResult> expression)
            {
                try
                {
                    return expression(Context);
                }
                finally
                {
                    Dispose();
                }
            }

4. Ensure the context is disposed as necessary.

We don’t require that TContext is IDisposable, but we need to handle if it is. We also clear context to support reuse.

            public void Dispose()
            {
                var disposable = context as IDisposable;
                if (disposable != null)
                    disposable.Dispose();

                context = default(TContext);
            }

Constructors

With our requirements met, we just need a way to create our context. We provide two options:

            internal LazyContextImpl(TContext context) : this(() => context) { }

            internal LazyContextImpl(Func<TContext> contextBuilder)
            {
                if (contextBuilder == null) throw new ArgumentNullException("contextBuilder");

                ContextBuilder = contextBuilder;
            }

The former wraps an existing TContext instance in a closure, meaning every time ContextBuilder is called it returns the same instance. The latter accepts any delegate that returns a TContext. The most common such delegate would be a simple instantiation: () => new MyDataContext().

It should be clear now why we would want to clear our context on dispose. If ContextBuilder returns a new context instance each time, it’s perfectly safe to discard of the old (disposed) context to trigger the creation of a new one. Conversely, if the builder returns a single instance, using the context after disposal would trigger an ObjectDisposedException or something similar.

LazyContext Factory

For consistency, we should also provide factory methods to hide this specific implementation:

            public static class LazyContext
            {
                public static ILazyContext<T> Create<T>(T context)
                {
                    return new LazyContextImpl<T>(context);
                }
                public static ILazyContext<T> Create<T>(Func<T> contextBuilder)
                {
                    return new LazyContextImpl<T>(contextBuilder);
                }
            }

Lazy Extensions

And last, but certainly not least, we’re ready to reimplement our Use() extension methods:

    public static class Lazy
    {
        public static ILazyContext<TContext> Use<TContext>(this TContext @this)
        {
            return LazyContext.Create<TContext>(@this);
        }

        public static ILazyContext<TContext> Use<TContext>(this Func<TContext> @this)
        {
            return LazyContext.Create<TContext>(@this);
        }
    }

With several usage possibilities:

    var r1 = from x in new MyDataContext().Use() ...;

    Func<MyDataContext> f1 = () => new MyDataContext();
    var r2 = from x in f1.Use() ...;

    var r3 = from x in new Func<MyDataContext>(() => new MyDataContext()).Use() ...;

    var r4 = from x in Lazy.Use(() => new MyDataContext()) ...;

Or maybe we can make it even easier. Maybe…

Introducing LazyLinq: Overview

This is the first in a series of posts on LazyLinq, a wrapper to support lazy initialization and deferred disposal of a LINQ query context, including LINQ to SQL’s DataContext:

  1. Introducing LazyLinq: Overview
  2. Introducing LazyLinq: Internals
  3. Introducing LazyLinq: Queryability
  4. Simplifying LazyLinq
  5. Introducing LazyLinq: Lazy DataContext

Motivation

I recently posted an approach to dealing with IDisposable objects and LINQ. In the comments at LosTechies, Steve Gentile mentioned that my IQueryable example didn’t actually compile:

IQueryable<MyType> MyFunc(string myValue)
{
    return from dc in new MyDataContext().Use()
           from row in dc.MyTable
           where row.MyField == myValue
           select row;
}

Steve suggested using AsQueryable() on the result of the query, which does indeed fix the build. However, the purpose of returning an IQueryable is that it would allow us to perform additional query operations using the original query provider. Since the query result isn’t IQueryable, AsQueryable() will use a query provider based on LINQ to Objects, with an additional performance penalty to compile the expression trees into IL.

Even worse, because Use() is returning an IEnumerable<T> the entire query is actually executed with LINQ to Objects. Even though dc.MyTable is IQueryable, the translated query treats it as a simple IEnumerable, essentially performing a SELECT * before executing all query operations on entity objects in memory. It should go without saying that this is less than ideal.

Introducing LazyLinq

After several iterations, I think I have a better solution. In this post I’ll review the architecture of the solution, with posts to follow detailing the implementation.

LazyLinq is implemented around three interfaces. The first serves as a deferred query context provider:

    public interface ILazyContext<TContext> : IDisposable
    {
        TContext Context { get; }

        ILazyQueryable<TContext, TResult, TQuery>
            CreateQuery<TResult, TQuery>(Func<TContext, TQuery> queryBuilder)
            where TQuery : IQueryable<TResult>;

        ILazyOrderedQueryable<TContext, TResult, TQuery>
            CreateOrderedQuery<TResult, TQuery>(Func<TContext, TQuery> queryBuilder)
            where TQuery : IOrderedQueryable<TResult>;

        TResult Execute<TResult>(Func<TContext, TResult> action);
    }

An implementer of ILazyContext has four responsibilities:

  1. Lazily expose the Context.
  2. Produce lazy wrappers to represent queries retrieved from a context by a delegate.
  3. Execute an action on the context.
  4. Ensure the context is disposed as necessary.

The remaining interfaces serve as lazy query wrappers, corresponding to IQueryable<T> and IOrderedQueryable<T>:

    public interface ILazyQueryable<TContext, TSource, TQuery>
        : IQueryable<TSource>
        where TQuery : IQueryable<TSource>
    {
        ILazyContext<TContext> Context { get; }
        Func<TContext, TQuery> QueryBuilder { get; }
    }
    public interface ILazyOrderedQueryable<TContext, TSource, TQuery>
        : ILazyQueryable<TContext, TSource, TQuery>, IOrderedQueryable<TSource>
        where TQuery : IOrderedQueryable<TSource>
    { }

An implementer of ILazyQueryable has four responsibilities:

  1. Expose the Context from which it was created.
  2. Expose a delegate that represents how the deferred query is built from Context.
  3. Implement IQueryable for the deferred query.
  4. Ensure the context is disposed after the query is enumerated.

If it seems like these interfaces don’t do much, you’re absolutely correct. As we’ll see later, the light footprint gives us considerable flexibility.

LINQ to ILazyContext

Defining a few interfaces is all well and good, but the real goal is to simplify working with our disposable context. What if I told you that our original use case didn’t need to change at all (other than the lazy return type)?

ILazyQueryable<MyType> MyFunc(string myValue)
{
    return from dc in new MyDataContext().Use()
           from row in dc.MyTable
           where row.MyField == myValue
           select row;
}

We can’t implement it yet, but our new Use() extension method will have this signature:

    public static ILazyContext<TContext> Use<TContext>(this TContext @this) { ... }

This is where we really start to bend LINQ against its will. As the first step in the query translation process, the compiler will translate our from clauses into a call to SelectMany. All we need to do is provide a SelectMany method for ILazyContext that the compiler will find acceptable:

    public static ILazyQueryable<TContext, TResult, IQueryable<TResult>> SelectMany<TContext, TCollection, TResult>(
        this ILazyContext<TContext> lazyContext,
        Expression<Func<TContext, IQueryable<TCollection>>> collectionSelector,
        Expression<Func<TContext, TCollection, TResult>> resultSelector)
    {

The method signature is a slight variation from the corresponding overload of Queryable.SelectMany(), changed to require that collectionSelector returns an IQueryable that we can defer. If it doesn’t, the compiler will complain:

An expression of type ‘System.Collections.Generic.IEnumerable<int>’ is not allowed in a subsequent from clause in a query expression with source type ‘Solutionizing.Linq.Test.MyDataContext<Solutionizing.Linq.Test.MyDataContext>’. Type inference failed in the call to ‘SelectMany’.

Now that we’ve hijacked the query, we can control the rest of the translation process with the returned ILazyQueryable. Recalling that our ILazyContext knows how to make an ILazyQueryable, we just need to give it a QueryBuilder delegate:

        return lazyContext.CreateQuery<TResult, IQueryable<TResult>>(context =>
            {
                Func<TContext, IQueryable<TCollection>> getQueryFromContext = collectionSelector.Compile();
                IQueryable<TCollection> query = getQueryFromContext(context);

                ParameterExpression rangeParameter = resultSelector.Parameters[1];
                InvocationExpression invoke = Expression.Invoke(resultSelector, Expression.Constant(context), rangeParameter);
                Expression<Func<TCollection, TResult>> selector = Expression.Lambda<Func<TCollection, TResult>>(invoke, rangeParameter);

                return query.Select(selector);
            });
    }

This is pretty dense, so let’s walk through it:

  1. Our lambda expression’s context parameter represents the MyDataContext that will be passed in eventually.
  2. We’re going to manipulate the expression trees passed into the method, which will look something like this:
    • collectionSelector: dc => dc.MyTable
    • resultSelector: (dc, row) => new { dc = dc, row = row }
  3. Compiling collectionSelector produces a delegate we can invoke on context to get an IQueryable<TCollection>context.MyTable, in this case.
  4. Before we can use resultSelector on MyTable, we need to wrap it in a lambda expression to eliminate its first parameter.:
    1. Save the second parameter (row) to use later.
    2. Create a new invocation expression that will represent calling resultSelector with the current context and our saved row parameter.
    3. Create a new lambda expression that will accept that same row parameter and return the invocation expression.
  5. The resulting selector, of type Expression<Func<TCollection, TResult>>, can then be passed to query.Select() which happily returns the desired IQueryable<TResult>.

Essentially we’re pretending that the SelectMany call is just a Select call on the IQueryable<TCollection> generated by collectionSelector, all wrapped in a lazy query.

Hopefully this overview has piqued your interest. Next time we’ll look at a base implementation of the interfaces.

Using IDisposables with LINQ

Objects that implement IDisposable are everywhere. The interface even gets its own language features (C#, VB, F#). However, LINQ throws a few wrenches into things:

  1. LINQ’s query syntax depends on expressions; using blocks are statements.
  2. When querying a sequence of IDisposable objects, there’s no easy way to ensure disposal after each element has been consumed.
  3. Returning deferred queries from within a using statement is often desired, but fails spectacularly.

There are possible work-arounds for each issue…

  1. Put the using statement in a method (named or anonymous) that is called from the query. See also: Thinking Functional: Using.
  2. Use a method that creates a dispose-safe iterator of the sequence, like AsSafeEnumerable().
  3. Refactor the method to inject the IDisposable dependency, as shown in the first part of Marc’s answer here.

But, as you might have guessed, I would like to propose a better solution. The code is really complex, so bear with me:

public static IEnumerable<T> Use<T>(this T obj) where T : IDisposable
{
    try
    {
        yield return obj;
    }
    finally
    {
        if (obj != null)
            obj.Dispose();
    }
}

That’s it. We’re turning our IDisposable object into a single-element sequence. The trick is that the C# compiler will build an iterator for us that properly handles the finally clause, ensuring that our object will be disposed. It might be helpful to set a breakpoint on the finally clause to get a better idea what’s happening.

So how can this simple method solve all our problems? First up: “using” a FileStream object created in a LINQ query:

var lengths = from path in myFiles
              from fs in File.OpenRead(path).Use()
              select new { path, fs.Length };

Since the result of Use() is a single-element sequence, we can think of from fs in something.Use() as an assignment of that single value, something, to fs. In fact, it’s really quite similar to an F# use binding in that it will automatically clean itself up when it goes out of scope (by its enumerator calling MoveNext()).

Next, disposing elements from a collection. I’ll use the same SharePoint problem that AsSafeEnumerable() solves:

var webs = from notDisposed in site.AllWebs
           from web in notDisposed.Use()
           select web.Title;

I find this syntax rather clumsy compared with AsSafeEnumerable(), but it’s there if you need it.

Finally, let’s defer disposal of a LINQ to SQL DataContext until after the deferred query is executed, as an answer to the previously-linked Stack Overflow question:

IQueryable<MyType> MyFunc(string myValue)
{
    return from dc in new MyDataContext().Use()
           from row in dc.MyTable
           where row.MyField == myValue
           select row;
}

void UsingFunc()
{
    var result = MyFunc("MyValue").OrderBy(row => row.SortOrder);
    foreach(var row in result)
    {
        //Do something
    }
}

The result of MyFunc now owns its destiny completely. It doesn’t depend on some potentially disposed DataContext – it just creates one that it will dispose when it’s done. There are probably situations where you would want to share a DataContext rather than create one on demand (I don’t use LINQ to SQL, I just blog about it), but again it’s there if you need it.

I’ve only started using this approach recently, so if you have any problems with it please share.