Selecting Static Results with Dynamic LINQ

Dynamic LINQ (DLINQ) is a LINQ extension provided in the VS 2008 Samples. Scott Guthrie provides a good overview here: Dynamic LINQ (Part 1: Using the LINQ Dynamic Query Library), but the executive summary is that it implements certain query operations on IQueryable (the non-generic variety), with filtering, grouping and projection specified with strings rather than statically-typed expressions.

I’ve never had a use for it, but a question on Stack Overflow caused me to take a second look…

…the selected groupbyvalue (Group) will always be a string, and the sum will always be a double, so I want to be able to cast into something like a List, where Result is an object with properties Group (string) and TotalValue (double).

Before we can solve the problem, let’s take a closer look at why it is being asked…

DynamicExpression.CreateClass

We can use the simplest of dynamic queries to explore a bit:

[Test]
public void DLINQ_IdentityProjection_ReturnsDynamicClass()
{
    IQueryable nums = Enumerable.Range(1, 5).AsQueryable();
    IQueryable q = nums.Select("new (it as Value)");
    Type elementType = q.ElementType;

    Assert.AreEqual("DynamicClass1", elementType.Name);
    CollectionAssert.AreEqual(new[] { typeof(int) },
        elementType.GetProperties().Select(p => p.PropertyType).ToArray());
}

DLINQ defines a special expression syntax for projection that is used to specify what values should be returned and how. it refers to the current element, which in our case is an int.

The result in question comes from DynamicQueryable.Select():

public static IQueryable Select(this IQueryable source, string selector, params object[] values)
{
    LambdaExpression lambda = DynamicExpression.ParseLambda(source.ElementType, null, selector, values);
    return source.Provider.CreateQuery(
        Expression.Call(
            typeof(Queryable), "Select",
            new Type[] { source.ElementType, lambda.Body.Type },
            source.Expression, Expression.Quote(lambda)));
}

The non-generic return type suggest that the type of the values returned is unknown at compile time. If we check an element’s type at runtime, we’ll see something like DynamicClass1. Tracing down the stack from DynamicExpression.ParseLambda(), we eventually find that DynamicClass1 is generated by a call to DynamicExpression.CreateClass() in ExpressionParser.ParseNew(). CreateClass() in turn delegates to a static ClassFactory which manages a dynamic assembly in the current AppDomain to hold the new classes, each generated by Reflection.Emit. The resulting type is then used to generate the MemberInit expression that constructs the object.

Dynamic to Static

While dynamic objects are useful in some situations (thus support in C# 4), in this case we want to use static typing. Let’s specify our result type with a generic method:

IQueryable<TResult> Select<TResult>(this IQueryable source, string selector, params object[] values);

We just need a mechanism to insert our result type into DLINQ to supersede the dynamic result. This is surprisingly easy to implement, as ParseLambda() already accepts a resultType argument. We just need to capture it…

private Type resultType;
public Expression Parse(Type resultType)
{
    this.resultType = resultType;
    int exprPos = token.pos;
    // ...

…and then update ParseNew() to use the specified type:

Expression ParseNew()
{
    // ...
    NextToken();
    Type type = this.resultType ?? DynamicExpression.CreateClass(properties);
    MemberBinding[] bindings = new MemberBinding[properties.Count];
    for (int i = 0; i < bindings.Length; i++)
        bindings[i] = Expression.Bind(type.GetProperty(properties[i].Name), expressions[i]);
    return Expression.MemberInit(Expression.New(type), bindings);
}

If resultType is null, as it is in the existing Select() implementation, a DynamicClass is used instead.

The generic Select<TResult> is then completed by referencing TResult as appropriate:

public static IQueryable<TResult> Select<TResult>(this IQueryable source, string selector, params object[] values)
{
    LambdaExpression lambda = DynamicExpression.ParseLambda(source.ElementType, typeof(TResult), selector, values);
    return source.Provider.CreateQuery<TResult>(
        Expression.Call(
            typeof(Queryable), "Select",
            new Type[] { source.ElementType, typeof(TResult) },
            source.Expression, Expression.Quote(lambda)));
}

With the following usage:

public class ValueClass { public int Value { get; set; } }

[Test]
public void DLINQ_IdentityProjection_ReturnsStaticClass()
{
    IQueryable nums = Enumerable.Range(1, 5).AsQueryable();
    IQueryable<ValueClass> q = nums.Select<ValueClass>("new (it as Value)");
    Type elementType = q.ElementType;

    Assert.AreEqual("ValueClass", elementType.Name);
    CollectionAssert.AreEqual(nums.ToArray(), q.Select(v => v.Value).ToArray());
}

Note that the property names in TResult must match those in the Select query exactly. Changing the query to “new (it as value)” results in an unhandled ArgumentNullException in the Expression.Bind() call seen in the for loop of ParseNew() above, as the “value” property cannot be found.

Selecting Anonymous Types

So we can select dynamic types or existing named types, but what if we want to have the benefits of static typing without having to declare a dedicated ValueClass, as we can with anonymous types and normal static LINQ? As a variation on techniques used elsewhere, let’s can define an overload of Select() that accepts an instance of the anonymous type whose values we will ignore but using its type to infer the desired return type. The overload is trivial:

public static IQueryable<TResult> Select<TResult>(this IQueryable source, TResult template, string selector, params object[] values)
{
    return source.Select<TResult>(selector, values);
}

With usage looking like this (note the required switch to var q):

[Test]
public void DLINQ_IdentityProjection_ReturnsStaticClass()
{
    IQueryable nums = Enumerable.Range(1, 5).AsQueryable();
    var q = nums.Select(new { Value = 0 }, "new (it as Value)");
    Type elementType = q.ElementType;

    Assert.IsTrue(elementType.Name.Contains("AnonymousType"));
    CollectionAssert.AreEqual(nums.ToArray(), q.Select(v => v.Value).ToArray());
}

However, if we try the above we encounter an unfortunate error:

The property ‘Int32 Value’ has no ‘set’ accessor

As you may or may not know, anonymous types in C# are immutable (modulo changes to objects they reference), with their values set through a compiler-generated constructor. (I’m not sure if this is true in VB.) With this knowledge in hand, we can update ParseNew() to check if resultType has such a constructor that we could use instead:

    // ...
    Type type = this.resultType ?? DynamicExpression.CreateClass(properties);

    var propertyTypes = type.GetProperties().Select(p => p.PropertyType).ToArray();
    var ctor = type.GetConstructor(propertyTypes);
    if (ctor != null)
        return Expression.New(ctor, expressions);

    MemberBinding[] bindings = new MemberBinding[properties.Count];
    for (int i = 0; i < bindings.Length; i++)
        bindings[i] = Expression.Bind(type.GetProperty(properties[i].Name), expressions[i]);
    return Expression.MemberInit(Expression.New(type), bindings);
}

And with that we can now project from a dynamic query onto static types, both named and anonymous, with a reasonably natural interface.

Due to licensing I can’t post the full example, but if you’re at all curious about Reflection.Emit or how DLINQ works I would encourage you to dive in and let us know what else you come up with. Things will get even more interesting with the combination of LINQ, the DLR and C# 4’s dynamic in the coming months.

Advertisements

Introducing LazyLinq: Overview

This is the first in a series of posts on LazyLinq, a wrapper to support lazy initialization and deferred disposal of a LINQ query context, including LINQ to SQL’s DataContext:

  1. Introducing LazyLinq: Overview
  2. Introducing LazyLinq: Internals
  3. Introducing LazyLinq: Queryability
  4. Simplifying LazyLinq
  5. Introducing LazyLinq: Lazy DataContext

Motivation

I recently posted an approach to dealing with IDisposable objects and LINQ. In the comments at LosTechies, Steve Gentile mentioned that my IQueryable example didn’t actually compile:

IQueryable<MyType> MyFunc(string myValue)
{
    return from dc in new MyDataContext().Use()
           from row in dc.MyTable
           where row.MyField == myValue
           select row;
}

Steve suggested using AsQueryable() on the result of the query, which does indeed fix the build. However, the purpose of returning an IQueryable is that it would allow us to perform additional query operations using the original query provider. Since the query result isn’t IQueryable, AsQueryable() will use a query provider based on LINQ to Objects, with an additional performance penalty to compile the expression trees into IL.

Even worse, because Use() is returning an IEnumerable<T> the entire query is actually executed with LINQ to Objects. Even though dc.MyTable is IQueryable, the translated query treats it as a simple IEnumerable, essentially performing a SELECT * before executing all query operations on entity objects in memory. It should go without saying that this is less than ideal.

Introducing LazyLinq

After several iterations, I think I have a better solution. In this post I’ll review the architecture of the solution, with posts to follow detailing the implementation.

LazyLinq is implemented around three interfaces. The first serves as a deferred query context provider:

    public interface ILazyContext<TContext> : IDisposable
    {
        TContext Context { get; }

        ILazyQueryable<TContext, TResult, TQuery>
            CreateQuery<TResult, TQuery>(Func<TContext, TQuery> queryBuilder)
            where TQuery : IQueryable<TResult>;

        ILazyOrderedQueryable<TContext, TResult, TQuery>
            CreateOrderedQuery<TResult, TQuery>(Func<TContext, TQuery> queryBuilder)
            where TQuery : IOrderedQueryable<TResult>;

        TResult Execute<TResult>(Func<TContext, TResult> action);
    }

An implementer of ILazyContext has four responsibilities:

  1. Lazily expose the Context.
  2. Produce lazy wrappers to represent queries retrieved from a context by a delegate.
  3. Execute an action on the context.
  4. Ensure the context is disposed as necessary.

The remaining interfaces serve as lazy query wrappers, corresponding to IQueryable<T> and IOrderedQueryable<T>:

    public interface ILazyQueryable<TContext, TSource, TQuery>
        : IQueryable<TSource>
        where TQuery : IQueryable<TSource>
    {
        ILazyContext<TContext> Context { get; }
        Func<TContext, TQuery> QueryBuilder { get; }
    }
    public interface ILazyOrderedQueryable<TContext, TSource, TQuery>
        : ILazyQueryable<TContext, TSource, TQuery>, IOrderedQueryable<TSource>
        where TQuery : IOrderedQueryable<TSource>
    { }

An implementer of ILazyQueryable has four responsibilities:

  1. Expose the Context from which it was created.
  2. Expose a delegate that represents how the deferred query is built from Context.
  3. Implement IQueryable for the deferred query.
  4. Ensure the context is disposed after the query is enumerated.

If it seems like these interfaces don’t do much, you’re absolutely correct. As we’ll see later, the light footprint gives us considerable flexibility.

LINQ to ILazyContext

Defining a few interfaces is all well and good, but the real goal is to simplify working with our disposable context. What if I told you that our original use case didn’t need to change at all (other than the lazy return type)?

ILazyQueryable<MyType> MyFunc(string myValue)
{
    return from dc in new MyDataContext().Use()
           from row in dc.MyTable
           where row.MyField == myValue
           select row;
}

We can’t implement it yet, but our new Use() extension method will have this signature:

    public static ILazyContext<TContext> Use<TContext>(this TContext @this) { ... }

This is where we really start to bend LINQ against its will. As the first step in the query translation process, the compiler will translate our from clauses into a call to SelectMany. All we need to do is provide a SelectMany method for ILazyContext that the compiler will find acceptable:

    public static ILazyQueryable<TContext, TResult, IQueryable<TResult>> SelectMany<TContext, TCollection, TResult>(
        this ILazyContext<TContext> lazyContext,
        Expression<Func<TContext, IQueryable<TCollection>>> collectionSelector,
        Expression<Func<TContext, TCollection, TResult>> resultSelector)
    {

The method signature is a slight variation from the corresponding overload of Queryable.SelectMany(), changed to require that collectionSelector returns an IQueryable that we can defer. If it doesn’t, the compiler will complain:

An expression of type ‘System.Collections.Generic.IEnumerable<int>’ is not allowed in a subsequent from clause in a query expression with source type ‘Solutionizing.Linq.Test.MyDataContext<Solutionizing.Linq.Test.MyDataContext>’. Type inference failed in the call to ‘SelectMany’.

Now that we’ve hijacked the query, we can control the rest of the translation process with the returned ILazyQueryable. Recalling that our ILazyContext knows how to make an ILazyQueryable, we just need to give it a QueryBuilder delegate:

        return lazyContext.CreateQuery<TResult, IQueryable<TResult>>(context =>
            {
                Func<TContext, IQueryable<TCollection>> getQueryFromContext = collectionSelector.Compile();
                IQueryable<TCollection> query = getQueryFromContext(context);

                ParameterExpression rangeParameter = resultSelector.Parameters[1];
                InvocationExpression invoke = Expression.Invoke(resultSelector, Expression.Constant(context), rangeParameter);
                Expression<Func<TCollection, TResult>> selector = Expression.Lambda<Func<TCollection, TResult>>(invoke, rangeParameter);

                return query.Select(selector);
            });
    }

This is pretty dense, so let’s walk through it:

  1. Our lambda expression’s context parameter represents the MyDataContext that will be passed in eventually.
  2. We’re going to manipulate the expression trees passed into the method, which will look something like this:
    • collectionSelector: dc => dc.MyTable
    • resultSelector: (dc, row) => new { dc = dc, row = row }
  3. Compiling collectionSelector produces a delegate we can invoke on context to get an IQueryable<TCollection>context.MyTable, in this case.
  4. Before we can use resultSelector on MyTable, we need to wrap it in a lambda expression to eliminate its first parameter.:
    1. Save the second parameter (row) to use later.
    2. Create a new invocation expression that will represent calling resultSelector with the current context and our saved row parameter.
    3. Create a new lambda expression that will accept that same row parameter and return the invocation expression.
  5. The resulting selector, of type Expression<Func<TCollection, TResult>>, can then be passed to query.Select() which happily returns the desired IQueryable<TResult>.

Essentially we’re pretending that the SelectMany call is just a Select call on the IQueryable<TCollection> generated by collectionSelector, all wrapped in a lazy query.

Hopefully this overview has piqued your interest. Next time we’ll look at a base implementation of the interfaces.

Generic Method Invocation with Expression Trees

It’s against the rules and completely unsupported, but sometimes it’s just so much easier to use a base class’s private/internal members. Reflection has always been an option, but performance is less than ideal. Lightweight Code Generation is an option, but emitting IL isn’t for everyone. Since .NET 3.5 came out, there have been several discussions of using expression trees as a developer-friendly yet efficient alternative. There is an up-front cost to compile the expression into IL, but the resulting delegate can be reused with performance very close to direct invocation.

Alkampfer provides a great overview of expression tree method invocation in this article, which inspired this more general solution.

First, let’s set up a cache to store our compiled delegates. I didn’t put much effort into making it efficiently thread-safe, but suggestions are certainly welcome.

private static Dictionary<string, Delegate> accessors = new Dictionary<string, Delegate>();
private static object accessorLock = new object();
private static D GetCachedAccessor<D>(string key)
                 where D : class // Constraint cannot be special class 'System.Delegate'
{
    D result = null;
    Delegate cachedDelegate;
    lock (accessorLock)
    {
        if (accessors.TryGetValue(key, out cachedDelegate))
        {
            Debug.WriteLine("Found cache entry for " + key);
            result = cachedDelegate as D;
        }
    }
    return result;
}
private static void SetCachedAccessor(string key, Delegate value)
{
    if (value != null)
        lock (accessorLock)
        {
            accessors[key] = value;
        }
}

GetFieldAccessor

Now we can dive into our expression trees. As a warm-up, here’s a relatively simple cached field accessor, inspired by Roger Alsing‘s great post:

public static Func<T, R> GetFieldAccessor<T, R>(string fieldName)
{
    Type typeT = typeof(T);

    string key = string.Format("{0}.{1}", typeT.FullName, fieldName);
    Func<T, R> result = GetCachedAccessor<Func<T, R>>(key);

    if (result == null)
    {
        var param = Expression.Parameter(typeT, "obj");
        var member = Expression.PropertyOrField(param, fieldName);
        var lambda = Expression.Lambda<Func<T, R>>(member, param);

        Debug.WriteLine("Caching " + key + " : " + lambda.Body);
        result = lambda.Compile();
        SetCachedAccessor(key, result);
    }
    return result;
}

The method returns a function that will accept an object of type T and return its fieldName property with type R. For example, we can wrap this in an extension method to check if an SPWeb has been disposed:

public static bool GetIsClosed(this SPWeb web)
{
    return GetFieldAccessor<SPWeb, bool>("m_closed")(web);
}

Because the delegate is cached, successive calls of GetFieldAccessor() will immediately return the necessary delegate without recompilation.

GetMethodAccessor

Building a method accessor is a bit trickier because of the various combinations of parameter and return types. One option is to explicitly define overloads for various method signatures, as seen in the article referenced earlier. Instead, I figure we can let the caller specify the desired delegate signature and figure out the intended method based on that.

public static D GetMethodAccessor<D>(string methodName, BindingFlags bindingAttr)
                where D : class // Constraint cannot be special class 'System.Delegate'
{
    Type[] args = typeof(D).GetGenericArguments();
    Type objType = args[0];

    Type[] argTypes = args.Skip(1).ToArray();
    string[] argTypesArray = argTypes.Select(t => t.Name).ToArray();
    string key = string.Format("{0}.{1}({2})", objType.FullName, methodName, string.Join(",", argTypesArray));

    D result = GetCachedAccessor<D>(key);
    if (result == null)
    {
        MethodInfo mi = objType.GetMethod(methodName, bindingAttr, null, argTypes, null);

        if (mi == null || mi.ReturnType != typeof(void))
        {
            argTypes = argTypes.Take(argTypesArray.Length - 1).ToArray();
            mi = objType.GetMethod(methodName, bindingAttr, null, argTypes, null);
        }

        if (mi == null)
            throw new ArgumentException("Could not find appropriate overload.", methodName);

        var param = Expression.Parameter(objType, "obj");
        var arguments = argTypes.Select((t, i) => Expression.Parameter(t, "p" + i)).ToArray();
        var invoke = Expression.Call(param, mi, arguments);
        var lambda = Expression.Lambda<D>(invoke, param.AsEnumerable().Concat(arguments));

        Debug.WriteLine("Caching " + key + " : " + lambda.Body);
        result = lambda.Compile();
        SetCachedAccessor(key, result as Delegate);
    }
    return result;
}

As you can see, we depend heavily on the generic arguments of the delegate type. This means passing a closed delegate type to this function won’t work – it needs to be Func, Action, or something compatible with the expected argument structure. So what is that structure? The processing logic works like this:

  1. Take the first generic argument as the type whose method we are going to invoke.
  2. Fetch an array, skipping the first argument, that we pass to GetMethod as the argument types.
  3. If GetMethod can’t find an appropriate overload, or if the method’s return type is not void, then we shouldn’t have used all of the arguments as parameters.
  4. Redefine our parameter array without the last argument – this is our delegate’s non-void return type.
  5. Try GetMethod again with the trimmed array; throw if we still don’t find a match.

Once we have the details of our method, we can build the expression tree. I use the mapi-style Select overload to build an array of typed parameters named p0, p1, etc., which is then passed to Expression.Call to represent the method invocation. Finally, Expression.Lambda expects a list of all parameters including the instance param. Rather than allocate an intermediate data structure, I use a trick I picked up from Keith Rimington:

public static IEnumerable<T> AsEnumerable<T>(this T obj)
{
    yield return obj;
}

By turning param into a single-element IEnumerable<ParameterExpression>, we can simply Concat the rest of the arguments. Beautiful.

GetMethodAccessor Usage

The usage is a bit more complex then GetFieldAccessor, but still quite manageable:

public static bool SetBoolValue(this SPField field, string attrName, bool attrValue)
{
    Func<SPField, string, bool, bool> lambda =
        GetMethodAccessor<Func<SPField, string, bool, bool>>("SetFieldBoolValue",
                                                             BindingFlags.Instance | BindingFlags.NonPublic);
    return lambda(field, attrName, attrValue);
}

The intermediate variable is unnecessary, but makes it easier to see what’s going on. SPField.SetFieldBoolValue is of type Func<string, bool, bool>, so our delegate needs to be Func<SPField, string, bool, bool> to accept the instance variable first. The parameters for GetMethodAccessor are identical to what we would pass to field.GetType().GetMethod() if we were using normal reflection. Then we invoke lambda to effectively call field.SetFieldBoolValue(attrName, attrValue).

For methods that return void, we just pass an Action type instead:

public static void SetHidden(this SPField field, bool value)
{
    GetMethodAccessor<Action<SPField, bool>>("SetHidden", BindingFlags.Instance | BindingFlags.NonPublic)(field, value);
}

And these can be used like any other extension methods:

SPField field = GetField();
field.SetBoolValue("CanToggleHidden", !field.CanToggleHidden);
field.SetBoolValue("CanBeDeleted", !field.CanBeDeleted);
field.SetHidden(!field.Hidden);
field.Update();

Which will show the following in DebugView:

Caching Microsoft.SharePoint.SPField.SetFieldBoolValue(String,Boolean,Boolean) : obj.SetFieldBoolValue(p0, p1)
Found cache entry for Microsoft.SharePoint.SPField.SetFieldBoolValue(String,Boolean,Boolean)
Caching Microsoft.SharePoint.SPField.SetHidden(Boolean) : obj.SetHidden(p0)

Because of the penalty for compilation, this technique is not right for all situations. But for frequent access to inaccessible members, it might be worth a try.