Skip to content

2010 07 03 building tuple expressions

Fabian Schmied edited this page Jul 3, 2010 · 1 revision

Published on July 3nd, 2010 at 20:24

Building tuple expressions

In my previous post I explained how the re-linq SQL backend moves OrderBy clauses from sub-queries to the outermost surrounding query in order to be compatible with SQL sub-query restrictions. Because the outer query might not have access to the properties required for the ordering, it is necessary to replace the subquery’s projection with a tuple containing both the original projection and all the ordering expressions:

from o in Orders
from product in (from oi in o.OrderItems
                 orderby oi.Quantity descending
                 select oi.Product)
select new { Order = o, Product = product }

becomes

from o in Orders
from product in (  
  from oi in o.OrderItems
  select new KeyValuePair<string, int\> (oi.Product, oi.Quantity))
orderby product.Value descending
select new { Order = o, Product = product.Key }

The SQL backend uses the predefined KeyValuePair type defined by .NET to construct tuples, but this presents a problem: what to do if there is more than one OrderBy expression in the sub-query? KeyValuePair can only store two values – the original projection and one OrderBy expression. So what if we have to store more than that?

[Side note: Of course, we could just implement our own Triple with three elements. And Quadruple, Quintuple, Sextuple with four, five, and six items. But we can’t build a tuple type that can take an arbitrary number of items. We could use an array, but the re-linq backend does not currently support analyzing array index operations.]

One elegant solution is to use KeyValuePair as a recursive data type. If we need to store more than two values, we’ll put a second KeyValuePair into the first one’s Value property – and thus gain one more place to store something:

Values to be stored Expression
1 1
1,2 new KVP(1,2)
1,2,3 new KVP(1,new KVP(2,3))
1,2,3,4 new KVP(1,new KVP(2,new KVP(3,4)))

As you can see, this is quite flexible. It is, by the way, how many functional and logic-oriented programming languages construct lists.

Since building such tuple expressions can be handy in different situations while building a LINQ provider, re-linq now contains a helper class, TupleExpressionBuilder, which helps with building such tuples and accessing the members held by a tuple.

Here is the code:

public class TupleExpressionBuilder
{
 public static Expression AggregateExpressionsIntoTuple (  
 IEnumerable<Expression\> expressions)
  {
    ArgumentUtility.CheckNotNull ("expressions", expressions);

    return expressions
        .Reverse ()
        .Aggregate ((current, expression) => CreateTupleExpression (  
            expression, current));
  }

  public static IEnumerable<Expression\> GetExpressionsFromTuple (  
      Expression tupleExpression)
  {
    ArgumentUtility.CheckNotNull ("tupleExpression", tupleExpression);

    while (tupleExpression.Type.IsGenericType  
        && tupleExpression.Type.GetGenericTypeDefinition()  
            == typeof (KeyValuePair<,>))
    {
      yield return Expression.MakeMemberAccess (  
          tupleExpression,  
          tupleExpression.Type.GetProperty ("Key"));
      tupleExpression = Expression.MakeMemberAccess (  
          tupleExpression,  
          tupleExpression.Type.GetProperty ("Value"));
    }

    yield return tupleExpression;
  }

  private static Expression CreateTupleExpression (  
      Expression left,  
      Expression right)
  {
    var tupleType =  
        typeof (KeyValuePair<,>).MakeGenericType (left.Type, right.Type);
    var newTupleExpression =
        Expression.New (
            tupleType.GetConstructor (new[] { left.Type, right.Type }),
            new[] { left, right },
            new[] {  
                tupleType.GetMethod ("get_Key"),  
                tupleType.GetMethod ("get_Value") });
    return newTupleExpression;
  }
}

Is anybody else feeling the LISP-ness of those tuples? I only say car and cdr
(Although, actually, I prefer the ./2 predicate.)

Exercise for the reader: Get rid of the while loop in GetExpressionsFromTuple. It would be so much easier if we had a NIL element at the end of the list…

- Fabian

Comments

Stefan Wenig - July 5th, 2010 at 8:48

Cool! Is the transformed LINQ query (the one with KeyValuePair) an actual intermediate result that is then used to generate SQL from? If so, when do you apply that transformation?

Fabian Schmied - July 5th, 2010 at 11:16

Yeah, hard to find with search engines that ignore punctuation characters :)

./2 is the list constructor of Prolog. In Prolog, a list [1, 2, 3] can be written as .(1, .(2, .(3, []))) (or as [1|[2|[3|[]]]], or in a few other notations…). Why you’d do that? Because you can easily write algorithms that recurse over the elements of a list that way. Most functional languages have a smilar notation for lists.

Note that Prolog (like many functional languages) uses a NIL element to terminate the list, which makes many recursive algorithms easier to write declaratively.

We originally also has a NIL element at the end of our KeyValuePair list structure, but we got rid of it because it required special casing in other situations. Now, however, the GetExpressionsFromTuple algorithm requires a dedicated "end" case (yield return tupleExpression) that makes it hard to write declaratively using LINQ.

Clone this wiki locally