Jan 17 2012

Extending the world, step 2

Tagged: functional c# linq extending the world

In the previous episode of this short series, I showed how you can define a type in a way that enables it to participate in Linq queries. Linq query syntax defines a set of operators, which are mapped to methods that the types involved in a Linq query must provide. If they do not provide them natively, we can still make them work, because extension methods come to the rescue. How those methods should be named and shaped is clearly documented, and it would be too much to explain here, so I will skip these details. I'll go back to our exercise and use it to give you some more details about a couple of interesting points.

So we have our Twitter client:

public static class NiceTwitterClient 
{ 
    public static Searcher Search(string from) 
    { 
        return new Searcher(from); 
    }
}

returning an instance of a Searcher type, and then we have our query:

var twits =  from t in NiceTwitterClient.Search("@wasp_twit")
             where t.Contains("#csharp")
             where t.Contains("cats")
             orderby t.Date descending
             select t;

When we write:

var twits =  from t in NiceTwitterClient.Search("@wasp_twit") ...

we retrieve an instance of Searcher through our call to Search and we "put it into" Linq. With the first part of the expression, from t, we also define a range variable t, which Linq will use when building the subsequent calls. Because we must comply with the query syntax, Linq will try to map its operators to corresponding Searcher methods (or extension methods), so those must have appropriate definitions. In our sample query, we go on with:

... where t.Contains("#csharp") ...

which Linq translates into this:

... NiceTwitterClient.Search("@wasp_twit")
                     .Where(t => t.Contains("#csharp")) ...

Here you can see how the range variable t is used to build a lambda expression. So, when we compile our code, the compiler will look for an available method named Where on type Searcher. C# syntax allows us to use lambda expressions to represent both delegates of certain types and expression trees built on the same types, so thanks to C# type inference the code above would match either of the following definitions (I wrapped in square brackets the code you would add in case of extension methods):

public [static] Searcher Where([this Searcher source, ] 
    Func<SearcherClause, bool> filterClause) 
    { ... }

public [static] Searcher Where([this Searcher source, ] 
    Expression<Func<SearcherClause, bool>> filterClause) 
    { ... }
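To make the difference between the two signatures more concrete, here is a minimal sketch of the same lambda text compiled once as a delegate and once as an expression tree. It uses plain string instead of SearcherClause (whose definition is not shown here), and the LambdaDualityDemo class and its Describe method are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq.Expressions;

public static class LambdaDualityDemo
{
    public static string Describe()
    {
        // As a delegate, the lambda becomes executable code we can only invoke.
        Func<string, bool> asDelegate = s => s.Contains("#csharp");

        // As an expression tree, the same lambda becomes data we can inspect.
        Expression<Func<string, bool>> asTree = s => s.Contains("#csharp");

        // The body of the tree is a call to Contains with one constant argument.
        var call = (MethodCallExpression)asTree.Body;
        var argument = (ConstantExpression)call.Arguments[0];

        return call.Method.Name + ":" + argument.Value + ":" + asDelegate("I like #csharp");
    }
}
```

Calling LambdaDualityDemo.Describe() returns "Contains:#csharp:True": the method name and the argument come from inspecting the tree, the final True from running the delegate. A Where that takes the Expression version never has to run the predicate at all, and that is what lets it turn the predicate into search parameters.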

Our Searcher happens to be implementing the second version of the Where method, so the compiler will happily match that one and we will get called there. Our method will return a (possibly new) instance of Searcher, and we will be ready for a new pass, looking again for the right match. In our sample query we go on with another where clause:

... where t.Contains("cats") ...

which will be resolved as before. You probably already noticed that I specified 2 consecutive where clauses, instead of doing it just once and joining the 2 predicates in a logical and, and you might be wondering why. Well, simply because I'm lazy :) Let's see how our Where method is implemented and everything will be clearer:

public Searcher Where(Expression<Func<SearcherClause, bool>> filterClause)
{
    // Only bodies made of a single method call, like t.Contains(x), are understood.
    var mce = filterClause.Body 
              as MethodCallExpression;
    if (mce != null)
    {
        var a = mce.Arguments[0];
        // Case 1: the argument is a literal, e.g. t.Contains("cats").
        if (a is ConstantExpression)
            return new Searcher(((ConstantExpression)a).Value.ToString(), this);
        // Case 2: the argument is a member access, e.g. a captured local variable,
        // which the compiler stores as a field of a generated closure object.
        if (a is MemberExpression)
        {
            var me = a as MemberExpression;
            if (me.Expression is ConstantExpression)
            {
                var value = ((ConstantExpression)me.Expression).Value;
                if (value is string)
                    return new Searcher(value.ToString(), this);
                // Naive: assumes the value we want is the first field of that object.
                var type = value.GetType();
                var fi = type.GetFields()[0];
                return new Searcher(fi.GetValue(value).ToString(), this);
            }
        }
    }
    throw new NotSupportedException(
        "When doing Where I understand calls to Contains method of SearcherClause.");
}

The implementation deals with expression trees, and you already know that those can be arbitrarily complex and easily get very hard to analyze. I built a naive implementation which handles just a couple of simple cases, and operations like logical and/or are not handled at all, so I simply gave the Where method an implicit semantic: chaining Where calls has the same effect as building a logical and of the involved expressions.
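The "couple of simple cases" are a literal argument and a captured variable. When a lambda uses a local variable, the compiler stores it in a field of a generated closure object, so the argument shows up in the tree as a member access on a constant. The sketch below extracts the value in both cases. It is not the Searcher code: CapturedValueDemo and ExtractArgument are hypothetical names, string stands in for SearcherClause, and the field is read through the member named in the expression instead of taking the first field of the closure, which could be the wrong one when several variables are captured.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class CapturedValueDemo
{
    public static string ExtractArgument(Expression<Func<string, bool>> filter)
    {
        var call = (MethodCallExpression)filter.Body;
        var argument = call.Arguments[0];

        // Literal argument: the value sits directly in the tree.
        var constant = argument as ConstantExpression;
        if (constant != null)
            return (string)constant.Value;

        // Captured local: a field access on the compiler-generated closure object.
        var member = argument as MemberExpression;
        if (member != null)
        {
            var closure = member.Expression as ConstantExpression;
            var field = member.Member as FieldInfo;
            if (closure != null && field != null)
                return (string)field.GetValue(closure.Value);
        }

        throw new NotSupportedException("Only literals and captured locals are understood here.");
    }

    public static string Run()
    {
        string first = "#csharp";
        string second = "cats";

        // Both locals end up as fields of the same closure object.
        return ExtractArgument(t => t.Contains("literal")) + "|"
             + ExtractArgument(t => t.Contains(second)) + "|"
             + ExtractArgument(t => t.Contains(first));
    }
}
```

CapturedValueDemo.Run() returns "literal|cats|#csharp", so each captured variable is resolved to its own value even though both live in the same closure object.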

There are a few more interesting things to say here. The first one is about the return value of the Where method: it is a new instance of the Searcher type, which receives a reference to the current Searcher (the this parameter) and holds it along with the filter value extracted from the expression. This way we build a "chain" of Searcher instances. This is just one of the possible ways to deal with the problem, probably the easiest and therefore the most suited to an exercise like this. That chain will be used at the end of the computation to determine the true call to the "real" Twitter client.
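The chain described above can be sketched as a simple linked list that is walked backwards when the final request has to be built. This is only an illustration under assumptions: SearcherSketch, its string-based Where and BuildQuery are hypothetical, and joining the terms with spaces is my guess at how a search client might express a logical and.

```csharp
using System;
using System.Collections.Generic;

public sealed class SearcherSketch
{
    private readonly string term;
    private readonly SearcherSketch previous;

    public SearcherSketch(string term, SearcherSketch previous = null)
    {
        this.term = term;
        this.previous = previous;
    }

    // Each call adds a link that points back to the instance it was called on.
    public SearcherSketch Where(string containedText)
    {
        return new SearcherSketch(containedText, this);
    }

    // Walk the chain from the newest link to the oldest, then restore the order.
    public string BuildQuery()
    {
        var terms = new List<string>();
        for (var node = this; node != null; node = node.previous)
            terms.Add(node.term);
        terms.Reverse();
        return string.Join(" ", terms);
    }
}
```

With this sketch, new SearcherSketch("@wasp_twit").Where("#csharp").Where("cats").BuildQuery() returns "@wasp_twit #csharp cats". Since every link is immutable, a partially built query can be reused as the starting point of several different queries.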

The second interesting thing is a peculiar view we might adopt to analyze the code. Our Where method might be seen as a way to navigate through types, and in this case we may say that Where "goes from Searcher to Searcher", so it actually keeps us in the same context. If we move forward to the next line in our query:

... orderby t.Date descending ...

we are asking Linq to perform an orderby descending, which it will translate into this:

... searcher_from_previous_where.OrderByDescending(t => t.Date) ...

This will match this implementation:

public Sorter OrderByDescending(Expression<Func<SorterClause, object>> sortClause)
{
    return order(sortClause, false);
}

private Sorter order(Expression<Func<SorterClause, object>> sortClause, 
    bool ascending)
{
    var ue = sortClause.Body as UnaryExpression;
    if (ue != null && 
        ue.Operand is MemberExpression)
        return new Sorter(((MemberExpression)ue.Operand).Member.Name, ascending, this);
    throw new NotSupportedException(
        "When doing OrderBy I understand UnaryExpression only, accessing members of SorterClause.");
}
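The check for UnaryExpression in the order method follows from the object return type of the sort clause: a value-typed member such as a DateTime has to be boxed, and the compiler represents that as a Convert node wrapped around the member access. A reference-typed member such as a string needs no conversion node, so it would not pass that check. The sketch below accepts both shapes; SorterClauseSketch, its Author property and MemberName are hypothetical names, since the real SorterClause is not shown here.

```csharp
using System;
using System.Linq.Expressions;

public sealed class SorterClauseSketch
{
    public DateTime Date { get; set; }
    public string Author { get; set; }
}

public static class SortMemberDemo
{
    public static string MemberName(Expression<Func<SorterClauseSketch, object>> sortClause)
    {
        var body = sortClause.Body;

        // Value-typed members are boxed to object: unwrap the Convert node if present.
        var unary = body as UnaryExpression;
        if (unary != null && unary.NodeType == ExpressionType.Convert)
            body = unary.Operand;

        // What remains should be a plain member access like t.Date.
        var member = body as MemberExpression;
        if (member == null)
            throw new NotSupportedException("Only direct member access is understood here.");

        return member.Member.Name;
    }
}
```

Here SortMemberDemo.MemberName(t => t.Date) returns "Date" and SortMemberDemo.MemberName(t => t.Author) returns "Author".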

In OrderByDescending and its order helper we are again doing some expression tree analysis (even more naive than before), and we can see that OrderByDescending "goes from Searcher to Sorter", changing our context. Our Sorter type is quite similar to Searcher, but it actually defines a new context, letting us know that we moved from the "filter" phase to the "sort" phase. Sorter does not offer Where methods anymore, but only Select methods, which will be analyzed later. This way it guides us along the right path while building our query: features like Visual Studio Intellisense will suggest only the appropriate methods, and the compiler will emit errors if we try to use unsupported Linq query operators.
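The idea of operators "going from one type to another" can be tried out in isolation. In the following sketch FilterStage and SortStage are hypothetical stand-ins for Searcher and Sorter; they take plain delegates, ignore them, and only record which methods the compiler decided to call, so that the translation of the query becomes visible.

```csharp
using System;
using System.Collections.Generic;

public sealed class FilterStage
{
    public readonly List<string> Log = new List<string>();

    // Where stays in the filter stage.
    public FilterStage Where(Func<string, bool> predicate)
    {
        Log.Add("Where");
        return this;
    }

    // Ordering moves us to the sort stage.
    public SortStage OrderByDescending<TKey>(Func<string, TKey> keySelector)
    {
        Log.Add("OrderByDescending");
        return new SortStage(Log);
    }
}

public sealed class SortStage
{
    private readonly List<string> log;

    public SortStage(List<string> log) { this.log = log; }

    // The only operator available here: the "way out" of the query.
    public string Select<TResult>(Func<string, TResult> selector)
    {
        log.Add("Select");
        return string.Join(" -> ", log);
    }
}

public static class StageDemo
{
    public static string Run()
    {
        // The projection is not a bare "t", so a call to Select is emitted.
        return from t in new FilterStage()
               where t.Contains("#csharp")
               where t.Contains("cats")
               orderby t.Length descending
               select t.ToUpper();
    }
}
```

StageDemo.Run() returns "Where -> Where -> OrderByDescending -> Select". Adding a where clause after the orderby would not compile, because SortStage has no Where method, which is exactly the guidance described above.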

When we are done with sorting, our Linq query is dealing with a Sorter instance, and our implementation offers us just one "way out": using the select operator. A Linq query expression must end with a select clause (or an equivalent, like a group clause), so we should just add appropriate implementations of Select returning an output which is meaningful for us, and in this case an IEnumerable<T> will be just fine. In one overload the actual T is left generic, whereas in another it is closed to a simple Twit type we expose to make things simpler. Under the hood, the actual implementation of IEnumerable<T> is just an internal type aware of all the parameters we've been collecting along the chained calls. Only when it gets enumerated are these parameters used all together to call the original ugly client, receive its response and finally convert it into an enumeration of Twit instances (possibly transformed via the corresponding Select overload).
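A minimal sketch of such a deferred result follows. The real internal type and the "ugly" client are not shown in this post, so DeferredResults<T> and its fetch delegate (standing in for the call to the client) are assumptions made for illustration, with raw results modeled as plain strings.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public sealed class DeferredResults<T> : IEnumerable<T>
{
    private readonly Func<IEnumerable<string>> fetch;   // stands in for the real client call
    private readonly Func<string, T> projection;        // what Select asked for

    public DeferredResults(Func<IEnumerable<string>> fetch, Func<string, T> projection)
    {
        this.fetch = fetch;
        this.projection = projection;
    }

    // An iterator method: nothing in its body runs until enumeration starts.
    public IEnumerator<T> GetEnumerator()
    {
        foreach (var raw in fetch())
            yield return projection(raw);
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class DeferredDemo
{
    public static string Run()
    {
        int calls = 0;
        var results = new DeferredResults<int>(
            () => { calls++; return new[] { "cats", "#csharp" }; },
            text => text.Length);

        int before = calls;                      // the client has not been called yet
        var lengths = new List<int>(results);    // enumerating triggers the call

        return before + "," + calls + "," + string.Join("+", lengths);
    }
}
```

DeferredDemo.Run() returns "0,1,4+7": building the result object costs nothing, and the single call to the client happens only when the results are enumerated.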

I guess this naive exercise should demonstrate how Linq can be extended: we shaped our types so that they integrate nicely with Linq query syntax. But there are many more things we can do in this context to reinforce our initial concept of Linq as a "world" where we can put our things and have them play nicely. We'll see some more of them in the next episodes :)

(w)asp.net