Jan 17 2012

Extending the world, step 1

Tagged: functional c# linq extending the world

After the general introduction to this short series of posts, I would like to set up a sort of roadmap, defining the goals I have in writing this stuff. But first, a disclaimer: I'm no expert in these things, and I'm quite new to this kind of approach. That's exactly why I want to write about it: to illustrate things from the point of view of someone who is still learning this new mindset and the related techniques, because I know there are a lot of guys like me out there :) The code I will show is written by me in a sort of "learn by trial" process, but it is strongly inspired by several sources, which I will try to credit as precisely as possible. If anyone reading these posts finds errors or omissions, comments are welcome.

The points I would like to demonstrate are:

how you can extend Linq to understand new types
how you can extend your types to make them live "better" inside Linq
how you can apply both strategies to write more functional code

The first two points will mostly be treated together because they are deeply related; I will try to make clear which side of the coin I am looking at. By the way, I will not write at all about the process of writing a Query Provider: there is a lot of material about it, and it's a hard problem (one I have already had to solve) that I could never cover usefully in just a few posts.

Let's start. First important statement: Linq is not for IEnumerable and IQueryable only. Linq is there for any type you want, and its query syntax is just "syntactic sugar" around the composition of chained calls over available methods, which might be extension methods too. As long as Linq is able to find (extension) methods with appropriate signatures for the types of the objects you are supplying as "sources", it will be able to understand and execute the query you are writing. You already know that the query syntax understands a limited set of methods, but the approach is still valid for methods that are not among the ones the query syntax handles, as long as those methods are written with "composition" in mind. What does that mean? Well, to me it's simply that they should be "functions": they receive inputs and always produce 1 output (no out or ref parameters here). If the output is of a type Linq is able to understand, you will be able to keep on chaining things as you like, writing "computations", as they are called.

Your code starts moving from an imperative style to a declarative one, and the advantage here is that the focus is not on the "how" of things, but on "what" you want from a computation. All the details get hidden inside the functions, and you just declare how you want to compose them, supplying "things" to customize their behavior. This classical imperative code:

foreach (var i in getNumbers())
    Console.WriteLine(i);

could be rewritten like this:

getNumbers().ForEach(Console.WriteLine);

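where ForEach is an extension method for IEnumerable<T>. You might be thinking: there's no such method on IEnumerable<T>, only on List<T>. Right, but that's where the magic begins: you can easily write it in a static class, and then add the right using directive to have it available. One caveat: the version below is an iterator (it uses yield return), so it is lazy, and the action runs only when the returned sequence is actually enumerated, for example by a foreach or a ToList() call: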

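public static IEnumerable<T> ForEach<T>(
    this IEnumerable<T> source,
    Action<T> action)
{
    // Deferred: this body runs only while the result is being enumerated.
    foreach (T o in source)
    {
        action(o);       // the "lateral" action
        yield return o;  // pass the element along unchanged
    }
}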

What you are doing is defining a function in a composable way, extracting a common task (enumerating elements to do something with each of them) and making it available for future use. This is a rather silly case, and I don't mean you should really use this kind of ForEach; in future posts I will try to show more complex scenarios where you will hopefully come to appreciate the approach better. Such a function allows us to seamlessly integrate "lateral" actions in our computations:

var q = getNumbers()
        .ForEach(Console.WriteLine)
        .Where(i => i%2 == 0)
        .OrderByDescending(i => i)
        .Select(i => 3*i);

I guess you can already see some advantages of this approach: such a query would be much more "noisy" to write in the plain old imperative style, and our "insertion" of a cross-cutting concern, like outputting an intermediate result to the console, has been easy and clean thanks to our extension method. I have more good news: we don't have to start writing dozens of extension methods for every little operation like this one, because out there we can find several open source libraries full of useful little functions we can use in our computations. I can recommend MoreLinq, but it's surely not the only one, and sometimes it's just more fun to try to write your own extension methods.
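To make the timing of that "lateral" action concrete, here is a minimal sketch of the pipeline above wrapped in a small program. GetNumbers is a hypothetical stand-in for the getNumbers() of the post (it just yields 1 to 6), and the action records elements in a list instead of printing them, so that we can observe when it actually runs. Take it as an illustration under those assumptions, nothing more.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // Same shape as the ForEach shown above: a lazy, pass-through side effect.
    public static IEnumerable<T> ForEach<T>(
        this IEnumerable<T> source,
        Action<T> action)
    {
        foreach (T item in source)
        {
            action(item);
            yield return item;
        }
    }
}

public static class Program
{
    // Hypothetical stand-in for the getNumbers() used in the post.
    public static IEnumerable<int> GetNumbers()
    {
        return Enumerable.Range(1, 6);
    }

    public static void Main()
    {
        var seen = new List<int>();

        // Building the query executes nothing: every step is deferred.
        var q = GetNumbers()
            .ForEach(seen.Add)
            .Where(i => i % 2 == 0)
            .OrderByDescending(i => i)
            .Select(i => 3 * i);

        Console.WriteLine(seen.Count);                  // prints 0

        // Enumerating the query pulls every element through ForEach.
        List<int> result = q.ToList();

        Console.WriteLine(seen.Count);                  // prints 6
        Console.WriteLine(string.Join(", ", result));   // prints 18, 12, 6
    }
}
```

The side effect fires once per source element, and only when something downstream (here ToList) asks for the results. Since OrderByDescending has to read its whole input before it can yield anything, all six numbers pass through ForEach even though only three survive the filter.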

So far I've been cheating, you are right: I said I would talk about types which are not IEnumerable<T>. I just wanted to start easy; now let's take another baby step towards our goal. Imagine you have this Twitter client library you found somewhere, which works just fine but exposes an ugly API, and when you have to use it your code starts to look bad. This library is as stupid to use as in the next example:

var client = new UglyTwitterClient();
var twits = client.GetTwitsFromSomeone(
    "@wasp_twit", 
    "date desc", 
    "#csharp", "cats");

Well, maybe it's not so ugly, I don't know, but let's say it is. For sure there are aspects, like the properties of a twit, which could be more strongly checked than just putting them in strings. And we are querying a system without using Linq, no way! We would prefer to write something like this:

var twits = from t in NiceTwitterClient
                      .Search("@wasp_twit")
            where t.Contains("#csharp")
            where t.Contains("cats")
            orderby t.Date descending
            select t;

Ok, it's easy, you have this method Search which returns an IEnumerable<string> so I can call Contains(), and then... wait, we have a Date property on the range variable, so it must be some type, maybe a Twit type we can enumerate in order... enumerate where? On the client? No, I cannot simply download all the twits of this guy locally to filter them! Gosh, I have to implement IQueryable, parse the expression tree, manage all the corner cases, do the remote call, unpack the data...

As I said, here we don't do IQueryable. Here we extend Linq with our types, and that's how this new Linq-enabled client can be written. We need our NiceTwitterClient and its static method Search:

public static class NiceTwitterClient
{
    public static Searcher Search(string from)
    {
        return new Searcher(from);
    }
    ...
}

Search creates an instance of this Searcher type, which will allow us to implement a "fluent interface" which will also be Linq-aware. To do so, our Searcher will expose the Linq methods we decide to expose: Where, OrderBy, OrderByDescending:

public Searcher Where(
    Expression<Func<SearcherClause, bool>> fc) 
{ ... } 
public Sorter OrderBy(
    Expression<Func<SorterClause, bool>> sc) 
{ ... } 
public Sorter OrderByDescending(
    Expression<Func<SorterClause, bool>> sc) 
{ ... }

Here we can notice another aspect of Linq extensibility, probably an expected one: extension methods are not the only way to join the Linq world, instance methods will do too. Those methods receive instances of the Expression class, or "expression trees", and return instances of the Searcher or Sorter types. The latter is very similar to Searcher, but it's meant to store info about the sorting clause we want in our query. Those types really define how our fluent interface evolves: Searcher exposes only these 3 methods plus 2 overloads of a Select method, which will be discussed in detail next time; Sorter exposes the Select overloads only. Linq query syntax is able to map them to the where, orderby, orderby descending and select operators, and environments like Visual Studio are able to help us with Intellisense. On this last point, this approach is even better than the IQueryable one: implementing that well-known interface forces us to deal with ALL the Linq operators, because Intellisense shows every one of them as available, and the ones we don't support can only be rejected at runtime. Our approach makes Intellisense smarter, and moves the check of query correctness to compile time. Neat :)
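To see the mechanism at work before the real implementation, here is a minimal sketch under loud assumptions. Only the names and signatures shown above come from the post; the bodies, the members of SearcherClause and SorterClause (a Contains method and a bool Date property, never executed and only inspected inside the expression trees), the Source, From, Terms and SortSpec properties and the BuildQuery helper are all hypothetical. The Select overloads are left out, and nothing calls Twitter: the sketch only records what the query asked for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical marker types: never executed, only seen inside expression trees.
public sealed class SearcherClause
{
    public bool Contains(string text) { throw new NotSupportedException(); }
}

public sealed class SorterClause
{
    public bool Date { get { throw new NotSupportedException(); } }
}

public sealed class Searcher
{
    public string From { get; private set; }
    public IList<string> Terms { get; private set; }

    public Searcher(string from) : this(from, new string[0]) { }

    private Searcher(string from, IEnumerable<string> terms)
    {
        From = from;
        Terms = terms.ToList();
    }

    // Backs the "where" keyword: records the literal passed to t.Contains(...).
    public Searcher Where(Expression<Func<SearcherClause, bool>> fc)
    {
        var call = fc.Body as MethodCallExpression;
        if (call == null || call.Method.Name != "Contains" || call.Arguments.Count != 1)
            throw new NotSupportedException("Only t.Contains(...) is handled here.");
        var literal = call.Arguments[0] as ConstantExpression;
        if (literal == null)
            throw new NotSupportedException("Only literal arguments are handled here.");
        return new Searcher(From, Terms.Concat(new[] { (string)literal.Value }));
    }

    // Back "orderby ..." and "orderby ... descending".
    public Sorter OrderBy(Expression<Func<SorterClause, bool>> sc)
    {
        return new Sorter(this, sc, "asc");
    }

    public Sorter OrderByDescending(Expression<Func<SorterClause, bool>> sc)
    {
        return new Sorter(this, sc, "desc");
    }
}

public sealed class Sorter
{
    public Searcher Source { get; private set; }
    public string SortSpec { get; private set; }

    public Sorter(Searcher source, Expression<Func<SorterClause, bool>> sc, string direction)
    {
        var member = sc.Body as MemberExpression;
        if (member == null)
            throw new NotSupportedException("Only property access is handled here.");
        Source = source;
        SortSpec = member.Member.Name.ToLowerInvariant() + " " + direction;
    }
}

public static class NiceTwitterClient
{
    public static Searcher Search(string from) { return new Searcher(from); }
}

public static class Program
{
    public static Sorter BuildQuery()
    {
        // The compiler turns this into Search(...).Where(...).Where(...)
        // .OrderByDescending(...); the trailing "select t" is elided.
        return from t in NiceTwitterClient.Search("@wasp_twit")
               where t.Contains("#csharp")
               where t.Contains("cats")
               orderby t.Date descending
               select t;
    }

    public static void Main()
    {
        Sorter q = BuildQuery();
        Console.WriteLine(q.Source.From);                      // prints @wasp_twit
        Console.WriteLine(q.SortSpec);                         // prints date desc
        Console.WriteLine(string.Join(", ", q.Source.Terms));  // prints #csharp, cats
    }
}
```

Note that the range variable t is a SearcherClause inside the where clauses and a SorterClause inside orderby: its type is simply whatever the lambda parameter of the matching method says. And since this Searcher has no GroupBy or Join, a query using group or join against it is rejected by the compiler rather than failing at runtime.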

Next time I will illustrate the implementation of this exercise. Please remember that it is indeed an exercise: the code will just illustrate the concepts and should not be considered complete or "production ready" at all, but if you are new to these concepts and features I'm quite sure it will be stimulating.

(w)asp.net