Jan 17 2012

Extending the world, step 3

Tagged: functional c# linq extending the world

So far in our series we have seen how Linq is a world that can be extended if we craft our types the right way. We learned that Linq is definitely not just for IEnumerable or IQueryable, and I'm more and more convinced that what's most interesting about Linq is the way it guides you to write more functional code. I'm not a functional programmer (yet), so please do not expect too much knowledge about that from my posts, but I'm slowly gaining a more functional view on things, and I'm feeling that it's a good thing to have. There are a few aspects of functional programming that Linq tries to enforce:

no side effects: the code we write should not deal with the production of anything which is not strictly related to our result, and which in some way modifies a "global state"
immutability: what's defined inside a Linq expression should be immutable, and this is a good thing because it helps avoiding side effects (see previous point)
declarativity: inside Linq we do not think in terms of state and statements, but in terms of declarations: we tend to describe more what we want from a computation, and less how we want any result to be computed
compositionality: within a Linq expression we can compose computations, chaining hen and mixing them thanks to the operators we have available (of course the types we are using in our expression must support them), resulting in better readability of the intents we have (see previous point)

There are more, of course, but let's stay with these and let's try to expand them. To me it is easier to do it with samples, so here we go.

Let's imagine that we have to compute 3 values and to sum them, but let's say that our 3 inputs are calculated or retrieved by someone who may fail (a db connection? a web service? you name it). Those are precious resources, and we do not want to waste them, so in a sequential world (too early to talk about parallelism or async, sorry... :) ) we would prefer to avoid the subsequent calls if a previous one failed, invalidating the whole computation. At the same time, if something went wrong, we do not want the whole process to fail, but we want to collect information about the error that occurred. How would you write such a code? Maybe something like this:

int r = 0;
Exception error;
try
{
	r = fi();
}
catch(Exception ex)
{
	error = ex;
}
if (error == null)
{
	try
	{
		r = r + fj();
	}
	catch(Exception ex)
	{
		error = ex;
	}
}
if (error == null)
{
	try
	{
		r = r + fk();
	}
	catch(Exception ex)
	{
		error = ex;
	}
}
if (error == null)
{
	...
}
else
{
	...
}

Not so good looking... I will not explain the details, the code is trivial, and actually it could be improved a little, but still you would have something where you would not easily spot the true essence of the algorithm, which indeed is:

int r = fi() + fj() + fk();

What's really noisy here is the error handling, which is distracting you from the heart of the computation, but you cannot avoid it and it makes your code less readable. You would like to clean it up, moving the error handling "somewhere else". Something like the well known AOP approach, but you still want to check if something went wrong in the context of your code. And if you go back to our list of functional characteristics, you can also look at the errors as "side effects", which should really be avoided, or at least isolated from our computation.

A functional programmer at this point would say the magic world: you need a Monad! He might also say that you need a Maybe Monad, or something similar. What a Monad really is, it is still not 100% clear to me, and not being a functional programmer (yet) I try to bring that idea in the space I know better and with the concept I'm used to deal with. So, to me a Monad is a "container", or a "decorator", which is able to "augment" some other instance, or set of instances, of some type T. Let's call out Monad M<T>. A Monad should also, in some way, supply 2 special "static" functions:

return, or unit: such a function, given a type T, returns an M<T>; its job is to take something and "put it" into the Monad
bind: such a function should receive an instance of M<T>, and a function that goes from T to M<R>, and returns an instance of M<R>; in other words, it's job is to "extract" the value(s) of type T from M<T>, then to apply the function that transform the value(s) of type T in value(s) of type R and puts it (them) in M<R>, finally returning the instance of M<R> created

Let's write the signatures:

unit: Func<T, M<T>>
bind: Func<M<T>, Func<T, M<R>>, M<R>>

What's important is that both know the internals and the mechanics of the Monad, and they should be the only ones to know such details. We'll get back to them later, for now let's supposed we have them, and let's use unit like this:

int r = M.Unit(fi) + M.Unit(fj) + M.Unit(fk);

What did we do here? We simply put each of our functions (in a Monad we can put everything, so functions are welcome to) inside an instance of our Monad, and then we sum those instances as if we were still dealing with functions return values. Lovely... but this does not work. It does not even compile, of course, we do not have the + operator defined on our Monad, and we cannot define one because M<T> is generic, and we don't know how to implement the sum. And then, what about implementing all the other operators (-, *, /, ...)? We do not want to do that, of course, the goal of our Monad is to deal with whatever is NOT the computation. So what?

Enter Linq. As the title of this series says, let's extend the world (Linq) to understand our Monad. We already learned how to extend Linq, we just have to supply the right functions to implement the right operators. But first, why do we need Linq? Because Linq has a nice syntax that let us declare computations in a clean way, and because incidentally (well, probably not so incidentally) the select operator would help us a lot. Let's see it's signature:

Func<M<T>, Func<T, R>, M<R>>

It's similar to "bind", but it's not the same: select receives an "augmented" value (or values), extracts it (them) from the "augmenting" Monad, applies to the value(s) a function which transform it (them) to something of type R, and finally puts it (them) in M again, returning it to then following piece of computation. We have 3 interesting points here:

the function that goes from T to R is supplied by us, we are the ones knowing how to do that
select will just extract things from Monad, then will call us to transform them in something else, and finally will put the result back in the Monad
select will use bind to do part of its job, and this will keep the deep knowledge of the internals of the Monad inside bind (and unit): we will say that select is written in terms of bind

Let's simplify for a moment our algorithm to this:

int r = fi();

This seems stupid, but later you will understand why I did it. To do this calculation "inside the Monad", before we tried this:

int r = M.From(fi);

but this would not compile. Let's suppose we have the right implementation of the extension method Select for our M<T>, then we can try this:

var r = from x in M.From(fi) select x;

and this works! Actually, this expression is equivalent to this:

calling unit: put fi() inside M
calling select: extracting back fi() from M, "do something" with fi(), give the result back to us to "transform it" in something else (the select x part, which in this case is just leaving the value as it is), and putting back the result in the Monad, which is returned to the left

You will have also noticed that now the lefthand side of the computation is not int r anymore, but var r, where var actually stands for M<int>. So now we are holding an instance of a Monad over our original type int, and this instance of the Monad will contain the result of our computation... well, maybe :) We'll dig more about this next time.

The final piece to get to a meaningful computation is SelectMany, whose signature is exactly what we need to join different pieces. As long as every part is contained in the same type of Monad M<T>, SelectMany enhances select and enables us to write this:

var r = from x in M.From(fi) 
		from y in M.From(fj) 
		from x in M.From(fk) 
		select x + y + k;

I'm sure that you can see how readable this is, the algorithm (x + y + z) is very clear and the noise introduced by M.From and by the query syntax is not too bad, and actually gets almost invisible when the computations get more complex. Probably still too noisy for someone using F# or Haskell, but for us in C# it's not bad at all.

This episode it's been more about the concepts, and actually I did not say anything concrete about how to solve the error handling problem, this will be the topic of the next episode, where we will implement a specific Monad to solve this issue. Its implementation will also allow us to appreciate how unit, bind and select are implemented to accomplish our goals.

(w)asp.net