Jan 17 2012

Extending the world, step 4

Tagged: functional c# linq extending the world

Last time we introduced a new simple exercise, where we have a computation which might fails and we want to to try to remove all the noise that the error handling introduces all around the algorithm. We talked about Monads (in a very simplified way) and we defined a way we could write our computation inside a Linq expression, using a Monad to isolate the unwanted error handling code. What we still have to do is actually implement this new Monad and see how we can leverage this concepts and strategies to reach our goal.

Following a few simple steps, we had arrived to this way to express our computation:

var r = from x in M.Unit(fi)
        from y in M.Unit(fj)
        from x in M.Unit(fk)
        select x + y + k;

We already said that this "would" compile as long as we provide the right methods to enable Linq to host and execute our code on our Monad. Let's first give a better definition about our Monad, so that we will be able to figure out how we should implement all the details. Basically our Monad will have to isolate the error handling code from the actual computation, which we should be able to express in a compositional way. The idea is that the Monad will be the only actor aware of the error handling, and it will be in charge of keeping track of computation executions and of their actual success or failure. We already said that a Monad by definition has to supply unit and bind, and it is actually bind that will takes care of the specific behavior of our Monad.

We still have to give this Monad a name, and given its description I think that ValueOrError is a good one. Let's write a (very) basic implementation of a class that is able to hold a value or an error condition:

public class ValueOrError<T>
{
	private readonly Exception error;
	private readonly T value;
	private readonly Func<T> eval;
	internal ValueOrError(Func<T> eval)
	{
		this.eval = eval;
	}
	internal ValueOrError(T value)
	{
		this.value = value;
	}
	internal ValueOrError(Exception error)
	{
		this.error = error;
	}
	public T Value
	{
		get { return eval != null ? eval() : value; }
	}
	public Exception Error
	{
		get { return error; }
	}
	public bool HasError
	{
		get { return error != null; }
	}
	public bool HasValue
	{
		get { return !HasError; }
	}
}

We wrote a generic type, and we specified a costructor to hold an error, and 2 constructor to store respectively a value of type T, or a function which is able to produce a value of type T. We could make this code more interesting, but it is enough for our exercise. We also did not manage any specific execution behavior yet, we will do that later. We already know that, to be a Monad, a type must supply 2 special functions, unit and bind. Let's write those as extension methods of ValueOrError<T> inside some static class (ValueOrErrorExtensions for example), and let's start with Unit:

public static ValueOrError<T> Unit<T>(this T source)
{
	return new ValueOrError<T>(source);
}
public static ValueOrError<T> Unit<T>(this Func<T> source)
{
	return new ValueOrError<T>(source);
}

For Unit we have 2 overloads which are mapping the 2 constructors dealing with "values" we have in ValueOrError<T>, and these overloads fulfill the unit mission: put a value inside the Monad.

Let's move on to bind, here we need an implementation which has to have the signature we described in the previous episode:

M<TR> Bind<TS, TR>(this M<TS> source, Func<TS, M<TR>> selector)

A bind is supposed to accomplish these steps:

extract the value(s) of type TS from the input ValueOrError<TS>
apply the supplied function, which transforms the value(s) of type TS in value(s) of type TR and then puts it (or them) in ValueOrError<TR>
return the just created instance of ValueOrError<TR>

This is what we expect from a bind, but while doing this job this method can actually do more things, and in our scenario we know that we need someone who handles errors in a proper way. Let's define what "proper" is:

Bind should check if the source ValueOrError<TS> already contains an error, if yes we know that we should not go on with further processing, so we just create an instance of ValueOrError<TR> (because we have to go from TS to TR), using the constructor which receives an Exception and supplying the error condition of the source ValueOrError<TR> as its input; this way we are stopping the execution chain and passing along the already generated error condition, although wrapped in a new Monad
on the other hand, if the source is a valid value, we can go on with the computation, so we just need to extract the source value and pass it to the selector function which puts the value inside a Monad of type ValueOrError<TR>; if we are in a case where the Monad has been supplied a Func<T> as a constructor parameter, reading its value will actually execute the supplied function, so it is enough to do that inside a try...catch block and, in case of error, return an instance of ValueOrError<TR> containing the thrown exception as the error condition

This is definitely easier to read in code:

public static ValueOrError<TR> Bind<TS, TR>(
        this ValueOrError<TS> source, 
        Func<TS, ValueOrError<TR>> selector)
{
	if (source.HasError)
		return new ValueOrError<TR>(source.Error);
	try
	{
		return selector(source.Value);
	}
	catch (Exception ex)
	{
		return new ValueOrError<TR>(ex);
	}
}

It's very simple, but it's clean and isolated in one point. We could have also moved the try...catch block directly inside the ValueOrError<T> class, but this is a detail and it is not so relevant for our discussion. What's relevant is how to trigger this mechanism, and we already saw that we are going to have this code run inside Linq expressions. Let's write the simplest Linq expression we can, as we did the last time:

var r = from i in ValueOrError.Unit(fi) select i;

which is equivalent to this:

var r = ValueOrError.Unit(fi).Select(i => i);

We need to implement Select, and we can take advantage of Bind and Unit to do it in a simple way:

public static ValueOrError<TR> Select<TS, TR>(
        this ValueOrError<TS> source, 
        Func<TS, TR> selector)
{
	return source.Bind(s => selector(s).Unit());
}

Again very simple, and our knowledge about how to go from T to ValueOrError<T>, about how to go from ValueOrError<TS> to ValueOrError<TR>, and most important about how to deal with errors, is hidden inside Unit and Bind. The result of our computation will be an instance of ValueOrError<T> containing either the computed value fi(), or the error condition encountered during the execution. This was very trivial and we might have done it without all the Monad theory and the Linq support, but as soon as we try to implement more complex computations we start appreciating the added value of these mechanisms. Let's go back to our "complex" algorithm:

var r = from x in ValueOrError.Unit(fi)
        from y in ValueOrError.Unit(fj)
        from x in ValueOrError.Unit(fk)
        select x + y + k;

If we'd try to write the expression as a composition of method calls, we should come up with something like this:

var q = ValueOrError.Unit(fi)
                    .SelectMany( i      => ValueOrError.Unit(fj), 
                                (i,  j) => new {i, j})
                    .SelectMany( @t     => ValueOrError.Unit(fk), 
                                (@t, k) => @t.i + @t.j + k);

Now it is clear that we need to implement SelectMany for our Monad, and at the same time we can appreciate that the query syntax is much more readable than the methods chain, which by the way makes explicit a temporary range variable @t that the query syntax manages to keep hidden. Let's implement SelectMany, in terms of Unit and Select (which is based on Unit and Bind):

public static ValueOrError<TR> SelectMany<TS, TC, TR>(
        this ValueOrError<TS> source, 
        Func<TS, ValueOrError<TC>> matcher, 
        Func<TS, TC, TR> selector)
{
	return source.Bind(s => matcher(s).Select(t => selector(s, t)));
}

If we analyze its signature and we forget for a moment about the last parameter, we can see that it matches the Bind signature, and the last parameter just adds an additional level of indirection, which brings us some more flexibility and actually enables the nice query syntax we are trying to use. SelectMany itself is just using Bind and Select, and there are several amazing things around its implementation:

these very short and neat lines of codes inside our extension methods enable such a clear and coincise Linq expression describing our computation
if you try to follow every single method call to get to the final result, you will immediately notice how complex the call sequence can get, and all this complexity is handled by 4 method, where 3 of them are just one line of code, and the more complex one is just an if statement followed by a function call inside a try...catch block
we already said that we wrote Select and SelectMany in terms of Bind and Unit, but what we did not said is that, following this pattern, whatever Monad you will be writing you will just need to reimplement Bind and Unit, whereas Select and SelectMany will stay exactly the same!
if you think that methods like SelectMany are hard to understand, you should just use the trick to "follow the types" and you will be able to write that line of code by yourself; let's try together:
- SelectMany and Bind have the same first argument and the first return type, so we can start writing this: return source.Bind(...);
- Bind expects a second argument of type Func<TS, ValueOrError<TR>>, and we have matcher and selector with signatures which do not help, but we also have Select available, which goes from ValueOrError<T1> to ValueOrError<T2>, so we can call matcher on TS, obtaining a ValueOrError<TC>, and then call Select on it to go to ValueOrError<TR> thanks to selector, which can be called from "inside" Select supplying it the instances of TS and TC that at that point would be available

So far we wrote nice and clean code, but we didn't really analyze what this code does when dealing with real scenarios. Let's try to follow a chain of computations considering a case where our 3 functions succeed:

fi is put inside a ValueOrError<...> (let's call it VOE^fi, and we'll implicitly use this notation for similar actions in further steps)
SelectMany is called over VOE^fi
inside SelectMany, Bind is called over VOE^fi
VOE^fi is checked for errors, it does not have any therefore we go on
Value is extracted from VOE^fi => fi() is called, let's call its return value fi! (we'll implicitly use this notation for similar actions)
fi! is passed to Bind's selector, which at this point was matcher(s).Select(t => selector(s, t)), which via substitution becomes ValueOrError.Unit(fj).Select(t => selector(s, t)), which eventually becomes (VOE^fj).Select(t => selector(s, t))
Select is called over VOE^fj, which means (VOE^fj).Bind(s => selector(s).Unit())
Bind is called over VOE^fj, we do not detect error conditions so we call fj(), obtaining fj!, and we pass it to s => selector(s).Unit(), which by substitution is ((i, j) => new {i, j}).Unit(), where i = fi! and j = fj!; the anonymous type contains the first 2 results and it is put back in the ValueOrError<...> via Unit()
we repeat the same steps with the second SelectMany, this will let our anonymous type pass through, fk() will be called, and at the end of the same chain we just described we will get to (@t, k) => @t.i + @t.j + k, where @t is our previous anonymous type and k = fk!: the actual sum will be performed

It is not exactly easy to explain, I hope I did not loose you here, if you try to follow such a code with a debugger you will be able to track down what's happening. But we built all this infrastructure to deal with errors, so we should show an example of such a case, and let's suppose the very first computation fails:

fi is put inside a ValueOrError<...>
SelectMany is called over VOE^fi
inside SelectMany, Bind is called over VOE^fi
VOE^fi is checked for errors, it does not have any so we go on (it's not been computed yet)
Value is extracted from VOE^fi => fi() is called and this time it fails, so an exception is raised and trapped, and a VOE^ERROR is returned
because fi() failed, selector is never called, so Bind returns VOE^ERROR to SelectMany, which returns it to its caller
the second SelectMany is called, and we soon reach again a Bind call, but this time we detect an error condition on the source (VOE^ERROR) and therefore we immediately return a new VOE^ERROR containing the same error as the source one, and we skip again the selector call, so SelectMany will return VOE^ERR as a result of the whole computation

Easy, isn't it? :) In both cases we will receive a ValueOrError<T> instance, and we will just have to examine if we have met an error condition or not. What also interesting is the way we actually calculated our computation, we used the term "substitution" several times, and this is another clue that what we are doing is quite functional: our computations can be treated as mathematical expressions where you calculate intermediate results and then just use them to replace parts of expressions (lambda calculus, I guess...).

If you are still with me, I am sure you would be curious to try something like this, and you might want to have some real code to test. Well, my next post will be dedicated to deliver you a sample project that I'm preparing, along with some reference material you can use to investigate more the topics of this series.

(w)asp.net