Errors, exceptions and monads
Almost two weeks ago, while working on GUAC I realized that I was typing the same Go construct over and over again:
if err != nil {
return nil, err
}
While I have been using Go professionally for barely half a year, this got me thinking up a tweet which I will now expand to a full blog article. This is because I have used various programming languages, with different ways of handling exceptions, so I could do a (not exhaustive) summary of the options and argue that/why a certain one (as can be hinted from the title) is the one I consider to be the better of these. Plus, I can use this article as a golden test for syntax highlighting in various programming languages so that when I change parts of the blog engine I can prevent style breakages there.
C 🔗
We cannot discuss error handling without first talking about how C handles
this. In general, functions return a specific value (or set of values) to
signal error: NULL
, -1
or negative numbers, or other specific
constants. The programmer is then, usually, supposed to look at the value of
errno
(or similar global, thread-unsafe error variables) to
determine the reason for the error.
For an example, see the following code which prints the contents of a (large)
file from a specific offset, up to an optional length. Since the file can be
large, code is using mmap
to map relevant parts of the file in memory
instead of seeking through the entire file.
int output_file_range(const char *filename, off_t offset, size_t length)
{
char *addr;
int fd;
struct stat sb;
;
off_t pa_offsetssize_t s;
= open(filename, O_RDONLY);
fd if (fd == -1) {
("open");
perrorreturn -1;
}
if (fstat(fd, &sb) == -1) {
("fstat");
perror(fd);
closereturn -1;
}
= offset & ~(sysconf(_SC_PAGE_SIZE) - 1);
pa_offset if (offset >= sb.st_size) {
(stderr, "offset is past end of file\n");
fprintf(fd);
closereturn -1;
}
if (length == 0 || (offset + length > sb.st_size)) {
= sb.st_size - offset;
length }
= mmap(NULL, length + offset - pa_offset, PROT_READ,
addr , fd, pa_offset);
MAP_PRIVATEif (addr == MAP_FAILED) {
("mmap");
perror(fd);
closereturn -1;
}
= write(STDOUT_FILENO, addr + offset - pa_offset, length);
s if (s != length) {
if (s == -1) {
("write");
perror(addr, length + offset - pa_offset);
munmap(fd);
closereturn -1;
}
(stderr, "partial write");
fprintf(addr, length + offset - pa_offset);
munmap(fd);
closereturn -1;
}
(addr, length + offset - pa_offset);
munmap(fd);
closereturn 0;
}
This is 58 lines of code, with a lot of error checking and very prone to
leaking resources (for example, look how many times we had to call
close(fd)
to make sure the file descriptor is released on every error
after the file is open
ed). Macros and/or careful usages of goto
can reduce some of the repetition, but at the cost of some readability.
Removing all error checking makes the happy-path logic visible:
int output_file_range(const char *filename, off_t offset, size_t length)
{
char *addr;
int fd;
struct stat sb;
;
off_t pa_offset
= open(filename, O_RDONLY);
fd (fd, &sb);
fstat= offset & ~(sysconf(_SC_PAGE_SIZE) - 1);
pa_offset if (length == 0 || (offset + length > sb.st_size)) {
= sb.st_size - offset;
length }
= mmap(NULL, length + offset - pa_offset, PROT_READ,
addr , fd, pa_offset);
MAP_PRIVATE(STDOUT_FILENO, addr + offset - pa_offset, length);
write(addr, length + offset - pa_offset);
munmap(fd);
closereturn 0;
}
Unfortunately, this 20 lines code will fail spectacularly as soon as execution strays off of the expected path. So, while it achieves the most in readability, it is also the least robust approach.
Java 🔗
While we should go to C++ for the next design point in the area, I’ll choose
Java first and discuss C++ in the next section. In Java, we have checked
exceptions, each function must declare exceptions that it throws or catch
anything that it doesn’t declare (excluding some standard language exceptions
such as NullPointerException
, etc.). This adds to the verbosity of
the language, as can be seen from the following code that runs a BFS over some
graph stored in some database:
public ArrayList<Node> Path(
Context ctx, String sSource, String sTarget, int maxPathLength)
throws InvalidPathLengthException, NumberFormatException, DatabaseError {
if (maxPathLength < 0) {
throw new InvalidPathLengthException(maxPathLength);
}
int source = Integer.parseInt(sSource);
int target = Integer.parseInt(sTarget);
Driver driver = ctx.GetDatabaseConnection();
return BFS(ctx, driver, source, target, maxPathLength);
}
private ArrayList<Node> BFS(
Context ctx, Driver driver, int source, int target, int maxLength)
throws DatabaseError {
Queue<Integer> queue = new LinkedList<>
Map<Integer, BFSNode> nodeMap = new HashMap();
.put(source, new BFSNode());
nodeMap.add(source);
queue
boolean found = false;
while (queue.size() > 0) {
int now = queue.poll();
= nodeMap.get(now);
BFSNode node
if (now == target) {
= true;
found break;
}
if (node.depth >= maxLength) {
break;
}
= RunNeighborQuery(driver);
neighborIDs for (int neighbor : neighborIDs) {
if (!nodeMap.containsKey(neighbor)) { // not seen before
.put(neighbor, new BFSNode(now, node.depth + 1));
nodeMap}
if (!nodeMap.get(neighbor).expanded) {
.add(neighbor);
queue}
}
.expanded = true;
node}
if (!found) {
return null;
}
return BuildPath(source, target, nodeMap);
}
This code is long, and I’ve hidden building the path from the list of visited
nodes and their parents as well as the definition of BFSNode
(which
should contain the depth in the BFS tree, the parent and a boolean
to
mark whether the node has been fully expanded).
The fact that Java forces us to handle exceptions or declare that we are
throw
ing them upwards is a nice feature. We can write code focusing
on the happy-path and have a guarantee that somewhere else in the code
something is responsible for handling exceptions and returning valid messages
to the user, without the code progressing past the exceptional state. Well,
almost, there’s still that pesky NullPointerException
(and
similar).
But we also trade in some verbosity, and, potentially some code bloat.
C++ 🔗
It is time to look at C++ now. Since it evolved from C, it still supports
returning special values and testing errno
or other global variables.
But it also supports throwing up exceptions, at a cost of generating a larger
binary and some potential slowdown.
There is a third option: return a Status
or StatusOr<T>
value,
like that used by Abseil. Here, instead of a special value for error
we have a special value for the non-error case: Status::OK
. If using
StatusOr<T>
, the operator*
implementation returns the
T
value if the Status
is OK
. Programmer is expected to
check status before retrieving the wrapped variable.
The Status
pattern can be coupled with macros to make reading code
focus on the happy path but still properly handle exceptions:
#define REQUIRE(CONTEXT, EXPRESSION, STATUS) \
do { \
if (!EXPRESSION) { \
CONTEXT->RecordFailure(__FILE__, __LINE__, STATUS); \
return; \
} \
} while(0)
void Compute(Context *context, Value x, Value y) {
int xInt, yInt;
(context, ParseValue(x, &xInt), UserError("Cannot parse", x));
REQUIRE(context, ParseValue(y, &yInt), UserError("Cannot parse", y));
REQUIRE->SetOuput(xInt + yInt);
context}
This is a very small toy example, but this is the pattern that TensorFlow uses
to operate on tensors which can be user controlled. Each REQUIRE
would
validate tensor dimensions, shapes, etc., and then output tensors in the
context are created and filled with the proper values. If a validation fails,
context->RecordFailure
records a specific error value for
Status
and aborts function execution.
The caller is supposed to check the value of context->status()
before
continuing, but it is so easy to make mistakes. In fact, this has resulted in
several CVEs in TensorFlow.
Python 🔗
Nothing special here, except the fact that types don’t really exist. Newer versions of Python allow writing type declarations but these are only checked by a type checker before running. At runtime, these type annotation don’t show up in the AST and this can lead to bugs such as the following:
def foo(m: dict[int, bool]):
2] = True
m[# More code lines follow
3] : False
m[# More code lines follow
But I digress, this article is not about typecheckers or crazy programming languages bugs (every language, including Haskell has some).
On the topic of exceptions and error handling, Python recommends using
exceptions, via raise
and except
. There’s no possibility
to mark that a function raises a specific exception except in the function
docstring, but this can quickly get stale.
Go 🔗
Most other programming languages use one of the above mentioned styles and there’s even less to say about them that what I had about Python. There are two classes of exceptions to this rule though. One is represented by Go and the other will be discussed in the next section.
In Go, there is a special error
type and there is a convention to have
functions return two values: the real value and one value of type
error
. Then, callers need to check explicitly for these error cases.
Copying from GUAC, here’s an example of starting a BFS search, the
equivalent of the Java example above:
func (c *demoClient) Path(ctx context.Context,
string, target string, maxPathLength int) ([]model.Node, error) {
source if maxPathLength <= 0 {
return nil, gqlerror.Errorf("maxPathLength argument " +
"must be positive, got %d", maxPathLength)
}
, err := strconv.ParseUint(source, 10, 32)
sourceIDif err != nil {
return nil, err
}
, err := strconv.ParseUint(target, 10, 32)
targetIDif err != nil {
return nil, err
}
.m.RLock()
cdefer c.m.RUnlock()
return c.bfs(uint32(sourceID), uint32(targetID), maxPathLength)
}
There is nothing to force the programmer to check for errors, like in all of
the previously considered languages (or risk panic
s). The only benefit
here is that the syntax is homogeneous so one can make a Vim macro to type
those 3 lines with only a press of 3 keys.
Plus, by using defer
, we can ensure that there are no resource leaks,
though again it is the programmer’s responsibility to write the proper code.
Haskell 🔗
We now turn to Haskell as the other exception to the general patterns of handling code exceptions. In fact, here we use a pattern that is used for other use cases too: monads (as could have been hinted from the title).
I’ll use the same example I’m giving in the Haskell 102 course at
Google: consider the case where you have to write a
function that divides an integer by 2 to produce a new integer. The question
is how do we handle the case when the input is not even? We could
error
or we could truncate the answer to an integer. But neither
of these approaches is useful for the user. Instead, let’s first enrich the
return type to signal the error: we’ll wrap the answer in Maybe
and return Nothing
when we cannot perform the division or
Just
the answer otherwise (contrast this with StatusOr<T>
in the C++ approach):
div2 :: Int -> Maybe Int
div2 x| even x = Just $ x `div` 2
| otherwise = Nothing
But we can do more. We can return an error string in the failure case, if we change the wrap type:
div2' :: Int -> Either String Int
div2' x| even x = Right $ x `div` 2
| otherwise = Left $ printf "Received %d which is not even" x
Or, we can change the error type, instead of a string we could use a custom error type:
data MyError = NotEven Int | ... deriving (Show, Eq)
div2'' :: Int -> Either MyError Int
div2'' x| even x = Right $ x `div` 2
| otherwise = Left $ NotEven x
Comparing the 3 functions, we see that the only thing that changes is the
return type and the expression on the otherwise
branch. But we are
not done yet. Haskell forces us to check the error case, unlike all other
scenarios considered so far: imagine we now want to write a div4
function where we divide by 2 twice. Since div2
returns a Maybe Int
we cannot say that we just pass the result of div2
to the next div2
call. Haskell’s strong type system forces us
to do explicit unwrapping:
div4_explicit :: Int -> Maybe Int
= case div2 x of
div4_explicit x Just y -> div2 y
Nothing -> Nothing
Similar code can be written for the other 2 examples.
Of course, in longer applications, we wouldn’t want to do the explicit
wrapping and unwrapping ourselves. It would be no better than Go’s if err != nil
construct if we were to stop here. Instead, Haskell offers us a
typeclass (Monad
) with an operator (>>=)
(called
bind) which allows us to write more concise code. If we are to write the
div4
family of functions using (>>=)
we would have the
following expressions:
div4 :: Int -> Maybe Int
= div2 x >>= div2
div4 x
div4' :: Int -> Either String Int
= div2' x >>= div2'
div4' x
div4'' :: Int -> Either MyError Int
= div2'' x >>= div2'' div4'' x
Wait, what? The only thing that has changed between these 3 functions is the type signature. The textual implementation is the same (modulo renaming the function): in all 3 cases, it looks like the value flows from left to right, doing the division. Only the happy path is explicitly written.
But how is this possible? The secret lies in the implementation of
(>>=)
:
instance Monad Maybe where
Just x) >>= f = f x
(Nothing >>= _ = Nothing
instance Monad (Either e) where
Right r) >>= f = f r
(Left l) >>= _ = Left l (
Observe how in both cases the function is only applied on the “right” result
(wrapped by Just
if using Maybe
or Right
if using Either ...
). With this, we can even write:
= div2 x >>= div2 >>= div2 >>= div2 >>= div2 >>= div2 div64 x
It’s like our (>>=)
.{haskell} pipeline knows how to remove invalid values
without us having to explicitly write that code. And, the only thing that
changes is the function signature!
But wait! There’s more. Let’s shift gears a little to a more complex example.
Suppose we have a database of contacts: phone numbers and next of kin. We want
to call the next of kin of someone. Our API is made of 3 functions (for
simplicity, here we assume we use Maybe
and return
Nothing
on errors – converting this to Either
with
string errors or custom errors is left as an exercise to the reader):
getUser :: ID -> Maybe User
getNextOfKin :: User -> Maybe ID
getPhoneNumber :: User -> Maybe PhoneNumber
By using these 3 functions, we need to write the following:
callNextOfKin :: ID -> Maybe PhoneNumber
Without using bind, we would need to unwrap each of the values:
callNextOfKin :: ID -> Maybe PhoneNumber
=
callNextOfKin uid case getUser uid of
Nothing -> Nothing
Just u -> case getNextOfKin u of
Nothing -> Nothing
Just k -> getPhoneNumber k
But this makes it harder to see the happy path, as error handling is included in the same block as the code. So, let’s try using bind and anonymous lambda functions:
callNextOfKin :: ID -> Maybe PhoneNumber
=
callNextOfKin uid >>= \u ->
getUser uid >>= getPhoneNumber getNextOfKin u
Now we only have a pipeline for the happy path. The error handling is hidden
behind (>>=)
’s implementation.
But, an even clearer approach is to make use of the do
notation
syntactic sugar:
callNextOfKin :: ID -> Maybe PhoneNumber
= do
callNextOfKin uid <- getUser uid
u <- getNextOfKin u
k getPhoneNumber k
Doesn’t this look like imperative code? We can even change the API to have the
return type be Either ...
(to return more informative error
messages) or IO
(to get access to an external database and/or perform
side effects) and nothing needs to change. This is why (>>=)
is
sometimes called programmable semicolon, making Haskell the best imperative
language.
Haskellized Go 🔗
Now let’s return from our detour through the ideal land of Haskell and consider an even more idealized version of Go. This doesn’t exist, but it would be such a nice addition to Go to have the following construct:
func foo(a, b, c, d string) (int, error) {
{
monadic := strconv.ParseUint(a, 10, 32)
x := strconv.ParseUint(b, 10, 32)
y := strconv.ParseUint(c, 10, 32)
z := strconv.ParseUint(d, 10, 32)
t return x + y + z + t
}
}
This would desugar to:
func foo(a, b, c, d string) (int, error) {
, err := strconv.ParseUint(a, 10, 32)
xif err != nil {
return nil, err
}
, err := strconv.ParseUint(b, 10, 32)
yif err != nil {
return nil, err
}
, err := strconv.ParseUint(c, 10, 32)
zif err != nil {
return nil, err
}
, err := strconv.ParseUint(d, 10, 32)
tif err != nil {
return nil, err
}
return x + y + z + t, nil
}
The happy path is the only visible in the first example and each time any of the returned function would return an error, the execution would stop there and return that upstream. This is the idea behind the 2-weeks old tweet of mine:
There are moments when I wish go had monadic error handling. Maybe a “monadic” construct that introduces a new block within which any error means
return nil,err
?After typing so many
if err != nil { return nil, err }
today and seeing how the algorithm is obstructed this is 1
Of course, it might be required to also implement sum-types for this (similar
to Maybe
above, having multiple separate constructors for disjoint
sets of values). But I’d be really happy with this construct, if I am to write
more and more Go code in the future :)
Caveat: I must reiterate that I’ve been only writing Go for less than a year, so it is very likely that I am missing something in the design. Don’t throw too many rotten eggs :)
Comments:
There are 0 comments (add more):