Formal methods

Proving your software to be correct sounds like a good idea, and formal logic is fun! So what's stopping me from using it more?

Bugs in software come in different flavours, one of which is a failure to turn squishy requirements as understood by humans into cold, hard logic. Sometimes we don't the understand the requirements or the domain properly, and sometimes we just didn't quite wield whatever tool we're using correctly. Unfortunately, specifications are just as susceptible to these sorts of bugs as code, so they don't really help with that sort of bug. In many cases where we're encoding business logic, the code often ends up looking exactly the same as the specification since there's nothing interesting going on from an algorithm point-of-view.

That isn't to say that there are no applications where formal proofs of correctness aren't useful. For instance, I'd imagine that a specification for SQL SELECT statements is considerably simpler than the actual implementations. If the code feels more like an algorithm than domain modelling, then the constraints are probably simpler than the implementation, making it more amenable to formal methods.

The idea of proving two pieces of code equivalent is quite appealing. We could prove that a refactoring step, such as extracting a method, has preserved behaviour. We could also prove that an optimisation hasn't changed behaviour. Going back to the SQL example, we might have a reference implementation of SQL SELECTs. This implementation would be written to be as clearly correct as possible, but would probably be horribly slow. We could then prove that the heavily optimised version would give the same result as the reference implementation.

Although specifying the entire behaviour of an application is often impractical, formal methods can still be useful in verifying some useful properties. For instance, can we prove that certain errors don't occur? This is exactly what the type checker in statically-typed languages do. Microsoft's Terminator project has been used to prove that device drivers terminate i.e. they don't go into an infinite loop.

Even for small problems, writing correct specifications is tricky. Suppose we want to write a specification for a function to sort a list. Assuming the list is called L, with elements L_i, and R = sort(L) a first attempt might be:

∀i . 0 ≤ i < (length(R) - 1) => (R_i ≤ R_{i + 1})

More informally: every element of the result must be less than or equal to the following element. The problem? This is trivially implemented by having sort return an empty list. We can fix this by ensuring R is the same length as L:

(∀i . 0 ≤ i < (length(R) - 1) => (R_i ≤ R_{i + 1})) and
(length(L) = length(R))

Now, the simplest implementation of sort is to return a list of length(L) zeroes. No problem, we'll just make sure that every element of L is in R:

(∀i . 0 ≤ i < (length(R) - 1) => (R_i ≤ R_{i + 1})) and
(length(L) = length(R)) and
(∀i . 0 ≤ i < length(L) => contains(R, L_i))

This works fine (I think!) so long as L contains no duplicates. Unfortunately, a function that was given [1, 2, 2] and returned [1, 1, 2] would satisfy this specification. Therefore, we should check the count of each element is the same for both lists:

(∀i . 0 ≤ i < (length(R) - 1) => (R_i ≤ R_{i + 1})) and
(length(L) = length(R)) and
(∀i . 0 ≤ i < length(R) => count(L, L_i) = count(R, L_i))

We also need to define count:

count([], _) = 0
count([X|XS], X) = 1 + count(XS, X)
count([X|XS], Y) = count(XS, Y) if X ≠ Y

Mike's corner of the web.

Formal methods