Mike's corner of the web.

Polymorphism and reimplementing integers

Tuesday 18 December 2012 20:39

While taking part in the Global Day of Coderetreat, one of the sessions had us implementing Conway's Game of Life without using any "if" statements to force us to use polymorphism instead. Anything that was an "if" in spirit, such as a switch statement or storing functions in a dictionary, was also forbidden. For most of the code, this was fairly straightforward, but the interesting problem was the code that decided whether a cell lived or died based on how many neighbours it had:

if numberOfNeighbours in [2, 3]:
    return CellState.LIVE
else:
    return CellState.DEAD

Polymorphism allows different code to be executed depending on the type of a value. In this particular case, we need to execute different code depending on which value we have. It follows that each number has to have a different type so we can give it different behaviour:

class Zero(object):
    def increment(self):
        return One()

    def live_cell_next_generation(self):
        return CellState.DEAD

class One(object):
    def increment(self):
        return Two()

    def live_cell_next_generation(self):
        return CellState.DEAD

class Two(object):
    def increment(self):
        return Three()

    def live_cell_next_generation(self):
        return CellState.LIVE

class Three(object):
    def increment(self):
        return FourOrMore()

    def live_cell_next_generation(self):
        return CellState.LIVE

class FourOrMore(object):
    def increment(self):
        # Every count above three behaves the same, so stay at FourOrMore
        return FourOrMore()

    def live_cell_next_generation(self):
        return CellState.DEAD

In the code that counts the number of neighbours, we use our new number system by starting with Zero and incrementing when we find a neighbour. To choose the next state of the cell, rather than inspecting the number of neighbours, we ask the number of neighbours for the next state directly:

numberOfNeighbours.live_cell_next_generation()
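The counting itself can avoid "if" in the same way. Here's a minimal runnable sketch, assuming a CellState enum and hypothetical LiveCell and DeadCell classes that aren't in the original: each neighbouring cell decides whether to increment the count, and the final count then chooses the next state.

```python
from enum import Enum
from functools import reduce


class CellState(Enum):
    LIVE = "live"
    DEAD = "dead"


# Each number is its own type, so behaviour is chosen by dispatch, not by "if".
class Zero:
    def increment(self):
        return One()

    def live_cell_next_generation(self):
        return CellState.DEAD


class One:
    def increment(self):
        return Two()

    def live_cell_next_generation(self):
        return CellState.DEAD


class Two:
    def increment(self):
        return Three()

    def live_cell_next_generation(self):
        return CellState.LIVE


class Three:
    def increment(self):
        return FourOrMore()

    def live_cell_next_generation(self):
        return CellState.LIVE


class FourOrMore:
    def increment(self):
        return self  # every count above three behaves the same

    def live_cell_next_generation(self):
        return CellState.DEAD


# Hypothetical cell types: a live neighbour increments the count, a dead one doesn't.
class LiveCell:
    def add_to_count(self, count):
        return count.increment()


class DeadCell:
    def add_to_count(self, count):
        return count


def count_live_neighbours(neighbours):
    """Fold over the neighbours, starting from Zero, with no conditionals."""
    return reduce(lambda count, cell: cell.add_to_count(count), neighbours, Zero())


neighbours = [LiveCell(), DeadCell(), LiveCell(), LiveCell()]
print(count_live_neighbours(neighbours).live_cell_next_generation())
```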

And now we have no "if"s! It's possible to move the logic for choosing the next cell state out of the number classes, for instance using the visitor pattern, which might feel a bit more natural. I suspect that reimplementing the natural numbers is still going to feel about the same amount of crazy, though.
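A rough sketch of the visitor-pattern version, with illustrative names (accept, visit_zero and friends, and LiveCellNextGeneration are hypothetical): the number classes only know how to increment and how to accept a visitor, while the Game of Life rule lives in a single visitor class.

```python
from enum import Enum


class CellState(Enum):
    LIVE = "live"
    DEAD = "dead"


# The number classes know how to increment and accept a visitor, nothing more.
class Zero:
    def increment(self):
        return One()

    def accept(self, visitor):
        return visitor.visit_zero()


class One:
    def increment(self):
        return Two()

    def accept(self, visitor):
        return visitor.visit_one()


class Two:
    def increment(self):
        return Three()

    def accept(self, visitor):
        return visitor.visit_two()


class Three:
    def increment(self):
        return FourOrMore()

    def accept(self, visitor):
        return visitor.visit_three()


class FourOrMore:
    def increment(self):
        return self

    def accept(self, visitor):
        return visitor.visit_four_or_more()


# The rule for a live cell's next state now lives in one place.
class LiveCellNextGeneration:
    def visit_zero(self):
        return CellState.DEAD

    def visit_one(self):
        return CellState.DEAD

    def visit_two(self):
        return CellState.LIVE

    def visit_three(self):
        return CellState.LIVE

    def visit_four_or_more(self):
        return CellState.DEAD


three = Zero().increment().increment().increment()
print(three.accept(LiveCellNextGeneration()))
```

Adding another rule, such as the one for dead cells, would then mean writing another visitor rather than touching every number class.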

Topics: Software design

Don't make big decisions, make big decisions irrelevant

Monday 10 December 2012 11:42

We're often faced with decisions that we'll have to live with for a long time. What language should we write our application in? What framework should we use? What will our architecture look like? We spend lots of time and effort trying to find the right answer, but we often forget the alternative: instead of making this big decision, could we make the decision irrelevant?

Suppose you need to pick a language to build your system in. This is tricky since it often takes months or even years to discover all the annoyances and issues of a language, by which point rewriting the entire system in another language is impractical. An alternative is to split your system up into components, and make communication between components language-agnostic, for instance by only allowing communication over HTTP. Then, the choice of language affects only a single component, rather than the entire system. You could change the language each component is written in one-by-one, or leave older components that don't need much development in their original language. Regardless, picking the “wrong” language no longer has such long-lasting effects.

This flexibility in language isn't without cost though – now you potentially have to know multiple languages to work on a system, rather than just one. What if there's a component written in a language that nobody on the team understands anymore? There's also the overhead of using HTTP. Not only is an HTTP request slower than an ordinary function call, it also makes the call-site more complicated.
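To make the trade-off concrete, here's a minimal Python sketch using only the standard library: the same add function called directly, and then exposed as a component over HTTP with JSON bodies. The add function, the /add path and the request format are all hypothetical. Note how much more the HTTP call-site has to do.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen


def add(first, second):
    """The component's logic: an ordinary in-process function."""
    return first + second


class AddHandler(BaseHTTPRequestHandler):
    """Exposes add over HTTP with JSON bodies, so any language can call it."""

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        request = json.loads(self.rfile.read(length))
        result = add(request["first"], request["second"])
        body = json.dumps({"result": result}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example's output quiet


def remote_add(base_url, first, second):
    """The HTTP call-site: serialise, send, wait, deserialise."""
    payload = json.dumps({"first": first, "second": second}).encode("utf-8")
    request = Request(base_url + "/add", data=payload,
                      headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=5) as response:
        return json.loads(response.read())["result"]


# Port 0 asks the OS for any free port; the daemon thread ends with the program.
server = HTTPServer(("127.0.0.1", 0), AddHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = "http://127.0.0.1:%d" % server.server_address[1]

print(add(4, 6))                   # direct function call
print(remote_add(base_url, 4, 6))  # same answer, via another "component"
```

The component behind the HTTP boundary could be rewritten in any other language without remote_add changing at all, which is the point; the extra serialisation and the network round trip are the price.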

Making any big decision irrelevant has a cost associated with it, but confidently making the “right” decision upfront is often impossible. For any big decision, it's worth considering: what's the cost of making the wrong decision versus the cost of making the decision irrelevant?

Topics: Software development

Modularity through HTTP

Monday 10 December 2012 10:41

As programmers, we spend quite a lot of effort in pursuit of some notion of modularity. We hope that this allows us to solve problems more easily by splitting them up, as well as then letting us reuse parts of the code in other applications. Plenty of attempts have been made to get closer to this ideal, object-orientation perhaps being the most obvious example, yet one of the most successful approaches to modularity is almost accidental: the web.

Modularity makes our code easier to reason about by allowing us to take our large problem, split it into small parts, and solve those small parts without having to worry about the whole. Programming languages give us plenty of ways to do this, functions and classes among them. So far, so good. But modularity has some other benefits that we’d like to be able to take advantage of. If I’ve written an independent module, say to send out e-mails to my customers, I’d like to be able to reuse that module in another application. And by creating DLLs or JARs or your platform’s package container of choice, you can do just that – provided your new application is on the same platform. Want to use a Java library from C#? Well, good luck – it might be possible, but it’s not going to be smooth sailing.

What’s more, just because the library exists, it doesn’t mean it’s going to be a pleasant experience. If nobody can understand the interface to your code, nobody’s going to use it. Let’s say we want to write out an XML document to an output stream in Java. You’d imagine this would be a simple one-liner. You’d be wrong:

import org.w3c.dom.*;
import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;

public class XmlWriter {
    private static void writeDoc(Document doc, OutputStream out)
            throws IOException {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            // getDoctype() returns null if the document has no DOCTYPE
            DocumentType doctype = doc.getDoctype();
            if (doctype != null) {
                t.setOutputProperty(
                    OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
            }
            t.transform(new DOMSource(doc), new StreamResult(out));
        } catch (TransformerException e) {
            throw new AssertionError(e); // Can't happen!
        }
    }
}

The result is that most of the code we write is just a variation on a theme. Odds are, somebody else has written the code before. Despite our best efforts, we’ve fallen a little short.

However, the web brings us a little closer to the ideal. If I want to send e-mails to my customers, I could write my own e-mail sending library. More likely, I’d use an existing one for my language. But even then, I probably wouldn’t have some niceties like A/B testing or DKIM signing. Instead, I could just fire some HTTP at MailChimp, and get a whole slew of features without getting anywhere near the code that implements them.

The web is inherently language agnostic. So long as your language can send and receive text over HTTP, and probably parse some JSON, you’re about as well-equipped as everybody else. Instead of building libraries for a specific language, you can build a service that can be used from virtually every language.

The text-based nature of HTTP also helps to limit the complexity of the API. As SOAP will attest, you can still make a horrible mess using HTTP, but that horrible mess is plain to see. Complex data structures are tedious to marshal to and from text, providing a strong incentive to keep things simple. Spotting the complexities in a class hierarchy is often not as easy.
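As a small illustration in Python (the Customer and Address classes are hypothetical): even a modestly nested structure can't be handed to json.dumps as-is, and reading it back means rebuilding every nested type by hand. That friction is exactly what nudges an HTTP API towards simpler shapes.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Address:
    street: str
    city: str


@dataclass
class Customer:
    name: str
    addresses: List[Address]


customer = Customer("Ada", [Address("1 High Street", "Cambridge")])

# The object graph isn't text; json.dumps refuses it outright.
try:
    json.dumps(customer)
except TypeError as error:
    print("Can't serialise directly:", error)

# First flatten it into dicts, lists, strings and numbers...
text = json.dumps(asdict(customer))

# ...and on the way back, rebuild every nested type by hand.
data = json.loads(text)
restored = Customer(data["name"], [Address(**a) for a in data["addresses"]])
print(restored == customer)
```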

HTTP doesn’t solve every problem – using it inside an inner loop that’s executed thousands of times per second probably isn’t such a good idea. What’s more, this approach might introduce some new problems. For instance, if we’re combining existing applications using HTTP for communication, we often need to add a thin shim to each application. If you want to integrate WordPress into your system, say, you might need to write a small plugin in PHP. Now, instead of a system written in one language, you’ve got to maintain a system with several distinct languages and platforms.

Even then, we should strive to avoid reimplementing the same old thing. As programmers, we consistently underestimate the cost of building a system, not to mention ongoing maintenance. By integrating existing applications, even if they’re in an unfamiliar language, we save ourselves those development and maintenance costs, as well as being able to pick the best solution for our problem. Thanks to the web, HTTP is often the easiest way to get there.

In case you recognised the topic, an edited version of this post was used as the Simple-Talk editorial a few months ago.

Topics: Software development, Software design

Convert Python packages to single-file Python scripts with stickytape

Wednesday 31 October 2012 20:00

Every so often, you have an idea for a little project that's both useful and fun to implement (but mainly fun), and you get pleasantly surprised when you manage to knock it out in a day. stickytape is one of those little projects. You can use it to take a Python script that depends on a load of pure-Python modules, and convert it into a single-file Python script. Admittedly, it's all one big hack -- the single-file script writes out all of its dependencies to temporary files before running the original script. It does the job for my purposes though, and it's much quicker and smaller than setting up or copying virtualenvs.
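To make the "one big hack" concrete, here's a minimal sketch of the technique, not stickytape's actual output: the generated script carries each dependency's source as a string, writes the files to a temporary directory, puts that directory on sys.path, and then runs the original script's code. The module name greetings and the EMBEDDED_MODULES dictionary are hypothetical.

```python
import atexit
import os
import shutil
import sys
import tempfile

# Hypothetical embedded dependencies: relative module path -> source text.
EMBEDDED_MODULES = {
    "greetings/__init__.py": "",
    "greetings/english.py": "def greet(name):\n    return 'Hello, ' + name\n",
}


def write_dependencies(modules):
    """Write each embedded module into a fresh temporary directory; return it."""
    root = tempfile.mkdtemp()
    # Remove the temporary files again when the interpreter exits.
    atexit.register(shutil.rmtree, root, ignore_errors=True)
    for relative_path, source in modules.items():
        path = os.path.join(root, *relative_path.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as module_file:
            module_file.write(source)
    return root


# Make the written modules importable, then run the original script's code.
sys.path.insert(0, write_dependencies(EMBEDDED_MODULES))

from greetings.english import greet  # noqa: E402 (import after sys.path setup)

print(greet("world"))
```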

All you need to do is point it at the script you want to transform, and at the directories where stickytape should search for packages and modules:

stickytape scripts/blah --add-python-path . --output-file /tmp/blah-standalone

You can also point stickytape at a specific Python binary to use the values in sys.path to search for packages and modules:

stickytape scripts/blah --python-binary _virtualenv/bin/python \
    --output-file /tmp/blah-standalone
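One plausible way to get at another interpreter's search path, sketched here as an assumption rather than a description of stickytape's code, is to run that interpreter and ask it to print its sys.path. The function name python_search_path is hypothetical.

```python
import json
import subprocess
import sys


def python_search_path(python_binary):
    """Return the sys.path that the given interpreter would use.

    Runs the interpreter in a subprocess and has it print sys.path as JSON,
    so a virtualenv's interpreter reports the virtualenv's own directories.
    """
    output = subprocess.check_output(
        [python_binary, "-c", "import json, sys; print(json.dumps(sys.path))"]
    )
    return json.loads(output)


# For example, against the interpreter running this script:
for directory in python_search_path(sys.executable):
    print(directory)
```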

You can find stickytape both on GitHub and PyPI, meaning you can install it using easy_install or pip:

pip install stickytape

Topics: Python

Shed is self-hosting

Sunday 14 October 2012 21:09

As of 11 August 2012, the Shed programming language is self-hosting. That is, the compiler written in Shed could successfully compile itself, and the result passed all of its tests. So pleased was I at the time that I apparently forgot to note the date, so I had to go back through the commit logs to find when it happened.

Shed might be able to compile itself, but the implementation of the compiler isn't anywhere near finished. To be able to reach this point as quickly as sensible, I've implemented a fairly minimal subset of the language and the compiler. The next big step is to implement type-checking in the compiler, along with corresponding language features such as interfaces.

Since maintaining two separate compilers for the same language seems redundant, not to mention time-consuming and tedious, I've decided to deprecate the JavaScript compiler in favour of the Shed compiler. It also forces the Shed compiler to be of higher quality – I've already added line and character numbers to error messages, something that the JavaScript compiler had and I quickly came to miss. Trying to hunt down syntax errors given only the name of the file and the expected/actual token gets frustrating quickly.

The next couple of things to work on will probably be to add reference resolution, and to modify the parser to make new lines significant so that I can get rid of all those semicolons.

Topics: Shed

Downloading source control URIs with Blah

Saturday 13 October 2012 16:00

Blah is a little Python library that allows source control URIs to be downloaded into a local directory:

import blah

blah.fetch("git+https://github.com/mwilliamson/blah.git", "/tmp/blah")
print(open("/tmp/blah/README.md").read())

It can also be used as a script:

blah fetch git+https://github.com/mwilliamson/blah.git /tmp/blah

The format of the URI is the name of the VCS (version control system), then a plus character, and then the URL of the repository. Optionally, you can pin a specific revision by appending a hash character (#) followed by the revision identifier:

blah.fetch("git+https://github.com/mwilliamson/blah.git#74d69b4", "/tmp/blah")
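A minimal sketch of splitting such a URI into its parts, using a hypothetical parse_source_uri function (this isn't blah's own API or implementation). It follows the format described above: the VCS name before the first plus, the repository URL after it, and an optional revision after a hash.

```python
from collections import namedtuple

SourceUri = namedtuple("SourceUri", ["vcs", "url", "revision"])

# Only Git and Mercurial, matching what blah supports at the moment.
SUPPORTED_VCS = {"git", "hg"}


def parse_source_uri(uri):
    """Split "<vcs>+<url>[#<revision>]" into its parts.

    revision is None when there is no "#..." suffix.
    """
    vcs, plus, rest = uri.partition("+")
    if not plus or vcs not in SUPPORTED_VCS:
        raise ValueError("expected a URI such as git+https://...: " + uri)
    url, _, revision = rest.partition("#")
    return SourceUri(vcs, url, revision or None)


print(parse_source_uri("git+https://github.com/mwilliamson/blah.git#74d69b4"))
```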

At the moment, only Git and Mercurial (hg) are supported. If you want to give it a whirl, you can grab it off PyPI:

$ pip install blah

Topics: Python

Functions and corresponding methods with the same name in Shed

Thursday 4 October 2012 20:47

In Shed, we sometimes define functions that usually delegate to a method, but also have some special cases or defaults if that method isn't implemented. For instance, the function represent should produce a string representation of an object. If the argument implements the method represent, it calls that method. Otherwise, it uses the name of the class of the object to generate the string. In code:

def represent fun(value: any) =>
    if isInstance(value, Representable) then
        value.represent()
    else
        defaultRepresent(value)

A problem arises when we want to call the function represent from within an implementation of the represent method. For instance, if we were implementing represent for the Some class:

def Some class(value: any) => {
    // Skipping the rest of the implementation of Some for brevity
    def represent fun() =>
        "Some(" + represent(value) + ")"
}

This code won't compile since we're calling represent with a single argument, value, but within the scope of the class, represent refers to the zero-argument function that implements represent specifically for Some.

There are several possible solutions to this problem, but the simplest one seems to be to use a different name for the method than for the corresponding function. For consistency, we can introduce the convention that the method name should be a simple variation on the function name. For instance, we might choose to use a leading underscore:

def represent fun(value: any) =>
    if isInstance(value, Representable) then
        value._represent()
    else
        defaultRepresent(value)

Although the leading underscore is perhaps a little ugly, that ugliness does help to reinforce the idea that you shouldn't be calling the method _represent directly. Instead, you should be calling the represent function. More generally, instead of calling a method _foo, you should be calling foo (unless you're actually implementing foo).
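Python's built-in repr and its __repr__ method follow much the same convention, with double underscores instead of one. Here's a rough Python analogue of the Shed code, with hypothetical helpers default_represent, Number and Plain. Python doesn't put a class's methods in scope inside its other methods, so the name clash wouldn't actually arise there, but the shape of the convention is the same.

```python
def default_represent(value):
    """Fallback used when a value has no _represent method."""
    return "<" + type(value).__name__ + ">"


def represent(value):
    """The function callers should use; delegates to _represent when present."""
    method = getattr(value, "_represent", None)
    if method is None:
        return default_represent(value)
    return method()


class Some:
    def __init__(self, value):
        self.value = value

    def _represent(self):
        # Implementing the method: call the function on the wrapped value.
        return "Some(" + represent(self.value) + ")"


class Number:
    def __init__(self, number):
        self.number = number

    def _represent(self):
        return str(self.number)


class Plain:
    pass


print(represent(Some(Number(4))))
print(represent(Some(Plain())))
```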

Topics: Functional programming, Language design, Shed

Applicative functors in uncurried languages

Sunday 9 September 2012 20:47

Note: this post assumes you already have some familiarity with applicative functors.

In this post, I'll show how to implement applicative functors in JavaScript, specifically for options, and then show an alternative formulation that's arguably better suited to languages that generally have uncurried functions (that is, languages that tend to have functions that accept multiple arguments rather than a single argument).

First of all, let's implement the option type (otherwise known as the maybe type) in JavaScript as a functor:

var none = {
    map: function(func) {
        return none;
    },
    
    bind: function(func) {
        return none;
    },
    
    toString: function() {
        return "none";
    }
};

function some(value) {
    return {
        map: function(func) {
            return some(func(value));
        },
        
        bind: function(func) {
            return func(value);
        },
        
        toString: function() {
            return "some(" + value + ")";
        }
    };
}

var functor = {
    map: function(func, option) {
        return option.map(func);
    },
    unit: some,
    applyFunctor: function(funcOption, argOption) {
        return funcOption.bind(function(func) {
            return argOption.map(func);
        });
    }
};

We can then use option values as applicative functors. Let's try our implementation out to make sure it behaves as we expect:

var four = some(4);
var six = some(6);

function add(first, second) {
    return first + second;
}

function curry(func, numberOfArguments) {
    return function(value) {
        if (numberOfArguments === 1) {
            return func(value);
        } else {
            return curry(func.bind(null, value), numberOfArguments - 1);
        }
    };
}

functor.applyFunctor(functor.map(curry(add, 2), four), six);
// => some(10)
functor.applyFunctor(functor.map(curry(add, 2), none), six);
// => none
functor.applyFunctor(functor.map(curry(add, 2), four), none);
// => none

Note that the use of the functor required us to curry the add function. This isn't a problem in functional languages such as Haskell, since functions tend to be curried by default. However, in languages that usually define functions to have multiple arguments (uncurried languages, for short), such as JavaScript, things get a little untidy.

My understanding of applicative functors is that they allow functors, or rather map, to be generalised to functions that accept more than one argument, such as add. Therefore, in an uncurried language, we might imagine the following cleaner API:

functor.applyFunctorUncurried(add, four, six);
// => some(10)
functor.applyFunctorUncurried(add, none, six);
// => none
functor.applyFunctorUncurried(add, four, none);
// => none

And such an API turns out to be not too hard to implement:

functor.applyFunctorUncurried = function(func) {
    var args = Array.prototype.slice.call(arguments, 1);
    return args.reduce(
        functor.applyFunctor,
        functor.unit(curry(func, args.length))
    );
};
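For comparison, here's a rough Python translation of the same idea (the names are illustrative, not from any library). Python functions are uncurried too, so the uncurried form fits naturally, and the implementation again leans on the curried apply_functor via functools.reduce.

```python
from functools import partial, reduce


class Some:
    def __init__(self, value):
        self.value = value

    def map(self, func):
        return Some(func(self.value))

    def bind(self, func):
        return func(self.value)

    def __eq__(self, other):
        return isinstance(other, Some) and self.value == other.value

    def __repr__(self):
        return "some(%r)" % (self.value,)


class _Nothing:
    def map(self, func):
        return self

    def bind(self, func):
        return self

    def __repr__(self):
        return "none"


NONE = _Nothing()  # the single "none" value


def unit(value):
    return Some(value)


def apply_functor(func_option, arg_option):
    """Curried form: func_option holds a one-argument function."""
    return func_option.bind(lambda func: arg_option.map(func))


def curry(func, number_of_arguments):
    """Turn an n-argument function (n >= 1) into nested one-argument functions."""
    if number_of_arguments == 1:
        return func
    return lambda value: curry(partial(func, value), number_of_arguments - 1)


def apply_functor_uncurried(func, *options):
    """Uncurried form: apply an ordinary n-argument function to n options."""
    return reduce(apply_functor, options, unit(curry(func, len(options))))


def add(first, second):
    return first + second


print(apply_functor_uncurried(add, Some(4), Some(6)))
print(apply_functor_uncurried(add, NONE, Some(6)))
```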

Interestingly, the implementation of applyFunctorUncurried is most easily expressed in terms of the original applyFunctor. I've found that cases like this help explain why functional languages tend to favour curried functions: currying often makes the implementation of higher-order functions such as applyFunctor much more straightforward.

This raises an interesting question: are these two formulations of applyFunctor of equal power? That is, is it possible to implement each in terms of the other? It's straightforward to see that we can implement applyFunctorUncurried in terms of applyFunctor since it's precisely the implementation above. What about implementing applyFunctor in terms of applyFunctorUncurried? This turns out to be pretty straightforward too:

function applyFunctor(funcFunctor, argFunctor) {
    return functor.applyFunctorUncurried(apply, funcFunctor, argFunctor);
}

function apply(func, value) {
    return func(value);
}
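To check that the reverse direction doesn't quietly depend on the curried version, here's a Python sketch (names illustrative) in which apply_functor_uncurried for options is written directly, without any currying, and apply_functor is then recovered from it using apply, just as above.

```python
class Some:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Some) and self.value == other.value

    def __repr__(self):
        return "some(%r)" % (self.value,)


class _Nothing:
    def __repr__(self):
        return "none"


NONE = _Nothing()  # the single "none" value


def apply_functor_uncurried(func, *options):
    """Implemented directly: none if any argument is none, else some(func(...))."""
    if any(option is NONE for option in options):
        return NONE
    return Some(func(*(option.value for option in options)))


def apply(func, value):
    return func(value)


def apply_functor(func_option, arg_option):
    """The curried form, recovered from the uncurried one via apply."""
    return apply_functor_uncurried(apply, func_option, arg_option)


def curried_add(first):
    return lambda second: first + second


print(apply_functor(apply_functor(Some(curried_add), Some(4)), Some(6)))
print(apply_functor(apply_functor(Some(curried_add), NONE), Some(6)))
```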

Please let me know if you spot mistakes in any of the above -- I've not exactly been rigorous in proof!

I'd be curious to know if there are any languages that include the alternative formulation of applyFunctor, and whether there are common cases where the original formulation is preferable even in uncurried languages.

Topics: Functional programming, Language design, JavaScript

Peaks and troughs in software development

Monday 20 August 2012 19:47

The problem with a smooth development process is that every day is pretty much the same as the last. You might be writing great code and solving interesting problems with other passionate people, but constantly working on the same thing can begin to feel dull or even frustrating. By having a silky-smooth development process with reliable code and regular releases, you've removed those natural peaks and troughs, like the high of fixing another critical bug in production before you head home and crash. I think it was Steve Freeman who once mentioned that sometimes it's valuable to put some of those peaks and troughs back in, but preferably without putting critical bugs back in.

For instance, I like the idea of spending one day a week working on unprioritised work. It might be that the developers are keen to try out a new rendering architecture that'll halve page load times, or that there's a piece of code that can be turned into a separate library that'll be useful on other projects. Maybe there's a little visual bug that's never going to be deemed important enough to be prioritised, but a developer takes enough pride in their work to spend half an hour fixing it. This feels like a peak to me: there's a lot of value to the product in polishing the user experience, in refactoring the code, and in trying out risky ideas, and the developers get to scratch some of their own itches.

However, its regularity can make it feel routine, and you're still working on the same product. As useful as these small, regular peaks and troughs are, I think you also need the occasional Everest. Maybe it's saying “This week, I'm going to try something I've never tried before that's completely unrelated to the project”. Or perhaps you need a Grand Canyon: “Today, we're just going to concentrate on being better programmers by doing a code retreat”. Finding something that works is hard, and you can't even reuse the same idea too much without risking its value as an artificial peak or trough. But I think it's important to keep trying. You don't just want a project and its team to be alive: you need them to be invigorated.

Topics: Software development

Safer mutation: change the value, change the name

Saturday 16 June 2012 12:33

Many advocates of functional programming suggest that the concept of state, the idea that a value can change and mutate over time, makes reasoning about your program much harder, leading to more bugs. Most languages allow some form of mutability, and can therefore implement both functional and imperative algorithms, even if the preference is strongly towards immutability. In a completely pure functional language, mutability is entirely removed. Since some concepts are arguably easier to understand and implement when using mutable state, this can mean certain problems are harder to solve in a purely functional language. But what if we allowed a limited form of mutability in such a way that we still preserve many of the nicer properties of functional programming, such as referential transparency?

To take a simple example: suppose we want to append an item to the end of a list. In an imperative language, we might write something like this:

list.append("first")

so now list has an extra item, meaning that the original value of list no longer exists. In a functional programming language, we'd create a new value instead of mutating the original list:

val longerList = list.append("first")

We can now use both list and longerList, since list was not modified during the append. This means we never need to reason about what state list is in – its value never changes. The trade-off is that a functional append tends to be more expensive than an imperative append. If we don't actually want to use list again, then this is arguably a bad trade-off. What if we could allow the list to be mutated under the covers, but still be able to present a programming model that appears to preserve immutability? So, we write the same code:

val longerList = list.append("first")

but list is now implemented as a mutable list. The compiler must now ensure that list is never used after the append operation. This means the actual implementation is effectively the same as when written in an imperative style, but we ensure that whenever we change the value of an object, we also change the name used to access it.
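Python has no compiler that could enforce this, but as a rough sketch the rule can be approximated at runtime. The hypothetical LinearList below mutates its storage in place on append, then marks the old name as used up so that any later use of it fails. A real implementation would reject such code at compile time instead, much like Rust's move semantics reject use of a value after it has been moved.

```python
class LinearList:
    """A list handle that becomes unusable once it has been appended to.

    Runtime stand-in for a compile-time check: append mutates the shared
    storage in place (no copy), invalidates this handle, and returns a new one.
    """

    def __init__(self, items=None):
        self._items = [] if items is None else items
        self._valid = True

    def _check(self):
        if not self._valid:
            raise RuntimeError("this list value was consumed by an earlier append")

    def append(self, item):
        self._check()
        self._items.append(item)         # cheap, imperative-style mutation
        self._valid = False              # the old name may no longer be used
        return LinearList(self._items)   # the new name sees the new value

    def to_list(self):
        self._check()
        return list(self._items)


values = LinearList()
longer_values = values.append("first")
print(longer_values.to_list())
try:
    values.to_list()
except RuntimeError as error:
    print("Rejected:", error)
```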

This approach does have some severe limitations. For instance, sharing mutable state between many objects is likely to be impossible. If we allowed mutable state to be shared, then mutating that state inside one object would require marking all objects that hold that state as unusable. In general, having the compiler keep track of this is likely to be infeasible.

Yet this sharing of mutable state is arguably the worst form of mutability. It means that changing something in one part of your system could change something in another far away part of the system. This idea of changing the name whenever we change the value is most useful for mutability in the small, when we just want to implement a particular algorithm efficiently.

However, there might still be cases where you'd quite reasonably want to share mutable state between, say, just two objects. The more interesting question is: is it possible to handle this case without requiring the user to write an excessive number of hints to the compiler?

Topics: Language design, Functional programming