Monday 18 February 2013 10:38
Code reuse is often discussed, but what about test reuse? I don't just mean
reusing common code between tests -- I mean running exactly the same tests
against different code. Imagine you're writing a number of different
implementations of the same interface. If you write a suite of tests against the
interface, any one of your implementations should be able to make the tests
pass. Taking the idea even further, I've found that you can reuse
the same tests whenever you're exposing the same functionality through different
methods, whether as a library, an HTTP API, or a command line interface.
For instance, suppose we want to start up a virtual machine from some
Python code. We could use QEMU, a command-line application on Linux that lets
you start up virtual machines. Invoking QEMU directly is a bit ugly, so we wrap it
up in a class. Here's what a single test case for that class might look
like:
from nose.tools import assert_equal

def can_run_commands_on_machine():
    provider = QemuProvider()
    with provider.start("ubuntu-precise-amd64") as machine:
        shell = machine.shell()
        result = shell.run(["echo", "Hello there"])
        assert_equal("Hello there\n", result.output)
We create an instance of QemuProvider, use the start method to start a virtual
machine, run a command on that machine, and check the output. However, other
than the initial construction of the virtual machine provider, there's nothing
in the test that relies on QEMU specifically. So, we could rewrite the test to
accept provider as an argument to make it implementation-agnostic:
def can_run_commands_on_machine(provider):
    with provider.start("ubuntu-precise-amd64") as machine:
        shell = machine.shell()
        result = shell.run(["echo", "Hello there"])
        assert_equal("Hello there\n", result.output)
If we decided to implement a virtual machine provider using a different
technology, for instance by writing the class VirtualBoxProvider, then we could
reuse exactly the same test case. Not only does this save us from duplicating
the test code, it also gives us a degree of confidence that each implementation
can be used in the same way.
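To make this concrete, here's a minimal sketch that runs the provider-agnostic test against a stand-in provider. LocalProvider, LocalMachine, LocalShell and ExecutionResult are hypothetical names invented for this example: rather than booting a virtual machine, the provider simply runs commands on the local machine, which is enough to exercise the test. A real QemuProvider or VirtualBoxProvider would slot into the same list.

```python
import contextlib
import subprocess


class ExecutionResult(object):
    def __init__(self, output):
        self.output = output


class LocalShell(object):
    # Hypothetical shell: runs commands on this machine instead of inside a VM.
    def run(self, command):
        output = subprocess.check_output(command, universal_newlines=True)
        return ExecutionResult(output)


class LocalMachine(object):
    def shell(self):
        return LocalShell()


class LocalProvider(object):
    # Hypothetical provider: ignores the image name and "starts" the local machine.
    @contextlib.contextmanager
    def start(self, image_name):
        yield LocalMachine()


def assert_equal(expected, actual):
    # Minimal stand-in for nose.tools.assert_equal so the sketch has no dependencies.
    assert expected == actual, "%r != %r" % (expected, actual)


def can_run_commands_on_machine(provider):
    with provider.start("ubuntu-precise-amd64") as machine:
        shell = machine.shell()
        result = shell.run(["echo", "Hello there"])
        assert_equal("Hello there\n", result.output)


# The same, unchanged test runs against every provider in the list.
for provider in [LocalProvider()]:
    can_run_commands_on_machine(provider)
```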
If other people are implementing your interface, you could provide the same
suite of tests so they can run it against their own implementation. This can
give them some confidence that they've implemented your interface
correctly.
What about when you're implementing somebody else's
interface? Writing your own set of implementation-agnostic tests and
running them against existing implementations is a great way to check that you've
understood the interface. You can then run the same tests against your own code
to make sure your implementation is correct.
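One way to package such a suite using only the standard library is a unittest mixin: the test methods live in a class that doesn't inherit from TestCase, and each implementation gets a small subclass that supplies the object under test. This is a generic pattern, not necessarily what nose-set-tests does, and the key-value store interface with its DictStore and ListStore implementations is hypothetical, chosen only to keep the example self-contained.

```python
import unittest


class DictStore(object):
    """Hypothetical implementation backed by a dict."""

    def __init__(self):
        self._values = {}

    def put(self, key, value):
        self._values[key] = value

    def get(self, key, default=None):
        return self._values.get(key, default)


class ListStore(object):
    """Hypothetical implementation backed by a list of (key, value) pairs."""

    def __init__(self):
        self._pairs = []

    def put(self, key, value):
        # Drop any existing entry for the key before appending the new one.
        self._pairs = [(k, v) for k, v in self._pairs if k != key]
        self._pairs.append((key, value))

    def get(self, key, default=None):
        for k, v in self._pairs:
            if k == key:
                return v
        return default


class StoreContract(object):
    """Implementation-agnostic tests. Not a TestCase itself, so the tests only
    run when mixed into a subclass that defines create_store()."""

    def test_get_returns_value_that_was_put(self):
        store = self.create_store()
        store.put("name", "Bob")
        self.assertEqual("Bob", store.get("name"))

    def test_put_overwrites_existing_value(self):
        store = self.create_store()
        store.put("name", "Bob")
        store.put("name", "Alice")
        self.assertEqual("Alice", store.get("name"))

    def test_get_returns_default_for_missing_key(self):
        store = self.create_store()
        self.assertEqual(None, store.get("missing"))


# One small subclass per implementation; each inherits the whole suite.
class DictStoreTests(StoreContract, unittest.TestCase):
    def create_store(self):
        return DictStore()


class ListStoreTests(StoreContract, unittest.TestCase):
    def create_store(self):
        return ListStore()
```

Running python -m unittest on the module would then run all three tests once per implementation. Somebody implementing the interface elsewhere only needs to write their own three-line subclass.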
We can take the idea of test reuse a step further by testing user interfaces
with the same suites of tests that we use to test the underlying library.
Using our virtual machine example, suppose we write a command line interface (CLI)
to let people start virtual machines manually. We could test the CLI by writing
a separate suite of tests. Alternatively, we could write an adaptor that invokes
our own application to implement the provider interface:
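import subprocess

# _APPLICATION_NAME, _parse_output and CliMachine are defined elsewhere.
class CliProvider(object):
    def start(self, image_name):
        # Invoke our own CLI, then adapt its output to the provider interface
        output = subprocess.check_output([
            _APPLICATION_NAME, "start", image_name
        ])
        return CliMachine(_parse_output(output))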
Now, we can make sure that our command-line interface behaves correctly
using the same suite of tests that we used to test the underlying code. If
our interface is just a thin layer on top of the underlying code, then
writing such an adaptor is often reasonably straightforward.
I often find it hard to write clear and clean UI tests. Keeping a clean separation
between the intent of the test and its implementation is tricky, and it takes
discipline to stop the implementation details from leaking out. Reusing tests
in this way forces you to hide those details behind the common interface.
If you're using nose to write your tests in Python, I've put the code I've been
using to do this in a separate library called nose-set-tests.
Topics: Testing, Software design, Python
Sunday 10 February 2013 14:45
Over the last few months, I've frequently needed to use SSH from Python, but
didn't find any of the existing solutions to be well-suited for what I needed
(see below for discussion of other solutions). So, I've created
spur.py to
make using SSH from Python easy. For instance, to run echo over SSH:
import spur
shell = spur.SshShell(hostname="localhost", username="bob", password="password1")
result = shell.run(["echo", "-n", "hello"])
print(result.output) # prints hello
shell.run() executes a command and returns the result once it's finished
executing. If you don't want to wait until the command has finished, you can
call shell.spawn() instead, which returns a process object:
process = shell.spawn(["sh", "-c", "read value; echo $value"])
process.stdin_write("hello\n")
result = process.wait_for_result()
print(result.output) # prints hello
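For comparison, here's roughly the same interaction with a local process using only the standard library's subprocess module (Python 3). It's a sketch of the kind of plumbing that spawn hides, not how spur.py is implemented.

```python
import subprocess

# Start the process without waiting for it to finish.
process = subprocess.Popen(
    ["sh", "-c", "read value; echo $value"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
# Write to stdin, close it, and wait for the process to exit, collecting stdout.
output, _ = process.communicate("hello\n")
print(output.rstrip("\n"))  # prints hello
```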
spur.py also allows commands to be run locally using the same interface:
import spur
shell = spur.LocalShell()
result = shell.run(["echo", "-n", "hello"])
print(result.output) # prints hello
For a complete list of supported operations, take a look at the
project on GitHub.
spur.py is certainly not the only way to use SSH from Python, and it's
possible that one of the other solutions might be better suited for what you
need. I've come across three other main alternatives.
The first is to shell out to ssh. It works, but it's ugly.
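Part of the ugliness is quoting: ssh joins its trailing arguments into a single string that the remote shell parses again, so any argument containing spaces or shell metacharacters has to be escaped by hand. A minimal sketch of building such a command with shlex.quote (Python 3.3+); the user and host are placeholders, and the command is only built here, not run.

```python
import shlex


def build_ssh_command(username, hostname, command):
    # ssh concatenates its trailing arguments into one string that the remote
    # shell re-parses, so each argument is quoted to survive that second parse.
    remote_command = " ".join(shlex.quote(arg) for arg in command)
    return ["ssh", "%s@%s" % (username, hostname), remote_command]


ssh_command = build_ssh_command("bob", "localhost", ["echo", "-n", "hello world"])
print(ssh_command)
# Actually running it needs a reachable SSH server and non-interactive login:
# output = subprocess.check_output(ssh_command)
```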
The second is to use Fabric. Unfortunately,
I found Fabric to be a bit too high-level. It's useful for implementing
deployment scripts using SSH, but I found it awkward to use as a general-purpose
library for SSH.
Finally, there's paramiko.
I found paramiko to be a bit too low-level, but both Fabric and spur.py
are built on top of paramiko.
Topics: Python
Tuesday 18 December 2012 21:01
When exploring unfamiliar ideas, the best approach is often to take them
to the extreme. For instance, suppose you're trying to follow the principle
"tell, don't ask". I've often
found it tricky to know where to draw the line, but as an exercise, try writing
your code without a single getter or setter. This may seem ludicrous, but by
throwing pragmatism completely out the window, you're forced to move outside
your comfort zone. While some of the code might be awful, some of it might present
ideas in a new way.
As an example, suppose I have two coordinates which represent the top-left
and bottom-right corners of a rectangle, and I want to iterate through every
integer coordinate in that rectangle. My first thought might be:
def find_coordinates_in_rectangle(top_left, bottom_right):
    # Both corners are inclusive
    for x in range(top_left.x, bottom_right.x + 1):
        for y in range(top_left.y, bottom_right.y + 1):
            yield Coordinate(x, y)
Normally, I might be perfectly happy with this code (although there is a bit
of duplication!). But if we've forbidden getters and setters, then we can't
retrieve the x and y values from each coordinate.
Instead, we can write something like:
def find_coordinates_in_rectangle(top_left, bottom_right):
    return top_left.all_coordinates_in_rectangle_to(bottom_right)
The method name needs a bit more thought, but the important difference is that
we've moved some of the knowledge of our coordinate system into the actual
coordinate class. Whether or not this turns out to be a good idea, it's food
for thought that we might not have come across without such a severe constraint
as "no getters or setters".
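For illustration, here's one possible shape for that method, as a sketch. The Coordinate class is hypothetical: it keeps x and y private and exposes no getters, so the rectangle logic has to live inside the class. It assumes both corners are inclusive and that x and y increase towards the bottom-right corner. Note that it still reads the other coordinate's fields, but only from within the same class.

```python
class Coordinate(object):
    def __init__(self, x, y):
        # Underscored fields: no getters are exposed to callers.
        self._x = x
        self._y = y

    def all_coordinates_in_rectangle_to(self, bottom_right):
        # Every integer coordinate with self as the top-left corner and
        # bottom_right as the bottom-right corner, both inclusive.
        for x in range(self._x, bottom_right._x + 1):
            for y in range(self._y, bottom_right._y + 1):
                yield Coordinate(x, y)

    def __eq__(self, other):
        return isinstance(other, Coordinate) and (self._x, self._y) == (other._x, other._y)

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        return hash((self._x, self._y))

    def __repr__(self):
        return "Coordinate(%r, %r)" % (self._x, self._y)


def find_coordinates_in_rectangle(top_left, bottom_right):
    return top_left.all_coordinates_in_rectangle_to(bottom_right)


print(list(find_coordinates_in_rectangle(Coordinate(0, 0), Coordinate(1, 1))))
```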
Topics: Software design
Tuesday 18 December 2012 20:39
While taking part in the
Global Day of Coderetreat,
one of the sessions had us implementing Conway's Game of Life without using
any "if" statements, forcing us to use polymorphism instead.
Anything that was an "if" in spirit, such as a switch statement or storing
functions in a dictionary, was also forbidden. For most of the code,
this was fairly straightforward, but the interesting problem was the code that
decided whether a cell lived or died based on how many neighbours it had:
if numberOfNeighbours in [2, 3]:
    return CellState.LIVE
else:
    return CellState.DEAD
Polymorphism allows different code to be executed depending on the type of
a value. In this particular case, we need to execute different code depending
on which value we have. It follows that each number has to have a different
type so we can give it different behaviour:
class Zero(object):
    def increment(self):
        return One()

    def live_cell_next_generation(self):
        return CellState.DEAD

class One(object):
    def increment(self):
        return Two()

    def live_cell_next_generation(self):
        return CellState.DEAD

class Two(object):
    def increment(self):
        return Three()

    def live_cell_next_generation(self):
        return CellState.LIVE

class Three(object):
    def increment(self):
        return FourOrMore()

    def live_cell_next_generation(self):
        return CellState.LIVE

class FourOrMore(object):
    def increment(self):
        return FourOrMore()

    def live_cell_next_generation(self):
        return CellState.DEAD
In the code that counts the number of neighbours, we use our new number
system by starting with Zero and incrementing whenever we find a live
neighbour. To choose the next state of the cell, rather than inspecting the
number of neighbours, we ask the number of neighbours for the next state directly:
numberOfNeighbours.live_cell_next_generation()
And now we have no "if"s! It's possible to move the logic for choosing the
next cell state out of the number classes, for instance using the visitor pattern,
which might feel a bit more natural. I suspect that reimplementing the natural
numbers is still going to feel about the same amount of crazy though.
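Putting the pieces together, here's a small sketch of how the counting itself might avoid an "if". The LiveCell and DeadCell classes and their add_to method are assumptions made for this example, not code from the session: each kind of cell decides polymorphically whether it bumps the count, and the resulting number then decides the next state. CellState is a stand-in defined here so that the block runs on its own.

```python
class CellState(object):
    # Hypothetical stand-in for the CellState used in the post.
    LIVE = "live"
    DEAD = "dead"


class Zero(object):
    def increment(self): return One()
    def live_cell_next_generation(self): return CellState.DEAD


class One(object):
    def increment(self): return Two()
    def live_cell_next_generation(self): return CellState.DEAD


class Two(object):
    def increment(self): return Three()
    def live_cell_next_generation(self): return CellState.LIVE


class Three(object):
    def increment(self): return FourOrMore()
    def live_cell_next_generation(self): return CellState.LIVE


class FourOrMore(object):
    def increment(self): return FourOrMore()
    def live_cell_next_generation(self): return CellState.DEAD


class LiveCell(object):
    def add_to(self, count):
        # A live neighbour bumps the count.
        return count.increment()


class DeadCell(object):
    def add_to(self, count):
        # A dead neighbour leaves the count unchanged.
        return count


def count_live_neighbours(neighbours):
    count = Zero()
    for neighbour in neighbours:
        count = neighbour.add_to(count)
    return count


def next_state_of_live_cell(neighbours):
    return count_live_neighbours(neighbours).live_cell_next_generation()


print(next_state_of_live_cell([LiveCell(), LiveCell(), DeadCell()]))  # prints live
```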
Topics: Software design
Monday 10 December 2012 11:42
We're often faced with decisions that we'll have to live with for a long time.
What language should we write our application in? What framework should we use?
What will our architecture look like? We spend lots of time and effort
trying to find the right answer, but we often forget the alternative: instead
of making this big decision, could we make the decision irrelevant?
Suppose you need to pick a language to build your system in. This is tricky
since it often takes months or even years to discover all the annoyances and
issues of a language, by which point rewriting the entire system in another
language is impractical. An alternative is to split your system up into components,
and make communication between components language-agnostic, for instance
by only allowing communication over HTTP.
Then, the choice of language affects only a single component, rather than
the entire system. You could change the language each component is written in
one-by-one, or leave older components that don't need much development in their
original language. Regardless, picking the “wrong” language no longer has such
long-lasting effects.
This flexibility in language isn't without cost though – now you potentially
have to know multiple languages to work on a system, rather than just one.
What if there's a component written in a language that nobody on the team understands
anymore? There's also the overhead of using HTTP. Not only is an HTTP request
slower than an ordinary function call, it also makes the call-site more
complicated.
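To illustrate the difference at the call-site, here's a sketch that exposes a trivial add function over HTTP using only the Python 3 standard library. The endpoint, its JSON shape and the handler are invented for this example; the point is how much more the caller has to deal with (URLs, serialisation, error statuses) compared with calling add directly.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def add(a, b):
    return a + b


class AddHandler(BaseHTTPRequestHandler):
    # Hypothetical endpoint: accepts a POST with a JSON body {"a": ..., "b": ...}.
    # The path isn't checked in this sketch.
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        arguments = json.loads(self.rfile.read(length).decode("utf-8"))
        body = json.dumps({"result": add(arguments["a"], arguments["b"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Silence the default request logging to stderr.


def add_over_http(base_url, a, b):
    # The HTTP call-site: build the URL, serialise the arguments, send the
    # request (urlopen raises HTTPError on error statuses), then parse the reply.
    request = urllib.request.Request(
        base_url + "/add",
        data=json.dumps({"a": a, "b": b}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))["result"]


server = HTTPServer(("127.0.0.1", 0), AddHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = "http://127.0.0.1:%d" % server.server_port

local_result = add(2, 3)                       # ordinary function call
remote_result = add_over_http(base_url, 2, 3)  # the same call over HTTP
server.shutdown()
server.server_close()
print(local_result, remote_result)  # prints 5 5
```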
Making any big decision irrelevant has a cost associated with it, but confidently making
the “right” decision upfront is often impossible. For any big decision, it's
worth considering: what's the cost of making the wrong decision versus the cost
of making the decision irrelevant?
Topics: Software development
Monday 10 December 2012 10:41
As programmers, we spend quite a lot of effort in pursuit of some notion of
modularity. We hope that this allows us to solve problems more easily by
splitting them up, as well as then letting us reuse parts of the code in other
applications. Plenty of attempts have been made to get closer to this ideal,
object-orientation perhaps being the most obvious example, yet one of the most
successful approaches to modularity is almost accidental: the web.
Modularity makes our code easier to reason about by allowing us to take our
large problem, split it into small parts, and solve those small parts without
having to worry about the whole. Programming languages give us plenty of ways to
do this, functions and classes among them. So far, so good. But modularity has
some other benefits that we’d like to be able to take advantage of. If I’ve
written an independent module, say to send out e-mails to my customers, I’d
like to be able to reuse that module in another application. And by creating
DLLs or JARs or your platform’s package container of choice, you can do just
that – provided your new application is on the same platform. Want to use a Java
library from C#? Well, good luck – it might be possible, but it’s not going to
be smooth sailing.
What’s more, just because the library exists, it doesn’t mean it’s going to
be a pleasant experience. If nobody can understand the interface to your code,
nobody’s going to use it. Let’s say we want to write out an XML document to an
output stream in Java. You’d imagine this would be a simple one-liner.
You’d be wrong:
import org.w3c.dom.*;
import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;

private static final void writeDoc(Document doc, OutputStream out)
        throws IOException {
    try {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(
            OutputKeys.DOCTYPE_SYSTEM, doc.getDoctype().getSystemId());
        t.transform(new DOMSource(doc), new StreamResult(out));
    } catch (TransformerException e) {
        throw new AssertionError(e); // Can't happen!
    }
}
The result is that most of the code we write is just a variation on a theme.
Odds are, somebody else has written the code before. Despite our best efforts,
we’ve fallen a little short.
However, the web brings us a little closer to the ideal. If I want to send
e-mails to my customers, I could write my own e-mail sending library. More
likely, I’d use an existing one for my language. But even then, I probably
wouldn’t have some niceties like A/B testing or DKIM signing. Instead, I could
just fire some HTTP at MailChimp, and get a whole slew of features without
getting anywhere near the code that implements them.
The web is inherently language agnostic. So long as your language can send
and receive text over HTTP, and probably parse some JSON, you’re about as
well-equipped as everybody else. Instead of building libraries for a specific
language, you can build a service that can be used from virtually every language.
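As a rough illustration of how little a client needs, here's a sketch that builds a JSON request to an e-mail service using only the Python 3 standard library. The URL, field names and authorisation header are hypothetical and don't correspond to MailChimp's or any other real API; the request is built but not sent.

```python
import json
import urllib.request


def build_send_email_request(base_url, api_key, recipient, subject, body):
    # Hypothetical JSON API: the path, field names and auth header are invented
    # for illustration and don't correspond to any real service.
    payload = {"to": recipient, "subject": subject, "body": body}
    return urllib.request.Request(
        base_url + "/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


request = build_send_email_request(
    "https://email.example.com/api", "secret-key",
    "customer@example.com", "Hello", "Thanks for signing up!",
)
# Sending it would be one more call: urllib.request.urlopen(request)
print(request.get_method(), request.full_url)
```

Any language with an HTTP client and a JSON library could build the same request, which is the point.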
The text-based nature of HTTP also helps to limit the complexity of the API.
As SOAP will attest, you can still make a horrible mess using HTTP, but that
horrible mess is plain to see. Complex data structures are tedious to marshal
to and from text, providing a strong incentive to keep things simple. Spotting
the complexities in a class hierarchy is often not as easy.
HTTP doesn’t solve every problem – using it inside an inner loop that’s
executed thousands of times per second probably isn’t such a good idea. What’s
more, this approach might introduce some new problems. For instance, if we’re
combining existing applications using HTTP for communication, we often need to
add a thin shim to each application: you might need to write a
small plugin in PHP if you want to integrate WordPress into your system. Now,
instead of a system written in one language, you’ve got to maintain a system with
several distinct languages and platforms.
Even then, we should strive to avoid reimplementing the same old thing. As
programmers, we consistently underestimate the cost of building a system, not to
mention ongoing maintenance. By integrating existing
applications, even if they’re written in unfamiliar languages, we save ourselves
those development and maintenance costs, as well as being able to pick the best
solution for our problem. Thanks to the web, HTTP is often the easiest way to
get there.
In case you recognised the topic, an edited version of this post was used as the
Simple-Talk editorial
a few months ago.
Topics: Software development, Software design
Wednesday 31 October 2012 20:00
Every so often, you have an idea for a little project that's both useful and fun
to implement (but mainly fun), and you get pleasantly surprised when you
manage to knock it out in a day.
stickytape is one of
those little projects. You can use it to take a Python script that depends
on a load of pure-Python modules, and convert it into a single-file Python
script. Admittedly, it's all one big hack -- the single-file script writes out
all of its dependencies to temporary files before running the original script.
It does the job for my purposes though, and it's much quicker and smaller than
setting up or copying virtualenvs.
All you need to do is point it at the script you want to transform, and tell it
where stickytape should be searching for packages and modules:
stickytape scripts/blah --add-python-path . --output-file /tmp/blah-standalone
You can also point stickytape at a specific Python binary to use the values
in its sys.path to search for packages and modules:
stickytape scripts/blah --python-binary _virtualenv/bin/python \
--output-file /tmp/blah-standalone
You can find stickytape both on GitHub and
PyPI, meaning you can install it
using easy_install or pip:
pip install stickytape
Topics: Python
Sunday 14 October 2012 21:09
As of 11 August 2012,
the Shed programming language is self-hosting. That is, the compiler written in
Shed could successfully compile itself and pass all of its tests. So pleased
was I at the time, I apparently forgot to note the date, so I had to go and look
back at the commit logs to find the date that it happened.
Shed might be able to compile itself, but the implementation of the compiler isn't anywhere near finished.
To be able to reach this point as quickly as sensible, I've implemented a fairly
minimal subset of the language and the compiler. The next big step is to implement
type-checking in the compiler, along with corresponding language features
such as interfaces.
Since maintaining two separate compilers for the same language seems
redundant, not to mention time-consuming and tedious, I've decided to deprecate
the JavaScript compiler in favour of the Shed compiler. It also forces the
Shed compiler to be of higher quality – I've already added line and
character numbers to error messages, something that the JavaScript compiler had
and I quickly came to miss. Trying to hunt down syntax errors given only
the name of the file and the expected/actual token gets frustrating quickly.
The next couple of things to work on will probably be to add reference resolution,
and to modify the parser to make new lines significant so that I can get rid of
all those semicolons.
Topics: Shed
Saturday 13 October 2012 16:00
Blah is a little Python library that allows source control URIs to be downloaded
into a local directory:
import blah
blah.fetch("git+https://github.com/mwilliamson/blah.git", "/tmp/blah")
print(open("/tmp/blah/README.md").read())
It can also be used as a script:
blah fetch git+https://github.com/mwilliamson/blah.git /tmp/blah
The format of the URI is the name of the VCS (version control system), then
a plus character, and then the URL of the repository. Optionally, you can
include a specific revision by appending a hash character (#) and an identifier
for that revision:
blah.fetch("git+https://github.com/mwilliamson/blah.git#74d69b4", "/tmp/blah")
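To make the format concrete, here's a sketch that splits such a URI into its parts. parse_source_uri and SourceUri are hypothetical names written for this example; they aren't part of blah's API, and blah's own parsing may differ in the details.

```python
from collections import namedtuple

SourceUri = namedtuple("SourceUri", ["vcs", "repository_url", "revision"])


def parse_source_uri(uri):
    # Format: "<vcs>+<repository url>[#<revision>]"
    vcs, separator, rest = uri.partition("+")
    if not separator:
        raise ValueError("Missing VCS prefix in URI: %s" % uri)
    repository_url, _, revision = rest.partition("#")
    # An absent revision is reported as None rather than an empty string.
    return SourceUri(vcs, repository_url, revision or None)


print(parse_source_uri("git+https://github.com/mwilliamson/blah.git#74d69b4"))
```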
At the moment, only Git and Mercurial (hg) are supported. If you want to give
it a whirl, you can grab it off PyPI:
$ pip install blah
Topics: Python
Thursday 4 October 2012 20:47
In Shed, we sometimes define functions that usually delegate to a method,
but also have some special cases or defaults if that method isn't implemented.
For instance, the function represent should produce a string
representation of an object. If the argument implements the method
represent, it calls that method. Otherwise, it uses the name
of the class of the object to generate the string. In code:
def represent fun(value: any) =>
    if isInstance(value, Representable) then
        value.represent()
    else
        defaultRepresent(value)
A problem arises when we want to call the function represent from
within an implementation of the represent method. For instance,
if we were implementing represent for the Some class:
def Some class(value: any) => {
    // Skipping the rest of the implementation of Some for brevity
    def represent fun() =>
        "Some(" + represent(value) + ")"
}
This code won't compile since we're calling represent with a
single argument, value, but within the scope of the class,
represent refers to the zero-argument function that implements
represent specifically for Some.
There are several possible solutions to this problem, but the simplest one
seems to be to use a different name for the method than for the corresponding
function. For consistency, we can introduce the convention that the method name should be a simple variation on the
function name. For instance, we might choose to use a leading underscore:
def represent fun(value: any) =>
    if isInstance(value, Representable) then
        value._represent()
    else
        defaultRepresent(value)
Although the leading underscore is perhaps a little ugly, that ugliness does
help to reinforce the idea that you shouldn't be calling
the method _represent directly. Instead, you should be using
the represent function. More generally, instead of calling a method
_foo, you should be calling foo (unless you're
actually implementing foo).
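Python's built-in repr() function and the __repr__ method follow a similar split between a public function and a specially named method. As a rough sketch, the Shed convention translated into Python might look like the following; represent, _represent and Some are illustrative names, and Python's repr() stands in for defaultRepresent. Python doesn't actually have the scoping problem, since methods aren't in scope as bare names inside other methods, but the naming convention carries over.

```python
def represent(value):
    # Delegate to the method if the value implements it...
    if hasattr(value, "_represent"):
        return value._represent()
    # ...otherwise fall back to a default (repr() stands in for defaultRepresent).
    return repr(value)


class Some(object):
    def __init__(self, value):
        self._value = value

    def _represent(self):
        # Calls the function, not the method, so nested values are handled too.
        return "Some(" + represent(self._value) + ")"


print(represent(Some(Some(42))))  # prints Some(Some(42))
```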
Topics: Functional programming, Language design, Shed