Mike's corner of the web.

Archive: Python

Relocatable Python virtualenvs using Whack

Saturday 7 September 2013 17:25

One of the uses for Whack is creating relocatable (aka path-independent) Python virtualenvs. Normally, a virtualenv is tied to a specific absolute path, meaning that moving the virtualenv causes errors:

$ virtualenv venv
$ venv/bin/pip install glances
(Snipping pip output)
$ mv venv venv2
$ venv2/bin/glances -v
bash: venv2/bin/glances: /tmp/venv/bin/python: bad interpreter: No such file or directory

Copying the entire virtualenv has similar but subtler problems. Rather than getting a straightforward error, the scripts in the new virtualenv will use the Python interpreter and libraries in the original virtualenv.
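
To see where the absolute path comes from, it helps to look at the first line of the scripts in the virtualenv's bin directory. The following is a minimal sketch, not part of Whack or virtualenv, and the function name find_hardcoded_interpreters is made up for illustration. It lists each script whose shebang line names an absolute interpreter path:

```python
import os


def find_hardcoded_interpreters(bin_directory):
    """Map script name -> interpreter path for every file in bin_directory
    whose first line is a shebang naming an absolute path."""
    interpreters = {}
    for name in sorted(os.listdir(bin_directory)):
        path = os.path.join(bin_directory, name)
        if not os.path.isfile(path):
            continue
        # Read as bytes so that binary files in bin/ don't cause decode errors.
        with open(path, "rb") as script:
            first_line = script.readline()
        if not first_line.startswith(b"#!"):
            continue
        rest = first_line[2:].strip()
        if rest.startswith(b"/"):
            interpreters[name] = rest.split()[0].decode("utf-8", "replace")
    return interpreters
```

Running this against venv2/bin after the move shown earlier should report interpreters that still live under the old location, which is exactly what the "bad interpreter" error is complaining about.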

Whack allows virtualenvs to be created, and then moved to any other location:

$ whack install \
    git+https://github.com/mwilliamson/whack-package-python-virtualenv-env.git \
    venv
$ venv/bin/pip install glances
(Snipping pip output)
$ whack deploy venv --in-place
$ # Now we can copy the virtualenv to any other path,
$ # and it will continue to work
$ mv venv venv2
$ venv2/bin/glances -v
Glances version 1.7.1 with PsUtil 1.0.1

The whack deploy command is necessary to add any newly-installed scripts in the virtualenv to the bin directory.

One question is: why not use the --relocatable argument that virtualenv itself provides? This works in many cases, and doesn't require installation of Whack, but it also comes with a warning from virtualenv's documentation:

The --relocatable option currently has a number of issues, and is not guaranteed to work in all circumstances. It is possible that the option will be deprecated in a future version of virtualenv.

Topics: Python, Whack, Programs

Adding git (or hg, or svn) dependencies in setup.py (Python)

Wednesday 29 May 2013 21:02

Update: the behaviour of pip has changed, meaning that the option --process-dependency-links is required when running pip install.

You can specify dependencies for your Python project in setup.py by referencing packages on the Python Package Index (PyPI). But what if you want to depend on your own package that you don't want to make public? Using dependency_links, you can reference a package's source repository directly.

For instance, mayo is a public package on PyPI, so I can reference it directly:

setup(
    install_requires=[
        "mayo>=0.2.1,<0.3"
    ],
    # Skipping other arguments to setup for brevity
)

But suppose that mayo is a private package that I don't want to share. Using the dependency_links argument, I can reference the package by its source repository. The only way I could get this working with git was to use an explicit SSH git URL, which requires a small transformation from the SSH URLs that GitHub or BitBucket provide. For instance, if GitHub lists the SSH URL as:

git@github.com:mwilliamson/mayo.git

then we need to explicitly mark the URL as being SSH, which means adding ssh:// at the front, and replacing the colon after github.com with a forward slash. Finally, we need to indicate the URL is a git URL by adding git+ to the front. This gives a URL like:

git+ssh://git@github.com/mwilliamson/mayo.git

To use a specific commit, add an at symbol followed by a commit identifier. For instance, if we wanted to use version 0.2.1, which has the tag 0.2.1 in git:

git+ssh://git@github.com/mwilliamson/mayo.git@0.2.1

Then, we can use the URL in setup.py like so:

setup(
    install_requires=[
        "mayo==0.2.1"
    ],
    dependency_links=[
        "git+ssh://git@github.com/mwilliamson/mayo.git@0.2.1#egg=mayo-0.2.1"
    ],
    # Skipping other arguments to setup for brevity
)

Note that we depend on a specific version of the package, and that we use the URL fragment (the bit after #) to indicate both the package name and version.
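
The URL transformation described above is mechanical, so it can be scripted. Here's a small sketch of it. The function name to_dependency_link and its parameters are my own invention for illustration, and it only handles the user@host:path style of SSH URL:

```python
import re

# Matches SSH URLs of the form user@host:path, as listed by GitHub or BitBucket.
_SCP_STYLE_URL = re.compile(r"^(?P<user>[^@/:]+)@(?P<host>[^:/]+):(?P<path>.+)$")


def to_dependency_link(scp_style_url, revision, name, version):
    """Build a dependency_links entry from an scp-style SSH URL.

    Adds git+ssh:// at the front, swaps the colon after the host for a
    slash, then appends @revision and the #egg=name-version fragment.
    """
    match = _SCP_STYLE_URL.match(scp_style_url)
    if match is None:
        raise ValueError("Not an scp-style SSH URL: {0!r}".format(scp_style_url))
    return "git+ssh://{user}@{host}/{path}@{revision}#egg={name}-{version}".format(
        revision=revision, name=name, version=version, **match.groupdict()
    )
```

For example, to_dependency_link("git@github.com:mwilliamson/mayo.git", "0.2.1", "mayo", "0.2.1") produces the same string used in dependency_links above.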

Topics: Python

Test reuse

Monday 18 February 2013 10:38

Code reuse is often discussed, but what about test reuse? I don't just mean reusing common code between tests -- I mean running exactly the same tests against different code. Imagine you're writing a number of different implementations of the same interface. If you write a suite of tests against the interface, any one of your implementations should be able to make the tests pass. Taking the idea even further, I've found that you can reuse the same tests whenever you're exposing the same functionality through different methods, whether as a library, an HTTP API, or a command line interface.

As an example, suppose you want to start up a virtual machine from some Python code. We could use QEMU, a command line application on Linux that lets you start up virtual machines. Invoking QEMU directly is a bit ugly, so we wrap it up in a class. As an example of usage, here's what a single test case might look like:

def can_run_commands_on_machine():
    provider = QemuProvider()
    with provider.start("ubuntu-precise-amd64") as machine:
        shell = machine.shell()
        result = shell.run(["echo", "Hello there"])
        assert_equal("Hello there\n", result.output)

We create an instance of QemuProvider, use the start method to start a virtual machine, and then run a command on the virtual machine, and check the output. However, other than the original construction of the virtual machine provider, there's nothing in the test that relies on QEMU specifically. So, we could rewrite the test to accept provider as an argument to make it implementation agnostic:

def can_run_commands_on_machine(provider):
    with provider.start("ubuntu-precise-amd64") as machine:
        shell = machine.shell()
        result = shell.run(["echo", "Hello there"])
        assert_equal("Hello there\n", result.output)

If we decided to implement a virtual machine provider using a different technology, for instance by writing the class VirtualBoxProvider, then we can reuse exactly the same test case. Not only does this save us from duplicating the test code, it means that we have a degree of confidence that each implementation can be used in the same way.
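
One way to wire this up without any extra libraries is a mixin of tests plus one small subclass per implementation. The sketch below uses the standard unittest module. Since real virtual machines aren't available in a snippet, FirstFakeProvider and SecondFakeProvider are hypothetical stand-ins that fake the echo command. Only the shape of the shared tests matters here:

```python
import contextlib
import unittest


class FakeResult(object):
    def __init__(self, output):
        self.output = output


class FakeShell(object):
    # Hypothetical stand-in for a real shell: it only understands "echo".
    def run(self, command):
        if command[0] != "echo":
            raise ValueError("unsupported command: {0!r}".format(command))
        return FakeResult(" ".join(command[1:]) + "\n")


class FakeMachine(object):
    def shell(self):
        return FakeShell()


class FirstFakeProvider(object):
    @contextlib.contextmanager
    def start(self, image_name):
        yield FakeMachine()


class SecondFakeProvider(object):
    @contextlib.contextmanager
    def start(self, image_name):
        yield FakeMachine()


class ProviderTests(object):
    """Implementation-agnostic tests. Subclasses supply create_provider()."""

    def create_provider(self):
        raise NotImplementedError()

    def test_can_run_commands_on_machine(self):
        provider = self.create_provider()
        with provider.start("ubuntu-precise-amd64") as machine:
            result = machine.shell().run(["echo", "Hello there"])
            self.assertEqual("Hello there\n", result.output)


class FirstFakeProviderTests(ProviderTests, unittest.TestCase):
    def create_provider(self):
        return FirstFakeProvider()


class SecondFakeProviderTests(ProviderTests, unittest.TestCase):
    def create_provider(self):
        return SecondFakeProvider()


def run_provider_tests():
    """Run the shared tests once per implementation and return the result."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite(
        loader.loadTestsFromTestCase(test_case)
        for test_case in [FirstFakeProviderTests, SecondFakeProviderTests]
    )
    result = unittest.TestResult()
    suite.run(result)
    return result
```

ProviderTests deliberately doesn't inherit from unittest.TestCase, so the test runner only picks up the concrete subclasses. Adding another implementation means adding one more three-line subclass.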

If other people are implementing your interface, you could provide the same suite of tests so they can run it against their own implementation. This can give them some confidence that they've implemented your interface correctly.

What about when you're implementing somebody else's interface? Writing your own set of implementation-agnostic tests and running it against existing implementations is a great way to check that you've understood the interface. You can then use the same tests against your code to make sure your own implementation is correct.

We can take the idea of test reuse a step further by testing user interfaces with the same suites of tests that we use to implement the underlying library. Using our virtual machine example, suppose we write a command line interface (CLI) to let people start virtual machines manually. We could test the CLI by writing a separate suite of tests. Alternatively, we could write an adaptor that invokes our own application to implement the provider interface:

import subprocess


class CliProvider(object):
    def start(self, image_name):
        # Invoke our own command line application, capturing what it prints
        output = subprocess.check_output([
            _APPLICATION_NAME, "start", image_name
        ])

        return CliMachine(_parse_output(output))

Now, we can make sure that our command-line interface behaves correctly using the same suite of tests that we used to test the underlying code. If our interface is just a thin layer on top of the underlying code, then writing such an adaptor is often reasonably straightforward.
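
To make the adaptor idea concrete with something that runs anywhere, here's a toy version. Everything in it is hypothetical: the "library" is a single shout function, and the "command line interface" is an inline script run with the current Python interpreter, standing in for a real application that would call the library:

```python
import subprocess
import sys


def shout(text):
    """The underlying library function (hypothetical)."""
    return text.upper()


# Stand-in for a real command line entry point that wraps the library.
_CLI_SCRIPT = "import sys; sys.stdout.write(sys.argv[1].upper())"


class LibraryShouter(object):
    def shout(self, text):
        return shout(text)


class CliShouter(object):
    """Adaptor: implements the same interface by invoking the CLI."""

    def shout(self, text):
        output = subprocess.check_output(
            [sys.executable, "-c", _CLI_SCRIPT, text]
        )
        return output.decode("utf-8")


def check_shout_uppercases_text(shouter):
    # The shared test: it knows nothing about how shout is implemented.
    assert shouter.shout("hello there") == "HELLO THERE"


def run_shared_checks():
    for shouter in [LibraryShouter(), CliShouter()]:
        check_shout_uppercases_text(shouter)
    return True
```

The test body never mentions subprocess or argument parsing. Those details live entirely in CliShouter, which is the separation the adaptor is meant to enforce.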

I often find writing clear and clean UI tests is hard. Keeping a clean separation between the intent of the test and the implementation is often tricky, and it takes discipline to stop the implementation details from leaking out. Reusing tests in this way forces you to hide those details behind the common interface.

If you're using nose in Python to write your tests, then I've put the code I've been using to do this in a separate library called nose-set-tests.

Topics: Testing, Software design, Python

spur.py: A simplified interface for SSH and subprocess in Python

Sunday 10 February 2013 14:45

Over the last few months, I've frequently needed to use SSH from Python, but didn't find any of the existing solutions to be well-suited for what I needed (see below for discussion of other solutions). So, I've created spur.py to make using SSH from Python easy. For instance, to run echo over SSH:

import spur

shell = spur.SshShell(hostname="localhost", username="bob", password="password1")
result = shell.run(["echo", "-n", "hello"])
print result.output # prints hello

shell.run() executes a command, and returns the result once it's finished executing. If you don't want to wait until the command has finished, you can call shell.spawn() instead, which returns a process object:

process = shell.spawn(["sh", "-c", "read value; echo $value"])
process.stdin_write("hello\n")
result = process.wait_for_result()
print result.output # prints hello

spur.py also allows commands to be run locally using the same interface:

import spur

shell = spur.LocalShell()
result = shell.run(["echo", "-n", "hello"])
print result.output # prints hello

For a complete list of supported operations, take a look at the project on GitHub.

spur.py is certainly not the only way to use SSH from Python, and it's possible that one of the other solutions might be better suited for what you need. I've come across three other main alternatives.

The first is to shell out to ssh. It works, but it's ugly.

The second is to use Fabric. Unfortunately, I found Fabric to be a bit too high-level. It's useful for implementing deployment scripts using SSH, but I found it awkward to use as a general-purpose library for SSH.

Finally, there's paramiko. I found paramiko to be a bit too low-level, but both Fabric and spur.py are built on top of paramiko.

Topics: Python

Convert Python packages to single-file Python scripts with stickytape

Wednesday 31 October 2012 20:00

Every so often, you have an idea for a little project that's both useful and fun to implement (but mainly fun), and you get pleasantly surprised when you manage to knock it out in a day. stickytape is one of those little projects. You can use it to take a Python script that depends on a load of pure-Python modules, and convert it into a single-file Python script. Admittedly, it's all one big hack -- the single-file script writes out all of its dependencies to temporary files before running the original script. It does the job for my purposes though, and it's much quicker and smaller than setting up or copying virtualenvs.

All you need to do is point it at the script you want to transform, and tell stickytape where it should search for packages and modules:

stickytape scripts/blah --add-python-path . --output-file /tmp/blah-standalone

You can also point stickytape at a specific Python binary to use the values in sys.path to search for packages and modules:

stickytape scripts/blah --python-binary _virtualenv/bin/python \
    --output-file /tmp/blah-standalone

You can find stickytape both on GitHub and PyPI, meaning you can install it using easy_install or pip:

pip install stickytape
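
For the curious, the "one big hack" mentioned earlier looks roughly like the following. This is a simplified sketch of the general technique rather than the code stickytape actually generates. The module name stickytape_sketch_greeting and the helper bundled_modules_on_path are invented for the example:

```python
import contextlib
import os
import shutil
import sys
import tempfile

# Hypothetical bundle: path relative to the temporary directory -> source code.
_BUNDLED_MODULES = {
    "stickytape_sketch_greeting.py":
        "def greet(name):\n    return 'Hello, ' + name\n",
}


@contextlib.contextmanager
def bundled_modules_on_path(modules):
    """Write the bundled modules to a temporary directory, put that
    directory on sys.path, and clean everything up afterwards."""
    directory = tempfile.mkdtemp()
    try:
        for relative_path, source in modules.items():
            path = os.path.join(directory, relative_path)
            parent = os.path.dirname(path)
            if not os.path.isdir(parent):
                os.makedirs(parent)
            with open(path, "w") as module_file:
                module_file.write(source)
        sys.path.insert(0, directory)
        try:
            yield directory
        finally:
            sys.path.remove(directory)
    finally:
        shutil.rmtree(directory)


def main():
    # The "original script" runs while the bundled modules are importable.
    with bundled_modules_on_path(_BUNDLED_MODULES):
        import stickytape_sketch_greeting
        return stickytape_sketch_greeting.greet("world")
```

Because the modules are written out as ordinary files, this approach only suits pure-Python dependencies, which matches the restriction mentioned at the start of the post.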

Topics: Python

Downloading source control URIs with Blah

Saturday 13 October 2012 16:00

Blah is a little Python library that allows source control URIs to be downloaded into a local directory:

import blah

blah.fetch("git+https://github.com/mwilliamson/blah.git", "/tmp/blah")
print(open("/tmp/blah/README.md").read())

It can also be used as a script:

blah fetch git+https://github.com/mwilliamson/blah.git /tmp/blah

The format of the URI is the name of the VCS (version control system), then a plus character, and then the URL of the repository. Optionally, you can include a specific revision by appending a hash and an identifier for a specific revision:

blah.fetch("git+https://github.com/mwilliamson/blah.git#74d69b4", "/tmp/blah")
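
To illustrate the URI format, here's how such a URI could be split into its parts. This is only a sketch of the format as described above, not Blah's own parsing code, and the names SourceControlUri and parse_uri are made up:

```python
import collections

SourceControlUri = collections.namedtuple(
    "SourceControlUri", ["vcs", "url", "revision"]
)


def parse_uri(uri):
    """Split 'vcs+url' or 'vcs+url#revision' into its components."""
    vcs, separator, rest = uri.partition("+")
    if not separator or not vcs or not rest:
        raise ValueError("Expected a URI of the form vcs+url: {0!r}".format(uri))
    # Everything after the first hash, if present, identifies the revision.
    url, _, revision = rest.partition("#")
    return SourceControlUri(vcs, url, revision or None)
```

With this sketch, the URI in the last example splits into the VCS name git, the repository URL, and the revision 74d69b4. When there is no hash, the revision is None.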

At the moment, only Git and Mercurial (hg) are supported. If you want to give it a whirl, you can grab it off PyPI:

$ pip install blah

Topics: Python

Funk 0.2 released

Monday 19 October 2009 15:08

Funk 0.2 has just been released -- you can find it on the Cheese Shop, or you can always get the latest version from Gitorious. You can also take a peek at Funk's documentation.

The most important change is a change of syntax. Before, you might have written:

database = context.mock()
database.expects('save').with_args('python').returns(42)
database.allows('save').with_args('python').returns(42)
database.expects_call().with_args('python').returns(42)
database.allows_call().with_args('python').returns(42)
database.set_attr(connected=False)

Now, rather than calling the methods on the mock itself, you should use the functions in funk:

from funk import expects
from funk import allows
from funk import expects_call
from funk import allows_call
from funk import set_attr

...

database = context.mock()
expects(database).save.with_args('python').returns(42)
allows(database).save.with_args('python').returns(42)
expects_call(database).with_args('python').returns(42)
allows_call(database).with_args('python').returns(42)
set_attr(database, connected=False)

If you want, you can leave out the use of with_args, leading to a style very similar to JMock:

from funk import expects
from funk import allows
from funk import expects_call
from funk import allows_call

...

database = context.mock()
expects(database).save('python').returns(42)
allows(database).save('python').returns(42)
expects_call(database)('python').returns(42)
allows_call(database)('python').returns(42)

To help transition old code over, you can use funk.legacy:

from funk.legacy import with_context

@with_context
def test_view_saves_tags_to_database(context):
    database = context.mock()
    database.expects('save')

One final change in the interface is that has_attr has been renamed to set_attr. Hopefully, the interface should be more stable from now on.

There's also a new feature in that you can now specify base classes for mocks. Let's say we have a class called TagRepository, with a single method fetch_all(). If we try to mock calls to fetch_all(), everything will work fine. If we try to mock calls to any other methods on TagRepository, an AssertionError will be raised:

@with_context
def test_tag_displayer_writes_all_tag_names_onto_separate_lines(context):
    tag_repository = context.mock(TagRepository)
    expects(tag_repository).fetch_all().returns([Tag('python'), Tag('debian')]) # Works fine
    expects(tag_repository).fetch_all_tags() # Raises an AssertionError

Two words of caution about using this feature. Firstly, this only works if the method is explicitly defined on the base class. This is often not the case if the method is dynamically generated, such as by overriding __getattribute__ on the type.

Secondly, this is no substitute for integration testing. While it's true that the unit test above would not have failed, there should have been some integration test in your system that would have failed due to the method name change. The aim of allowing you to specify the base class is so that you can find that failure a little quicker.
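
The first caution can be demonstrated with plain Python. The check below is an assumption about how a "defined on the base class" test might be written, not Funk's actual implementation. DynamicRepository is a made-up class that generates its methods on the fly (using __getattr__ here for brevity):

```python
class TagRepository(object):
    def fetch_all(self):
        return []


class DynamicRepository(object):
    # Methods named fetch_by_* are generated on demand for instances.
    def __getattr__(self, name):
        if name.startswith("fetch_by_"):
            return lambda value: []
        raise AttributeError(name)


def is_defined_on_class(base_class, method_name):
    """True only if the class itself has a callable with this name."""
    return callable(getattr(base_class, method_name, None))
```

Here is_defined_on_class(TagRepository, "fetch_all") is true and is_defined_on_class(TagRepository, "fetch_all_tags") is false, as you'd hope. However, is_defined_on_class(DynamicRepository, "fetch_by_name") is also false, even though instances of DynamicRepository happily respond to fetch_by_name. A class-based check can't see dynamically generated methods.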

If you find any bugs or have any suggestions, please feel free to leave a comment.

Topics: Funk, Mocking, Python, Testing

Funk – A Python mocking framework

Monday 28 September 2009 17:42

Roll up, roll up! That's right folks, I've written a Python mocking framework.

Another mocking framework?

Yup. As for why, there are a few reasons.

The simplest is to see just how difficult it was to write a usable mocking framework. It turns out not to be all that difficult – I managed to write the core of the framework over a weekend, although plenty of time was spent tweaking and adding behaviour afterwards.

A somewhat better reason is that none of the existing Python mocking frameworks that I could find really did what I wanted. The closest I found was Fudge. My criteria were something like:

  • Not using the record/replay pattern.
  • Allowing the expected calls and their arguments to be set up beforehand.
  • Allowing differentiation between the methods that have to be called, and methods that can be called.

So what's wrong with Fudge?

Fudge meets all of these expectations. So what went wrong?

Firstly, I found Fudge too strict on ordering. Imagine I have a TagRepository that returns me tags from a database. I want to mock this object since I don't want to make a database trip in a unit test. So, in Fudge, I would set up the mock like so:

@with_fakes
def test_gets_python_and_debian_tags():
    tag_repository = fudge.Fake()
    tag_repository.expects('with_name').with_args('python').returns(python_tag)
    tag_repository.next_call().with_args('debian').returns(debian_tag)
    # Rest of the test

This would require me to get the Python tag before the Debian tag – yet I really didn't care which method I called first. I'm also not a fan of the syntax – for the first expectation, expects is used, yet for the second expectation, next_call is used.

The second problem I had was that, if you only set up one expectation on a method, you could call it many times. So, with the example above, if you had only set up the expectation for the Python tag, you could get the Python tag any number of times, so long as you asked for it at least once.

I dislike this since, by adding a second expectation, we have now changed the behaviour of the first. This does not lend itself to being able to modify or refactor the test quickly.

Finally, Fudge used a global context for mocks. The ramification of this is that, when using the decorator @with_fakes, each test inherits the mocks set up for the previous test. For instance:

@with_fakes
def test_tag_is_saved_if_name_is_valid():
    database = fudge.Fake()
    database.expects('save').with_args('python')
    tag_repository = TagRepository(database)
    tag_repository.save('python')

@with_fakes
def test_tag_is_not_saved_if_name_is_blank():
    tag_repository = TagRepository(None)
    tag_repository.save('')

The second test above would fail since it does not save the Python tag to the database created in the first test. This seemed somewhat unintuitive to me, so I ended up rolling my own decorator. At the start of each test, it would remove all of the mocks currently set up so that I could start from a blank slate.

The other effect is that it makes it difficult to nest mock contexts – admittedly, something I rarely need to do, but it can be useful to make a quick assertion that requires some amount of mocking.

Okay, what does Funk do differently?

Let's take a look at how we'd write that first test using Funk:

@with_context
def test_gets_python_and_debian_tags(context):
    tag_repository = context.mock()
    tag_repository.expects('with_name').with_args('python').returns(python_tag)
    tag_repository.expects('with_name').with_args('debian').returns(debian_tag)
    # Rest of the test

The first difference is that using the @with_context decorator means that we get a context passed in. If you want to build your own context, you can do so simply by calling funk.Context().

Secondly, Funk doesn't care what order you call the two methods in.

Finally, even if you only set one expectation, Funk expects the method to be called once. Setting up further expectations will not affect the existing expectations.
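
To pin down those last two points, here's a toy mock that behaves the same way: expectations can be met in any order, and each one is used up by exactly one call. It's a from-scratch illustration with invented names (MiniMock, Expectation), not Funk's source or API:

```python
class Expectation(object):
    def __init__(self, method_name, args, return_value):
        self.method_name = method_name
        self.args = args
        self.return_value = return_value
        self.satisfied = False


class MiniMock(object):
    def __init__(self):
        self._expectations = []

    def expects(self, method_name, args, returns=None):
        self._expectations.append(
            Expectation(method_name, tuple(args), returns)
        )

    def __getattr__(self, method_name):
        # Only reached for names that aren't real attributes of the mock.
        def call(*args):
            for expectation in self._expectations:
                if (not expectation.satisfied
                        and expectation.method_name == method_name
                        and expectation.args == args):
                    # Each expectation is consumed by exactly one call.
                    expectation.satisfied = True
                    return expectation.return_value
            raise AssertionError(
                "Unexpected call: {0}{1!r}".format(method_name, args)
            )
        return call

    def verify(self):
        unsatisfied = [e for e in self._expectations if not e.satisfied]
        if unsatisfied:
            raise AssertionError(
                "{0} expectation(s) were not met".format(len(unsatisfied))
            )
```

Because a call searches all outstanding expectations rather than just the next one, ordering doesn't matter. And because a matched expectation is marked as satisfied, a second identical call fails regardless of how many other expectations have been set up.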

Show me the code!

You can see the Git repository over on Gitorious, or grab version 0.1 from the Cheese Shop. Feel free to try it out, although the API isn't 100% stable yet. The source includes some documentation, but you might also want to take a look at some of the tests to get an idea of what you can do.

Topics: Funk, Mocking, Python, Testing