Mike's corner of the web.

Archive: Mocking

Reflections on "Testing Without Mocks"

Saturday 7 January 2023 18:30

James Shore has written a new draft of his "Testing Without Mocks: A Pattern Language" article which I thoroughly recommend reading, or at least the accompanying Mastodon thread.

To help me understand how this approach differs from a typical "testing with mocks" approach, I found it useful to think about three key questions:

  1. Which parts of the system -- whether that be code, data, infrastructure or something else -- do you replace with test doubles?

  2. How do you make those test doubles?

  3. Who's responsible for making the test doubles?

By investigating the different answers to these questions, we can then see how we might mix and match those answers to explore other possible approaches. Let's dig in!

Question #1: which parts of the system do you replace with test doubles?

The typical approach with mocks is to replace an object's immediate dependencies with mocks. At the other end of the spectrum, if you're running end-to-end tests, you might use production code with non-production config, say to point the system at a temporary (but real) database.

Nullables says that we should run as much of our own code as possible, but avoid using real infrastructure. So, we replace those dependencies on infrastructure at the lowest level, for instance by replacing stdout.write().

If you're mocking immediate dependencies, then you often end up mocking an awful lot of your interfaces at one point or another. I suspect an advantage of replacing dependencies at the lowest level is a much smaller, and therefore more manageable, set of interfaces that you have to replace.

Question #2: how do we make the test doubles?

Rather using a mocking framework, or making the real thing with different config, you replace the lowest level of instructure with embedded stubs. One way I think of embedded stubs is as in-memory implementations that do just enough to let the code run and the tests pass: they don't need to be full implementations. For instance, instead of calling random(), just return a constant value.

We then make the other objects in our system nullable: that is, we provide a way to instantiate them with dependencies that are themselves either nullable or embedded stubs.

(This glosses over a lot of important detail in the original article, such as configurable responses and output tracking, which I'm leaving out for brevity.)

Question #3: who's responsible for making the test doubles?

Or, to put it another way, where does the code that sets up dependencies for tests live? The embedded stub pattern means that all of the code that implements an interface, whether for production or for testing, is in one place, rather than (for instance) each test case mocking an interface and having to correctly simulate how it works.

By putting this code in the same file as the production code, it means the knowledge of how the interface is supposed to work is in one place, reducing the risk of inconsistency and improving quality through repeated use.

Similarly, higher level interfaces have functions to create nullable instances in the same file as the functions that create the production instances. So, again, the knowledge of how to create a test instance of X is in one place, which is the same place as X itself, rather than scattered across multiple tests.

Mixing and matching

Now, I reckon you could pick and choose your answers to these questions. For instance, suppose your default is replacing immediate dependencies in the test case using a mocking framework. You could:

  • keep using a mocking framework (different answer to question #2), but
  • choose to mock the lowest level of infrastructure (same answer to question #1), and
  • put all of the code that sets up the mocks (directly or indirectly) in one place (same answer to question #3).

Or you could:

  • throw away the mocking framework and hand-write stubs (same answer to question #2), but
  • still replace immediate dependencies (different answer to question #1), and
  • write a separate implementation in each test case/suite (different answer to question #3).

These different combinations come with all sorts of different trade-offs, and some will be more useful than others. Personally, I've gotten a lot of mileage out of:

  • making test doubles without a mocking framework, and
  • putting the code to set up testable instances of X in the same place as X itself (so the knowledge of how X should work is in one place, and the code to simulate X isn't duplicated), but
  • varying exactly at what level dependencies are replaced: sometimes immediate dependencies, sometimes the lowest level of infrastructure, sometimes somewhere in the middle. I often find that "somewhere in the middle" is where the simplest and most stable interface to replace (and therefore the one that leads to less brittle tests with clearer intent) can be found. It's entirely possible that this is an artefact of poor design choices on my part though!


These three questions gave me a way to interrogate the approach that James Shore describes, as well as more traditional approaches such as end-to-end testing and testing with mocks. To be clear, I think these three questions are a way to interrogate and explore approaches, not to characterise them entirely.

Each combination of answers will present its own particular challenges that need solving: if you haven't already done so, I strongly encourage you to read James Shore's original article to see how he does so.

We can, to some extent, mix and match the answers of these approaches, allowing us to consider and explore alternatives that match our own preferences and context. Even if an approach isn't the right choice at a given moment, perhaps some aspects of the approach or the underlying thinking can lead us to interesting new thoughts.

Thanks to James Shore for responding to my ramblings when I originally thought about this on Mastodon.

Topics: Mocking, Testing

Dynamic languages and testing

Friday 23 October 2009 19:57

The debate between static and dynamic typing has gone on for years. Static typing is frequently promoted as effectively providing free tests. For instance, consider the following Python snippet:

def reverse_string(str):
reverse_string("Palindrome is not a palindrome") # Works fine
reverse_string(42) # 42 is not a string
reverse_string() # Missing an argument

In a static language, those last two calls would probably be flagged up as compilation errors, while in a dynamic language you'd have to wait until runtime to find the errors. So, has static typing caught errors we would have missed?

Not quite. We said we'd encounter these errors at runtime -- which includes tests. If we have good test coverage, then unit tests will pick up these sorts of trivial examples almost as quickly as the compiler. Many people claim that this means you need to invest more time in unit testing with dynamic languages. I've yet to find this to be true. Whether in a dynamic language or a static language, I end up writing very similar tests, and they rarely, if ever, have anything to do with types. If I've got the typing wrong in my implementation, then the unit tests just won't pass.

Far more interesting is what happens at the integration test level. Even with fantastic unit test coverage, you can't have tested every part of the system. When you wrote that unit, you made some assumptions about what types other functions and classes took as arguments and returned. You even made some assumptions about what the methods are called. If these assumptions were wrong (or, perhaps more likely, change) in a static language, then the compiler will tell you. Quite often in a static language, if you want to change an interface, you can just make the change and fix the compiler errors -- “following the red”.

In a dynamic language, it's not that straightforward. Your unit tests as well as your code will now be out-of-date with respect to its dependencies. The interfaces you've mocked in the unit tests won't necessarily fail because, in a dynamic language, they don't have to be closely tied to the real interfaces you'll be using.

Time for another example. Sticking with Python, let's say we have a web application that has a notion of tags. In our system, we save tags into the database, and can fetch them back out by name. So, we define a TagRepository class.

class TagRepository(object):
    def fetch_by_name(self, name):
        return tag

Then, in a unit test for something that uses the tag repository, we mock the class and the method fetch_by_name. (This example uses my Funk mocking framework. See what a subtle plug that was?)

tag_repository = context.mock()

So, our unit test passes, and our application works just fine. But what happens when we change the tag repository? Let's say we want to rename fetch_by_name to get_by_name. Our unit tests for TagRepository are updated accordingly, while the tests that mock the tag repository are now out-of-date -- but our unit tests will still pass.

One solution is to use our mocking framework differently -- for instance, you could have passed TagFetcher to Funk, so it will only allows methods defined on TagFetcher to be mocked:

tag_repository = context.mock(TagFetcher)
# The following will raise an exception if TagFetcher.fetch_by_name is not a method

Of course, this isn't always possible, for instance if you're dynamically generating/defining classes or their methods.

The real answer is that we're missing integration tests -- if all our tests pass after renaming TagRepository's methods, but not changing its dependents, then nowhere are we testing the integration between TagRepository and its dependents. We've left a part of our system completely untested. Just like with unit tests, the difference in integration tests between dynamic and static languages is minimal. In both, you want to be testing functionality. If you've got the typing wrong, then this will fall out of the integration tests since the functionality won't work.

So, does that mean the discussed advantages of static typing don't exist? Sort of. While compilation and unit tests are usually quick enough to be run very frequently, integration tests tend to take longer, since they often involve using the database or file IO. With static typing, you might be able to find errors more quickly.

However, if you've got the level of unit and integration testing you should have in any project, whether its written in a static or dynamic language, then I don't think using a static language will mean you have fewer errors or fewer tests. With the same level high level of testing, a project using a dynamic language is just as robust as one written in a static language. Since integration tests are often more difficult to write than unit tests, they are often missing. Yet relying on static typing is not the answer -- static typing says nothing about how you're using values, just their type, and as such make weak assertions about the functionality of your code. Static typing is no substitute for good tests.

Topics: Mocking, Testing

Funk 0.2 released

Monday 19 October 2009 15:08

Funk 0.2 has just been released -- you can find it on the Cheese Shop, or you can always get the latest version from Gitorious. You can also take a peek at Funk's documentation.

The most important change is a change of syntax. Before, you might have written:

database = context.mock()

Now, rather than calling the methods on the mock itself, you should use the functions in funk:

from funk import expects
from funk import allows
from funk import expects_call
from funk import allows_call
from funk import set_attr


database = context.mock()
set_attr(database, connected=False)

If you want, you can leave out the use of with_args, leading to a style very similar to JMock:

from funk import expects
from funk import allows
from funk import expects_call
from funk import allows_call


database = context.mock()

To help transition old code over, you can use funk.legacy:

from funk.legacy import with_context

def test_view_saves_tags_to_database(context):
    database = context.mock()

One final change in the interface is that has_attr has been renamed to set_attr. Hopefully, the interface should be more stable from now on.

There's also a new feature in that you can now specify base classes for mocks. Let's say we have a class called TagRepository, with a single method fetch_all(). If we try to mock calls to fetch_all(), everything will work fine. If we try to mock calls to any other methods on TagRepository, an AssertionError will be raised:

def test_tag_displayer_writes_all_tag_names_onto_separate_lines(context):
    tag_repository = context.mock(TagRepository)
    expects(tag_repository).fetch_all().returns([Tag('python'), Tag('debian')]) # Works fine
    expects(tag_repository).fetch_all_tags() # Raises an AssertionError

Two words of caution about using this feature. Firstly, this only works if the method is explicitly defined on the base class. This is often not the case if the method is dynamically generated, such as by overriding __getattribute__ on the type.

Secondly, this is no substitute for integration testing. While its true that the unit test above would not have failed, there should have been some integration test in your system that would have failed due to the method name change. The aim of allowing you to specify the base class is so that you can find that failure a little quicker.

If you find any bugs or have any suggestions, please feel free to leave a comment.

Topics: Funk, Mocking, Python, Testing

Funk – A Python mocking framework

Monday 28 September 2009 17:42

Roll up, roll up! That's right folks, I've written a Python mocking framework.

Another mocking framework?

Yup. As for why, there are a few reasons.

The simplest is to see just how difficult it was to write a usable mocking framework. It turns out not to be all that difficult – I managed to write the core of the framework over a weekend, although plenty of time was spent tweaking and adding behaviour afterwards.

A somewhat better reason is that none of the existing Python mocking frameworks that I could find really did what I wanted. The closest I found was Fudge. My criteria were something like:

  • Not using the record/replay pattern.
  • Allowing the expected calls and their arguments to be set up beforehand.
  • Allowing differentiation between the methods that have to be called, and methods that can be called.

So what's wrong with Fudge?

Fudge meets all of these expectations. So what went wrong?

Firstly, I found Fudge too strict on ordering. Imagine I have a TagRepository that returns me tags from a database. I want to mock this object since I don't want to make a database trip in a unit test. So, in Fudge, I would set up the mock like so:

def test_gets_python_and_debian_tags():
    tag_repository = fudge.Fake()
    # Rest of the test

This would require me to get the Python tag before the Debian tag – yet I really didn't care which method I called first. I'm also not a fan of the syntax – for the first expectation, expects is used, yet for the second expectation, next_call is used.

The second problem I had was that, if you only set up one expectation on a method, you could call it many times. So, with the example above, if you had only set up the expectation for the Python tag, you could get the Python tag any number of times, so long as you asked for it at least once.

I dislike this since, by adding a second expectation, we have now changed the behaviour of the first. This does not lend itself to being able to modify or refactor the test quickly.

Finally, Fudge used a global context for mocks. The ramification of this is that, when using the decorator @with_fakes, each test inherits the mocks set up for the previous test. For instance:

def test_tag_is_saved_if_name_is_valid():
    database = fudge.Fake()
    tag_repository = TagRepository(database)

def test_tag_is_not_saved_if_name_is_blank():
    tag_repository = TagRepository(None)

The second test above would fail since it does not save the Python tag to the database created in the first test. This seemed somewhat unintuitive to me, so I ended up rolling my own decorator. At the start of each test, it would remove all of the mocks currently set up so that I could start from a blank slate.

The other effect is that it makes it difficult to nest mock contexts – admittedly, something I rarely need to do, but it can be useful to make a quick assertion that requires some amount of mocking.

Okay, what does Funk do differently?

Let's take a look at how we'd write that first test using Funk:

def test_gets_python_and_debian_tags(context):
    tag_repository = context.mock()
    # Rest of the test

The first difference is that using the @with_context decorator means that we get a context passed in. If you want to build your own context, you can do so simply by calling funk.Context().

Secondly, Funk doesn't care what order you call the two methods in.

Finally, even if you only set one expectation, Funk expects the method to be called once. Setting up further expectations will not affect the existing expectations.

Show me the code!

You can see the Git repository over on Gitorious, or grab version 0.1 from the Cheese Shop. Feel free to try it out, although the API isn't 100% stable yet. The source includes some documentation, but you might also want to take a look at some of the tests to get an idea of what you can do.

Topics: Funk, Mocking, Python, Testing