Reflections on "Testing Without Mocks"
Saturday 7 January 2023 18:30
James Shore has written a new draft of his "Testing Without Mocks: A Pattern Language" article, which I thoroughly recommend reading, or at least the accompanying Mastodon thread.
To help me understand how this approach differs from a typical "testing with mocks" approach, I found it useful to think about three key questions:
- Which parts of the system -- whether that be code, data, infrastructure or something else -- do you replace with test doubles?
- How do you make those test doubles?
- Who's responsible for making the test doubles?
By investigating the different answers to these questions, we can then see how we might mix and match those answers to explore other possible approaches. Let's dig in!
Question #1: which parts of the system do you replace with test doubles?
The typical approach with mocks is to replace an object's immediate dependencies with mocks. At the other end of the spectrum, if you're running end-to-end tests, you might use production code with non-production config, say to point the system at a temporary (but real) database.
Nullables says that we should run as much of our own code as possible, but avoid using real infrastructure.
So, we replace those dependencies on infrastructure at the lowest level, for instance by replacing stdout.write().
If you're mocking immediate dependencies, then you often end up mocking an awful lot of your interfaces at one point or another. I suspect an advantage of replacing dependencies at the lowest level is a much smaller, and therefore more manageable, set of interfaces that you have to replace.
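To make that concrete, here's a minimal sketch in TypeScript (the Output and Greeter names are mine, not James Shore's): the only thing the tests ever replace is a thin wrapper around stdout, so everything above it runs its real code.

```typescript
// A hypothetical low-level wrapper: the one narrow interface tests replace.
interface Output {
  write(text: string): void;
}

// Production implementation: a thin pass-through to the real infrastructure.
const realOutput: Output = {
  write(text: string): void {
    process.stdout.write(text);
  },
};

// Test double: records what would have been written instead of writing it.
class TrackedOutput implements Output {
  readonly lines: string[] = [];
  write(text: string): void {
    this.lines.push(text);
  }
}

// Higher-level code depends only on Output, never on process.stdout directly,
// so there is exactly one small interface to swap out in tests.
class Greeter {
  constructor(private readonly output: Output) {}
  greet(name: string): void {
    this.output.write(`Hello, ${name}!\n`);
  }
}

// In a test: new Greeter(new TrackedOutput()) exercises all of Greeter's real code.
```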
Question #2: how do we make the test doubles?
Rather than using a mocking framework, or making the real thing with different config, you replace the lowest level of infrastructure with embedded stubs.
One way I think of embedded stubs is as in-memory implementations that do just enough to let the code run and the tests pass: they don't need to be full implementations.
For instance, instead of calling random(), just return a constant value.
We then make the other objects in our system nullable: that is, we provide a way to instantiate them with dependencies that are themselves either nullable or embedded stubs.
(This glosses over a lot of important detail in the original article, such as configurable responses and output tracking, which I'm leaving out for brevity.)
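As a rough illustration of what an embedded stub plus a nullable factory might look like (RandomSource and its API are my own invention, not code from the article, and this skips those same details):

```typescript
// A hypothetical wrapper around randomness. Its "null" variant is an embedded
// stub: just enough behaviour for tests to pass, returning a constant instead
// of calling Math.random().
class RandomSource {
  // Factory for the production instance, backed by the real infrastructure.
  static create(): RandomSource {
    return new RandomSource(() => Math.random());
  }

  // Factory for the nullable instance, backed by the embedded stub.
  static createNull(fixedValue: number = 0.5): RandomSource {
    return new RandomSource(() => fixedValue);
  }

  private constructor(private readonly next: () => number) {}

  nextNumber(): number {
    return this.next();
  }
}

// Tests get deterministic behaviour without a mocking framework:
const source = RandomSource.createNull(0.999);
console.log(Math.floor(source.nextNumber() * 6) + 1); // always 6
```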
Question #3: who's responsible for making the test doubles?
Or, to put it another way, where does the code that sets up dependencies for tests live? The embedded stub pattern means that all of the code that implements an interface, whether for production or for testing, is in one place, rather than (for instance) each test case mocking an interface and having to correctly simulate how it works.
Putting this code in the same file as the production code means the knowledge of how the interface is supposed to work is in one place, reducing the risk of inconsistency and improving quality through repeated use.
Similarly, higher level interfaces have functions to create nullable instances in the same file as the functions that create the production instances. So, again, the knowledge of how to create a test instance of X is in one place, which is the same place as X itself, rather than scattered across multiple tests.
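For example (again using my hypothetical RandomSource plus a made-up Die class, not code from the article), the nullable factory can sit right next to the production factory:

```typescript
// The low-level wrapper from the previous sketch, repeated so this compiles
// on its own.
class RandomSource {
  static create(): RandomSource {
    return new RandomSource(() => Math.random());
  }
  static createNull(fixed: number = 0.5): RandomSource {
    return new RandomSource(() => fixed);
  }
  private constructor(private readonly next: () => number) {}
  nextNumber(): number {
    return this.next();
  }
}

// die.ts (hypothetical) -- production and nullable factories side by side,
// so the knowledge of how to build a test instance of Die lives with Die.
class Die {
  static create(): Die {
    return new Die(RandomSource.create());
  }

  // Tests call this; they never need to know how Die wires up its dependencies.
  static createNull(fixed: number = 0.5): Die {
    return new Die(RandomSource.createNull(fixed));
  }

  private constructor(private readonly random: RandomSource) {}

  roll(): number {
    return Math.floor(this.random.nextNumber() * 6) + 1;
  }
}

// In a test: Die.createNull(0.999).roll() === 6, with no test-side setup code.
```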
Mixing and matching
Now, I reckon you could pick and choose your answers to these questions. For instance, suppose your default is replacing immediate dependencies in the test case using a mocking framework. You could:
- keep using a mocking framework (different answer to question #2), but
- choose to mock the lowest level of infrastructure (same answer to question #1), and
- put all of the code that sets up the mocks (directly or indirectly) in one place (same answer to question #3), roughly as sketched below.
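That first combination might look something like this sketch, which uses Jest's jest.fn() but keeps the mock-building code in one shared module (the file and function names are hypothetical):

```typescript
import { jest } from "@jest/globals";

// output.ts (hypothetical): the lowest level of infrastructure, wrapped behind
// a small interface, exactly as in the earlier sketch.
export interface Output {
  write(text: string): void;
}

// test-doubles.ts (hypothetical): the single shared place that knows how to
// mock Output, so individual test files just import createMockOutput() rather
// than each re-describing how Output behaves.
export function createMockOutput() {
  return { write: jest.fn() };
}

// In a test file (using the Greeter from the earlier sketch):
//   const output = createMockOutput();
//   new Greeter(output).greet("Ada");
//   expect(output.write).toHaveBeenCalledWith("Hello, Ada!\n");
```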
Or you could:
- throw away the mocking framework and hand-write stubs (same answer to question #2), but
- still replace immediate dependencies (different answer to question #1), and
- write a separate implementation in each test case/suite (different answer to question #3), as in the sketch below.
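A sketch of that second combination, with hypothetical names (EmailSender, WelcomeService) and the stub hand-written inside the test file itself:

```typescript
// Production code: WelcomeService depends directly on an EmailSender interface.
interface EmailSender {
  send(to: string, body: string): Promise<void>;
}

class WelcomeService {
  constructor(private readonly emails: EmailSender) {}
  async welcome(address: string): Promise<void> {
    await this.emails.send(address, "Welcome aboard!");
  }
}

// In welcome-service.test.ts: a stub written by hand, local to this test suite,
// replacing the immediate dependency rather than the lowest-level infrastructure.
class StubEmailSender implements EmailSender {
  readonly sent: Array<{ to: string; body: string }> = [];
  async send(to: string, body: string): Promise<void> {
    this.sent.push({ to, body });
  }
}

async function welcomeSendsExactlyOneEmail(): Promise<void> {
  const emails = new StubEmailSender();
  await new WelcomeService(emails).welcome("ada@example.com");
  console.assert(emails.sent.length === 1, "expected exactly one email");
}
```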
These different combinations come with all sorts of different trade-offs, and some will be more useful than others. Personally, I've gotten a lot of mileage out of:
- making test doubles without a mocking framework, and
- putting the code to set up testable instances of X in the same place as X itself (so the knowledge of how X should work is in one place, and the code to simulate X isn't duplicated), but
- varying exactly at what level dependencies are replaced: sometimes immediate dependencies, sometimes the lowest level of infrastructure, sometimes somewhere in the middle (there's a sketch of this after the list). I often find that "somewhere in the middle" is where the simplest and most stable interface to replace (and therefore the one that leads to less brittle tests with clearer intent) can be found. It's entirely possible that this is an artefact of poor design choices on my part, though!
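As an example of that middle ground (all names here are hypothetical), the tests might replace a small CatalogGateway interface that sits between the raw HTTP calls and the code under test:

```typescript
// A small, stable interface "in the middle": below the code under test,
// above the raw HTTP calls.
interface CatalogGateway {
  priceInPence(sku: string): Promise<number>;
}

// Production implementation: the only place that touches real HTTP.
class HttpCatalogGateway implements CatalogGateway {
  async priceInPence(sku: string): Promise<number> {
    const response = await fetch(`https://catalog.example.com/prices/${sku}`);
    const body = (await response.json()) as { pence: number };
    return body.pence;
  }
}

// Hand-written in-memory implementation used by the tests.
class InMemoryCatalogGateway implements CatalogGateway {
  constructor(private readonly prices: Record<string, number>) {}
  async priceInPence(sku: string): Promise<number> {
    return this.prices[sku] ?? 0;
  }
}

// The code under test only ever sees the stable CatalogGateway interface.
class PriceFormatter {
  constructor(private readonly catalog: CatalogGateway) {}
  async format(sku: string): Promise<string> {
    const pence = await this.catalog.priceInPence(sku);
    return `£${(pence / 100).toFixed(2)}`;
  }
}

// In a test: new PriceFormatter(new InMemoryCatalogGateway({ mug: 499 })).format("mug")
// resolves to "£4.99" without touching HTTP.
```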
Conclusion
These three questions gave me a way to interrogate the approach that James Shore describes, as well as more traditional approaches such as end-to-end testing and testing with mocks. To be clear, I think these three questions are a way to interrogate and explore approaches, not to characterise them entirely.
Each combination of answers will present its own particular challenges that need solving: if you haven't already done so, I strongly encourage you to read James Shore's original article to see how he does so.
We can, to some extent, mix and match the answers of these approaches, allowing us to consider and explore alternatives that match our own preferences and context. Even if an approach isn't the right choice at a given moment, perhaps some aspects of the approach or the underlying thinking can lead us to interesting new thoughts.
Thanks to James Shore for responding to my ramblings when I originally thought about this on Mastodon.