One of my favourite definitions of clean code comes from Michael Feathers:
Clean code always looks like it was written by someone who cares.
For me, clean code means short functions do one thing at a single level of abstraction. As Bob Martin puts it, this results in code that reads like a newspaper article. You can read the body of the top-level function and get a high-level understanding of how it works. Reading the functions that get called might fill in some of the details, but shouldn't be necessary for that high-level understanding. Rather than having one long function, I should break it up into a composition of many small functions.
There are a bunch of rules that I try to follow when I write software. In no particular order:
- No duplication. Sometimes duplication is obvious (e.g. code that's been copied and pasted, maybe tweaked ever-so-slightly), sometimes it takes a while staring at something to realise that it's the same as something else.
- No work in constructors. Constructors should just assign their arguments to fields, or perform very simple, pure operations. If you need to do some work setting up a class, then do it in a separate method, whether in a static method in the same class (I call these named constructors), or another class.
- Use dependency injection. It makes it clear how your code interacts with the rest of the system, and doesn't couple you unnecessarily to specific implementations, which also happens to make testing easier.
- Prefer immutability over mutability.
- Avoid (implementation) inheritance. Specifically, avoid overriding methods. It makes the control flow of the class hard to understand since the jump to another class isn't explicit. It also tends to suggest that the responsibilities of the class are muddled up. For instance, when I see classes with abstract methods, those abstract methods can often be pulled out into a coherent interface that makes sense independently of the abstract class. Use composition instead.
Make the right thing easy, make the wrong thing hard (or ideally impossible).
- Consider wrapping primitives, not because you want to add functionality, but because you want to take it away. For instance, if you're using strings as unique identifiers, it doesn't usually make sense to be able to concatenate them. You also don't want to accidentally pass a different sort of string as an identifier.
- Use named arguments wherever there's the potential for ambiguity, such as arguments of the same type, or the argument list is longer than about two or three.
Don't allow construction of objects in invalid states.
For instance, rather than having a
log_inmethod that allows other methods in that instance to be safely called, have the
log_inreturn a new object so that those methods can't be called until you've logged in.
- Prefer to create a new interface than bodging an existing one. I've made many bad interfaces by trying to make it serve two very similar but not identical purposes. Better to have two separate interfaces that do one job very well: one can still delegate to the other, or both can delegate to the same code. If I'm not sure, I'll start by making a separate interface, and only combine it with another if they really do represent the same concept.
- Related to the previous point: design for use, not reuse. I don't think I've ever seen somebody produce better code by aiming for reusable code or flexibility.
- Prefer pipelines to layers. We often talk about software architecture in terms of layers, but I find that this can make each layer highly dependent on the one below it. Instead, if we talk about pipelines, each piece of code is less concerned about where it gets values from, meaning it's less tightly coupled to the rest of the system.
- The single responsibility principle: things that change together belong together. Objects should be cohesive.
if I need to implement caching for a
UserFetcherinterface, I'll tend to have a
CachingUserFetcherthat delegates to an non-caching
UserFetcherimplementation. The main advantage is that the code does just what's required for caching. Since it's not mixed in with, say, database code, bugs are much easier to spot, and the flow is much to reason about. Similarly with concurrent code. A couple of other possible advantages of splitting out concerns:
- Calls to the interface look the same regardless of whether you use caching or not. (Having said that, if you've added caching, you probably care about performance, so you'll probably need some awareness of how the caching works to be able to exploit it.)
- The cache is agnostic of the implementation it delegates to. (Again, not entirely true since different caching strategies might be appropriate for different underlying implementations, but my experience says you get lucky quite often.)
Try to make it sound like the domain.
connect()is better than
createConnection(). Accurate but verbose beats ambiguous or misleading. Verbose names often suggest that my model isn't quite right, or I don't quite understand what I'm doing yet.
A kata is a coding exercise where the goal is to learn about and practise programming, rather than to complete the task itself. As an example of what I consider to be clean code, I've put up my implementation of a ten pin bowling scorer on GitHub. I'm certainly not claiming it's perfect, but if you go through each commit in turn, hopefully you can spot lots of examples of particular techniques.
The first few commits, up to
incrementally implement the algorithm.
The code mostly works (aside from an unhandled case that I implement later).
I'd like to think it's reasonably readable in this state.
Game class makes fetching the number of pins knocked down by each throw easier
by defaulting the number of pins knocked down to zero.
This means that the user of the
Game can request any throw index,
and not have to worry about walking off the end of the list.
Over the next few commits, I try to make the code as understandable as possible. The result in commit e43fc01 is longer overall, but the process has been split into several smaller functions. Whereas before you had to read and understand most of the code at once, now you can hopefully read and understand each function individually without having to know the details of the lower functions. To me, the code feels less clever and more obvious, which is definitely worth aspiring to.
Each function concentrates on doing just one thing.
For instance, the
_frames function focuses on iterating through each frame.
It doesn't directly contain the logic for classifying frames (strike, spare, or neither),
nor the code for scoring them.
It has the bare minimum of code required to iterate through every frame.
Some functions might seem unnecessary.
_is_strike is a one-liner that's used in one place.
Although we could inline the function,
I feel extracting it makes the intent of the calling function much clearer.