Clean code

One of my favourite definitions of clean code comes from Michael Feathers:

Clean code always looks like it was written by someone who cares.

There are a bunch of rules that I try to follow when I write software. In no particular order:

Short functions do one thing at a single level of abstraction. You can read the body of the top-level function and get a high-level understanding of how it works. Reading the functions that get called might fill in some of the details, but shouldn't be necessary for that high-level understanding. Rather than having one long function, I should break it up into a composition of many small functions.
Avoid duplication. Sometimes duplication is obvious (e.g. code that's been copied and pasted, maybe tweaked ever-so-slightly), sometimes it takes a while staring at something to realise that it's the same as something else. I try to think about conceptual duplication rather than syntactic duplication: if I change this code here, will I necessarily have to change another piece of code? If so, there's conceptual duplication. Sometimes, it can't be removed -- for instance, a serialiser and corresponding deserialiser have conceptual duplication. In those cases, try to put them close together, to make the link between them as obvious as possible.
No work in constructors. Constructors should just assign their arguments to fields, or perform very simple, pure operations. If you need to do some work setting up a class, then do it in a separate method, whether in a static method in the same class (I call these named constructors), or another class.
Use dependency injection. It makes it clear how your code interacts with the rest of the system, and doesn't couple you unnecessarily to specific implementations, which also happens to make testing easier.
Prefer immutability over mutability.
Avoid (implementation) inheritance. Specifically, avoid overriding methods. It makes the control flow of the class hard to understand since the jump to another class isn't explicit. It also tends to suggest that the responsibilities of the class are muddled up. For instance, when I see classes with abstract methods, those abstract methods can often be pulled out into a coherent interface that makes sense independently of the abstract class. Use composition instead.
Make the right thing easy, make the wrong thing hard (or ideally impossible). For instance:
- Consider wrapping primitives, not because you want to add functionality, but because you want to take it away. For instance, if you're using strings as unique identifiers, it doesn't usually make sense to be able to concatenate them. You also don't want to accidentally pass a different sort of string as an identifier.
- Use named arguments wherever there's the potential for ambiguity, such as arguments of the same type, or the argument list is longer than about two or three.
- Don't allow construction of objects in invalid states. For instance, rather than having a log_in method that allows other methods in that instance to be safely called, have the log_in return a new object so that those methods can't be called until you've logged in.
Prefer to create a new interface rather than bodging an existing one. I've made many bad interfaces by trying to make it serve two very similar but not identical purposes. Better to have two separate interfaces that do one job very well: one can still delegate to the other, or both can delegate to some extracted code. If I'm not sure, I'll start by making a separate interface, and only combine it with another if they really do represent the same concept.
Related to the previous point: design for use, not reuse. I don't think I've ever seen somebody produce better code by aiming for reusable code or flexibility.
Prefer pipelines to layers. We often talk about software architecture in terms of layers, but I find that this can make each layer highly dependent on the one below it. Instead, if we talk about pipelines, each piece of code is less concerned about where it gets values from, meaning it's less tightly coupled to the rest of the system.
The single responsibility principle: things that change together belong together. Objects should be cohesive.
Separate concerns. For instance, if I need to implement caching for a UserFetcher interface, I'll tend to have a CachingUserFetcher that delegates to an non-caching UserFetcher implementation. The main advantage is that the code does just what's required for caching. Since it's not mixed in with, say, database code, bugs are much easier to spot, and the flow is much to reason about. Similarly with concurrent code. A couple of other possible advantages of splitting out concerns:
- Calls to the interface look the same regardless of whether you use caching or not. (Having said that, if you've added caching, you probably care about performance, so you'll probably need some awareness of how the caching works to be able to exploit it.)
- The cache is agnostic of the implementation it delegates to. (Again, not entirely true since different caching strategies might be appropriate for different underlying implementations, but my experience says you get lucky quite often.)
At a very coarse level, most code should be one of the following:
- A value object. Just a bunch of fields, which don't require testing. If it has any other methods defined on it, they should be simple, pure functions.
- Pure function, operating on value objects. Can be tested easily using unit tests without mocks or external systems.
- Adapters that translate between domains. A common example would be a data access layer that talks to a SQL database. This should just have the code required to translate between your domain and the external system. Tests should use a running instance (or as close as possible) of the external system. The more you trust the external system to behave correctly, the less you need to test edge cases. Mocks usually aren't helpful here, since the errors are usually in my (incorrect) assumptions about how the external system behaves, assumptions that tests using mocks rather than the real external system will just repeat. If you do have complex logic, consider if it's possible and coherent to pull it out into a separate, pure module that can be tested separately.
- Delegators break up the work, and pass the pieces onto other modules. These have little or no logic, such as branching. Although mocks can be used to test these, I find these sorts of tests to mirror the production code too closely to be helpful in finding bugs or more clearly state intent. Instead, I tend to use integration tests that use real instances of the various modules, and use the simplest possible cases that make sure everything is wired up correctly. Since these have little or no logic, testing of edge cases is usually well-covered by the tests for the modules that are being delegated to.
Names matter. A lot. Try to make it sound like the domain. For instance, connect() is better than createConnection(). Accurate but verbose beats ambiguous or misleading. Verbose names often suggest that my model isn't quite right, or I don't quite understand what I'm doing yet.

Katas

A kata is a coding exercise where the goal is to learn about and practise programming, rather than to complete the task itself. As an example of what I consider to be clean code, I've put up my implementation of a ten pin bowling scorer on GitHub. I'm certainly not claiming it's perfect, but if you go through each commit in turn, hopefully you can spot lots of examples of particular techniques.

The first few commits, up to 1c788ea5f2, incrementally implement the algorithm. The code mostly works (aside from an unhandled case that I implement later). I'd like to think it's reasonably readable in this state. The Game class makes fetching the number of pins knocked down by each throw easier by defaulting the number of pins knocked down to zero. This means that the user of the Game can request any throw index, and not have to worry about walking off the end of the list.

Over the next few commits, I try to make the code as understandable as possible. The result in commit e43fc01 is longer overall, but the process has been split into several smaller functions. Whereas before you had to read and understand most of the code at once, now you can hopefully read and understand each function individually without having to know the details of the lower functions. To me, the code feels less clever and more obvious, which is definitely worth aspiring to.

Each function concentrates on doing just one thing. For instance, the _frames function focuses on iterating through each frame. It doesn't directly contain the logic for classifying frames (strike, spare, or neither), nor the code for scoring them. It has the bare minimum of code required to iterate through every frame.

Some functions might seem unnecessary. For instance, _is_strike is a one-liner that's used in one place. Although we could inline the function, I feel extracting it makes the intent of the calling function much clearer.

Mike's corner of the web.

Clean code

Katas