Archive: Software development
Sunday 1 August 2021 12:55
Way back in 2013, I started mammoth.js,
a library that converts Word documents into HTML.
It's not a large project - roughly 3,000 lines -
nor is it particularly exciting.
I have a strong suspicion, though, that that tiny side project has had more of a positive impact than
the entirety of the decade I've spent working as a software developer.
I wrote the original version on a Friday afternoon at work,
when I realised some of my colleagues were spending hours and hours each week
painstakingly copying text from a Word document into our CMS and formatting it.
I wrote a tool to automate the process,
taking advantage of our consistent in-house styling to map Word styles to the right CSS classes
rather than the soup of HTML that Word itself produces.
It wasn't perfect - my colleagues would normally still have to tweak a few things -
but I'd guess it saved them over 90% of the time they were spending before on a menial and repetitive task.
Since it seemed like this was likely a problem that other people had,
I made an open source implementation on my own time,
first in JavaScript, later with ports to Python and Java.
Since then, I've had messages from people telling me how much time it's saved them:
perhaps the most heartwarming being from someone telling me that the hours they saved each week were being spent with their son instead.
I don't know what the total amount of time saved is,
but I'm almost certain that it's at least hundreds of times more than the time I've spent working on the tool.
Admittedly, I've not really done all that much development on the project in recent years.
The stability of the docx format means that the core functionality continues to work without changes,
and most people use the same, small subset of features,
so adding support for more cases and more features has rapidly diminishing returns.
The nature of the project means that I don't actually need to support all that much of docx:
since it tries to preserve semantic information by converting Word styles to CSS classes,
rather than producing a high fidelity copy in HTML as Word does,
it can happily ignore most of the actual details of Word formatting.
By comparison, having worked as a software developer for over a decade,
the impact of the stuff I actually got paid to do seems underwhelming.
I've tried to pick companies working on domains that seem useful:
developer productivity, treating diseases, education.
While my success in those jobs has been variable -
in some cases, I'm proud of what I accomplished, in others I'm pretty sure my net effect was, at best, zero -
I'd have a tough time saying that the cumulative impact was greater than my little side project.
Sometimes I wonder whether it'd be possible to earn a living off mammoth.
Although there's an option to donate -
I currently get a grand total of £1.15 a week from regular donations -
it's not something I push very hard.
There are specific, more involved use cases that I'll probably never be able to support in my spare time -
equation support, for instance -
so potentially there's money to be made there.
I'm not sure it would make me any happier though.
If I were a solo developer,
I'd probably miss working with other people,
and I'm not sure I really have the temperament to do the work to get enough sales to live off.
Somehow, though, it feels like a missed opportunity.
Working on tools where the benefit is immediately visible is extremely satisfying,
and there are probably plenty of domains where software could still help
without requiring machine learning or a high-growth startup backed by venture capital.
I'm just not sure what the best way is for someone like me to actually do so.
Topics: Software development
Tuesday 26 February 2013 20:11
If you ask a programmer to list symptoms of low code quality, they could
probably produce a long list: deeply nested conditionals and loops,
long methods, overly terse variable names. Most of these code smells
tend to focus on the implementation of the code. They're about internal code
quality.
External code quality instead asks you to consider the programmer that has
to call your code. When trying to judge how easily somebody else can
use your code, you might ask yourself:
- Do the class and method names describe what the caller wants to accomplish?
- How many times must we call into your code to complete a single, discrete task?
- Does your code have minimal dependencies on other parts of your codebase and external
libraries?
As an example, consider this snippet of Java to write an XML document to an OutputStream:
import org.w3c.dom.*;
import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;

private static final void writeDoc(Document document, OutputStream output)
        throws IOException {
    try {
        Transformer transformer =
                TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(
                OutputKeys.DOCTYPE_SYSTEM,
                document.getDoctype().getSystemId()
        );
        transformer.transform(new DOMSource(document), new StreamResult(output));
    } catch (TransformerException e) {
        throw new AssertionError(e); // Can't happen!
    }
}
While there are probably good reasons for all of those methods, and there are cases where
having a high level of control is valuable, this isn't a good API for our user that just
wants to write out their XML document to an output stream.
- Do the class and method names describe what they want to accomplish? We want to write
out our XML document, and instead we're talking about TransformerFactory and
OutputKeys.DOCTYPE_SYSTEM.
- How many times must we call into your code to complete a single, discrete task? Writing
out an XML document seems simple, but we have to create an instance of a transformer factory,
then ask it for a transformer, set the output property (whatever that is), wrap up our
document and output stream, before we can finally use the transformer to write out our
document.
- Does your code have minimal dependencies on other parts of your codebase and external
libraries? The code above actually does quite well here, since that snippet should
work on a normal installation of Java.
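By contrast, a friendlier API would let the caller say what they mean in a single call. Here's a minimal sketch that wraps the same transformer ceremony behind a method named after the task. The XmlWriter class is my own invention, not a standard library class, and I've also guarded against documents without a doctype, which the snippet above doesn't handle:

```java
import org.w3c.dom.Document;
import java.io.IOException;
import java.io.OutputStream;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

// Hypothetical wrapper: the ceremony lives in one place, and the method
// name describes what the caller wants to accomplish.
public final class XmlWriter {
    public static void writeDocument(Document document, OutputStream output)
            throws IOException {
        try {
            Transformer transformer =
                    TransformerFactory.newInstance().newTransformer();
            // Unlike the snippet above, skip the doctype property when the
            // document has no doctype, rather than throwing a NullPointerException.
            if (document.getDoctype() != null) {
                transformer.setOutputProperty(
                        OutputKeys.DOCTYPE_SYSTEM,
                        document.getDoctype().getSystemId());
            }
            transformer.transform(
                    new DOMSource(document), new StreamResult(output));
        } catch (TransformerException e) {
            throw new IOException("Could not write XML document", e);
        }
    }

    private XmlWriter() {
    }
}
```

The caller now writes XmlWriter.writeDocument(document, output); and never has to meet TransformerFactory at all.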
So, why is it valuable to distinguish between internal and external code quality? The effect
of low internal code quality is contained within a small scope (by definition!). I'm certainly not advocating
one-letter names for all local variables, but cleaning up that code is
straightforward compared to improving an API. The effects of
low external code quality tend to pervade your entire system. If you change the signature
of a method, you now have to change every use of that method.
When writing code, we often trade off code quality against speed of execution. Even when writing
good quality code, we're not going to spend weeks refactoring to make it perfect. I'm suggesting
that we should be spending more time worrying about the external quality of our code. Internal
quality is important, but it's not as important.
A good measure of whether a piece of your code has minimal dependencies is to try "libifying" it: turn it
into an independent library. If the code you write frequently depends on large parts of the
entire system, then it probably depends on too much. Once you've split out your code into
a separate library, there's a good chance that external code quality will improve. For starters,
once you've pulled out that code, you're unlikely to accidentally introduce new dependencies that
aren't really required. Beyond that: when you've written a bad API deep within the internals of your large system,
it's easy to ignore. If you've split it out into a library, it's much harder to ignore whether
your library makes it hard or easy to do what it says on the tin.
Decomposing your code into libraries has plenty of advantages, such as code reuse and
being able to test components independently. But I have a hypothesis that aggressively
libifying your code will leave you with a much higher quality of code in the long run.
Topics: Software development, Software design
Sunday 24 February 2013 22:52
Retrospectives are unfortunately named. The name (correctly) suggests looking back
over what has gone before, but I've noticed this leads many people to
run retrospectives after a project has finished. The other part of a retrospective
is looking forward: how can we improve in the future? What can we do differently?
What can we try?
Retrospectives after a completed project can certainly be educational, but the lessons
learnt and things to do in the future tend to be somewhat abstract and vague. Since
the project is over, you can't make immediate changes over the next couple of weeks,
so there's little motivation to come up with concrete actions. Retrospectives are about
improvement, but in this case you're often improving the vague notion of a similar project
in the future.
On the other hand, if you run a retrospective in the middle of a project, you can
try out new ideas quickly, perhaps as soon as you leave the retrospective. These ideas will
hopefully improve your working life within the next couple of weeks, rather than
affecting some vague future project. This gives a strong incentive to come up with useful,
concrete actions. If you're running regular retrospectives, you also have the opportunity to
experiment and iterate on ideas.
Retrospectives shouldn't be held at the end of a project out of a sense of obligation, or
the need to learn something from a failed project. Regular retrospectives in the middle
of a project give the best chance for real improvement.
Topics: Software development
Monday 10 December 2012 11:42
We're often faced with decisions that we'll have to live with for a long time.
What language should we write our application in? What framework should we use?
What will our architecture look like? We spend lots of time and effort
in trying to find the right answer, but we often forget the alternative: instead
of making this big decision, could we make the decision irrelevant?
Suppose you need to pick a language to build your system in. This is tricky
since it often takes months or even years to discover all the annoyances and
issues of a language, by which point rewriting the entire system in another
language is impractical. An alternative is to split your system up into components,
and make communication between components language-agnostic, for instance
by only allowing communication over HTTP.
Then, the choice of language affects only a single component, rather than
the entire system. You could change the language each component is written in
one-by-one, or leave older components that don't need much development in their
original language. Regardless, picking the “wrong” language no longer has such
long-lasting effects.
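As a sketch of what language-agnostic communication could look like, here's a component exposing its functionality over HTTP using the JDK's built-in com.sun.net.httpserver. The GreetingComponent name and /greet path are invented for illustration; the point is that callers only see HTTP, so the component could later be rewritten in any language without the rest of the system noticing.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical component: exposes one operation over HTTP so that the
// rest of the system depends on the protocol, not on the language.
public final class GreetingComponent {
    public static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/greet", exchange -> {
            byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    private GreetingComponent() {
    }
}
```

Passing port 0 asks the OS for a free port; any client that speaks HTTP, whatever language it's written in, can then call /greet.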
This flexibility in language isn't without cost though – now you potentially
have to know multiple languages to work on a system, rather than just one.
What if there's a component written in a language that nobody on the team understands
anymore? There's also the overhead of using HTTP. Not only is an HTTP request
slower than an ordinary function call, it makes the call-site more
complicated.
Making any big decision irrelevant has a cost associated with it, but confidently making
the “right” decision upfront is often impossible. For any big decision, it's
worth considering: what's the cost of making the wrong decision versus the cost
of making the decision irrelevant?
Topics: Software development
Monday 10 December 2012 10:41
As programmers, we spend quite a lot of effort in pursuit of some notion of
modularity. We hope that this allows us to solve problems more easily by
splitting them up, as well as then letting us reuse parts of the code in other
applications. Plenty of attempts have been made to get closer to this ideal,
object-orientation perhaps being the most obvious example, yet one of the most
successful approaches to modularity is almost accidental: the web.
Modularity makes our code easier to reason about by allowing us to take our
large problem, split it into small parts, and solve those small parts without
having to worry about the whole. Programming languages give us plenty of ways to
do this, functions and classes among them. So far, so good. But modularity has
some other benefits that we’d like to be able to take advantage of. If I’ve
written an independent module, say to send out e-mails to my customers, I’d
like to be able to reuse that module in another application. And by creating
DLLs or JARs or your platform’s package container of choice, you can do just
that – provided your new application is on the same platform. Want to use a Java
library from C#? Well, good luck – it might be possible, but it’s not going to
be smooth sailing.
What’s more, just because the library exists, it doesn’t mean using it is going
to be a pleasant experience. If nobody can understand the interface to your code,
nobody’s going to use it. Let’s say we want to write out an XML document to an
output stream in Java. You’d imagine this would be a simple one-liner.
You’d be wrong:
import org.w3c.dom.*;
import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;

private static final void writeDoc(Document doc, OutputStream out)
        throws IOException {
    try {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(
                OutputKeys.DOCTYPE_SYSTEM, doc.getDoctype().getSystemId());
        t.transform(new DOMSource(doc), new StreamResult(out));
    } catch (TransformerException e) {
        throw new AssertionError(e); // Can't happen!
    }
}
The result is that most of the code we write is just a variation on a theme.
Odds are, somebody else has written the code before. Despite our best efforts,
we’ve fallen a little short.
However, the web brings us a little closer to the ideal. If I want to send
e-mails to my customers, I could write my own e-mail sending library. More
likely, I’d use an existing one for my language. But even then, I probably
wouldn’t have some niceties like A/B testing or DKIM signing. Instead, I could
just fire some HTTP at MailChimp, and get a whole slew of features without
getting anywhere near the code that implements them.
The web is inherently language agnostic. So long as your language can send
and receive text over HTTP, and probably parse some JSON, you’re about as
well-equipped as everybody else. Instead of building libraries for a specific
language, you can build a service that can be used from virtually every language.
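For illustration, here's roughly what "firing some HTTP" at an email service looks like with Java's java.net.http client (Java 11+). The api.example.com endpoint and JSON fields are placeholders of my own, not any real provider's API:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds a request to a hypothetical email service: the placeholder URL
// and JSON fields stand in for whichever provider you actually use.
public final class EmailServiceClient {
    public static HttpRequest sendEmailRequest(String to, String subject) {
        String json = "{\"to\": \"" + to + "\", \"subject\": \"" + subject + "\"}";
        return HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/v1/emails"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    private EmailServiceClient() {
    }
}
```

HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) would dispatch it, and everything behind that URL - queuing, A/B testing, DKIM signing - stays on the service's side of the wire.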
The text-based nature of HTTP also helps to limit the complexity of the API.
As SOAP will attest, you can still make a horrible mess using HTTP, but that
horrible mess is plain to see. Complex data structures are tedious to marshal
to and from text, providing a strong incentive to keep things simple. Spotting
the complexities in a class hierarchy is often not as easy.
HTTP doesn’t solve every problem – using it inside an inner loop that’s
executed thousands of times per second probably isn’t such a good idea. What’s
more, this approach might introduce some new problems. For instance, if we’re
combining existing applications using HTTP for communication, we often need to
add a thin shim to each application. You might need to write a small plugin
in PHP, say, to integrate WordPress into your system. Now,
instead of a system written in one language, you’ve got to maintain a system with
several distinct languages and platforms.
Even then, we should strive to avoid reimplementing the same old thing. As
programmers, we consistently underestimate the cost of building a system, not to
mention ongoing maintenance. By integrating existing
applications, even if they’re in an unfamiliar languages, we save ourselves
those development and maintenance costs, as well as being able to pick the best
solution for our problem. Thanks to the web, HTTP is often the easiest way to
get there.
In case you recognised the topic, an edited version of this post was used as the
Simple-Talk editorial
a few months ago.
Topics: Software development, Software design
Monday 20 August 2012 19:47
The problem with a smooth development process is that every day is pretty
much the same as the last. You might be writing great code and solving
interesting problems with other passionate people, but constantly
working on the same thing can begin to feel dull or even frustrating. By
having a silky-smooth development process with reliable code and regular releases,
you've removed those natural peaks and troughs, like the high of fixing another
critical bug in production before you head home and crash.
I think it was Steve Freeman
who once mentioned that sometimes it's valuable to put some of those peaks
and troughs back in, but preferably without putting critical bugs back in.
For instance, I like the idea of spending one day a week
working on unprioritised work. It might be that the developers are keen to try
out a new rendering architecture that'll halve page load times, or that
there's a piece of code that can be turned into a separate library that'll
be useful on other projects. Maybe there's a little visual bug that's never
going to be deemed important enough to be prioritised, but a developer
takes enough pride in their work to spend half an hour fixing it. This feels like
a peak to me: there's a lot of value to the product in polishing
the user experience, in refactoring the code, and trying out risky ideas, and
the developers get to scratch some of their own itches.
However, its regularity can make it feel routine, and you're still working
on the same product. As useful as these small, regular peaks and troughs are, I think
you also need the occasional Everest. Maybe it's saying “This week, I'm going
to try something I've never tried before that's completely unrelated to the
project”. Or perhaps you need a Grand Canyon: “Today, we're just going to
concentrate on being better programmers by doing a code retreat”.
Finding something that works is hard, and you can't even reuse the same idea
too much without risking its value as an artificial peak or
trough. But I think it's important to keep trying. You don't just want a project
and its team to be alive: you need them to be invigorated.
Topics: Software development
Tuesday 27 September 2011 20:14
I heartily endorse this fine article on writing maintainable code. What do you mean I'm biased because I wrote it?
Topics: Software design, Software development, Testing
Sunday 21 February 2010 20:57
Improving performance is often a desirable goal. Sometimes you'll have a precise number for just how much performance needs to be improved by, particularly in real-time systems. More often, though, the request for improved performance is far more vague. So, what sort of numbers should we aim for when we want things to go faster? This depends on why you want faster performance -- do you just want to save a bit of time, or do you really want things to change?
Take, for instance, the time it takes to run your entire test suite. This can vary wildly, depending on the application, from seconds to days. Let's say we're working on a small project, and we have a test suite that covers the entire application in one minute. This is fast enough that we can run the entire suite every time before we commit, but we won't be running it every time we make a small change. If we made it go, say, twice as fast, we'd definitely save ourselves some time -- thirty seconds for every commit, if we really do run all the tests before every commit. This is still too slow to be running each time we make a small change, but what if we sped up the suite by an order of magnitude instead, so it takes only a few seconds to run? Now, running the entire suite every minute or two is practical, rather than just before every commit.
Sometimes, we really can get performance improvements of an order of magnitude, by improvements in technology or a clever new algorithm. Otherwise, we might still be able to do what all programmers do -- cheat. If we can get most of the benefit in a much shorter time, then this is often good enough. Going back to our test suite, if we can identify some subset of the tests that run in 10% of the time with 90% of the coverage, then most bugs we might introduce are still picked up, while our tool, the test suite, becomes more flexible.
By improving performance not by small amounts, but by orders of magnitude, we can change the way we use and think about our tools. Performance really does matter.
Topics: Software development