Mike's corner of the web.

The particular awkwardness of testing predicates

Sunday 15 January 2023 14:44

Predicates are awkward to test.

Or, to be more precise, predicates are awkward to test such that the test will reliably fail if the behaviour under test stops working.

To see why, let's look at an example: a permission check. Suppose I'm writing a system where only admins should have permission to publish articles. I might write the following predicate:

function hasPermissionArticlePublish(user: User): boolean {
    return user.isAdmin;
}

test("admins have permission to publish articles", () => {
    const user = {isAdmin: true};

    const result = hasPermissionArticlePublish(user);

    assert.isTrue(result);
});

test("users that aren't admins don't have permission to publish articles", () => {
    const user = {isAdmin: false};

    const result = hasPermissionArticlePublish(user);

    assert.isFalse(result);
});

I then realise that only active admins should have permission to publish articles, so I update the function (eliding the additional tests for brevity):

function hasPermissionArticlePublish(user: User): boolean {
    return user.isActive && user.isAdmin;
}

Since I'm using TypeScript, I'll need to update any existing tests to include the new field to keep the compiler happy:

test("users that aren't admins don't have permission to publish articles", () => {
    const user = {isActive: false, isAdmin: false};

    const result = hasPermissionArticlePublish(user);

    assert.isFalse(result);
});

Whoops! Although the test still passes, it's actually broken: since isActive is false, it'll pass regardless of the value of isAdmin. How can we prevent this sort of mistake?

Solution #1: Consistent construction of test data

The idea here is that we have a set of inputs that we know will make our unit under test behave one way, and then we make some minimal change to that set of inputs to make the unit under test behave another way.

In our example, we'd have a test for a user that does have permission:

test("active admins have permission to publish articles", () => {
    const user = {isActive: true, isAdmin: true};

    const result = hasPermissionArticlePublish(user);

    assert.isTrue(result);
});

We can extract the user from this test into a constant, and then make a minimal change to the user to cause the permission check to fail:

const userWithPermission = {isActive: true, isAdmin: true};

test("active admins have permission to publish articles", () => {
    const result = hasPermissionArticlePublish(userWithPermission);

    assert.isTrue(result);
});

test("users that aren't admins don't have permission to publish articles", () => {
    const user = {...userWithPermission, isAdmin: false};

    const result = hasPermissionArticlePublish(user);

    assert.isFalse(result);
});

This is a fairly unintrusive solution: the changes we needed to make were small. The downside is that we've effectively coupled two of our tests together: our test that non-admins can't publish articles relies on another test to check that userWithPermission can indeed publish an article. As the predicate and the data become more complicated, maintaining the tests and the relationships between them becomes more awkward.

Solution #2: Testing the counter-case

To break the coupling between test cases that our first solution introduced, we can instead test both of the cases we care about in a single test:

test("users that aren't admins don't have permission to publish articles", () => {
    const admin = {isActive: true, isAdmin: true};
    const nonAdmin = {...admin, isAdmin: false};

    const adminResult = hasPermissionArticlePublish(admin);
    const nonAdminResult = hasPermissionArticlePublish(nonAdmin);

    assert.isTrue(adminResult);
    assert.isFalse(nonAdminResult);
});

If we made the same mistake as before by setting isActive to false, then our first assertion would fail. Much like our first solution, we can now be more confident that we are indeed testing how the function under test behaves as we vary the isAdmin property, except that our confidence in this test no longer relies on another test.

Both this approach and the previous approach work less well when the predicate itself is more complex. When the predicate is checking that a set of conditions are all true, it's easy enough to take a user that satisfies all conditions and change one property so that a condition is no longer satisfied. When there are more complicated interactions between the inputs, this approach becomes trickier to use.

Solution #3: Returning more information from the function under test

A fundamental issue here is that there are many reasons why the permission check might fail, but we only get out a true or a false. In other words, the return value of the function doesn't give us enough information.

So, another solution would be to extract a function that returns the information we want, which can then be used both by the original permission check function and our tests.

function hasPermissionArticlePublish(user: User): boolean {
    return checkPermissionArticlePublish(user) === "SUCCESS";
}

type PermissionCheckResult =
    | "SUCCESS"
    | "FAILURE_USER_NOT_ACTIVE"
    | "FAILURE_USER_NOT_ADMIN";

function checkPermissionArticlePublish(user: User): PermissionCheckResult {
    if (!user.isActive) {
        return "FAILURE_USER_NOT_ACTIVE";
    } else if (!user.isAdmin) {
        return "FAILURE_USER_NOT_ADMIN";
    } else {
        return "SUCCESS";
    }
}

test("users that aren't admins don't have permission to publish articles", () => {
    const user = {isActive: true, isAdmin: false};

    const result = checkPermissionArticlePublish(user);

    assert.isEqual(result, "FAILURE_USER_NOT_ADMIN");
});

This requires changing the code under test, but it allows us to get the answer we want (why did the permission check fail?) directly, rather than having to infer it in the tests. You'd probably also want to have a couple of test cases against hasPermissionArticlePublish directly, just to check it's using the result of checkPermissionArticlePublish correctly.
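
For instance, a sketch of a couple of such tests, reusing the same helpers as the earlier examples:

test("permission is granted when the permission check succeeds", () => {
    const user = {isActive: true, isAdmin: true};

    const result = hasPermissionArticlePublish(user);

    assert.isTrue(result);
});

test("permission is denied when the permission check fails", () => {
    const user = {isActive: true, isAdmin: false};

    const result = hasPermissionArticlePublish(user);

    assert.isFalse(result);
});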

In this simple case, the extra code might not seem worth it, but being able to extract this sort of information can be increasingly useful as the condition becomes more complex. It's also a case where we might be willing to make our production code a little more complex in exchange for simplifying and having more confidence in our tests.

Conclusion

I've used all of these techniques successfully in the past, and often switched between them as the problem being solved changes.

There are certainly other solutions -- for instance, property-based testing -- but hopefully the ones I've described give some food for thought if you find yourself faced with a similar challenge.

It's also worth noting that this problem isn't really specific to predicates: it happens any time that the function under test returns less information than you would like to assert on in your test.

Interestingly, despite having faced this very problem many times, I haven't really seen anybody write about these specific techniques or the problem in general. It's probably just the case that I've been looking in the wrong places and my search skills are poor: pointers to other writing on the topic would be much appreciated!

Topics: Testing

Reflections on "Testing Without Mocks"

Saturday 7 January 2023 18:30

James Shore has written a new draft of his "Testing Without Mocks: A Pattern Language" article which I thoroughly recommend reading, or at least the accompanying Mastodon thread.

To help me understand how this approach differs from a typical "testing with mocks" approach, I found it useful to think about three key questions:

  1. Which parts of the system -- whether that be code, data, infrastructure or something else -- do you replace with test doubles?

  2. How do you make those test doubles?

  3. Who's responsible for making the test doubles?

By investigating the different answers to these questions, we can then see how we might mix and match those answers to explore other possible approaches. Let's dig in!

Question #1: which parts of the system do you replace with test doubles?

The typical approach with mocks is to replace an object's immediate dependencies with mocks. At the other end of the spectrum, if you're running end-to-end tests, you might use production code with non-production config, say to point the system at a temporary (but real) database.

Nullables says that we should run as much of our own code as possible, but avoid using real infrastructure. So, we replace those dependencies on infrastructure at the lowest level, for instance by replacing stdout.write().

If you're mocking immediate dependencies, then you often end up mocking an awful lot of your interfaces at one point or another. I suspect an advantage of replacing dependencies at the lowest level is a much smaller, and therefore more manageable, set of interfaces that you have to replace.

Question #2: how do we make the test doubles?

Rather than using a mocking framework, or making the real thing with different config, you replace the lowest level of infrastructure with embedded stubs. One way I think of embedded stubs is as in-memory implementations that do just enough to let the code run and the tests pass: they don't need to be full implementations. For instance, instead of calling random(), just return a constant value.

We then make the other objects in our system nullable: that is, we provide a way to instantiate them with dependencies that are themselves either nullable or embedded stubs.

(This glosses over a lot of important detail in the original article, such as configurable responses and output tracking, which I'm leaving out for brevity.)
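
As a very rough sketch of the shape this takes (a hypothetical Stdout wrapper in TypeScript; the original article spells out the real pattern in far more detail):

class Stdout {
    // The production instance writes to the real stdout.
    static create(): Stdout {
        return new Stdout((text) => process.stdout.write(text));
    }

    // The nulled instance uses an embedded stub: just enough behaviour to
    // let the code run, without touching real infrastructure.
    static createNull(): Stdout {
        return new Stdout(() => {});
    }

    private constructor(private readonly writeText: (text: string) => void) {}

    write(text: string): void {
        this.writeText(text);
    }
}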

Question #3: who's responsible for making the test doubles?

Or, to put it another way, where does the code that sets up dependencies for tests live? The embedded stub pattern means that all of the code that implements an interface, whether for production or for testing, is in one place, rather than (for instance) each test case mocking an interface and having to correctly simulate how it works.

Putting this code in the same file as the production code means that the knowledge of how the interface is supposed to work is in one place, reducing the risk of inconsistency and improving quality through repeated use.

Similarly, higher level interfaces have functions to create nullable instances in the same file as the functions that create the production instances. So, again, the knowledge of how to create a test instance of X is in one place, which is the same place as X itself, rather than scattered across multiple tests.

Mixing and matching

Now, I reckon you could pick and choose your answers to these questions. For instance, suppose your default is replacing immediate dependencies in the test case using a mocking framework. You could:

  • keep using a mocking framework (different answer to question #2), but
  • choose to mock the lowest level of infrastructure (same answer to question #1), and
  • put all of the code that sets up the mocks (directly or indirectly) in one place (same answer to question #3).

Or you could:

  • throw away the mocking framework and hand-write stubs (same answer to question #2), but
  • still replace immediate dependencies (different answer to question #1), and
  • write a separate implementation in each test case/suite (different answer to question #3).

These different combinations come with all sorts of different trade-offs, and some will be more useful than others. Personally, I've gotten a lot of mileage out of:

  • making test doubles without a mocking framework, and
  • putting the code to set up testable instances of X in the same place as X itself (so the knowledge of how X should work is in one place, and the code to simulate X isn't duplicated), but
  • varying exactly at what level dependencies are replaced: sometimes immediate dependencies, sometimes the lowest level of infrastructure, sometimes somewhere in the middle. I often find that "somewhere in the middle" is where the simplest and most stable interface to replace (and therefore the one that leads to less brittle tests with clearer intent) can be found. It's entirely possible that this is an artefact of poor design choices on my part though!

Conclusion

These three questions gave me a way to interrogate the approach that James Shore describes, as well as more traditional approaches such as end-to-end testing and testing with mocks. To be clear, I think these three questions are a way to interrogate and explore approaches, not to characterise them entirely.

Each combination of answers will present its own particular challenges that need solving: if you haven't already done so, I strongly encourage you to read James Shore's original article to see how he does so.

We can, to some extent, mix and match the answers of these approaches, allowing us to consider and explore alternatives that match our own preferences and context. Even if an approach isn't the right choice at a given moment, perhaps some aspects of the approach or the underlying thinking can lead us to interesting new thoughts.

Thanks to James Shore for responding to my ramblings when I originally thought about this on Mastodon.

Topics: Mocking, Testing

What's the value of company values?

Monday 3 January 2022 11:00

Integrity, communication, respect, and excellence. These were the values of a company that, in 2001, Fortune magazine named "America's Most Innovative Company" for the sixth time in a row. In that same year, amid the discovery of massive and sustained accountancy fraud, their share price collapsed. In December, they filed for bankruptcy. That company was Enron.

The story of Enron demonstrates it's not enough to write down some pleasant-sounding values and put them on your website. The values of an organisation are reflected by the everyday actions of everyone from the CEO to the interns.

At a previous company, I was heavily involved in the process of trying to set company values. We did better than Enron -- to the best of my knowledge, nobody committed fraud -- but I'm also not sure that we used them as effectively as we'd hoped. In any case, I thought it might be useful to write down some things that I would have found helpful at the start of the process, both from that experience and from trying to influence culture more generally throughout my career.

The usual caveats of this just being my extremely biased opinion based on my limited experience apply.

What do you mean by values?

When people talk about company values, they normally mean one of two things:

  1. Principles that are intrinsically good. We should stick to our principles, even if it hurts our profits.

  2. Company-wide practices or mindsets that help us accomplish the goals of the company.

Neither one of these is necessarily better than the other, but they are different. My suggestion would be to keep the two lists separate since you'll want to treat them differently.

For instance, consider the value "Never make a sale that doesn't benefit the customer". What's the rationale for this value?

If it's a principle, then the point is that you'd rather go out of business than make money getting people to buy something that doesn't actually help them.

If it's the second type of value, then the rationale might be that such customers are unlikely to be happy in the long-term, and unhappy customers will cause reputational damage.

Now suppose you could find a way to persuade customers that there was benefit even when there isn't, or you discover that there's little risk of unhappy customers complaining prominently enough for any reputation damage to affect profits.

How you react to this new information depends on what type of value you have. In the first case, the new information is irrelevant. It changes nothing about the fundamental principle. In the second case, though, there's a logical argument to be made that the company's approach should be changed.

Being unambiguous about what type of values you're setting will help both in writing them, and in making decisions based on those values.

How specific should the values be?

Good values should guide decision-making and behaviour in an organisation. Most companies' values are so vague as to be useless. For instance, if you have a company value of "communication", what's the situation where the value of "communication" prompts someone to behave differently? Don't all companies value communication, or at least claim to?

There are a few potential solutions to this.

The first is to give examples and tell unexpected stories. Nordstrom, the American retailer, explains their culture by telling the story of the Nordie who ironed a shirt for a customer who had a meeting in the afternoon, the Nordie who gift-wrapped presents bought at another retailer, the Nordie who refunded tyre chains -- even though Nordstrom don't sell tyre chains.

Nordstrom could have just said "We value customer service", but what retailer wouldn't say that? The concreteness and unexpectedness of the stories act as a jolt. If you heard them on your first day, they'd be much more likely to stick in your mind and actually affect how you do your job.

The second potential solution is to make the values themselves more specific. For instance, I gave an example above of "Never make a sale that doesn't benefit the customer". That's a value that's tied to how to make a specific decision: before somebody makes a sale, they should be asking themselves this question.

You won't be able to enumerate all of the values needed to cover every specific decision in the company, but that's okay: values are there to guide, not to direct every last decision. So long as the values are encouraging the behaviour you want to see, then they're doing their job. When someone needs to make a decision, there may not be a directly applicable value, but values can still set the tone, or have second-order effects that influence the decision.

The third potential solution is to describe the tensions between values, and how to resolve them. The Agile Manifesto is an example of this:

We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:

  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan

That is, while there is value in the items on the right, we value the items on the left more.

In other words, you can describe what you value and what you're willing to give up to get it.

How many?

If I had to pluck a number out of the air, I'd say something like three to five. From experience, that gives a nice balance between being memorable and capturing the important aspects of the culture. But I also would happily have fewer or more if I thought doing so was better in a given context.

Other things being equal, fewer values is better. A shorter list is easier to remember, and more likely to be used. On the other hand, being explicit about culture is often useful, especially for new employees.

Values don't need to be comprehensive, but there's a tension here: values also act as a codification of the culture you want to have or retain. Do you need to codify a value if everybody in the company already follows it? Including the value might not change current behaviour, but it might preserve it.

If there's some component of your company culture that you consider vital, it should be reflected in the values. Equally, if there's something in your values that you don't think will actually influence the culture in a positive way, get rid of it.

How are you actually going to use the values?

One of the biggest risks is that you go through a process of setting values for the company, and then they live in some document on Google Drive that only gets looked at once a quarter.

For values to be useful, people need to remember them, and they need to use them. The goal is to actually live the values: having a written list of values is just a tool.

For making ideas memorable, I recommend reading Made to Stick by Chip and Dan Heath (from which I stole the Nordstrom stories earlier -- concrete, unexpected stories are a good place to start).

As for using them: be explicit about when you expect or intend to use the values. Are they something you bring up in all one-to-ones across the company? When planning? In retrospectives?

Speaking of retrospectives: the philosophy of incremental improvement is a familiar one in the context of writing code or creating products, but perhaps less so for company culture. Yet those same forces of risk and uncertainty are at play. So, don't feel like you have to get your values right on the first go. Evaluate them as you go along. Do they still feel right? Are they encouraging the right behaviour? Are they actually being used? Is there anything you want to add, change or remove? Don't let the written values drift apart from reality.

Is writing down company values worth it?

Maybe. I think writing down company values and sticking them in a drawer never to be used, with ever-increasing dissonance between written and lived company values, is worse than not bothering with them in the first place.

However, your organisation will develop its own values and its own culture over time, regardless of what you do. Being explicit about those values, and trying to gently steer them in a positive direction seems worth a little effort.


My tiny side project has had more impact than my decade in the software industry

Sunday 1 August 2021 12:55

Way back in 2013, I started mammoth.js, a library that converts Word documents into HTML. It's not a large project - roughly 3,000 lines - nor is it particularly exciting.

I have a strong suspicion, though, that that tiny side project has had more of a positive impact than the entirety of the decade I've spent working as a software developer.

I wrote the original version on a Friday afternoon at work, when I realised some of my colleagues were spending hours and hours each week painstakingly copying text from a Word document into our CMS and formatting it. I wrote a tool to automate the process, taking advantage of our consistent in-house styling to map Word styles to the right CSS classes rather than producing the soup of HTML that Word would produce. It wasn't perfect - my colleagues would normally still have to tweak a few things - but I'd guess it saved them over 90% of the time they were spending before on a menial and repetitive task.

Since it seemed like this was likely a problem that other people had, I made an open source implementation on my own time, first in JavaScript, later with ports to Python and Java. Since then, I've had messages from people telling me how much time it's saved them: perhaps the most heartwarming being from someone telling me that the hours they saved each week were being spent with their son instead.

I don't know what the total amount of time saved is, but I'm almost certain that it's at least hundreds of times more than the time I've spent working on the tool.

Admittedly, I've not really done all that much development on the project in recent years. The stability of the docx format means that the core functionality continues to work without changes, and most people use the same, small subset of features, so adding support for more cases and more features has rapidly diminishing returns. The nature of the project means that I don't actually need to support all that much of docx: since it tries to preserve semantic information by converting Word styles to CSS classes, rather than producing a high fidelity copy in HTML as Word does, it can happily ignore most of the actual details of Word formatting.

By comparison, having worked as a software developer for over a decade, the impact of the stuff I actually got paid to do seems underwhelming.

I've tried to pick companies working on domains that seem useful: developer productivity, treating diseases, education. While my success in those jobs has been variable - in some cases, I'm proud of what I accomplished, in others I'm pretty sure my net effect was, at best, zero - I'd have a tough time saying that the cumulative impact was greater than my little side project.

Sometimes I wonder whether it'd be possible to earn a living off mammoth. Although there's an option to donate - I currently get a grand total of £1.15 a week from regular donations - it's not something I push very hard. There are more involved use cases that I'll probably never be able to support in my spare time - for instance, support for equations - so potentially there's money to be made there.

I'm not sure it would make me any happier though. If I were a solo developer, I'd probably miss working with other people, and I'm not sure I really have the temperament to do the work to get enough sales to live off.

Somehow, though, it feels like a missed opportunity. Working on tools where the benefit is immediately visible is extremely satisfying, and there's probably plenty of domains where software could still help without requiring machine learning or a high-growth startup backed by venture capital. I'm just not sure what the best way is for someone like me to actually do so.

Topics: Software development

Adventures in WebAssembly object files and linking

Tuesday 20 April 2021 20:10

I've been tinkering with a toy compiler, and recently added support for compiling to WebAssembly (aka Wasm). At first, I compiled to the text format (aka .wat), and also wrote the runtime using the text format. I didn't want to write the entire runtime directly in Wasm though, so I looked into how to write the runtime in C instead. I managed to get things working after a few days of tinkering and some helpful advice, but since WebAssembly is comparatively new, guides were a bit thin on the ground. This is my attempt, then, to capture what I did in case it's of use to anyone.

My compiler originally worked by outputting everything -- code compiled from my language and the runtime alike -- into a single .wat file, and then compiling that into a WebAssembly module. Since I wanted to use different compilers for different parts of the final code -- my compiler for my language, Clang for the runtime written in C -- outputting everything into one enormous .wat file wasn't possible any more. Instead, Clang can produce a WebAssembly object file, which meant I needed to change my compiler to similarly produce WebAssembly object files, and then link them together using wasm-ld. In other words, at a high level, I wanted to be able to do something like this:

clang --target=wasm32 -c runtime.c -o runtime.o
my-compiler program.src -o program.o
wasm-ld runtime.o program.o -o program.wasm

WebAssembly object files are structured as WebAssembly modules with custom sections in. Since the text format doesn't support custom sections, the first step was to change my compiler to output in the binary format instead. Fortunately, having previously read the WebAssembly spec to understand the structure of WebAssembly and the text format, this was reasonably straightforward. The only bit I found particularly fiddly was dealing with LEB128 encoding.
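
For the curious, unsigned LEB128 encodes an integer seven bits per byte, with the high bit of each byte marking whether more bytes follow. Here's a sketch in TypeScript (not the compiler's own code) of both the standard encoding and the maximally padded variant that comes up in the list below:

function encodeUnsignedLeb128(value: number): number[] {
    const bytes: number[] = [];
    do {
        let byte = value & 0x7f;
        value >>>= 7;
        if (value !== 0) {
            byte |= 0x80;  // more bytes follow
        }
        bytes.push(byte);
    } while (value !== 0);
    return bytes;
}

// Maximally padded variant for relocatable values: always five bytes, so
// the linker can overwrite the value in place without shifting other bytes.
function encodePaddedLeb128(value: number): number[] {
    const bytes: number[] = [];
    for (let i = 0; i < 4; i++) {
        bytes.push((value & 0x7f) | 0x80);
        value >>>= 7;
    }
    bytes.push(value & 0x7f);
    return bytes;
}

// encodePaddedLeb128(3) === [0x83, 0x80, 0x80, 0x80, 0x00]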

The structure of object files isn't part of the WebAssembly spec itself, and so is documented separately. I'll leave the details to the docs, and just mention at a high level the changes I had to make compared to directly producing a Wasm module:

  • Instead of defining memory in the module and exporting it, import __linear_memory from the env module.

  • Instead of defining a table in the module, import __indirect_function_table from the env module.

  • Instead of having an immutable global containing a memory address, have a mutable global with the memory address being written as part of the initialisation function. This was necessary since the memory address needs to be relocatable, but relocations aren't valid in the global section.

  • Whenever emitting relocatable data -- in my case, this was type indices, function indices, global indices, table entry indices and memory addresses in the code section -- write the data as maximally padded LEB128 encoded integers so that updated values can be written without affecting the position of other bytes. For instance, the function index 3 should be written as 0x83 0x80 0x80 0x80 0x00.

  • Add a linking section i.e. a custom section with the name "linking".

  • Make sure all of the data that need to be contiguous are in the same data segment. For instance, my compiler compiles strings to a {length, UTF-8 data} struct. Previously, my compiler was generating this as two data segments, one for the length, and one for the UTF-8 data. Since the linker can, in principle (I think!), rearrange data segments, I changed this to be represented by a single data segment.

  • Write all of the data segments into a WASM_SEGMENT_INFO subsection in the linking section. Since each segment requires a name, I just gave each segment the name "DATA_SEGMENT_${dataSegmentIndex}".

  • Add a symbol table to the linking section. For my compiler, the symbols I needed to emit were:

    • Functions (both imported and defined)
    • Globals
    • Memory addresses (for my compiler, these have a 1:1 mapping with the data segments, so I just generated the name "DATA_${dataSegmentIndex}")
  • Instead of emitting a start section that points to the initialisation function, emit a WASM_INIT_FUNCS subsection with a single entry that points to the initialisation function in the linking section. Since the function is referenced by symbol index, not the function index, this section needs to go after the symbol table.

  • Add a relocation custom section for the code section. Anywhere in the code section that references relocatable entities should have a relocation entry. Note that the indices are for symbol entries, not the indices that are used by instructions (such as function indices). For my compiler, I emit relocation entries for:

    • Function indices (arguments to call instructions)
    • Global indices (arguments to global.set and global.get instructions)
    • Type indices (arguments to call_indirect instructions)
    • Memory addresses (arguments to i32.const instructions to produce values eventually used by load and store instructions)
    • Table entry indices (arguments to i32.const instructions to produce values eventually used by call_indirect instructions). Note that the index in the relocation entry itself should reference the index of function symbol, not a table entry index.

I didn't have any relocatable data in my data section, so I didn't need to worry about that.

With those changes, I was able to make an object file that could be combined with the object file from Clang to make the final Wasm module.

One thing worth noting is that a __wasm_call_ctors function is synthesised in the final Wasm module, which calls all of the initialisation functions. LLVM 12 has a change to wasm-ld which means that any entry point (considered to be any exported function) has a call to __wasm_call_ctors automatically inserted if there isn't an existing explicit call to __wasm_call_ctors. In other words, if you're using LLVM 12, you probably don't need to worry about calling __wasm_call_ctors yourself.

One last thought: I struggled to work out what the right place to ask for advice was, such as a mailing list or forum. I stumbled across the WebAssembly Discord server after I'd already found answers and gotten some helpful advice on a GitHub issue, but it seems pretty active, so that might be a good starting place if you have questions or get stuck. If there's anywhere else with an active community, I'd love to hear about it!

Topics: WebAssembly

GraphQL composition can radically simplify data query maintenance

Saturday 27 March 2021 12:00

Using GraphQL has plenty of well-documented benefits, avoiding under- and over-fetching, strong typing, and tooling support among them. However, I think the value of the composability of GraphQL tends to be significantly underplayed. This composability allows the data requirements for a component to be specified alongside the component itself, making the code easier to understand, and radically easier and safer to change.

Let's take an example. Suppose I'm building a music application in React, and I have a page showing a playlist of tracks. I might write a PlaylistPage component, which in turn uses a PlaylistTracks component to render the tracks in a playlist:

// PlaylistPage.js

const playlistQuery = graphql`
    query PlaylistQuery($playlistId: ID!) {
        playlist(id: $playlistId) {
            name
            tracks {
                id
                title
            }
        }
    }
`;

function PlaylistPage(props) {
    const {playlistId} = props;
    const result = useQuery(playlistQuery, {playlistId: playlistId});

    if (result.type === "loaded") {
        const {playlist} = result.data;
        return (
            <Page>
                <Heading>{playlist.name}</Heading>
                <PlaylistTracks playlist={playlist} />
            </Page>
        );
    } else ...
}

// PlaylistTracks.js

function PlaylistTracks(props) {
    const {playlist} = props;
    return (
        <ul>
            {playlist.tracks.map(track => (
                <li key={track.id}>
                    {track.title}
                </li>
            ))}
        </ul>
    );
}

The query for our page currently includes everything needed to render the page, much the same as if we'd used a REST endpoint to fetch the data. The name field is needed since that's used directly in the PlaylistPage component. The tracks field, with id and title subfields, is also needed since it's used by PlaylistTracks. This seems manageable for the moment: if we stop using PlaylistTracks in PlaylistPage, we should remove the tracks field from the query. If we start using the artist field on a playlist in PlaylistTracks, we should add the artist field to the query for PlaylistPage.

Now imagine that PlaylistPage uses many more components that each take the playlist as a prop, and those components might in turn use other components. If you stopped using a component, would you know which fields were safe to remove from the query? Imagine if a deeply nested component is used on many pages. If you change the component to use another field, are you sure you've updated all of the queries to now include that field?

As an alternative, we can define a fragment for PlaylistTracks that we can then use in PlaylistPage:

// PlaylistPage.js

const playlistQuery = graphql`
    query PlaylistQuery($playlistId: ID!) {
        playlist(id: $playlistId) {
            name
            ${PlaylistTracks.playlistFragment}
        }
    }
`;

export default function PlaylistPage(props) {
    const {playlistId} = props;
    const result = useQuery(playlistQuery, {playlistId: playlistId});

    if (result.type === "loaded") {
        const {playlist} = result.data;
        return (
            <Page>
                <Heading>{playlist.name}</Heading>
                <PlaylistTracks playlist={playlist} />
            </Page>
        );
    } else ...
}

// PlaylistTracks.js

export default function PlaylistTracks(props) {
    const {playlist} = props;
    return (
        <ul>
            {playlist.tracks.map(track => (
                <li key={track.id}>
                    {track.title}
                </li>
            ))}
        </ul>
    );
}

PlaylistTracks.playlistFragment = graphql`
    ... on Playlist {
        tracks {
            id
            title
        }
    }
`;

If we stop using PlaylistTracks in PlaylistPage, it's very clear which part of the query we should also remove: ${PlaylistTracks.playlistFragment}. If we start using the artist field on a playlist in PlaylistTracks, we can add the artist field to playlistFragment on PlaylistTracks, without having to directly edit playlistQuery. If the component is used in twenty different pages and therefore twenty different queries need to fetch the data for PlaylistTracks, we still only need to update that one fragment.

On larger codebases, I've found this approach of colocating components with their data requirements radically simplifies data handling. Changes to a single component only require editing that one component, even when the data requirements change, with no worrying about whether the data is available, or why some data is queried and whether it can be safely removed.

Topics: Software design

GraphJoiner: Implementing GraphQL with joins

Thursday 27 October 2016 20:23

I've had the chance to play around with GraphQL recently, and so far have found it all quite pleasant. Compared to fixed REST endpoints, being able to describe the data you need has resulted in less churn in the API. It also allows descriptions of the data needed to be co-located with the components actually using the data, making the code easier to understand and change.

However, most implementations of GraphQL libraries require some batching logic to behave efficiently. For instance, consider the request:

{
    books(genre: "comedy") {
        title
        author {
            name
        }
    }
}

A naive GraphQL implementation would issue one SQL query to get the list of all books in the comedy genre, and then N queries to get the author of each book (where N is the number of books returned by the first query).

The standard solution I've seen is to batch together requests. In node.js this is reasonably straightforward using promises and the event loop: don't start fulfilling the promise for each author until other requests in the same tick in the event loop have run, at which point you can fulfil those promises in bulk. The downside? This is trickier to implement in other languages that aren't built around an event loop.
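
As an illustrative sketch of that trick in TypeScript, where fetchAuthorsByIds is a hypothetical function that issues a single SQL query for all of the given ids:

interface Author {
    id: string;
    name: string;
}

declare function fetchAuthorsByIds(ids: string[]): Promise<Map<string, Author>>;

const pendingIds: string[] = [];
let pendingBatch: Promise<Map<string, Author>> | null = null;

function loadAuthor(id: string): Promise<Author> {
    pendingIds.push(id);
    if (pendingBatch === null) {
        pendingBatch = new Promise((resolve) => {
            // Wait until the end of the current tick, giving the other
            // resolvers in this tick the chance to add their ids to the batch.
            process.nextTick(() => {
                const ids = pendingIds.splice(0);
                pendingBatch = null;
                resolve(fetchAuthorsByIds(ids));
            });
        });
    }
    return pendingBatch.then((authorsById) => authorsById.get(id)!);
}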

Another issue with the standard implementation is performance. Normally, you define a resolver that gets executed on each field. In our example, the resolver for books and author would issue a request to get data from the database, but most resolvers just read a field: for instance, the resolver for title can just read the title field from the book that we got back from the database. The problem? If some resolvers can return asynchronous results, then you always need to handle the possibility of an asynchronous result. When dealing with larger responses, this can mean most of the time is spent in the overhead of invoking resolvers.

As an alternative, I suggest that you can get the data you want in three steps. First, get all of the books in the comedy genre with the requested scalar fields (i.e. their titles), along with their author IDs. Next, get all of the authors of books in the comedy genre with the requested scalar fields (i.e. their names), along with their IDs. Finally, using the IDs that we've fetched, join the authors onto the books.
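
A sketch of those three steps, with hypothetical functions standing in for the two intermediate SQL queries:

interface BookRow {
    title: string;
    authorId: string;
}

interface AuthorRow {
    id: string;
    name: string;
}

// One SQL query each: the comedy books, then the authors of comedy books.
declare function fetchComedyBooks(): Promise<BookRow[]>;
declare function fetchComedyBookAuthors(): Promise<AuthorRow[]>;

async function comedyBooksResponse() {
    const books = await fetchComedyBooks();
    const authors = await fetchComedyBookAuthors();
    const authorsById = new Map(authors.map((author) => [author.id, author] as const));
    // Join the authors onto the books using the IDs we fetched.
    return books.map((book) => ({
        title: book.title,
        author: {name: authorsById.get(book.authorId)!.name},
    }));
}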

In other words: rather than calling resolve for every field in the response, and batching together requests, I suggest that you can execute intermediate queries for each non-trivial field in the request, and then join the results together. As proof that the idea is at least somewhat workable, I've created a library called GraphJoiner, available as node-graphjoiner for node.js and python-graphjoiner for Python. When a response contains arrays with hundreds or thousands of results, I've found not having to call resolvers on every response field has massively reduced execution time (it was previously the majority of time spent handling the request, far exceeding the time to actually execute the SQL query). Hopefully someone else finds it useful!

Topics: Software design

Power à la Carte: fine-grained control of power in programming languages

Monday 28 March 2016 11:05

Proposal: general purpose programming languages should provide fine-grained control of the power that a particular module can use. "Power" includes which language features a module may use, and which other modules it may depend on, whether built into the language, from another package, or within the same codebase.

Suppose I'm writing some code to calculate how much tax someone should pay in a given year. I might want to forbid the use of floating point arithmetic in any of these calculations, but this isn't possible in most languages. In this proposal, I can declare what domain a module belongs to. For each domain, I can then declare what language features it's allowed to use. In this case, I'd be able to declare code as belonging to the "tax calculation" domain, and ensure that that domain cannot use floating point.

Take mutability as another example. I find most code I write is easier to reason about if it doesn't use mutability, but there are a small set of problems that I find are best expressed with mutable code, either for clarity or performance. By choosing exactly when mutability is permitted, I can still use mutability when needed without introducing its complexity into the rest of my codebase. Provided that the code doesn't mutate its arguments nor global state, the rest of the code doesn't need to know that mutability is being used: the contract of the functions can still be pure.

For each domain, I can also declare what other domains that domain is allowed to access. This allows enforcement of separation of concerns i.e. we can ensure that domains aren't intertwined when they shouldn't be. For instance, how tax is calculated shouldn't depend on personal details of an individual, such as their name. This also allows us to see what domains are used without having to read and understand the code of a module.

As another example, suppose I'm writing code to calculate the trajectory of a spacecraft. If I represent values such as distance or speed as integers, then it's possible to do something nonsensical like adding together two distances that use different units, or add together a distance and a speed. Within the domain of the trajectory calculation, I could represent these values with specialised data types preventing the incorrect use of these types. At some point, I may wish to unwrap these values, say to print them, but I can enforce that such unwrapping never happens during calculations. I'll also need to create these wrapped values in the first place, but I can declare and ensure that this value creation only occurs at well-defined boundaries, such as when reading sensors, and never directly within the calculation code.

In other words: the "mechanics" domain uses the "integer arithmetic" domain in its implementation, but that fact is private to the "mechanics" domain. In the "trajectory" domain, I explicitly declare that I can use values from the "mechanics" domain (such as adding together distances or dividing a distance by a time to get a speed), but that doesn't allow me to get their underlying integer representation nor create them from integers. In the "sensor" domain, I explicitly declare that I can create these values, meaning I can read the raw integers from some hardware, and turn them into their appropriate data types.
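
To make the shape of this concrete, here's a sketch of the wrapped values in TypeScript; the type system can't express the domain rules themselves, so comments stand in for them, and every name is invented:

class Time {
    // Creation would be restricted to the "sensor" domain.
    static fromRawSeconds(raw: number): Time {
        return new Time(raw);
    }

    private constructor(readonly seconds: number) {}
}

class Speed {
    static fromMetresPerSecond(metresPerSecond: number): Speed {
        return new Speed(metresPerSecond);
    }

    private constructor(readonly metresPerSecond: number) {}
}

class Distance {
    // Again, only the "sensor" domain could create distances from raw integers.
    static fromRawMetres(raw: number): Distance {
        return new Distance(raw);
    }

    private constructor(private readonly metres: number) {}

    // Adding two distances makes sense...
    add(other: Distance): Distance {
        return new Distance(this.metres + other.metres);
    }

    // ...as does dividing a distance by a time to get a speed, but adding a
    // distance to a speed won't type-check.
    dividedBy(time: Time): Speed {
        return Speed.fromMetresPerSecond(this.metres / time.seconds);
    }
}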

In the previous example, we saw how we wanted the fact that the "mechanics" domain uses the "integer arithmetic" domain to be private, but there are times when we don't want to allow this hiding. For instance, it's useful to definitively know whether a piece of code might write to disk, even if the write doesn't happen directly in that code.

I might also want to be able to enforce separation of concerns at a much higher level. Being able to enforce this separation is one of the cited benefits of a microservice architecture: it would be nice to be able to get this benefit without having to build a distributed system.

I believe part of the potential benefit of this is from going through the process of declaring what other domains a domain can access. It's often extremely easy to pull in another dependency or to rely on another part of our code without thinking about whether that makes conceptual sense. This is still possible in the system I describe, but by making the programmer be explicit, they are prompted to stop and think. I've made many bad decisions because I didn't even realise I was making a decision.

Some of this separation of domains is already possible: although I can't restrict access to language features, I have some control over what other modules each module can access: for instance, by using separate jars in Java, or assemblies in .NET. However, these are quite coarse and heavyweight: it's hard to use them to express the level of fine-grained control that I've described. The other downside is that by requiring a dependency, you normally end up being able to access its transitive dependencies: again, something that's often undesirable as previously described.

Using something like algebraic effect systems is probably more complex than necessary. The use of a language feature or domain could be modelled as an effect, but tracking this on an expression granularity is probably unhelpful: instead, tracking it by module is both simpler and more appropriate.

In fact, something like Java's annotations, C#'s attributes or Python's decorators are probably sufficient to express what domain each module resides in. You'd then need to separately define what power each domain has, which could be written in a comparatively simple declaration language. I suspect the difficulty comes not in implementing such a checker, but in working out how to use it effectively, particularly what granularity is appropriate.
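
To make that concrete, the domain declarations might look something like this sketch, written as plain data (every name is invented):

const domains = {
    "tax-calculation": {
        features: {floatingPoint: false, mutability: false},
        dependsOn: ["money"],
    },
    "trajectory": {
        features: {floatingPoint: true, mutability: false},
        // May use mechanics values, but not create or unwrap them.
        dependsOn: ["mechanics"],
    },
    "sensor": {
        features: {floatingPoint: true, mutability: true},
        // May also create mechanics values from raw sensor readings.
        dependsOn: ["mechanics", "mechanics.create"],
    },
};

A checker would then match each module's domain annotation against these declarations.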

Topics: Software design, Language design

Nope: a statically-typed subset of Python that compiles to JS and C#

Sunday 22 March 2015 15:55

Nope is a statically-typed subset of Python that uses comments as type annotations. For instance, here's the definition of a Fibonacci function:

#:: int -> int
def fib(n):
    seq = [0, 1]
    for i in range(2, n + 1):
        seq.append(seq[i - 1] + seq[i - 2])
    
    return seq[n]

print(fib(10))

And here's a generic identity function:

#:: T => T -> T
def identity(value):
    return value

Since the types are just comments, any Nope program is directly executable as a Python program without any extra dependencies.

Having written your program with type annotations, you can now compile it to some horrible-looking JavaScript or C# and run it:

$ python3 fib.py
55
$ nope compile fib.py --backend=dotnet --output-dir=/tmp/fib
$ /tmp/fib/fib.exe
55
$ nope compile fib.py --backend=node --output-dir=/tmp/fib
$ node /tmp/fib/fib.js
55

Why?

A little while ago, I wrote mammoth.js, a library for converting Word documents to HTML. Since some people found it useful, I ported it to Python. Both implementations are extremely similar, and don't heavily exploit the dynamic features of either language. Therefore, a third port, even to a statically-typed language such as C# or Java, is likely to end up looking extremely similar.

This led to the question: if I annotated the Python implementation with appropriate types, could I use it to generate vaguely sensible C# or Java code? Nope is an experiment to find out the answer.

Should I use it?

This project is primarily an experiment and a bit of fun. Feel free to have a play around, but I'd strongly suggest not using it for anything remotely practical. Not that you'd be able to anyway: many essential features are still missing, while existing features are likely to change. The type-checking is sufficiently advanced to allow partial type-checking of the Python port of Mammoth.

I discuss a few alternatives a little later on, and explain how Nope differs. In particular, there are a number of existing type checkers for Python, the most prominent being mypy.

Examples

Simple functions

Functions must have a preceding comment as a type annotation.

#:: int -> int
def triple(value):
    return value * 3

Functions with optional and named arguments

Optional arguments are indicated with a leading question mark, and must have a default value of None. For instance:

#:: name: str, ?salutation: str -> str
def greeting(name, salutation=None):
    if salutation is None:
        salutation = "Hello"
    
    return salutation + " " + name

print(greeting("Alice"))
print(greeting("Bob", salutation="Hi"))

Note that the type of salutation changes as it is reassigned in a branch of the if-else statement. It is initially of type str | none, which is the union of the formal argument type and none (since it's optional). After the if-else statement, it is of the narrower type str, which allows it to be safely used in the string concatenation.

Variables

Most of the time, Nope can infer a suitable type for variables. However, there are occasions where an explicit type is required, such as when inferring the type of empty lists. In these cases, an explicit type can be specified in a similar manner to functions:

#:: list[int]
x = []

Classes

When annotating the self argument of a method, you can use the explicit name of the class:

class Greeter(object):
    #:: Greeter, str -> str
    def hello(self, name):
        return "Hello " + name

For convenience, Nope also introduces Self into the scope of the class, which can be used instead of referring to the containing class directly:

class Greeter(object):
    #:: Self, str -> str
    def hello(self, name):
        return "Hello " + name

As with local variables, instance variables assigned in __init__ can have type annotations, but will often be fine using the inferred type:

class Greeter(object):
    #:: Self, str -> none
    def __init__(self, salutation):
        self._salutation = salutation

    #:: Self, str -> str
    def hello(self, name):
        return self._salutation + " " + name

Generic classes are also supported, although the exact syntax might change:

#:generic T
class Result(object):
    #:: Self, T, list[str] -> none
    def __init__(self, value, messages):
        self.value = value
        self.messages = messages

Code transformations

To preserve some of the advantages of working in a dynamic language, Nope supports code transformations: given the AST of a module, a transformation returns an AST that should be used for type-checking and code generation. So long as the transformation and the original runtime behaviour are consistent, this allows you to use code such as collections.namedtuple:

import collections

User = collections.namedtuple("User", [
    #:: str
    "username",
    #:: str
    "password",
])

The current implementation is a bit of a hack, but the ultimate goal is to let a user specify transformations to apply to their code. Ideally, this would allow Python libraries such as SQLAlchemy to be supported in a type-safe manner.

And the rest

Nope also supports while and for loops, try statements, raise statements, destructuring assignment and plenty more. However, I've left them out of this post since they look the same as they do in Python. The only difference is that Nope will detect when inappropriate types are used, such as when trying to raise a value that isn't an exception.

I've started using Nope on a branch of Mammoth. Only some of Mammoth's modules are currently being type-checked, such as html_generation.

If you're feeling brave, Nope has a set of execution tests that check and compile sample programs. It's not the greatest codebase in the world, with many unhandled or improperly handled cases, but feel free to read the tests if you want to dive in and see exactly what Nope supports. At the moment, Nope compiles to Python (which means just copying the files verbatim), JavaScript and C# with varying degrees of feature-completeness. The C# implementation in particular has huge scope for optimisation (since it currently relies heavily on (ab)using dynamic), but should already be fast enough for many uses.

Type system

Nope ends up with a few different kinds of type in its type system. It would be nice to be able combine some of these, but for the time being I've preferred to avoid introducing extra complexity into existing types. At the moment, Nope supports:

  • Ordinary classes, made by using class in the usual way. Since inheritance is not yet supported, a type T is a subclass of itself and no other ordinary classes.
  • Function types, such as:
    • int, str -> str (Requires two positional arguments of type int and str respectively, and returns a str.)
    • int, x: int, ?y: str -> int (Has two required arguments, the second of which can be passed by the name x. Has an optional third argument called y.)
  • Structural types. For instance, we might define a structural type HasLength with the attribute __len__ of type -> int (takes no arguments, returns an integer). Any type with an appropriately typed attribute __len__ would be a subtype of HasLength. Currently not definable by users, but should be.
  • Generic types. Nope currently supports generic functions, generic classes and generic structural types. For instance, the generic structural type Iterable has a single formal type parameter, T, and an attribute __iter__ of type -> Iterator[T], where Iterator is also a generic structural type.
  • Type unions. For instance, a variable of type int | str could hold an int or a str at runtime.

What about IronPython or Jython?

Both IronPython and Jython aim to be Python implementations, rather than implementing a restricted subset. This allows them to run plenty of Python code with little to no modification.

Nope differs in that it aims to allow the generation of code without any additional runtime dependencies. Since the code is statically typed, it should also allow better integration with the platform, such as auto-complete or Intellisense in IDEs such as Visual Studio, Eclipse and IntelliJ.

What about mypy?

mypy is an excellent project that can type check Python code. If you want a project that you can practically use, then mypy is by far more appropriate. There are other type checkers that also use Python annotations to specify types, such as obiwan.

At this point, Nope differs in two main regards. Firstly, Nope's scope is slightly different in that I'm aiming to compile to other languages.

The second main difference is that Nope aims to have zero runtime dependencies. Since mypy uses Python annotations to add type information, programs written using mypy have mypy as a runtime dependency. Using comments as type annotations also allows some meta-programming with somewhat consistent type annotations, as shown in the collections.namedtuple example from earlier:

import collections

User = collections.namedtuple("User", [
    #:: str
    "username",
    #:: str
    "password",
])

What next?

I'm only working on Nope in my spare time, so progress is a little slow. The aim is to get a C# version of Mammoth working by using Nope in the next few months. Although there might not be feature parity to begin with, I'd be delighted if I got far enough to get the core program working.

If, in a surprising turn of events, this turns out to work, I'd be interested to add effects to the type system. Koka is the most notable example I know of such a system, although I'll happily admit to being somewhat unknowledgeable in this area. Thoughts welcome!

Topics: Software design, Language design

Code smears: code smells that spread across your codebase

Monday 5 January 2015 21:34

The term "code smells" gives us a handy shorthand for signs that our code might not be especially clean. In my experience, the worst sort of code smell are "code smears": those smells that tend to spread their badness across your codebase, which also makes them more difficult to fix as time goes on. For instance, suppose a function has too many arguments, making it difficult to see the purpose of each argument. That code smell isn't confined to the function definition: it's going to appear every time that the function is called.

Code smells tell us that there's probably room for improvement. Ideally, we'd avoid all code smells, but there often comes a point where our time is better spent elsewhere. Yet not all code smells are created equal. Some code smells can be fixed later with little or no impact on the rest of the codebase: for instance, if I take an extremely long function and break it up into smaller functions that compose together nicely, nobody who calls the function needs to be aware that anything has changed.

On the other hand, some code smells tend to make their presence felt across a codebase: for instance, if a function takes four boolean arguments, this leads to somewhat mysterious function calls such as update(true, false, true, true). Fixing this requires changing everywhere that calls that function. This task can range from tedious, if the function is called in many places, to impossible, if third parties are calling into our code. As time goes on, it's likely that there will be more callers of our function, and so more work to clean it up. I'd suggest that this second group of code smells is more severe than the first, and therefore deserves the more severe name of code smears.

As a further incentive to prioritise avoiding code smears over code smells, I've found that code smears are often quicker and easier to fix so long as you do so as soon as they appear. Splitting up a long method can be time-consuming and difficult. Giving a function a better name or grouping together related arguments into a single object can often be accomplished in a couple of minutes, provided that the function is only used in one or two places under your control.
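
For instance, a sketch of that second fix, with invented names:

// Before: update(true, false, true, true);

interface UpdateOptions {
    force: boolean;
    dryRun: boolean;
    recursive: boolean;
    followSymlinks: boolean;
}

function update(options: UpdateOptions): void {
    // ...
}

// After: each argument announces its purpose at the call site.
update({force: true, dryRun: false, recursive: true, followSymlinks: true});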

Sometimes, it's a good trade-off not to fix a code smell. It's rarely a good trade-off to leave behind a code smear.

Topics: Software design