
GraphJoiner: Implementing GraphQL with joins

Thursday 27 October 2016 20:23

I've had the chance to play around with GraphQL recently, and so far I've found it quite pleasant. Compared to fixed REST endpoints, being able to describe exactly the data you need has resulted in less churn in the API. It also allows the description of the data a component needs to be co-located with the component itself, making the code easier to understand and change.

However, most GraphQL libraries require some batching logic to behave efficiently. For instance, consider the request:

{
    books(genre: "comedy") {
        title
        author {
            name
        }
    }
}

A naive GraphQL implementation would issue one SQL query to get the list of all books in the comedy genre, and then N queries to get the author of each book (where N is the number of books returned by the first query).
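To make the N+1 pattern concrete, here's a minimal sketch using an in-memory SQLite database; the schema and data are invented for illustration:

import sqlite3

# Invented toy schema and data for illustration.
connection = sqlite3.connect(":memory:")
connection.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, genre TEXT,
                       author_id INTEGER REFERENCES author (id));
    INSERT INTO author VALUES (1, 'PG Wodehouse'), (2, 'Joseph Heller');
    INSERT INTO book VALUES
        (1, 'Leave It to Psmith', 'comedy', 1),
        (2, 'Catch-22', 'comedy', 2);
""")

# One query for the books...
books = connection.execute(
    "SELECT title, author_id FROM book WHERE genre = ?", ("comedy",)
).fetchall()

# ...then one query per book for its author: N + 1 queries in total.
for title, author_id in books:
    author = connection.execute(
        "SELECT name FROM author WHERE id = ?", (author_id,)
    ).fetchone()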

The standard solution I've seen is to batch requests together. In node.js this is reasonably straightforward using promises and the event loop: don't start fulfilling the promise for each author until the other requests in the same tick of the event loop have run, at which point you can fulfil those promises in bulk. The downside? This is trickier to implement in languages that aren't built around an event loop.
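As a rough sketch of the idea, translated to Python's asyncio rather than node.js (the class and the fetch function here are hypothetical, not part of any particular library):

import asyncio

class AuthorLoader:
    # Collects the author ids requested during the current iteration of the
    # event loop and fetches them in a single batch.
    def __init__(self, fetch_authors_by_id):
        # fetch_authors_by_id: async callable taking a list of ids and
        # returning a dict mapping each id to its author.
        self._fetch_authors_by_id = fetch_authors_by_id
        self._pending = {}
        self._dispatch_scheduled = False

    def load(self, author_id):
        loop = asyncio.get_event_loop()
        future = loop.create_future()
        self._pending.setdefault(author_id, []).append(future)
        if not self._dispatch_scheduled:
            self._dispatch_scheduled = True
            # Wait until the other resolvers in this tick have queued their ids.
            loop.call_soon(lambda: asyncio.ensure_future(self._dispatch()))
        return future

    async def _dispatch(self):
        pending, self._pending = self._pending, {}
        self._dispatch_scheduled = False
        authors = await self._fetch_authors_by_id(list(pending))
        for author_id, futures in pending.items():
            for future in futures:
                future.set_result(authors[author_id])

async def main():
    async def fetch_authors_by_id(ids):
        print("one batched fetch for:", ids)
        return {author_id: {"name": "Author %s" % author_id} for author_id in ids}

    loader = AuthorLoader(fetch_authors_by_id)
    # Three loads in the same tick result in a single batched fetch.
    print(await asyncio.gather(loader.load(1), loader.load(2), loader.load(1)))

asyncio.run(main())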

Another issue with the standard implementation is performance. Normally, you define a resolver that gets executed for each field. In our example, the resolvers for books and author would issue requests to get data from the database, but most resolvers just read a field: for instance, the resolver for title can simply read the title field from the book we got back from the database. The problem? If some resolvers can return asynchronous results, then you always need to handle the possibility of an asynchronous result. When dealing with larger responses, this can mean most of the time is spent in the overhead of invoking resolvers.
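For instance, a trivial resolver like this one (the exact signature varies between libraries; this is only indicative) still gets invoked once per book in the response:

def resolve_title(book, info):
    # Nothing to fetch: just read an attribute of the book we already have.
    return book.title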

As an alternative, I suggest that you can get the data you want in three steps. First, get all of the books in the comedy genre with the requested scalar fields (i.e. their titles), along with their author IDs. Next, get all of the authors of books in the comedy genre with the requested scalar fields (i.e. their names), along with their IDs. Finally, using the IDs that we've fetched, join the authors onto the books.
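Using the same toy SQLite schema as the earlier sketch (again, the schema and data are invented for illustration), the three steps might look something like this:

import sqlite3

# Invented toy schema and data for illustration.
connection = sqlite3.connect(":memory:")
connection.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, genre TEXT,
                       author_id INTEGER REFERENCES author (id));
    INSERT INTO author VALUES (1, 'PG Wodehouse'), (2, 'Joseph Heller');
    INSERT INTO book VALUES
        (1, 'Leave It to Psmith', 'comedy', 1),
        (2, 'Right Ho, Jeeves', 'comedy', 1),
        (3, 'Catch-22', 'comedy', 2);
""")

# Step 1: the requested scalar fields of the books, plus the join key.
books = connection.execute(
    "SELECT title, author_id FROM book WHERE genre = ?", ("comedy",)
).fetchall()

# Step 2: the requested scalar fields of the authors of those books,
# plus the join key.
authors = connection.execute(
    """SELECT id, name FROM author
       WHERE id IN (SELECT author_id FROM book WHERE genre = ?)""",
    ("comedy",),
).fetchall()

# Step 3: join the authors onto the books in memory using the fetched keys.
authors_by_id = {author_id: {"name": name} for author_id, name in authors}
result = {
    "books": [
        {"title": title, "author": authors_by_id[author_id]}
        for title, author_id in books
    ]
}
print(result)

However many books the first query returns, this approach issues two queries rather than N + 1.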

In other words: rather than calling resolve for every field in the response and batching requests together, I suggest that you can execute intermediate queries for each non-trivial field in the request, and then join the results together. As proof that the idea is at least somewhat workable, I've created a library called GraphJoiner, available as node-graphjoiner for node.js and python-graphjoiner for Python. When a response contains arrays with hundreds or thousands of results, I've found that not having to call a resolver on every response field massively reduces execution time (invoking resolvers previously made up the majority of the time spent handling a request, far exceeding the time taken to actually execute the SQL queries). Hopefully someone else finds it useful!

Topics: Software design

Thoughts? Comments? Feel free to drop me an email at hello@zwobble.org. You can also find me on Twitter as @zwobble.