Mike's corner of the web.

Power à la Carte: fine-grained control of power in programming languages

Monday 28 March 2016 11:05

Proposal: general purpose programming languages should provide fine-grained control of the power that a particular module can use. "Power" includes language features, or what modules it is allowed to depend on, whether built into the language, from another package, or within the same codebase.

Suppose I'm writing some code to calculate how much tax someone should pay in a given year. I might want to forbid the use of floating point arithmetic in any of these calculations, but this isn't possible in most languages. In this proposal, I can declare what domain a module belongs to. For each domain, I can then declare what language features it's allowed to use. In this case, I'd be able to declare code as belonging to the "tax calculation" domain, and ensure that that domain cannot use floating point.

Take mutability as another example. I find most code I write is easier to reason about if it doesn't use mutability, but there are a small set of problems that I find are best expressed with mutable code, either for clarity or performance. By choosing exactly when mutability is permitted, I can still use mutability when needed without introducing its complexity into the rest of my codebase. Provided that the code doesn't mutate its arguments nor global state, the rest of the code doesn't need to know that mutability is being used: the contract of the functions can still be pure.

For each domain, I can also declare what other domains that domain is allowed to access. This allows enforcement of separation of concerns i.e. we can ensure that domains aren't intertwined when they shouldn't be. For instance, how tax is calculated shouldn't depend on personal details of an individual, such as their name. This also allows us to see what domains are used without having to read and understand the code of a module.

As another example, suppose I'm writing code to calculate the trajectory of a spacecraft. If I represent values such as distance or speed as integers, then it's possible to do something nonsensical like adding together two distances that use different units, or add together a distance and a speed. Within the domain of the trajectory calculation, I could represent these values with specialised data types preventing the incorrect use of these types. At some point, I may wish to unwrap these values, say to print them, but I can enforce that such unwrapping never happens during calculations. I'll also need to create these wrapped values in the first place, but I can declare and ensure that this value creation only occurs at well-defined boundaries, such as when reading sensors, and never directly within the calculation code.

In other words: the "mechanics" domain uses the "integer arithmetic" domain in its implementation, but that fact is private to the "mechanics" domain. In the "trajectory" domain, I explicitly declare that I can use values from the "mechanics" domain (such as adding together distances or dividing a distance by a time to get a speed), but that doesn't allow me to get their underlying integer representation nor create them from integers. In the "sensor" domain, I explicitly declare that I can create these values, meaning I can read the raw integers from some hardware, and turn them into their appropriate data types.

In the previous example, we saw how we wanted the fact that the "mechanics" domain uses the "integer arithmetic" domain to be private, but there are times when we don't want to allow this hiding. For instance, it's useful to definitively know whether a piece of code might write to disk, even if the write doesn't happen directly in that code.

I might also want to be able to enforce separation of concerns at a much higher level. Being able to enforce this separation is one of the cited benefits of a microservice architecture: it would be nice to be able to get this benefit without having to build a distributed system.

I believe part of the potential benefit of this is from going through the process of declaring what other domains a domain can access. It's often extremely easy to pull in another dependency or to rely on another part of our code without thinking about whether that makes conceptual sense. This is still possible in the system I describe, but by making the programmer be explicit, they are prompted to stop and think. I've made many bad decisions because I didn't even realise I was making a decision.

Some of this separation of domains is already possible: although I can't restrict access to language features, I have some control over what other modules each module can access: for instance, by using separate jars in Java, or assemblies in .NET. However, these are quite coarse and heavyweight: it's hard to use them to express the level of fine-grained control that I've described. The other downside is that by requiring a dependency, you normally end up being able to access its transitive dependencies: again, something that's often undesirable as previously described.

Using something like algebraic effect systems is probably more complex than necessary. The use of a language feature or domain could be modelled as an effect, but tracking this on an expression granularity is probably unhelpful: instead, tracking it by module is both simpler and more appropriate.

In fact, something like Java's annotations, C#'s attributes or Python's decorators are probably sufficient to express what domain each module resides in. You'd then need to separately define what power each domain has, which could be written in a comparatively simple declaration language. I suspect the difficulty comes not in implementing such a checker, but in working out how to use effectively, particularly what granularity is appropriate.

Topics: Software design, Language design

Thoughts? Comments? Feel free to drop me an email at hello@zwobble.org. You can also find me on Twitter as @zwobble.