# Mike's corner of the web.

### What is the expression problem?

Sunday 27 May 2012 21:00

View interactive version of this post

#### The problem

The expression is a tricky problem in many languages that asks: given a set of functions that operate over a set of types, how do we allow both the set of functions and the set of types that those functions operate over be extended without losing type safety?

#### Abstract data types

Let's say we have the abstract syntax tree (AST) for a simple mathematical language that contains literals and additions. In ML, we can represent a node with the abstract data type (ADT) `node` which has two data constructors, `LiteralNode` and `AddNode`:

``````datatype node
= LiteralNode of real
| AddNode of node * node``````

We can then define a function `evaluate` that turns the AST into a single number.

``````datatype node
= LiteralNode of real
| AddNode of node * node

fun evaluate (LiteralNode value) = value
| evaluate (AddNode (left, right)) = (evaluate left) + (evaluate right)``````

Note that `evaluate` is type-safe since it handles all possible instances of `node`. Now, suppose we want to add another operation over nodes, say to turn the AST into a string. Using ADTs, this is simply a case of adding another function. Importantly, this doesn't require any modification to the existing source code.

``````datatype node
= LiteralNode of real
| AddNode of node * node

fun evaluate (LiteralNode value) = value
| evaluate (AddNode (left, right)) = (evaluate left) + (evaluate right)

fun nodeToString (LiteralNode value) = Real.toString value
| nodeToString (AddNode (left, right)) =
"(" ^ (nodeToString left) ^ " + " ^ (nodeToString right) ^ ")"
``````

The problem arises when we decide that we'd like a variant of our mathematical language with the negation operator. We'd like to be able to evaluate this extension of our mathematical language, but we're not concerned with turning negations into strings. There's no straightforward way of achieving this using ADTs -- we're forced to add another data constructor to `node`, which may not be possible if we don't own the original source code. We also add the appropriate case to `evaluate`.

``````datatype node
= LiteralNode of real
| AddNode of node * node
| NegateNode of node

fun evaluate (LiteralNode value) = value
| evaluate (AddNode (left, right)) = (evaluate left) + (evaluate right)
| evaluate (NegateNode term) = ~(evaluate term)

fun nodeToString (LiteralNode value) = Real.toString value
| nodeToString (AddNode (left, right)) =
"(" ^ (nodeToString left) ^ " + " ^ (nodeToString right) ^ ")"
``````

Even if we can modify our definition of `node`, we still have a problem: we can no longer safely create functions that operate over our original language. Consider the function `nodeToString`: since it no longer exhaustively matches all possible instances of `node`, it's not type-safe. To restore type safety, we're forced to update it to handle the case of `NegateNode`:

``````datatype node
= LiteralNode of real
| AddNode of node * node
| NegateNode of node

fun evaluate (LiteralNode value) = value
| evaluate (AddNode (left, right)) = (evaluate left) + (evaluate right)
| evaluate (NegateNode term) = ~(evaluate term)

fun nodeToString (LiteralNode value) = Real.toString value
| nodeToString (AddNode (left, right)) =
"(" ^ (nodeToString left) ^ " + " ^ (nodeToString right) ^ ")"
| nodeToString (NegateNode term) = "-" ^ (nodeToString term)
``````

In general, ADTs make it easy to add extra functions that operate over existing data types, but difficult to safely extend those data types. Now, let's take a look at the same problem in an object-orientated language, specifically Java.

#### Object-orientation: interfaces and classes

We begin by defining the interface `Node`, along with two implementations, `AddNode` and `LiteralNode`:

``````public interface Node {
}

public class LiteralNode implements Node {
private final double value;

public LiteralNode(double value) {
this.value = value;
}
}

public class AddNode implements Node {
private final Node left;
private final Node right;

public AddNode(Node left, Node right) {
this.left = left;
this.right = right;
}
}
``````

For the sake of readability, let's leave out the constructors:

``````public interface Node {
}

public class LiteralNode implements Node {
private final double value;
}

public class AddNode implements Node {
private final Node left;
private final Node right;
}
``````

Next, we want to evaluate each node to a single number. We add an `evaluate` method to the interface, and add appropriate implementations to the concrete classes.

``````public interface Node {
double evaluate();
}

public class LiteralNode implements Node {
private final double value;

public double evaluate() {
return value;
}
}

public class AddNode implements Node {
private final Node left;
private final Node right;

public double evaluate() {
return left.evaluate() + right.evaluate();
}
}
``````

Unlike our approach with ADTs in ML, extending our language to support negation is straightforward. We simply add another implementation of `Node`, which doesn't require any modification of the original source code.

``````public interface Node {
double evaluate();
}

public class NegateNode implements Node {
private final Node term;

public double evaluate() {
return -term.evaluate();
}
}

public class LiteralNode implements Node {
private final double value;

public double evaluate() {
return value;
}
}

public class AddNode implements Node {
private final Node left;
private final Node right;

public double evaluate() {
return left.evaluate() + right.evaluate();
}
}
``````

Unfortunately, safely adding another operation on nodes requires us to modify the original source code for `Node`, which may not always be possible. In our case, we want to be able to turn our original language of add and literal nodes into strings, so we need to add a `nodeToString` method on both the `Node` interface and the classes `AddNode` and `LiteralNode`:

``````public interface Node {
double evaluate();
String nodeToString();
}

public class NegateNode implements Node {
private final Node term;

public double evaluate() {
return -term.evaluate();
}
}

public class LiteralNode implements Node {
private final double value;

public double evaluate() {
return value;
}

public String nodeToString() {
return Double.toString(value);
}
}

public class AddNode implements Node {
private final Node left;
private final Node right;

public double evaluate() {
return left.evaluate() + right.evaluate();
}

public String nodeToString() {
return "(" + left.nodeToString() + " + " +
right.nodeToString() + ")";
}
}
``````

Even if we can modify the original source code, by modifying the interface, we've forced all implementations of `Node` to implement `nodeToString` even though we only ever wanted to use such an operation on our original add and literal nodes. In particular, we're forced to add `nodeToString` to `NegateNode`:

``````public interface Node {
double evaluate();
String nodeToString();
}

public class NegateNode implements Node {
private final Node term;

public double evaluate() {
return -term.evaluate();
}

public String nodeToString() {
return "-" + term.nodeToString();
}
}

public class LiteralNode implements Node {
private final double value;

public double evaluate() {
return value;
}

public String nodeToString() {
return Double.toString(value);
}
}

public class AddNode implements Node {
private final Node left;
private final Node right;

public double evaluate() {
return left.evaluate() + right.evaluate();
}

public String nodeToString() {
return "(" + left.nodeToString() + " + " +
right.nodeToString() + ")";
}
}
``````

By using methods on interfaces, we have the opposite problem to ADTs: adding additional types of nodes without modifying or affecting existing code is straightforward, while it's difficult to safely add additional operations over those nodes.

#### Summary

In this particular example, our ideal solution would let us:

• define `AddNode` and `LiteralNode`, and an operation `evaluate` over both of them.
• add a third type of node, `NegateNode`, which `evaluate` can be performed on, without modification of the original source code.
• add a second operation `nodeToString` over the original set of nodes, `AddNode` and `LiteralNode`, without modification of the original source code.
• not be forced to implement `nodeToString` for `NegateNode`.

We can express these properties more generally as being able to:

• define a set of data types and operations over those data types
• add additional data types that can have the same operations applied to them, without modification of the original source code.
• add additional operations over those data types, without modification of the original source code.
• add these additional data types and operations independently. That is, if an extension ExtData adds a data type D, and another extension ExtOp adds an operation Op, we should be able to safely use both extensions without implementing the operation Op for the data type D, although we may choose to do so if we want to apply Op to D.

all while preserving type-safety.

Topics: Language design

Thoughts? Comments? Feel free to drop me an email at hello@zwobble.org. You can also find me on Twitter as @zwobble.