Archive: Shed

Shed is self-hosting

Sunday 14 October 2012 21:09

As of 11 August 2012, the Shed programming language is self-hosting. That is, the compiler written in Shed could successfully compile itself, and also pass of its tests. So pleased was I at the time, I apparently forgot to note the date, so I had to go and look back at the commit logs to find the date that it happened.

Shed might be able to compile itself, but the implementation of the compiler isn't anywhere near finished. To be able to reach this point as quickly as sensible, I've implemented a fairly minimal subset of the language and the compiler. The next big step is to implement type-checking in the compiler, along with corresponding language features such as interfaces.

Since maintaining two separate compilers for the same language seems redundant, not to mention time-consuming and tedious, I've decided to deprecate the JavaScript compiler in favour of the Shed compiler. It also forces the Shed compiler to be of higher quality – I've already added line and character numbers to error messages, something that the JavaScript compiler had and I quickly came to miss. Trying to hunt down syntax errors given only the name of the file and the expected/actual token gets frustrating quickly.

The next couple of things to work on will probably be to add reference resolution, and to modify the parser to make new lines significant so that I can get rid of all those semicolons.

Topics: Shed

Functions and corresponding methods with the same name in Shed

Thursday 4 October 2012 20:47

In Shed, we sometimes define functions that usually delegate to a method, but also have some special cases or defaults if that method isn't implemented. For instance, the function represent should produce a string representation of an object. If the argument implements the method represent, it calls that method. Otherwise, it uses the name of the class of the object to generate the string. In code:

def represent fun(value: any) =>
    if isInstance(value, Representable) then
        value.represent()
    else
        defaultRepresent(value)

A problem arises when we want to call the function represent from within an implementation of the represent method. For instance, if we were implementing represent for the Some class:

def Some class(value: any) => {
    // Skipping the rest of the implementation of Some for brevity
    def represent fun() =>
        "Some(" + represent(value) + ")"
}

This code won't compile since we're calling represent with a single argument, value, but within the scope of the class, represent refers to the zero-argument function that implements represent specifically for Some.

There are several possible solutions to this problem, but the simplest one seems to be to use a different name for the method than for the corresponding function. For consistency, we can introduce the convention that the method name should be a simple variation on the function name. For instance, we might choose to use a leading underscore:

def represent fun(value: any) =>
    if isInstance(value, Representable) then
        value._represent()
    else
        defaultRepresent(value)

Although the leading underscore is perhaps a little ugly, that ugliness does help to reinforce the idea that you shouldn't be calling the method _represent directly. Instead, you should be using the represent method. More generally, instead of calling a method _foo, you should be calling foo (unless you're actually implementing foo).

Topics: Functional programming, Language design, Shed

Solving the expression problem with union types in Shed

Monday 30 April 2012 22:31

The expression is a tricky problem in many languages that asks: given a set of functions that operate over a set of types, how do we allow both the set of functions and the set of types that those functions operate over be extended without losing type safety? If you're not familiar with the problem, I recommend reading the explanation by the author of Magpie. For our purposes, we'll use an abstract syntax tree for mathematical expressions as our data type. To start, let's have two sorts of node: addition operators and literals.

interface Node {}

def AddNode class(myLeft: Node, myRight: Node) implements Node => {
    public def left fun() => myLeft;
    public def right fun() => myRight;
}

def LiteralNode class(myValue: Double) implements Node => {
    public def value fun() => myValue;
}

(As an aside: due to the design of the language, we can't give the arguments to a class the same name as it's getter, for instance def value fun() => value, since the body of the function would refer to the function rather than the class argument. Prepending each of the arguments with my is a poor solution, and although I have several ideas on how to rectify this, I'm still pondering on the simplest, cleanest solution.)

Suppose we want to implement a function evaluate that evaluates the expression to a single value. Our first attempt at an implementation might look like this:

def evaluate fun(node: Node) : Double =>
    match(node,
        case(AddNode, evaluateAdd),
        case(LiteralNode, evaluateLiteral)
    );

def evaluateAdd fun(add: AddNode) =>
    evaluate(add.left()) + evaluate(add.right());

def evaluateLiteral fun(literal: LiteralNode) =>
    literal.value();

There's one immediate with this solution: it's not type-safe. If somebody adds another implementation of Node, then evaluate no longer covers all possible cases. The solution to this problem is to define a union type:

type StandardNode = AddNode | LiteralNode

and update evaluate by changing the type of its argument:

def evaluate fun(node: StandardNode) : Double =>
    match(node,
        case(AddNode, evaluateAdd),
        case(LiteralNode, evaluateLiteral)
    );

def evaluateAdd fun(add: AddNode) =>
    evaluate(add.left()) + evaluate(add.right());

def evaluateLiteral fun(literal: LiteralNode) =>
    literal.value();

This makes evaluate type-safe, but has had the unintended consequence of making evaluateAdd unsafe: add.left() and add.right() both have the type Node, yet evaluate only accepts the narrower type StandardNode. We fix this by adding type parameters to AddNode:

def AddNode class[T] => (myLeft: T, myRight: T) implements Node => {
    public def left fun() => myLeft;
    public def right fun() => myRight;
}

and modifying the type of the argument of evaluateAdd and updating the value of StandardNode:

def evaluateAdd fun(add: AddNode[StandardNode]) =>
    evaluate(add.left()) + evaluate(add.right());
    
type StandardNode = AddNode[StandardNode] | LiteralNode;

(At this point that the interface Node isn't really necessary any more, although there might be other reasons to keep it around.)

Suppose we now add NegateNode and the associated union type ExtendedNode:

def NegateNode class[T] => (myValue: T) => {
    public def value fun() => myValue;
}

type ExtendedNode =
    AddNode[ExtendedNode] |
    NegateNode[ExtendedNode] |
    LiteralNode;

ExtendedNode cannot reuse the definition of StandardNode since AddNode[ExtendedNode] is a subtype of ExtendedNode but not a subtype of StandardNode. The solution is to introduce another type parameter, this time on StandardNode and ExtendedNode:

type StandardNode[T] = AddNode[T] | LiteralNode;

type ExtendedNode[T] = StandardNode[T] | NegateNode[T];

We can then add the appropriate type parameters to the argument of evaluate:

def evaluate fun(node: StandardNode[StandardNode]) : Double =>
    match(node,
        case(AddNode[StandardNode[StandardNode]], evaluateAdd),
        case(LiteralNode, evaluateLiteral)
    );

But this doesn't work either: we need to specify the type parameter to the second reference to StandardNode, which is StandardNode, which also requires a type parameter... and so on. The solution is to add yet more types that fix the type parameter to themselves:

type StandardNodeF = StandardNode[StandardNodeF];
type ExtendedNodeF = ExtendedNode[ExtendedNodeF];

def evaluate fun(node: StandardNodeF) : Double =>
    match(node,
        case(AddNode[StandardNodeF], evaluateAdd),
        case(LiteralNode, evaluateLiteral)
    );

In order to evaluate an instance of ExtendedNode, we'd need to define the following:

def evaluateExtended fun(node: ExtendedNodeF) : Double =>
    match(node,
        case(AddNode[ExtendedNodeF], evaluateAddExtended),
        case(NegateNode[ExtendedNodeF], evaluateNegate),
        case(LiteralNode, evaluateLiteral)
    );

def evaluateAddExtended fun(add: AddNode[ExtendedNodeF]) =>
    evaluateExtended(add.left()) + evaluateExtended(add.right());
    
def evaluateNegate fun(negate: NegateNode[ExtendedNodeF]) =>
    -evaluateExtended(negate.value());

It seems reasonable to write evaluateNegate, but the definition of evaluateAddExtended seems virtually the same as before. The difference is the type parameter for AddNode, and the function we use to evaluate the sub-nodes. So, we introduce a type parameter and argument to abstract both:

def evaluateAdd fun[T] => fun(evaluator: Function[T, Double]) =>
    fun(add: AddNode[T]) =>
        evaluator(add.left()) + evaluator(add.right());

We can also perform a similar transformation on evaluateNegate and evaluate:

def evaluateNegate fun[T] => fun(evaluator: Function[T, Double]) =>
    fun(negate: NegateNode[T]) =>
        -evaluator(negate.value());

def evaluate fun[T] => fun(evaluator: Function[T, Double]) =>
    fun(node: T) : Double =>
        match(node,
            case(AddNode[StandardNodeF], evaluateAdd[T](evaluator)),
            case(LiteralNode, evaluateLiteral)
        );

Now we can rewrite evaluateExtended to use evaluate:

def evaluateExtended fun[T] => (evaluator: Function[T, Double] =>
    fun(node: ExtendedNode[T]) : Double =>
        match(node,
            case(StandardNode[T], evaluate[T](evaluator)),
            case(NegateNode[T], evaluateNegate[T](evaluateNegate))
        );

If we want to call evaluate or evaluateExtended we need to use a similar trick as with StandardNode and ExtendedNode to instantiate the functions:

def evaluateF fun(node: StandardNodeF) =>
    evaluate[StandardNodeF](evaluateF)(node);
    
def evaluateExtendedF fun(node: ExtendedNodeF) =>
    evaluateExtended[ExtendedNodeF](evaluateExtendedF)(node);

Hopefully you can now see how you'd extend the solution to include further node types. Although not covered here, it's also possible to create functions or classes to help combine evaluators, and functions generally written in this style with a bit less boilerplate.

If we imagine an ideal solution to the expression problem, we might argue that this solution is a little verbose, and I'd be inclined to agree. The question is: is it unnecessarily verbose? There's an argument to be made that this exposes the essential complexity of solving the expression problem. Other less verbose solutions hide rather than remove this complexity. On the one hand, this allows one to express the same ideas more succinctly without being cluttered with the machinery of how the solution is achieved, compared to the solution I just described where we have to constantly pass around the type parameter T and evaluator argument. On the other hand, if you want to understand what's going on, you don't have to look very far since everything is explicitly passed around.

On the whole, I think it's simpler than some solutions I've seen to the expression problem, and the verbosity isn't all-consuming. Pretty good for a first go, I reckon.

Topics: Language design, Shed

Shed programming language

Monday 30 April 2012 21:52

Looking around at many of the mainstream languages today, I can't help feeling as if they've become rather large and unwieldy. I'd say this is true of both scripting languages such as Python and Ruby, as well as languages common in the enterprise, such as C# and even Java. I'd argue that features such as (implementation) inheritance, extension methods, properties, attributes/decorators, and null act to complicate each language.

A little over a year ago, I thought about the set of features that I actually used and wondered what a slimmed-down object-orientated programming language would look like. To that end, I've been working on the Shed programming language. A non-exhaustive list of features would be:

Statically-typed
Functions, including
- Closures
- Functions as values
- Lightweight anonymous function syntax
Interfaces
Classes that can implement interfaces
Object literals, which can also implement interfaces

Intentionally missing are:

Nulls
Implementation inheritance
Extension methods
Properties
Function overloading

To give a flavour, here's a naive implementation of the Fibonacci sequence:

def fibonacci fun(n: Double) =>
    if n <= 1
        then n
        else fibonacci(n - 1) + fibonacci(n - 2)

Note that the syntax is still changing. The current implementation of the compiler doesn't support operators yet, so to get the above compiling, you'd have to replace the operators with method calls.

The aim of implementing the language is fun and experimentation, rather than creating a language to write production code in. I'm pretty sure that for the code I tend to write Shed is a good fit, at least for my programming style. I don't know if the language I'm writing applies so well to other domains that I'm less familiar with, but I intend to enjoy finding out, and possibly extending the feature list as I go.

One of the main principles of the language is to optimise for the reader rather than the writer. I spend far more time wondering what some piece of code does that I'd written a few months ago, than typing new bits of code.

Following on from this, variables in Shed may only be introduced into a scope via an explicit declaration, excluding variables in default scope. For instance:

import time.*;

since there's no way to tell exactly what variables have been added to scope. In contrast, the following import adds DateTime to scope:

import time.DateTime;