Sunday 14 October 2012 21:09
As of 11 August 2012,
the Shed programming language is self-hosting. That is, the compiler written in
Shed could successfully compile itself, and also pass of its tests. So pleased
was I at the time, I apparently forgot to note the date, so I had to go and look
back at the commit logs to find the date that it happened.
Shed might be able to compile itself, but the implementation of the compiler isn't anywhere near finished.
To be able to reach this point as quickly as sensible, I've implemented a fairly
minimal subset of the language and the compiler. The next big step is to implement
type-checking in the compiler, along with corresponding language features
such as interfaces.
Since maintaining two separate compilers for the same language seems
redundant, not to mention time-consuming and tedious, I've decided to deprecate
the JavaScript compiler in favour of the Shed compiler. It also forces the
Shed compiler to be of higher quality – I've already added line and
character numbers to error messages, something that the JavaScript compiler had
and I quickly came to miss. Trying to hunt down syntax errors given only
the name of the file and the expected/actual token gets frustrating quickly.
The next couple of things to work on will probably be to add reference resolution,
and to modify the parser to make new lines significant so that I can get rid of
all those semicolons.
Topics: Shed
Thursday 4 October 2012 20:47
In Shed, we sometimes define functions that usually delegate to a method,
but also have some special cases or defaults if that method isn't implemented.
For instance, the function represent
should produce a string
representation of an object. If the argument implements the method
represent
, it calls that method. Otherwise, it uses the name
of the class of the object to generate the string. In code:
def represent fun(value: any) =>
if isInstance(value, Representable) then
value.represent()
else
defaultRepresent(value)
A problem arises when we want to call the function represent
from
within an implementation of the represent
method. For instance,
if we were implementing represent
for the Some
class:
def Some class(value: any) => {
// Skipping the rest of the implementation of Some for brevity
def represent fun() =>
"Some(" + represent(value) + ")"
}
This code won't compile since we're calling represent
with a
single argument, value
, but within the scope of the class,
represent
refers to the zero-argument function that implements
represent
specifically for Some
.
There are several possible solutions to this problem, but the simplest one
seems to be to use a different name for the method than for the corresponding
function. For consistency, we can introduce the convention that the method name should be a simple variation on the
function name. For instance, we might choose to use a leading underscore:
def represent fun(value: any) =>
if isInstance(value, Representable) then
value._represent()
else
defaultRepresent(value)
Although the leading underscore is perhaps a little ugly, that ugliness does
help to reinforce the idea that you shouldn't be calling
the method _represent
directly. Instead, you should be using
the represent
method. More generally, instead of calling a method
_foo
, you should be calling foo
(unless you're
actually implementing foo
).
Topics: Functional programming, Language design, Shed
Monday 30 April 2012 22:31
The expression is a tricky problem in many languages that asks: given a set of functions that operate over a set of types, how do we allow both the set of functions and the set of types that those functions operate over be extended without losing type safety? If you're not familiar with the problem, I recommend reading the explanation by the author of Magpie. For our purposes, we'll use an abstract syntax tree for mathematical expressions as our data type. To start, let's have two sorts of node: addition operators and literals.
interface Node {}
def AddNode class(myLeft: Node, myRight: Node) implements Node => {
public def left fun() => myLeft;
public def right fun() => myRight;
}
def LiteralNode class(myValue: Double) implements Node => {
public def value fun() => myValue;
}
(As an aside: due to the design of the language, we can't give the arguments to a class the same name as it's getter, for instance def value fun() => value
, since the body of the function would refer to the function rather than the class argument. Prepending each of the arguments with my
is a poor solution, and although I have several ideas on how to rectify this, I'm still pondering on the simplest, cleanest solution.)
Suppose we want to implement a function evaluate
that evaluates the expression to a single value. Our first attempt at an implementation might look like this:
def evaluate fun(node: Node) : Double =>
match(node,
case(AddNode, evaluateAdd),
case(LiteralNode, evaluateLiteral)
);
def evaluateAdd fun(add: AddNode) =>
evaluate(add.left()) + evaluate(add.right());
def evaluateLiteral fun(literal: LiteralNode) =>
literal.value();
There's one immediate with this solution: it's not type-safe. If somebody adds another implementation of Node
, then evaluate
no longer covers all possible cases. The solution to this problem is to define a union type:
type StandardNode = AddNode | LiteralNode
and update evaluate
by changing the type of its argument:
def evaluate fun(node: StandardNode) : Double =>
match(node,
case(AddNode, evaluateAdd),
case(LiteralNode, evaluateLiteral)
);
def evaluateAdd fun(add: AddNode) =>
evaluate(add.left()) + evaluate(add.right());
def evaluateLiteral fun(literal: LiteralNode) =>
literal.value();
This makes evaluate
type-safe, but has had the unintended consequence of making evaluateAdd
unsafe: add.left()
and add.right()
both have the type Node
, yet evaluate
only accepts the narrower type StandardNode
. We fix this by adding type parameters to AddNode
:
def AddNode class[T] => (myLeft: T, myRight: T) implements Node => {
public def left fun() => myLeft;
public def right fun() => myRight;
}
and modifying the type of the argument of evaluateAdd
and updating the value of StandardNode
:
def evaluateAdd fun(add: AddNode[StandardNode]) =>
evaluate(add.left()) + evaluate(add.right());
type StandardNode = AddNode[StandardNode] | LiteralNode;
(At this point that the interface Node
isn't really necessary any more, although there might be other reasons to keep it around.)
Suppose we now add NegateNode
and the associated union type ExtendedNode
:
def NegateNode class[T] => (myValue: T) => {
public def value fun() => myValue;
}
type ExtendedNode =
AddNode[ExtendedNode] |
NegateNode[ExtendedNode] |
LiteralNode;
ExtendedNode
cannot reuse the definition of StandardNode
since AddNode[ExtendedNode]
is a subtype of ExtendedNode
but not a subtype of StandardNode
. The solution is to introduce another type parameter, this time on StandardNode
and ExtendedNode
:
type StandardNode[T] = AddNode[T] | LiteralNode;
type ExtendedNode[T] = StandardNode[T] | NegateNode[T];
We can then add the appropriate type parameters to the argument of evaluate
:
def evaluate fun(node: StandardNode[StandardNode]) : Double =>
match(node,
case(AddNode[StandardNode[StandardNode]], evaluateAdd),
case(LiteralNode, evaluateLiteral)
);
But this doesn't work either: we need to specify the type parameter to the second reference to StandardNode
, which is StandardNode
, which also requires a type parameter... and so on. The solution is to add yet more types that fix the type parameter to themselves:
type StandardNodeF = StandardNode[StandardNodeF];
type ExtendedNodeF = ExtendedNode[ExtendedNodeF];
def evaluate fun(node: StandardNodeF) : Double =>
match(node,
case(AddNode[StandardNodeF], evaluateAdd),
case(LiteralNode, evaluateLiteral)
);
In order to evaluate an instance of ExtendedNode
, we'd need to define the following:
def evaluateExtended fun(node: ExtendedNodeF) : Double =>
match(node,
case(AddNode[ExtendedNodeF], evaluateAddExtended),
case(NegateNode[ExtendedNodeF], evaluateNegate),
case(LiteralNode, evaluateLiteral)
);
def evaluateAddExtended fun(add: AddNode[ExtendedNodeF]) =>
evaluateExtended(add.left()) + evaluateExtended(add.right());
def evaluateNegate fun(negate: NegateNode[ExtendedNodeF]) =>
-evaluateExtended(negate.value());
It seems reasonable to write evaluateNegate
, but the definition of evaluateAddExtended
seems virtually the same as before. The difference is the type parameter for AddNode
, and the function we use to evaluate the sub-nodes. So, we introduce a type parameter and argument to abstract both:
def evaluateAdd fun[T] => fun(evaluator: Function[T, Double]) =>
fun(add: AddNode[T]) =>
evaluator(add.left()) + evaluator(add.right());
We can also perform a similar transformation on evaluateNegate
and evaluate
:
def evaluateNegate fun[T] => fun(evaluator: Function[T, Double]) =>
fun(negate: NegateNode[T]) =>
-evaluator(negate.value());
def evaluate fun[T] => fun(evaluator: Function[T, Double]) =>
fun(node: T) : Double =>
match(node,
case(AddNode[StandardNodeF], evaluateAdd[T](evaluator)),
case(LiteralNode, evaluateLiteral)
);
Now we can rewrite evaluateExtended
to use evaluate
:
def evaluateExtended fun[T] => (evaluator: Function[T, Double] =>
fun(node: ExtendedNode[T]) : Double =>
match(node,
case(StandardNode[T], evaluate[T](evaluator)),
case(NegateNode[T], evaluateNegate[T](evaluateNegate))
);
If we want to call evaluate
or evaluateExtended
we
need to use a similar trick as with StandardNode
and
ExtendedNode
to instantiate the functions:
def evaluateF fun(node: StandardNodeF) =>
evaluate[StandardNodeF](evaluateF)(node);
def evaluateExtendedF fun(node: ExtendedNodeF) =>
evaluateExtended[ExtendedNodeF](evaluateExtendedF)(node);
Hopefully you can now see how you'd extend the solution to include further
node types. Although not covered here, it's also possible to create functions
or classes to help combine evaluators, and functions generally written in
this style with a bit less boilerplate.
If we imagine an ideal solution to the expression problem, we might argue
that this solution is a little verbose, and I'd be inclined to agree.
The question is: is it unnecessarily verbose? There's an argument to be made
that this exposes the essential complexity of solving the expression problem.
Other less verbose solutions hide rather than remove this complexity. On the one
hand, this allows one to express the same ideas more succinctly without being
cluttered with the machinery of how the solution is achieved, compared to the
solution I just described where we have to constantly pass around the type
parameter T
and evaluator
argument. On the other
hand, if you want to understand what's going on, you don't have to look
very far since everything is explicitly passed around.
On the whole, I think it's simpler than some solutions I've seen to
the expression problem, and the verbosity isn't all-consuming. Pretty good
for a first go, I reckon.
Topics: Language design, Shed
Monday 30 April 2012 21:52
Looking around at many of the mainstream languages today, I can't help feeling
as if they've become rather large and unwieldy. I'd say this is true of both
scripting languages such as Python and Ruby, as well as languages common in the
enterprise, such as C# and even Java. I'd argue that features such as
(implementation) inheritance, extension methods, properties, attributes/decorators,
and null act to complicate each language.
A little over a year ago, I thought about the set of features that I actually
used and wondered what a slimmed-down object-orientated programming language
would look like. To that end, I've been working on the Shed programming language. A
non-exhaustive list of features would be:
- Statically-typed
- Functions, including
- Closures
- Functions as values
- Lightweight anonymous function syntax
- Interfaces
- Classes that can implement interfaces
- Object literals, which can also implement interfaces
Intentionally missing are:
- Nulls
- Implementation inheritance
- Extension methods
- Properties
- Function overloading
To give a flavour, here's a naive implementation of the Fibonacci sequence:
def fibonacci fun(n: Double) =>
if n <= 1
then n
else fibonacci(n - 1) + fibonacci(n - 2)
Note that the syntax is still changing. The current implementation of the
compiler doesn't support operators yet, so to get the above compiling, you'd have
to replace the operators with method calls.
The aim of implementing the language
is fun and experimentation, rather than creating a language to write
production code in. I'm pretty sure that for the code I tend to write Shed is a
good fit, at least for my programming style. I don't know if the language
I'm writing applies so well to other domains that I'm less familiar with, but I
intend to enjoy finding out, and possibly extending the feature list as I go.
One of the main principles of the language is to optimise for the reader
rather than the writer. I spend far more time wondering
what some piece of code does that I'd written a few months ago, than typing
new bits of code.
Following on from this, variables in Shed may only be introduced into a scope
via an explicit declaration, excluding variables in default scope. For instance:
import time.*;
since there's no way to tell exactly what variables have been added to scope. In
contrast, the following import adds DateTime to scope:
import time.DateTime;
As I implement various bits of Shed, I'll continue to post interesting
problems and solutions I come across. Until then...
Topics: Language design, Shed