A simple interpreter from scratch in Python (part 4)

Published on 2011-05-12
Edited on 2014-04-12
Tagged: compilers imp python

This article is part of the series "A simple interpreter from scratch in Python".

A simple interpreter from scratch in Python (part 1)
A simple interpreter from scratch in Python (part 2)
A simple interpreter from scratch in Python (part 3)
A simple interpreter from scratch in Python (part 4)

In the last three articles in this series, we've built a lexer, parser, and AST for our toy language, IMP. We even wrote our own parser combinator library. In this final article, we will write the last component of the interpreter: the evaluator.

Let's think about how a program is evaluated normally. At any given time, there is some "point of control," which indicates what statement the program is going to evaluate next. When the next statement is evaluated, it modifies the state of the program, both by advancing the "point of control" and by changing the values of variables.

To evaluate an IMP program, we need three things.

Point of control - we need to know the next statement or expression to evaluate.
Environment - we need a way to model the "state of the program."
Evaluation functions - we need to know how the state and point of control should be modified for each statement or expression.

Point of control is the easiest, at least for IMP. We have arranged our intermediate representation in a tree-like structure. We'll just call the evaluation function for the top-level statement, which will recursively call evaluation functions for statements and expressions inside it. We will essentially be using Python's point of control as our own point of control. This wouldn't be as easy for languages with more complicated control structures like functions or exceptions, but we can keep it simple for IMP.

The environment is also easy. IMP only has global variables, so we can model the environment with a simple Python dictionary. Whenever an assignment is made, we will update the variable's value in the dictionary.

Evaluation functions are really the only thing we need to think about. Each kind of expression will have an evaluation function which will take the current environment and return a value. Arithmetic expressions will return integers, and Boolean expressions will return true or false. Expressions have no side effects, so the environment will not be modified. Each kind of statement will also have an evaluation function. Statements act by modifying the environment so no result will be returned.

Defining evaluation functions

We will define the evaluation functions as methods on our AST classes. This gives each function direct access to the structure it is evaluating. Here are the functions for arithmetic expressions:

class IntAexp(Aexp):
    ...
    def eval(self, env):
        return self.i

class VarAexp(Aexp):
    ...
    def eval(self, env):
        if self.name in env:
            return env[self.name]
        else:
            return 0

class BinopAexp(Aexp):
    ...
    def eval(self, env):
        left_value = self.left.eval(env)
        right_value = self.right.eval(env)
        if self.op == '+':
            value = left_value + right_value
        elif self.op == '-':
            value = left_value - right_value
        elif self.op == '*':
            value = left_value * right_value
        elif self.op == '/':
            value = left_value / right_value
        else:
            raise RuntimeError('unknown operator: ' + self.op)
        return value

You'll notice we have some extra logic for the case where the programmer uses a variable that hasn't been defined yet (one that isn't in the environment dictionary). To keep things simple and to avoid writing an error handling system, we give all undefined variables a 0 value.

In BinopAexp we handle the "unknown operator" case by raising a RuntimeError. The parser cannot create an AST with unknown operators, so we don't have to worry about this in practice. However, if someone constructs their own AST without the parser, all bets are off.

Here are the functions for Boolean expressions:

class RelopBexp(Bexp):
    ...
    def eval(self, env):
        left_value = self.left.eval(env)
        right_value = self.right.eval(env)
        if self.op == '<':
            value = left_value < right_value
        elif self.op == '<=':
            value = left_value <= right_value
        elif self.op == '>':
            value = left_value > right_value
        elif self.op == '>=':
            value = left_value >= right_value
        elif self.op == '=':
            value = left_value == right_value
        elif self.op == '!=':
            value = left_value != right_value
        else:
            raise RuntimeError('unknown operator: ' + self.op)
        return value

class AndBexp(Bexp):
    ...
    def eval(self, env):
        left_value = self.left.eval(env)
        right_value = self.right.eval(env)
        return left_value and right_value

class OrBexp(Bexp):
    ...
    def eval(self, env):
        left_value = self.left.eval(env)
        right_value = self.right.eval(env)
        return left_value or right_value

class NotBexp(Bexp):
    ...
    def eval(self, env):
        value = self.exp.eval(env)
        return not value

These are pretty straightforward. They just use the built-in Python relational and logic operators.

Here are the evaluation functions for each kind of statement:

 class AssignStatement(Statement):
    ...
    def eval(self, env):
        value = self.aexp.eval(env)
        env[self.name] = value

class CompoundStatement(Statement):
    ...
    def eval(self, env):
        self.first.eval(env)
        self.second.eval(env)

class IfStatement(Statement):
    ...
    def eval(self, env):
        condition_value = self.condition.eval(env)
        if condition_value:
            self.true_stmt.eval(env)
        else:
            if self.false_stmt:
                self.false_stmt.eval(env)

class WhileStatement(Statement):
    ...
    def eval(self, env):
        condition_value = self.condition.eval(env)
        while condition_value:
            self.body.eval(env)
            condition_value = self.condition.eval(env)

For AssignStatement, we just evaluate the arithmetic expression on the right side, and update the environment with the resulting value. The programmer is free to redefine variables that have already been assigned.

In CompoundStatement, we just evaluate each statement, one after the other. Keep in mind that a CompoundStatement is allowed anywhere a statement is allowed, so long chains of statements are encoded as nested compound statements.

In IfStatement, we first evaluate the Boolean expression condition. If it's true, we evaluate the true statement. If it's false and a false statement was provided, we evaluate the false statement.

In WhileStatement we evaluate the condition at the beginning to check whether the loop body should even be executed once. The condition is evaluated after each iteration of the loop to check if it's still true.

Putting it all together

We have now built every major component of our interpreter. We just need a driver program to integrate them:

#!/usr/bin/env python

import sys
from imp_parser import *
from imp_lexer import *

def usage():
    sys.stderr.write('Usage: imp filename\\n')
    sys.exit(1)

if __name__ == '__main__':
    if len(sys.argv) != 2:
        usage()
    filename = sys.argv[1]
    text = open(filename).read()
    tokens = imp_lex(text)
    parse_result = imp_parse(tokens)
    if not parse_result:
        sys.stderr.write('Parse error!\\n')
        sys.exit(1)
    ast = parse_result.value
    env = {}
    ast.eval(env)

    sys.stdout.write('Final variable values:\\n')
    for name in env:
        sys.stdout.write('%s: %s\\n' % (name, env[name]))

The program expects exactly one argument: the name of the file to interpret. It reads in the file and passes it through the lexer and parser, reporting an error if it finds one. We extract the AST from the parse result and evaluate it using an empty dictionary as the starting environment. Since IMP has no way to output any values to the terminal, we just print everything in the environment after the program is finished so we know that something actually happened.

Here is the canonical factorial example:

n := 5;
p := 1;
while n > 0 do
  p := p * n;
  n := n - 1
end

We evaluate this as follows:

$ ./imp.py hello.imp
Final variable values:
p: 120
n: 0

Conclusion

In the last four articles, we built an interpreter for a simple language from scratch. While the language itself is not very useful, the interpreter is extensible, and its major components (the lexer and parser combinator libraries) are reusable.

I hope this provides a good starting point for people to start experimenting with language design. Here are some ideas for exercises:

Make variables local to the scope they are defined in (for instance, a variable defined in a loop should not be valid outside of the loop).
Add a for-loop construct
Add scan and print statements for user input and output
Add functions