In working on Psil and using it in anger to solve Project Euler problems, I've become acutely aware of the relatively poor performance offered by the interpreter implementation. At least, poor performance relative to normal Python code (since my interpreter itself is written in Python).
To try to address this problem, I decided to try to write a compiler to translate code from Psil to Python. A lot of code (such as arithmetic expressions) essentially maps directly from one language to the other. Function definitions, conditional expressions, and even some types of lambda expressions also have direct counterparts. I expect there will be some tricky cases, and will come upon those in due time.
The general strategy is as follows:
- Begin with the parsed Psil representation, which is a tree of nested lists of symbols. This is what the interpreter would normally run.
- Build a Python abstract syntax tree representing equivalent Python code.
- Generate corresponding Python source code from the AST generated in step 2.
- Call the compiler.compile() function to execute the generated Python source code.
Currently, I am using the compiler module in Python 2.x. Apparently some semi-internal modules such as compiler are going away; Python 3.x has different ways of accessing the internals that I haven't quite figured out yet. Anyway, I'm not using Python 3.x and the compiler.ast module provides exactly the AST structure that I need for now. Step 3 above seems easy enough, it's just reconstructing Python source code that would generate the same AST when compiled (Python 3.x seems to provide a way to directly compile an AST representation to bytecode, skipping the source generation step). Step 2 is the tricky one and represents the core of the Psil compiler.
I've got some simple bits of code working already:
(+ 2 3)
(2) + (3)
(print "hello world")
print "hello world"
(define (square x) (* x x))
def square(x): return (x) * (x)
(if (== a 5) (print "yes") (print "no"))
if (a) == (5): print 'yes' else: print 'no'
I'm still experimenting with the appropriate level of parenthesisation, such as whether the first example should be (2) + (3) or (2 + 3). It won't matter as far as the Python compiler is concerned (as long as the generated code remains correct), but it's just a matter of readability. Even for generated code, I like to be able to read it reasonably easily.
I hope that once I get this all working, compiled code will run much faster. With this mechanism in place it will also be straightforward to declare classes in Psil that work exactly like Python classes (currently there isn't a way to do that).