I've only had a bit of time to work on the Psil compiler, but it's coming along well. The compiler now generates a Python AST for many kinds of example programs.
As mentioned previously, I've tried annotating the AST with line and column information to help locate Python runtime errors. The representation of lists that I'm currently using (just a Python list) doesn't leave a lot of room to store the extra annotation information. For example, the code:
(print (+ foo 5))
is represented as the following Python lists:
[Symbol("print"), [Symbol("+"), Symbol("foo"), 5]]
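To make the limitation concrete, here is a minimal sketch of that representation. The Symbol class below is a hypothetical stand-in (a str subclass), not Psil's actual definition, and to_source() is a made-up helper that prints a form back as Lisp text. A form contains only symbols, literals and nested lists, and a plain Python list refuses new attributes, so there is nowhere to hang a line number.

```python
class Symbol(str):
    """Hypothetical stand-in for Psil's Symbol type: a str subclass."""

    def __repr__(self):
        return f"Symbol({str.__repr__(self)})"


def to_source(form):
    """Print a form (nested lists of Symbols and literals) back as Lisp text."""
    if isinstance(form, list):
        return "(" + " ".join(to_source(item) for item in form) + ")"
    if isinstance(form, Symbol):
        return str(form)
    return repr(form)  # assumption: literals print with Python's repr()


code = [Symbol("print"), [Symbol("+"), Symbol("foo"), 5]]
print(to_source(code))  # (print (+ foo 5))

# A plain list has no spare slot for metadata: this raises AttributeError.
try:
    code.line = 1
except AttributeError as exc:
    print("cannot annotate:", exc)
```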
There's not much room in this representation for source line annotations. The first thing I tried was a global DebugInfo dictionary, keyed by the id() of each list. (Python's id() returns an integer that is unique to an object for as long as that object is alive; in CPython it is the object's memory address.) So, for example, the debug info for the code above might be:
DebugInfo[123456] = [(1, 2), (1, 8)]
DebugInfo[123460] = [(1, 9), (1, 11), (1, 15)]
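A reader could fill in a table like this while parsing. The sketch below is hypothetical: the tokenizer regex, read(), and the rule that an atom is an int if it parses as one and a Symbol otherwise are assumptions, not Psil's actual reader. It records the 1-based line and column where each element starts, keyed by the id() of the list that contains it, and reproduces the two entries above (with real id() values in place of 123456 and 123460).

```python
import re


class Symbol(str):
    """Hypothetical stand-in for Psil's Symbol type: a str subclass."""


# A token is a parenthesis or a run of characters that are neither space nor parens.
TOKEN = re.compile(r"[()]|[^\s()]+")

DebugInfo = {}  # id(list) -> [(line, column), ...], one pair per element


def read(text):
    """Parse one form, recording element positions in DebugInfo."""
    tokens = []  # (token, (line, column)) with 1-based line and column
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN.finditer(line):
            tokens.append((match.group(), (lineno, match.start() + 1)))
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos][0]
        pos += 1
        if tok == "(":
            items, locations = [], []
            while tokens[pos][0] != ")":
                locations.append(tokens[pos][1])  # where the next element starts
                items.append(parse())
            pos += 1  # skip the closing ")"
            DebugInfo[id(items)] = locations
            return items
        try:
            return int(tok)
        except ValueError:
            return Symbol(tok)

    return parse()


form = read("(print (+ foo 5))")
print(DebugInfo[id(form)])     # [(1, 2), (1, 8)]
print(DebugInfo[id(form[1])])  # [(1, 9), (1, 11), (1, 15)]
```

The keys are only meaningful while these exact list objects are alive, which is the weak point of the scheme.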
The first entry, DebugInfo[123456], holds the starting line and column of each element of the first (outer) list; the second, DebugInfo[123460], holds the same for the three elements of the inner list. This seemed like a great idea, and it worked well for small examples. With more complex examples, however, particularly ones involving macro expansion (very common in a Lisp), the original lists were garbage collected and their ids were reused by newly created lists, so DebugInfo entries ended up attached to unrelated source lists! This was tricky to track down.
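The failure mode can be reproduced in a few lines. This is a hypothetical reduction, not Psil's code: record() and expand() are made-up names, and expand() stands in for macro expansion by returning a brand-new list, after which nothing references the original. In CPython, reference counting frees the original list immediately and its id() becomes available again, so whether the final lookup hits the stale entry depends on the allocator; it is not guaranteed either way.

```python
DebugInfo = {}  # id(list) -> list of (line, column) pairs


def record(lst, positions):
    DebugInfo[id(lst)] = positions


def expand(form):
    """Hypothetical macro expansion: builds a new list from the old one."""
    return ["begin"] + form[1:]


form = ["print", "foo"]
record(form, [(1, 2), (1, 8)])
form = expand(form)  # the original list is now unreachable and is freed

# The entry for the freed list is still in the table. Any list allocated
# from here on may be given that id(), and this lookup would then silently
# return positions that belong to a different list.
other = ["unrelated"]
print(DebugInfo.get(id(other)))
```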
I've moved that code to another branch until I figure out what to do with it. I may be able to make it work by keeping the original code, as read from the source, from being garbage collected (by holding a reference to it somewhere else). The information only needs to be kept during the compile phase; once it's embedded in the AST, it doesn't need to hang around any longer.
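A minimal sketch of the keep-a-reference idea, under assumptions: DebugTable and name_node() are hypothetical names, not part of Psil. Each entry stores the list object next to its positions; because the table holds a reference, the list cannot be freed, so its id() cannot be handed to another list while the entry exists. Calling clear() after code generation releases everything. The lineno and col_offset keyword arguments are the standard location attributes of Python's ast nodes; note that col_offset is 0-based.

```python
import ast


class DebugTable:
    """Hypothetical compile-time side table: list -> source positions.

    Storing the list itself in each entry keeps it alive, so its id()
    cannot be reused by another list until the table is cleared.
    """

    def __init__(self):
        self._entries = {}  # id(list) -> (list, [(line, column), ...])

    def record(self, lst, positions):
        self._entries[id(lst)] = (lst, positions)

    def lookup(self, lst):
        entry = self._entries.get(id(lst))
        return entry[1] if entry is not None else None

    def clear(self):
        """Release the pinned lists once positions are embedded in the AST."""
        self._entries.clear()


def name_node(symbol, line, column):
    """Build an ast.Name carrying a reader position (1-based line and column)."""
    # Python's ast uses a 1-based lineno but a 0-based col_offset.
    return ast.Name(id=str(symbol), ctx=ast.Load(), lineno=line, col_offset=column - 1)


table = DebugTable()
inner = ["+", "foo", 5]
table.record(["print", inner], [(1, 2), (1, 8)])  # outer list stays alive in the table
table.record(inner, [(1, 9), (1, 11), (1, 15)])

line, column = table.lookup(inner)[1]
node = name_node("foo", line, column)
print(node.lineno, node.col_offset)  # 1 10
table.clear()
```

The cost is that every list read from the source stays in memory until clear() runs, which matches the observation that the information is only needed during compilation.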
Anyway, more work needed. Just writing this post helped me sort out some ideas. Source on GitHub.