What do you mean, homoiconic?

Emboldened by the previous post on bringing FORTH to your browser and still intrigued by xterm.js, this post explores a different dynamic language in Webassembly. We also want to learn new things as we go so we will take a slightly different approach this time.

If you read the previous post (don’t worry if you didn’t, this one stands on its own), you remember that FORTH, to the extent it has any syntax, is a post-fix language. We want a pre-fix language to change things up. To maintain our recent vibe, we want a retro infix language. (I already have a retro infix language in mind for a follow-up post). Enter the grand-daddy of them all: Lisp. Lisp was first made public in McCarthy’s Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I which defines it by means of a meta-circular evaluator.

I like to think of FORTH code as the linear stream of instructions sent to the processing unit. By contrast, Lisp code is a direct representation of the n-ary syntax tree of the program. The representation is known as s-expressions:

This direct relationship between Lisp’s syntax and its internal data structures is what we mean by homoiconicity (ὁμός “same” + εἰκών “likeness”). Computer science instructors often recommend that their students keep logic and data separate. That advice is moot in Lisp where code is data and data is code. I have seen first-hand many examples of professional developers taking this separation too far in industrial software.

I could have attempted writing a lisp interpreter from scratch but it’s so much easier to adapt an existing one. A bunch of clever hackers wrote one small enough to fit on a 512kb boot sector!

Justine and her hacker friends already provide a reference implementation in C. We could have compiled it to WASM with emscripten but that would be cheating. It would also have produced unnecessarily large assets since emscripten shims the required bits of libc. An x86 emulator in WASM like WebVM would have be an alternative but v86 already did this for sectorlisp! In the spirit of minimalism, we chose to do more to end up with less. So we translated lisp.c to assemblyscript. Assemblyscript is to WASM what early C was to ASM: a high-level assembler. Search and replace got us 90% of the way there. The remaining 10% (measured in LoC, not time) were spent figuring out the precise semantics of load<T> and connecting to the pseudo-terminal.

I wished I had read Justine’s post more closely, particularly the section about their memory model. That would have saved me reverse engineering lisp.c, particularly lines which use this definition

#define M (RAM + sizeof(RAM) / sizeof(RAM[0]) / 2)

Unfortunately, load<T> does not tolerate negative offsets the way C does (instead it throws RuntimeError: memory access out of bounds). This forced me to use load<u8>(i, M) when i > 0 and load<T>(M + (i << align)) otherwise.

You can read the finished product here but it’s much more fun to interact with it directly:

An enthusiastic reader looking for a learning challenge might want to tweak the compiled WASM (use wasm2wat if necessary) and pickup some stack optimizations (particularly intra-block stack scheduling ones) which binaryen missed.