A JIT Compiler for Brainf*ck

I recently read this article by Josh Haberman where he demonstrates a JIT for the Brainf*ck language (henceforth referred to as BF). However it wasn’t very different from a regular AOT compiler — the entire BF script is read in and gets compiled (at runtime) into one giant function. I figured it would be a fun exercise to write a more traditional BF VM (if there is such a thing), specifically one that interprets a BF script and JITs only the hotter loops. The result is up on GitHub and this blog post will try to explain and justify some of the design decisions so as to convince myself that this wasn’t a complete waste of time.

The Bytecode

Neither the interpreter nor the JIT understands BF source; instead they work with a bytecode based IR. The bc_from_source function in bytecode.c parses BF source and generates a stream of bytecodes representing it. The term bytecode is somewhat misleading since a bytecode is four bytes long, usually followed by one or more four-byte payloads.

Translating to bytecode is important for performance. Individual BF instructions do very little and sequences like +++ can be quickly executed at one go (by increasing the current cell’s value by 3, in this case) than by three separate bytecode dispatches. To this end, bc_from_source attempts to do some basic peephole optimizations by recognizing common BF idioms. For instance, one way to add the contents of the current cell to a cell three indices away (and clear the current cell in the process) is [->>>+<<<]. bc_from_source recognizes this pattern and emits a specialized bytecode, BC_MOVE_VALUE, which is specially handled by the interpreter and the JIT.

The Interpreter

The interpreter is quite simplistic. It uses GCC’s labels as values feature to provide for a quick dispatch mechanism. Every loop has a heat counter associated with it, whose numerical value is stored as payload following the BC_LOOP_BEGIN bytecode. The interpreter checks this value and calls in the JIT once a method is hot. The JIT does its thing, and stores away a pointer to the generated code in the program_t structure. The interpreter then calls into this generated code. The compiler also patches the BC_LOOP_BEGIN to a BC_COMPILED_LOOP so that the next iteration directly executes the JITed version of the loop.

The Compiler

The compiler uses dynasm to generate a code-generator; and as a consequence most of the compiler really lives in codegen.dasc. This might change if I later add some analysis or optimization passes which run only when compiling the bytecode, but for now the codegen practically is the compiler. The only compiler specific optimization is that the compiler tries to not emit a redundant cmp instruction if the last instruction makes it unnecessary (by virtue of being an add instruction with a negative constant as one operand).

Performance

Speed is basically what it all boils down to. Running mandlebrot.bf takes 1.25 seconds on average, which is at par with compiling BF to C and then compiling the resultant C file with gcc -O3. While I’d call this pretty good, I think there is still some room for improvement, perhaps studying the machine code generated for typical BF programs will yield some insight.

Future Work

There are two improvements I can think of, believe will improve performance, but haven’t had time to implement:

  • A register allocator should improve performance, but probably not as much as in normal programming languages — BF is a very cache friendly language. 🙂
  • Some basic auto-vectorization. Maybe we can say "this loop simply adds these two blocks of cells together, one by one" and emit some SSE instructions to that effect.
Advertisements
This entry was posted in Computers and tagged , , . Bookmark the permalink.

One Response to A JIT Compiler for Brainf*ck

  1. Pingback: A Brainf*ck JIT | The Limbeck

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s