Wednesday, September 02, 2009

Eliminating some boilerplate in the compiler

Adding new instructions to the low-level optimizer was too hard. Multiple places had to be updated, and I would do all this by hand:
  • The instruction tuple itself is defined in the compiler.cfg.instructions vocabulary with the INSN: class, which also creates a word with the same name that constructs the instruction and adds it to the current sequence.
  • Instructions which have a destination register have convenient constructors in compiler.cfg.hats which creates a fresh virtual register, creates an instruction with this register as the destination, and outputs it. So for example, 1 2 ^^add would create an add instruction with a fresh destination register, and output this register. It might be equivalent to something like 0 1 2 ##add.
  • Instructions that use virtual registers are be added to the vreg-insn union, and respond to methods defs-vreg, uses-vregs, and temp-vregs in compiler.cfg.def-use. This 'def-use' information is used by SSA construction, dead code elimination, copy coalescing, register allocation, among other things.
  • Methods have to be defined for the instruction in compiler.cfg.renaming.functor. This functor is used to generate code for renaming virtual registers in instructions. The renaming code is used for SSA construction, representation selection, register allocation, among other things.
  • Instructions which use non-integer representations (eg, operations on floats) must respond to methods defs-vreg-rep, uses-vreg-reps, and temp-vreg-reps in compiler.cfg.representations.preferred.
  • Instructions must respond to the generate-insn method, defined in compiler.codegen.
  • Instructions which participate in value numbering must define an "expression" variant, and respond to the >expr method defined in compiler.cfg.value-numbering.expressions

As you can see, this is a lot of duplicated work. I used inheritance and mixins to model relationships between instructions and reduce some of this duplication by defining methods on common superclasses rather than individual instructions where possible, but this never seemed to work out well.

If you look at the latest versions of the source files I linked to above, you'll see that all the repetitive copy-pasted code has been replaced with meta-programming. The new approach extends the INSN: parsing word so now all relevant information is specified in one place. Also there is a new PURE-INSN: parsing word, to mark instructions which participate in value numbering; previously this was done with a superclass. For example,
PURE-INSN: ##add
def: dst/int-rep
use: src1/int-rep src2/int-rep ;

This defines an instruction tuple, an expression tuple, def/use information, representation information, a method for converting instructions to expressions, and a constructor, all at the same time.

For the code generator's generate-insn method, not every instruction has a straightforward implementation; some, like GC checks and FFI calls, postpone a fair amount of work until the very last stage of compilation. However, for most instructions, this method simply extracts all the slots from the instruction tuple, then calls a CPU-specific hook in the cpu.architecture vocabulary to generate code. For these cases, a CODEGEN: parsing word sets up the relevant boilerplate;
CODEGEN: ##add %add

is equivalent to
M: ##add generate-insn [ dst>> ] [ src1>> ] [ src2>> ] tri %add ;

This nicely cleans up all the repetition I mentioned in the bullet points at the top.

I've been aware of this boilerplate for a while but wanted the design of the compiler to settle down first. Now that most of the machinery is in place, I feel comfortable cooking up some complex meta-programming to clean things up. Adding new instructions should be easier. I plan on adding some SSE2 vector operations soon, and this was the main motivation behind this cleanup.

How would you do this in other languages? In Lisp, you would use a macro which expands into a bunch of defclass, defmethod, etc forms. In Java, you might use annotations:
public class AddInsn extends Insn {
@Def @Representation("int") Register dst;
@Use @Representation("int") Register src1;
@Use @Representation("int") Register src2;
}

No comments: