Thursday, November 09, 2006

Inline allocation refactoring

On PowerPC and x86 processors with SSE2, Factor supports floating point intrinsics, and the final boxing of a floating point value is entirely open-coded, so no GC check was done. Fixing this was the final stage of the Allocation refactoring.

At the end of a basic block, the compiler calls end-basic-block, which boxes any floating point values stored in registers, and moves them to the data stack. It also moves all other integer values and pointers to the stack too. Within a basic block, no stack writes are performed, and stack shuffling takes place at compile time in the form of register renaming. The fix for the inline allocation GC issue was to simply leave 1kb free at the top of the heap at all times, and if any float values were boxed, compile a call to a C function which performs a GC check in end-basic-block. This smells like the old issue that the entire GC refactoring was supposed to avoid (the GC on 75% heap consumption being inadequate) however very little memory is allocated by boxing floats, much less than 1Kb.

However there was one sticking point; in a handful of locations, end-basic-block would be called to generate machine code in a context where certain register values are not to be clobbered. For example, conditionals worked this way: first, the value at the top of the stack would be popped (which might not generate any code at all, if it is in a register), then the basic block would end so registers would sync with the in-memory stack, and then the conditional jump was performed based on the value of this register. This would no longer work, since end-basic-block could now generate a call into C, and thus the register holding the condition would be clobbered. The fix is not to end the basic block at a conditional jump, but instead "split" the virtual register representation of the stack at both branches. So for example if r3 held some known value before a conditional, the compiler could reuse this value in either branch without reading it back in.

Another place where the compiler called end-basic-block was when generating fixnum intrinsics which could overflow. These open-coded intrinsics are more efficient than calls into C, since in the common case of there not being an overflow, the code path re-uses registers rather than reading from the stack, and so on. For example, fixnum+ adds two integers, and if the result overflows, it allocates a bignum. Again, the problem was that it would read two values from the stack into registers (which might not have generated any machine code at all, if the value were already in registers), then it would end the basic block, then it would add them, and if an overflow occurred, it would call into C to allocate a bignum. This worked in the past, but now that the end of a basic block can call GC, the fix here is to open-code the bignum allocation code, and therefore avoid the need to end the basic block before fixnum+ is assembled in.

The other cases where end-basic-block were called remain untouched; namely, before jumping to another word definition, before returning from a word definition, and before an explicit alien call into C. In these cases, no values were held on in registers so a GC is safe. The removal of calls to end-basic-block from around conditionals and overflowing fixnum primitives means the compiler can keep more values in registers, generate less code to shuffle values between registers and memory, and box less floats. And of course, it also means that certain operations won't run out of memory even if plenty is available, simply because no GC check gets done.

I still have to fix the fixnum* intrinsic on both platforms; this results in an inline allocation of a 2-cell bignum, which is trickier than 1-cell bignum like fixnum+. I'm also hunting down a compiler bug which results in some words from contrib/crypto being miscompiled. Once these two items are done, I will release 0.86.

No comments: