Wednesday, January 31, 2007

ARM port day 2

I worked on the ARM assembler today; it now supports all ARM condition codes (almost every instruction is conditional), as well as "addressing mode 1" instructions (arithmetic), multiplication, and "addressing mode 2" instructions (single value load/store). Multiple load/stores and a few other rarely-used instructions are still missing:
core/compiler/arm/assembler.factor

The last time I was porting Factor to a new platform, the compiler worked differently; it would write generated machine code directly to memory, which made the assembler impossible to unit test. Now, the assembler just appends machine code to the array being built by an enclosing call to make, which makes it very easy to unit test. I did test-driven development today: I used the GNU assembler to assemble bits of code, added a unit test asserting that Factor generated the same machine code for the given input, and coded away until the tests passed:
core/compiler/arm/test.factor
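
To illustrate why appending to a buffer makes testing easy, here is a minimal sketch in C rather than Factor. The names code_buffer, emit_word and code_equals are hypothetical and are not part of Factor; the fixed-size array only stands in for the array built by make. The point is that a test can compare the emitted words against a reference listing without ever executing them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size code buffer; a stand-in for the array built by make. */
#define CODE_BUFFER_CAPACITY 64

typedef struct {
    uint32_t words[CODE_BUFFER_CAPACITY];
    size_t count;
} code_buffer;

/* Append one 32-bit instruction word to the buffer. */
void emit_word(code_buffer *buf, uint32_t word)
{
    assert(buf->count < CODE_BUFFER_CAPACITY);
    buf->words[buf->count++] = word;
}

/* Return 1 if the buffer holds exactly the n expected words, else 0. */
int code_equals(const code_buffer *buf, const uint32_t *expected, size_t n)
{
    return buf->count == n
        && memcmp(buf->words, expected, n * sizeof(uint32_t)) == 0;
}
```

A test then emits a few instructions into a fresh code_buffer and calls code_equals with the words taken from the reference assembler's output.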

ARM assembly is quite interesting, with lots of operand and addressing modes. The instruction encoding also involves lots of bit fields, so I had to come up with a new abstraction to put together integers with shifts and ors.

The PowerPC instruction format is not as complicated as ARM, so the PowerPC assembler used to just have words which would shift and or values by hand:
: insn ( operand opcode -- ) 26 shift bitor , ;

: a-form ( d a b c xo rc -- n )
>r 1 shift >r 6 shift >r 11 shift >r 16 shift >r 21 shift
r> bitor r> bitor r> bitor r> bitor r> bitor ;

: b-form ( bo bi bd aa lk -- n )
>r 1 shift >r 2 shift >r 16 shift >r 21 shift
r> bitor r> bitor r> bitor r> bitor ;

: d-form ( d a simm -- n )
HEX: ffff bitand >r 16 shift >r 21 shift r> bitor r> bitor ;

: sd-form ( d a simm -- n ) swapd d-form ;

: i-form ( li aa lk -- n )
>r 1 shift bitor r> bitor ;

: x-form ( a s b xo rc -- n )
swap
>r 1 shift >r 11 shift >r swap 16 shift >r 21 shift
r> bitor r> bitor r> bitor r> bitor ;

: xfx-form ( d spr xo -- n )
1 shift >r 11 shift >r 21 shift r> bitor r> bitor ;

: xo-form ( d a b oe rc xo -- n )
swap
>r 1 shift >r 10 shift >r 11 shift >r 16 shift >r 21 shift
r> bitor r> bitor r> bitor r> bitor r> bitor ;

This was not too bad for PowerPC, but for ARM this strategy would have been unmanageable from the start. Here is what the same part of the PowerPC assembler looks like with the new abstraction:
: insn ( operand opcode -- ) { 26 0 } bitfield , ;
: a-form ( d a b c xo rc -- n ) { 0 1 6 11 16 21 } bitfield ;
: b-form ( bo bi bd aa lk -- n ) { 0 1 2 16 21 } bitfield ;
: s>u16 ( s -- u ) HEX: ffff bitand ;
: d-form ( d a simm -- n ) s>u16 { 0 16 21 } bitfield ;
: sd-form ( d a simm -- n ) s>u16 { 0 21 16 } bitfield ;
: i-form ( li aa lk -- n ) { 0 1 0 } bitfield ;
: x-form ( a s b xo rc -- n ) { 1 0 11 21 16 } bitfield ;
: xfx-form ( d spr xo -- n ) { 1 11 21 } bitfield ;
: xo-form ( d a b oe rc xo -- n ) { 1 0 10 11 16 21 } bitfield ;

This expresses the intent of the code much more clearly. The ARM assembler uses much more complicated bitfield specifiers, such as:
: (BX) ( Rm l -- )
{
{ 1 24 }
{ 1 21 }
{ BIN: 1111 16 }
{ BIN: 1111 12 }
{ BIN: 1111 8 }
5
{ 1 4 }
{ register 0 }
} insn ;

In the above word, we are building a bit field where some values come from the stack, some are literal, and the last one is obtained by applying the register word to a stack value. Writing this out by hand would be a pain in any language. Fortunately Factor makes it very easy to build mini-DSLs like this.
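
For readers who do not know Factor, the idea can be sketched in C. This is only an approximation under stated assumptions: pack_bitfield, ppc_d_form and arm_bx are hypothetical names, the "top of the stack pairs with the first shift" ordering is inferred from comparing the two PowerPC listings above, and placing the condition code at bit 28 is my assumption about what the ARM insn word adds. The BX layout follows the architectural encoding, in which bits 8 to 19 are all ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* OR together values[i] << shifts[i] for i in 0..n-1. */
uint32_t pack_bitfield(const uint32_t *values, const unsigned *shifts, size_t n)
{
    uint32_t word = 0;
    for (size_t i = 0; i < n; i++) {
        assert(shifts[i] < 32);
        word |= values[i] << shifts[i];
    }
    return word;
}

/* Approximates: d-form ( d a simm -- n ) s>u16 { 0 16 21 } bitfield
   The values are listed top-of-stack first, like the shift list. */
uint32_t ppc_d_form(uint32_t d, uint32_t a, int32_t simm)
{
    const uint32_t values[] = { (uint32_t)simm & 0xffffu, a, d };
    const unsigned shifts[] = { 0, 16, 21 };
    return pack_bitfield(values, shifts, 3);
}

/* Approximates (BX): literal fields and caller-supplied fields share one list.
   link = 0 encodes BX, link = 1 encodes BLX. */
uint32_t arm_bx(uint32_t cond, uint32_t rm, uint32_t link)
{
    assert(cond < 16 && rm < 16 && link < 2);
    const uint32_t values[] = { cond, 1, 1, 0xF, 0xF, 0xF, link, 1, rm };
    const unsigned shifts[] = { 28, 24, 21, 16, 12, 8, 5, 4, 0 };
    return pack_bitfield(values, shifts, 9);
}
```

With the always condition (0xE) and lr (r14), arm_bx produces 0xE12FFF1E, the usual encoding of bx lr. C needs a separate function per instruction format here, whereas the Factor version keeps the whole layout in a single literal.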

Here are some ARM assembly instructions, with GNU and Factor syntax side by side:
sub ip, fp, #4               IP FP 4 SUB
addeqs r0, ip, r9, lsl #2    R0 IP R9 2 <LSL> S ?EQ ADD
ldr r1, [r5, #-4]            R1 R5 4 <-> LDR
str r1, [r5, #8]!            R1 R5 8 <!+> STR

As you can see, it looks a bit funny, but remember that the whole point of this exercise is to write an assembler library which is called dynamically by the compiler to emit code; this is not for users who want to write applications in assembly.
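
To make the first two examples concrete, here is a hedged C sketch of how "addressing mode 1" instructions are laid out according to the ARM architecture. The function and constant names are hypothetical and say nothing about how the Factor assembler is organized; only an unrotated 8-bit immediate and a register shifted left by a constant are covered.

```c
#include <assert.h>
#include <stdint.h>

/* Condition and opcode field values from the ARM architecture. */
enum { ARM_COND_EQ = 0x0, ARM_COND_AL = 0xE };
enum { ARM_OP_SUB = 0x2, ARM_OP_ADD = 0x4 };

/* op{cond}{s} Rd, Rn, #imm8 (immediate operand, no rotation). */
uint32_t arm_dp_imm(uint32_t cond, uint32_t opcode, uint32_t s,
                    uint32_t rd, uint32_t rn, uint32_t imm8)
{
    assert(cond < 16 && opcode < 16 && s < 2);
    assert(rd < 16 && rn < 16 && imm8 < 256);
    return (cond << 28) | (1u << 25) | (opcode << 21) | (s << 20)
         | (rn << 16) | (rd << 12) | imm8;
}

/* op{cond}{s} Rd, Rn, Rm, lsl #shift (register operand, constant shift). */
uint32_t arm_dp_lsl(uint32_t cond, uint32_t opcode, uint32_t s,
                    uint32_t rd, uint32_t rn, uint32_t rm, uint32_t shift)
{
    assert(cond < 16 && opcode < 16 && s < 2);
    assert(rd < 16 && rn < 16 && rm < 16 && shift < 32);
    return (cond << 28) | (opcode << 21) | (s << 20)
         | (rn << 16) | (rd << 12) | (shift << 7) | rm;
}
```

With ip as r12 and fp as r11, arm_dp_imm(ARM_COND_AL, ARM_OP_SUB, 0, 12, 11, 4) corresponds to the sub example, and arm_dp_lsl(ARM_COND_EQ, ARM_OP_ADD, 1, 0, 12, 9, 2) to the addeqs example.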

Another thing I did was sort out how to flush the instruction cache on ARM (thanks to Mackenzie Straight aka eiz in #concatenative), so I provided a proper implementation of the flush_icache function, which the Factor VM calls whenever a new compiled code block is added. On PowerPC, this is implemented in assembly, but on ARM only the kernel is permitted to flush the instruction cache, so it has to go through a system call. This system call is not actually exported by glibc; however, it is easy enough to call it directly using macros from asm/unistd.h:
#define __NR_cacheflush __ARM_NR_cacheflush
INLINE _syscall3(void,cacheflush,void *,start,void *,end,unsigned long,flags);

INLINE void flush_icache(CELL start, CELL len)
{
cacheflush((void *)start,(void *)(start + len),0);
}
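
As an aside, for toolchains where the _syscall3 macro is not available, a possible alternative is the __builtin___clear_cache builtin provided by GCC and Clang, which leaves the platform-specific mechanism to the compiler. This is a sketch, not what the Factor VM does; flush_icache_builtin is a hypothetical name, and uintptr_t stands in for the VM's CELL type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant of flush_icache built on the compiler builtin.
   Assumes GCC or Clang. */
void flush_icache_builtin(uintptr_t start, size_t len)
{
    /* Reject ranges that would wrap around the address space. */
    assert(start + len >= start);
    __builtin___clear_cache((char *)start, (char *)(start + len));
}
```

On targets with coherent instruction caches the builtin typically compiles to nothing, so the same call can stay in portable code.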

Tomorrow, I'll do some more work on the assembler and see what other instructions are needed by the backend. I'll also start the backend proper and have Factor compiling simple (and perhaps complex) words. And of course I'll document what I did in a blog entry, just like I did today and yesterday.
