Factor: a practical stack language: December 2008

I added efficient arrays of integers and floats to Factor. We've had byte-arrays and float-arrays for a while; this code is essentially a generalization of these to every primitive C type: chars, shorts, ints, longs, long longs, unsigned version of all of these, as well as floats and doubles.

Specialized arrays

For each primitive C type, there is a vocabulary named specialized-arrays.type, which defines a class named type-array, and several related words: a constructor, and a conversion word for converting existing sequences to specialized arrays of this type. As with all sequence types, the class implements methods for length, nth and set-nth; there is also a literal syntax for parsing and printing specialized arrays. An example:

USE: specialized-arrays.int
int-array{ 584 -13 11 } [ 10 + ] map

Specialized vectors

There are also resizeable versions of all of these, in specialized-vectors.type. These work like ordinary vectors; you can use words such as push and pop, accumulating a stream of values in a vector.

Direct arrays

A third variant is also provided. These are the "direct specialized arrays". Sometimes when working with a C library, the library hands you a pointer to some data which is to be viewed as an array of primitive values. Direct arrays accomplish this. For every primitive C type, the specialized-arrays.direct.type vocabulary provides a word named <direct-type-array> which takes an alien pointer and a length. It wraps them in a tuple which responds to the sequence protocol by reading from memory.

Memory-mapped files

Direct arrays are useful in another context, and that is when working with memory-mapped files. Again, for every primitive C type, there is a vocabulary, io.mmap.type. It defines two words, <mapped-type-array> and with-mapped-type-file. Here is an example usage of the latter: we're mapping in a file of floating point numbers and adding them up:

[ 0.0 [ + ] reduce ] with-mapped-float-file

We've had io.mmap for a while, but specialized arrays make this library a lot more useful.

This is all really powerful stuff, because on one hand, we have these high-level data types that behave like any other Factor sequence and can be used interchangeably with sequence utility words, but on the other hand, not only do they offer a space savings over heterogeneous arrays by avoiding boxing, they also have the potential to improve performance -- the low-level primitives they're built on are already optimized by the compiler, and it is possible to write code which operates on specialized float and integer arrays without any boxing at all.

Mersenne Twister

A good example of an algorithm which benefits from specialized arrays is the Mersenne Twister random number generator. The Factor implementation is very straightforward, and shorter than the C version. I changed it to use a uint-array instead of a normal array to store the random generator state. After making a few more tweaks, the time taken to generate a 100,000,000-element byte array of random bytes on my Mac Book Pro went down to 5 seconds, from 37 seconds!

Here is the main method of the Mersenne Twister, random-32*, and its surrounding utility words:

: mt-n 624 ; inline
: mt-m 397 ; inline
: mt-a HEX: 9908b0df ; inline

: mersenne-wrap ( n -- n' )
    dup mt-n > [ mt-n - ] when ; inline

: wrap-nth ( n seq -- obj )
    [ mersenne-wrap ] dip nth-unsafe ; inline

: set-wrap-nth ( obj n seq -- )
    [ mersenne-wrap ] dip set-nth-unsafe ; inline

: calculate-y ( n seq -- y )
    [ wrap-nth 31 mask-bit ]
    [ [ 1+ ] [ wrap-nth ] bi* 31 bits ] 2bi bitor ; inline

: (mt-generate) ( n seq -- next-mt )
    [
        calculate-y
        [ 2/ ] [ odd? mt-a 0 ? ] bi bitxor
    ] [
        [ mt-m + ] [ wrap-nth ] bi*
    ] 2bi bitxor ; inline

: mt-generate ( mt -- )
    [
        mt-n swap seq>> '[
            _ [ (mt-generate) ] [ set-wrap-nth ] 2bi
        ] each
    ] [ 0 >>i drop ] bi ; inline

: mt-temper ( y -- yt )
    dup -11 shift bitxor
    dup 7 shift HEX: 9d2c5680 bitand bitxor
    dup 15 shift HEX: efc60000 bitand bitxor
    dup -18 shift bitxor ; inline

: next-index  ( mt -- i )
    dup i>> dup mt-n < [ nip ] [ drop mt-generate 0 ] if ; inline

M: mersenne-twister random-32* ( mt -- r )
    [ next-index ]
    [ seq>> wrap-nth mt-temper ]
    [ [ 1+ ] change-i drop ] tri ;

A naive implementation of Factor would struggle with this code because of method dispatch overhead (all arithmetic and sequence access is generic), and because of overflow checks on integer arithmetic (integers have arbitrary precision in Factor). Thankfully, the Factor implementation is not naive. The high-level optimizer, applied to the above code, produces the following output -- note that the translation from high-level IR back to Factor code is imprecise, and some IR nodes cannot be expressed, hence the "COMPLEX SHUFFLE" strings that appear here:

[
    dup >r dup 3 slot dup 624 fixnum<
    [ nip ] [
        drop dup >r 624 swap 2 slot
        >r r> >r r> 0 -rot \ ( gensym ) [
            pick pick fixnum< [
                swap >r 2dup "COMPLEX SHUFFLE" >r r> 2dup
                "COMPLEX SHUFFLE" 2dup "COMPLEX SHUFFLE" 2dup
                "COMPLEX SHUFFLE" >r r> 3 slot
                swap 4 fixnum*fast
                alien-unsigned-4 2147483648 fixnum-bitand
                "COMPLEX SHUFFLE" >r 1 fixnum+fast
                r> >r r> 3 slot swap 4 fixnum*fast
                alien-unsigned-4 2147483647 fixnum-bitand
                fixnum-bitor dup >r -1 fixnum-shift-fast
                r> 1 fixnum-bitand 1 eq? 2567483615 0 rot
                [ drop ] [ nip ] if fixnum-bitxor
                "COMPLEX SHUFFLE" >r 397 fixnum+fast
                r> >r dup 624 fixnum>
                [ 624 fixnum-fast ] [ ] if r> 3 slot
                swap 4 fixnum*fast
                alien-unsigned-4 fixnum-bitxor
                "COMPLEX SHUFFLE" >r r> 3 slot
                swap 4 fixnum*fast
                set-alien-unsigned-4 "COMPLEX SHUFFLE" r> swap
                "COMPLEX SHUFFLE" 1 fixnum+fast
                "COMPLEX SHUFFLE" ( gensym )
            ] [ 3drop ] if
        ] label r> >r 0 r> 3 set-slot 0
    ] if r> dup >r 2 slot >r r> 3 slot swap 4 fixnum*fast
    alien-unsigned-4 dup -11 fixnum-shift-fast fixnum-bitxor
    dup 7 fixnum-shift-fast 2636928640 fixnum-bitand
    fixnum-bitxor dup 15 fixnum-shift-fast
    4022730752 fixnum-bitand fixnum-bitxor
    dup -18 fixnum-shift-fast fixnum-bitxor r> dup >r 3 slot
    1 fixnum+fast r> swap swap >r r> 3 set-slot
]

Only primitive calls remain; everything else has evaporated after 11 distinct optimization passes. A lot of redundant stack shuffling still remains; the low-level optimizer eliminates that and produces a sequence of low-level IR instructions:

=== word: mersenne-twister=>random-32*, label: mersenne-twister=>random-32*

_label 0 
##prologue 
_label 1 
##inc-r 1 
##inc-d 1 
##peek V int-regs 1368412 D 1 
##replace V int-regs 1368412 R 0 
##slot-imm V int-regs 1368417 V int-regs 1368412 3 2 
##replace V int-regs 1368417 D 0 
_compare-imm-branch 3 V int-regs 1368417 4992 cc>= 
_label 2 
##inc-d -1 
##peek V int-regs 1368425 D -1 
##replace V int-regs 1368425 D 0 
_branch 13 
_label 3 
##inc-r 1 
##inc-d 1 
##peek V int-regs 1368427 D 2 
##replace V int-regs 1368427 R 0 
##load-immediate V int-regs 1368429 4992 
##slot-imm V int-regs 1368434 V int-regs 1368427 2 2 
##load-immediate V int-regs 1368439 0 
##replace V int-regs 1368434 D 0 
##replace V int-regs 1368429 D 1 
##replace V int-regs 1368439 D 2 
##loop-entry 
_label 4 
##peek V int-regs 1368443 D 2 
_compare-imm-branch 11 V int-regs 1368443 4992 cc>= 
_label 5 
##inc-r 7 
##peek V int-regs 1368453 D 1 
##peek V int-regs 1368454 D 0 
##replace V int-regs 1368453 R 6 
##peek V int-regs 1368456 D 2 
##replace V int-regs 1368454 R 4 
##replace V int-regs 1368456 R 5 
##replace V int-regs 1368454 R 2 
##replace V int-regs 1368456 R 3 
##replace V int-regs 1368454 R 0 
##replace V int-regs 1368456 R 1 
##slot-imm V int-regs 1368478 V int-regs 1368454 3 2 
##copy V int-regs 1368483 V int-regs 1368456 
##shl-imm V int-regs 1368483 V int-regs 1368483 2 
##copy V int-regs 1368486 V int-regs 1368483 
##sar-imm V int-regs 1368486 V int-regs 1368486 3 
##add-imm V int-regs 1368487 V int-regs 1368478 13 
##add V int-regs 1368489 V int-regs 1368486 V int-regs 1368487 
##alien-unsigned-4 V int-regs 1368490 V int-regs 1368489 
##copy V int-regs 1368491 V int-regs 1368490 
##shl-imm V int-regs 1368491 V int-regs 1368491 3 
##load-immediate V int-regs 1368492 17179869184 
##copy V int-regs 1368495 V int-regs 1368491 
##and V int-regs 1368495 V int-regs 1368495 V int-regs 1368492 
##add-imm V int-regs 1368501 V int-regs 1368456 8 
##copy V int-regs 1368512 V int-regs 1368501 
##shl-imm V int-regs 1368512 V int-regs 1368512 2 
##copy V int-regs 1368515 V int-regs 1368512 
##sar-imm V int-regs 1368515 V int-regs 1368515 3 
##add V int-regs 1368518 V int-regs 1368515 V int-regs 1368487 
##alien-unsigned-4 V int-regs 1368519 V int-regs 1368518 
##copy V int-regs 1368520 V int-regs 1368519 
##shl-imm V int-regs 1368520 V int-regs 1368520 3 
##load-immediate V int-regs 1368521 17179869176 
##copy V int-regs 1368524 V int-regs 1368520 
##and V int-regs 1368524 V int-regs 1368524 V int-regs 1368521 
##copy V int-regs 1368527 V int-regs 1368495 
##or V int-regs 1368527 V int-regs 1368527 V int-regs 1368524 
##copy V int-regs 1368532 V int-regs 1368527 
##sar-imm V int-regs 1368532 V int-regs 1368532 4 
##copy V int-regs 1368533 V int-regs 1368532 
##shl-imm V int-regs 1368533 V int-regs 1368533 3 
##replace V int-regs 1368533 D 2 
##copy V int-regs 1368537 V int-regs 1368527 
##and-imm V int-regs 1368537 V int-regs 1368537 8 
##load-immediate V int-regs 1368542 20539868920 
##load-immediate V int-regs 1368543 0 
##replace V int-regs 1368543 D 0 
##replace V int-regs 1368542 D 1 
_compare-imm-branch 7 V int-regs 1368537 8 cc/= 
_label 6 
##inc-d -1 
_branch 8 
_label 7 
##inc-d -1 
##peek V int-regs 1368550 D -1 
##replace V int-regs 1368550 D 0 
_label 8 
##inc-r -1 
##peek V int-regs 1368551 D 1 
##peek V int-regs 1368552 D 0 
##copy V int-regs 1368553 V int-regs 1368551 
##xor V int-regs 1368553 V int-regs 1368553 V int-regs 1368552 
##replace V int-regs 1368553 D 1 
##peek V int-regs 1368554 R 0 
##peek V int-regs 1368555 R -1 
##add-imm V int-regs 1368559 V int-regs 1368554 3176 
##replace V int-regs 1368555 R 0 
##replace V int-regs 1368559 D 0 
_compare-imm-branch 10 V int-regs 1368559 4992 cc<= 
_label 9 
##peek V int-regs 1368569 D 0 
##sub-imm V int-regs 1368570 V int-regs 1368569 4992 
##replace V int-regs 1368570 D 0 
_label 10 
##inc-r -6 
##inc-d 1 
##peek V int-regs 1368571 R -6 
##slot-imm V int-regs 1368574 V int-regs 1368571 3 2 
##peek V int-regs 1368575 D 1 
##copy V int-regs 1368579 V int-regs 1368575 
##shl-imm V int-regs 1368579 V int-regs 1368579 2 
##copy V int-regs 1368582 V int-regs 1368579 
##sar-imm V int-regs 1368582 V int-regs 1368582 3 
##add-imm V int-regs 1368583 V int-regs 1368574 13 
##add V int-regs 1368585 V int-regs 1368582 V int-regs 1368583 
##alien-unsigned-4 V int-regs 1368586 V int-regs 1368585 
##copy V int-regs 1368587 V int-regs 1368586 
##shl-imm V int-regs 1368587 V int-regs 1368587 3 
##peek V int-regs 1368588 D 2 
##copy V int-regs 1368590 V int-regs 1368588 
##xor V int-regs 1368590 V int-regs 1368590 V int-regs 1368587 
##peek V int-regs 1368591 R -4 
##peek V int-regs 1368592 R -5 
##slot-imm V int-regs 1368597 V int-regs 1368592 3 2 
##copy V int-regs 1368602 V int-regs 1368591 
##shl-imm V int-regs 1368602 V int-regs 1368602 2 
##copy V int-regs 1368605 V int-regs 1368602 
##sar-imm V int-regs 1368605 V int-regs 1368605 3 
##add-imm V int-regs 1368606 V int-regs 1368597 13 
##add V int-regs 1368608 V int-regs 1368605 V int-regs 1368606 
##copy V int-regs 1368610 V int-regs 1368590 
##sar-imm V int-regs 1368610 V int-regs 1368610 3 
##set-alien-integer-4 V int-regs 1368608 V int-regs 1368610 
##peek V int-regs 1368611 R -2 
##peek V int-regs 1368612 R -3 
##peek V int-regs 1368613 R -1 
##add-imm V int-regs 1368620 V int-regs 1368611 8 
##replace V int-regs 1368620 D 2 
##replace V int-regs 1368612 D 0 
##replace V int-regs 1368613 D 1 
_branch 4 
_label 11 
##inc-d -3 
_label 12 
##inc-r -1 
##inc-d 1 
##peek V int-regs 1368626 R -1 
##load-immediate V int-regs 1368628 0 
##set-slot-imm V int-regs 1368628 V int-regs 1368626 3 2 
##replace V int-regs 1368628 D 0 
_label 13 
##inc-r -1 
##peek V int-regs 1368634 R -1 
##slot-imm V int-regs 1368639 V int-regs 1368634 2 2 
##slot-imm V int-regs 1368644 V int-regs 1368639 3 2 
##peek V int-regs 1368645 D 0 
##copy V int-regs 1368649 V int-regs 1368645 
##shl-imm V int-regs 1368649 V int-regs 1368649 2 
##copy V int-regs 1368652 V int-regs 1368649 
##sar-imm V int-regs 1368652 V int-regs 1368652 3 
##add-imm V int-regs 1368653 V int-regs 1368644 13 
##add V int-regs 1368655 V int-regs 1368652 V int-regs 1368653 
##alien-unsigned-4 V int-regs 1368656 V int-regs 1368655 
##copy V int-regs 1368657 V int-regs 1368656 
##shl-imm V int-regs 1368657 V int-regs 1368657 3 
##copy V int-regs 1368661 V int-regs 1368657 
##sar-imm V int-regs 1368661 V int-regs 1368661 14 
##copy V int-regs 1368662 V int-regs 1368661 
##shl-imm V int-regs 1368662 V int-regs 1368662 3 
##copy V int-regs 1368665 V int-regs 1368657 
##xor V int-regs 1368665 V int-regs 1368665 V int-regs 1368662 
##copy V int-regs 1368669 V int-regs 1368665 
##shl-imm V int-regs 1368669 V int-regs 1368669 7 
##load-immediate V int-regs 1368670 21095429120 
##copy V int-regs 1368673 V int-regs 1368669 
##and V int-regs 1368673 V int-regs 1368673 V int-regs 1368670 
##copy V int-regs 1368676 V int-regs 1368665 
##xor V int-regs 1368676 V int-regs 1368676 V int-regs 1368673 
##copy V int-regs 1368680 V int-regs 1368676 
##shl-imm V int-regs 1368680 V int-regs 1368680 15 
##load-immediate V int-regs 1368681 32181846016 
##copy V int-regs 1368684 V int-regs 1368680 
##and V int-regs 1368684 V int-regs 1368684 V int-regs 1368681 
##copy V int-regs 1368687 V int-regs 1368676 
##xor V int-regs 1368687 V int-regs 1368687 V int-regs 1368684 
##copy V int-regs 1368691 V int-regs 1368687 
##sar-imm V int-regs 1368691 V int-regs 1368691 21 
##copy V int-regs 1368692 V int-regs 1368691 
##shl-imm V int-regs 1368692 V int-regs 1368692 3 
##copy V int-regs 1368695 V int-regs 1368687 
##xor V int-regs 1368695 V int-regs 1368695 V int-regs 1368692 
##replace V int-regs 1368695 D 0 
##slot-imm V int-regs 1368701 V int-regs 1368634 3 2 
##add-imm V int-regs 1368704 V int-regs 1368701 8 
##set-slot-imm V int-regs 1368704 V int-regs 1368634 3 2 
##epilogue 
##return

Finally, after register allocation, these instructions map pretty much directly onto machine code:

0x0000000109901940: add    $0x8,%r15
0x0000000109901944: add    $0x8,%r14
0x0000000109901948: mov    -0x8(%r14),%rax
0x000000010990194c: mov    %rax,(%r15)
0x000000010990194f: mov    0x16(%rax),%rcx
0x0000000109901953: mov    %rcx,(%r14)
0x0000000109901956: cmp    $0x1380,%rcx
0x000000010990195d: jge    0x109901973
0x0000000109901963: sub    $0x8,%r14
0x0000000109901967: mov    0x8(%r14),%rcx
0x000000010990196b: mov    %rcx,(%r14)
0x000000010990196e: jmpq   0x109901b5b
0x0000000109901973: add    $0x8,%r15
0x0000000109901977: add    $0x8,%r14
0x000000010990197b: mov    -0x10(%r14),%rcx
0x000000010990197f: mov    %rcx,(%r15)
0x0000000109901982: mov    $0x1380,%rax
0x000000010990198c: mov    0xe(%rcx),%rdx
0x0000000109901990: mov    $0x0,%rcx
0x000000010990199a: mov    %rdx,(%r14)
0x000000010990199d: mov    %rax,-0x8(%r14)
0x00000001099019a1: mov    %rcx,-0x10(%r14)
0x00000001099019a5: nop    
0x00000001099019a6: nop    
0x00000001099019a7: nop    
0x00000001099019a8: nop    
0x00000001099019a9: nop    
0x00000001099019aa: nop    
0x00000001099019ab: nop    
0x00000001099019ac: nop    
0x00000001099019ad: nop    
0x00000001099019ae: nop    
0x00000001099019af: nop    
0x00000001099019b0: mov    -0x10(%r14),%rcx
0x00000001099019b4: cmp    $0x1380,%rcx
0x00000001099019bb: jge    0x109901b3a
0x00000001099019c1: add    $0x38,%r15
0x00000001099019c5: mov    -0x8(%r14),%rcx
0x00000001099019c9: mov    (%r14),%rdx
0x00000001099019cc: mov    %rcx,-0x30(%r15)
0x00000001099019d0: mov    -0x10(%r14),%rcx
0x00000001099019d4: mov    %rdx,-0x20(%r15)
0x00000001099019d8: mov    %rcx,-0x28(%r15)
0x00000001099019dc: mov    %rdx,-0x10(%r15)
0x00000001099019e0: mov    %rcx,-0x18(%r15)
0x00000001099019e4: mov    %rdx,(%r15)
0x00000001099019e7: mov    %rcx,-0x8(%r15)
0x00000001099019eb: mov    0x16(%rdx),%rax
0x00000001099019ef: mov    %rcx,%rdx
0x00000001099019f2: shl    $0x2,%rdx
0x00000001099019f6: sar    $0x3,%rdx
0x00000001099019fa: lea    0xd(%rax),%rbx
0x00000001099019fe: lea    (%rdx,%rbx,1),%rax
0x0000000109901a02: mov    (%rax),%edx
0x0000000109901a04: shl    $0x3,%rdx
0x0000000109901a08: mov    $0x400000000,%rax
0x0000000109901a12: and    %rax,%rdx
0x0000000109901a15: lea    0x8(%rcx),%rax
0x0000000109901a19: shl    $0x2,%rax
0x0000000109901a1d: sar    $0x3,%rax
0x0000000109901a21: lea    (%rax,%rbx,1),%rcx
0x0000000109901a25: mov    (%rcx),%eax
0x0000000109901a27: shl    $0x3,%rax
0x0000000109901a2b: mov    $0x3fffffff8,%rcx
0x0000000109901a35: and    %rcx,%rax
0x0000000109901a38: or     %rax,%rdx
0x0000000109901a3b: mov    %rdx,%rax
0x0000000109901a3e: sar    $0x4,%rax
0x0000000109901a42: shl    $0x3,%rax
0x0000000109901a46: mov    %rax,-0x10(%r14)
0x0000000109901a4a: and    $0x8,%rdx
0x0000000109901a4e: mov    $0x4c84586f8,%rax
0x0000000109901a58: mov    $0x0,%rcx
0x0000000109901a62: mov    %rcx,(%r14)
0x0000000109901a65: mov    %rax,-0x8(%r14)
0x0000000109901a69: cmp    $0x8,%rdx
0x0000000109901a6d: jne    0x109901a7c
0x0000000109901a73: sub    $0x8,%r14
0x0000000109901a77: jmpq   0x109901a87
0x0000000109901a7c: sub    $0x8,%r14
0x0000000109901a80: mov    0x8(%r14),%rcx
0x0000000109901a84: mov    %rcx,(%r14)
0x0000000109901a87: sub    $0x8,%r15
0x0000000109901a8b: mov    -0x8(%r14),%rcx
0x0000000109901a8f: mov    (%r14),%rax
0x0000000109901a92: xor    %rax,%rcx
0x0000000109901a95: mov    %rcx,-0x8(%r14)
0x0000000109901a99: mov    (%r15),%rcx
0x0000000109901a9c: mov    0x8(%r15),%rax
0x0000000109901aa0: lea    0xc68(%rcx),%rdx
0x0000000109901aa7: mov    %rax,(%r15)
0x0000000109901aaa: mov    %rdx,(%r14)
0x0000000109901aad: cmp    $0x1380,%rdx
0x0000000109901ab4: jle    0x109901ac7
0x0000000109901aba: mov    (%r14),%rdx
0x0000000109901abd: lea    -0x1380(%rdx),%rax
0x0000000109901ac4: mov    %rax,(%r14)
0x0000000109901ac7: sub    $0x30,%r15
0x0000000109901acb: add    $0x8,%r14
0x0000000109901acf: mov    0x30(%r15),%rax
0x0000000109901ad3: mov    0x16(%rax),%rdx
0x0000000109901ad7: mov    -0x8(%r14),%rax
0x0000000109901adb: shl    $0x2,%rax
0x0000000109901adf: sar    $0x3,%rax
0x0000000109901ae3: lea    0xd(%rdx),%rcx
0x0000000109901ae7: lea    (%rax,%rcx,1),%rdx
0x0000000109901aeb: mov    (%rdx),%ecx
0x0000000109901aed: shl    $0x3,%rcx
0x0000000109901af1: mov    -0x10(%r14),%rdx
0x0000000109901af5: xor    %rcx,%rdx
0x0000000109901af8: mov    0x20(%r15),%rcx
0x0000000109901afc: mov    0x28(%r15),%rax
0x0000000109901b00: mov    0x16(%rax),%rbx
0x0000000109901b04: shl    $0x2,%rcx
0x0000000109901b08: sar    $0x3,%rcx
0x0000000109901b0c: lea    0xd(%rbx),%rax
0x0000000109901b10: lea    (%rcx,%rax,1),%rbx
0x0000000109901b14: sar    $0x3,%rdx
0x0000000109901b18: mov    %edx,(%rbx)
0x0000000109901b1a: mov    0x10(%r15),%rdx
0x0000000109901b1e: mov    0x18(%r15),%rbx
0x0000000109901b22: mov    0x8(%r15),%rax
0x0000000109901b26: lea    0x8(%rdx),%rcx
0x0000000109901b2a: mov    %rcx,-0x10(%r14)
0x0000000109901b2e: mov    %rbx,(%r14)
0x0000000109901b31: mov    %rax,-0x8(%r14)
0x0000000109901b35: jmpq   0x1099019b0
0x0000000109901b3a: sub    $0x18,%r14
0x0000000109901b3e: sub    $0x8,%r15
0x0000000109901b42: add    $0x8,%r14
0x0000000109901b46: mov    0x8(%r15),%rcx
0x0000000109901b4a: mov    $0x0,%rax
0x0000000109901b54: mov    %rax,0x16(%rcx)
0x0000000109901b58: mov    %rax,(%r14)
0x0000000109901b5b: sub    $0x8,%r15
0x0000000109901b5f: mov    0x8(%r15),%rax
0x0000000109901b63: mov    0xe(%rax),%rcx
0x0000000109901b67: mov    0x16(%rcx),%rbx
0x0000000109901b6b: mov    (%r14),%rcx
0x0000000109901b6e: shl    $0x2,%rcx
0x0000000109901b72: sar    $0x3,%rcx
0x0000000109901b76: lea    0xd(%rbx),%rdx
0x0000000109901b7a: lea    (%rcx,%rdx,1),%rbx
0x0000000109901b7e: mov    (%rbx),%edx
0x0000000109901b80: shl    $0x3,%rdx
0x0000000109901b84: mov    %rdx,%rbx
0x0000000109901b87: sar    $0xe,%rbx
0x0000000109901b8b: shl    $0x3,%rbx
0x0000000109901b8f: xor    %rbx,%rdx
0x0000000109901b92: mov    %rdx,%rbx
0x0000000109901b95: shl    $0x7,%rbx
0x0000000109901b99: mov    $0x4e962b400,%rcx
0x0000000109901ba3: and    %rcx,%rbx
0x0000000109901ba6: xor    %rbx,%rdx
0x0000000109901ba9: mov    %rdx,%rbx
0x0000000109901bac: shl    $0xf,%rbx
0x0000000109901bb0: mov    $0x77e300000,%rcx
0x0000000109901bba: and    %rcx,%rbx
0x0000000109901bbd: xor    %rbx,%rdx
0x0000000109901bc0: mov    %rdx,%rbx
0x0000000109901bc3: sar    $0x15,%rbx
0x0000000109901bc7: shl    $0x3,%rbx
0x0000000109901bcb: xor    %rbx,%rdx
0x0000000109901bce: mov    %rdx,(%r14)
0x0000000109901bd1: mov    0x16(%rax),%rdx
0x0000000109901bd5: lea    0x8(%rdx),%rbx
0x0000000109901bd9: mov    %rbx,0x16(%rax)
0x0000000109901bdd: retq

That's pretty tight code for a dynamic language.

Implementation of specialized arrays

So there are 13 C types here: char uchar short ushort int uint long ulong longlong ulonglong float double void*. For each C type, we have four vocabularies: specialized-arrays.type, specialized-vectors.type, specialized-arrays.direct.type, and io.mmap.type. Implementing all of this by hand would be a pain: even with the best abstractions in Factor's object system, at a minimum there would be 4 or 5 lines of code per type and per feature, and ideally you'd want to clone some methods once per implementation instead of abstracting everything out through dynamic dispatch, for performance reasons. So instead, I did what any self-respecting programmer would do in this situation: I made the computer do all the hard work. Factor lacked the abstraction necessary to express what I wanted, but Factor is extensible, so I just implemented the abstraction -- as a library. The feature we want here is essentially C++ templates. So yes, there is now a functors vocabulary which implements what amounts to C++ templates... in Factor. I called them "functors", even though they're more like templates than ML functors, because the term "template" has meaning in Factor's web framework and I didn't want to create confusion. I don't expect functors will see widespread use and I certainly don't want to promote this as a central feature of Factor. It is just another library that happens to extend the parser -- like many other Factor libraries do.

Let's look at how functors are used. For example, here is the entire text of the specialized-arrays.direct.int vocabulary:

USING: specialized-arrays.int specialized-arrays.direct.functor ;
IN: specialized-arrays.direct.int

<< "int" define-direct-array >>

The key of course is that its invoking the define-direct-array word, at parse time (hence the << and >>). This word is defined in the specialized-arrays.direct.functor vocabulary, and its definition looks quite unusual:

FUNCTOR: define-direct-array ( T -- )

A'      IS ${T}-array
>A'     IS >${T}-array
<A'>    IS <${A'}>

A       DEFINES direct-${T}-array
<A>     DEFINES <${A}>

NTH     [ T dup c-getter array-accessor ]
SET-NTH [ T dup c-setter array-accessor ]

WHERE

TUPLE: A
{ underlying alien read-only }
{ length fixnum read-only } ;

: <A> ( alien len -- direct-array ) A boa ; inline
M: A length length>> ;
M: A nth-unsafe underlying>> NTH call ;
M: A set-nth-unsafe underlying>> SET-NTH call ;
M: A like drop dup A instance? [ >A' execute ] unless ;
M: A new-sequence drop <A'> execute ;

INSTANCE: A sequence

;FUNCTOR

This FUNCTOR: ... IS / DEFINES ... WHERE ... ;FUNCTOR business is not new syntax in the language; its just parsing words defined in the functors vocabulary. Essentially we are defining a word, define-direct-array, which generates code, parameterized by a single input T, which is a C type name in this case. Note that what's going on here is more elaborate than text substitution; the functor body is parsed once, and converted into code which instantiates it, rather than simply expanding at usages like C preprocessor macros do.

Functors may seem like an elaborate new piece of functionality, comparable in complexity to the object system or locals, but in reality, the emperor has no clothes; they have a trivial desugaring.

First, inside a functor body, the TUPLE:, :, M: and INSTANCE: are shadowed to special words in the functors vocabulary named `TUPLE:, `:, and so on.

Next, IS and DEFINES desugar as follows.

A'      IS ${T}-array

expands into

A' [ [ I[ ${T}-array]I ] with-string-writer dup search [ ] [ no-word ] ?if ]

and

<A>     DEFINES <${A}>

expands into

<A> [ [ I[ <${A}>]I ] with-string-writer create-in ]

The I[ syntax is just the interpolate library, which turns this,

I[ Hello, ${name}.]I

into this,

"Hello, " write name write "." write

Now,

FUNCTOR: foo ( inputs -- )

...names...

WHERE

...body...

;FUNCTOR

is really just syntax sugar for

:: foo ( inputs -- )
    [let* | ...names... | ...body... ] ;

With the added caveat that inside the body, `: is remapped to : and so on.

As you can see, a seemingly powerful feature is melting away as it becomse apparent its only some rather trivial syntax sugar. Words such as `: desugar like so;

`: foo 2 2 + . ;

becomes

foo [ 2 2 + . ] define

Let's look at a functor and its desugaring. Here is a simple functor, it takes a tuple class name as input, and defines a constructor word <foo> which makes a new instance of this tuple class using new. So it is like the built-in parsing word C:, except it uses new instead of boa:

FUNCTOR: make-constructor ( name -- )

tuple IS ${name}
<tuple> DEFINES <${name}>

WHERE

: <tuple> tuple new ;

;FUNCTOR

Given the above, this functor is really equivalent to the following:

:: make-constructor ( name -- )
    [let* | tuple [ name dup search [ ] [ no-word ] ?if ]
            <tuple> [ [ "<" write name write ">" write ] with-string-writer ] |
        <tuple> [ tuple new ] define
    ] ;

We could keep going -- the :: word, which defines a word with named parameters, is it self a library word, in the locals vocabulary, which desugars down to stack code.

In this simple example, the hand-coded version is shorter than the functor, but the point of using a functor is that you can make the meta-code look like the code that it is generating. For the more complex functors that generate the various forms of specialized arrays, this is invaluable because it makes the code easier to maintain and debug.

Finally, the stuff described in this post, especially the implementation of functors, is some pretty advanced stuff. I don't expect most people using Factor to know or care about how this works; after all, you can use specialized arrays without "peeking under the hood", so to speak. However, all of this meta-programming power is there when you are ready for it.

Factor: a practical stack language

Friday, December 19, 2008

Discussion of concatenative languages on various blogs

Monday, December 15, 2008

Prevalence of shuffle words and dataflow combinators

Thursday, December 04, 2008

Arrays of unboxed primitive values, and a faster Mersenne Twister

Specialized arrays

Specialized vectors

Direct arrays

Memory-mapped files

Mersenne Twister

Implementation of specialized arrays

Monday, December 01, 2008

Factor now fully supported on 64-bit Windows

Older Posts

Recent Factor Commits

Planet Factor

Links