Friday, December 19, 2008

Discussion of concatenative languages on various blogs

A couple of bloggers are exploring concatenative languages and have posted interesting articles, with more depth than the usual "dup * squares an integer" cursory glance:

Aaron Schaefer (elasticdog in #concatenative; he's been contributing to our math library and writing project euler solutions) has written a couple of introductory Factor articles too, and is working on a third installment.

Things have also been picking up in the #concatenative IRC channel on FreeNode, and usually there are 60-70 people online. Lots of interesting discussion going on there.

Finally, I'll take this opportunity to plug the concatenative languages wiki, and remind everyone that it running on a web framework and server written in a concatenative language. Other than an issue with our O/R mapper that we fixed recently, it has been running without any problems. It has been serving between 10,000 and 24,000 hits a day; I'd like to see it hit 10 or 20 times that number and see how it holds up.

Monday, December 15, 2008

Prevalence of shuffle words and dataflow combinators

Today on IRC, binarybandit wrote this word:
: pie-chart ( keys values -- url )
[ "|" join ] [ [ number>string ] map "," join ] bi*
"http://chart.apis.google.com/chart?cht=p&chs=500x250&chl=%s&chd=t:%s" sprintf ;

(Side note: it uses the recently-contributed printf vocabulary, which is a nice piece of work: it parses the format string uses PEGs and compiles to quotations at compile time).

This inspired me to make some pie charts showing the frequency of usages of shuffle words and dataflow combinators in the Factor library. So first, we need a word which takes a sequence of words and counts the number of usages of each, normalizing the results so that their sum is 1:
: usage-histogram ( words -- keys values )
[ [ name>> ] [ usage length ] bi ] { } map>assoc
dup values sum '[ _ /f ] assoc-map
sort-values unzip ;

Now we can use a meta-programming trick to extract the list of shufflers from the shuffle words help article:
"shuffle-words" >link uses [ word? ] filter usage-histogram pie-chart print

Here is the resulting chart:

Note that it counts the number of usages of each shuffler, so a word that calls a shuffler more than once is not counted, and a word which uses two distinct shufflers is counted twice. You can see that dup, drop and swap are by far the most popular, and the more complex patterns are rarely used. It is interesting that when beginners first start writing Factor they tend to write code which is heavily biased towards the less frequently-used shufflers. I think teaching dataflow combinators before shufflers can help here, because they make it easier to structure your code in a clean way.

For cleave/spread/apply combinators, we have to do a bit more work to get a list of them programatically, since they're spread over several help articles:
[ "cleave-combinators" "spread-combinators" "apply-combinators" ]
[ >link uses ] map concat { either? both? } diff
[ word? ] filter

Now we can make the pie chart as before. Here is the result:

Again, bi completely dominates.

I'm not sure what to make of these results, other than that the Google Charts API is pretty nifty, and that there is untapped potential for "code data mining" in Factor's reflection capabilities. We could use this to discover potential abstractions and patterns.

Thursday, December 04, 2008

Arrays of unboxed primitive values, and a faster Mersenne Twister

I added efficient arrays of integers and floats to Factor. We've had byte-arrays and float-arrays for a while; this code is essentially a generalization of these to every primitive C type: chars, shorts, ints, longs, long longs, unsigned version of all of these, as well as floats and doubles.

Specialized arrays


For each primitive C type, there is a vocabulary named specialized-arrays.type, which defines a class named type-array, and several related words: a constructor, and a conversion word for converting existing sequences to specialized arrays of this type. As with all sequence types, the class implements methods for length, nth and set-nth; there is also a literal syntax for parsing and printing specialized arrays. An example:
USE: specialized-arrays.int
int-array{ 584 -13 11 } [ 10 + ] map

Specialized vectors


There are also resizeable versions of all of these, in specialized-vectors.type. These work like ordinary vectors; you can use words such as push and pop, accumulating a stream of values in a vector.

Direct arrays


A third variant is also provided. These are the "direct specialized arrays". Sometimes when working with a C library, the library hands you a pointer to some data which is to be viewed as an array of primitive values. Direct arrays accomplish this. For every primitive C type, the specialized-arrays.direct.type vocabulary provides a word named <direct-type-array> which takes an alien pointer and a length. It wraps them in a tuple which responds to the sequence protocol by reading from memory.

Memory-mapped files


Direct arrays are useful in another context, and that is when working with memory-mapped files. Again, for every primitive C type, there is a vocabulary, io.mmap.type. It defines two words, <mapped-type-array> and with-mapped-type-file. Here is an example usage of the latter: we're mapping in a file of floating point numbers and adding them up:
[ 0.0 [ + ] reduce ] with-mapped-float-file
We've had io.mmap for a while, but specialized arrays make this library a lot more useful.

This is all really powerful stuff, because on one hand, we have these high-level data types that behave like any other Factor sequence and can be used interchangeably with sequence utility words, but on the other hand, not only do they offer a space savings over heterogeneous arrays by avoiding boxing, they also have the potential to improve performance -- the low-level primitives they're built on are already optimized by the compiler, and it is possible to write code which operates on specialized float and integer arrays without any boxing at all.

Mersenne Twister


A good example of an algorithm which benefits from specialized arrays is the Mersenne Twister random number generator. The Factor implementation is very straightforward, and shorter than the C version. I changed it to use a uint-array instead of a normal array to store the random generator state. After making a few more tweaks, the time taken to generate a 100,000,000-element byte array of random bytes on my Mac Book Pro went down to 5 seconds, from 37 seconds!

Here is the main method of the Mersenne Twister, random-32*, and its surrounding utility words:
: mt-n 624 ; inline
: mt-m 397 ; inline
: mt-a HEX: 9908b0df ; inline

: mersenne-wrap ( n -- n' )
dup mt-n > [ mt-n - ] when ; inline

: wrap-nth ( n seq -- obj )
[ mersenne-wrap ] dip nth-unsafe ; inline

: set-wrap-nth ( obj n seq -- )
[ mersenne-wrap ] dip set-nth-unsafe ; inline

: calculate-y ( n seq -- y )
[ wrap-nth 31 mask-bit ]
[ [ 1+ ] [ wrap-nth ] bi* 31 bits ] 2bi bitor ; inline

: (mt-generate) ( n seq -- next-mt )
[
calculate-y
[ 2/ ] [ odd? mt-a 0 ? ] bi bitxor
] [
[ mt-m + ] [ wrap-nth ] bi*
] 2bi bitxor ; inline

: mt-generate ( mt -- )
[
mt-n swap seq>> '[
_ [ (mt-generate) ] [ set-wrap-nth ] 2bi
] each
] [ 0 >>i drop ] bi ; inline

: mt-temper ( y -- yt )
dup -11 shift bitxor
dup 7 shift HEX: 9d2c5680 bitand bitxor
dup 15 shift HEX: efc60000 bitand bitxor
dup -18 shift bitxor ; inline

: next-index ( mt -- i )
dup i>> dup mt-n < [ nip ] [ drop mt-generate 0 ] if ; inline

M: mersenne-twister random-32* ( mt -- r )
[ next-index ]
[ seq>> wrap-nth mt-temper ]
[ [ 1+ ] change-i drop ] tri ;

A naive implementation of Factor would struggle with this code because of method dispatch overhead (all arithmetic and sequence access is generic), and because of overflow checks on integer arithmetic (integers have arbitrary precision in Factor). Thankfully, the Factor implementation is not naive. The high-level optimizer, applied to the above code, produces the following output -- note that the translation from high-level IR back to Factor code is imprecise, and some IR nodes cannot be expressed, hence the "COMPLEX SHUFFLE" strings that appear here:
[
dup >r dup 3 slot dup 624 fixnum<
[ nip ] [
drop dup >r 624 swap 2 slot
>r r> >r r> 0 -rot \ ( gensym ) [
pick pick fixnum< [
swap >r 2dup "COMPLEX SHUFFLE" >r r> 2dup
"COMPLEX SHUFFLE" 2dup "COMPLEX SHUFFLE" 2dup
"COMPLEX SHUFFLE" >r r> 3 slot
swap 4 fixnum*fast
alien-unsigned-4 2147483648 fixnum-bitand
"COMPLEX SHUFFLE" >r 1 fixnum+fast
r> >r r> 3 slot swap 4 fixnum*fast
alien-unsigned-4 2147483647 fixnum-bitand
fixnum-bitor dup >r -1 fixnum-shift-fast
r> 1 fixnum-bitand 1 eq? 2567483615 0 rot
[ drop ] [ nip ] if fixnum-bitxor
"COMPLEX SHUFFLE" >r 397 fixnum+fast
r> >r dup 624 fixnum>
[ 624 fixnum-fast ] [ ] if r> 3 slot
swap 4 fixnum*fast
alien-unsigned-4 fixnum-bitxor
"COMPLEX SHUFFLE" >r r> 3 slot
swap 4 fixnum*fast
set-alien-unsigned-4 "COMPLEX SHUFFLE" r> swap
"COMPLEX SHUFFLE" 1 fixnum+fast
"COMPLEX SHUFFLE" ( gensym )
] [ 3drop ] if
] label r> >r 0 r> 3 set-slot 0
] if r> dup >r 2 slot >r r> 3 slot swap 4 fixnum*fast
alien-unsigned-4 dup -11 fixnum-shift-fast fixnum-bitxor
dup 7 fixnum-shift-fast 2636928640 fixnum-bitand
fixnum-bitxor dup 15 fixnum-shift-fast
4022730752 fixnum-bitand fixnum-bitxor
dup -18 fixnum-shift-fast fixnum-bitxor r> dup >r 3 slot
1 fixnum+fast r> swap swap >r r> 3 set-slot
]

Only primitive calls remain; everything else has evaporated after 11 distinct optimization passes. A lot of redundant stack shuffling still remains; the low-level optimizer eliminates that and produces a sequence of low-level IR instructions:
=== word: mersenne-twister=>random-32*, label: mersenne-twister=>random-32*

_label 0
##prologue
_label 1
##inc-r 1
##inc-d 1
##peek V int-regs 1368412 D 1
##replace V int-regs 1368412 R 0
##slot-imm V int-regs 1368417 V int-regs 1368412 3 2
##replace V int-regs 1368417 D 0
_compare-imm-branch 3 V int-regs 1368417 4992 cc>=
_label 2
##inc-d -1
##peek V int-regs 1368425 D -1
##replace V int-regs 1368425 D 0
_branch 13
_label 3
##inc-r 1
##inc-d 1
##peek V int-regs 1368427 D 2
##replace V int-regs 1368427 R 0
##load-immediate V int-regs 1368429 4992
##slot-imm V int-regs 1368434 V int-regs 1368427 2 2
##load-immediate V int-regs 1368439 0
##replace V int-regs 1368434 D 0
##replace V int-regs 1368429 D 1
##replace V int-regs 1368439 D 2
##loop-entry
_label 4
##peek V int-regs 1368443 D 2
_compare-imm-branch 11 V int-regs 1368443 4992 cc>=
_label 5
##inc-r 7
##peek V int-regs 1368453 D 1
##peek V int-regs 1368454 D 0
##replace V int-regs 1368453 R 6
##peek V int-regs 1368456 D 2
##replace V int-regs 1368454 R 4
##replace V int-regs 1368456 R 5
##replace V int-regs 1368454 R 2
##replace V int-regs 1368456 R 3
##replace V int-regs 1368454 R 0
##replace V int-regs 1368456 R 1
##slot-imm V int-regs 1368478 V int-regs 1368454 3 2
##copy V int-regs 1368483 V int-regs 1368456
##shl-imm V int-regs 1368483 V int-regs 1368483 2
##copy V int-regs 1368486 V int-regs 1368483
##sar-imm V int-regs 1368486 V int-regs 1368486 3
##add-imm V int-regs 1368487 V int-regs 1368478 13
##add V int-regs 1368489 V int-regs 1368486 V int-regs 1368487
##alien-unsigned-4 V int-regs 1368490 V int-regs 1368489
##copy V int-regs 1368491 V int-regs 1368490
##shl-imm V int-regs 1368491 V int-regs 1368491 3
##load-immediate V int-regs 1368492 17179869184
##copy V int-regs 1368495 V int-regs 1368491
##and V int-regs 1368495 V int-regs 1368495 V int-regs 1368492
##add-imm V int-regs 1368501 V int-regs 1368456 8
##copy V int-regs 1368512 V int-regs 1368501
##shl-imm V int-regs 1368512 V int-regs 1368512 2
##copy V int-regs 1368515 V int-regs 1368512
##sar-imm V int-regs 1368515 V int-regs 1368515 3
##add V int-regs 1368518 V int-regs 1368515 V int-regs 1368487
##alien-unsigned-4 V int-regs 1368519 V int-regs 1368518
##copy V int-regs 1368520 V int-regs 1368519
##shl-imm V int-regs 1368520 V int-regs 1368520 3
##load-immediate V int-regs 1368521 17179869176
##copy V int-regs 1368524 V int-regs 1368520
##and V int-regs 1368524 V int-regs 1368524 V int-regs 1368521
##copy V int-regs 1368527 V int-regs 1368495
##or V int-regs 1368527 V int-regs 1368527 V int-regs 1368524
##copy V int-regs 1368532 V int-regs 1368527
##sar-imm V int-regs 1368532 V int-regs 1368532 4
##copy V int-regs 1368533 V int-regs 1368532
##shl-imm V int-regs 1368533 V int-regs 1368533 3
##replace V int-regs 1368533 D 2
##copy V int-regs 1368537 V int-regs 1368527
##and-imm V int-regs 1368537 V int-regs 1368537 8
##load-immediate V int-regs 1368542 20539868920
##load-immediate V int-regs 1368543 0
##replace V int-regs 1368543 D 0
##replace V int-regs 1368542 D 1
_compare-imm-branch 7 V int-regs 1368537 8 cc/=
_label 6
##inc-d -1
_branch 8
_label 7
##inc-d -1
##peek V int-regs 1368550 D -1
##replace V int-regs 1368550 D 0
_label 8
##inc-r -1
##peek V int-regs 1368551 D 1
##peek V int-regs 1368552 D 0
##copy V int-regs 1368553 V int-regs 1368551
##xor V int-regs 1368553 V int-regs 1368553 V int-regs 1368552
##replace V int-regs 1368553 D 1
##peek V int-regs 1368554 R 0
##peek V int-regs 1368555 R -1
##add-imm V int-regs 1368559 V int-regs 1368554 3176
##replace V int-regs 1368555 R 0
##replace V int-regs 1368559 D 0
_compare-imm-branch 10 V int-regs 1368559 4992 cc<=
_label 9
##peek V int-regs 1368569 D 0
##sub-imm V int-regs 1368570 V int-regs 1368569 4992
##replace V int-regs 1368570 D 0
_label 10
##inc-r -6
##inc-d 1
##peek V int-regs 1368571 R -6
##slot-imm V int-regs 1368574 V int-regs 1368571 3 2
##peek V int-regs 1368575 D 1
##copy V int-regs 1368579 V int-regs 1368575
##shl-imm V int-regs 1368579 V int-regs 1368579 2
##copy V int-regs 1368582 V int-regs 1368579
##sar-imm V int-regs 1368582 V int-regs 1368582 3
##add-imm V int-regs 1368583 V int-regs 1368574 13
##add V int-regs 1368585 V int-regs 1368582 V int-regs 1368583
##alien-unsigned-4 V int-regs 1368586 V int-regs 1368585
##copy V int-regs 1368587 V int-regs 1368586
##shl-imm V int-regs 1368587 V int-regs 1368587 3
##peek V int-regs 1368588 D 2
##copy V int-regs 1368590 V int-regs 1368588
##xor V int-regs 1368590 V int-regs 1368590 V int-regs 1368587
##peek V int-regs 1368591 R -4
##peek V int-regs 1368592 R -5
##slot-imm V int-regs 1368597 V int-regs 1368592 3 2
##copy V int-regs 1368602 V int-regs 1368591
##shl-imm V int-regs 1368602 V int-regs 1368602 2
##copy V int-regs 1368605 V int-regs 1368602
##sar-imm V int-regs 1368605 V int-regs 1368605 3
##add-imm V int-regs 1368606 V int-regs 1368597 13
##add V int-regs 1368608 V int-regs 1368605 V int-regs 1368606
##copy V int-regs 1368610 V int-regs 1368590
##sar-imm V int-regs 1368610 V int-regs 1368610 3
##set-alien-integer-4 V int-regs 1368608 V int-regs 1368610
##peek V int-regs 1368611 R -2
##peek V int-regs 1368612 R -3
##peek V int-regs 1368613 R -1
##add-imm V int-regs 1368620 V int-regs 1368611 8
##replace V int-regs 1368620 D 2
##replace V int-regs 1368612 D 0
##replace V int-regs 1368613 D 1
_branch 4
_label 11
##inc-d -3
_label 12
##inc-r -1
##inc-d 1
##peek V int-regs 1368626 R -1
##load-immediate V int-regs 1368628 0
##set-slot-imm V int-regs 1368628 V int-regs 1368626 3 2
##replace V int-regs 1368628 D 0
_label 13
##inc-r -1
##peek V int-regs 1368634 R -1
##slot-imm V int-regs 1368639 V int-regs 1368634 2 2
##slot-imm V int-regs 1368644 V int-regs 1368639 3 2
##peek V int-regs 1368645 D 0
##copy V int-regs 1368649 V int-regs 1368645
##shl-imm V int-regs 1368649 V int-regs 1368649 2
##copy V int-regs 1368652 V int-regs 1368649
##sar-imm V int-regs 1368652 V int-regs 1368652 3
##add-imm V int-regs 1368653 V int-regs 1368644 13
##add V int-regs 1368655 V int-regs 1368652 V int-regs 1368653
##alien-unsigned-4 V int-regs 1368656 V int-regs 1368655
##copy V int-regs 1368657 V int-regs 1368656
##shl-imm V int-regs 1368657 V int-regs 1368657 3
##copy V int-regs 1368661 V int-regs 1368657
##sar-imm V int-regs 1368661 V int-regs 1368661 14
##copy V int-regs 1368662 V int-regs 1368661
##shl-imm V int-regs 1368662 V int-regs 1368662 3
##copy V int-regs 1368665 V int-regs 1368657
##xor V int-regs 1368665 V int-regs 1368665 V int-regs 1368662
##copy V int-regs 1368669 V int-regs 1368665
##shl-imm V int-regs 1368669 V int-regs 1368669 7
##load-immediate V int-regs 1368670 21095429120
##copy V int-regs 1368673 V int-regs 1368669
##and V int-regs 1368673 V int-regs 1368673 V int-regs 1368670
##copy V int-regs 1368676 V int-regs 1368665
##xor V int-regs 1368676 V int-regs 1368676 V int-regs 1368673
##copy V int-regs 1368680 V int-regs 1368676
##shl-imm V int-regs 1368680 V int-regs 1368680 15
##load-immediate V int-regs 1368681 32181846016
##copy V int-regs 1368684 V int-regs 1368680
##and V int-regs 1368684 V int-regs 1368684 V int-regs 1368681
##copy V int-regs 1368687 V int-regs 1368676
##xor V int-regs 1368687 V int-regs 1368687 V int-regs 1368684
##copy V int-regs 1368691 V int-regs 1368687
##sar-imm V int-regs 1368691 V int-regs 1368691 21
##copy V int-regs 1368692 V int-regs 1368691
##shl-imm V int-regs 1368692 V int-regs 1368692 3
##copy V int-regs 1368695 V int-regs 1368687
##xor V int-regs 1368695 V int-regs 1368695 V int-regs 1368692
##replace V int-regs 1368695 D 0
##slot-imm V int-regs 1368701 V int-regs 1368634 3 2
##add-imm V int-regs 1368704 V int-regs 1368701 8
##set-slot-imm V int-regs 1368704 V int-regs 1368634 3 2
##epilogue
##return

Finally, after register allocation, these instructions map pretty much directly onto machine code:
0x0000000109901940: add    $0x8,%r15
0x0000000109901944: add $0x8,%r14
0x0000000109901948: mov -0x8(%r14),%rax
0x000000010990194c: mov %rax,(%r15)
0x000000010990194f: mov 0x16(%rax),%rcx
0x0000000109901953: mov %rcx,(%r14)
0x0000000109901956: cmp $0x1380,%rcx
0x000000010990195d: jge 0x109901973
0x0000000109901963: sub $0x8,%r14
0x0000000109901967: mov 0x8(%r14),%rcx
0x000000010990196b: mov %rcx,(%r14)
0x000000010990196e: jmpq 0x109901b5b
0x0000000109901973: add $0x8,%r15
0x0000000109901977: add $0x8,%r14
0x000000010990197b: mov -0x10(%r14),%rcx
0x000000010990197f: mov %rcx,(%r15)
0x0000000109901982: mov $0x1380,%rax
0x000000010990198c: mov 0xe(%rcx),%rdx
0x0000000109901990: mov $0x0,%rcx
0x000000010990199a: mov %rdx,(%r14)
0x000000010990199d: mov %rax,-0x8(%r14)
0x00000001099019a1: mov %rcx,-0x10(%r14)
0x00000001099019a5: nop
0x00000001099019a6: nop
0x00000001099019a7: nop
0x00000001099019a8: nop
0x00000001099019a9: nop
0x00000001099019aa: nop
0x00000001099019ab: nop
0x00000001099019ac: nop
0x00000001099019ad: nop
0x00000001099019ae: nop
0x00000001099019af: nop
0x00000001099019b0: mov -0x10(%r14),%rcx
0x00000001099019b4: cmp $0x1380,%rcx
0x00000001099019bb: jge 0x109901b3a
0x00000001099019c1: add $0x38,%r15
0x00000001099019c5: mov -0x8(%r14),%rcx
0x00000001099019c9: mov (%r14),%rdx
0x00000001099019cc: mov %rcx,-0x30(%r15)
0x00000001099019d0: mov -0x10(%r14),%rcx
0x00000001099019d4: mov %rdx,-0x20(%r15)
0x00000001099019d8: mov %rcx,-0x28(%r15)
0x00000001099019dc: mov %rdx,-0x10(%r15)
0x00000001099019e0: mov %rcx,-0x18(%r15)
0x00000001099019e4: mov %rdx,(%r15)
0x00000001099019e7: mov %rcx,-0x8(%r15)
0x00000001099019eb: mov 0x16(%rdx),%rax
0x00000001099019ef: mov %rcx,%rdx
0x00000001099019f2: shl $0x2,%rdx
0x00000001099019f6: sar $0x3,%rdx
0x00000001099019fa: lea 0xd(%rax),%rbx
0x00000001099019fe: lea (%rdx,%rbx,1),%rax
0x0000000109901a02: mov (%rax),%edx
0x0000000109901a04: shl $0x3,%rdx
0x0000000109901a08: mov $0x400000000,%rax
0x0000000109901a12: and %rax,%rdx
0x0000000109901a15: lea 0x8(%rcx),%rax
0x0000000109901a19: shl $0x2,%rax
0x0000000109901a1d: sar $0x3,%rax
0x0000000109901a21: lea (%rax,%rbx,1),%rcx
0x0000000109901a25: mov (%rcx),%eax
0x0000000109901a27: shl $0x3,%rax
0x0000000109901a2b: mov $0x3fffffff8,%rcx
0x0000000109901a35: and %rcx,%rax
0x0000000109901a38: or %rax,%rdx
0x0000000109901a3b: mov %rdx,%rax
0x0000000109901a3e: sar $0x4,%rax
0x0000000109901a42: shl $0x3,%rax
0x0000000109901a46: mov %rax,-0x10(%r14)
0x0000000109901a4a: and $0x8,%rdx
0x0000000109901a4e: mov $0x4c84586f8,%rax
0x0000000109901a58: mov $0x0,%rcx
0x0000000109901a62: mov %rcx,(%r14)
0x0000000109901a65: mov %rax,-0x8(%r14)
0x0000000109901a69: cmp $0x8,%rdx
0x0000000109901a6d: jne 0x109901a7c
0x0000000109901a73: sub $0x8,%r14
0x0000000109901a77: jmpq 0x109901a87
0x0000000109901a7c: sub $0x8,%r14
0x0000000109901a80: mov 0x8(%r14),%rcx
0x0000000109901a84: mov %rcx,(%r14)
0x0000000109901a87: sub $0x8,%r15
0x0000000109901a8b: mov -0x8(%r14),%rcx
0x0000000109901a8f: mov (%r14),%rax
0x0000000109901a92: xor %rax,%rcx
0x0000000109901a95: mov %rcx,-0x8(%r14)
0x0000000109901a99: mov (%r15),%rcx
0x0000000109901a9c: mov 0x8(%r15),%rax
0x0000000109901aa0: lea 0xc68(%rcx),%rdx
0x0000000109901aa7: mov %rax,(%r15)
0x0000000109901aaa: mov %rdx,(%r14)
0x0000000109901aad: cmp $0x1380,%rdx
0x0000000109901ab4: jle 0x109901ac7
0x0000000109901aba: mov (%r14),%rdx
0x0000000109901abd: lea -0x1380(%rdx),%rax
0x0000000109901ac4: mov %rax,(%r14)
0x0000000109901ac7: sub $0x30,%r15
0x0000000109901acb: add $0x8,%r14
0x0000000109901acf: mov 0x30(%r15),%rax
0x0000000109901ad3: mov 0x16(%rax),%rdx
0x0000000109901ad7: mov -0x8(%r14),%rax
0x0000000109901adb: shl $0x2,%rax
0x0000000109901adf: sar $0x3,%rax
0x0000000109901ae3: lea 0xd(%rdx),%rcx
0x0000000109901ae7: lea (%rax,%rcx,1),%rdx
0x0000000109901aeb: mov (%rdx),%ecx
0x0000000109901aed: shl $0x3,%rcx
0x0000000109901af1: mov -0x10(%r14),%rdx
0x0000000109901af5: xor %rcx,%rdx
0x0000000109901af8: mov 0x20(%r15),%rcx
0x0000000109901afc: mov 0x28(%r15),%rax
0x0000000109901b00: mov 0x16(%rax),%rbx
0x0000000109901b04: shl $0x2,%rcx
0x0000000109901b08: sar $0x3,%rcx
0x0000000109901b0c: lea 0xd(%rbx),%rax
0x0000000109901b10: lea (%rcx,%rax,1),%rbx
0x0000000109901b14: sar $0x3,%rdx
0x0000000109901b18: mov %edx,(%rbx)
0x0000000109901b1a: mov 0x10(%r15),%rdx
0x0000000109901b1e: mov 0x18(%r15),%rbx
0x0000000109901b22: mov 0x8(%r15),%rax
0x0000000109901b26: lea 0x8(%rdx),%rcx
0x0000000109901b2a: mov %rcx,-0x10(%r14)
0x0000000109901b2e: mov %rbx,(%r14)
0x0000000109901b31: mov %rax,-0x8(%r14)
0x0000000109901b35: jmpq 0x1099019b0
0x0000000109901b3a: sub $0x18,%r14
0x0000000109901b3e: sub $0x8,%r15
0x0000000109901b42: add $0x8,%r14
0x0000000109901b46: mov 0x8(%r15),%rcx
0x0000000109901b4a: mov $0x0,%rax
0x0000000109901b54: mov %rax,0x16(%rcx)
0x0000000109901b58: mov %rax,(%r14)
0x0000000109901b5b: sub $0x8,%r15
0x0000000109901b5f: mov 0x8(%r15),%rax
0x0000000109901b63: mov 0xe(%rax),%rcx
0x0000000109901b67: mov 0x16(%rcx),%rbx
0x0000000109901b6b: mov (%r14),%rcx
0x0000000109901b6e: shl $0x2,%rcx
0x0000000109901b72: sar $0x3,%rcx
0x0000000109901b76: lea 0xd(%rbx),%rdx
0x0000000109901b7a: lea (%rcx,%rdx,1),%rbx
0x0000000109901b7e: mov (%rbx),%edx
0x0000000109901b80: shl $0x3,%rdx
0x0000000109901b84: mov %rdx,%rbx
0x0000000109901b87: sar $0xe,%rbx
0x0000000109901b8b: shl $0x3,%rbx
0x0000000109901b8f: xor %rbx,%rdx
0x0000000109901b92: mov %rdx,%rbx
0x0000000109901b95: shl $0x7,%rbx
0x0000000109901b99: mov $0x4e962b400,%rcx
0x0000000109901ba3: and %rcx,%rbx
0x0000000109901ba6: xor %rbx,%rdx
0x0000000109901ba9: mov %rdx,%rbx
0x0000000109901bac: shl $0xf,%rbx
0x0000000109901bb0: mov $0x77e300000,%rcx
0x0000000109901bba: and %rcx,%rbx
0x0000000109901bbd: xor %rbx,%rdx
0x0000000109901bc0: mov %rdx,%rbx
0x0000000109901bc3: sar $0x15,%rbx
0x0000000109901bc7: shl $0x3,%rbx
0x0000000109901bcb: xor %rbx,%rdx
0x0000000109901bce: mov %rdx,(%r14)
0x0000000109901bd1: mov 0x16(%rax),%rdx
0x0000000109901bd5: lea 0x8(%rdx),%rbx
0x0000000109901bd9: mov %rbx,0x16(%rax)
0x0000000109901bdd: retq

That's pretty tight code for a dynamic language.

Implementation of specialized arrays


So there are 13 C types here: char uchar short ushort int uint long ulong longlong ulonglong float double void*. For each C type, we have four vocabularies: specialized-arrays.type, specialized-vectors.type, specialized-arrays.direct.type, and io.mmap.type. Implementing all of this by hand would be a pain: even with the best abstractions in Factor's object system, at a minimum there would be 4 or 5 lines of code per type and per feature, and ideally you'd want to clone some methods once per implementation instead of abstracting everything out through dynamic dispatch, for performance reasons. So instead, I did what any self-respecting programmer would do in this situation: I made the computer do all the hard work. Factor lacked the abstraction necessary to express what I wanted, but Factor is extensible, so I just implemented the abstraction -- as a library. The feature we want here is essentially C++ templates. So yes, there is now a functors vocabulary which implements what amounts to C++ templates... in Factor. I called them "functors", even though they're more like templates than ML functors, because the term "template" has meaning in Factor's web framework and I didn't want to create confusion. I don't expect functors will see widespread use and I certainly don't want to promote this as a central feature of Factor. It is just another library that happens to extend the parser -- like many other Factor libraries do.

Let's look at how functors are used. For example, here is the entire text of the specialized-arrays.direct.int vocabulary:
USING: specialized-arrays.int specialized-arrays.direct.functor ;
IN: specialized-arrays.direct.int

<< "int" define-direct-array >>

The key of course is that its invoking the define-direct-array word, at parse time (hence the << and >>). This word is defined in the specialized-arrays.direct.functor vocabulary, and its definition looks quite unusual:
FUNCTOR: define-direct-array ( T -- )

A' IS ${T}-array
>A' IS >${T}-array
<A'> IS <${A'}>

A DEFINES direct-${T}-array
<A> DEFINES <${A}>

NTH [ T dup c-getter array-accessor ]
SET-NTH [ T dup c-setter array-accessor ]

WHERE

TUPLE: A
{ underlying alien read-only }
{ length fixnum read-only } ;

: <A> ( alien len -- direct-array ) A boa ; inline
M: A length length>> ;
M: A nth-unsafe underlying>> NTH call ;
M: A set-nth-unsafe underlying>> SET-NTH call ;
M: A like drop dup A instance? [ >A' execute ] unless ;
M: A new-sequence drop <A'> execute ;

INSTANCE: A sequence

;FUNCTOR
This FUNCTOR: ... IS / DEFINES ... WHERE ... ;FUNCTOR business is not new syntax in the language; its just parsing words defined in the functors vocabulary. Essentially we are defining a word, define-direct-array, which generates code, parameterized by a single input T, which is a C type name in this case. Note that what's going on here is more elaborate than text substitution; the functor body is parsed once, and converted into code which instantiates it, rather than simply expanding at usages like C preprocessor macros do.

Functors may seem like an elaborate new piece of functionality, comparable in complexity to the object system or locals, but in reality, the emperor has no clothes; they have a trivial desugaring.

First, inside a functor body, the TUPLE:, :, M: and INSTANCE: are shadowed to special words in the functors vocabulary named `TUPLE:, `:, and so on.

Next, IS and DEFINES desugar as follows.
A'      IS ${T}-array
expands into
A' [ [ I[ ${T}-array]I ] with-string-writer dup search [ ] [ no-word ] ?if ]
and
<A>     DEFINES <${A}>
expands into
<A> [ [ I[ <${A}>]I ] with-string-writer create-in ]

The I[ syntax is just the interpolate library, which turns this,
I[ Hello, ${name}.]I
into this,
"Hello, " write name write "." write
Now,
FUNCTOR: foo ( inputs -- )

...names...

WHERE

...body...

;FUNCTOR
is really just syntax sugar for
:: foo ( inputs -- )
[let* | ...names... | ...body... ] ;

With the added caveat that inside the body, `: is remapped to : and so on.

As you can see, a seemingly powerful feature is melting away as it becomse apparent its only some rather trivial syntax sugar. Words such as `: desugar like so;
`: foo 2 2 + . ;
becomes
foo [ 2 2 + . ] define
Let's look at a functor and its desugaring. Here is a simple functor, it takes a tuple class name as input, and defines a constructor word <foo> which makes a new instance of this tuple class using new. So it is like the built-in parsing word C:, except it uses new instead of boa:
FUNCTOR: make-constructor ( name -- )

tuple IS ${name}
<tuple> DEFINES <${name}>

WHERE

: <tuple> tuple new ;

;FUNCTOR

Given the above, this functor is really equivalent to the following:
:: make-constructor ( name -- )
[let* | tuple [ name dup search [ ] [ no-word ] ?if ]
<tuple> [ [ "<" write name write ">" write ] with-string-writer ] |
<tuple> [ tuple new ] define
] ;

We could keep going -- the :: word, which defines a word with named parameters, is it self a library word, in the locals vocabulary, which desugars down to stack code.

In this simple example, the hand-coded version is shorter than the functor, but the point of using a functor is that you can make the meta-code look like the code that it is generating. For the more complex functors that generate the various forms of specialized arrays, this is invaluable because it makes the code easier to maintain and debug.

Finally, the stuff described in this post, especially the implementation of functors, is some pretty advanced stuff. I don't expect most people using Factor to know or care about how this works; after all, you can use specialized arrays without "peeking under the hood", so to speak. However, all of this meta-programming power is there when you are ready for it.

Monday, December 01, 2008

Factor now fully supported on 64-bit Windows

The 64-bit Windows porting effort is complete. All vocabularies load and all tests pass (except PostgreSQL doesn't have 64-bit client libraries). A binary package is available. This brings the total number of supported platforms to 13. Enjoy!