Specialized arrays
For each primitive C type, there is a vocabulary named
specialized-arrays.type
, which defines a class named type-array
, and several related words: a constructor, and a conversion word for converting existing sequences to specialized arrays of this type. As with all sequence types, the class implements methods for length
, nth
and set-nth
; there is also a literal syntax for parsing and printing specialized arrays. An example:USE: specialized-arrays.int
int-array{ 584 -13 11 } [ 10 + ] map
Specialized vectors
There are also resizeable versions of all of these, in
specialized-vectors.type
. These work like ordinary vectors; you can use words such as push
and pop
, accumulating a stream of values in a vector.Direct arrays
A third variant is also provided. These are the "direct specialized arrays". Sometimes when working with a C library, the library hands you a pointer to some data which is to be viewed as an array of primitive values. Direct arrays accomplish this. For every primitive C type, the
specialized-arrays.direct.type
vocabulary provides a word named <direct-type-array>
which takes an alien pointer and a length. It wraps them in a tuple which responds to the sequence protocol by reading from memory.Memory-mapped files
Direct arrays are useful in another context, and that is when working with memory-mapped files. Again, for every primitive C type, there is a vocabulary,
io.mmap.type
. It defines two words, <mapped-type-array>
and with-mapped-type-file
. Here is an example usage of the latter: we're mapping in a file of floating point numbers and adding them up:[ 0.0 [ + ] reduce ] with-mapped-float-fileWe've had
io.mmap
for a while, but specialized arrays make this library a lot more useful.This is all really powerful stuff, because on one hand, we have these high-level data types that behave like any other Factor sequence and can be used interchangeably with sequence utility words, but on the other hand, not only do they offer a space savings over heterogeneous arrays by avoiding boxing, they also have the potential to improve performance -- the low-level primitives they're built on are already optimized by the compiler, and it is possible to write code which operates on specialized float and integer arrays without any boxing at all.
Mersenne Twister
A good example of an algorithm which benefits from specialized arrays is the Mersenne Twister random number generator. The Factor implementation is very straightforward, and shorter than the C version. I changed it to use a
uint-array
instead of a normal array
to store the random generator state. After making a few more tweaks, the time taken to generate a 100,000,000-element byte array of random bytes on my Mac Book Pro went down to 5 seconds, from 37 seconds!Here is the main method of the Mersenne Twister,
random-32*
, and its surrounding utility words:: mt-n 624 ; inline
: mt-m 397 ; inline
: mt-a HEX: 9908b0df ; inline
: mersenne-wrap ( n -- n' )
dup mt-n > [ mt-n - ] when ; inline
: wrap-nth ( n seq -- obj )
[ mersenne-wrap ] dip nth-unsafe ; inline
: set-wrap-nth ( obj n seq -- )
[ mersenne-wrap ] dip set-nth-unsafe ; inline
: calculate-y ( n seq -- y )
[ wrap-nth 31 mask-bit ]
[ [ 1+ ] [ wrap-nth ] bi* 31 bits ] 2bi bitor ; inline
: (mt-generate) ( n seq -- next-mt )
[
calculate-y
[ 2/ ] [ odd? mt-a 0 ? ] bi bitxor
] [
[ mt-m + ] [ wrap-nth ] bi*
] 2bi bitxor ; inline
: mt-generate ( mt -- )
[
mt-n swap seq>> '[
_ [ (mt-generate) ] [ set-wrap-nth ] 2bi
] each
] [ 0 >>i drop ] bi ; inline
: mt-temper ( y -- yt )
dup -11 shift bitxor
dup 7 shift HEX: 9d2c5680 bitand bitxor
dup 15 shift HEX: efc60000 bitand bitxor
dup -18 shift bitxor ; inline
: next-index ( mt -- i )
dup i>> dup mt-n < [ nip ] [ drop mt-generate 0 ] if ; inline
M: mersenne-twister random-32* ( mt -- r )
[ next-index ]
[ seq>> wrap-nth mt-temper ]
[ [ 1+ ] change-i drop ] tri ;
A naive implementation of Factor would struggle with this code because of method dispatch overhead (all arithmetic and sequence access is generic), and because of overflow checks on integer arithmetic (integers have arbitrary precision in Factor). Thankfully, the Factor implementation is not naive. The high-level optimizer, applied to the above code, produces the following output -- note that the translation from high-level IR back to Factor code is imprecise, and some IR nodes cannot be expressed, hence the
"COMPLEX SHUFFLE"
strings that appear here:[
dup >r dup 3 slot dup 624 fixnum<
[ nip ] [
drop dup >r 624 swap 2 slot
>r r> >r r> 0 -rot \ ( gensym ) [
pick pick fixnum< [
swap >r 2dup "COMPLEX SHUFFLE" >r r> 2dup
"COMPLEX SHUFFLE" 2dup "COMPLEX SHUFFLE" 2dup
"COMPLEX SHUFFLE" >r r> 3 slot
swap 4 fixnum*fast
alien-unsigned-4 2147483648 fixnum-bitand
"COMPLEX SHUFFLE" >r 1 fixnum+fast
r> >r r> 3 slot swap 4 fixnum*fast
alien-unsigned-4 2147483647 fixnum-bitand
fixnum-bitor dup >r -1 fixnum-shift-fast
r> 1 fixnum-bitand 1 eq? 2567483615 0 rot
[ drop ] [ nip ] if fixnum-bitxor
"COMPLEX SHUFFLE" >r 397 fixnum+fast
r> >r dup 624 fixnum>
[ 624 fixnum-fast ] [ ] if r> 3 slot
swap 4 fixnum*fast
alien-unsigned-4 fixnum-bitxor
"COMPLEX SHUFFLE" >r r> 3 slot
swap 4 fixnum*fast
set-alien-unsigned-4 "COMPLEX SHUFFLE" r> swap
"COMPLEX SHUFFLE" 1 fixnum+fast
"COMPLEX SHUFFLE" ( gensym )
] [ 3drop ] if
] label r> >r 0 r> 3 set-slot 0
] if r> dup >r 2 slot >r r> 3 slot swap 4 fixnum*fast
alien-unsigned-4 dup -11 fixnum-shift-fast fixnum-bitxor
dup 7 fixnum-shift-fast 2636928640 fixnum-bitand
fixnum-bitxor dup 15 fixnum-shift-fast
4022730752 fixnum-bitand fixnum-bitxor
dup -18 fixnum-shift-fast fixnum-bitxor r> dup >r 3 slot
1 fixnum+fast r> swap swap >r r> 3 set-slot
]
Only primitive calls remain; everything else has evaporated after 11 distinct optimization passes. A lot of redundant stack shuffling still remains; the low-level optimizer eliminates that and produces a sequence of low-level IR instructions:
=== word: mersenne-twister=>random-32*, label: mersenne-twister=>random-32*
_label 0
##prologue
_label 1
##inc-r 1
##inc-d 1
##peek V int-regs 1368412 D 1
##replace V int-regs 1368412 R 0
##slot-imm V int-regs 1368417 V int-regs 1368412 3 2
##replace V int-regs 1368417 D 0
_compare-imm-branch 3 V int-regs 1368417 4992 cc>=
_label 2
##inc-d -1
##peek V int-regs 1368425 D -1
##replace V int-regs 1368425 D 0
_branch 13
_label 3
##inc-r 1
##inc-d 1
##peek V int-regs 1368427 D 2
##replace V int-regs 1368427 R 0
##load-immediate V int-regs 1368429 4992
##slot-imm V int-regs 1368434 V int-regs 1368427 2 2
##load-immediate V int-regs 1368439 0
##replace V int-regs 1368434 D 0
##replace V int-regs 1368429 D 1
##replace V int-regs 1368439 D 2
##loop-entry
_label 4
##peek V int-regs 1368443 D 2
_compare-imm-branch 11 V int-regs 1368443 4992 cc>=
_label 5
##inc-r 7
##peek V int-regs 1368453 D 1
##peek V int-regs 1368454 D 0
##replace V int-regs 1368453 R 6
##peek V int-regs 1368456 D 2
##replace V int-regs 1368454 R 4
##replace V int-regs 1368456 R 5
##replace V int-regs 1368454 R 2
##replace V int-regs 1368456 R 3
##replace V int-regs 1368454 R 0
##replace V int-regs 1368456 R 1
##slot-imm V int-regs 1368478 V int-regs 1368454 3 2
##copy V int-regs 1368483 V int-regs 1368456
##shl-imm V int-regs 1368483 V int-regs 1368483 2
##copy V int-regs 1368486 V int-regs 1368483
##sar-imm V int-regs 1368486 V int-regs 1368486 3
##add-imm V int-regs 1368487 V int-regs 1368478 13
##add V int-regs 1368489 V int-regs 1368486 V int-regs 1368487
##alien-unsigned-4 V int-regs 1368490 V int-regs 1368489
##copy V int-regs 1368491 V int-regs 1368490
##shl-imm V int-regs 1368491 V int-regs 1368491 3
##load-immediate V int-regs 1368492 17179869184
##copy V int-regs 1368495 V int-regs 1368491
##and V int-regs 1368495 V int-regs 1368495 V int-regs 1368492
##add-imm V int-regs 1368501 V int-regs 1368456 8
##copy V int-regs 1368512 V int-regs 1368501
##shl-imm V int-regs 1368512 V int-regs 1368512 2
##copy V int-regs 1368515 V int-regs 1368512
##sar-imm V int-regs 1368515 V int-regs 1368515 3
##add V int-regs 1368518 V int-regs 1368515 V int-regs 1368487
##alien-unsigned-4 V int-regs 1368519 V int-regs 1368518
##copy V int-regs 1368520 V int-regs 1368519
##shl-imm V int-regs 1368520 V int-regs 1368520 3
##load-immediate V int-regs 1368521 17179869176
##copy V int-regs 1368524 V int-regs 1368520
##and V int-regs 1368524 V int-regs 1368524 V int-regs 1368521
##copy V int-regs 1368527 V int-regs 1368495
##or V int-regs 1368527 V int-regs 1368527 V int-regs 1368524
##copy V int-regs 1368532 V int-regs 1368527
##sar-imm V int-regs 1368532 V int-regs 1368532 4
##copy V int-regs 1368533 V int-regs 1368532
##shl-imm V int-regs 1368533 V int-regs 1368533 3
##replace V int-regs 1368533 D 2
##copy V int-regs 1368537 V int-regs 1368527
##and-imm V int-regs 1368537 V int-regs 1368537 8
##load-immediate V int-regs 1368542 20539868920
##load-immediate V int-regs 1368543 0
##replace V int-regs 1368543 D 0
##replace V int-regs 1368542 D 1
_compare-imm-branch 7 V int-regs 1368537 8 cc/=
_label 6
##inc-d -1
_branch 8
_label 7
##inc-d -1
##peek V int-regs 1368550 D -1
##replace V int-regs 1368550 D 0
_label 8
##inc-r -1
##peek V int-regs 1368551 D 1
##peek V int-regs 1368552 D 0
##copy V int-regs 1368553 V int-regs 1368551
##xor V int-regs 1368553 V int-regs 1368553 V int-regs 1368552
##replace V int-regs 1368553 D 1
##peek V int-regs 1368554 R 0
##peek V int-regs 1368555 R -1
##add-imm V int-regs 1368559 V int-regs 1368554 3176
##replace V int-regs 1368555 R 0
##replace V int-regs 1368559 D 0
_compare-imm-branch 10 V int-regs 1368559 4992 cc<=
_label 9
##peek V int-regs 1368569 D 0
##sub-imm V int-regs 1368570 V int-regs 1368569 4992
##replace V int-regs 1368570 D 0
_label 10
##inc-r -6
##inc-d 1
##peek V int-regs 1368571 R -6
##slot-imm V int-regs 1368574 V int-regs 1368571 3 2
##peek V int-regs 1368575 D 1
##copy V int-regs 1368579 V int-regs 1368575
##shl-imm V int-regs 1368579 V int-regs 1368579 2
##copy V int-regs 1368582 V int-regs 1368579
##sar-imm V int-regs 1368582 V int-regs 1368582 3
##add-imm V int-regs 1368583 V int-regs 1368574 13
##add V int-regs 1368585 V int-regs 1368582 V int-regs 1368583
##alien-unsigned-4 V int-regs 1368586 V int-regs 1368585
##copy V int-regs 1368587 V int-regs 1368586
##shl-imm V int-regs 1368587 V int-regs 1368587 3
##peek V int-regs 1368588 D 2
##copy V int-regs 1368590 V int-regs 1368588
##xor V int-regs 1368590 V int-regs 1368590 V int-regs 1368587
##peek V int-regs 1368591 R -4
##peek V int-regs 1368592 R -5
##slot-imm V int-regs 1368597 V int-regs 1368592 3 2
##copy V int-regs 1368602 V int-regs 1368591
##shl-imm V int-regs 1368602 V int-regs 1368602 2
##copy V int-regs 1368605 V int-regs 1368602
##sar-imm V int-regs 1368605 V int-regs 1368605 3
##add-imm V int-regs 1368606 V int-regs 1368597 13
##add V int-regs 1368608 V int-regs 1368605 V int-regs 1368606
##copy V int-regs 1368610 V int-regs 1368590
##sar-imm V int-regs 1368610 V int-regs 1368610 3
##set-alien-integer-4 V int-regs 1368608 V int-regs 1368610
##peek V int-regs 1368611 R -2
##peek V int-regs 1368612 R -3
##peek V int-regs 1368613 R -1
##add-imm V int-regs 1368620 V int-regs 1368611 8
##replace V int-regs 1368620 D 2
##replace V int-regs 1368612 D 0
##replace V int-regs 1368613 D 1
_branch 4
_label 11
##inc-d -3
_label 12
##inc-r -1
##inc-d 1
##peek V int-regs 1368626 R -1
##load-immediate V int-regs 1368628 0
##set-slot-imm V int-regs 1368628 V int-regs 1368626 3 2
##replace V int-regs 1368628 D 0
_label 13
##inc-r -1
##peek V int-regs 1368634 R -1
##slot-imm V int-regs 1368639 V int-regs 1368634 2 2
##slot-imm V int-regs 1368644 V int-regs 1368639 3 2
##peek V int-regs 1368645 D 0
##copy V int-regs 1368649 V int-regs 1368645
##shl-imm V int-regs 1368649 V int-regs 1368649 2
##copy V int-regs 1368652 V int-regs 1368649
##sar-imm V int-regs 1368652 V int-regs 1368652 3
##add-imm V int-regs 1368653 V int-regs 1368644 13
##add V int-regs 1368655 V int-regs 1368652 V int-regs 1368653
##alien-unsigned-4 V int-regs 1368656 V int-regs 1368655
##copy V int-regs 1368657 V int-regs 1368656
##shl-imm V int-regs 1368657 V int-regs 1368657 3
##copy V int-regs 1368661 V int-regs 1368657
##sar-imm V int-regs 1368661 V int-regs 1368661 14
##copy V int-regs 1368662 V int-regs 1368661
##shl-imm V int-regs 1368662 V int-regs 1368662 3
##copy V int-regs 1368665 V int-regs 1368657
##xor V int-regs 1368665 V int-regs 1368665 V int-regs 1368662
##copy V int-regs 1368669 V int-regs 1368665
##shl-imm V int-regs 1368669 V int-regs 1368669 7
##load-immediate V int-regs 1368670 21095429120
##copy V int-regs 1368673 V int-regs 1368669
##and V int-regs 1368673 V int-regs 1368673 V int-regs 1368670
##copy V int-regs 1368676 V int-regs 1368665
##xor V int-regs 1368676 V int-regs 1368676 V int-regs 1368673
##copy V int-regs 1368680 V int-regs 1368676
##shl-imm V int-regs 1368680 V int-regs 1368680 15
##load-immediate V int-regs 1368681 32181846016
##copy V int-regs 1368684 V int-regs 1368680
##and V int-regs 1368684 V int-regs 1368684 V int-regs 1368681
##copy V int-regs 1368687 V int-regs 1368676
##xor V int-regs 1368687 V int-regs 1368687 V int-regs 1368684
##copy V int-regs 1368691 V int-regs 1368687
##sar-imm V int-regs 1368691 V int-regs 1368691 21
##copy V int-regs 1368692 V int-regs 1368691
##shl-imm V int-regs 1368692 V int-regs 1368692 3
##copy V int-regs 1368695 V int-regs 1368687
##xor V int-regs 1368695 V int-regs 1368695 V int-regs 1368692
##replace V int-regs 1368695 D 0
##slot-imm V int-regs 1368701 V int-regs 1368634 3 2
##add-imm V int-regs 1368704 V int-regs 1368701 8
##set-slot-imm V int-regs 1368704 V int-regs 1368634 3 2
##epilogue
##return
Finally, after register allocation, these instructions map pretty much directly onto machine code:
0x0000000109901940: add $0x8,%r15
0x0000000109901944: add $0x8,%r14
0x0000000109901948: mov -0x8(%r14),%rax
0x000000010990194c: mov %rax,(%r15)
0x000000010990194f: mov 0x16(%rax),%rcx
0x0000000109901953: mov %rcx,(%r14)
0x0000000109901956: cmp $0x1380,%rcx
0x000000010990195d: jge 0x109901973
0x0000000109901963: sub $0x8,%r14
0x0000000109901967: mov 0x8(%r14),%rcx
0x000000010990196b: mov %rcx,(%r14)
0x000000010990196e: jmpq 0x109901b5b
0x0000000109901973: add $0x8,%r15
0x0000000109901977: add $0x8,%r14
0x000000010990197b: mov -0x10(%r14),%rcx
0x000000010990197f: mov %rcx,(%r15)
0x0000000109901982: mov $0x1380,%rax
0x000000010990198c: mov 0xe(%rcx),%rdx
0x0000000109901990: mov $0x0,%rcx
0x000000010990199a: mov %rdx,(%r14)
0x000000010990199d: mov %rax,-0x8(%r14)
0x00000001099019a1: mov %rcx,-0x10(%r14)
0x00000001099019a5: nop
0x00000001099019a6: nop
0x00000001099019a7: nop
0x00000001099019a8: nop
0x00000001099019a9: nop
0x00000001099019aa: nop
0x00000001099019ab: nop
0x00000001099019ac: nop
0x00000001099019ad: nop
0x00000001099019ae: nop
0x00000001099019af: nop
0x00000001099019b0: mov -0x10(%r14),%rcx
0x00000001099019b4: cmp $0x1380,%rcx
0x00000001099019bb: jge 0x109901b3a
0x00000001099019c1: add $0x38,%r15
0x00000001099019c5: mov -0x8(%r14),%rcx
0x00000001099019c9: mov (%r14),%rdx
0x00000001099019cc: mov %rcx,-0x30(%r15)
0x00000001099019d0: mov -0x10(%r14),%rcx
0x00000001099019d4: mov %rdx,-0x20(%r15)
0x00000001099019d8: mov %rcx,-0x28(%r15)
0x00000001099019dc: mov %rdx,-0x10(%r15)
0x00000001099019e0: mov %rcx,-0x18(%r15)
0x00000001099019e4: mov %rdx,(%r15)
0x00000001099019e7: mov %rcx,-0x8(%r15)
0x00000001099019eb: mov 0x16(%rdx),%rax
0x00000001099019ef: mov %rcx,%rdx
0x00000001099019f2: shl $0x2,%rdx
0x00000001099019f6: sar $0x3,%rdx
0x00000001099019fa: lea 0xd(%rax),%rbx
0x00000001099019fe: lea (%rdx,%rbx,1),%rax
0x0000000109901a02: mov (%rax),%edx
0x0000000109901a04: shl $0x3,%rdx
0x0000000109901a08: mov $0x400000000,%rax
0x0000000109901a12: and %rax,%rdx
0x0000000109901a15: lea 0x8(%rcx),%rax
0x0000000109901a19: shl $0x2,%rax
0x0000000109901a1d: sar $0x3,%rax
0x0000000109901a21: lea (%rax,%rbx,1),%rcx
0x0000000109901a25: mov (%rcx),%eax
0x0000000109901a27: shl $0x3,%rax
0x0000000109901a2b: mov $0x3fffffff8,%rcx
0x0000000109901a35: and %rcx,%rax
0x0000000109901a38: or %rax,%rdx
0x0000000109901a3b: mov %rdx,%rax
0x0000000109901a3e: sar $0x4,%rax
0x0000000109901a42: shl $0x3,%rax
0x0000000109901a46: mov %rax,-0x10(%r14)
0x0000000109901a4a: and $0x8,%rdx
0x0000000109901a4e: mov $0x4c84586f8,%rax
0x0000000109901a58: mov $0x0,%rcx
0x0000000109901a62: mov %rcx,(%r14)
0x0000000109901a65: mov %rax,-0x8(%r14)
0x0000000109901a69: cmp $0x8,%rdx
0x0000000109901a6d: jne 0x109901a7c
0x0000000109901a73: sub $0x8,%r14
0x0000000109901a77: jmpq 0x109901a87
0x0000000109901a7c: sub $0x8,%r14
0x0000000109901a80: mov 0x8(%r14),%rcx
0x0000000109901a84: mov %rcx,(%r14)
0x0000000109901a87: sub $0x8,%r15
0x0000000109901a8b: mov -0x8(%r14),%rcx
0x0000000109901a8f: mov (%r14),%rax
0x0000000109901a92: xor %rax,%rcx
0x0000000109901a95: mov %rcx,-0x8(%r14)
0x0000000109901a99: mov (%r15),%rcx
0x0000000109901a9c: mov 0x8(%r15),%rax
0x0000000109901aa0: lea 0xc68(%rcx),%rdx
0x0000000109901aa7: mov %rax,(%r15)
0x0000000109901aaa: mov %rdx,(%r14)
0x0000000109901aad: cmp $0x1380,%rdx
0x0000000109901ab4: jle 0x109901ac7
0x0000000109901aba: mov (%r14),%rdx
0x0000000109901abd: lea -0x1380(%rdx),%rax
0x0000000109901ac4: mov %rax,(%r14)
0x0000000109901ac7: sub $0x30,%r15
0x0000000109901acb: add $0x8,%r14
0x0000000109901acf: mov 0x30(%r15),%rax
0x0000000109901ad3: mov 0x16(%rax),%rdx
0x0000000109901ad7: mov -0x8(%r14),%rax
0x0000000109901adb: shl $0x2,%rax
0x0000000109901adf: sar $0x3,%rax
0x0000000109901ae3: lea 0xd(%rdx),%rcx
0x0000000109901ae7: lea (%rax,%rcx,1),%rdx
0x0000000109901aeb: mov (%rdx),%ecx
0x0000000109901aed: shl $0x3,%rcx
0x0000000109901af1: mov -0x10(%r14),%rdx
0x0000000109901af5: xor %rcx,%rdx
0x0000000109901af8: mov 0x20(%r15),%rcx
0x0000000109901afc: mov 0x28(%r15),%rax
0x0000000109901b00: mov 0x16(%rax),%rbx
0x0000000109901b04: shl $0x2,%rcx
0x0000000109901b08: sar $0x3,%rcx
0x0000000109901b0c: lea 0xd(%rbx),%rax
0x0000000109901b10: lea (%rcx,%rax,1),%rbx
0x0000000109901b14: sar $0x3,%rdx
0x0000000109901b18: mov %edx,(%rbx)
0x0000000109901b1a: mov 0x10(%r15),%rdx
0x0000000109901b1e: mov 0x18(%r15),%rbx
0x0000000109901b22: mov 0x8(%r15),%rax
0x0000000109901b26: lea 0x8(%rdx),%rcx
0x0000000109901b2a: mov %rcx,-0x10(%r14)
0x0000000109901b2e: mov %rbx,(%r14)
0x0000000109901b31: mov %rax,-0x8(%r14)
0x0000000109901b35: jmpq 0x1099019b0
0x0000000109901b3a: sub $0x18,%r14
0x0000000109901b3e: sub $0x8,%r15
0x0000000109901b42: add $0x8,%r14
0x0000000109901b46: mov 0x8(%r15),%rcx
0x0000000109901b4a: mov $0x0,%rax
0x0000000109901b54: mov %rax,0x16(%rcx)
0x0000000109901b58: mov %rax,(%r14)
0x0000000109901b5b: sub $0x8,%r15
0x0000000109901b5f: mov 0x8(%r15),%rax
0x0000000109901b63: mov 0xe(%rax),%rcx
0x0000000109901b67: mov 0x16(%rcx),%rbx
0x0000000109901b6b: mov (%r14),%rcx
0x0000000109901b6e: shl $0x2,%rcx
0x0000000109901b72: sar $0x3,%rcx
0x0000000109901b76: lea 0xd(%rbx),%rdx
0x0000000109901b7a: lea (%rcx,%rdx,1),%rbx
0x0000000109901b7e: mov (%rbx),%edx
0x0000000109901b80: shl $0x3,%rdx
0x0000000109901b84: mov %rdx,%rbx
0x0000000109901b87: sar $0xe,%rbx
0x0000000109901b8b: shl $0x3,%rbx
0x0000000109901b8f: xor %rbx,%rdx
0x0000000109901b92: mov %rdx,%rbx
0x0000000109901b95: shl $0x7,%rbx
0x0000000109901b99: mov $0x4e962b400,%rcx
0x0000000109901ba3: and %rcx,%rbx
0x0000000109901ba6: xor %rbx,%rdx
0x0000000109901ba9: mov %rdx,%rbx
0x0000000109901bac: shl $0xf,%rbx
0x0000000109901bb0: mov $0x77e300000,%rcx
0x0000000109901bba: and %rcx,%rbx
0x0000000109901bbd: xor %rbx,%rdx
0x0000000109901bc0: mov %rdx,%rbx
0x0000000109901bc3: sar $0x15,%rbx
0x0000000109901bc7: shl $0x3,%rbx
0x0000000109901bcb: xor %rbx,%rdx
0x0000000109901bce: mov %rdx,(%r14)
0x0000000109901bd1: mov 0x16(%rax),%rdx
0x0000000109901bd5: lea 0x8(%rdx),%rbx
0x0000000109901bd9: mov %rbx,0x16(%rax)
0x0000000109901bdd: retq
That's pretty tight code for a dynamic language.
Implementation of specialized arrays
So there are 13 C types here:
char uchar short ushort int uint long ulong longlong ulonglong float double void*
. For each C type, we have four vocabularies: specialized-arrays.type
, specialized-vectors.type
, specialized-arrays.direct.type
, and io.mmap.type
. Implementing all of this by hand would be a pain: even with the best abstractions in Factor's object system, at a minimum there would be 4 or 5 lines of code per type and per feature, and ideally you'd want to clone some methods once per implementation instead of abstracting everything out through dynamic dispatch, for performance reasons. So instead, I did what any self-respecting programmer would do in this situation: I made the computer do all the hard work. Factor lacked the abstraction necessary to express what I wanted, but Factor is extensible, so I just implemented the abstraction -- as a library. The feature we want here is essentially C++ templates. So yes, there is now a functors
vocabulary which implements what amounts to C++ templates... in Factor. I called them "functors", even though they're more like templates than ML functors, because the term "template" has meaning in Factor's web framework and I didn't want to create confusion. I don't expect functors will see widespread use and I certainly don't want to promote this as a central feature of Factor. It is just another library that happens to extend the parser -- like many other Factor libraries do.Let's look at how functors are used. For example, here is the entire text of the
specialized-arrays.direct.int
vocabulary:USING: specialized-arrays.int specialized-arrays.direct.functor ;
IN: specialized-arrays.direct.int
<< "int" define-direct-array >>
The key of course is that its invoking the
define-direct-array
word, at parse time (hence the <<
and >>
). This word is defined in the specialized-arrays.direct.functor
vocabulary, and its definition looks quite unusual:FUNCTOR: define-direct-array ( T -- )This
A' IS ${T}-array
>A' IS >${T}-array
<A'> IS <${A'}>
A DEFINES direct-${T}-array
<A> DEFINES <${A}>
NTH [ T dup c-getter array-accessor ]
SET-NTH [ T dup c-setter array-accessor ]
WHERE
TUPLE: A
{ underlying alien read-only }
{ length fixnum read-only } ;
: <A> ( alien len -- direct-array ) A boa ; inline
M: A length length>> ;
M: A nth-unsafe underlying>> NTH call ;
M: A set-nth-unsafe underlying>> SET-NTH call ;
M: A like drop dup A instance? [ >A' execute ] unless ;
M: A new-sequence drop <A'> execute ;
INSTANCE: A sequence
;FUNCTOR
FUNCTOR: ... IS / DEFINES ... WHERE ... ;FUNCTOR
business is not new syntax in the language; its just parsing words defined in the functors
vocabulary. Essentially we are defining a word, define-direct-array
, which generates code, parameterized by a single input T
, which is a C type name in this case. Note that what's going on here is more elaborate than text substitution; the functor body is parsed once, and converted into code which instantiates it, rather than simply expanding at usages like C preprocessor macros do.Functors may seem like an elaborate new piece of functionality, comparable in complexity to the object system or locals, but in reality, the emperor has no clothes; they have a trivial desugaring.
First, inside a functor body, the
TUPLE:
, :
, M:
and INSTANCE:
are shadowed to special words in the functors
vocabulary named `TUPLE:
, `:
, and so on.Next,
IS
and DEFINES
desugar as follows.A' IS ${T}-arrayexpands into
A' [ [ I[ ${T}-array]I ] with-string-writer dup search [ ] [ no-word ] ?if ]and
<A> DEFINES <${A}>expands into
<A> [ [ I[ <${A}>]I ] with-string-writer create-in ]
The
I[
syntax is just the interpolate
library, which turns this,I[ Hello, ${name}.]Iinto this,
"Hello, " write name write "." writeNow,
FUNCTOR: foo ( inputs -- )is really just syntax sugar for
...names...
WHERE
...body...
;FUNCTOR
:: foo ( inputs -- )
[let* | ...names... | ...body... ] ;
With the added caveat that inside the body,
`:
is remapped to :
and so on.As you can see, a seemingly powerful feature is melting away as it becomse apparent its only some rather trivial syntax sugar. Words such as
`:
desugar like so;`: foo 2 2 + . ;becomes
foo [ 2 2 + . ] defineLet's look at a functor and its desugaring. Here is a simple functor, it takes a tuple class name as input, and defines a constructor word
<foo>
which makes a new instance of this tuple class using new
. So it is like the built-in parsing word C:
, except it uses new
instead of boa
:FUNCTOR: make-constructor ( name -- )
tuple IS ${name}
<tuple> DEFINES <${name}>
WHERE
: <tuple> tuple new ;
;FUNCTOR
Given the above, this functor is really equivalent to the following:
:: make-constructor ( name -- )
[let* | tuple [ name dup search [ ] [ no-word ] ?if ]
<tuple> [ [ "<" write name write ">" write ] with-string-writer ] |
<tuple> [ tuple new ] define
] ;
We could keep going -- the
::
word, which defines a word with named parameters, is it self a library word, in the locals
vocabulary, which desugars down to stack code.In this simple example, the hand-coded version is shorter than the functor, but the point of using a functor is that you can make the meta-code look like the code that it is generating. For the more complex functors that generate the various forms of specialized arrays, this is invaluable because it makes the code easier to maintain and debug.
Finally, the stuff described in this post, especially the implementation of functors, is some pretty advanced stuff. I don't expect most people using Factor to know or care about how this works; after all, you can use specialized arrays without "peeking under the hood", so to speak. However, all of this meta-programming power is there when you are ready for it.
1 comment:
On your PC how much time does it take the C version of the Mersenne Twister to produce the same array of random unsigned bytes? I think this is an important value to compare to.
Post a Comment