Monday, January 18, 2010

How Factor implements closures

The recent blog post from the Clozure CL team on Clozure CL's implementation of closures inspired me to do a similar writeup about Factor's implementation. It is often said that "closures are a poor man's objects", or "objects are a poor man's closures". Factor takes the former view, because as you will see they are largely implemented within the object system itself.


First, let us look at quotations, and what happens internally when a quotation is called. A quotation is a sequence of literals and words. Quotations do not close over any lexical environment; they are entirely self-contained, and their evaluation semantics only depend on their elements, not any state from the time they were constructed. So quotations are anonymous functions but not closures.

Internally, a quotation is a pair, consisting of an array and a machine code entry point. The array stores the quotation's elements, and when you print a quotation with the prettyprinter, this is how Factor knows what its elements are:
( scrachpad ) [ "out.txt" utf8 [ "Hi" print ] with-file-writer ] .
[ "out.txt" utf8 [ "Hi" print ] with-file-writer ]

The machine code entry point refers to a code block in the code heap containing the quotation's compiled definition.

The call generic word calls quotations. This word can take any callable, not just a quotation, and has a method for each type of callable. The callable class includes quotations, as well as curry and compose which are discussed below. This means that closure invocation is implemented on top of method dispatch in Factor.

For reasons that will become clear in the last section on compiler optimizations, most quotations never have their entry points called directly, and so it would be a waste of time to compile all quotations as they are read by the parser.

Instead, all freshly-parsed quotations have their entry points set to the lazy-jit-compile primitive from the kernel.private vocabulary.

The call generic word has a method on the quotation class. This method invokes the (call) primitive from the kernel.private vocabulary. The (call) primitive does not type check, since by the time it is called it is known that the input is in fact a quotation. This primitive has a very simple machine code definition:
mov eax,[esi]    ! Pop datastack
sub esi,4
jmp [eax+13] ! Jump to quotation's entry point
Note that the quotation is stored in the EAX register; this is important. Recall that initially, a quotation's entry point is set to the lazy-jit-compile word, and that all quotations initially share this entry point.

This word, which is not meant to be invoked directly, compiles quotations the first time they are called. Since all quotations share the same initial entry point, it needs to know which quotation invoked it. This is done by passing the quotation to this word in the EAX register. The lazy-jit-compile word compiles this quotation, sets its entry point to the new compiled code block, and then calls it again. On subsequent calls of the same quotation, the new compiled definition will be jumped to directly, and the lazy-jit-compile entry point is not involved.

If you're interested in the definition of lazy-jit-compile, search for it in these files:

The curry and compose words


The curry word is the fundamental constructor for making closures. It takes a value and a callable, and returns something that prints out as if it were a new quotation:
( scratchpad ) 5 [ + 2 * ] curry .
[ 5 + 2 * ]

This is not a quotation though, but rather an instance of the curry class. Instances of this class are pairs of elements: an object, and another callable.

Instances of curry are callable, because the call generic word has a method on curry. This method pushes the first element of the pair on the data stack, then recursively calls call on the second element.

Calls to curry can be chained together:
( scratchpad ) { "a" "b" "c" } 1 2 [ 3array ] curry curry map .
{ { "a" 1 2 } { "b" 1 2 } { "c" 1 2 } }

Note that using curry, many callables can be constructed which share the same compiled definition; only the data value differes.

Technically, curry is just an optimization -- it would be possible to simulate it by using sequence words to construct a new quotation with the desired value prepended, but this would be extremely inefficient. Prepending an element would take O(n) time, and furthermore, result in the new quotation being compiled the first time the result was called. Indeed, in some simpler concatenative languages such as Joy, quotations are just linked lists, and they execute in the interpreter, so partial application can be done by using the cons primitive for creating a new linked list.


The compose word takes two callables and returns a new instance of the compose class. Instances of this class are pairs of callables.
( scratchpad ) [ 3 + ] [ sqrt ] compose .
[ 3 + sqrt ]

As with curry, the call generic word has a method on the compose class. This method recursively applies call to both elements.

It is possible to express the effect of compose using curry and dip:
: my-compose ( quot1 quot2 -- compose )
[ [ call ] dip call ] curry curry ; inline

The main reason compose exists as a distinct type is to make the result prettyprint better. Were it defined as above, the result would not prettyprint in a nice way:
( scratchpad ) [ 3 + ] [ sqrt ] [ [ call ] dip call ] curry curry .
[ [ 3 + ] [ sqrt ] [ call ] dip call ]

The curry and compose constructors are sufficient to express all higher-level forms of closures.

An aside: compose and dip

It is possible to express compose using curry and dip. Conversely, it is also possible to express dip using compose and curry.
: my-dip ( a quot -- ) swap [ ] curry compose call ;

Here is an example of how this works. Suppose we have the following code:
123 321 [ number>string ] my-dip

Using the above definition of my-dip, we get
123 321 [ number>string ] swap [ ] curry compose call  ! substitute definition of 'my-dip'
123 [ number>string ] 321 [ ] curry compose call ! evaluate swap
123 [ number>string ] [ 321 ] compose call ! evaluate curry
123 [ number>string 321 ] call ! evaluate compose
123 number>string 321 ! evaluate call
"123" 321 ! evaluate number>string

Fry syntax

The fry syntax provides nicer syntax sugar for more complicated usages of curry and compose. Beginners learning Factor should start with fry syntax, and probably don't need to know about curry and compose at all; but this syntax desugars trivially into curry and compose, as explained in the documentation.

Lexical variables

Code written with the locals vocabulary can create closures by referencing lexical variables from nested quotations. For example, here is a word from the compiler which computes the breadth-first order on a control-flow graph:
:: breadth-first-order ( cfg -- bfo )
<dlist> :> work-list
cfg post-order length :> accum
cfg entry>> work-list push-front
work-list [
[ accum push ]
[ dom-children work-list push-all-front ] bi
] slurp-deque
accum ;

In the body of the word, three lexical variables are used; the input parameter cfg, and the two local bindings work-list and accum.

When the parser reads the above definition, it creates several quotations whose bodies reference local variables, for example, [ accum push ]. Before defining the new word, however, the :: parsing word rewrites this into concatenative code.

Suppose we were doing this rewrite by hand. The first step would be to make closed-over variables explicit. We can do this by currying the values of accum and work-list onto the two quotations passed to bi:
:: breadth-first-order ( cfg -- bfo )
<dlist> :> work-list
cfg post-order length :> accum
cfg entry>> work-list push-front
work-list [
accum [ push ] curry
work-list [ [ dom-children ] dip push-all-front ] curry bi
] slurp-deque
accum ;

And now, we can do the same transformation on the quotation passed to slurp-deque:
:: breadth-first-order ( cfg -- bfo )
<dlist> :> work-list
cfg post-order length :> accum
cfg entry>> work-list push-front
work-list accum work-list [
[ [ push ] curry ] dip
[ [ dom-children ] dip push-all-front ] curry bi
] curry curry slurp-deque
accum ;

Note how three usages of curry have appeared, and all lexical variable usage now only occurs at the same level where the variable is actually defined. The final rewrite into concatenative code is trivial, and involves some tricks with dup and dip.

A lexical closure closing over many variables will be rewritten into code which builds a long chain of curry instances, essentially a linked list. This is less efficient than closure representations used in Lisp implementations, where typically all closed-over values are stored in a single array. However in practice this is not usually a problem, because of optimizations outlined below.

Mutable local variables

Mutable local variables are denoted by suffixing their name with !. Here is an example of a loop that counts a mutable variable up to 100:
:: counted-loop-test ( -- )
0 :> i!
100 [ i 1 + i! ] times ;

Factor distinguishes between binding (done with :>) and assignment (done with foo! where foo is a local previously declared to be mutable). At a fundamental level, all bindings and closures are immutable; code using mutable locals is rewritten to close over a mutable heap-allocated box instead, and reads and writes involve an extra indirection. The previous example is roughly equivalent to the following, where we use an immutable local variable holding a mutable one-element array:
:: counted-loop-test ( -- )
{ 0 } clone :> i
100 [ i first 1 + i set-first ] times ;

For this reason, code that uses mutable locals does not optimize as well, and iterative loops that uses mutable local variables can run slower than tail-recursive functions which uses immutable local variables. This might be addressed in the future with improved compiler optimizations.


Quotation inlining

If after inlining, the compiler sees that call is applied to a literal quotation, it inlines the quotation's body at the call site. This optimization works very well when quotations are used as "downward closures", and this is why most quotations never have their dynamic entry point invoked at all.

Escape analysis

Since curry and compose are ordinary tuple classes, any time some code constructs instances of curry and compose, and immediately unpacks them, the compiler's escape analysis pass can eliminate the allocations entirely.

The escape analysis pass has no special knowledge of curry and compose; it applies an optimization intended for object-oriented code.

Again, this helps optimize code with "downward closures", with the result that most usages of curry and compose will never allocate any memory at run time.

For more details, see my blog post about Factor's escape analysis algorithm.


Mikhail Glushenkov said...

Shouldn't 'curry' be called 'partially_apply'? In Haskell, 'curry' does something quite different.

Slava Pestov said...

Haskell's curry is '[ curry ] curry' in Factor.