Tuesday, January 08, 2008

Compiler overhaul

I'm back from vacation, and I've overhauled how the Factor compiler recompiles changed definitions. The code is in my git repo but it is not entirely stable. Proceed with caution.

This post is extremely technical and touches on Factor internals. You certainly don't have to understand Factor's low-level workings to use it. But if you understand the implementation, you will have a better idea of how to write great code -- just like with any other language.

First, some background and history.

The very first iterations of Factor were purely interpreted. The interpreter was a very simple loop: it traversed quotations, pushing literals and executing words. Words were executed by recursively invoking the interpreter on a word's "definition" slot, which was itself a quotation.

This was a "late binding" interpreter: words could be redefined, which updated their definition slot, and subsequent invocations of the word immediately picked up the new definition.
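
For illustration, here is a minimal sketch of such a late-binding interpreter, written in Factor itself. It models the data stack as an explicit vector rather than using the real one, and it glosses over primitives, conditionals, and word wrappers, so it is emphatically not the actual VM loop:
! Hedged sketch of a late-binding interpreter loop. The "stack" is an
! explicit vector; primitives and word wrappers are not handled.
: interpret ( stack quot -- stack )
    [ dup word? [ word-def interpret ] [ over push ] if ] each ;

Because word-def is looked up afresh on every call, redefining a word takes effect on its very next invocation -- exactly the late-binding behavior described above.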

The following code was valid in early Factor:
: foo 3 ;

: bar [ 2 ] \ foo set-word-def foo ;

The bar word would leave 2 on the stack, because after it redefined foo, the new definition took effect immediately.

Pretty early on, Factor got a native (x86 only, at first) compiler. I decided that having all word calls be indirect in compiled code would be too inefficient, and would negate many of the benefits of having a compiler, so I went with an early binding design.

When a word was compiled, all words that it called were compiled first, and a call to a word was compiled as a direct jump to the address of that word's compiled definition in the code heap.

This hindered interactive development: reloading a source file would replace definitions, but the new definitions only took effect in subsequently compiled words, not in already-compiled words that used the old ones. Haskell's "ghci" works this way, and its developers consider it a feature; I consider it a bug.

I quickly realized that there was an easy way to fix this issue, at least temporarily. I added cross-referencing support (thus giving us the usage word that is so useful when refactoring code). When a word was redefined, all words that called it -- either directly or indirectly -- had their compiled definitions discarded, and reverted to interpreted execution.

This meant that the following still didn't work as expected; it would return 3:
: foo 3 ;

: bar [ 2 ] \ foo set-word-def foo ;

This is because while set-word-def would discard the compiled definition of bar, after returning, the activation frame on the call stack still pointed inside the old compiled definition of bar, and this old compiled definition of bar jumped to the old compiled definition of foo, which pushed 3 on the stack. However, subsequent invocations of bar would call the interpreted definition, which was up to date.

This approach was better, but it had an obvious flaw. All words were compiled during bootstrap, but if you did a lot of heavy development, especially on core code, more and more words would revert to interpreted mode over time. So every now and then, you had to manually call compile-all to recompile everything.

When generic words came along, there was a new complication. Suppose you loaded a source file which added methods to sequence generic words such as nth and length, or to other frequently called generic words such as equal?. To ensure correct behavior, and that the new definitions were used, Factor would discard the compiled definitions of all words which called nth (or whichever word was affected), all words which called words which called nth, and so on. Since the majority of the words one writes can trace a call path down to sequence words, this meant that the majority of the words in the system would magically lose their compiled definitions and subsequently run in the interpreter. So at one point, a frequently reported "problem" was that loading some libraries, such as "contrib/math" at the time, would slow down Factor tremendously. The "solution" we gave newbies was to call compile-all after loading libraries; compiling all words, of course, takes a while.

Some time around Factor 0.84, I came up with an improved solution. When a word was redefined, the compiler would find all words which depended on that word, but instead of discarding their compiled definition, it would just flag them for recompilation. At a later time, the recompile word would recompile all words flagged for recompilation. The parser and module system were designed in a clever way so that recompile was called after loading a source file or module.

This approach worked well for a long time, but it still had two issues: first, long compile pauses; and second, it was possible to observe out-of-date definitions -- just like with the original compiler, which made no attempt to preserve the illusion of late binding, and unlike the subsequent design, which decompiled all dependents.

The first problem, long compile pauses, was an issue when loading new sequence implementations, new assoc implementations, classes which defined new equality and hashing methods, and so on. The majority of the words in the system would have to be recompiled, which is not something you want to subject the developer to, since the compiler is designed to trade off compile time for quality of generated code.

The second issue is more subtle, and is a consequence of the first. When loading a module containing multiple source files, the compiler did not recompile changed words after each file, since this might end up recompiling many words multiple times. Instead, it postponed recompilation until after the module was done loading.

However, this would lead to problems in a scenario such as the following: a module consists of two source files, A and B. A defines a new sequence class, and B has a top-level form which creates an instance of this class and calls, say, append on it. Since append (which depends on nth) was not recompiled until after B was loaded, the top-level form would fail with a no-method error, suggesting that nth did not have a method for your new sequence class when it clearly did. This was counter-intuitive.

In Factor 0.91, I removed the interpreter, and replaced it with a non-optimizing compiler. The non-optimizing compiler generated code which looked like an unrolled interpreter loop; it still called words through an indirect jump. I started referring to "the compiler" (the expensive one, written in Factor) as the "optimizing compiler". But for all intents and purposes, the recompilation system worked the same way as before.

So, that ends the history lesson. And now, you can forget about all of that. If you look at all these years of hacks and unsatisfactory workarounds, there was a constant underlying theme: if a word was redefined and recompiled, existing jumps to the word's old compiled definition were not updated. This was the fundamental underlying limitation.

Well, now, this limitation is gone. Instead of hacking and working around it, I decided to solve this issue properly, once and for all. It took longer than expected, and the overhaul touched many different parts of Factor, but the incompatible changes are minimal (though important).

Number one: the non-optimizing compiler is now early binding


First up, the non-optimizing compiler now compiles direct jumps to words. This improves performance by 33-50%. Since compiled word jumps can now be updated on the fly, there is no longer any reason for the non-optimizing compiler to generate indirect jumps.

Number two: word definition is no longer ad-hoc; formalized compilation units


Generally, you can no longer just redefine words on a whim. While it is unusual for typical Factor code to define new words at run time, we frequently write parsing words which define new words, using many of the same idioms as Forth programming.

If you try doing the following in the listener, you get an error:
( scratchpad ) DEFER: my-word
( scratchpad ) \ my-word [ "Hello world" print ] define
Defining a word outside of a compilation unit
no-compilation-unit-word my-word

However, the following works of course:
( scratchpad ) : my-word "Hello world" print ;

And anyone who has perused the source of : knows that it just reads ahead from the parse stream until ;, and calls define. So what's going on here? The parser wraps the parsing of your source file in a compilation unit. A compilation unit is basically a dynamic scope in which you can define new words, replace existing word definitions, and so on. The kicker is that the new definitions don't take effect until the compilation unit returns. When it returns, all new and changed words are recompiled, and references to the changed words in the code heap are updated in one "atomic transaction".

So, ordinary code, 99% of which does not define new words at run time, does not have to worry about compilation units at all. Neither do parsing words; the parser handles it all for you. However, if you decide to define new words at run time -- occasionally, we have code that does this -- you need to wrap the defining code in a with-compilation-unit. Try to put as much as possible into one compilation unit, since compilation units are relatively expensive; e.g., if you need to define 1000 words in a loop, put the loop inside a compilation unit, don't create a compilation unit for each word.
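
For example, here is what defining a word at run time looks like in the listener. The word name "greet" and the "scratchpad" vocabulary are just illustrative choices; create and lookup are the usual words for creating and finding a word by name:
( scratchpad ) [ "greet" "scratchpad" create [ "Hello world" print ] define ] with-compilation-unit
( scratchpad ) "greet" "scratchpad" lookup execute
Hello world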

The documentation for with-compilation-unit gives details about how it operates.

Number three: words defined in a compilation unit cannot be called until after the compilation unit ends


Again, this is something ordinary code does not ever need to deal with. However, if you write parsing words, it is something to keep in mind.

Suppose you have this source file:
: hello "Hello world" print ;

hello 2 2 + .

This is a perfectly fine source file. The file is parsed -- inside a new compilation unit -- which creates a word definition hello, and a top-level quotation [ hello 2 2 + . ]. The parser ends the compilation unit, which compiles the word hello, and updates any references to it in the code heap (if there are any). The parser then calls the top-level form, which outputs
Hello world
4

This works fine, because the compilation unit ends -- and the new word is compiled -- before the top-level form is run.

However, consider the following example:
: hello scan reverse string>number parsed ; parsing

hello 123 .

In Factor 0.91, this would print 321. However, as of Factor 0.92, this is no longer a valid Factor program. Because the compilation unit does not end until the whole file has been parsed, the hello word has not been compiled yet at the time the parser encounters it, and so we get an error:
A parsing word cannot be used in the same file it is defined in.
staging-violation-word POSTPONE: hello

I don't think this is a major limitation. There aren't many libraries where it even comes up. There are two ways to redesign code to get around it:
  • Put the parsing words in a separate vocabulary/module.
  • If you define a parsing word that is only called once, replace the parsing word with a parse-time evaluation. This idiom is used with alien bindings:
    : load-foo-library
    "foo" "libfoo.so" "cdecl" add-library ; parsing

    load-foo-library

    LIBRARY: foo
    FUNCTION: int blah ( float x ) ;
    ...

    The add-library call has to happen at parse time, before the compilation unit ends and compiles the bindings, because compiling bindings looks up symbols in the library. However, in Factor 0.92 the above code no longer works, because load-foo-library is called before it is compiled. The workaround is to use parse-time evaluation, which results in clearer code because we're no longer defining a parsing word only to call it once:
    << "foo" "libfoo.so" "cdecl" add-library >>

    LIBRARY: foo
    FUNCTION: int blah ( float x ) ;
    ...

    Forth fans might notice that Factor's << and >> are almost exactly like Forth's [ and ].

Number four: all words are now compound words


This is really just an implementation issue that most code won't come into contact with. The main change is that define-compound is now called define, and the old low-level define is gone.

Of course, we still have SYMBOL:, GENERIC:, and so on, and see correctly shows defining words. So what exactly changed? How do all these word types map to compound words?

Symbols are easy. The following two are behaviorally equivalent (and have been for all of eternity):
SYMBOL: foo
: foo \ foo ;

It is just that with the old late-binding interpreter and non-optimizing compiler, the former was implemented in a more efficient way in the Factor VM. With the early-binding non-optimizing compiler, there is no longer any difference; the VM support is gone, and SYMBOL: just defines a compound word which pushes itself.
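
You can see the push-itself behavior directly in the listener:
( scratchpad ) SYMBOL: my-sym
( scratchpad ) my-sym .
my-sym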

To detect whether a word is a symbol, we use a predicate class:
PREDICATE: word symbol
dup 1array swap word-def sequence= ;

Deferred words throw an error. The following are equivalent now:
DEFER: foo
: foo undefined-error ;

Primitives are a bit trickier. If you see a primitive, it looks just like it used to:
( scratchpad ) \ dup see
IN: kernel
PRIMITIVE: dup ( x -- x x )

But if you ask for the underlying definition with word-def,
( scratchpad ) \ dup word-def .
[ 70 do-primitive ]

If we look at do-primitive,
( scratchpad ) \ do-primitive see
USING: kernel ;
IN: kernel.private
: do-primitive ( number -- ) "Improper primitive call" throw ;

It just throws an error! So what's up? Well, do-primitive is not a word you're ever supposed to call. The non-optimizing compiler special-cases it: the call must be in tail position and preceded by a literal fixnum, and in that case, instead of compiling a call to do-primitive, the compiler emits a direct jump to the corresponding primitive.
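
To make the shape concrete, here is a hedged sketch of a predicate that recognizes such definitions. primitive-like? is a made-up name, and this is not how the compiler actually detects primitives; it assumes kernel.private is in scope so that do-primitive can be referenced:
! Hedged sketch: recognizes the [ n do-primitive ] definition shape
! described above. primitive-like? is illustrative, not a library word.
: primitive-like? ( word -- ? )
    word-def dup length 2 = [
        dup first fixnum? swap second \ do-primitive eq? and
    ] [ drop f ] if ;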

So the latter is something of a hack, but having all word definitions be quotations has simplified so much code, and the only real complication is a minor special case in the non-optimizing compiler; it already special-cases if and dispatch, and it is designed in a clean way, so the code is not any less maintainable.

So, that about sums it up. Factor now gives the illusion of late binding and fully interactive development, while both compilers actually perform early binding and do their best to generate efficient code. Unlike before, this illusion is almost 100% accurate, and compile times have gone down, too.

1 comment:

Anonymous said...

Slava,

Nice work. Keep it up.

Bruce Rennie
(God's Own Country Downunder)