Sunday, February 17, 2008

Some rudimentary control flow analysis

Factor's optimizer is now at the stage where further progress will require a re-design. I'm fine with that, since the current design has really lasted a lot longer than I expected -- the basic structure has remained the same since early 2004 when I started working on the native compiler, and in many ways it resembles the old Java Factor implementation's JVM compiler.

Over the last few days I picked off some of the remaining low-hanging fruit.

Hoisting code after conditionals where one branch throws an error


The first optimization concerns conditionals where one branch unconditionally throws an error. Consider the following code:
: foo [ A ] [ B throw ] if C ;

This is in fact equivalent to:
: foo [ A C ] [ B throw ] if ;

And this is precisely what the new optimization does.
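The rewrite itself can be sketched on a toy list-based IR. This is only an illustration, not Factor's actual compiler machinery: the `("if", …)` tuples and the `"throw"` marker are hypothetical stand-ins for the compiler's real node types.

```python
def ends_in_throw(branch):
    """A branch that unconditionally throws never returns to the caller."""
    return bool(branch) and branch[-1] == "throw"

def hoist_after_throwing_if(code):
    """If exactly one arm of a conditional throws, move the code that
    follows the conditional into the arm that actually returns."""
    out = []
    for i, node in enumerate(code):
        rest = code[i + 1:]
        if isinstance(node, tuple) and node[0] == "if" and rest:
            _, true_arm, false_arm = node
            if ends_in_throw(false_arm) and not ends_in_throw(true_arm):
                return out + [("if", true_arm + rest, false_arm)]
            if ends_in_throw(true_arm) and not ends_in_throw(false_arm):
                return out + [("if", true_arm, false_arm + rest)]
        out.append(node)
    return out
```

Applied to the `foo` example above, `[("if", ["A"], ["B", "throw"]), "C"]` rewrites to `[("if", ["A", "C"], ["B", "throw"])]`, which is exactly the equivalence shown.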

Loop detection and loop tail hoisting


Another new optimization is something I like to call "loop detection". Consider a trivial Factor iteration expression:
: foo 100 [ bar ] times 3 baz ;

The times combinator is inlined:
: foo 100 [ bar ] [ drop ] swap compose each-integer 3 baz ;

The each-integer word is also inlined:
: foo 100 [ bar ] [ drop ] swap compose iterate-prep (each-integer) 3 baz ;

Now, (each-integer) is also declared inline; however, it is recursive, so in practice the compiler creates a new "generated symbol" containing a definition of (each-integer) specialized for that call site. Proceeding with our example, we get:
: G12342 ( i n quot -- )
[ iterate-step iterate-next G12342 ]
[ 3drop ] if-iterate? ; inline

: foo 100 [ bar ] [ drop ] swap compose iterate-prep G12342 ;

After some more inline expansion,
: G12342
[ swap >r 2dup >r >r call r> r> r> swap >r >r 1+ r> r> G12342 ]
[ 3drop ] >r >r 2over < r> r> if ; inline

: foo 100 [ drop bar ] 0 -rot G12342 3 baz ;

Now, the compiler sees that call is being applied to a literal quotation, so its contents can just be copied to the call site:
: G12342
[ swap >r 2dup >r >r drop drop bar r> r> r> swap >r >r 1+ r> r> G12342 ]
[ 3drop ] >r >r 2over < r> r> if ; inline

: foo 100 [ drop bar ] 0 -rot G12342 3 baz ;

Also, if is being applied to two literal quotations, so they can be hoisted up to their call site:
: G12342
[ swap >r 2dup >r >r drop drop bar r> r> r> swap >r >r 1+ r> r> G12342 ]
[ 3drop ] >r >r 2over < r> r> 2drop
[ swap >r 2dup >r >r drop drop bar r> r> r> swap >r >r 1+ r> r> G12342 ]
[ 3drop ] if ; inline

: foo 100 [ drop bar ] 0 -rot G12342 3 baz ;

Now, the quotations [ drop bar ], [ swap >r 2dup >r >r drop drop bar r> r> r> swap >r >r 1+ r> r> G12342 ] and [ 3drop ] are dead literals, so they are removed, and all stack shuffle words which they travel through are collapsed:
: G12342
2dup <
[ >r >r bar r> 1+ r> G12342 ]
[ 2drop ] if ; inline

: foo 100 0 swap G12342 3 baz ;

This is where previous Factor releases would stop performing control and data flow optimizations. The specialized loop body (G12342 in this example) would be compiled as its own word in the code heap, and foo would call this word. This entails some overhead in setting up and tearing down activation frames, and so on.

In the general case, this is all you can do: for example, consider the case of a binary-recursive combinator that calls itself in non-tail position; the combinator body needs its own activation frames. However, for tail recursive loops, we can do better. Since the loop only calls itself in tail position, it can share the activation frame of foo itself; that is, conceptually, we can compile it to the following:
: G12342
2dup <
[ >r >r bar r> 1+ r> G12342 ]
[ 2drop ] if ; inline

: foo
100 0 swap
G12342: ! a label
2dup <
[ >r >r bar r> 1+ r> goto: G12342 ] ! goto is not a real Factor word...
[ 2drop ] if
3 baz ;

After performing this type of inlining we need to distinguish the recursive call to G12342 from a normal call, since now it must be a tail call even though it is not in tail position!

Now, with general loops, this is the best we can do. However, in the above case we can apply an optimization much like the first one described in this blog post: since only one branch of that conditional ever returns, we can hoist the code after the conditional into the returning branch:
: foo
100 0 swap
G12342: ! a label
2dup <
[ >r >r bar r> 1+ r> goto: G12342 ] ! goto is not a real Factor word...
[ 2drop 3 baz ] if ;
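With the tail call turned into a jump and the loop tail hoisted, `foo` is effectively an ordinary counted loop. A Python rendering of the final form, with hypothetical stubs standing in for `bar` and `baz`, might look like:

```python
calls = []

def bar():
    calls.append("bar")

def baz(n):
    calls.append(("baz", n))

def foo():
    # 100 0 swap -- set up the limit and the counter
    i, n = 0, 100
    # the G12342 label is the loop head; the recursive tail call
    # becomes a back edge, reusing foo's own activation frame
    while i < n:
        bar()
        i += 1
    # the hoisted loop tail: 2drop 3 baz
    baz(3)

foo()
```

The point of loop detection is precisely that no separate activation frame is ever set up for the loop body.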

An example of a tail recursion where this optimization cannot be used:
: example-1
dup string? [ do-something ] [
dup integer? [ do-something-else ] [ do-another-thing example-1 ] if
] if ;

: example-2
example-1 xyz ;

Here, the code at xyz would have to be cloned twice in order to be hoisted up into the non-recursive branches above, and in general, branch splitting can lead to exponential code growth, so I avoid this particular optimization.

Loop detection and nested loops


This optimization works on nested loops; for example, if you have
: foo [ [ drop ] each ] each ;

Then it optimizes down to:
: G:144972
pick pick <
[ >r >r 1+ r> r> G:144972 ]
[ 3drop r> r> r> swap >r >r 1+ r> r> G:144955 ]
if ;

: G:144955
pick pick < [
swap >r 2dup >r >r
nth-unsafe dup length swap 0 -rot G:144972
] [ 3drop ] if ;

: foo dup length swap 0 -rot G:144955 ;

Note the mutually-recursive loops that appear due to tail hoisting; also note that it appears as if the conditionals leave the retain stack unbalanced! While the compiler enforces retain stack balancing for user code, internally it is free to do whatever crazy stuff it wants as long as the result gives the same output as the user's code. Of course, because of loop detection, both mutually-recursive loops share foo's activation frame:
: foo
dup length swap 0 -rot
G:144955: ! a label
pick pick < [
swap >r 2dup >r >r
nth-unsafe dup length swap 0 -rot
G:144972: ! a label
pick pick <
[ >r >r 1+ r> r> goto: G:144972 ]
[ 3drop r> r> r> swap >r >r 1+ r> r> goto: G:144955 ]
if
] [ 3drop ] if ;
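Despite the mutually-recursive gotos, what comes out is just a pair of nested counted loops running in `foo`'s frame. A Python rendering of that shape might look like the following; for illustration it records each element where `[ drop ]` would simply discard it:

```python
def foo(seq):
    # [ [ drop ] each ] each -- visit every element of every subsequence
    seen = []
    outer_i, outer_n = 0, len(seq)
    while outer_i < outer_n:              # the G:144955 loop head
        inner = seq[outer_i]              # nth-unsafe
        inner_i, inner_n = 0, len(inner)
        while inner_i < inner_n:          # the G:144972 loop head
            seen.append(inner[inner_i])   # [ drop ] would discard this
            inner_i += 1
        outer_i += 1                      # back edge to the outer loop
    return seen
```

The inner loop's "exit" is the jump back to the outer loop head, which is exactly what the hoisted `goto: G:144955` tail expresses.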

Future directions


Loop detection works quite well and the implementation is elegant; however, it is still quite rudimentary. I'd like to extend it to more general control flow analysis, and for this I will need to extend the compiler's intermediate representation into something more sophisticated.

For example, consider the following code:
dup bar? [ foo? ] [ drop f ] if [ A ] [ B ] if ;

Right now it is compiled down to something like this:
dup bar?
if-true-goto: 1
drop f
goto: 2
1: foo?
2: if-true-goto: 3
B
goto: 4
3: A

However, if the second branch of the first conditional is taken, then clearly we don't have to test the top of the stack again, because we already know it is f. So I'd like to compile this code as follows:
dup bar?
if-true-goto: 1
drop
goto: 5
1: foo?
2: if-true-goto: 3
5: B
goto: 4
3: A
4: ...
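One way to express this optimization is jump threading over a control-flow graph: when a predecessor of a test block produces a known boolean, the edge from that predecessor can be redirected straight to the arm the boolean selects. A toy sketch of the idea, where the block names and the dict-based CFG encoding are made up for illustration:

```python
def thread_jumps(blocks):
    """Redirect edges into a test block when the tested value is a
    compile-time constant along that edge."""
    for block in blocks.values():
        succ = block.get("succ")
        if succ is None:
            continue
        target = blocks[succ]
        if target.get("test") and block.get("const") is not None:
            # the test's outcome is known on this edge: skip the test
            block["succ"] = (target["if_true"] if block["const"]
                             else target["if_false"])
    return blocks

# the first conditional's two arms both feed the second conditional's test
cfg = {
    "drop-f": {"const": False, "succ": "test"},  # pushes literal f
    "foo?":   {"const": None,  "succ": "test"},  # result unknown
    "test":   {"test": True, "if_true": "A", "if_false": "B"},
    "A": {}, "B": {},
}
threaded = thread_jumps(cfg)
```

After threading, the `drop f` arm jumps directly to B, while the `foo?` arm still falls through to the test, matching the hand-optimized listing above.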

Another thing I need to do very soon is register allocation across basic blocks. Right now the compiler uses registers to store values inside a basic block (a run of code without any subroutine calls; these can get rather long because of inlining, open-coded intrinsics, and optimizations such as the two I implemented and described here). However, it does not attempt to allocate registers across branch merge points, and in particular across loop iterations. That's pretty lame.

To be able to do all of this, I need an IR which represents control flow in a more natural way, allowing more sophisticated optimizations to be expressed concisely and efficiently. This will be my next step in advancing Factor's compiler.

5 comments:

_ said...

I did some control-flow optimizations once. It is probably the most fun you can have without involving a woman.

Joseph Huang said...

why not just use LLVM? and no, you don't have to use C++ yourself to generate LLVM.

Slava Pestov said...

Why don't the LLVM folks just use Factor?

Joseph Huang said...

well if you want to reimplement all of the optimizations already implemented for LLVM then go right ahead.

Joseph Huang said...

and LLVM is for being a backend to languages, not a frontend like Factor. i meant compile Factor to LLVM representation, sorry for being unclear.