Saturday, September 13, 2008

Improved performance on the raytracer benchmark

The new optimizer was missing two optimizations performed by the old one: arithmetic identity optimizations and modular arithmetic optimizations. This brings the total number of optimization passes to 11 (def-use analysis runs twice).

The first optimization removes redundant code sequences such as 0 +, 1 *, and 0 bitor. The old optimizer performed this unconditionally, which was incorrect, since most of these identities fail to hold for floats. For example, -0.0 0 + is 0.0, so in this case 0 + is not a no-op, and 1.0 0.0 / 0.0 * is NaN, so 0.0 * does not always yield zero. However, the identities are valid for integers, so the new optimizer only applies these optimizations to integer-valued expressions.
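The two float counterexamples can be checked directly. The snippet below uses Python rather than Factor, but Python floats are the same IEEE 754 doubles, so the identities fail in exactly the same way (note that Python raises on 1.0 / 0.0, so infinity is constructed explicitly):

```python
import math

# "x 0 +" is not a no-op for floats: adding 0.0 to -0.0 loses the sign bit.
x = -0.0
assert math.copysign(1.0, x) == -1.0        # x is negative zero
assert math.copysign(1.0, x + 0.0) == 1.0   # x + 0.0 is positive zero

# "x 0 *" does not always yield zero: infinity times zero is NaN.
inf = float("inf")  # Python raises ZeroDivisionError on 1.0 / 0.0
assert math.isnan(inf * 0.0)

# For integers the identities do hold, so restricting the rewrite to
# integer-valued expressions is safe.
n = 7
assert n + 0 == n and n * 1 == n and n | 0 == n
```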

The second optimization is described in the blog post I linked to.
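Without reproducing the linked post, the algebraic fact behind a modular arithmetic optimization can be sketched (in Python, as an illustration rather than Factor's implementation): masking a result with 2^k - 1, i.e. reducing it mod 2^k, distributes over +, -, and *, so the operands can be truncated to k bits before the operation. This is what lets a compiler use fixed-width machine arithmetic instead of bignums when only the low bits of a result are consumed:

```python
MASK = 0xFF  # 2^8 - 1: keep only the low 8 bits

def masked_add(a, b):
    # Truncating the inputs first gives the same masked result,
    # so the addition can be done in fixed-width arithmetic.
    return ((a & MASK) + (b & MASK)) & MASK

a, b = 123456789, 987654321
assert (a + b) & MASK == masked_add(a, b)
assert (a * b) & MASK == ((a & MASK) * (b & MASK)) & MASK
```

The identity does not hold for operations like division or right shift, which depend on the high bits, so an optimizer must track which operations the mask actually distributes over.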

I also found and fixed a code generator regression which has been there for ages, and a type inference regression which appeared with the new code generator. Fixing these two regressions and implementing the two missing optimizations above improved performance of benchmark.raytracer by almost 2x; the run time went from 12 seconds to 6.5 seconds on my 2.4 GHz Core 2 Duo MacBook Pro.

The Java version runs in 1.1 seconds with the server VM (I modified the code to run the benchmark 10 times in a loop to give the JIT a chance to do some runtime profiling, and I measured the run time internally so as to avoid JVM startup overhead). The Common Lisp version runs in 3.5 seconds under SBCL. Note that I'm running the benchmark with n=200 and level=3.

Factor is slowly inching towards the performance offered by other compilers. Hopefully one day I'll be able to close the gap.

4 comments:

JankoM said...

Great! Because at the end performance is what matters at a lot of things we do.

Anonymous said...

I thought that having such an obscure syntax was the reason you guys wanted to have the best performance out there. I see that it's better to stay with something more readable. What's the real reason someone would start using Factor? The GUI is scary and unintuitive, the speed is so-so, the syntax is off-putting and I have absolutely no job perspective with this thing. I don't see another Rails/Django being built with Factor, do you? Erlang is ugly but it's got such a long history of being used internally for telecom systems that I can understand its ugliness. Having a language to stop n00bs migrating to it is no reason at all. I would really appreciate a post on your blog stating like 20 points why we should take up Factor instead of Python, Ruby etc. with clear examples of the benefits (f.e. this is done 20 times faster, we could use a database in an easier way, we have a web server in development, we could be certain that more than 45 people use it etc. ). Thanks!

Anonymous said...

Not sure if you read comments for old posts, but here goes :)

Following on Chris Double's use of "optimized." above to see what the compiler does, I notice the pattern ">r r>" appearing in several places in his report regarding linrec and also in some tests I did (e.g. with "human-sort").

For example, in the middle of the output of "\ human-sort optimized." in a recent drop I see:

tuck 3 set-slot tuck
>r r>
3 set-slot tuck
>r r>

Your other post about the optimizer implies that these return stack patterns are cleaned up, except where values may be live... so here, set-slot takes three values, but the ">r r>" seems like a no-op nonetheless.

Is it?

Slava Pestov said...

Ed, stack patterns are optimized by the code generator, so "optimized." will still show the unoptimized stack patterns.