Wednesday, July 25, 2007

Specialized float arrays

I added float arrays to Factor. A float array behaves like an array of floats, except the representation is more efficient; individual elements are not boxed.

Literal float arrays look like F{ 0.64 0.85 0.43 0.16 0.37 }.

Float arrays can provide a performance benefit if the compiler is able to infer enough information to unbox float values. For example, consider the following word:
: v+ [ + ] 2map ;

It takes two arrays and adds elements pairwise. Let's try timing the performance of this word with normal arrays:
( scratchpad ) { 0.64 0.85 0.43 0.16 0.37 0.64 0.85 0.43 0.16 0.37 } dup [ 1000000 [ 2dup v+ drop ] times ] time
3200 ms run / 25 ms GC time

Now float arrays:
( scratchpad ) F{ 0.64 0.85 0.43 0.16 0.37 0.64 0.85 0.43 0.16 0.37 } dup [ 1000000 [ 2dup v+ drop ] times ] time
3653 ms run / 70 ms GC time

It is actually slower! This is because the elements are stored unboxed, so each element access in the generic code has to allocate a new float on the heap. But now, let's use the new hints vocabulary to give the compiler a hint that v+ should be optimized for float arrays:
HINTS: v+ float-array float-array ;

This has the effect of compiling a version of this word specialized to float arrays. Here, the compiler can work some magic and eliminate boxing altogether:
( scratchpad ) F{ 0.64 0.85 0.43 0.16 0.37 0.64 0.85 0.43 0.16 0.37 } dup [ 1000000 [ 2dup v+ drop ] times ] time
974 ms run / 10 ms GC time
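
The same pattern should carry over to other pairwise words. Here is a minimal sketch, assuming the hints and float array vocabularies are already in the search path. The word name pairwise* is made up for illustration and is not a library word:

```factor
! Hypothetical word: multiplies two sequences element by element.
! pairwise* is an illustrative name, not part of Factor.
: pairwise* [ * ] 2map ;

! Request a version specialized to two float arrays,
! using the same HINTS: form shown for v+ above.
HINTS: pairwise* float-array float-array ;
```

It can be timed with the same [ 1000000 [ 2dup pairwise* drop ] times ] time loop used for v+.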

I used float arrays to make the spectral norm benchmark faster: the run time went from 120 seconds to 30 seconds. The raytracer was not improved by float arrays, though; I need to investigate why.

Also, float array operations are only compiled efficiently on PowerPC right now. I need to code some new assembly intrinsics for the other platforms.

Float arrays can be passed to C functions directly. Long-term, somebody should look into using SSE2 and AltiVec to optimize vector operations on float arrays. That would really rock.

3 comments:

Unknown said...

Cool! Could you upload a new image so I can get this to run, though? It doesn't work to just make a boot image myself.

Anonymous said...

How does the Hints vocabulary work?
I didn't find it in the documentation.

Does it overwrite the word with the specialized word once it takes the hint?
And then the compiler/interpreter sees the code for the specialized version?

please explain how it works.
thanks

Anonymous said...

Possibly look into Io's implementation of vectors (essentially the same thing as float arrays in Factor). It's in C, but they use conditional compilation to call specific functions that use AltiVec optimization. Hope that helps.