Literal float arrays look like this:
F{ 0.64 0.85 0.43 0.16 0.37 }
Float arrays can provide a performance benefit if the compiler is able to infer enough information to unbox float values. For example, consider the following word:
: v+ [ + ] 2map ;
It takes two arrays and adds elements pairwise. Let's try timing the performance of this word with normal arrays:
( scratchpad ) { 0.64 0.85 0.43 0.16 0.37 0.64 0.85 0.43 0.16 0.37 } dup [ 1000000 [ 2dup v+ drop ] times ] time
3200 ms run / 25 ms GC time
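To make it concrete what v+ computes, here is a minimal sketch on a small input. It assumes a recent Factor build, where a colon definition needs a stack effect declaration (the definition above omits it) and where the vocabularies are listed explicitly. The ( a b -- c ) effect and the sample values are my own additions for illustration.

```factor
USING: kernel math prettyprint sequences ;
IN: scratchpad

! Same body as the definition above, plus a stack effect
! declaration (an assumption about what newer Factor expects).
: v+ ( a b -- c ) [ + ] 2map ;

! 2map walks both sequences in step and applies + to each pair.
{ 1.0 2.0 3.0 } { 0.5 0.5 0.5 } v+ .
! prints { 1.5 2.5 3.5 }
```

The result has the same type as the first input, so with ordinary arrays every sum ends up as a separately allocated float.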
Now float arrays:
( scratchpad ) F{ 0.64 0.85 0.43 0.16 0.37 0.64 0.85 0.43 0.16 0.37 } dup [ 1000000 [ 2dup v+ drop ] times ] time
3653 ms run / 70 ms GC time
It is actually slower! This is because a float array stores its elements unboxed, so each element access has to allocate a new float on the heap. But now, let's use the new hints vocabulary to give a hint to the compiler that v+ should be optimized for float arrays:
HINTS: v+ float-array float-array ;
This has the effect of compiling a version of this word specialized to float arrays. Here, the compiler can work some magic and eliminate boxing altogether:
( scratchpad ) F{ 0.64 0.85 0.43 0.16 0.37 0.64 0.85 0.43 0.16 0.37 } dup [ 1000000 [ 2dup v+ drop ] times ] time
974 ms run / 10 ms GC time
I used float arrays to make the spectral norm benchmark faster: the run time went from 120 seconds to 30 seconds. The raytracer was not improved by float arrays, though; I need to investigate why.
Also, float array operations are only compiled efficiently on PowerPC right now. I need to code some new assembly intrinsics for the other platforms.
Float arrays can be passed to C functions directly. Long-term, somebody should look into using SSE2 and AltiVec to optimize vector operations on float arrays. That would really rock.
3 comments:
Cool! Could you upload a new image so I can get this to run, though? It doesn't work to just make a boot image myself.
How does the Hints vocabulary work?
I didn't find it in the documentation.
Does it overwrite the word with the specialized word once it takes the hint?
And then the compiler/interpreter sees the code for the specialized version?
Please explain how it works.
Thanks
Possibly look into Io's implementation of vectors (essentially the same thing as float arrays in Factor). It's in C, but they use conditional compilation to call specific functions that use AltiVec optimization. Hope that helps.