In Factor 0.89, I implemented interval inference; this let the compiler convert some arithmetic operations to machine arithmetic.
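To give a rough idea of what interval inference buys you, here is a minimal sketch in Python. It is not Factor's compiler code: the `Interval` class, the `can_use_machine_arithmetic` helper, and the 29-bit fixnum width are all assumptions made for illustration. The idea is that if the compiler can bound every operand, it can bound the result, and a result that provably fits in a fixnum needs no overflow check or bignum fallback.

```python
from dataclasses import dataclass

# Assumed fixnum width, for illustration only; the real range depends on
# the platform and on how the VM tags its values.
FIXNUM_BITS = 29
FIXNUM_MIN = -(1 << (FIXNUM_BITS - 1))
FIXNUM_MAX = (1 << (FIXNUM_BITS - 1)) - 1


@dataclass(frozen=True)
class Interval:
    """Closed integer interval [lo, hi] known to contain a value."""
    lo: int
    hi: int

    def __add__(self, other):
        # The sum of two bounded values is bounded by the sums of the bounds.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # With possibly negative bounds, the extremes are among the
        # four products of the endpoints.
        products = [a * b for a in (self.lo, self.hi)
                    for b in (other.lo, other.hi)]
        return Interval(min(products), max(products))


def can_use_machine_arithmetic(result):
    """True if the result provably fits in a fixnum (hypothetical helper)."""
    return FIXNUM_MIN <= result.lo and result.hi <= FIXNUM_MAX


# A byte read from memory, scaled by a constant and offset by another.
byte = Interval(0, 255)
scaled = byte * Interval(298, 298) + Interval(128, 128)
```

Here `scaled` is bounded well inside the assumed fixnum range, so the multiply and add could be compiled to plain machine instructions. Something like `Interval(0, FIXNUM_MAX) + Interval(0, 1)` would not pass the check and would have to stay generic.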
In Factor 0.90, I added open-coded intrinsics for alien accessor words (words for reading/writing C values in memory); this allowed more values in the inner loop to be stored in registers, without incurring calls into the VM together with save/restore of values to the data stack.
In the last few days, I implemented some representation inference, which allows intermediate alien values to be stored as unboxed pointers. In the YUV to RGB inner loop, a pointer is read from a C struct; this pointer is then dereferenced. Prior to this, the pointer would be boxed in an alien instance when read, then unboxed when dereferenced. The compiler is now able to optimize this out.
The improvements in performance over the last year have been amazing. I don't remember how slow the Ogg player was when it was first added, but we're talking unusably slow: something like 200ms to convert a 480x320 frame of video from YUV to RGB on my Power Mac G5. When interval inference was added, the time went down to about 100ms per frame. Open-coded alien accessors brought this down to 30ms. And now the recent representation inference has brought it down again, to 15ms.
During this time the YUV to RGB conversion algorithm has not changed much; it still sits at the same level of abstraction (generic arithmetic, etc.). The compiler has done all the hard work of making it faster.
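For readers who have not seen one, a per-pixel YUV to RGB conversion looks roughly like the sketch below. This is not the code from the Ogg player: the function names are hypothetical, and the coefficients are the common BT.601 integer approximation, which I am assuming here for illustration. It does show why the inner loop is dominated by small-integer arithmetic on bytes read from memory.

```python
def clamp_byte(x):
    """Clamp an integer to the 0..255 range of an 8-bit channel."""
    return max(0, min(255, x))


def yuv_to_rgb(y, u, v):
    """Convert one pixel using the BT.601 integer approximation.

    The coefficients are an assumption for this sketch; the player's
    actual constants may differ.
    """
    c = y - 16
    d = u - 128
    e = v - 128
    r = clamp_byte((298 * c + 409 * e + 128) >> 8)
    g = clamp_byte((298 * c - 100 * d - 208 * e + 128) >> 8)
    b = clamp_byte((298 * c + 516 * d + 128) >> 8)
    return r, g, b
```

Every intermediate value here is a small integer with known bounds, so a compiler that can prove those bounds can use machine arithmetic throughout.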
I still plan on working on the compiler in the future with the goal of making YUV to RGB faster. Now that the low-hanging fruit has been picked, and the YUV to RGB inner loop no longer calls into the VM or allocates memory, further improvements will come from better register allocation and instruction selection.