Sunday, December 27, 2009

Freeing Factor from gcc's embrace-and-extend C language extensions

I have completed a refactoring of the Factor VM, eliminating usage of gcc-specific language extensions, namely global register variables, and on x86-32, the regparm calling convention. My motivation for this is two-fold.

First of all, I'd like to have the option of compiling the Factor VM with compilers other than gcc, such as Visual Studio. While gcc runs on all of Factor's supported platforms and then some, setting up a build environment using the GNU toolchain on Windows takes a bit of work, especially on 64-bit Windows. Visual Studio will provide an easier alternative for people who wish to build Factor from source on that platform. In the future, I'd also like to try using Clang to build Factor.

The second reason is that the global register variable extension is poorly implemented in gcc. Anyone who has followed Factor development for a while will know that gcc bugs come up pretty frequently, and most of these seem to be related to global register variables. This is quite simply one of the less well-tested corners of gcc, and the gcc developers seem to mostly ignore optimizer bugs triggered by this feature.

The Factor VM used a pair of register variables to hold the data stack and retain stack pointers. These are just ordinary fields in a struct now. Back in the days when Factor was interpreted and the interpreter was part of the VM, a lot of time was spent executing code within the VM itself, and keeping these pointers in registers was important. Nowadays the Factor implementation compiles to machine code even during interactive use, using a pair of compilers called the non-optimizing compiler and the optimizing compiler. Code generated by Factor's compilers tends to dominate the performance profile, rather than code in the VM itself. Compiled code can utilize registers in any manner desired, and so it continues to access the data stack and retain stack through registers. To make it work with the modified VM, the compiler generates code for saving and restoring these registers in the VM's context structure before and after calls into the VM.
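To make the save/restore idea concrete, here is a minimal C++ sketch. It is not the actual VM source: the names context, primitive_add and compiled_code_sketch are made up for illustration, and the local variable ds merely stands in for the machine register that compiled code keeps the data stack pointer in.

```cpp
#include <cstdint>

typedef std::intptr_t cell;

// Before (gcc-only, shown as a comment because it is not portable C++).
// The register names are illustrative:
//   register cell *ds asm("esi");   // data stack pointer pinned to a register
//   register cell *rs asm("edi");   // retain stack pointer pinned to a register

// After: ordinary struct fields. Hypothetical layout, not the real VM's.
// Each pointer addresses the current top element of its stack.
struct context {
    cell *datastack;
    cell *retainstack;
};

// A VM-side primitive. It only sees the stacks through the context struct.
void primitive_add(context *ctx) {
    cell y = *ctx->datastack--;
    cell x = *ctx->datastack--;
    *++ctx->datastack = x + y;
}

// Stand-in for compiled Factor code. The local `ds` plays the role of the
// register holding the data stack pointer.
cell compiled_code_sketch(context *ctx, cell a, cell b) {
    cell *ds = ctx->datastack;  // load the "register" from the context
    *++ds = a;                  // push a
    *++ds = b;                  // push b
    ctx->datastack = ds;        // save before calling into the VM
    primitive_add(ctx);         // the VM works through ctx only
    ds = ctx->datastack;        // restore after the call returns
    cell result = *ds--;        // pop the sum
    ctx->datastack = ds;        // leave the context consistent
    return result;
}
```

The point of the sketch is that the VM function never needs to know which register, if any, the caller uses; the only contract between the two sides is the struct.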

A few functions defined in the VM used gcc's regparm calling convention. Normally, on x86-32, function parameters are passed on the call stack, at addresses relative to the esp register; regparm functions instead pass the first 1, 2 or 3 arguments in registers. Whether or not this results in a performance boost is debatable, but my motivation for using this feature was not performance but perceived simplicity. The optimizing compiler would generate calls to these functions, and the machine code generated is a little simpler if it can simply stick parameters in registers instead of storing them on the call stack. In practice, eliminating regparm did not make anything more complex, as only a few locations were affected and the issue was limited to the x86-32 backend.
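For readers who have not seen regparm, the following sketch shows roughly what the difference looks like at the declaration level. The macro VM_REGPARM and both helper functions are hypothetical names, not taken from the VM. With regparm(3) on x86-32, gcc passes the first three integer arguments in eax, edx and ecx; on any other compiler or architecture the macro expands to nothing, so the code still builds.

```cpp
#include <cstdint>

typedef std::intptr_t cell;

// Hypothetical macro: only gcc-compatible compilers targeting x86-32
// understand the regparm attribute.
#if defined(__GNUC__) && defined(__i386__)
#define VM_REGPARM __attribute__((regparm(3)))
#else
#define VM_REGPARM
#endif

// Old style: on x86-32 with gcc, a, b and c arrive in eax, edx and ecx,
// so generated code can load registers and call.
VM_REGPARM cell old_style_helper(cell a, cell b, cell c) {
    return a + b * c;
}

// New style: the platform's default calling convention. On x86-32 the
// caller stores the arguments on the call stack before the call.
cell new_style_helper(cell a, cell b, cell c) {
    return a + b * c;
}
```

Both functions behave identically from C++; the difference only matters to a code generator that emits calls to them by hand, which is exactly the position Factor's x86-32 backend is in.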

I'm pretty happy with how the refactoring turned out. It did not seem to affect performance at all, which was not too surprising, since code generated by Factor's optimizing compiler did not change, and the additional overhead surrounding Factor to C calls is lost in the noise.

My goal of getting Factor to build with other compilers is not quite achieved yet, however. While the gcc extensions are gone from C++ code, the VM still has some 800 lines of assembly source using GNU assembler syntax, in the files cpu-x86.32.S, cpu-x86.64.S, and cpu-ppc.S. This code includes fundamental facilities which cannot be implemented in C++, such as the main C to Factor call gate, the low-level set-callstack primitive used by the continuations implementation, and a few other things. The assembly source also has a few auxiliary CPU-dependent functions, for example for saving and restoring FPU flags, and detecting the SSE version on x86.
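As an illustration of what one of these auxiliary routines does, here is a rough C++ sketch of SSE version detection via the CPUID instruction. It is not the VM's assembly routine: the function name sse_version and the return encoding (10 for SSE, 20 for SSE2, and so on) are assumptions made for this example, and it leans on the cpuid.h header shipped with gcc and Clang, so it is itself compiler-specific. On other compilers or architectures it simply reports 0.

```cpp
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

// Returns 0 if SSE is absent or cannot be detected, otherwise
// 10 (SSE), 20 (SSE2), 30 (SSE3), 33 (SSSE3), 41 (SSE4.1) or 42 (SSE4.2).
// The numeric encoding is an assumption made for this sketch.
int sse_version() {
#ifdef HAVE_CPUID_H
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // CPUID leaf 1 reports feature flags in ecx and edx.
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (ecx & (1u << 20)) return 42;  // SSE4.2
    if (ecx & (1u << 19)) return 41;  // SSE4.1
    if (ecx & (1u << 9))  return 33;  // SSSE3
    if (ecx & (1u << 0))  return 30;  // SSE3
    if (edx & (1u << 26)) return 20;  // SSE2
    if (edx & (1u << 25)) return 10;  // SSE
    return 0;
#else
    return 0;  // not an x86 target, or no cpuid.h available
#endif
}
```

Every compiler spells this differently (Visual Studio has its own intrinsic for CPUID, for instance), which is part of why routines like this are awkward to keep in a portable C++ codebase.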

I plan on eliminating the assembly from the VM entirely, by having the Factor compiler generate this code instead. The non-optimizing compiler can generate things such as the C to Factor call gate. For the remaining assembly routines, such as FPU feature access and SSE version detection, I plan on adding an "inline assembly" facility to Factor itself, much like gcc's asm statement. The result will be that the VM will be a pure C++ codebase, and the machine code generation will be entirely offloaded into Factor code, where it belongs. Factor's assembler DSL is much nicer to use than GNU assembler.

5 comments:

Unknown said...

Moreover, having Factor generate this assembly code will make it inlinable. On x86, {get,set}_{x87,sse}_env or read_timestamp_counter are good candidates for inlining.

Unknown said...

"embrace and extend" seems quite pejorative. The gcc guys are not Microsoft. All you have to do to "escape" gcc's extensions is to add "-ansi -pedantic" to your compile flags.

Doug Coleman said...

tutufan: All you have to do to "escape" gcc's extensions is to add "-ansi -pedantic" to your compile flags and rewrite all your code.

Roger said...

When will Factor source be a (near) pure Factor codebase? :)