Factor: a practical stack language: September 2009

Wednesday, September 30, 2009

A survey of domain-specific languages in Factor

Factor has good support for implementing mini-languages as libraries. In this post I'll describe the general techniques and look at some specific examples. I don't claim any of this is original research -- Lisp and Forth programmers have been doing DSLs for decades, and recently the Ruby, Haskell and even Java communities are discovering some of these concepts and adding a few of their own to the mix. However, I believe Factor brings some interesting incremental improvements to the table, and the specific combination of capabilities found in Factor is unique.

Preliminaries

How does one embed a mini-language in Factor? Let us look at what goes on when a source file is parsed:

The parser reads the input program. This consists of definitions and top-level forms. The parser constructs syntax trees, and adds definitions to the dictionary. The result of parsing is the set of top-level forms in the file.
The compiler is run with all changed definitions. The compiler essentially takes syntax trees as input, and produces machine code as output. Once the compiler finishes compiling the new definitions, they are added to the VM's code heap and may be executed.
The top level forms in the file are run, if any.

In Factor, all of these stages are extensible. Note that all of this happens when the source file is loaded into memory -- Factor is biased towards compile-time meta-programming.

Extending the parser with parsing words

Parsing words which execute at parse time can be defined. Parsing words can take over the parser entirely and parse custom syntax. All of Factor's standard syntax, such as : for defining words and [ for reading a quotation, is actually parsing words defined in the syntax vocabulary. Commonly-used libraries such as memoize and specialized-arrays add their own parsing words for custom definitions and data types. These don't qualify as domain-specific languages since they're too trivial, but they're very useful.

Homoiconic syntax

In most mainstream languages, the data types found in the syntax tree are quite different from the data types you operate on at runtime. For example, consider the following Java program:

if(x < 3) { return x + 3; } else { return foo(x); }

This might parse into something like

IfNode(
    condition: LessThan(Identifier(x),Integer(3))
    trueBranch: ReturnNode(Add(Identifier(x),Integer(3)))
    falseBranch: ReturnNode(MethodCall(foo,Identifier(x)))
)

You cannot work with an "if node" in your program, the identifier x does not exist at runtime, and so on.

In Factor and Lisp, the parser constructs objects such as strings, numbers and lists directly, and identifiers (known as symbols in Lisp, and words in Factor) are first-class types. Consider the following Factor code,

x 3 < [ x 3 + ] [ x foo ] if

This parses as a quotation of 6 elements,

The word x
The integer 3
The word <
A quotation with three elements, x, 3, and +
A quotation with two elements, x, and foo
The word if

An array literal, like { 1 2 3 }, parses as an array of three integers; not an AST node representing an array with three child AST nodes representing integers.

The flipside of homoiconic syntax is being able to print out (almost) any data in a form that can be parsed back in; in Factor, the . word does this.

What homoiconic syntax gives you as a library author is the ability to write mini-languages without writing a parser. Since you can input almost any data type wth literal syntax, you can program your mini-language in parse trees directly. The mini-language can process nested arrays, quotations, tuples and words instead of parsing a string representation first.

Compile-time macros

All of Factor's data types, including quotations and words, can be constructed at runtime. Factor also supports compile-time macros in the Lisp sense, but unlike Lisp where they are used to prevent evaluation, a Factor macro is called just like an ordinary word, except the parameters have to be compile-time literals. The macro evaluates to a quotation, and the quotation is compiled in place of the macro call.

Everything that can be done with macros can also be done by constructing quotations at runtime and using call( -- macros just provide a speed boost.

Parsing word-based DSLs

Many DSLs have a parsing word as their main entry point. The parsing word either takes over the parser completely to parse custom syntax, or it defines some new words and then lets the main parser take over again.

Local variables

A common misconception among people taking a casual look at Factor is that it doesn't offer any form of lexical scoping or named values at all. For example, Reg Braithwaite authoritatively states on his weblog:

the Factor programming language imposes a single, constraining set of rules on the programmer: programmers switching to Factor must relinquish their local variables to gain Factor’s higher-order programming power.

In fact, Factor supports lexically scoped local variables via the locals vocabulary. and this library is used throughout the codebase. It looks like in a default image, about 1% of all words use lexical variables.

The locals vocabulary implements a set of parsing words which augment the standard defining words. For example, :: reads a word definition where input arguments are stored in local variables, instead of being on the stack:

:: lerp ( a b c -- d )
    a c *
    b 1 c - *
    + ;

The locals vocabulary also supports "let" statements, lambdas with full lexical closure semantics, and mutable variables. The locals vocabulary compiles lexical variable usage down to stack shuffling, and curry calls (for constructing quotations that close over a variable). This makes it quite efficient, especially since in many cases the Factor compiler can eliminate the closure construction using escape analysis. The choice of whether or not to use locals is one that can be made purely on a stylistic basis, since it has very little effect on performance.

Parsing expression grammars

Parsing expression grammars describe a certain class of languages, as well as a formalism for parsing these languages.

Chris Double implemented a PEG library in Factor. The peg vocabulary offers a combinator-style interface for constructing parsers, and peg.ebnf builds on top of this and defines a declarative syntax for specifying parsers.

A simple example of a PEG grammar can be found in Chris Double's peg.pl0 vocabulary. More elaborate examples can be found in peg.javascript (JavaScript parser by Chris Double) and smalltalk.parser (Smalltalk parser by me).

One downside of PEGs is that they have some performance problems; the standard formulation has exponential runtime in the worst case, and the "Packrat" variant that Factor uses runs in linear time but also linear space. For heavy-duty parsing, it appears as if LL and LR parsers are best, and it would be nice if Factor had an implementation of such a parser generator.

However, PEGs are still useful for simple parsing tasks and prototyping. They are used throughout the Factor codebase for many things, including but not limited to:

Parsing regular expressions in the regexp vocabulary (source)
Parsing URLs in the urls vocabulary (source)
Parsing HTTP headers in the http.server and http.client vocabularies (source)

PEGs can also be used in conjunction with parsing words to embed source code written with custom grammars in Factor source files directly. The next DSL is an example of that.

Infix expressions

Philipp Brüschweiler's infix vocabulary defines a parsing word which parses an infix math expression using PEGs. The result is then compiled down to locals, which in turn compile down to stack code.

Here is a word which solves a quadratic equation ax^2 + bx + c = 0 using the quadratic formula. Infix expressions can only return one value, so this word computes the first root only:

USING: infix locals ;

:: solve-quadratic ( a b c -- r )
    [infix (-b + sqrt(b*b-4*a*c))/2*a infix] ;

Note that we're using two mini-languages here; :: begins a definition with named parameters stored in local variables, and [infix parses an infix expression which can access these local variables.

XML literals

Daniel Ehrenberg's XML library defines a convenient syntax for constructing XML documents. Dan describes it in detail in a blog post, with plenty of examples, so I won't repeat it here. The neat thing here is that by adding a pair of parsing words, [XML and <XML, he was able to integrate XML snippets into Factor, with parse-time well-formedness checking, no less.

Dan's XML library is now used throughout the Factor codebase, particularly in the web framework, for both parsing and generating XML. For example, the concatenative.org wiki uses a markup language called "farkup". The farkup markup language, developed by Dan, Doug Coleman and myself, makes heavy use of XML literals. Farkup is implemented by first parsing the markup into an abstract syntax tree, and then converting this to HTML using a recursive tree walk that builds XML literals. We avoid constructing XML and HTML through raw string concatenation; instead we use XML literals everywhere now. This results in cleaner, more secure code.

Compare the design of our farkup library with markdown.py used by reddit.com. The latter is implemented with a series of regular expression hacks and lots of ad-hoc string processing which attempts to produce something resembling HTML in the end. New markup injection attacks are found all the time; there was a particularly clever one involving a JavaScript worm that knocked reddit right down a few days ago. I don't claim that Farkup is 100% secure by any means, and certainly it has had an order of magnitude less testing, but without a doubt centralizing XHTML generation makes it much easier to audit and identify potential injection problems.

C library interface

Our C library interface (or FFI) was quite low-level initially but after a ton of work by Alex Chapman, Joe Groff, and myself, it has quite a DSLish flavor. C library bindings resemble C header files on mushrooms. Here is a taste:

TYPEDEF: int cairo_bool_t

CONSTANT: CAIRO_CONTENT_COLOR HEX: 1000
CONSTANT: CAIRO_CONTENT_ALPHA HEX: 2000
CONSTANT: CAIRO_CONTENT_COLOR_ALPHA HEX: 3000

STRUCT: cairo_matrix_t
    { xx double }
    { yx double }
    { xy double }
    { yy double }
    { x0 double }
    { y0 double } ;
    
FUNCTION: void
cairo_transform ( cairo_t* cr, cairo_matrix_t* matrix ) ;

The Factor compiler generates stubs for calling C functions on the fly from these declarative descriptions; there is no C code generator and no dependency on a C compiler. In fact, C library bindings are so easy to write that for many contributors, it is their first project in Factor. When Doug Coleman first got involved in Factor, he began by writing a PostgreSQL binding, followed by an implementation of the MD5 checksum. Both libraries have been heavily worked on since then are still in use.

For complete examples of FFI usage, check out any of the following:

unix - POSIX bindings
windows - Windows API bindings
db.sqlite.ffi - SQLite bindings
opengl.gl - OpenGL bindings

There are many more usages of the FFI of course. Since Factor has a minimal VM, all I/O, graphics and interaction with the outside world in general is done with the FFI. Search the Factor source tree for source files that use the alien.syntax vocabulary.

GPU shaders

Joe Groff cooked up a nice DSL for passing uniform parameters to pixel and vertex shaders. In his blog post, Joe writes:

The library makes it easy to load and interactively update shaders, define binary formats for GPU vertex buffers, and feed parameters to shader code using Factor objects.

Here is a snippet from the gpu.demos.raytrace demo:

GLSL-SHADER-FILE: raytrace-vertex-shader vertex-shader "raytrace.v.glsl"
GLSL-SHADER-FILE: raytrace-fragment-shader fragment-shader "raytrace.f.glsl"
GLSL-PROGRAM: raytrace-program
    raytrace-vertex-shader raytrace-fragment-shader ;

UNIFORM-TUPLE: sphere-uniforms
    { "center" vec3-uniform  f }
    { "radius" float-uniform f }
    { "color"  vec4-uniform  f } ;

UNIFORM-TUPLE: raytrace-uniforms
    { "mv-inv-matrix"    mat4-uniform f }
    { "fov"              vec2-uniform f }
    
    { "spheres"          sphere-uniforms 4 }

    { "floor-height"     float-uniform f }
    { "floor-color"      vec4-uniform 2 }
    { "background-color" vec4-uniform f }
    { "light-direction"  vec3-uniform f } ;

The GLSL-SHADER-FILE: parsing word tells Factor to load a GLSL shader program. The GPU framework automatically checks the file for modification, reloading it if necessary.

The UNIFORM-TUPLE: parsing word defines a new tuple class, together with methods which destructure the tuple and bind textures and uniform parameters. Uniform parameters are named as such because they define values which remain constant at every pixel or vertex that the shader program operates on.

Instruction definitions in the compiler

This one is rather obscure and technical, but it has made my job easier over the last few weeks. I blogged about it already.

Other examples

The next set of DSLs don't involve parsing words as much as just clever tricks with evaluation semantics.

Inverse

Daniel Ehrenberg's inverse library implements a form of pattern matching by computing the inverse of a Factor quotation. The fundamental combinator, undo, takes a Factor quotation, and executes it "in reverse". So if there is a constructed tuple on the stack, undoing the constructor will leave the slots on the stack. If the top of the stack doesn't match anything that the constructor could've produced, then the inverse fails, and pattern matching can move on to the next clause. This library works by introspecting quotations and the words they contain. Dan gives many details and examples in his paper on inverse.

Help system

Factor's help system uses an s-expression-like markup language. Help markup is parsed by the Factor parser without any special parsing words. A markup element is an array where the first element is a distinguished word and the rest are parameters. Examples:

"This is " { $strong "not" } " a good idea"

{ $list
    "milk"
    "flour"
    "eggs"
}

{ $link "help" }

This markup is rendered either directly on the Factor UI (like in this screenshot) or via HTML, as on the docs.factorcode.org site.

The nice thing about being able to view help in the UI environment is the sheer interactive nature of it. Unlike something like javadoc, there is no offline processing step which takes your source file and spits out rendered markup. You just load a vocabulary into your Factor instance and the documentation is available instantly. You can look at the help for any documented word by simply typing something like

\ append help

in the UI listener. While working on your own vocabulary, you can reload changes to the documentation and see them appear instantly in the UI's help browser.

Finally, it is worth mentioning that because of the high degree of semantic information encoded in documentation, many kinds of mistakes can be caught in an automated fashion. The help lint tool finds inconsistencies between the actual parameters that a function takes, and the documented parameters, as well as code examples that don't evaluate to what the documentation claims they evaluate to, and a few other things.

You won't find a lot of comments in Factor source, because the help system is much more useful. Instead of plain-text comments that can go out of date, why not have rich text with hyperlinks and semantic information?

For examples of help markup, look at any file whose name ends with -docs.factor in the Factor source tree. There are plenty.

x86 and PowerPC assemblers

I put this one last since its not really a DSL at all, just a nice API. The lowest level of Factor's compiler generates machine code from the compiler's intermediate form in a CPU-specific way. The CPU backends for x86 and PowerPC use the cpu.x86.assembler and cpu.ppc.assembler vocabularies for this, respectively. The way the assemblers work is that they define a set of words corresponding to CPU instructions. Instruction words takes operands from the stack -- which are objects representing registers, immediate values, and in the case of the x86, addressing modes. They then combine the operands and the instruction into a binary opcode, and add it to a byte vector stored in a dynamically-scoped variable. So instead of calling methods on, and passing around, an 'assembler object' as you would in say, a JIT coded in C++, you wrap the code generation in a call to Factor's make word, and simply invoke instruction words therein. The result looks like assembler source, except it is postfix. Here is an x86 example:

[
    EAX ECX ADD
    XMM0 XMM1 HEX: ff SHUFPS
    AL 7 OR
    RAX 15 [+] RDI MOV
] B{ } make .

When evaluated, the above will print out the following;

B{ 1 200 15 198 193 255 131 200 7 72 137 120 15 }

Of course, B{ is the homoiconic syntax for a byte array. Note the way indirect memory operands are constructed; first, we push the register (RAX) then the displacement (here the constant 15, but register displacement is supported by x86 too). Then we call a word [+] which constructs an object representing the addressing mode [RAX+15].

The rationale for choosing this somewhat funny syntax for indirect operands (there is also a [] word for memory loads without a displacement), rather than some kind of parser hack that allows one to write [RAX] or [R14+RDI] directly, is that in reality the compiler only rarely deals with hard-coded register assignments. Instead, the register allocator makes decisions a level above, and passes them to the code generator. Here is a typical compiler code generation template from the cpu.x86 vocabulary:

M:: x86 %check-nursery ( label temp1 temp2 -- )
    temp1 load-zone-ptr
    temp2 temp1 cell [+] MOV
    temp2 1024 ADD
    temp1 temp1 3 cells [+] MOV
    temp2 temp1 CMP
    label JLE ;

Here, I'm using the locals vocabulary together with the assembler. The temp1 and temp2 parameters are registers and label is, as its name implies, a label to jump to. This snippet generates machine code that checks whether or not the new object allocation area has enough space; if so, it jumps to the label, otherwise it falls through (code to save live registers and call the GC is generated next). The load-zone-ptr word is like an assembler macro here; it takes a register and generates some more code with it.

The PowerPC assembler is a tad more interesting. Since the x86 instruction set is so complex, with many addressing modes and so on, the x86 assembler is implemented in a rather tedious manner. Obvious duplication is abstracted out. However, there is a lot of case-by-case code for different groups of instructions, with no coherent underlying abstraction allowing the instruction set to be described in a declarative way.

On PowerPC, the situation is better. Since the instruction set is a lot more regular (fixed width instructions, only a few distinct instruction formats, no addressing modes), the PowerPC assembler itself is built using a DSL specifically for describing PowerPC instructions:

D: ADDI 14
D: ADDIC 12
D: ADDIC. 13
D: ADDIS 15
D: CMPI 11
D: CMPLI 10
...

The PowerPC instruction format DSL is defined in the cpu.ppc.assembler.backend vocabulary, and as a result the cpu.ppc.assembler vocabulary itself is mostly trivial.

Last words

Usually my blog posts describe recent progress in the Factor implementation, and I tend to write about what I'm working on right now. I'm currently working on code generation for SIMD vector instructions in the Factor compiler. I was going to blog about this instead, but decided not to do it until the SIMD implementation and API settles down some more.

With this post I decided to try something a bit different, and instead just describe an aspect of Factor that interests to me, without any of it being particularly breaking news. If you've been following Factor development closely, there is literally nothing in this post that you would not have seen already, however I figured people who don't track the project so closely might appreciate a general survey like this. I'm also thinking of writing a post describing Factor's various high-level I/O libraries. I'd appreciate any feedback, suggestions and ideas on this matter.

Saturday, September 12, 2009

Advanced floating point features: exceptions, rounding modes, denormals, unordered compares, and more

Factor now has a nice library for introspecting and modifying the floating point environment. Joe Groff implemented most of it, and I helped out with debugging and additional floating point comparison operations. All of these features are part of the IEEE floating point specification and are implemented on all modern CPUs, however few programming languages expose a nice interface to working with them. C compilers typically provide hard-to-use low-level intrinsics and other languages don't tend to bother at all. Two exceptions are the SBCL compiler and the D language.

The new functionality is mostly contained in the math.floats.env vocabulary, with a few new words in math for good measure. The new code is in the repository but it is not completely stable yet; there are still some issues we need to work out on the more obscure platforms.

To follow along with the examples below, you'll need to get a git checkout from the master branch and load the vocabulary in your listener:

USE: math.floats.env

The first two features, floating point exceptions and traps, are useful for debugging numerical algorithms and detecting potentially undesirable situations (NaNs appearing, underflow, overflow, etc).

Floating point exceptions

One of the first things people learn about floating point is that it has "special" values: positive and negative infinity, and not-a-number (NaN) values. These appear as the results of computations where the answer is undefined (division by zero, square root of -1, etc) or the answer is too small or large to be represented as a float (2 to the power of 10000, etc). A less widely-known fact is that when a special value is computed, "exception flags" are set in a hardware register which can be read back in. Most languages do not offer any way to access this functionality.

In Factor, exception flags can be read using the collect-fp-exceptions combinator, which first clears the flags, calls a quotation, then outputs any flags which were set. For example, division by zero sets the division by zero exception flag and returns infinity:

( scratchpad ) [ 1.0 0.0 / ] collect-fp-exceptions . .
{ +fp-zero-divide+ }
1/0.

Dividing 1 by 3 sets the inexact flag, because the result (0.333....) cannot be represented as a float:

( scratchpad ) [ 1.0 3.0 / ] collect-fp-exceptions . .
{ +fp-inexact+ }
0.3333333333333333

The fact that 1/3 does not have a terminating decimal or binary expansion is well-known, however one thing that many beginners find surprising is that some numbers which have terminating decimal expansions nevertheless cannot be represented precisely as floats because they do not terminate in binary (one classic case is 1.0 - 0.9 - 0.1 != 0.0):

( scratchpad ) [ 4.0 10.0 / ] collect-fp-exceptions . .
{ +fp-inexact+ }
0.4

Raising a number to a power that is too large sets both the inexact and overflow flags, and returns infinity:

( scratchpad ) [ 2.0 10000.0 ^ ] collect-fp-exceptions . .
{ +fp-inexact+ +fp-overflow+ }
1/0.

The square root of 4 is an exact value; no exceptions were set:

( scratchpad ) [ 4.0 sqrt ] collect-fp-exceptions . .
{ }
2.0

The square root of 2 is not exact on the other hand:

( scratchpad ) [ 2.0 sqrt ] collect-fp-exceptions . .
{ +fp-inexact+ }
1.414213562373095

Factor supports complex numbers, so taking the square root of -1 returns an exact value and does not set any exceptions:

( scratchpad ) [ -1.0 sqrt ] collect-fp-exceptions . .
{ }
C{ 0.0 1.0 }

However, we can observe the invalid operation exception flag being set if we call the internal fsqrt word, which operates on floats only and calls the libc function (or uses the SQRTSD instruction on SSE2):

( scratchpad ) USE: math.libm [ -1.0 fsqrt ] collect-fp-exceptions . .
{ +fp-invalid-operation+ }
NAN: 8000000000000

I describe the new NAN: syntax later in this post.

Signaling traps

Being able to inspect floating point exceptions set after a piece of code runs is all well and good, but what if you have a tricky underflow bug, or a NaN is popping up somewhere, and you want to know exactly where? In this case it is possible to set a flag in the FPU's control register which triggers a trap when an exception is raised. This trap is delivered to the Factor process as a signal (Unix), Mach exception (Mac OS X), or SEH exception (Windows). Factor then throws it as an exception which can be caught using any of Factor's error handling words, or just left unhandled in which case it will bubble up to the listener.

The with-fp-traps combinator takes a list of traps and runs a quotation with those traps enabled; when the quotation completes (or throws an error) the former FPU state is restored again (indeed it has to be this way, since running the Factor UI's rendering code with traps enabled quickly kills it). The all-fp-exceptions word is equivalent to specifying { +fp-invalid-operation+ +fp-overflow+ +fp-underflow+ +fp-zero-divide+ +fp-inexact+ }. Here is an example:

( scratchpad ) all-fp-exceptions [ 0.0 0.0 / ] with-fp-traps
Floating point trap

Without the combinator wrapped around it, 0.0 0.0 / simply returns a NaN value without throwing anything.

Rounding modes

Unlike exceptions and traps, which do not change the result of a computation but merely set status flags (or interrupt it), the next two features, the rounding mode and denormal mode, actually change the results of computations. As with exceptions and traps, they are implemented as scoped combinators rather than global state changes to ensure that code using these features is 'safe' and cannot change floating point state of surrounding code.

If a floating point operation produces an inexact result, there is the question of how the result will be rounded to a value representable as a float. There are four rounding modes in IEEE floating point:

+round-nearest+
+round-down+
+round-up+
+round-zero+

Here is an example of an inexact computation done with two different rounding modes; the default (+round-nearest+) and +round-up+:

( scratchpad ) 1.0 3.0 / .
0.3333333333333333
( scratchpad ) +round-up+ [ 1.0 3.0 / ] with-rounding-mode .
0.3333333333333334

Denormals

Denormal numbers are numbers where the exponent consists of zero bits (the minimum value) but the mantissa is not all zeros. Denormal numbers are undesirable because they have lower precision than normal floats, and on some CPUs computations with denormals are less efficient than with normals. IEEE floating point supports two denormal modes: you can elect to have denormals "flush" to zero (+denormal-flush+), or you can "keep" denormals (+denormal-keep+). The latter is the default:

( scratchpad ) +denormal-flush+ [ 51 2^ bits>double 0.0 + ] with-denormal-mode .
0.0
( scratchpad ) 51 2^ bits>double 0.0 + .
1.112536929253601e-308

Ordered and unordered comparisons

In math, for any two numbers a and b, one of the following three properties hold:

a < b
a = b
a > b

In floating point, there is a fourth possibility; a and b are unordered. This occurs if one of the two values is a NaN. The unordered? predicate tests for this possibility:

( scratchpad ) NAN: 8000000000000 1.0 unordered? .
t

If an ordered comparison word such as < or >= is called with two values which are unordered, they return f and set the +fp-invalid-operation+ exception:

( scratchpad ) NAN: 8000000000000 1.0 [ < ] collect-fp-exceptions . .
{ +fp-invalid-operation+ }
f

If traps are enabled this will throw an error:

( scratchpad ) NAN: 8000000000000 1.0 { +fp-invalid-operation+ } [ < ] with-fp-traps    
Floating point trap

If your numerical algorithm has a legitimate use for NaNs, and you wish to run it with traps enabled, and have certain comparisons not signal traps when inputs are NaNs, you can use unordered comparisons in those cases instead:

( scratchpad ) NAN: 8000000000000 1.0 [ u< ] collect-fp-exceptions . .
{ }
f

Unordered versions of all the comparisons are defined now, u<, u<=, u>, and u>=. Equality of numbers is always unordered, so it does not raise traps if one of the inputs is a NaN. In particular, if both inputs are NaNs, equality always returns f:

( scratchpad ) NAN: 8000000000000 dup [ number= ] collect-fp-exceptions . .
{ }
f

Half-precision floats

Everyone and their drunk buddy know about IEEE single (32-bit) and double (64-bit) floats; IEEE also defines half-precision 16-bit floats. These are not used nearly as much; they come up in graphics programming for example, since GPUs use them for certain calculations with color components where you don't need more accuracy. The half-floats vocabulary provides some support for working with half-floats. It defines a pair of words for converting Factor's double-precision floats to and from half-floats, as well as C type support for passing half-floats to C functions via FFI, and building packed arrays of half-floats for passing to the GPU.

Literal syntax for NaNs and hexadecimal floats

You may have noticed the funny NAN: syntax above. Previously all NaN values would print as 0/0., however this is inaccurate since not all NaNs are created equal; because of how IEEE floating point works, a value is a NaN if the exponent consists of all ones, leaving the mantissa unspecified. The mantissa is known as the "NaN payload" in this case. NaNs now print out, and can be parsed back in, using a syntax that makes the payload explicit. A NaN can also be constructed with an arbitrary payload using the <fp-nan> word:

( scratchpad ) HEX: deadbeef <fp-nan> .
NAN: deadbeef

The old 0/0. syntax still works; it is shorthand for NAN: 8000000000000, the canonical "quiet" NaN.

Some operations produce NaNs with different payloads:

( scratchpad ) USE: math.libm
( scratchpad ) 2.0 facos .
NAN: 8000000000022

In general, there is very little you can do with the NaN payload.

A more useful feature is hexadecimal float literals. When reading a float from a decimal string, or printing a float to a decimal string, there is sometimes ambiguity due to rounding. No such problem exists with hexadecimal floats.

An example of printing a number as a decimal and a hexadecimal float:

( scratchpad ) USE: math.constants
( scratchpad ) pi .
3.141592653589793
( scratchpad ) pi .h
1.921fb54442d18p1

Java supports hexadecimal float literals as of Java 1.5. Hats off to the Java designers for adding this! It would be nice if they would add the rest of the IEEE floating point functionality in Java 7.

Signed zero

Unlike twos-complement integer arithmetic, IEEE floating point has both positive and negative zero. Negative zero is used as a result of computations of very small negative numbers that underflowed. They also have applications in complex analysis because they allow a choice of branch cut to be made. Factor's abs word used to be implemented incorrectly on floats; it would check if the input was negative, and if so multiply it by negative one. However this was a problem because negative zero is not less than zero, and so the absolute value of negative zero would be reported as negative zero. The correct implementation of the absolute value function on floats is to simply clear the sign bit. It works properly now:

( scratchpad ) -0.0 abs .
0.0

Implementation

The implementation of the above features consists of several parts:

Cross-platform Factor code in the math.floats.env vocabulary implementing the high-level API
Assembly code in vm/cpu-x86.32.S, vm/cpu-x86.64.S, and vm/cpu-ppc.S to read and write x87, SSE2 and PowerPC FPU control registers
Low-level code in math.floats.env.x86 and math.floats.env.ppc which implements the high-level API in terms of the assembly functions, by calling them via Factor's FFI and parsing control registers into a cross-platform representation in terms of Factor symbols
Miscellaneous words for taking floats apart into their bitwise representation in the math vocabulary
Compiler support for ordered and unordered floating point compare instructions in compiler.cfg.instructions
CPU-specific code generation for ordered and unordered floating point compare instructions in cpu.x86 and cpu.ppc

Wednesday, September 02, 2009

Eliminating some boilerplate in the compiler

Adding new instructions to the low-level optimizer was too hard. Multiple places had to be updated, and I would do all this by hand:

The instruction tuple itself is defined in the compiler.cfg.instructions vocabulary with the INSN: class, which also creates a word with the same name that constructs the instruction and adds it to the current sequence.
Instructions which have a destination register have convenient constructors in compiler.cfg.hats which creates a fresh virtual register, creates an instruction with this register as the destination, and outputs it. So for example, 1 2 ^^add would create an add instruction with a fresh destination register, and output this register. It might be equivalent to something like 0 1 2 ##add.
Instructions that use virtual registers are be added to the vreg-insn union, and respond to methods defs-vreg, uses-vregs, and temp-vregs in compiler.cfg.def-use. This 'def-use' information is used by SSA construction, dead code elimination, copy coalescing, register allocation, among other things.
Methods have to be defined for the instruction in compiler.cfg.renaming.functor. This functor is used to generate code for renaming virtual registers in instructions. The renaming code is used for SSA construction, representation selection, register allocation, among other things.
Instructions which use non-integer representations (eg, operations on floats) must respond to methods defs-vreg-rep, uses-vreg-reps, and temp-vreg-reps in compiler.cfg.representations.preferred.
Instructions must respond to the generate-insn method, defined in compiler.codegen.
Instructions which participate in value numbering must define an "expression" variant, and respond to the >expr method defined in compiler.cfg.value-numbering.expressions

As you can see, this is a lot of duplicated work. I used inheritance and mixins to model relationships between instructions and reduce some of this duplication by defining methods on common superclasses rather than individual instructions where possible, but this never seemed to work out well.

If you look at the latest versions of the source files I linked to above, you'll see that all the repetitive copy-pasted code has been replaced with meta-programming. The new approach extends the INSN: parsing word so now all relevant information is specified in one place. Also there is a new PURE-INSN: parsing word, to mark instructions which participate in value numbering; previously this was done with a superclass. For example,

PURE-INSN: ##add
def: dst/int-rep
use: src1/int-rep src2/int-rep ;

This defines an instruction tuple, an expression tuple, def/use information, representation information, a method for converting instructions to expressions, and a constructor, all at the same time.

For the code generator's generate-insn method, not every instruction has a straightforward implementation; some, like GC checks and FFI calls, postpone a fair amount of work until the very last stage of compilation. However, for most instructions, this method simply extracts all the slots from the instruction tuple, then calls a CPU-specific hook in the cpu.architecture vocabulary to generate code. For these cases, a CODEGEN: parsing word sets up the relevant boilerplate;

CODEGEN: ##add %add

is equivalent to

M: ##add generate-insn [ dst>> ] [ src1>> ] [ src2>> ] tri %add ;

This nicely cleans up all the repetition I mentioned in the bullet points at the top.

I've been aware of this boilerplate for a while but wanted the design of the compiler to settle down first. Now that most of the machinery is in place, I feel comfortable cooking up some complex meta-programming to clean things up. Adding new instructions should be easier. I plan on adding some SSE2 vector operations soon, and this was the main motivation behind this cleanup.

How would you do this in other languages? In Lisp, you would use a macro which expands into a bunch of defclass, defmethod, etc forms. In Java, you might use annotations:

public class AddInsn extends Insn {
    @Def @Representation("int") Register dst;
    @Use @Representation("int") Register src1;
    @Use @Representation("int") Register src2;
}

Factor: a practical stack language