Factor: a practical stack language: March 2009

Friday, March 20, 2009

Better static safety for higher-order code

Recently Daniel Ehrenberg and I implemented a new language feature which speeds up code calling that calls quotations which the compiler cannot inline.

Thew new language feature takes the form of syntax which is based on the call and execute words from the kernel vocabulary. These words call and execute a quotation, respectively. The new syntax allows for a stack effect declaration to be made at the call site. Suppose you have a hashtable of quotations in the dynamic variable foo, and you want to call the quotation stored at a given key. The traditional way is to use the call word:

"A key" foo get at call

Using the new call( syntax, you can call it but also assert that the quotation has a certain stack effect. For example,

"A key" foo get at call( a -- b )

The execute( syntax generalizes execute in a similar way. For example, here is some code from the parser which invokes parsing words. First it checks that the word was not defined in the same file (because if it was, it wouldn't be compiled yet); then it executes it using execute( to ensure it has exactly one input and one output:

ERROR: staging-violation word ;

: execute-parsing ( accum word -- accum )
    dup changed-definitions get key? [ staging-violation ] when
    execute( accum -- accum ) ;

The basic idea is that the code that uses call( and execute( is statically checked to ensure that the right number of parameters are passed around between words; and at runtime, the actual quotation or word is checked and if its stack effect is not what the code assumed statically, an error is thrown.

The call( and execute( words are parsing words which call parse-effect to read a stack effect object. This is the same underlying facility that the ( word uses. After parsing an effect, call( and execute( are transformed into calls to call-effect and execute-effect. So the following,

"A key" foo get at call( a -- b )

parses as, and is equivalent to

"A key" foo get at (( a -- b )) call-effect

Where call-effect should be thought of as "call a quotation with this effect".

These words complement call and execute. The call and execute words benefit from the compiler's lambda lifting optimization. If a literal quotation, the result of curry and compose, or a word, is only ever passed "downward", then the compiler will inline it at the call site of call and execute, effectively converting your program into a first-order program. This means that all kinds of iteration patterns become simple loops. Even if the iteration combinators are expressed as compositions of higher-order functions, where the quotation might be passed between several words and be operated on with curry and compose, things still work out pretty well.

However, not all code is first-order, and call( and execute( allow more code to be written in a style where Factor's stack checker can infer the stack effect. Whereas before, some 80% of the words in the library would have an inferred stack effect, now the figure is closer to 100%, after introducing the new language feature and refactoring code to use it.

The concept of compiler warnings always confused Factor beginners. In Factor, a compiler warning indicates that the compiler was unable to infer the stack effect of a word; this inhibits most optimizations because no assumptions about the positions of values on the stack can be made. So if you care about the performance of your code, it was good to write it in a way such that the stack effect would infer, but it wasn't possible to write code like this in all situations. Now it is, and compiler warnings are very rare; loading the Furnace web framework now produces less than 10 whereas before it would be a few hundred.

The implementation is pretty interesting. The call( parsing word expands into a call to the call-effect macro. The macro expects that the stack effect is a literal compile-time value. The macro expansion is specialized for calling a quotation with this effect. The quotation itself is only known at runtime, and so the code generated by the macro receives it as a parameter. It uses three strategies:

Inline cache hit: If the quotation is the same as last time, call it without any further runtime checks using the call-effect-unsafe primitive.
Inline cache miss: If the quotation is different from last time, then it checks the stack effect of the quotation against the declaration. The stack effect is computed lazily and stored in a slot in the quotation. If the declaration matches, the quotation is called using call-effect-unsafe.
Slow case: if the quotation's stack effect cannot be inferred, a snapshot of the data stack is taken, the quotation is called, and the stack is compared to make sure the quotation didn't use more parameters than declared.

The execute( parsing word expands into a call of the execute-effect macro, which uses three similar strategies:

Inline cache hit: If the word is the same as last time, call it without any further runtime checks using the execute-effect-unsafe primitive.
Inline cache miss: If the word is different from last time, then it checks the stack effect of the word against the declaration. Stack effects of words are always available, because they are computed by the compiler. If the declaration matches, the quotation is called using execute-effect-unsafe.
Slow case: if the word's stack effect is not known, then it creates a quotation with the word in it, and calls call-effect on it.

In most cases, the slow case should never be hit, and even if it is, the runtime checks are quite efficient. The inline caching ensures that if call( is used to invoke a quotation in a loop, it only has to check its effect once.

Factor is heading in an interesting direction. It generally offers more static checking and optimization than other dynamic languages. Here are the main differences; all of these things that Factor does at compile time, other dynamic languages leave until runtime:

In Factor, calls to undefined words are parse-time errors. In most dynamic languages, a typo in a method name would not be caught until run time.
In Factor, most code passes the stack effect checker. This means that calling a word with the wrong number of parameters is impossible.
In Factor, most uses of higher-order functions are inlined by the compiler, and escape analysis eliminates most closure allocations. So there are less blocks being passed around at runtime, and less allocation.

However, the language is still dynamically typed; the types of values are not known until runtime (except when the compiler can infer them), and there is a lot of generic dispatch and ad-hoc polymorphism in the library.

Now call( and execute( will speed up code which cannot take advantage of the last optimization. Indeed, with inline caching, calling quotations a loop with call( is almost as fast as if everything has been inlined.

Tuesday, March 03, 2009

Rendering Unicode text with Pango and Cairo

I implemented a Pango backend for text rendering in the Factor UI. This means that new_ui now works on Windows and X11 (well, not quite; I have to fix an unrelated bug in the X11 backend), and so it's almost ready to merge.

My main reference for Pango was the API documentation on gnome.org. Conceptually, Pango is quite similar to Core Text, but there are some important differences I will outline below. Pango supports two primary rendering backends; Xft and Cairo. I use the Cairo backend to render to a Cairo offscreen surface, and make an OpenGL texture out of it.

Cairo offscreen surfaces and OpenGL

When I wrote the code to create an offscreen Cairo surface, and wrap the result in a Factor bitmap image object, the code turned out to look almost identical to a similar utility I cooked up for Core Graphics. So as any self-respecting programmer, I factored this duplication out into a images.memory vocabulary.

First, we have some code to allocate unmanaged memory for the bitmap data. This memory is unmanaged because Cairo (and Core Graphics) retain a pointer to it, and so it must not be moved by a GC:

: bitmap-size ( dim -- n ) product "uint" heap-size * ;

: malloc-bitmap-data ( dim -- alien ) bitmap-size 1 calloc &free ;

Note that destructors are used to ensure the memory is deallocated later. Now, we have some code copy data out of the unmanaged memory and into a Factor byte array, and wrap it in a bitmap image object:

: bitmap-data ( alien dim -- data ) bitmap-size memory>byte-array ;

: <bitmap-image> ( alien dim -- image )
    [ bitmap-data ] keep
    <image>
        swap >>dim
        swap >>bitmap ;

Putting everything together into a combinator:

: make-memory-bitmap ( dim quot -- image )
    '[
        [ malloc-bitmap-data ] keep _ [ <bitmap-image> ] 2bi
    ] with-destructors ; inline

So the quotation receives the image dimensions and a pointer to some unmanaged memory.

On the Cairo side, I have a word to create an offscreen bitmap:

: width>stride ( width -- stride ) "uint" heap-size * ; inline

: <image-surface> ( data dim -- surface )
    [ CAIRO_FORMAT_ARGB32 ] dip first2 over width>stride
    cairo_image_surface_create_for_data
    dup check-surface ;

: <cairo> ( surface -- cairo ) cairo_create dup check-cairo ; inline

And then I use make-memory-image together with this word to define a new make-bitmap-image combinator:

: make-bitmap-image ( dim quot -- image )
    '[
        <image-surface> &cairo_surface_destroy
        <cairo> &cairo_destroy
        @
    ] make-memory-bitmap
    BGRA >>component-order ; inline

This combinator makes the Cairo binding very pleasant to use. Once Factor's image library supports some more formats -- PNG comes to mind -- it will be possible to render some graphics in a web app and serve it up as a PNG to the client. Of course Cairo can write PNGs already, so alternatively someone could write a high-level wrapper for those APIs instead.

So the Pango text backend makes Pango calls inside a make-bitmap-image, then passes the result to the OpenGL texture caching code I talked about last time.

Fixed-point and floating-point numbers

Pango uses fixed-point arithmetic for sub-pixel co-ordinates. I convert everything to and from floats when calling Pango using a pair of words:

CONSTANT: PANGO_SCALE 1024

: pango>float ( n -- x ) PANGO_SCALE /f ; inline
: float>pango ( x -- n ) PANGO_SCALE * >integer ; inline

For example, here is some code to convert a fixed-point PangoRectangle into a Factor rect type:

C-STRUCT: PangoRectangle
    { "int" "x" }
    { "int" "y" }
    { "int" "width" }
    { "int" "height" } ;

: PangoRectangle>rect ( PangoRectangle -- rect )
    [ [ PangoRectangle-x pango>float ] [ PangoRectangle-y pango>float ] bi 2array ]
    [ [ PangoRectangle-width pango>float ] [ PangoRectangle-height pango>float ] bi 2array ] bi
    <rect> ;

UTF8 strings and string offsets

The documentation for a few functions, such as pango_layout_line_x_to_index(), uses the following phrase: "the byte index for the grapheme in which the user clicked." What they really mean is the index of the first code point comprising the grapheme the user clicked, inside the UTF8-encoded string that you passed in when creating the layout.

This works well in languages such as C and Ruby 1.8 where a string is just an array of UTF8-encoded bytes, but in Factor, strings are an abstract sequence of Unicode code points with the actual internal encoding hidden from the developer. The Factor FFI takes care of encoding and decoding strings for you, but in this case, I want to convert code point indices to byte indices and this requires some work.

Consider a Unicode string such as "παν語". This string has 4 Unicode code points, so in Factor, it is a sequence of length 4. However, the UTF8 encoding of this string has 9 bytes:

π	2 bytes
α	2 bytes
ν	2 bytes
語	3 bytes

So for example, byte index 4 corresponds to string index 2. To convert from a string index to a byte index, it suffices to take the head of the string, and UTF8 encode it, and look and the byte length of the result. Going in the other direction requires building a mapping from byte indices to string indices and doing a binary search on it. Here is the Factor code:

: code-point-length ( n -- x )
    log2 {
        { [ dup 0 7 between? ] [ 1 ] }
        { [ dup 8 11 between? ] [ 2 ] }
        { [ dup 12 16 between? ] [ 3 ] }
        { [ dup 17 21 between? ] [ 4 ] }
    } cond nip ;

: code-point-offsets ( string -- indices )
    0 [ code-point-length + ] accumulate swap suffix ;

: utf8-index> ( n string -- n' )
    code-point-offsets natural-search drop ;

: >utf8-index ( n string -- n' )
    code-point-offsets nth ;

Note that code-point-length could've been written as 1string utf8 encode length, but this implementation is slightly more efficient.

Font descriptions

Pango has a pretty elaborate set of APIs for working with fonts. At this point, I only need the bare minimum, so I'm working with the PangoFontDescription type. Font descriptions are allocated and deallocated using a pair of functions:

FUNCTION: PangoFontDescription*
pango_font_description_new ( ) ;

FUNCTION: void
pango_font_description_free ( PangoFontDescription* desc ) ;

DESTRUCTOR: pango_font_description_free

Unlike Core Text, which uses Apple's Core Foundation for memory management and thus every object is reference counted with CFRetain() and CFRelease() functions used to manage the reference count, Pango takes a more ad-hoc approach. Some objects are reference counted using GLib's gobject support, others have their own refcounting, and others still just have a deallocation function without a reference count.

Here is some code to create a PangoFontDescription from a Factor font instance:

MEMO: (cache-font-description) ( font -- description )
    [
        [ pango_font_description_new |pango_font_description_free ] dip {
            [ name>> pango_font_description_set_family ]
            [ size>> float>pango pango_font_description_set_size ]
            [ bold?>> PANGO_WEIGHT_BOLD PANGO_WEIGHT_NORMAL ? pango_font_description_set_weight ]
            [ italic?>> PANGO_STYLE_ITALIC PANGO_STYLE_NORMAL ? pango_font_description_set_style ]
            [ drop ]
        } 2cleave
    ] with-destructors ;

: cache-font-description ( font -- description )
    strip-font-colors (cache-font-description) ;

Some things to note about the above code:

I'm using memoization to cache font descriptions
I use an on-error destructor (|foo) to ensure that the font description is deallocated if anything inside the block fails
Since Factor font instances store foreground and background colors, whereas Pango font descriptions do not, I don't want to pollute the cache with lots of duplicates where only the keys differ in color, so I strip the colors out first.
Just like Core Text, Pango offers a lot more functionality than Factor's font objects can represent. I will look into exposing more of this functionality in a cross-platform manner over time.

Unlike Core Text, where font objects can be interrogated for metrics information, Pango's support for font metrics seems pretty simple in comparison; only getting the ascent and descent is supported, and the values reported here don't seem to match the text as it is actually rendered on-screen. Because of this, I only ever get metrics information from text layouts, and not from the fonts themselves, so creating font descriptions is the only thing I need to do here.

Layouts

Just as with Core Text, Pango has a data type representing a set of glyphs which have been laid out and positioned; you never draw strings directly. This recognizes the fact that glyph selection when rendering Unicode is a complex process and should be done ahead of time. Whereas in Core Text, the fundamental data type for this is the CTLine, Pango calls it a PangoLayout. Unlike Core Text, a PangoLayout can contain multiple lines of text, so it is more like a CTParagraph than a CTLine, but Factor's UI only renders single lines at a time so far.

The Pango backend uses a tuple to store the Pango layout, a bitmap rendering, and several related values:

TUPLE: layout font string layout metrics ink-rect logical-rect image disposed ;

The first step is fill in the layout slot, which stores a pointer to a PangoLayout.

Pango layouts can be created by calling pango_layout_new(), however when using the Cairo backend it is better to use pango_cairo_create_layout() instead. Once a layout has been created, you can set the text to render, and the font to render it with, by calling pango_layout_set_text() and pango_layout_set_font_description().

FUNCTION: PangoLayout*
pango_cairo_create_layout ( cairo_t* cr ) ;

FUNCTION: void
pango_layout_set_text ( PangoLayout* layout, char* text, int length ) ;

FUNCTION: void
pango_layout_set_font_description ( PangoLayout* layout, PangoFontDescription* desc ) ;

Unfortunately, pango_layout_set_text() takes a null-terminated UTF8 string, so to avoid problems if the user accidentally passes in a string containing embedded nulls, I substitute all nulls for a zero-width non-breaking space.

Here is the Factor code to create a Pango layout and set the text and font:

: set-layout-font ( font layout -- )
    swap cache-font-description pango_layout_set_font_description ;

: set-layout-text ( str layout -- )
    #! Replace nulls with something else since Pango uses null-terminated
    #! strings
    swap
    dup selection? [ string>> ] when
    { { 0 CHAR: zero-width-no-break-space } } substitute
    -1 pango_layout_set_text ;

: <PangoLayout> ( text font -- layout )
    dummy-cairo pango_cairo_create_layout |g_object_unref
    [ set-layout-font ] keep
    [ set-layout-text ] keep ;

Font metrics

Last time I blogged about how Core Text makes a distinction between image bounds and metric bounds. A similar situation exists in Pango; they're called ink and logical bounds. To get both bounds from a layout, use the pango_layout_get_extents() function.

FUNCTION: void
pango_layout_get_extents ( PangoLayout* layout, PangoRectangle* ink_rect, PangoRectangle* logical_rect ) ;

I showed the code to convert a PangoRectangle into a Factor rect above.

To compute the ascent and descent, it suffices to compute one of them, since their sum is the height of the logical bounds. Since a Pango layout can contain multiple lines, getting a baseline involves getting an iterator, which iterates over lines, first.

FUNCTION: PangoLayoutIter*
pango_layout_get_iter ( PangoLayout* layout ) ;

Now, the baseline of the first line can be obtained from the iterator:

FUNCTION: int
pango_layout_iter_get_baseline ( PangoLayoutIter* iter ) ;

Iterators must be deallocated using a special function.

FUNCTION: void
pango_layout_iter_free ( PangoLayoutIter* iter ) ;

Here is the Factor code that puts it all together:

: layout-baseline ( layout -- baseline )
    pango_layout_get_iter &pango_layout_iter_free
    pango_layout_iter_get_baseline
    pango>float ;

Since I only ever create single-line layouts, the descent is easy to compute; it is the height of the logical bounds minus the ascent.

The other two relevant metrics are the x-height and cap-height metrics that I talked about in an earlier post. Pango doesn't provide a way to get these from the font itself, so I just measure them by hand:

: glyph-height ( font string -- y )
    swap <PangoLayout> &g_object_unref layout-extents drop dim>> second ;

MEMO: missing-font-metrics ( font -- metrics )
    #! Pango doesn't provide x-height and cap-height but Core Text does, so we
    #! simulate them on Pango.
    [
        [ metrics new ] dip
        cache-font-description
        [ "x" glyph-height >>x-height ]
        [ "Y" glyph-height >>cap-height ] bi
    ] with-destructors ;

Rendering a layout

There are several steps involved in rendering text to a bitmap:

Filling the background with the background color
Filling the selection rectangle, if one is set
Setting the foreground color
Setting the text position
Rendering the text layout

The only interesting part is setting the text position. In Core Text, text is drawn with the intersection of the baseline and the caret at the origin. Pango works differently here; the origin is the top-left corner of the text's logical bounds. Here is a screenshot illustrating the ink and logical bounds:

So we want to position the text so that the upper-left corner is at the origin relative to the logical bounds. Here is the Factor code to subtract the logical origin from the image origin:

: text-position ( layout -- loc )
    [ logical-rect>> ] [ ink-rect>> ] bi [ loc>> ] bi@ v- ;

: set-text-position ( cr loc -- )
    first2 cairo_move_to ;

Conclusion

Implementing Unicode text rendering via Core Text and Pango (and putting a cross-platform abstraction layer on top) was a fair amount of work, but now I've achieved one of my big goals for Factor 1.0, Unicode text rendering. There are still a few missing details before the Factor UI is truly Unicode-perfect, but this is an important step. Along with Daniel Ehrenberg's Unicode vocabulary and the large number of I/O encodings supported, Factor's Unicode support is better than most other languages, especially non-mainstream ones, and that is quite an accomplishment for a small team and a few years of work.

Factor: a practical stack language