Factor: a practical stack language: April 2009

Saturday, April 18, 2009

Recent UI improvements

Over the last few months I've done a lot of work on Factor's UI toolkit, and the developer tools built on top. Here is an overview of what's changed.

Code cleanup

I've been working on the UI since 2005, and it has accumulated a lot of cruft since then. Language features have come and gone and not enough refactoring was done over time. Last year I rewrote the HTTP server and compiler using the latest language features and the result was very clean code. The UI is now written in the same style.

Improved font support

I've blogged about this:

All UI gadgets that render text now use the appropriate platform-specific mechanism for rendering text. The ui.text vocabulary provides the cross-platform abstraction layer. The fonts vocabulary defines data types for fonts and font metrics. This replaces ad-hoc loosely-typed "font specifiers" that the UI used formerly, where a font was just a triple; { "sans-serif" plain 12 } for instance. Font metrics are now supported in a cross-platform manner too. All relevant gadgets now do baseline alignment.

Image support

The UI now supports an easy API for displaying images. The ui.images vocabulary defines some words which can be called from your gadget's draw-gadget* method. The ui.gadgets.icons vocabulary defines a simple gadget that renders an image and nothing else. This is all built on top of the images vocabulary, which implements BMP and TIFF loaders in pure Factor. Hopefully we'll get PNG and JPEG at some point too.

Editor gadget improvements

The ui.gadgets.editors vocabulary supports some new features:

Undo and redo (Control-z, Control-Shift-z)
Join lines (Control-j)
Character and word navigation now uses unicode.breaks

Table gadget replaces list gadget

The ui.gadgets.tables vocabulary implements a new gadget which is a replacement for the old list gadget. It's functionality is a superset of what lists offered. Tables are used extensively by the new developer tools to display completion popups and such. Unlike lists, which would create a gadget for every row of data, tables do not have any children and render rows directly. This reduces memory consumption. Tables take the list of rows from an underlying model, and place the currently selected row in another model. The latter model can then be wrapped in an arrow model to compute a new list of rows, which can then be the model for another table. This way, multiple tables can be chained together for a 'drill down navigation'-style of interaction with very little code.

Scroller gadget improvements

Scroller gadgets, defined in the ui.gadgets.scrollers vocabulary, now provide a protocol that their children can implement. This protocol has two generic words:

viewport-pref-dim - this allows the child to specify the preferred dimensions of the scroller surrounding the gadget. Tables and editors implement this generic word, allowing you to set various slots, such as min-rows and max-rows, which control the size of the scroller surrounding the gadget in character units.
viewport-column-header - constructs a gadget which is displayed at the top of the scroller. This allows the table gadget to display column titles which are always visible, regardless of the vertical scroll position.

Frame layout improvements

The ui.gadgets.frames vocabulary is more flexible now. Instead of limiting frame layouts to a 3x3 grid with the center item stretched to fill available space, the grid can be of any size, and an arbitrary cell can be designated to take up available space.

Models API changes

The models vocabulary has undergone some cleanups. To avoid confusion with sequence functionality, filter models are now called arrow models. Compose models are now called product models, since really they represent a cartesian product of functions, not a composition of anything at all. The new models.arrow.smart vocabulary wrap an arrow around a product whose arity is the arity of the arrow's quotation.

Integration between event loop and I/O

Until Factor supports real native threads, any FFI call blocks all Factor threads until the call completes. I/O is done with non-blocking APIs on Unix (epoll, kqueue) and Windows (IO completion ports), so socket operations can proceed concurrently. However, until recently, the UI would have to poll for events because of this, because blocking calls to wait for events would prevent I/O and other Factor threads from running. This meant that even if you weren't doing anything in the Factor UI, it would use a little bit of CPU - 5% to 25%. This problem has been fixed. On Mac OS X, I use the CFFileDescriptor API to wait on a kqueue and GUI events at the same time; you can provide a callback which is invoked when file descriptors are ready for I/O, and call [NSApplication run] as usual. On X11, I use XConnectionNumber() to get a file descriptor out of a Display and call wait-for-fd to wait for events to arrive while also allowing other Factor threads to run. This solves the CPU usage problem and allows Factor threads to run concurrently with the event loop waiting for events.

Improvements to tooling

The main focus of my recent UI work, however, has been on improving UI developer tools. I plan on making a screencast to highlight the new improvements soon.

Wednesday, April 08, 2009

OpenGL textures and the power-of-two size restriction

Prior to version 2.0, the OpenGL specification required that texture dimensions be powers of two. This simplifies the implementation of texture mapping, because converting floating point texture co-ordinates (which are in the range 0..1) to texel coordinates is trivial; multiplying a floating point number by a power of two is essentially just adding a number to the exponent. Eventually, more capable drivers and graphics cards came along, and introduced the ability to use non-power-of-two texture dimensions. To signal this capability to GL applications, they report supporting the GL_ARGB_texture_non_power_of_two extension. OpenGL 2.0 implementations are required to support this extension.

In practice, the only major OpenGL implementations which don't provide this extension are older X11 drivers, and the Microsoft Windows software renderer, which is a very bare-bones OpenGL 1.1 implementation.

There is a trick for padding textures up to a power of two for implementations which don't support this extension, however it doesn't seem to work everywhere either. Instead of manipulating the bitmap in software before passing it onto a call to glTexImage2D(), it is permissible as of OpenGL 1.1 to pass a bitmap pointer of NULL. This creates a texture with uninitialized content. The glTexSubImage2D() function is used to fill it portions of the new texture. In particular, glTexSubImage2D() places no restriction on the width and height, even if GL_ARGB_texture_non_power_of_two is not supported.

The above trick works with the Windows software renderer. On the other hand, previous-generation MacBooks with Intel graphics suffer from a driver bug which results in artifacts appearing when this feature is used to render scaled textures. However, all OpenGL implementations in recent Mac OS X releases support non-power-of-2 textures, so on this platform, the workaround can be avoided entirely anyway.

In Factor, the opengl.capabilities vocabulary provides some utility words to check for extensions. For example, a common operation is checking for either a specific OpenGL version, or an extension (new versions of the GL spec frequently absorb existing extensions):

"2.0" { "GL_ARB_texture_non_power_of_two" } has-gl-version-or-extensions?

The gl-extensions word outputs a sequence of all supported extensions. Here is the output from the Mesa software renderer on Linux.

I replaced my old code in opengl.textures which padded bitmap image objects out to powers of two using sequence manipulation words, with the new way using texture sub-images instead. If the extension is present, no padding is performed, ensuring correct behavior on Mac OS X. This means that any code using opengl.textures, such as the UI's text rendering and image support, should now spend less CPU time running Factor code.

Factor's OpenGL binding has been in development for 4 years and has seen contribution from 4 developers. For a demo of what it can do, try "spheres" run in the UI listener. You will need a video driver that supports OpenGL 2.0 or GL_ARB_shader_objects.

Tuesday, April 07, 2009

Rendering text on Windows via Uniscribe

My original goal was to use Pango to render Unicode text in the Factor UI on Windows. This didn't pan out, for a number of reasons:

Pango, Cairo and all of their dependencies add up to around 7Mb of DLLs that we'd need to ship with every Factor binary, as well as every standalone app binary deployed from Factor. That's too much when a 'Hello world' compiles down to a 500kb image.
Pango has various bugs on 64-bit Windows.
Pango doesn't play well with Microsoft's ClearType, and anti-aliased text is clipped for some reason.

I'm sure these problems will get fixed eventually (except for the first one perhaps) and Pango is open source so I could always send a patch, but I'd rather spend my time working on interesting things instead, so I've decided to bypass Pango on Windows altogether and use Microsoft's native Uniscribe API instead. Uniscribe ships as a standard part of Windows XP (actually it's been around since IE 5.0) and so Factor can depend on it being installed. On X11, I continue to use Pango; most *nix systems have at least part of the GNOME platform installed, so Pango and Cairo will be there, and I haven't seen any rendering issues in Pango with X11 either.

I'm using the Uniscribe script string API, which is intended to be used for laying out and rendering a piece of text with a single font and color. It is directly analogous to CTLine in Core Text and PangoLayout in Pango.

The function to create a script string, ScriptStringAnalyze, takes a device context handle as a parameter. The device context must be provided if the string is going to be rendered or measured.

Before creating the string, the font and text color are set in the device content. To set the font, I look up a font handle with CreateFont, and pass it to the SelectObject() function. There's an important caveat with CreateFont(); if you want your font size to be specified in points (rather than pixels), you must pass a negative size. I noticed that 12-point font was rendered way too small; changing the 12 to a -12 fixed the problem, so now the Factor UI does this for you on Windows.

The text color is set by calling SetTextColor, and the background color (which is only rendered if a flag is passed to ScriptStringOut(); see below) with SetBkColor.

For both fonts and colors, the Uniscribe text rendering uses Factor's cross-platform font and color types.

Size measurement is done by calling ScriptString_pSize(). Unlike Core Text and Pango, Windows Uniscribe makes no distinction between metric and ink bounds. There is a provision for oversize glyphs, such as those in the Zapfino font (see my blog post on ink and metric bounds), where each glyph has an associated ABC metrics structure. I'll implement support for this later.

When rendering to an offscreen context that is intended to be used as an OpenGL texture, as in the Factor UI, a bit of a chicken-and-egg problem occurs, because we need to know the size of the resulting text before we can create a bitmap. Fortunately, Windows GDI separates the process of creating a memory (off-screen) DC and allocating the bitmap storage for it, into two functions, CreateCompatibleDC() and CreateDIBSection(). The trick is to create a DC, create a script string, get its size, then allocate the bitmap for the DC, and finally render the string.

Script strings can be rendered to their underlying DC with the ScriptStringOut function. This function takes a number of parameters. For instance, it can render the text selection for you.

Font metrics -- the ascent, descent, and leading -- can be obtained by calling GetTextMetrics() on the DC.

Once the text has been rendered into a memory DC, the underlying bitmap needs to be obtained and the graphics object handles freed. This is the third time I implement a similar-looking make-bitmap-image combinator, so by now I've developed some utilities that I've abstracted out into the images.memory vocabulary. The bitmap is cached as a texture in the same way on all platforms; I've discussed OpenGL texture caching before.

Converting between x co-ordinates and line offsets, and vice versa, can be done with ScriptStringXToCP() and ScriptStringCpToX(). Watch out for the fact that passing the length of the string to ScriptStringCpToX() is invalid; if you want the X-offset of the trailing edge of the last code point, you have to pass length-1 instead, with the fTrailing parameter TRUE.

Finally, script strings are freed by calling ScriptStringFree(). When binding to the API from another language, watch out for the type of the input parameter; it's a pointer to a SCRIPT_STRING_ANALYSIS type, which is itself a pointer type. So you're passing a pointer to a pointer to a struct! I got the type wrong in my FFI declaration the first time around and was getting segfaults when deallocating stuff. Very annoying.

The code implementing this is split between four vocabularies:

windows.offscreen - support code for creating GDI memory DCs. Originally written by Joe Groff for another purpose; I've generalized it and put it in a common location so that it can be used by both the Uniscribe code and offscreen gadget rendering.
windows.fonts - looking up GDI font handles from Factor font descriptions
windows.uniscribe - the bulk of the code; creating, rendering script strings, converting x co-ordinates to line offsets and vice versa
ui.text.uniscribe - a backend for the UI's cross-platform text rendering support. Just a thin shim over the preceding vocabularies.

Here is what it looks like when all is said and done:

Wednesday, April 01, 2009

Sup dawg, we heard you like Smalltalk so we put Smalltalk in your Factor so you can send messages while you roll

A week ago, I went to the PyCon VM summit in Chicago. It was great fun and I got to meet various Internet celebrities such as John Rose (Sun JVM), Evan Phoenix (Rubinius), Charles Nutter (JRuby), Avi Bryant (Seaside), and Allison Randal (Parrot).

Factor's own Daniel Ehrenberg and Doug Coleman were also there, and I've heard rumors that the world-famous C++ programmer Joe Groff may have tagged along for the ride.

Every VM implementor gave a short 10 minute talk. Mine was an overview of the optimizations and implementation techniques used in Factor; you can look at the slides in a recent Factor by running "chicago-talk" run in the listener. A common theme at the summit was the distinction between platforms versus languages. The JVM, Parrot and PyPy people are all vying for language designers to implement their languages on top of these VMs. The idea is that the platform developers take care of the "hard" stuff such as the optimizing compiler, garbage collector, and libraries, whereas the language designer feeds a grammar into a parser generator, and bingo, instant programming language.

While discussing Smalltalk minutiae with Dan and Avi Bryant, we hit upon the idea of trying to compile Smalltalk down to Factor. After all, Factor already has everything you'd want in a platform for hosting languages. We have a great optimizing compiler, a generational garbage collector, C FFI, a ton of libraries, and so on; in theory we've already done most of the hard work for someone who wants to make a language without dealing with low-level details. So while we've written countless DSLs in Factor since the very beginning, why not try implementing a more complete language?

That's exactly what I did, and the code is in the repository under extra/smalltalk. Run it by issuing the following command in the Factor listener:

"smalltalk.listener" run

Then try a Smalltalk expression such as 1 to: 10 do: [:i|i print]., or look at extra/smalltalk/eval/fib.st for a more complete example. Incidentally, this Smalltalk fib runs much slower than a Factor fib, because of Smalltalk return semantics; I discuss the problem below as well as my planned solution to eliminate the overhead.

In theory, there is a simple mapping from Smalltalk to Factor:

A Smalltalk class can be a Factor tuple class. Since ivars are pre-declared, the performance complications that Ruby and Python implementations must deal with are avoided.
Every Smalltalk selector can map into a Factor generic word; all selectors will be defined in a single vocabulary.
Every Smalltalk method can correspond to a Factor method on the selector's generic word, specializing on the class in question.
Smalltalk local variables can be translated to Factor's locals. Again, Smalltalk has pretty static scoping semantics, and the fact that locals are pre-declared maps very well onto Factor's model.
Smalltalk blocks can map onto Factor quotations.
Non-local returns can be implemented using continuations.

I threw together a simple compiler on the car ride back from Chicago to Minneapolis (no, I wasn't the one driving). The compiler takes a Smalltalk AST as input and produces a Factor quotation. The AST is pretty simple:

SINGLETONS: nil self super ;

TUPLE: ast-comment { string string } ;
TUPLE: ast-block { arguments array } { temporaries array } { body array } ;
TUPLE: ast-message-send receiver { selector string } { arguments array } ;
TUPLE: ast-message { selector string } { arguments array } ;
TUPLE: ast-cascade receiver { messages array } ;
TUPLE: ast-name { name string } ;
TUPLE: ast-return value ;
TUPLE: ast-assignment { name ast-name } value ;
TUPLE: ast-local-variables { names array } ;
TUPLE: ast-method { name string } { body ast-block } ;
TUPLE: ast-class { name string } { superclass string } { ivars array } { methods array } ;
TUPLE: ast-foreign { class string } { name string } ;
TUPLE: ast-sequence { temporaries array } { body array } ;

The parser took more effort. I used Chris Double's PEG library to implement it, and followed the grammar from Smalltalk in one page as a guide. The grammar required some restructuring to parse with PEGs because of how the "ordered choice" works. Also sorting out some more obscure details such as message cacades took some experimentation with Squeak and back-and-forth on the PEG grammar. Contrary to popular belief, while Smalltalk has a relatively simple syntax, but its not as simple as Lisp or Forth.

Here is the finished PEG parser for Smalltalk. Most of the file is written in the PEG EBNF DSL (acronym soup!). There are bits of Factor here and there to construct ASTs.

One place I had to improvise was the syntax to define classes and methods. In traditional Smalltalk 80, there's no real concrete syntax for these definitions at all, since they're entered as distinct entities in the graphical browser. I invented a silly syntax which doesn't fit very well with the rest of the grammar; I might change it later.

Once I had the parser and compiler going, I threw together a simple real/eval/print loop and did more testing. Basic features work and there's a small set of built-in Smalltalk selectors mapped onto Factor words. What remains to be done? Quite a lot:

Class methods and metaclasses.
Adding methods to existing Smalltalk classes.
Implementing super.
More built-in selectors need to be provided.

All of the first three are easy to map to Factor features, and the last one is just busy-work.

The one language feature that doesn't translate so well is Smalltalk's non-local return. I convert usages of this feature into continuations, which are slow relative to how non-local returns are implemented in production Smalltalk VMs. Usages of continuations which are downward-only need to be optimized. A speedup here would make Factor's exception handling faster too, so it would be very valuable overall.

Because Smalltalk selectors are Factor generic words, and Smalltalk classes are Factor classes, Factor's optimizer can work its magic on Smalltalk code, which is nice. I have plans to speed up Factor's downward continuations and implement inline caching for Factor generic words, and this will directly benefit the Smalltalk implementation too.

Since this implementation is just a demo, I consider it mostly done, but I'll keep hacking on it from time to time. I don't intend to turn it into a serious Smalltalk implementation, but it would be great if someone else took over and did so.

It weighs in at only 1217 lines of code, and that includes 423 lines of unit tests. I'm not sure exactly how much time I spent on it, but I'm guessing around 24 hours in total over the last week, and the bulk of that was tweaking the parser to parse various odd syntax. In particular, the AST to Factor compiler was very easy to write.

So the moral of the story is that implementation-wise, Factor is a modern VM which has a lot of the features that scripting language people wish they had, and plan on implementing sometime next year. It is also very easy to target languages to Factor by parsing them with our PEG library and compiling the AST down to Factor quotations which are handed off to the optimizing compiler. If you want to experiment with language design but don't want to spend five years implementing a language like we did, you should consider using an existing system to host your language on, and Factor is one of several great choices in this regard.

Factor: a practical stack language