Thursday, January 31, 2008

Some things I've been working on

I've been busy lately. Here are my current projects:
  • I implemented bit-vectors, byte-vectors and float-vectors, the resizable analogues of bit-arrays, byte-arrays and float-arrays. This nicely rounds out Factor's sequences library: previously the only resizable types were strings and vectors; now every built-in array-like type has a resizable analogue. The literal syntax is simple:

    ?V{ t f t } ! bit vector
    BV{ 1 2 3 } ! byte vector
    FV{ 1.0 2.0 3.0 } ! float vector

    This is of course derived from V{ } for vectors, and ?{ } B{ } F{ } for bit, byte and float arrays, respectively.

    If you want to learn more, just look at the online help.
  • File system change notification: I have this working on Windows. When I implement it on Linux and Mac OS X I will blog about the API. This is really neat stuff; it integrates with the I/O multiplexer loop so that waiting for notifications doesn't block all of Factor.
  • New representation for strings. I'm just starting this, but it shouldn't take too long. Right now, Factor strings are sequences of 16-bit characters, which is neither here nor there. It uses more memory than ASCII strings, but cannot represent every Unicode code point, since Unicode 5.0 requires 21 bits now. Totally brain-dead. The new representation is quite clever, and comes from Larceny Scheme. The idea is that strings are ASCII strings, but have an extra slot pointing to an 'auxiliary vector'. If no auxiliary vector is set, the nth character of the string is just the nth byte. If an auxiliary vector is set, then the nth character has the nth byte as the least significant 8 bits, and the most significant 13 bits come from the nth double-byte in the auxiliary vector. Storing a non-ASCII character into the string creates an auxiliary vector if necessary. This reduces space usage for ASCII strings, it can represent every Unicode code point, and for strings with high code points in them, it still uses less space than the other alternative, UTF-32.
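
    The auxiliary-vector scheme is easy to prototype. Here is a sketch in Python (not Factor's actual implementation; all names are invented for illustration) of a string that stores the low 8 bits of each character in a byte array and lazily allocates a vector for the high bits:

    ```python
    class LString:
        """ASCII string with a lazily-allocated auxiliary vector, a la Larceny."""

        def __init__(self, text):
            self.bytes = bytearray(len(text))  # low 8 bits of each character
            self.aux = None                    # high bits, allocated on demand
            for i, ch in enumerate(text):
                self.set_char(i, ch)

        def set_char(self, i, ch):
            code = ord(ch)
            self.bytes[i] = code & 0xFF
            high = code >> 8
            if high and self.aux is None:
                # First non-ASCII store: allocate the auxiliary vector.
                self.aux = [0] * len(self.bytes)
            if self.aux is not None:
                self.aux[i] = high

        def char_at(self, i):
            if self.aux is None:  # pure ASCII: nth character is the nth byte
                return chr(self.bytes[i])
            return chr(self.bytes[i] | (self.aux[i] << 8))
    ```

    A pure-ASCII string never allocates the auxiliary vector, so it occupies one byte per character; storing a single non-ASCII character upgrades it transparently.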

Generic resource disposal

One approach is the Java approach. It is pretty much the worst:
Resource r = acquire(...);

try {
    doSomething(r);
} finally {
    r.dispose();
}

This is bad because it leads to code duplication, and forgetting to pair acquisition with disposal, or forgetting to do it with try/finally, can result in resource leaks which are hard to track down.

Idiomatic C++ combines resource acquisition with memory management, so you have something like:
void foo(...) {
    Resource r;

    doSomething(&r);

    /* r's dtor called here */
}

This is somewhat cleaner than the Java approach, but it depends on value types and deterministic memory management, which are not traits that most high-level languages share.

Languages with first-class functions such as Lisp, Haskell and Factor take another approach, which is my personal favorite; you pass a block of code to a higher order function, which encapsulates the try/finally boilerplate for you:
[ do-something ] with-stream

This approach works very well; it composes naturally for acquiring multiple resources at once, and there is no boilerplate on the caller side.

However, with our growing library we're also growing the set of disposable resources. Right now, we have:
  • Streams
  • Memory mapped files
  • File system change monitors
  • Database connections
  • Database queries

With more on the way. This creates a problem because each with-foo word does essentially the same thing:
: with-foo ( foo quot -- )
    over [ close-foo ] curry [ ] cleanup ; inline

There were some variations on the theme, but it was all essentially the same.

Instead of having different words to close each type of resource, it makes sense to have a single word which does it. So now the continuations vocabulary has two new words, dispose and with-disposal. The dispose word is generic, and it replaces stream-close, close-mapped-file, close-monitor, cleanup-statement (databases), etc. The with-disposal word is the cleanup combinator.

The with-stream word is still there; it binds stdio in addition to doing the cleanup. However, it is simpler now because it uses with-disposal.

Simple and sweet, and now all the boilerplate is gone, on the side of the caller as well as the implementation side.
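
The pattern translates readily to other languages. Here is a sketch in Python (invented names; an illustration of the design, not Factor's code) of a single generic dispose operation plus one combinator that owns the try/finally boilerplate:

```python
from functools import singledispatch

@singledispatch
def dispose(resource):
    """Generic cleanup; each resource type registers its own method."""
    raise TypeError("no dispose method for " + type(resource).__name__)

def with_disposal(resource, quot):
    """Run quot on resource, guaranteeing dispose runs afterwards."""
    try:
        return quot(resource)
    finally:
        dispose(resource)

class FileStream:
    def __init__(self):
        self.closed = False

@dispose.register
def _(stream: FileStream):
    stream.closed = True
```

New resource types only register a dispose method; they get with_disposal (and anything else built on dispose) for free, which is exactly the boilerplate reduction described above.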

Friday, January 25, 2008

Improvements to io.launcher

Factor's integration with the host operating system just keeps on getting better. I already blogged about Factor's awesome process launcher library and today it got even better.

First of all, run-process and run-detached return an instance of process. These objects can be passed to wait-for-process which waits for the process to exit and returns the exit code.

Unlike before, waiting for a process to exit does not block the entire Factor VM, instead process exit notifications are hooked into the I/O event queue provided by our Windows and Unix "native I/O" support.

Second, while we already have flexible input/output redirection in the form of <process-stream>, redirecting input or output to a file was a bit bothersome; you had to start a thread which would read from the file and write to the pipe, or vice versa. Now, there's a new feature for redirecting input and output directly to files:
H{
    { +command+ "ls /etc" }
    { +stdout+ "listing.txt" }
} run-process

Similarly, you can provide +stderr+ or +stdin+ keys. The value +closed+ may also be given.

Of course, if you have a program which needs to munge the output of a process in some way, or send it commands generated on the fly, you can still use a pipe, but for the common use-case of redirecting to files, having direct support is not only easier on the programmer but more efficient.

Finally, you can mix the above redirection features with <process-stream>. For example,
H{
    { +command+ "sort" }
    { +stdin+ "unsorted-data.txt" }
} <process-stream> lines

This starts the Unix "sort" command; it will take standard input from unsorted-data.txt, and standard output will be sent to your Factor process via a pipe. The lines word reads lines of text from the pipe until EOF.
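
For readers more familiar with other languages, the descriptor above corresponds roughly to the following Python sketch (same file names as the example; the helper function is invented), where standard input comes from a file and standard output is read through a pipe:

```python
import subprocess

def sorted_lines(path):
    # stdin redirected from a file, stdout captured via a pipe,
    # like +stdin+ plus <process-stream> above.
    with open(path, "rb") as f:
        proc = subprocess.run(["sort"], stdin=f, stdout=subprocess.PIPE)
    return proc.stdout.decode().splitlines()
```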

As usual, both new features work and have been tested on both Unix and Windows. To me, it is important that Factor is not a second-class citizen on Windows.

Cool stuff!

Wednesday, January 23, 2008

What's up with non-blocking I/O APIs?

On Linux, epoll() works with ttys, pipes and sockets, but not files.

On Mac OS X, kqueue() works with sockets but not ttys.

On Windows, GetQueuedCompletionStatus() works with everything except for the console.

I think it is sad that non-blocking I/O APIs are typically added as afterthoughts. Instead, operating system I/O APIs should be designed around the notion of asynchronous events; blocking streams can be done in user-space with coroutines or continuations.

However what's done is done and we have to support today's OSes. On Mac OS X, the main reason I'm looking at kqueue() is to support non-blocking process exit notification as well as file system change notification. So I can add the kqueue file descriptor to the select() set and use select() for everything else.

On Linux, I'm not sure there's any point using epoll(). Perhaps I can throw sockets there and use select() for everything else, but this may just end up being more complexity than is necessary.

Anyway, support for the more advanced non-blocking I/O APIs available on Mac OS X and Linux was one of my big to-do list items for Factor 1.0. It took less effort to implement than I thought, but the payoff doesn't seem so great either. At least we can now implement file change notification on OS X.

Saturday, January 19, 2008

Factor talk at Ruby.mn

While I had to cancel my Factor presentation at CUSEC due to lack of time, I will be giving a talk about Factor at ruby.mn, the "Ruby users of Minnesota" user group. Recently they've had a number of talks about non-Ruby languages.

The talk is at 7pm, Monday 28th January, 2626 Hennepin Avenue South, Minneapolis, MN 55408. Come along if you're in the area.

Monday, January 14, 2008

Defended my thesis

My thesis defense was today. It went well and my thesis was accepted, which means I now officially have a Master's degree in Mathematics. You can read my thesis online, it is titled On the cohomology of nilpotent Lie algebras.

Wednesday, January 09, 2008

Multi-methods in Factor

Factor 0.92 now has an experimental implementation of multi-methods in extra/multi-methods. I want to emphasize that for the time being, the generic word system in the core is still there, and the new multi-methods are potentially buggy, slow, and not integrated with the rest of the system very well. This will all change in 0.93, when they will be moved into the core and will replace existing single-dispatch generic words.

You load it just like any other module:
USE: multi-methods

The multi-methods vocabulary provides several new words, some of which have the same names as existing words in the syntax vocabulary. (If you wish to use the built-in generic words and the new multiple dispatch generics in the same source file, you will either need to use qualified names or play games with USE:.)

To define a new multiple dispatch generic word, we use GENERIC::
GENERIC: beats?

Now we define some data types:
MIXIN: thing

TUPLE: paper ; INSTANCE: paper thing
TUPLE: scissors ; INSTANCE: scissors thing
TUPLE: rock ; INSTANCE: rock thing

Now we can define some methods:
METHOD: beats? { paper scissors } t ;
METHOD: beats? { scissors rock } t ;
METHOD: beats? { rock paper } t ;
METHOD: beats? { thing thing } f ;

Note the METHOD: syntax. Unlike M:, the generic word comes first, then class specializers are listed in an array. The old syntax
M: foo bar ... ;

corresponds to
METHOD: bar { foo } ... ;

Now that we have the rules for our little game defined, let's write a utility word:
: play ( obj1 obj2 -- ? ) beats? 2nip ;

We drop the two objects from the stack, since the method bodies leave them there.

Now we can play:
T{ paper } T{ rock } play .
f
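
Under the hood, a multiple dispatch implementation must pick the most specific applicable method. Here is a toy sketch in Python (invented names and a deliberately naive algorithm, not Factor's) of the rock-paper-scissors example, resolving methods by how far up the class hierarchy each specializer sits:

```python
class Thing: pass
class Paper(Thing): pass
class Scissors(Thing): pass
class Rock(Thing): pass

METHODS = {}  # (class, class) -> method body

def method(*classes):
    def add(fn):
        METHODS[classes] = fn
        return fn
    return add

def beats(a, b):
    # Pick the applicable method whose specializers are closest to the
    # arguments' actual classes (fewest superclass hops wins).
    best, best_cost = None, None
    for (ca, cb), fn in METHODS.items():
        if isinstance(a, ca) and isinstance(b, cb):
            cost = type(a).__mro__.index(ca) + type(b).__mro__.index(cb)
            if best is None or cost < best_cost:
                best, best_cost = fn, cost
    if best is None:
        raise TypeError("no applicable method")
    return best(a, b)

@method(Paper, Scissors)
def _(a, b): return True

@method(Scissors, Rock)
def _(a, b): return True

@method(Rock, Paper)
def _(a, b): return True

@method(Thing, Thing)
def _(a, b): return False
```

The { thing thing } catch-all only fires when no more specific pair matches, mirroring the METHOD: definitions above.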

The GENERIC# word is no longer necessary. Previously, if you wanted a generic dispatching on the second stack element, you'd have something like
GENERIC# foo 1 ( obj str -- )

M: sequence foo ... ;

M: assoc foo ... ;

Now this falls out naturally as a special case of multi-method dispatch:
GENERIC: foo ( obj str -- )

METHOD: foo { sequence object } ... ;

METHOD: foo { assoc object } ... ;

Hooks (generic dispatching on a variable value) are still there, and are more general because they can now dispatch on the stack as well.

I'm looking forward to replacing the following hand-coded double dispatch:
HOOK: (client) io-backend ( addrspec -- stream )

GENERIC: <client> ( addrspec -- stream )

M: array <client> [ (client) ] attempt-all ;

M: object <client> (client) ;

M: unix-io (client) ( addrspec -- stream ) ... Unix-specific code ... ;

M: windows-io (client) ( addrspec -- stream ) ... Windows-specific code ... ;


With this:
HOOK: <client> io-backend ( addrspec -- stream )

METHOD: <client> { array object } [ ] attempt-all ;

METHOD: <client> { object unix-io } ... Unix-specific code ... ;

METHOD: <client> { object windows-io } ... Windows-specific code ... ;

Another place where multiple dispatch will be very appropriate in the core is equal?. The following
M: array equal?
    over array? [ sequence= ] [ 2drop f ] if ;

would become
METHOD: equal? { array array } sequence= ;

Also, the compiler has hand-coded multiple dispatch that I'd like to replace with real multi-methods in a few places.

Finally, in a few places I do type checking on input arguments, to catch errors early, before ill-typed objects are placed in global data structures; for example,
TUPLE: check-create name vocab ;

: check-create ( name vocab -- name vocab )
    2dup [ string? ] both? [
        \ check-create construct-boa throw
    ] unless ;

: create ( name vocab -- word )
    check-create 2dup lookup
    dup [ 2nip ] [ drop dup reveal ] if ;

This could be expressed as
GENERIC: create ( name vocab -- word )

METHOD: create { string string }
    2dup lookup
    dup [ 2nip ] [ drop dup reveal ] if ;

Once I do some more reading and figure out how to implement multi-methods efficiently in Factor, they can go in the core. I'm going to release 0.92 first, though. I'm really looking forward to this; having full multi-method dispatch would be a real milestone for Factor.

Mailing list archive problems, again

SourceForge is so incredibly lame, has never turned a profit, and is staffed by smelly PHP-smoking GPL hippies who have never been laid. Their mailing list archive is broken, and just as it has many times in the past, it's starting to lag by a few weeks (what happened to 2008, guys?):
http://sourceforge.net/mailarchive/forum.php?forum_name=factor-talk
I have updated all links on the Factor web site to point to gmane, which actually works properly and contains all recent posts:
http://dir.gmane.org/gmane.comp.lang.factor.general
Very soon now I will move the Factor-talk list to my linode box, and Factor will finally be free of the crappiness that is SourceForge.

I've had so many issues with SourceForge in the past that it's not funny.

You'd think writing a mailing list archive is easy, but if you work at SourceForge, something as simple as counting downloads is hard. Nothing would make me happier than seeing SourceForge go out of business, or get acquired and buried, along with all 166,397 craptastic Loonix open sores GPL projects hosted there. In fact, if I ever make 100 million dollars the first thing I will do is acquire VA Linux and shut down SourceForge, and burn all their disks and backups.

Tuesday, January 08, 2008

Compiler overhaul

I'm back from vacation, and I've overhauled how the Factor compiler recompiles changed definitions. The code is in my git repo but it is not entirely stable. Proceed with caution.

This post is extremely technical and touches on Factor internals. You certainly don't have to understand Factor's low-level workings to use it. But, if you understand the implementation, you will have a better idea of how to write great code -- just like with any other language.

First, some background and history.

The very first iterations of Factor were purely interpreted. The interpreter was a very simple loop, it would traverse quotations, pushing literals and executing words. Words would be executed by recursively invoking the interpreter on a word's "definition" slot, which itself was a quotation.

This is a "late binding" interpreter: words could be redefined, which updated their definition slot, and subsequent invocations of that word immediately picked up the new definition.

The following code was valid in early Factor:
: foo 3 ;

: bar [ 2 ] \ foo set-word-def foo ;

The bar word would return 2 on the stack, because after it redefined foo, the new definition took effect immediately.

Pretty early on, Factor got a native (x86 only, at first) compiler. I decided that having all word calls be indirect in compiled code would be too inefficient, and would negate many of the benefits of having a compiler, so I went with an early binding design.

When a word was compiled, all words that it called were compiled first; and a call to a word was compiled as a direct jump to the address of that code's definition in the code heap.

This hindered interactive development; reloading a source file would replace definitions, but they'd only take effect for subsequently compiled words, not for already-compiled words that used the old definitions. Haskell's "ghci" works this way, and they consider it a feature; I consider it a bug.

I quickly realized that there was an easy way to fix this issue, at least temporarily. I added cross-referencing support (thus giving us the usage word that is so useful when refactoring code). When a word was redefined, all words that called that word -- either directly or indirectly -- had their compiled definitions discarded, and reverted back to interpreter operation.

This meant that the following still didn't work as expected; it would return 3:
: foo 3 ;

: bar [ 2 ] \ foo set-word-def foo ;

This is because while set-word-def would discard the compiled definition of bar, after returning, the activation frame on the call stack still pointed inside the old compiled definition of bar, and this old compiled definition of bar jumped to the old compiled definition of foo, which pushed 3 on the stack. However, subsequent invocations of bar would call the interpreted definition, which was up to date.

This approach was better, but it had an obvious flaw. While all words were compiled on bootstrap, if you did a lot of heavy development, especially on core code, more and more words would revert to interpreted mode over time. So sometimes, you'd have to manually call compile-all to recompile everything.

When generic words came along, there was a new complication. Suppose you loaded a source file which added methods on sequence generic words such as nth, length, or other generic words such as equal? which are frequently called. To ensure correct behavior, and that the new definitions were used, Factor would discard the compiled definitions of all words which called nth (or whatever), all words which called words which called nth, and so on. Since the majority of the words that one would write can trace a call path down to sequence words, this meant that the majority of words in the system would magically lose their compiled definitions and subsequently run in the interpreter. So at one point, a frequently reported "problem" was that loading some libraries, such as "contrib/math" at the time, would slow down Factor tremendously. The "solution" we gave newbies was to call compile-all after loading libraries; compiling all words, of course, takes a while.

Some time around Factor 0.84, I came up with an improved solution. When a word was redefined, the compiler would find all words which depended on that word, but instead of discarding their compiled definition, it would just flag them for recompilation. At a later time, the recompile word would recompile all words flagged for recompilation. The parser and module system were designed in a clever way so that recompile was called after loading a source file or module.

This approach worked well for a long time, but it still had two issues: first, long compile pauses, and second, it was possible to observe out-of-date definitions, just like the original compiler that didn't care about preserving the illusion of late binding, and unlike the subsequent design which decompiled all dependents.

The first problem, long compile pauses, was an issue when loading new sequence implementations, new assoc implementations, classes which defined new equality and hashing methods, and so on. The majority of the words in the system would have to be recompiled, which is not something you want to subject the developer to, since the compiler is designed to trade off compile time for quality of generated code.

The second issue is more subtle, and is a consequence of the first. When loading a module containing multiple source files, the compiler did not recompile changed words after each file, since this might end up recompiling many words multiple times. Instead, it postponed recompilation until after the module was done loading.

However, this would lead to problems if you had a scenario such as the following: a module that consists of two source files, A and B. A defines a new sequence class, and B has a top-level form which creates an instance of this class and calls, say, append on it. Since append (which depends on nth) was not recompiled until after B was loaded, the top-level form would fail with a no-method error, suggesting that nth did not have a method for your new sequence class when it clearly did. This was counter-intuitive.

In Factor 0.91, I removed the interpreter, and replaced it with a non-optimizing compiler. The non-optimizing compiler generated code which looked like an unrolled interpreter loop; it still called words through an indirect jump. I started referring to "the compiler" (the expensive one, written in Factor) as the "optimizing compiler". But for all intents and purposes, the recompilation system worked the same way as before.

So, that ends the history lesson. And now, you can forget about all of that. If you look at all these years of hacks and unsatisfactory workarounds, there was a constant underlying theme: if a word was redefined and recompiled, existing jumps to the word's old compiled definition were not updated. This was the fundamental underlying limitation.

Well, now, this limitation is gone. Instead of hacking and working around it, I decided to solve this issue properly, once and for all. It took longer than expected, and the overhaul touched many different parts of Factor, but the incompatible changes are minimal (though important).

Number one: non-optimizing compiler is now early binding


First up, the non-optimizing compiler now compiles direct jumps to words. This improves performance by 33-50%. Since compiled word jumps can now be updated on the fly, there is no longer any reason for the non-optimizing compiler to generate indirect jumps.

Number two: word definition is no longer ad-hoc; formalized compilation units


Generally, you can no longer just redefine words on a whim. While it was unusual for typical Factor code to define new words at run time, we frequently write parsing words which define new words, using many of the same idioms as Forth programming.

If you try doing the following in the listener, you get an error:
( scratchpad ) DEFER: my-word
( scratchpad ) \ my-word [ "Hello world" print ] define
Defining a word outside of a compilation unit
no-compilation-unit-word my-word

However, the following works of course:
( scratchpad ) : my-word "Hello world" print ;

And anyone who has perused the source of : knows that it just reads ahead from the parse stream until ;, and calls define. So what's going on here? The parser wraps the parsing of your source file in a compilation unit. A compilation unit is basically a dynamic scope where you can define new words, replace existing word definitions, etc. The kicker is, the new definitions don't take effect until the compilation unit returns. When it returns, all new and changed words are recompiled, and references to the existing changed words in the code heap are updated in one "atomic transaction".

So, ordinary code, 99% of which does not define new words at run time, does not have to worry about compilation units at all. Neither do parsing words; the parser handles it all for you. However, if you decide to define new words at run time -- occasionally, we have code that does this -- you need to wrap your definitions inside a with-compilation-unit. Try to put as much as possible into one compilation unit, since compilation units are relatively expensive; e.g., if you need to define 1000 words in a loop, put the loop inside a compilation unit; don't create a compilation unit for each word.

The documentation for with-compilation-unit gives details about how it operates.
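
The staging behavior is easy to model. Here is a toy sketch in Python (invented names; it ignores recompilation and code heap patching entirely) of a dictionary where definitions made inside a unit are invisible until the unit commits them atomically:

```python
class Dictionary:
    def __init__(self):
        self.words = {}     # installed definitions
        self.staged = None  # pending definitions; None outside a unit

    def begin_unit(self):
        self.staged = {}

    def define(self, name, body):
        if self.staged is None:
            raise RuntimeError("Defining a word outside of a compilation unit")
        self.staged[name] = body  # not visible until the unit ends

    def end_unit(self):
        # "Atomic" commit: every new definition takes effect at once.
        self.words.update(self.staged)
        self.staged = None
```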

Number three: words defined in a compilation unit cannot be called until after the compilation unit


Again, this is something ordinary code does not ever need to deal with. However, if you write parsing words, it is something to keep in mind.

Suppose you have this source file:
: hello "Hello world" print ;

hello 2 2 + .

This is a perfectly fine source file. The file is parsed -- inside a new compilation unit -- which creates a word definition hello, and a top-level quotation [ hello 2 2 + . ]. The parser ends the compilation unit, which compiles the word hello, and updates any references to it from the code heap (if there are any). The parser then calls the top level form, which outputs
Hello world
4

This works fine, because the compilation unit ends -- and the new word is compiled -- before the top level form is run.

However, consider the following example:
: hello scan reverse string>number parsed ; parsing

hello 123 .

In Factor 0.91, this would print 321. However, as of Factor 0.92, this is no longer a valid Factor program. Because the compilation unit does not end until after the file is parsed, the hello word has not been compiled yet at the time the parser sees it; and so we get an error:
A parsing word cannot be used in the same file it is defined in.
staging-violation-word POSTPONE: hello

I don't think this is a major limitation. There aren't many libraries where it even comes up. There are two ways to redesign code to get around it:
  • Put the parsing words in a separate vocabulary/module.
  • If you define a parsing word that is only called once, replace the parsing word with a parse-time evaluation. This idiom is used with alien bindings:
    : load-foo-library
        "foo" "libfoo.so" "cdecl" add-library ; parsing

    load-foo-library

    LIBRARY: foo
    FUNCTION: int blah ( float x ) ;
    ...

    The add-library call has to happen at parse time, before the compilation unit ends and compiles bindings, because compiling bindings looks up symbols in the library. However, now in Factor 0.92 the above code doesn't work because load-foo-library is being called before it is compiled. The workaround is to use parse-time evaluation, which results in clearer code because we're no longer defining a parsing word to only call once:
    << "foo" "libfoo.so" "cdecl" add-library >>

    LIBRARY: foo
    FUNCTION: int blah ( float x ) ;
    ...

    Forth fans might notice that Factor's << and >> is almost exactly like Forth's [ and ].

Number four: all words are now compound words


This is really just an implementation issue that most code won't come in contact with. The main change is that define-compound is now called define, and the old low-level define is gone.

Of course, we still have SYMBOL:, GENERIC:, and so on, and see correctly shows defining words. So what exactly changed? How do all these word types map to compound words?

Symbols are easy. The following two are behaviorally equivalent (and have been for all of eternity):
SYMBOL: foo
: foo \ foo ;

It is just that with the old late-binding interpreter and non-optimizing compilers, the former was implemented in a more efficient way in the Factor VM. With the early binding non-optimizing compiler, there is no longer any difference, the VM support is gone, and SYMBOL: just defines a compound word which pushes itself.

To detect whether a word is a symbol, we use a predicate class:
PREDICATE: word symbol
    dup 1array swap word-def sequence= ;

Deferred words throw an error. The following are equivalent now:
DEFER: foo
: foo undefined-error ;

Primitives are a bit trickier. If you see a primitive, it looks just like it used to:
( scratchpad ) \ dup see
IN: kernel
PRIMITIVE: dup ( x -- x x )

But if you ask for the underlying definition with word-def,
( scratchpad ) \ dup word-def .
[ 70 do-primitive ]

If we look at do-primitive,
( scratchpad ) \ do-primitive see
USING: kernel ;
IN: kernel.private
: do-primitive ( number -- ) "Improper primitive call" throw ;

It just throws an error! So what's up? Well, do-primitive is not a word you're ever supposed to call. The non-optimizing compiler special cases it; it must be in tail position, and it must be preceded by a literal fixnum. In this case, instead of compiling a call to do-primitive, it compiles a direct jump to that primitive.

So the latter is somewhat of a hack, but having all word definitions be quotations has simplified so much code, and the only real complication is a minor special-case in the non-optimizing compiler; it already special-cases if and dispatch, and it is designed in a clean way, so the code is not any less maintainable.

So, that about sums it up. Factor now gives the illusion of late binding and fully interactive development, while both compilers actually perform early binding and do their best to generate efficient code. Unlike before, this illusion is almost 100% accurate, and compile times have gone down too.