Thursday, February 28, 2008

Improvements to io.files

Most programming language libraries tend to have pretty weak support for filesystem manipulation compared to what you can find in even the most basic Unix commands.

For example, consider a command like cp (I'll be talking about the GNU variant here). It has three behaviors:
  • You can give it a source file and a destination file. It copies the source to the destination, preserving all attributes if you pass the -a switch.
  • You can give it a source file and a destination directory which already exists. The file is copied into a new file having the same name but inside the destination directory.
  • You can give it a list of source files and a destination directory that must already exist. In this case all the files are copied to the destination directory.

Furthermore, if you use the -r switch you get recursive directory copying.
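The three behaviors can be sketched in Python with the standard shutil module (a hypothetical cp helper of my own, not the real GNU implementation):

```python
import os
import shutil

def cp(sources, dest):
    """Sketch of cp's behaviors: file -> file, or file(s) -> existing directory."""
    if len(sources) == 1 and not os.path.isdir(dest):
        # source file and destination file; copy2 preserves attributes (like -a)
        shutil.copy2(sources[0], dest)
    else:
        # one or more source files copied into an existing destination directory
        for src in sources:
            shutil.copy2(src, os.path.join(dest, os.path.basename(src)))
```

shutil.copytree would play the role of the -r switch for recursive copies.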

All the traditional Unix commands, like mv, rm, and so on, support a very rich range of behaviors.

My goal was to recreate this using Factor words. As such, I've beefed up Factor's io.files vocabulary considerably. Unlike the shell, where one command can be overloaded to do various things in a "do what I mean" fashion, programming languages should be somewhat more explicit, in my opinion. So what I've done is implemented the equivalent of various Unix commands where each command maps to several words.

First up is deleting files. Factor already had delete-file and delete-directory for empty directories. What was missing was recursive deletion, so I added delete-tree.

Next up is moving files. Factor had a rename-file word, but since it could also "rename" a file into a different directory, I changed its name to move-file. I added a move-file-to word which takes a destination directory:
: move-file-to ( from to -- ) over file-name path+ move-file ;

I also added a move-files-to word which takes a sequence of filenames.

For copying, I already had copy-file. I added copy-file-to and copy-files-to, together with a recursive copy-tree, copy-tree-to, and copy-trees-to.

I'm still thinking about unifying at least some of these words. For example, instead of move-file-to and move-files-to we could have a generic move-to:
GENERIC# move-to 1 ( from to -- )

M: string move-to over file-name path+ move-file ;

M: sequence move-to [ move-to ] curry each ;
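The same dispatch-on-type idea can be sketched with Python's functools.singledispatch (the function names mirror the Factor words but are otherwise my own, not a real library API):

```python
import os
import shutil
from functools import singledispatch

@singledispatch
def move_to(source, dest_dir):
    raise TypeError("unsupported source: %r" % (source,))

@move_to.register(str)
def _(source, dest_dir):
    # one file: move it into dest_dir under its own name
    shutil.move(source, os.path.join(dest_dir, os.path.basename(source)))

@move_to.register(list)
def _(sources, dest_dir):
    # a sequence of files: move each one
    for s in sources:
        move_to(s, dest_dir)
```

A glob method would simply expand the pattern to a list of paths and recurse, exactly like the third Factor method below.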

Another thing I'm going to add soon is support for globs when deleting, moving and copying files. To do this I just need to add directory globbing to the glob library, which is pretty trivial.

The glob library could add a third method to the above generic:
M: glob move-to >r file-glob r> move-to ;

Once all of this is done, it should be pretty easy for someone to write a portable file manager using the above features. And don't forget portable directory change notification! Factor's I/O capabilities are looking pretty interesting these days indeed...

Friday, February 22, 2008

Improved alarms library

Until now, Factor has had several means of scheduling recurring tasks and timeouts:
  • You could just spawn a thread and call sleep
  • Recurring UI timers, with 10ms resolution
  • One-off alarms, with 100ms resolution
  • I/O timeouts system with 5 second resolution

None of these were satisfactory. The thread approach is fine for the most part, except that there was no way to interrupt a sleeping thread. The remaining approaches had low resolution because they relied on a thread which would poll some kind of timer queue for events.

Alarms


I unified all these approaches with a new alarms vocabulary which does not rely on polling; instead, an alarm thread is woken up when alarms are added. To achieve this, I added a new word to the threads vocabulary named nap. It is like sleep except that other threads can interrupt the sleep with interrupt; the nap word returns a boolean indicating whether it was interrupted or not.

The alarm thread naps until the next alarm is set to go off; however, if a new alarm is added in the meantime which would go off sooner, the thread is interrupted. This means it no longer has to poll regularly, and it also increases accuracy.
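A minimal Python analogue of nap/interrupt, built on threading.Event (an assumption of mine about the semantics, not Factor's implementation):

```python
import threading

class Napper:
    def __init__(self):
        self._wake = threading.Event()

    def nap(self, seconds):
        """Sleep up to `seconds`; return True if interrupted, False if it slept fully."""
        interrupted = self._wake.wait(seconds)
        self._wake.clear()
        return interrupted

    def interrupt(self):
        self._wake.set()
```

An alarm thread would nap until the earliest deadline; adding an earlier alarm calls interrupt so the thread wakes up and recomputes its deadline.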

The alarm API is straightforward:
  • add-alarm ( quot time frequency -- alarm ) adds an alarm which first fires at time and then every frequency time units thereafter. The timestamp and time deltas are the timestamp and dt types defined in the calendar vocabulary; no need to mess around with integers representing milliseconds.
  • cancel-alarm ( alarm -- ) cancels a pending alarm.
  • later ( quot delay -- alarm ) runs a quotation once, a fixed delay from now. Code which calls it looks almost like English:
    [ "Hello world" print ] 5 minutes later

Alarms are stored in a min-heap for efficiency, and the cancel-alarm word uses a new feature of heaps, namely the ability to remove an entry in O(log n) time.
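The queue itself can be sketched with Python's heapq. Note one difference: heapq has no O(log n) removal, so this sketch cancels lazily by marking entries dead, unlike Factor's heaps:

```python
import heapq
import itertools

class AlarmQueue:
    def __init__(self):
        self._heap = []
        self._ids = itertools.count()   # tie-breaker so quotations never compare
        self._cancelled = set()

    def add_alarm(self, quot, when):
        entry = (when, next(self._ids), quot)
        heapq.heappush(self._heap, entry)
        return entry

    def cancel_alarm(self, entry):
        # lazy cancellation: the entry is skipped when it reaches the heap top
        self._cancelled.add(entry[1])

    def pop_due(self, now):
        """Return the quotations whose fire time has arrived."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            when, ident, quot = heapq.heappop(self._heap)
            if ident not in self._cancelled:
                due.append(quot)
        return due
```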

Strongly-typed timeouts


Until now, the set-timeout word for streams, as well as timeouts for the process launcher, took integers denoting time counts in milliseconds. That is so 1970's. Instead, the new style is strongly-typed time units from the calendar library:
1 minutes stdio get set-timeout

The sleep word is generic, and if calendar is loaded it has methods which accept time units as well:
5 minutes sleep
30 seconds sleep
10 milliseconds sleep

Timeouts for locks, semaphores, promises, futures, message sends


The concurrency.messaging library for Erlang-style message passing finally supports timeouts (strongly-typed, as above). The other concurrency abstractions do as well. All of this is implemented on top of the same underlying framework which in turn sits on top of alarms, which in turn use nap.

Wednesday, February 20, 2008

Factor crash with gcc 4.3.0

If you are running the following version of gcc on Linux/x86, Factor might crash at the beginning of stage2 bootstrap:
gcc (Debian 4.3-20080202-1) 4.3.0 20080202 (experimental)

This seems to be a gcc bug. There is a workaround, however; it was discovered by jamesvnc of #concatenative. Just pass the following argument to make:
make SITE_CFLAGS=-fno-forward-propagate

Tuesday, February 19, 2008

Some changes to threads

For the last two days I've been working on Factor's threading system. First the bad news: it is still a co-operative threading system that doesn't use multiple cores. However the changes were pretty extensive.

I noticed a memory leak in the UI: sometimes opening and closing lots of windows didn't release memory. I tracked it down to a "feature" of Factor's threads: when in-thread is called, the new thread starts with a copy of the current thread's data stack and name stack. This is wrong; consider the following code:
: do-something ( x -- )
[ ... ] in-thread process-x ;

: another-word ( a x -- b )
do-something blah ;

Here, do-something is called while a happens to be on the stack, and the new thread starts with this value on the stack too. However, do-something's stack effect does not include a; it doesn't care about a, and if a is large and the thread outlives the dynamic extent of the call to another-word, then we've potentially leaked memory.

Another limitation of Factor's threads I've been annoyed with in the past is that there's no way to get a list of running threads: threads were just continuations, and threads waiting on some condition were just some random continuation stashed away somewhere.

A final issue was that Chris Double's message-passing concurrency library needed to associate a mailbox with every thread, and he was doing this in an ad-hoc way, storing the mailbox in a dynamically-scoped variable when spawning the thread. The problem with this approach is that dynamic scope is not exactly thread-local scope: if a continuation is reified in one thread and resumed in another, it inherits all dynamic bindings, whereas in this case you would not want it to inherit the mailbox.

So with these three problems in mind, I set out to make some changes to Factor's thread system.

Spawning threads


Threads are now a first-class data type, and runnable threads are registered in a global hashtable which can be printed with the threads. word.

To spawn a thread, it is now recommended that you use the following word:
: spawn ( quot name -- thread )

It now takes a name for debugging purposes, and outputs the thread on the stack. The new thread starts with an empty data stack and a name stack containing the global namespace only; if you want to pass data to the thread, you must arrange for this by partially applying the data to the quotation using curry or compose.

The in-thread word is still there and has the old behavior: the new thread gets a copy of the data and name stacks, and nothing remains on the stack. It is useful for quick tests and such, but new code should use spawn instead.

Dynamic, lexical and thread-local variables


As I've mentioned, new threads no longer inherit the name stack when started with spawn (to get the old behavior, use in-thread). That is, the following will print 49:
SYMBOL: x

49 x set-global
63 x [
[ x get . ] "Test" spawn drop
] with-variable

This behavior is in line with a number of Common Lisp implementations that decided not to have threads inherit dynamic bindings, for similar reasons (potentially hard-to-find memory leaks) as well as issues unique to Common Lisp (dynamic variables can be stack-allocated if dynamic extent optimizations are used).

Note that threads still respect lexical scope just as one would expect. Here is a word which makes a closure that closes over the parameter to the word (n) as well as a binding established by [let:
:: make-thread | n |
[let | x 49 |
[ [ n x + . ] "Test" spawn ]
] ;

We can make a closure and call it; it will print 69:
20 make-thread call

This required no extra work to implement correctly; the locals vocabulary already implements correct closure semantics for all combinators.

Thread-local variables are new. I haven't found a use for them yet, but they were trivial to implement now that threads are first-class.

The tget, tset words get and set thread-local variable values, just like get and set for dynamic variables.

Message-passing concurrency


There are essentially no changes to this other than the fact that the concurrency vocabulary is now named concurrency.messaging, and the spawn word has been moved into the core threads vocabulary.

Chris Double's channels work unchanged.

Promises and futures


These have been split out and moved into concurrency.promises and concurrency.futures, respectively.

Locks


Message-passing concurrency and channels are great but sometimes you just need something simpler. I implemented some new abstractions, mostly by taking ideas from Java's java.util.concurrent package.

First up are locks, found in concurrency.locks. These come in two forms, non-reentrant and reentrant. A combinator is used to acquire and release the lock, ensuring that lock operations are paired:
SYMBOL: my-lock

<lock> my-lock set

...

my-lock get [ ... ] with-lock

Reentrant locks can be acquired recursively by a thread already holding the lock; otherwise they are the same. They are created by <reentrant-lock>.
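In Python, the same acquire/release pairing falls out of a with statement over a threading lock, with RLock playing the role of the reentrant flavor (an analogue of my own, not Factor's API):

```python
import threading

def with_lock(lock, quot):
    """Sketch of the with-lock combinator: acquire, run, release even on error."""
    with lock:
        return quot()

reentrant = threading.RLock()   # analogous to <reentrant-lock>

# a reentrant lock may be re-acquired by the thread that already holds it
result = with_lock(reentrant, lambda: with_lock(reentrant, lambda: 42))
```

With a plain threading.Lock, the inner with_lock above would deadlock, which is exactly the non-reentrant behavior.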

Read/write locks


Read/write locks implement the pattern where you want multiple reader threads to have access to a shared resource, but only one writer thread to have access at a time, and only if no threads are reading.

Read/write locks are created with <rw-lock>, and acquired/released with with-read-lock and with-write-lock. Read/read, write/read, and write/write reentrancy is supported.

Count-down latches


A count-down latch, found in concurrency.count-downs, begins with some initial value; threads can either wait for its value to reach zero, or decrement it (never past zero). Latches are created with <count-down>, decremented with count-down, and waited on with await.

Count-down latches are used to implement a parallel-each combinator. This combinator takes a sequence and a quotation, spawns a thread for each element, and waits for them all to finish. It does this by creating a count-down latch that each thread decrements as it finishes; after spawning all threads, the caller waits for the latch to reach zero.
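Both pieces can be sketched in Python with a small condition-variable latch (the names are mine, not a real library API):

```python
import threading

class CountDown:
    def __init__(self, n):
        self._n = n
        self._cond = threading.Condition()

    def count_down(self):
        with self._cond:
            if self._n > 0:
                self._n -= 1            # never past zero
                if self._n == 0:
                    self._cond.notify_all()

    def await_zero(self):
        with self._cond:
            while self._n > 0:
                self._cond.wait()

def parallel_each(seq, quot):
    """Spawn a thread per element; wait for all of them via the latch."""
    items = list(seq)
    latch = CountDown(len(items))
    def run(x):
        try:
            quot(x)
        finally:
            latch.count_down()
    for x in items:
        threading.Thread(target=run, args=(x,)).start()
    latch.await_zero()
```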

Exchangers


Exchangers are implemented in concurrency.exchangers. An exchanger is a synchronization point between two threads: a thread comes to the exchange point with an object, and waits for another thread to do the same. Then the first thread receives the second thread's object, and vice versa.

Exchangers are created by calling <exchanger> and objects are exchanged with exchange. Their implementation is delightfully simple thanks to boxes:
TUPLE: exchanger thread object ;

: <exchanger> ( -- exchanger )
<box> <box> exchanger construct-boa ;

: exchange ( obj exchanger -- newobj )
dup exchanger-thread box-full? [
dup exchanger-object box>
>r exchanger-thread box> resume-with r>
] [
[ exchanger-object >box ] keep
[ exchanger-thread >box ] curry "exchange" suspend
] if ;
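The same handshake can be written in Python, with a condition variable standing in for the boxes and thread resumption (a sketch of the idea, not a translation of the Factor code):

```python
import threading

class Exchanger:
    def __init__(self):
        self._cond = threading.Condition()
        self._item = None
        self._waiting = False
        self._reply = None
        self._reply_ready = False

    def exchange(self, obj):
        with self._cond:
            if self._waiting:
                # second arrival: take the waiter's object, leave ours as the reply
                other = self._item
                self._waiting = False
                self._reply, self._reply_ready = obj, True
                self._cond.notify_all()
                return other
            # first arrival: deposit our object and wait for a partner
            self._item, self._waiting = obj, True
            while not self._reply_ready:
                self._cond.wait()
            self._reply_ready = False
            return self._reply
```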

Counting semaphores


Counting semaphores are in the concurrency.semaphores vocabulary. Semaphores are created by passing an initial non-zero value to <semaphore>. A thread can either decrement a semaphore with acquire or increment one with release; if the count is already zero, acquire waits for another thread to call release first.

Unlike locks, acquire/release does not need to be paired and can even be done from different threads. For paired usage, a with-semaphore combinator is provided which decrements the semaphore, runs a quotation, then increments it again.

Here is an example which uses parallel-each to spawn a series of jobs. The jobs proceed mostly in parallel, but a semaphore ensures that one particular section is not run by more than 10 threads at a time, perhaps because it spawns an external process which uses a lot of memory and we do not wish to hit swap:
: do-stuff ( value -- ) ... ;

: do-expensive-task ( value -- ) ... ;

: do-more-stuff ( value -- ) ... ;

: do-job ( value semaphore -- )
over do-stuff
over [ do-expensive-task ] curry with-semaphore
do-more-stuff ;

: do-jobs ( values -- )
10 <semaphore> [ do-job ] curry parallel-each ;

Boxes

Today I was looking over some code and noticed a repetitive pattern involving tuple slots that would come up again and again:
  • One word would check if the slot was set; if not, it would throw an error, otherwise it would return the slot's value and set the slot's value to f.
  • Another word would check if a slot was set, and if so, throw an error, otherwise store the top of the stack in the slot.

I decided to abstract this out into a new "box" data type. A box can hold a single value; storing a value into a full box, or taking a value out of an empty box is an error, and taking a value out empties the box. Now the idea is that you set a tuple slot to a new box once in the constructor, and then instead of reading and writing the tuple slot you read and write the box.

I've already found several uses for this in the core/ as well as some of my libraries in extra/. I've not seen the box type in other object-oriented languages. It seems like an obvious abstraction; maybe it is considered too simple by some, but it's nice to encapsulate this logic in one place. Factoring it out also let me make the logic more sophisticated: it can distinguish between a box holding f and an empty box, which most individual usages didn't bother to do. When Factor moves to native threading, boxes will be even more valuable since box operations will be atomic.
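A sketch of the box type in Python; the sentinel object is what lets the box distinguish "holding None" from "empty", just as Factor's box distinguishes holding f from empty:

```python
class Box:
    _EMPTY = object()   # sentinel: distinct from any stored value, including None

    def __init__(self):
        self._value = Box._EMPTY

    def occupied(self):
        return self._value is not Box._EMPTY

    def put(self, value):           # like >box
        if self.occupied():
            raise ValueError("box is already full")
        self._value = value

    def take(self):                 # like box>; taking a value empties the box
        if not self.occupied():
            raise ValueError("box is empty")
        value, self._value = self._value, Box._EMPTY
        return value
```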

Sunday, February 17, 2008

Some rudimentary control flow analysis

Factor's optimizer is now at the stage where further progress will require a re-design. I'm fine with that, since the current design has really lasted a lot longer than I expected -- the basic structure has remained the same since early 2004 when I started working on the native compiler, and in many ways it resembles the old Java Factor implementation's JVM compiler.

Over the last few days I picked off some of the remaining low-hanging fruit.

Hoisting code after conditionals where one branch throws an error


The first optimization concerns conditionals where one branch unconditionally throws an error. Consider the following code:
: foo [ A ] [ B throw ] if C ;

This is in fact equivalent to:
: foo [ A C ] [ B throw ] if ;

And this is precisely what the new optimization does.

Loop detection and loop tail hoisting


Another new optimization is something I like to call "loop detection". Consider a trivial Factor iteration expression:
: foo 100 [ bar ] times 3 baz ;

The times combinator is inlined:
: foo 100 [ bar ] [ drop ] swap compose each-integer 3 baz ;

The each-integer word is also inlined:
: foo 100 [ bar ] [ drop ] swap compose iterate-prep (each-integer) 3 baz ;

Now (each-integer) is also declared inline; however, it is recursive, so what this means in practice is that the compiler creates a new "generated symbol" containing a definition of (each-integer) which is specialized for that call site. Proceeding with our example, we get:
: G12342 ( i n quot -- )
[ iterate-step iterate-next G12342 ]
[ 3drop ] if-iterate? ; inline

: foo 100 [ bar ] [ drop ] swap compose iterate-prep G12342 ;

After some more inline expansion,
: G12342
[ swap >r 2dup >r >r call r> r> r> swap >r >r 1+ r> r> G12342 ]
[ 3drop ] >r >r 2over < r> r> if ; inline

: foo 100 [ drop bar ] 0 -rot G12342 3 baz ;

Now, the compiler sees that call is being applied to a literal quotation, so its contents can just be copied to the call site:
: G12342
[ swap >r 2dup >r >r drop drop bar r> r> r> swap >r >r 1+ r> r> G12342 ]
[ 3drop ] >r >r 2over < r> r> if ; inline

: foo 100 [ drop bar ] 0 -rot G12342 3 baz ;

Also, if is being applied to two literal quotations, so they can be hoisted up to their call site:
: G12342
[ swap >r 2dup >r >r drop drop bar r> r> r> swap >r >r 1+ r> r> G12342 ]
[ 3drop ] >r >r 2over < r> r> 2drop
[ swap >r 2dup >r >r drop drop bar r> r> r> swap >r >r 1+ r> r> G12342 ]
[ 3drop ] if ; inline

: foo 100 [ drop bar ] 0 -rot G12342 3 baz ;

Now, the quotations [ drop bar ], [ swap >r 2dup >r >r drop drop bar r> r> r> swap >r >r 1+ r> r> G12342 ] and [ 3drop ] are dead literals, so they are removed, and all stack shuffle words which they travel through are collapsed:
: G12342
2dup <
[ >r >r bar r> 1+ r> G12342 ]
[ 2drop ] if ; inline

: foo 100 0 swap G12342 3 baz ;

This is where previous Factor releases would stop performing control and data flow optimizations. The specialized loop body (G12342 in this example) would be compiled as its own word in the code heap, and foo would call this word. This entails some overhead in setting up and tearing down activation frames, and so on.

In the general case, this is all you can do: for example, consider the case of a binary-recursive combinator that calls itself in non-tail position; the combinator body needs its own activation frames. However, for tail recursive loops, we can do better. Since the loop only calls itself in tail position, it can share the activation frame of foo itself; that is, conceptually, we can compile it to the following:
: G12342
2dup <
[ >r >r bar r> 1+ r> G12342 ]
[ 2drop ] if ; inline

: foo
100 0 swap
G12342: ! a label
2dup <
[ >r >r bar r> 1+ r> goto: G12342 ] ! goto is not a real Factor word...
[ 2drop ] if
3 baz ;

After performing this type of inlining we need to distinguish the recursive call to G12342 from a normal call, since now it must be a tail call even though it is not in tail position!

Now with general loops, this is the best we can do. However in the above case, we can apply an optimization much like the first one described in this blog post; since only one branch of that conditional ever returns, we can hoist the code after that conditional into the returning branch:
: foo
100 0 swap
G12342: ! a label
2dup <
[ >r >r bar r> 1+ r> goto: G12342 ] ! goto is not a real Factor word...
[ 2drop 3 baz ] if ;

An example of a tail recursion where this optimization cannot be used:
: example-1
dup string? [ do-something ] [
dup integer? [ do-something-else ] [ do-another-thing example-1 ] if
] if ;

: example-2
example-1 xyz ;

Here, the code at xyz would have to be cloned twice in order to be hoisted up into the non-recursive branches above, and in general, branch splitting can lead to exponential code growth so I avoid this particular optimization.

Loop detection and nested loops


This optimization works on nested loops; for example, if you have
: foo [ [ drop ] each ] each ;

Then it optimizes down to:
: G:144972
pick pick <
[ >r >r 1+ r> r> G:144972 ]
[ 3drop r> r> r> swap >r >r 1+ r> r> G:144955 ]
if ;

: G:144955
pick pick < [
swap >r 2dup >r >r
nth-unsafe dup length swap 0 -rot G:144972
] [ 3drop ] if ;

: foo dup length swap 0 -rot G:144955 ;

Note the mutually-recursive loops that appear due to tail hoisting; also note that it appears as if the conditionals leave the retain stack unbalanced! While the compiler enforces retain stack balancing for user code, internally it is free to do whatever crazy stuff it wants as long as the result gives the same output as the user's code. Of course because of loop detection, both mutually-recursive loops share foo's activation frame:
: foo
dup length swap 0 -rot
G:144955: ! a label
pick pick < [
swap >r 2dup >r >r
nth-unsafe dup length swap 0 -rot
G:144972: ! a label
pick pick <
[ >r >r 1+ r> r> goto: G:144972 ]
[ 3drop r> r> r> swap >r >r 1+ r> r> goto: G:144955 ]
if
] [ 3drop ] if ;

Future directions


Now loop detection works quite well and the implementation is elegant however it is really quite rudimentary. I'd like to extend this to more general control flow analysis, and for this I will need to extend the compiler's intermediate representation into something more sophisticated.

For example, consider the following code:
dup bar? [ foo? ] [ drop f ] if [ A ] [ B ] if

Right now it is compiled down to something like this:
dup bar?
if-true-goto: 1
drop f
goto: 2
1: foo?
2: if-true-goto: 3
B
goto: 4
3: A

However, if the second branch of the first conditional is taken, then clearly we don't have to test the top of the stack again, because we already know it is f. So I'd like to compile this code as follows:
dup bar?
if-true-goto: 1
drop
goto: 5
1: foo?
2: if-true-goto: 3
5: B
goto: 4
3: A
4: ...

Another thing I need to do very soon is register allocation across basic blocks. Right now the compiler uses registers to store values inside a basic block (a run of code without any subroutine calls; these can get rather long because of inlining, open-coded intrinsics, and optimizations such as the two I implemented and described here). However, it does not attempt to allocate registers across branch merge points, and in particular across loop iterations. That's pretty lame.

To be able to do all of this, I need an IR which represents control flow in a more natural way, allowing more sophisticated optimizations to be expressed concisely and efficiently. This will be my next step in advancing Factor's compiler.

Thursday, February 14, 2008

Invoking the gdb disassembler to disassemble words

Until now, I've had to go through a rather laborious process to look at the machine code generated by the Factor compiler. I'd ask Factor for the address of a word with word-xt, then attach a gdb instance to Factor, then run disassemble and look at the output. This wastes time, especially when I'm debugging major changes (like I am now). So I've cooked up a quick hack to avoid having to do this in the future.

The new code is in tools.disassembler. Note that for now, it only works on Unix. If I figure out how to get the current cygwin process ID from Factor (Factor doesn't link against cygwin.dll) then I can make it work with cygwin gdb too.

The source begins with the usual boilerplate:
USING: io.files io words alien kernel math.parser alien.syntax
io.launcher system assocs arrays sequences namespaces qualified
regexp ;
QUALIFIED: unix
IN: tools.disassembler

We qualify the unix vocab since it has a write word which clashes with io:write, and we want to call the latter.

We communicate with gdb using files:
: in-file "gdb-in.txt" resource-path ;

: out-file "gdb-out.txt" resource-path ;

We cannot use pipes since there is a race condition there; gdb suspends the process while disassembling, so if the pipe fills up, then gdb hangs because Factor cannot read from the pipe since it is suspended.

We have a word which takes a pair of addresses or a word, creates a gdb command for disassembling this object in the current process, and writes these commands to the input file:
GENERIC: make-disassemble-cmd ( obj -- )

M: word make-disassemble-cmd
word-xt 2array make-disassemble-cmd ;

M: pair make-disassemble-cmd
in-file [
"attach " write
unix:getpid number>string print

"disassemble " write
[ number>string write bl ] each
] with-file-out ;

Then we write a word to invoke gdb:
: run-gdb ( -- lines )
[
+closed+ +stdin+ set
out-file +stdout+ set
[ "gdb" , "-x" , in-file , "-batch" , ] { } make +arguments+ set
] { } make-assoc run-process drop
out-file file-lines ;

We pass gdb the path name to the file we just saved, together with some switches.

Note that we close stdin so that if gdb attempts to read commands, it gets an EOF instead of hanging. We also redirect the output to our output file. Then we read the output file and return the results.

Next, a couple of words to clean up the output: we filter out everything that's not a line of disassembly (gdb loading messages, etc.), and we convert tabs to spaces since the Factor UI doesn't display tabs:
: relevant? ( line -- ? )
R/ 0x.*:.*/ matches? ;

: tabs>spaces ( str -- str' )
[ dup CHAR: \t = [ drop CHAR: \s ] when ] map ;

Finally, we have the actual word that calls the above:
: disassemble ( word -- )
make-disassemble-cmd run-gdb
[ relevant? ] subset [ tabs>spaces ] map [ print ] each ;

Fast and safe jump tables

Factor's case combinator provides efficient dispatch based on object equality:
{
{ "hello" [ ... ] }
{ 3 [ ... ] }
{ { 1 7 } [ ... ] }
...
} case

Now case is a macro, which means the compiler converts it to efficient code. If there are fewer than five objects, it expands into a series of conditionals; if there are more, it becomes an open-coded hash lookup. I added a new heuristic today: if all the keys are integers and they form a contiguous sequence, a jump table is emitted. It handles the case where the keys are out of order, and where they do not begin at zero; however, they do have to be contiguous, otherwise it falls back to the slightly slower open-coded hash dispatch.

This allows a handful of modules such as the 8080 emulator to avoid using the unsafe dispatch combinator, and use case which is type safe (and it expands into dispatch in the case outlined above).
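The heuristic is easy to sketch in Python: contiguous integer keys become an array indexed by key minus the base, and anything else falls back to hash lookup (my own illustration of the idea, not the macro's actual expansion):

```python
def compile_case(cases):
    """Return a dispatch function; use a jump table when keys are contiguous ints."""
    keys = sorted(cases)
    contiguous = (
        all(isinstance(k, int) for k in keys)
        and keys == list(range(keys[0], keys[-1] + 1))
    )
    if contiguous:
        base, top = keys[0], keys[-1]
        table = [cases[k] for k in keys]        # the jump table
        def dispatch(key):
            if base <= key <= top:              # bounds check keeps it type safe
                return table[key - base]()
            raise KeyError(key)                 # stands in for the no-case error
        return dispatch
    def dispatch(key):
        return cases[key]()                     # hash lookup fallback
    return dispatch
```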

Here is an example from icfp2006, which implements a simple VM:
: run-op ( -- bool )
advance
{
{ 0 [ op0 ] }
{ 1 [ op1 ] }
{ 2 [ op2 ] }
{ 3 [ op3 ] }
{ 4 [ op4 ] }
{ 5 [ op5 ] }
{ 6 [ op6 ] }
{ 7 [ drop t ] }
{ 8 [ op8 ] }
{ 9 [ op9 ] }
{ 10 [ op10 ] }
{ 11 [ op11 ] }
{ 12 [ op12 ] }
{ 13 [ op13 ] }
} case ;

Here is what it expands to:
: run-op ( -- bool )
advance
0 13 3dup pick >= [ >= ] [ 2drop f ] if [
drop - >fixnum {
[ op0 ]
[ op1 ]
[ op2 ]
[ op3 ]
[ op4 ]
[ op5 ]
[ op6 ]
[ drop t ]
[ op8 ]
[ op9 ]
[ op10 ]
[ op11 ]
[ op12 ]
[ op13 ]
} dispatch
] [ 2drop no-case ] if ;

Of course, if you wanted to get rid of the extra verbosity with the key numbers, you could write a "safe-dispatch" macro, which adds them for you. But I would rather incorporate this functionality into case.

This is another example where macros (compile-time transforms) permit you to define a high-level abstraction which is converted into efficient code.

Tuesday, February 12, 2008

File system change monitoring on Mac OS X

I previously blogged about File system change monitoring on Windows and Linux. I have now implemented this feature on Mac OS X as well, using the new FSEventStream API in Leopard.

This API differs a bit from the ones found on Windows (ReadDirectoryChanges) and Linux (inotify) in three ways:
  • The main difference is that it is callback-based; instead of reading changes from a file descriptor, you register a callback which is invoked when directories change.
  • The second is that this callback runs as part of the event loop. This means that on Mac OS X, you can only receive file system change notifications if you are running the UI. I might get around this restriction by always having a Core Foundation run loop going if either the UI is running or monitors are being used; since monitors are mostly useful for developer and end-user tools, and not server-side software, this is not a high priority.
  • The third difference is that the change notifications are not as precise as the other platforms. They only tell you which directory changed, not which file, and furthermore there is no notification of what changed (file added, file removed, etc.)

Despite these differences, I managed to implement a backend for the same io.monitors API without any difficulty. In particular, inverting the callback-based approach and turning it into a stream of events was easy using continuations.
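The inversion is the standard callback-to-queue trick. In Python it looks like this, with the OS callback simulated by a direct call (a sketch of the pattern, not the FSEventStream API):

```python
import queue

changes = queue.Queue()

def fs_event_callback(path):
    """What an FSEventStream-style callback would do: enqueue the event."""
    changes.put(path)

def next_change(timeout=1.0):
    """Consumers block here, seeing a stream of events instead of callbacks."""
    return changes.get(timeout=timeout)

fs_event_callback("/tmp/watched-dir")   # simulate the event loop firing
```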

This is a milestone for Factor; file system change notification was one of my goals for Factor 1.0. Doug and I first discussed it at FactorCon summer 2007 and now it's implemented. Sun is talking about adding this to Java 7, but we already have it.

Update: a number of people have pointed out that the core foundation run loop code is independent of the OS X WindowServer, and that it is possible to use run loops even in a command line application.

Monday, February 11, 2008

Another status update

New tuple slot accessors


For a while now Eduardo Cavazos has been lobbying for changes to tuple slot accessors. He wants slots to be read with name>> instead of tuple-name, and for slots to be written with >>name instead of set-tuple-name. He also wants the stack effect of the setter to be changed from ( value object -- ) to ( object value -- object ). While I liked the new names straight away, it took me a while to appreciate the new stack effects. Seeing how they could interact with the query by example feature of Doug Coleman's database library pushed me over the edge:
<person>
18 25 [a,b] >>age
"Minnesota" >>state
find-by-example

That looks pretty nice; having the setter preserve the tuple on the stack makes it easy to write code in this style, where an empty object is created then the slots filled in.

To experiment with the new slots, get the latest git and USE: new-slots. This shadows the TUPLE: word with one defining the new accessors. They will become the default soon; however, the old accessors will still be generated for as long as it takes developers to update code.

Process launcher timeout support


Sometimes you want to launch a process, but kill it if it takes too long to run. For example, Eduardo's continuous integration does this when running unit tests, because if a unit test hangs, then the build host needs to notify the developers of this.

To implement timeouts, I first refactored the I/O timeout code to be more general. The set-timeout word was moved from io to io.timeouts, and now it can be applied to process exits. If a wait-for-process call doesn't return before the timeout expires, an error is raised. It is also possible to set the timeout in the launch descriptor:
{
{ { +command+ "git pull" } }
{ { +timeout+ 30000 } }
} run-process
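The Python analogue of killing a slow child process is subprocess.run's timeout parameter (illustrative only; the child command here just sleeps):

```python
import subprocess
import sys

# spawn a child that would run for 10 seconds, but give up after 0.5
try:
    subprocess.run(
        [sys.executable, "-c", "import time; time.sleep(10)"],
        timeout=0.5,
    )
    timed_out = False
except subprocess.TimeoutExpired:
    # the child is killed and an error is raised, as with wait-for-process
    timed_out = True
```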

Automatic image download tool


I wrote two tools, bootstrap.image.upload and bootstrap.image.download. The first one is only for me to run; it generates and uploads new images together with a text file of their MD5 hashes. The MD5 hashes are not for security, but rather an optimization: the download tool compares the MD5 of your image against the one on the server, and only if it differs does it download a new image. This is a handy stand-alone tool, but the main reason it exists is again continuous integration: we want the build machines to grab new images, but we don't want them to waste bandwidth downloading over and over again if the images haven't changed.
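The download decision itself is a one-liner over hashlib (a sketch; fetching the server's hash file is omitted):

```python
import hashlib

def needs_download(local_image_bytes, server_md5_hex):
    """Download only when the local image's MD5 differs from the server's."""
    return hashlib.md5(local_image_bytes).hexdigest() != server_md5_hex
```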

First-class quotation compositions


While quotations are sequences and arbitrary sequence operations can be applied to them, the compiler cannot infer much information about your code if you modify quotations willy-nilly. So arbitrary quotation construction is something that we do at parse-time and compile-time, in parsing words and macros. At run-time, you still want to partially apply and compose quotations, so for a while now we've had two light-weight run-time quotation operators: curry for partial application and compose for composition. They're lightweight in the sense that they run in constant time, and the optimizing compiler optimizes them out altogether. The curry word was a primitive, and compose was implemented as follows:
: compose [ >r call r> call ] curry curry ;

This is nice, however if you prettyprint a composition it looks ugly:
[ 1 ] [ 2 + ] compose .
[ [ 1 ] [ 2 + ] >r call r> call ]

Instead I made compositions a first-class type of their own; they reference two quotations, and the sequence operations are implemented in the obvious way to make it look like one big quotation:
[ 1 ] [ 2 + ] compose .
[ 1 2 + ]
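Calling the composition gives the same result as calling the concatenated quotation:

[ 1 ] [ 2 + ] compose call .
3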

Another change is behind the scenes and doesn't affect the programmer. Previously curry was a built-in type hardwired into the VM. Now, it's just a tuple, and call is generic:
GENERIC: call

M: quotation call (call) ;

M: curry call dup curry-obj swap curry-quot call ;

M: compose call dup compose-first swap compose-second >r call r> call ;

The (call) word is the VM primitive for calling quotations; it basically compiles the quotation if it hasn't been compiled yet then jumps to the compiled definition.

This is interesting from a theoretical perspective, because it means Factor implements partial application and composition entirely in 'user space'. Since curry is the concatenative equivalent of closures, and Factor already implements lexical scope in the locals vocabulary, this means that in Factor, the whole concept of "closures" is implemented entirely in library code!

Binary reduce for associative operators


Daniel Ehrenberg noticed that there is a way to speed up our product and sum words for larger sequences; instead of scanning the sequence from left to right, accumulating a product or sum, we can split the sequence in two, compute the product of each half (splitting recursively and so on), then multiply the two results. With bignum or ratio elements this does less total work, because the intermediate results stay small for longer. So I've added Dan's split-reduce word to the core (now named binary-reduce) and changed sum and product to use it.
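The idea can be sketched as follows (a hypothetical reformulation, not the actual core implementation; it assumes a non-empty sequence and an associative quotation):

! Split in half, reduce each half, combine. Hypothetical sketch.
: split-reduce ( seq quot -- result )
    over length 1 <= [
        drop first
    ] [
        >r dup length 2 /i cut r>
        dup >r tuck split-reduce
        >r split-reduce r> r> call
    ] if ;

For example, { 2 3 5 7 } [ * ] split-reduce splits into { 2 3 } and { 5 7 }, reduces each half to 6 and 35, then multiplies them to get 210.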

Friday, February 08, 2008

APIs designed around fixed-width integer types

I had a look at the NIO2 API proposed for Java 7. The functionality looks quite nice; in addition to non-blocking I/O and memory-mapped files already present since Java 1.4, they're adding enhanced file attribute support and access control lists.

However, one thing struck me: because the older Buffer class used for memory mappings was designed around having a length that is a Java int (which is fixed in stone as a 32-bit signed 2's complement integer), and this is clearly insufficient for many applications, they added a whole new BigBuffer class with a parallel hierarchy of subclasses to the Buffer hierarchy.

They ran into a similar problem with String; when Java was being designed, 16 bits was enough to represent all code points, but then Unicode 3.0 came along and required 21 bits, so now their string encoding is essentially variable-width, and applications have to deal with 'surrogate pairs' (most don't even bother!).

Fixed size integer operations are great to have in a language for performance reasons (though compilers should support idiom recognition, so that, for example, masking the result of an integer operation with a bitwise and causes fixed-width types to be used under the hood), but designing APIs around fixed-width integers -- and not supporting overloaded operators for big integers in the language -- is just plain dumb.

Thursday, February 07, 2008

New logging framework

The current logging code used by the HTTP server and a few other vocabularies is too simplistic. There is too much boilerplate: you have to log messages by constructing strings and calling log-message explicitly. Also, the log files eventually get quite large, since there is no support for log rotation. Finally, the format is ad-hoc and there are no tools for parsing them.

Today I wrote a new logging framework, aptly named extra/logging. Quite a lot of code for one day - 407 lines. The framework is quite powerful; the main features:
  • Log rotation prevents log files from getting too large
  • Machine-parsable log file format, with an included parser which uses parser combinators
  • Log analysis and reports, using said parser
  • Optional nightly log rotation and e-mailing of summaries

Since it is now in the git repository and fully documented (just do "logging" require "logging" about) I won't bother describing it fully here, but suffice it to say it is quite a nice framework, and I'm looking forward to using it to run the Factor web site and receiving daily e-mails with hit counts.

The design is interesting; it uses Chris Double's Erlang-style concurrency library to implement a concurrent server which listens for messages that either instruct it to log a message or rotate the logs.

The user-visible words send a message to the server; users of the logging library never deal with message passing directly. The server handles opening and closing log streams; message passing ensures that the server is a synchronization point, so multiple threads can write to the same log file in an orderly fashion. It also ensures that log rotation can proceed with all logging temporarily suspended.

Incidentally, the SMTP library used for mailing log summaries was developed by Elie Chaftari and yesterday I refactored it a bit and added some features for Eduardo's upcoming continuous integration system, which will send e-mails when daily builds fail. Then I came up with the idea of having a message-passing logging library, and decided to implement it right away; the lack of log rotation is a pain on factorcode.org right now.

Wednesday, February 06, 2008

New syntax for ratios

Since the very start Factor has supported exact arithmetic on rationals, however the results weren't very readable:
123 321 * 987 / .
13161/329

Now there's a new syntax which factors out the whole part of the fraction:
123 321 * 987 / .
40+1/329

40 and one 329th is a lot easier to think about than 13161 329ths. Of course this syntax is symmetric, so the reader supports it as well.
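Since the reader accepts the same notation, the new literals can be used directly in source; multiplying back recovers the numerator:

40+1/329 329 * .
13161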

Improvements to the unit test framework

In Factor, most unit tests compare the output of a word with a set of expected values:
[ 3 ] [ 42 36 - 2 / ] unit-test

The unit test fails if the returned values don't match, or if the quotation throws an error before returning.

However, sometimes you want to ensure that a quotation does throw an error. The old way was to use unit-test-fails:
[ 3 "hi" + ] unit-test-fails

I renamed it to must-fail, which reads better:
[ 3 "hi" + ] must-fail

Sometimes, you want to assert that a specific type of error is thrown. The old idiom was to do the following:
[ t ] [ [ 3 "hi" + ] catch no-math-method? ] unit-test

However this is just boilerplate. There is a new must-fail-with word which encapsulates this idiom:
[ 3 "hi" + ] [ no-math-method? ] must-fail-with

Finally, because the overwhelming majority of the uses of the catch word were in unit tests, this word has now been removed. It had already mostly been phased out in favor of cleanup and recover, which are higher-level; now that we have must-fail-with, there is no reason to keep catch around anymore.

The main problem with catch was that it led to ugly code: because it output either the thrown error or f on success, it was almost always followed by a conditional.

The second problem was that it didn't differentiate between no error and an error of f; while throwing f is bad style, it is not too far-fetched to imagine a scenario where buggy code throws f for some reason, and the error is subsequently ignored.

So, one less word, more boilerplate removed, it's business as usual in Factor-land.

Tuesday, February 05, 2008

A simple application using io.monitor: a tail -f replacement

The tail command on Unix shows the end of a file. If you pass it the -f switch it monitors the file for changes and emits any lines written to the end. This is frequently used to watch log files. Traditionally it was implemented by polling but on modern systems it uses file system change notification APIs.

Here is an example that uses Factor's file system change notification API. Note that I renamed the io.monitor vocabulary to io.monitors. This demo works on Linux and Windows.

We begin with the customary boilerplate:
USING: kernel io io.files io.monitors ;
IN: log-viewer

First, we need a word which reads lines from a stream until EOF and prints them. Note that the stream isn't closed here.
: read-lines ( stream -- )
dup stream-readln dup [ print read-lines ] [ 2drop ] if ;

Next, a recursive word which takes a stream and a monitor; it reads any lines until EOF, waits for the file to change, then calls itself again:
: tail-file-loop ( stream monitor -- )
over read-lines
dup next-change 2drop
tail-file-loop ;

Finally, a word which puts everything together. It takes a file name, opens it for reading, and creates a monitor on its parent directory:
: tail-file ( file -- )
dup <file-reader> swap parent-directory f <monitor>
tail-file-loop ;
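With those three words in place, tailing a (hypothetical) log file is just:

"/var/log/system.log" tail-file

Like tail -f, this loops forever, printing new lines as they are appended.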

Monday, February 04, 2008

File system change notification on Linux

So I mentioned a few days ago that I had a file system change monitor API going on Windows. Now I've implemented a Linux backend and will be looking at Mac OS X next.

The API is straightforward. First, you create a monitor, passing it a directory and whether the monitor should recurse into subdirectories or not -- non-recursive monitors are supported on Windows, but on Linux all monitors are recursive:
"/home/slava/factor/" t <monitor>

Monitors must be disposed by using dispose or with-disposal since they wrap operating system resources. They function essentially as streams; you read from a monitor, which blocks until some file in the directory has changed. The file's path name is returned along with an array of descriptors indicating what has changed:
dup next-change . .

A sample piece of output might be:
{ +modified-file+ }
"/home/slava/factor/test.txt"

That is all! If you need to continuously monitor a directory for changes, all you have to do is spawn a new thread with in-thread, then call next-change in a loop and take appropriate action.
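For example, a minimal watcher thread might look like this (a sketch; the directory path is just an example):

USING: io io.monitors kernel prettyprint threads ;

! Print each change descriptor and path as it arrives, forever.
: watch-loop ( monitor -- )
    dup next-change . . flush watch-loop ;

[ "/home/slava/factor/" t <monitor> watch-loop ] in-thread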

On Linux and Windows, we use asynchronous notifications under the hood, so the blocking API presented to the user is actually implemented with continuations and a low-level multiplexer behind the scenes, just like all non-blocking I/O in Factor.

I'm going to use the io.monitor API to speed up the vocabulary browser and refresh-all. Right now they have to scan the whole Factor source tree but instead they will monitor it for changes. Since not all OSes support change notification, and not all file systems on supported OSes support it either, there will be graceful fallbacks, of course.

I'm not aware of any other language which attempts to implement advanced I/O functionality in a cross-platform manner. Java's process launcher is limited compared to io.launcher, and it doesn't support a cross-platform file system notification API at all; Python, Ruby and Perl implement advanced features but present OS-specific APIs to the programmer. Furthermore Factor attempts to integrate as much as possible with the I/O event loop, and of course, everything is consistently documented.

Friday, February 01, 2008

24-bit strings are in

I implemented the new strings, which start off as strings of octets and upgrade transparently to 24-bit strings capable of representing all Unicode 5.0 code points. Dan's Unicode library is almost finished, too.

Many languages shun integration with operating systems and existing standards such as Unicode, HTTP, XML, and so on, because these standards are "dirty" and are beneath them. I think this philosophy is bullshit; it is just an attempt to pass off incompetence as elitism.

Factor is all about "embracing and extending" existing standards: we inter-operate with your existing software but allow you to build something better on top.