Tuesday, April 29, 2008

USA Zip code database

Recently someone posted a freely available Zip code database on reddit. The database is in CSV format.

Phil Dawes contributed a CSV parser to Factor a few days ago, and I thought this would be the perfect use-case to test out the parser.

I added the library to extra/usa-cities. It begins with the usual boilerplate:
USING: io.files io.encodings.ascii sequences sequences.lib
math.parser combinators kernel memoize csv symbols inspector
words accessors math.order sorting ;
IN: usa-cities

Then, we define some singleton types for the various states of the union. While this isn't strictly necessary, it allows us to write generic words which dispatch on states; for example, I'm sure Doug's taxes library could use this:
SINGLETONS: AK AL AR AS AZ CA CO CT DC DE FL GA HI IA ID IL IN
KS KY LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK
OR PA PR RI SC SD TN TX UT VA VI VT WA WI WV WY ;

: states ( -- seq )
{
AK AL AR AS AZ CA CO CT DC DE FL GA HI IA ID IL IN KS KY
LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK
OR PA PR RI SC SD TN TX UT VA VI VT WA WI WV WY
} ; inline

ERROR: no-such-state name ;

M: no-such-state summary drop "No such state" ;

MEMO: string>state ( string -- state )
dup states [ word-name = ] with find nip
[ ] [ no-such-state ] ?if ;

Next up, we define a data type storing rows from the CSV database:
TUPLE: city
first-zip name state latitude longitude gmt-offset dst-offset ;

Now a word which reads the database, parses it as CSV, and then parses each column into a specific data type:
MEMO: cities ( -- seq )
"resource:extra/usa-cities/zipcode.csv" ascii <file-reader>
csv rest-slice [
7 firstn {
[ string>number ]
[ ]
[ string>state ]
[ string>number ]
[ string>number ]
[ string>number ]
[ string>number ]
} spread city boa
] map ;

This word is tricky; some notes on its workings:
  • We begin by opening a stream for reading from the CSV file with ASCII encoding.
  • The csv word reads CSV data from a stream.
  • The first line of the file consists of column headings and not actual data, so we ignore it by using the non-copying variant of rest, rest-slice (recall that the primary sequence type in Factor is an array, so removing the first element makes a copy; a slice is a virtual sequence presenting a view of a subsequence of an array).
  • The spread combinator takes a sequence of quotations, Q1,...,QN, and N values from the stack, X1,...,XN, and outputs Q1[X1],...,QN[XN]. In this case we're taking the first 7 elements of each row of CSV data (each row has exactly 7 columns, so in effect we're pushing every element of the sequence on the stack), then using spread to convert some columns from their initial text format into something more useful to us: a state singleton or a number.
  • Then, we use city boa to construct a city tuple "by order of arguments"; this slurps the 7 stack values and stores them into a new instance of city (note that the definition of the city type has exactly 7 slots, defined in the same order as the columns of the file).
  • Finally, we map over the sequence of rows to perform the above steps on each row of the file. The result is a sequence of city instances.

Since the word is memoized, the database is only loaded and parsed once.
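The parsing step can be sketched in Python; this is not the Factor implementation, just the same "skip the heading row, run one converter per column" idea, with a hypothetical two-row sample standing in for zipcode.csv:

```python
import csv
import io

# Hypothetical sample rows in the same 7-column layout as zipcode.csv.
SAMPLE = """zip,city,state,latitude,longitude,gmt,dst
55406,Minneapolis,MN,44.938615,-93.22082,-6,1
90210,Beverly Hills,CA,34.088808,-118.40612,-8,1
"""

# One converter per column, analogous to the quotations given to spread;
# int/float stand in for string>number, and str for string>state.
CONVERTERS = [int, str, str, float, float, int, int]

def load_cities(text):
    rows = csv.reader(io.StringIO(text))
    next(rows)  # skip the heading row, as rest-slice does
    return [tuple(f(col) for f, col in zip(CONVERTERS, row)) for row in rows]

cities = load_cities(SAMPLE)
```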

We can now define words to query it:
MEMO: cities-named ( name -- cities )
cities [ name>> = ] with filter ;

MEMO: cities-named-in ( name state -- cities )
cities [
tuck [ name>> = ] [ state>> = ] 2bi* and
] with with filter ;

: find-zip-code ( code -- city )
cities [ first-zip>> <=> ] binsearch* ;
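The binary search behind find-zip-code can be sketched in Python with the bisect module. The assumption here (not spelled out in the original) is that each row covers a range of Zip codes starting at its first-zip, so a lookup finds the last entry whose first-zip is at or below the query:

```python
import bisect

# Each entry is (first-zip, city); the list is sorted by first-zip.
cities = [(55401, "Minneapolis"), (78701, "Austin"), (90210, "Beverly Hills")]

def find_zip_code(code):
    # Binary search for the last entry whose first-zip <= code.
    keys = [first_zip for first_zip, _ in cities]
    i = bisect.bisect_right(keys, code) - 1
    return cities[i] if i >= 0 else None
```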

Now, let's look at some examples.

First, let's look up my Zip code:
( scratchpad ) 55406 find-zip-code .
T{ city f 55406 "Minneapolis" MN 44.938615 -93.22082 -6 1 }

And another famous Zip code:
( scratchpad ) 90210 find-zip-code .
T{ city f 90210 "Beverly Hills" CA 34.088808 -118.40612 -8 1 }

How many states have a city named "Minneapolis"?
( scratchpad ) "Minneapolis" cities-named [ state>> ] map prune .
V{ NC MN KS }

What is the possible range of Zip codes for Austin?
( scratchpad ) "Austin" TX cities-named-in [ first-zip>> ] map [ infimum . ] [ supremum . ] bi
73301
78972

There are many possible applications for this library, including form validation in web apps. It could be extended further: if the database was loaded into a quadtree sorted by latitude/longitude, you could perform queries such as finding all towns within 50 miles of a given city.
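The "towns within 50 miles" query boils down to a great-circle distance test over the latitude/longitude columns. Here is a Python sketch using the haversine formula with a linear scan (a quadtree would prune most candidates instead of checking every city):

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    # Great-circle distance; 3959 miles is an approximate Earth radius.
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 3959 * 2 * math.asin(math.sqrt(a))

def towns_within(cities, lat, lon, miles):
    # cities is a list of (name, latitude, longitude) tuples.
    return [name for name, clat, clon in cities
            if haversine_miles(lat, lon, clat, clon) <= miles]
```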

An addendum to "The new HTTP server, part 2"

If you run the web app presented in the last blog post verbatim, you will get a "500 Internal server error" with no further indication of what's going wrong. This is because the code I presented has a minor omission.

The opaque error message is intentional: if your web app crashes, you don't necessarily want to expose internal details to every user that comes along (one famous case was reddit.com, which leaked a portion of their Python codebase inside a stack trace at some point). However, if you set the development-mode global variable to a true value, the behavior of the HTTP server changes in two respects:
  • If an error occurs, the error page contains the error message as well as the full stack trace.
  • Every request begins by calling refresh-all, which makes interactive testing of web app changes very straightforward.

If we enable development mode, we see the real error message, "No such table: SESSIONS". This is because I didn't mention that one must initialize the database by creating the table for storing sessions first:
"counter.db" sqlite-db [ init-sessions-table ] with-db

In the next installment of this series, which hopefully won't take as long as the second one did, I will discuss form validation and the templating system.

Write once, run anywhere

Apple finally released Java 6 for Mac OS X, about a year late, and it only runs on 64-bit Intel Macs. It also requires Leopard, which isn't such a big deal because anyone can upgrade. I sold my G5 but a lot of people out there still use PowerPC Macs and I guess they're just screwed. It's a shame too because Java 6 is the first release that's starting to approach true usability for desktop applications.

The new HTTP server, part 2

It's been a month and a half since the first part of this series. Why the long delay? I've been busy with other things. I implemented inheritance, various compiler optimizations, and many other things. In the last couple of weeks I've been working on the web framework again, tying up some loose ends and porting more existing web applications over (namely, the pastebin and planet factor).

In this entry I will talk about session management. Session management was one of the first things I implemented in the new framework when I started working on it, but recently I gave the code an overhaul.

Session management


The basic idea behind session management is that while HTTP is a stateless protocol, we can simulate state by sending a token to the client -- either in the form of a hidden element on the page, or a cookie, which the client sends back to the server with a later request. This token is associated with an object on the server and the object holds state between requests.

Another approach for session management is to store state entirely on the client; instead of sending the client a session ID identifying an object on the server, you send the session data itself to the client. Traditionally this approach has only been used for user preferences and such where security is immaterial, but it can even be used for more sensitive data by encrypting it with a private key only known to the server. The client receives an opaque blob of binary data which cannot be inspected or tampered with (unless the public key encryption algorithm being used is compromised).
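The post has encryption in mind; as a simpler sketch of just the tamper-proofing half of the idea, here is an integrity-only version in Python, where the server signs the serialized session with an HMAC key it never reveals. This protects against tampering but not inspection; for secrecy you would add encryption on top:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-only-secret"  # hypothetical key, never sent to the client

def seal(session):
    # Serialize, then append a MAC so the client cannot tamper with the data.
    payload = base64.urlsafe_b64encode(json.dumps(session).encode())
    mac = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + mac

def unseal(token):
    payload, mac = token.rsplit(".", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("session token tampered with")
    return json.loads(base64.urlsafe_b64decode(payload))
```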

Currently Factor's session manager does not support client-side sessions, but it will soon, using Doug Coleman's public-key encryption code. Server-side sessions are supported, however.

The session manager uses two main strategies to pass state to the client:
  • For GET and HEAD requests, a cookie is used. The cookie's value is a randomly-generated session ID.
  • For POST requests, the form must define a hidden field with the session ID. The value of the cookie is ignored to thwart cross-site scripting attacks.

The idea is to strike a balance between security and convenience; we don't want to add a session ID to every link and start a new session if the user navigates to the site by directly entering a URL, but on the other hand we don't want potentially destructive POST requests to be accepted unless they were sent by a form generated from within the session itself.

In Factor, a session is simply a hashtable where values can be stored. Keys are known as "session variables", and values can be read and written with the sget and sset words; there is also a schange combinator which applies a quotation to an existing session variable to yield a new value. This is all entirely analogous to the get/set/change words for dynamic variables.
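The three operations can be sketched in Python with a plain dictionary standing in for the session hashtable (this mirrors the semantics, not the actual Factor API):

```python
# A session is just a dictionary; sget/sset read and write session
# variables, and schange applies a function to compute the new value,
# mirroring get/set/change for dynamic variables.
session = {}

def sget(key):
    return session.get(key)

def sset(value, key):
    session[key] = value

def schange(key, func):
    session[key] = func(session.get(key))

sset(0, "count")
schange("count", lambda n: n + 1)
schange("count", lambda n: n + 1)
```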

Session namespaces are serialized and stored in a database using Doug's db.tuples O/R mapper. I originally supported pluggable "session storage" backends, with database storage and in-memory storage as the two options, however I decided to simplify the code and hardcode database storage. This has the side-effect that you'll need to set up a database to use the session management feature, however SQLite presents a lightweight option which requires no configuration, so I don't think this is a big deal at all.

I will show a small example of a 'counter' web application, much like the counter example for the Seaside framework.

We start off with a vocabulary search path:
USING: math kernel accessors http.server http.server.actions
http.server.sessions http.server.templating.fhtml locals ;
IN: webapps.counter

Now, we define a symbol used to key a session variable:
SYMBOL: count

Next, we define a pair of actions which increment the counter value, using the schange combinator. The display slot of an action contains code to be executed upon a GET request; it is expected to output a response object. In our case, the word outputs an action which applies the quotation to the current counter value; the action outputs a response which redirects back to the main page:
:: <counter-action> ( quot -- action )
<action> [
count quot schange
"" f <standard-redirect>
] >>display ;

For comparison, the action to decrement the counter could also be written out directly, without parametrizing over the quotation:
: <dec-action> ( -- action )
<action> [ count [ 1- ] schange f ] >>display ;

Note that this word constructs actions, instead of invoking them. This approach is more flexible than the old "furnace" web framework, where actions were mapped directly to word execution, because it allows one to write "higher-order actions" parametrized by values more easily.
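The "higher-order action" pattern is the same trick as a closure-returning factory. Here is a Python sketch; the handler shape (a dict-based session, a redirect result tuple) is hypothetical, but the structure mirrors [ 1+ ] <counter-action>:

```python
# An action constructor parametrized by a function, like <counter-action>.
def make_counter_action(func):
    def display(session):
        session["count"] = func(session.get("count", 0))
        return ("redirect", "")  # stand-in for <standard-redirect>
    return display

inc = make_counter_action(lambda n: n + 1)
dec = make_counter_action(lambda n: n - 1)

s = {}
inc(s)
inc(s)
result = dec(s)
```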

Here is the default action; it displays the counter value using a template:
: <display-action> ( -- action )
<action> [
"resource:extra/webapps/counter/counter.fhtml" <fhtml>
] >>display ;

Finally we put everything together in a dispatcher:
: <counter-app> ( -- responder )
counter-app new-dispatcher
[ 1+ ] <counter-action> "inc" add-responder
[ 1- ] <counter-action> "dec" add-responder
<display-action> "" add-responder
<sessions> ;

We create a dispatcher, add instances of our actions to it, and wrap the whole thing in a session manager.

Now, the template:
<% USING: io math.parser http.server.sessions webapps.counter ; %>

<html>
<body>
<h1><% count sget number>string write %></h1>

<a href="inc">++</a>
<a href="dec">--</a>
</body>
</html>

Finally, once we have all the parts, we can create the counter responder and start the HTTP server:
<counter-app> "test.db" sqlite-db <db-persistence> main-responder set
8888 httpd

Note that here we wrap the counter responder in another layer of indirection, this time for database persistence; while the counter web app doesn't use persistence the session manager does, and we chose to use SQLite since it requires no configuration or external services.

Navigating over to http://localhost:8888/ should now display the counter app, and clicking the increment and decrement links should have an effect on the displayed value. Sessions persist between server restarts and time out after 20 minutes of inactivity by default. Looking at your web browser's cookie manager will show that a factorsessid cookie has been set.

As an aside, the Seaside version uses continuations to maintain state. The Factor version explicitly maintains state. Even though I ported Chris Double's modal web framework over to the new HTTP server, I'm avoiding continuations in favor of explicit state for now. I am building up a form component framework with validation, easy persistence, and user authentication without resorting to continuations, and I plan on building a state-machine model with a page flow DSL, much like jBPM, to handle more complex multi-page flows such as shopping carts. While this will result in more work for me, I believe the benefits include transparent support for load-balancing and fail-over, readable URLs, and ultimately, simpler and more reusable web application code because page flow can be decoupled from logic and expressed in a custom DSL intended for that purpose.

The Seaside version is also somewhat shorter; it is easy to express with idiomatic Seaside (transparent session management, presentation logic mixed in with web app code). I will add better abstractions to make up for some of the difference, and for larger applications there should be no difference in code size; in fact since the scope of Factor's framework is wider than Seaside (it covers persistence, authentication and validation, and soon, versioning of persistent entities) you might even need less code to accomplish the same thing.

Virtual hosting


The other topic I promised to cover last time was virtual hosting. Virtual hosting is done with dispatchers, much like nested directory structure is. You create a virtual host dispatcher with <vhost-dispatcher> and add responders for various virtual hosts using add-responder; the >>default slot can be used to set the default virtual host. The key difference between the new approach and the old HTTP server virtual hosting implementation, which relied on a global hashtable mapping virtual host names to responders, is flexibility; the virtual host dispatcher does not necessarily have to be your top-level responder.

For example, the <boilerplate> responder gives you a way of enforcing a common look and feel across a set of web apps, by adding common headers and footers to every page. While I will describe boilerplate responders and the template system in more detail in a later post, for now here is an example:
<vhost-dispatcher>
<online-store> "store.acme.com" add-responder
<support-site> "support.acme.com" add-responder
<main-site> "acme.com" add-responder
<boilerplate>
"acme-site.xml" >>template
acme-db <db-persistence>
<sessions> main-responder set

Here, all virtual hosts share the same session management, database persistence, and common theme, and the virtual host dispatch only happens after the request filters through the mentioned layers of functionality. This would not be possible with the old HTTP server without duplicating code.
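Because the virtual host dispatcher is just another responder, it composes like any other. A minimal Python sketch of the idea (hypothetical responder shape; requests are dicts, responses are tuples):

```python
# Host-based dispatch as an ordinary responder: it inspects the Host
# header and delegates, so it can sit beneath session/persistence layers
# rather than having to be the top-level responder.
class VhostDispatcher:
    def __init__(self, default=None):
        self.responders = {}
        self.default = default

    def add_responder(self, host, responder):
        self.responders[host] = responder

    def respond(self, request):
        host = request.get("Host", "")
        responder = self.responders.get(host, self.default)
        return responder(request) if responder else (404, "no such host")

d = VhostDispatcher(default=lambda r: (200, "main site"))
d.add_responder("store.acme.com", lambda r: (200, "store"))
```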

Cookies


Finally I promised to talk about cookies. The session management support is great but sometimes you just want to get and set cookies directly. This can be done by reading the cookies slot of the request object, and writing the cookies slot of the response object. The slot contains a sequence of cookie objects, which are parsed and unparsed from their HTTP representation for you. A cookie object contains a series of slots, such as name, value, expiration date (as a Factor timestamp object), max-age (as a Factor duration object), path, and host. While the expiration date is deprecated as of HTTP/1.1, most sites still use it instead of max-age because older browsers don't support max-age. Factor's HTTP server sets the date header on each response so that expiration dates can work correctly.
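For a feel of what parse/unparse looks like, here is the same round trip in Python using the standard library's http.cookies (this is not the Factor API, just the equivalent operations):

```python
from http.cookies import SimpleCookie

# Parse a Cookie request header, then build a Set-Cookie value.
jar = SimpleCookie()
jar.load("pref=abc123; other=x")

out = SimpleCookie()
out["factorsessid"] = "deadbeef"
out["factorsessid"]["path"] = "/"
out["factorsessid"]["max-age"] = 1200  # 20 minutes, like the default session timeout
header = out["factorsessid"].OutputString()
```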

Here is an example of using the HTTP client (which shares the cookie code with the server) to look at Google's ridiculously long-lived cookies:
( scratchpad ) "http://www.google.com" http-get-stream drop cookies>> first describe
cookie instance
"delegate" f
"name" "pref"
"value" "ID=c0f4c074cd87502e:TM=1209466656:LM=1209466656:S=_6gGEKtuTgP..."
"path" "/"
"domain" ".google.com"
"expires" T{ timestamp f 2010 4 29 10 57 36 ~duration~ }
"max-age" f
"http-only" f

Saturday, April 19, 2008

Performance improvements

Over the last three days I spent some time improving Factor's compiler. I made the following improvements:
  • Partial dispatch is performed on integer arithmetic operations. Previously, Factor's compiler would convert generic arithmetic to machine arithmetic when it knew the exact types involved; so if you had two floats on the stack, + would become float+, and if you had two fixnums it would become fixnum+ or fixnum+fast, depending on whether interval inference determined if the overflow check was needed or not. While this worked well in many cases there were a lot of instances where the compiler could only infer you were dealing with integers, but not fixnums in particular; either because interval inference was not smart enough, or because the values really could be out of bounds for a fixnum. Now, if it knows that both inputs are integers, it compiles a call to a special word which still performs a dispatch, but with only two possibilities. This is a win, because a conditional is faster than a jump table as used in the generic +. It is an even bigger win if one of the inputs is known to be a fixnum, because then the two jump table dispatches are replaced by a single conditional.
  • Improved overflow check elimination. First, there is an enabling optimization. Suppose we have a positive integer on the stack. Then, the following two are equivalent:
    4096 mod
    4095 bitand

    The optimizer now recognizes the case where mod is applied with a power of two to a positive integer, and converts it to a bitwise and. For the rem word an even more general optimization is possible; we only have to assume the first input is an integer and the second is a power of two.

    While on a modern CPU, this is not a big win in itself for mod (it is for rem, which is more expensive), it does enable the following optimization. Note that the following two expressions are equivalent:
    4095 bitand
    16777215 bitand 4095 bitand

    That is, if you mask off the first n bits, then the first m bits, and m<n, you get the same result as masking off the first m bits. This might seem trivial and useless at first, but then you realize that a truncating conversion of a positive bignum to a fixnum is simply masking off bits! So the following two are equivalent:
    4095 bitand
    >fixnum 4095 bitand

    But of course, both inputs to the latter bitand are fixnums now, and can be converted to:
    4095 bitand
    >fixnum 4095 fixnum-bitand

    It gets even better. Consider the following piece of code from project-euler.150:
    : next ( t -- new-t s )
    615949 * 797807 + 20 2^ rem dup 19 2^ - ;

    Constant folding gives us:
    : next ( t -- new-t s )
    615949 * 797807 + 1048576 rem dup 524288 - ;

    The above optimization, together with an overflow removal on the -, gives us:
    : next ( t -- new-t s )
    615949 * 797807 + >fixnum 1048575 fixnum-bitand dup 524288 fixnum-fast ;

    There is an existing optimization that takes advantage of these identities:
    * >fixnum == [ >fixnum ] bi@ fixnum*fast
    + >fixnum == [ >fixnum ] bi@ fixnum+fast

    By applying the above, the compiler converts this code to:
    : next ( t -- new-t s )
    >fixnum 615949 fixnum*fast 797807 fixnum+fast 1048575 fixnum-bitand dup 524288 fixnum-fast ;

    All generic arithmetic and overflow checks have been removed, because of a single rem in the middle of the calculation!

    In project-euler.150, this word was actually used inside a loop, where it was iteratively applied to a starting generator value. With the new modular arithmetic optimization, Factor's existing interval inference code managed to infer the result of the word is always in the interval (-2^20,2^20), and since the initial value is 0, even the call to >fixnum was optimized away.
  • I improved Factor's type inference algorithm to better deal with local recursive code blocks. While type inference worked okay with loops, it didn't fare so well with binary recursive functions because it wasn't able to obtain any information about return values of the recursive calls. Everybody's favorite binary recursive benchmark, fibonacci, was one example of this situation. Solving this required a bit of thought.

    Previously, type inference for recursive functions was done in a pretty ad-hoc manner. Factor's type inference is only used for optimization so it is okay if it is conservative. It worked pretty well, however today I decided to add better support for recursion, and I also found a bug where it would infer invalid types, so I decided to redo it a little.

    Type inference of recursive functions is now an iterative process, where you begin by assuming the recursive calls take the top type as inputs and the bottom type as outputs, then you infer the type of the body under these assumptions, then apply the resulting types to the recursive calls, then infer again, and so on, until a fixed point is reached. There are smarter ways of doing this with other type systems however in Factor, the type functions of some words are pretty complex. For example, the output type of + depends on not only the types of its inputs, but the interval ranges they lie in, so I'm not sure if there's any other solution.

    With this out of the way, there is one major remaining limitation in Factor's type inference, and that is it only works within words. It does not attempt to infer types across word boundaries. In practice this means many optimizations only kick in if a lot of words are inlined, which is not practical in some cases due to code size explosion. My next project in this area is to cache type signatures for words and use this information when inferring types of callers.
  • I improved performance of inline object allocation. In the VM, the structure holding the allocation pointer is now stored directly in a global variable, instead of a pointer to it being stored in the global variable. This is one less indirection for inline allocators. Another improvement is that the heap exhaustion check is done in the allocator itself, so a call into the VM is avoided if the heap is not full. This saves a subroutine call in the common case of course, but also some saving and restoring of registers.
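The modular-arithmetic identities from the overflow-check bullet are easy to check directly; a quick Python verification:

```python
# For non-negative x and a power of two 2^n:  x mod 2^n == x & (2^n - 1).
# Masking with a wider mask first changes nothing:
#   (x & (2^m - 1)) & (2^n - 1) == x & (2^n - 1)  when n < m.
def check(x):
    assert x % 4096 == x & 4095
    assert (x & 16777215) & 4095 == x & 4095  # 2^24-1 first, then 2^12-1

for x in [0, 1, 4095, 4096, 797807, 10**18]:
    check(x)
```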

These changes have led to some performance improvements on the two benchmarks I was working with. My computer here is a MacBook Pro with a 2.4 GHz Intel Core 2 Duo.

The project-euler.150 benchmark saw its runtime decrease from 87 seconds to 22 seconds.

The benchmark.recursive benchmark, which is a Factor implementation of the recursive benchmark from the Computer Language Shootout, saw its runtime decrease from 27 seconds to 9 seconds.

For comparison, SBCL runs the recursive benchmark in 3.5 seconds, and Java runs it in 1.6 seconds.

Using the java -Xint result of 22 seconds, I guesstimated from the results for all languages on the shootout that Factor is at around the performance of the Erlang HiPE JIT, and slightly faster than the Python Psyco JIT and GForth. By anybody's standard, this is anywhere between "not very fast" and "bloody slow", but it's slowly improving.

Now I'll end with a rant about the language shootout.

The only so-called "dynamic" language implementations which come close to the performance of Java and C on this benchmark are SBCL, Ikarus Scheme, and Chicken Scheme. However, all these benchmarks are actually static programs in disguise, peppered with type declarations and even unsafe low-level features. The Ikarus and Chicken versions even implement the same functions twice, once for integer inputs and using integer arithmetic primitives, and another time for float inputs and using float arithmetic primitives. Unless their goal is really to promote their languages as nothing more than C with s-expression syntax, this is disappointing and dishonest.

The Haskell benchmarks suffer from this also. Many benchmarks seem to use malloc and pointer arithmetic, and exclamation points (strictness annotations) abound. The n-body benchmark contains this gem:
planets :: Ptr Double
planets = unsafePerformIO $ mallocBytes (7 * nbodies * 8) -- sizeOf double = 8

I would expect better from the Haskell people than hardcoding the size of a double to do pointer arithmetic on manually-managed memory!

Right now I have no idea how idiomatic Haskell performs in practice; from looking at these benchmarks, it is clear to me that if one writes C in Haskell, pretty decent performance is possible, but that's not saying much. You can write C in any language.

The runtime of the Factor recursive benchmark further improves by two-fold if I manually replace generic arithmetic with unsafe machine arithmetic, however I'm not interested in writing such code. I don't want to expose low-level primitives to the user, document their use as necessary for performance critical code, and declare that my job is done.

My top 8 shell commands

I wrote a short Factor script to check my bash history and tally up the most frequently occurring commands.
git
./factor
bf
cdf
ls
cd
fc
push

There are some aliases here:
  • cdf is aliased to cd ~/factor
  • bf is aliased to ./factor -i=boot.x86.32.image
  • push is aliased to git push origin master
  • fc is aliased to ssh factorcode.org

It seems that the only thing I use my computer for is running Factor!
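The tally itself is a one-liner in most languages; here is a Python sketch of the same idea (the sample history is made up, standing in for ~/.bash_history):

```python
from collections import Counter

# Tally the first word of each history line and report the most common.
history = """git status
./factor
git pull
ls
git push origin master
"""

def top_commands(text, n=8):
    counts = Counter(line.split()[0] for line in text.splitlines() if line.strip())
    return [cmd for cmd, _ in counts.most_common(n)]

top = top_commands(history)
```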

Friday, April 11, 2008

Multi-touch gestures in the Factor UI

Recently I acquired a MacBook Pro. The new models come with a multi-touch keypad and I've found it extremely useful in Safari to be able to go back, go forward and zoom with the mouse. However, I missed the ability to do this in other applications, especially the Factor UI. The Factor UI already supports vertical and horizontal scrolling gestures, and I wanted to be able to use the other gestures as well.

Apparently, Apple's official stance is that no multitouch API will be made public until 10.6. I was slightly discouraged by this but some more Googling turned up a blog post by Elliott Harris detailing the undocumented API for receiving these events.

While normally I shy away from relying on undocumented functionality in this case the API is dead-simple and it is almost an oversight on Apple's part not to document it. And if they break it, well, it will be easy to update Factor too.

Here are the changes I had to make. First, I added some new UI gestures to extra/ui/gestures/gestures.factor; these are cross-platform and theoretically the Windows and X11 UI backends could produce them too, perhaps as a result of button presses on those "Internet" keyboards:
TUPLE: left-action ; C: <left-action> left-action
TUPLE: right-action ; C: <right-action> right-action
TUPLE: up-action ; C: <up-action> up-action
TUPLE: down-action ; C: <down-action> down-action

TUPLE: zoom-in-action ; C: <zoom-in-action> zoom-in-action
TUPLE: zoom-out-action ; C: <zoom-out-action> zoom-out-action

Next, I edited extra/ui/cocoa/views/views.factor with some new methods for handling the new multitouch gestures, and translating them to Factor gestures:
{ "magnifyWithEvent:" "void" { "id" "SEL" "id" }
[
nip
dup -> deltaZ sgn {
{ 1 [ T{ zoom-in-action } send-action$ ] }
{ -1 [ T{ zoom-out-action } send-action$ ] }
{ 0 [ 2drop ] }
} case
]
}

{ "swipeWithEvent:" "void" { "id" "SEL" "id" }
[
nip
dup -> deltaX sgn {
{ 1 [ T{ left-action } send-action$ ] }
{ -1 [ T{ right-action } send-action$ ] }
{ 0
[
dup -> deltaY sgn {
{ 1 [ T{ up-action } send-action$ ] }
{ -1 [ T{ down-action } send-action$ ] }
{ 0 [ 2drop ] }
} case
]
}
} case
]
}

Note that I'm throwing away useful information for the sake of simplicity; the zoom gesture gives you a precise zoom amount, not just +1/-1. The swipe gestures seem to be completely discrete though. I haven't implemented rotation gestures yet because I haven't figured out what to use them for.
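The sign-based classification in the swipe handler can be sketched in Python (same logic as the Objective-C method above: the horizontal delta wins, the vertical delta is the tie-breaker):

```python
def sgn(x):
    return (x > 0) - (x < 0)

def classify_swipe(delta_x, delta_y):
    # Positive deltaX maps to a left swipe, as in the handler above.
    horizontal = {1: "left", -1: "right"}.get(sgn(delta_x))
    if horizontal:
        return horizontal
    return {1: "up", -1: "down", 0: None}[sgn(delta_y)]
```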

With the above code written, the UI now sends multi-touch gestures to gadgets however no gadgets used them yet. I fired up the gesture logger, "gesture-logger" run, and tested that the new actions actually get sent. They were, except the first time I messed up the code and got the signs the wrong way round.

With that fixed, I could proceed to add gesture handlers to the various UI tools. I edited various source files:
browser-gadget "multi-touch" f {
{ T{ left-action } com-back }
{ T{ right-action } com-forward }
} define-command-map

inspector-gadget "multi-touch" f {
{ T{ left-action } &back }
} define-command-map

workspace "multi-touch" f {
{ T{ zoom-out-action } com-listener }
{ T{ up-action } refresh-all }
} define-command-map

I think it's pretty clear from the above what the default gesture assignments are:
  • In the browser tool, swipe left/right navigate the help history.
  • In the inspector, swipe left goes back.
  • In any tool, a pinch (zoom out) closes the current tool, leaving only the listener visible. A swipe up reloads any changed source files (I'm not sure if I like this yet).

15 minutes of Google, 20 minutes of hacking, and now Factor supports the fanciest feature of Apple's latest hardware.

Improvements to io.monitors; faster refresh-all

Factor's io.monitors library previously supported Mac OS X, Windows and Linux. Now it also supports BSD, but in a much more restricted fashion than the other platforms. Basically you cannot monitor directories, just individual files. This is because kqueue() only provides very limited functionality in this regard. However, having something is better than nothing, and the functionality provided on BSDs is still useful for monitoring log files and such.

On Linux, inotify doesn't directly support monitoring recursive directory hierarchies so Factor's monitors didn't support recursive monitoring, but a mailing list post by Robert Love discusses how to solve this issue in user-space. I used his solution to implement recursive monitors on Linux.
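The user-space recipe amounts to: walk the tree, add a watch per subdirectory, and add watches for any directories created later. A runnable Python sketch of just the walking step, with a plain dict standing in for the table of inotify watch descriptors:

```python
import os
import tempfile

def add_watches(root, watches):
    # Register a watch on every directory under root; in a real
    # implementation each entry would be an inotify_add_watch() call,
    # repeated for new directories as IN_CREATE events arrive.
    for dirpath, _dirnames, _files in os.walk(root):
        watches[dirpath] = True

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a", "b"))
watches = {}
add_watches(root, watches)
```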

Another oddity relating to inotify is that if you add the same directory twice to the same inotify, you get the same watch ID both times, and events are only reported once. This means that the previous implementation where there was one global inotify instance shared by all monitors wasn't really as general as one would hope, because you couldn't run two programs that monitor overlapping portions of the file system. I thought of several possible fixes but in the end just changed the monitors API to accommodate this case. All monitor operations must now be wrapped in a with-monitors combinator. On Linux, it creates an inotify instance and stores it in a dynamically-scoped variable, so that subsequent calls to <monitor> use this inotify. Independent inotifies in different threads don't interact at all. On Mac OS X, BSD and Windows, with-monitors just calls the quotation without doing any special setup.

Another issue I fixed was that on Mac OS X, monitors would only work when used from the UI because no run loop was running otherwise. I made a run loop run all of the time and this allows monitors to work in the terminal listener.

Now that monitors are working better, I was able to use them to make refresh-all faster. This word finds all changed source files in the vocabulary roots and reloads them. It does this by comparing cached CRC32 checksums with the actual checksum of the file. Previously it would also compare modification times, but I took that code out because filesystem meta-data queries got moved out of the VM and into the native I/O code, which isn't available during bootstrap. A side-effect of this is that refresh-all became much slower, because it had to read all files. Using monitors I was able to make this faster than it has ever been. A thread waiting on a monitor is started on startup. Then, the source tree only has to be checksummed in its entirety the first time refresh-all is used in a session. Subsequently, only files for which the monitor reported changes have to be scanned. So refresh-all runs instantly if there are no changes, and so on.
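The checksum-cache part of this can be sketched in Python with zlib.crc32; in-memory byte strings stand in for source files here:

```python
import zlib

# Compare cached CRC32 checksums against current contents and report only
# the files whose checksum changed, updating the cache as we go.
def changed_files(files, cache):
    changed = []
    for name, contents in files.items():
        crc = zlib.crc32(contents)
        if cache.get(name) != crc:
            changed.append(name)
            cache[name] = crc
    return changed

cache = {}
files = {"a.factor": b"1 2 +", "b.factor": b": foo ;"}
first = changed_files(files, cache)   # everything is new on the first scan
files["a.factor"] = b"1 2 *"
second = changed_files(files, cache)  # only the edited file is reported
```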

Tuesday, April 08, 2008

The golden rule of writing comments

This is something I've wanted to say for a while, and I think many (most?) programmers don't realize it:

Never comment out code.

Comments are for natural-language descriptions of code, or pseudocode maybe.

If you comment out some code, then the parser isn't checking the syntax, the compiler isn't checking semantics, and the unit tests are not unit testing it.

So the code may as well not work. Why have non-working code in your program, especially if it's not called anywhere?

But perhaps the code is there so that you can see what the program used to do. In that case, just fire up your favorite graphical GIT/SVN/P4/whatever frontend and check out an older revision.

Multi-methods and hooks

For a while now Factor has had 'hooks', which are generic words dispatching on a dynamically scoped variable. Hooks can be used with variables which are essentially global: the current OS, current CPU, UI backend, etc -- or variables which are truly context-specific, such as the current database connection.

I added support for hooks to the extra/multi-methods library, which is going in the core soon. While doing this I was able to significantly generalize the concept. Suppose we want to define a piece of functionality which depends on both the operating system and processor type.

We can begin with defining an ordinary generic word:
GENERIC: cache-size ( -- l1 l2 )

Notice that it is defined as taking no inputs from the stack.

I don't really know the APIs involved here, but suppose that Linux gives us a way to get this info that works across all CPUs. So we define a method specializing on the os dynamic variable (which is really treated as a global):
METHOD: cache-size { { os linux } } ... read something from /proc ... ;

Now suppose on Mac OS X we use different APIs per CPU:
METHOD: cache-size { { os macosx } { cpu ppc } } ... ;

METHOD: cache-size { { os macosx } { cpu x86.32 } } ... ;

METHOD: cache-size { { os macosx } { cpu x86.64 } } ... ;

But perhaps on ARM CPUs, we just use an instruction to read the cache size without any OS-specific calls at all:
METHOD: cache-size { { cpu arm } } ... ;

Now in this case, you have an issue where if you're on Linux and ARM, the method that ends up being called depends on the order in which they were defined. If you wanted to explicitly resolve this ambiguity, you would define a new method on { { os linux } { cpu arm } }; because it is more specific than the other two, it is always called first.
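
The resolution rule can be modeled concisely. Here is a toy Python version (not the real dispatch code; all names are made up) where the applicable method matching the most variables wins, with definition order breaking ties:

```python
METHODS = []  # (specializer dict, implementation) pairs, in definition order

def method(spec):
    # Register a method specializing on the given variable bindings.
    def register(fn):
        METHODS.append((spec, fn))
        return fn
    return register

def dispatch(env):
    # Applicable = every variable the method specializes on matches.
    applicable = [(spec, fn) for spec, fn in METHODS
                  if all(env.get(k) == v for k, v in spec.items())]
    if not applicable:
        raise LookupError("no applicable method")
    # Matching more variables means more specific; max() keeps the
    # earliest-defined method on a tie -- the ambiguity described above.
    _, fn = max(applicable, key=lambda m: len(m[0]))
    return fn()

@method({"os": "linux"})
def cache_size_linux():
    return "read /proc"

@method({"cpu": "arm"})
def cache_size_arm():
    return "use the ARM instruction"

@method({"os": "linux", "cpu": "arm"})
def cache_size_linux_arm():
    return "explicit resolution"
```

With this model, dispatch({"os": "linux", "cpu": "arm"}) picks the explicit two-variable method over either single-variable one, just as described above.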

The powerful thing about this new implementation of hooks is that not only can you dispatch on multiple variables, but you can add methods to any old generic which dispatches on a variable and the original designer of the generic does not have to explicitly take this into account.

For example, Factor's compiler currently has a large number of hooks dispatching on the CPU type (as an aside, Phil Dawes wrote an excellent introduction to the Factor compiler recently). If those hooks need to be further refined by OS, as is often the case with FFI-related components, the method implementation on the hook needs to perform its own dispatch; this is the "double dispatch" pattern and design patterns are something to be avoided if one wants to write quality code. When multi-methods go in the core, the compiler will simply define a series of generic words taking no inputs from the stack, and each method will specialize on the CPU, and maybe an OS too.

Another new capability is dispatching off stack values and variables in the same method. Among other things, this will be useful in eliminating a case of double dispatch in the core right now, where the <client> word for opening a client socket has to dispatch off the address type on the stack, and then call another hook which dispatches on the I/O backend stored in a variable. This can be combined into a single generic word where some methods dispatch on stack values, and others dispatch on the I/O backend variable.

The other nice thing about this is that the multi-method GENERIC: word unifies and generalizes four words in the core: GENERIC:, GENERIC# (which dispatches on a value further down on the stack), HOOK: (which dispatches on a variable), and MATH: (which performs double dispatch on numbers only).

One of my goals for Factor 1.0 is to get the object system "right", and with the new-style slot accessors, inheritance, and singletons, we're almost there. All that remains to be done is to merge the multi-methods code. The code is still not quite ready to go in the core, though. The only feature that single dispatch has and multiple dispatch lacks is call-next-method, which is easy to implement. A bigger hurdle to clear is performance; right now multi-methods are implemented in a naive way, where dispatch time is O(mn) with m methods and n stack positions and variables to dispatch on. This can be improved significantly, and I will spend time over the next few weeks reading the literature on optimizing method dispatch.

Saturday, April 05, 2008

GC fixes and improvements

With some advice from Daniel Ehrenberg, I have made a few fixes and improvements to Factor's garbage collector.

First of all, there was a potential memory leak situation I overlooked. Suppose that there is a compiled definition in the code heap which references a very large object in the data heap, and neither the compiled definition or the large object is referenced from anywhere else. Because the code heap was only ever collected when it filled up, it was possible that this large object would never be reclaimed, and it would incur unnecessary collection cycles and memory pressure as a result. This could even result in unbounded heap growth if this was done in a loop. For example, the following test case would crash Factor even though it should have run in constant space:
: leak-step 800000 f <array> 1quotation call drop ;

: leak-loop 1000 [ leak-step ] times ;

The fix is to always collect the code heap when collecting the oldest generation. Only collecting the code heap when it fills up is simply unsound because the code heap can reference objects in the data heap. If this was not the case then collecting them independently would be okay, but it isn't.

The next thing I did was improve how allocation of large objects is handled. Previously, if a new object was too large to fit in the nursery, the entire heap would grow, and every time the heap grew it would increase the size of all generations. This is generally not what you want, because if the nursery is too large, we lose locality, and if the accumulation space is too large we waste time copying objects back and forth that should really be in tenured space.

Now, the nursery and accumulation space have a fixed size that can be changed on startup with command line switches, but never changes while Factor is running. If an attempt is made to allocate an object larger than the nursery, it is directly allocated in tenured space. Presumably, if you're making a 2 megabyte array, you're going to do something with it, and hold on to it for at least a few collection cycles, rather than discard it immediately, so it makes sense to avoid the copying altogether and stick it in tenured space. On contrived microbenchmarks this change might result in worse performance, but on more realistic workloads it should be better; having a small nursery helps with locality, and avoiding the inevitable copying of the large array from the nursery to accumulation space and then to tenured space is good too.
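
As a concrete model of the policy, here is a toy Python allocator (not the VM's code; the structure is invented for illustration) with a fixed-size nursery and direct-to-tenured allocation for oversized objects:

```python
class Heap:
    # Toy model: the nursery has a fixed size chosen at startup; an
    # object larger than the nursery goes straight to tenured space.
    def __init__(self, nursery_size):
        self.nursery_size = nursery_size
        self.nursery_used = 0
        self.tenured = []
        self.minor_gcs = 0

    def allocate(self, size):
        if size > self.nursery_size:
            # Skip the nursery -> accumulation -> tenured copying entirely.
            self.tenured.append(size)
            return "tenured"
        if self.nursery_used + size > self.nursery_size:
            # Nursery full: evacuate survivors (not modeled) and reuse it.
            self.minor_gcs += 1
            self.nursery_used = 0
        self.nursery_used += size
        return "nursery"
```

The point of the model is that a 2 megabyte allocation never perturbs the nursery or triggers heap growth; only small allocations cycle through the young generation.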

Future improvements to the GC will include:
  • Shrinking the heap when memory usage lowers and remains low for several collections (Zed Shaw experimented with implementing this but didn't finish due to lack of time).
  • Allocating chunks from the OS in small increments, say 1mb, instead of allocating one large heap. This would make heap growth more efficient (no need to copy everything over) and it would also allow full use of the entire address space (and not half). However it will require rethinking Factor's card marking write barrier, which currently assumes a contiguous heap.
  • Using mark/sweep/compact for the oldest generation.
  • Incremental marking and sweeping.
Daniel and I will be working on all of this over the summer.

Thursday, April 03, 2008

Inheritance is done

I've implemented the last major missing piece of functionality in the new inheritance system, and that is method dispatch which is compatible with inheritance.

To make inheritance as useful as delegation, there needs to be an easy way to call the next applicable method. With delegation, this was done entirely in the language itself:
M: my-gadget graft*
    dup delegate graft* ... do more stuff ... ;

However the delegate method would not see the current object on the stack; hence the "slicing problem". With inheritance, there is a special word:
M: my-gadget graft*
    dup call-next-method ... do more stuff ... ;

Now call-next-method is a parsing word. It expands to this:
M: my-gadget graft*
    dup \ my-gadget \ graft* (call-next-method) ... do more stuff ... ;

At compile time, (call-next-method) expands into code which checks if the top of the stack is of the correct type for the current method; if not, it throws an error. If the check succeeds, it calls the next applicable method.
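
In Python terms, the expansion amounts to something like the following sketch, with Python classes standing in for Factor classes and all names invented for illustration:

```python
class Foo: pass        # stands in for: PREDICATE: foo < word
class Bar(Foo): pass   # stands in for: PREDICATE: bar < foo

def blah_foo(obj):
    return ["foo"]

def blah_bar(obj):
    return ["bar"] + call_next_method(Bar, obj)

# Methods of the generic, most-specific first (computed statically).
METHODS = [(Bar, blah_bar), (Foo, blah_foo)]

def call_next_method(current_class, obj):
    # The compiled-in check: the receiver must be an instance of the
    # class the current method specializes on.
    if not isinstance(obj, current_class):
        raise TypeError("inconsistent call-next-method")
    idx = next(i for i, (c, _) in enumerate(METHODS) if c is current_class)
    for cls, fn in METHODS[idx + 1:]:
        if isinstance(obj, cls):
            return fn(obj)  # direct call to the next applicable method
    raise LookupError("no next method")

def blah(obj):
    # Top-level generic dispatch: most specific applicable method.
    for cls, fn in METHODS:
        if isinstance(obj, cls):
            return fn(obj)
    raise LookupError("no applicable method")
```

Here blah(Bar()) runs the bar method, which chains to the foo method, mirroring the predicate class example below.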

While I added this feature primarily for use with tuple inheritance, it also works in two other instances:
  • Predicate class inheritance
  • Union classes

The first case was a real pain to deal with before today, because there was no way to call the next method on a parent predicate class. Suppose you had two predicate classes:
PREDICATE: foo < word ... ;

PREDICATE: bar < foo ... ;

And a generic word:
GENERIC: blah

M: foo blah ... ;

M: bar blah ... ;

If bar wanted to do something and then call the foo method, the only way was with a design pattern:
: foo-blah ... ;

GENERIC: blah

M: foo blah foo-blah ;

M: bar blah ... foo-blah ;

This is smelly. Now you can just do:
GENERIC: blah

M: foo blah ... ;

M: bar blah ... call-next-method ;

The situation with unions is similar. However, because unions give you what amounts to multiple inheritance, there is a complication.

The next method is computed statically, which gives different semantics than CLOS. Suppose we have the following code:
TUPLE: a ;
TUPLE: b ;
TUPLE: c ;

UNION: x a b ;
UNION: y a c ;

UNION: z x y ;

We can represent the subtype relationships here as follows:
      Z
     / \
    X   Y
   / \ / \
  B   A   C

Suppose we have a generic word:
GENERIC: funky* ( obj -- )

M: z funky* "z" , drop ;

M: x funky* "x" , call-next-method ;

M: y funky* "y" , call-next-method ;

M: a funky* "a" , call-next-method ;

M: b funky* "b" , call-next-method ;

M: c funky* "c" , call-next-method ;

: funky [ funky* ] { } make ;

Now the following two results are straightforward:
( scratchpad ) T{ b } funky .
{ "b" "x" "z" }
( scratchpad ) T{ c } funky .
{ "c" "y" "z" }

However, suppose you have an instance of a on the stack. The next method after a is either x or y, and this depends on the phase of the moon; the two classes overlap but neither is a subset of the other, so we have some ambiguity here. However, for the sake of argument, suppose the next method is x. Now, statically, the next method after x is z. However, given that we have an a on the stack, the method on y is also applicable, so if we computed it dynamically, the next method would be y and then z. So to summarize, right now, we get one of the following two results:
( scratchpad ) T{ a } funky .
{ "a" "x" "z" }
( scratchpad ) T{ a } funky .
{ "a" "y" "z" }

However, it would be more desirable to get one of the following:
( scratchpad ) T{ a } funky .
{ "a" "x" "y" "z" }
( scratchpad ) T{ a } funky .
{ "a" "y" "x" "z" }

However, doing so would require expanding (call-next-method) into a type dispatch, not a direct call. Implementing this efficiently and updating all occurrences of (call-next-method) as classes are redefined is still an open problem, and one that I will tackle later. The current static behavior will suffice for now.
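
For comparison, the dynamic behavior is essentially what a C3-style linearization gives you. Modeling the unions above as Python mixin superclasses (an approximation, since union membership is not inheritance), super() produces exactly the "desirable" results:

```python
# X and Y both "contain" A, so A inherits from both; Python's C3
# method resolution order then computes the next method dynamically.
class Z:
    def funky(self, out): out.append("z")

class X(Z):
    def funky(self, out): out.append("x"); super().funky(out)

class Y(Z):
    def funky(self, out): out.append("y"); super().funky(out)

class A(X, Y):
    def funky(self, out): out.append("a"); super().funky(out)

class B(X):
    def funky(self, out): out.append("b"); super().funky(out)

class C(Y):
    def funky(self, out): out.append("c"); super().funky(out)

def funky(obj):
    out = []
    obj.funky(out)
    return out
```

Here funky(A()) yields ["a", "x", "y", "z"], because A's linearization is A, X, Y, Z; funky(B()) and funky(C()) give the same results as the Factor version.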

Now that the code is done, I still need to write documentation and update the core to actually use inheritance in place of delegation. After that, I will return to the web framework and let things settle down a bit. Once I'm happy with the stability of both the core and the state of the web framework, I will tackle multi-methods; what this really means is efficient implementation of multi-methods, given that an inefficient implementation already exists in extra/. This presents its own set of unique challenges but I look forward to solving that problem.

Once Factor has multi-methods, our object system will be comparable in power to CLOS; however, I believe it will offer genuine improvements and innovations in several respects. Furthermore, the implementation will be a lot simpler than PCL.

Converting a table to a tree

I found an interesting Haskell snippet in a reddit comment.

From the comment:
Here's a slightly longer function I created a while ago:
listToForest :: Eq a => [[a]] -> Forest a
listToForest = map toTree . groupBy ((==) `on` head) . filter (/= [])
  where toTree = Node . (head . head) <*> (listToForest . map tail)

The function turns a table of results, such as that which might come back from a database query, into a tree. e.g.
[[A, B, C],
 [A, B, D],
 [A, E, F]]

Becomes:
    A
   / \
  B   E
 / \   \
C   D   F

I decided to try implementing this in Factor, and I came up with three versions.

All versions represent a tree as its root node, and each node is a hashtable mapping the key of the child to the child node.

The first version does not use any utility words other than what's in the standard library; it consists of a single, rather long word:
: list>forest ( list -- forest )
    [ empty? not ] subset
    H{ } clone [ [ >r unclip r> [ ?push ] change-at ] curry each ] keep
    [ list>forest ] assoc-map ;

The second version uses a couple of utilities:
: push-at ( value key assoc -- )
    [ ?push ] change-at ;

: classify ( seq classifier -- assoc )
    H{ } clone [ [ slip push-at ] 2curry each ] keep ;

: list>forest ( list -- forest )
    [ empty? not ] subset [ unclip ] classify [ list>forest ] assoc-map ;

The third version has a different implementation of classify which uses extra/fry syntax sugar:
: push-at ( value key assoc -- )
    [ ?push ] change-at ;

: classify ( seq classifier -- assoc )
    H{ } clone [ '[ @ , push-at ] each ] keep ;

: list>forest ( list -- forest )
    [ empty? not ] subset [ unclip ] classify [ list>forest ] assoc-map ;

If this was a one-off thing I'd use the first version. If I was writing production code, I would use one of the second two, and I might even put the utilities in a shared vocabulary since they're generally useful. In the last version, the only stack manipulation operator is keep, and everything else just happens to be on the stack in the correct order. This is how good stack code is written.

Some explanation:
  • ?push is like push except it makes a new vector if the input is f
  • slip calls a quotation underneath the top of the stack.
  • push-at adds a value to the sequence stored at a key in a hashtable.
  • classify takes a sequence and a quotation which decomposes a sequence element into a key and a value. It builds an association mapping all the unique keys to corresponding values.
  • unclip takes a sequence and outputs the rest of the sequence followed by the first element.
  • Haskell's groupBy is a generalization of my classify; it works with an equivalence relation rather than a "fibration" into a set with an implied equivalence relation (wrong term, but I'm not sure what the correct term is).
  • My classify runs in linear time and I suspect groupBy is quadratic, but I could be wrong.

I'd be interested in seeing a version of this function in idiomatic Common Lisp or Scheme. I suspect it would be somewhat longer, but not horribly so.
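
For what it's worth, here is a Python rendering of the same algorithm (dicts standing in for both hashtables and nodes; the names are mine):

```python
def list_to_forest(rows):
    # A tree is a dict mapping each key to the forest built from the
    # tails of the rows that shared it as their first element.
    rows = [row for row in rows if row]            # [ empty? not ] subset
    grouped = {}
    for head, *tail in rows:                       # unclip
        grouped.setdefault(head, []).append(tail)  # push-at
    return {key: list_to_forest(tails) for key, tails in grouped.items()}
```

On the example table, list_to_forest([["A", "B", "C"], ["A", "B", "D"], ["A", "E", "F"]]) produces {"A": {"B": {"C": {}, "D": {}}, "E": {"F": {}}}}, matching the tree above. Like classify, the grouping pass is linear in the number of rows.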