Saturday, December 22, 2007

Going to Dominican Republic

I will be in the Dominican Republic from December 23rd to January 7th.

I wanted to blog about something I've been working on for the last week and a half, but didn't have time: basically, redefining a word will no longer recompile all words which depend on this word, so that loading a library such as math.ranges, which defines a new sequence type, no longer recompiles ~7000 words which directly or indirectly call length, nth, and so on. This will be a big productivity boost. It is a big change, but I'm trying to do it right, by re-designing the compiler to get rid of some cruft that's been building up for the last few years. When I return, I will write a detailed entry explaining exactly what's going on and what's changing.

I'm bringing my laptop with me, so if I get bored of sitting on the beach (and if the girl lets me) I'll try to do some hacking. However, I don't think I'll have Internet access. As usual, I'm going to leave Daniel Ehrenberg in charge of harvesting patches. If you have anything in your repository, ping him on IRC (he's littledan) so that he can pull. And if you want the latest changes, pull from his repository (http://littledan.onigirihouse.com/factor.git/).

Have a good Christmas and New Year's everybody!

Thursday, December 13, 2007

Factor 0.91 now available

After several months of intense development, Factor 0.91 is available for download. Check out the change log.

Due to lack of interest, OpenBSD and Solaris are not supported anymore. However, to compensate, the Windows CE port has improved, and we have preliminary support for 64-bit Mac OS X Intel (no 64-bit UI yet; that's coming in 0.92.)

Don't forget to subscribe to the Planet Factor Atom feed!

Wednesday, December 12, 2007

Improved help.lint tool

The help.lint tool ensures that code examples in documentation produce the output that they claim to produce, and also checks that stack effects in documentation match the source, as well as finding broken links. I have made it easier to use, and no longer dependent on editor integration. Now, you just run "my-vocab" check-vocab-help, and any problematic words are listed. It is best to run this from the UI, so that the errors are clickable presentations. You can then right-click on any word and choose "Edit" from the menu.

I encourage all contributors to run this tool against their documentation. Just as you should always run unit tests before releasing changes, you should always run this tool too.

Saturday, December 08, 2007

Factor-powered pastebin with syntax highlighting for 100+ languages

I'm proud to announce a major new version of extra/webapps/pastebin. I've deployed the new pastebin on an experimental HTTP server. I hope it doesn't crash by the time you read this. The major new feature is syntax highlighting support.

For the last two weeks or so I've been working on a Factor port of jEdit's syntax highlighting engine. For those who have never used jEdit, syntax highlighting rules are specified in XML files. Rules either involve literal or regular expression matching. Factor already had an XML parser which works pretty well, but regexps were missing.

I've been working with Doug to implement a new regexp library; he will blog about it in the next few days. It suffices to say that it is coming along very well, and the code is very concise and readable, as is the norm with Factor. We're using Chris Double's parser-combinators library to parse the regexp itself, then construct parser combinators from the regexp.

There is quite a bit of history behind the jEdit syntax highlighting engine. jEdit 1.2, released in late 1998, was the first release to support syntax highlighting. It featured a small number of hand-coded "token markers" -- simple incremental parers -- all based on the original JavaTokenMarker contributed by Tal Davidson.

Around the time of jEdit 1.5 in 1999, Mike Dillon began developing a jEdit plugin named "XMode". This plugin implemented a generic, rule-driven token marker which read mode descriptions from XML files. XMode eventually matured to the point where it could replace the formerly hand-coded token markers.

With the release of jEdit 2.4, I merged XMode into the core and eliminated the old hand-coded token markers.

As you can guess from the age of the code, many design decisions date back to Java 1.1, a long-forgotten time when Java VMs with JIT compilers were relatively uncommon, object allocation was expensive, and heap space tight. As a result the parser is basically procedural spaghetti code, with lots of mutable state, the sort that gives Haskell programmers nightmares. So while the code is ugly and the parser design is archaic, the huge advantage is that there is a large suite of contributed modes ready to go.

I expected it would be a pain to port to Factor, but this hasn't been the case. Even though I kept the original design, warts and all, the resulting Factor code is pretty decent. I intend to refactor it to use a more modern design and take advantage of Factor's abstraction features instead of just representing parser state with an bunch of integer and boolean-valued variables. I also need to optimize it more.

Because XMode is an incremental parser, it would be easy to cook up an editor gadget with support for syntax highlighting in the UI. I'll probably do that at some point. Then it will be easy for a contributor to build a text editor in Factor, and we will have come full circle. :-)

Being able to implement a syntax highlighting pastebin in Factor -- one that manages to reuse jEdit mode files no less -- is a milestone. It means we can do XML, regular expressions, web applications, and not to mention the various minor bits and pieces which support these components, such as date/time support, various parsers, and so on. We have different people working on different libraries, and everyone is able to understand and reuse other people's work. It only took a handful of contributors to build all this infrastructure.

Wednesday, December 05, 2007

planet.factorcode.org now has an Atom feed

Thanks to Daniel Ehrenberg, who extended our extra/rss RSS/Atom parser with Atom output support, planet.factorcode.org now publishes an Atom feed. Subscribe away to receive your regular fix of Factor propaganda!

Sunday, November 25, 2007

Factor 0.91 benchmarks

I ran the benchmark suite against Factor 0.91. Compared to last time, there are some nice improvements in bootstrap1, iteration, reverse-complement and ring. There are a few regressions too; I've highlighted those in red and will be investigating them.
Benchmark Run time (ms)
bootstrap1 11452
bootstrap2 575675
continuations 389
dispatch1 3660
dispatch2 4399
dispatch3 5167
dispatch4 8892
empty-loop 496
fib1 238
fib2 907
fib3 1438
fib4 4962
fib5 2015
iteration 13864
mandel 4806
nsieve 2428
nsieve-bits 69021
partial-sums 31281
raytracer 26893
recursive 45196
reverse-complement 12641
ring 8051
sort 2061
spectral-norm 35304
typecheck1 5136
typecheck2 5624
typecheck3 5632
typecheck4 4049

A year and a half ago, reverse-complement was clocking in at 170 seconds. It's 14 times faster now!

Saturday, November 24, 2007

The process launcher, parser combinators, and /bin/sh

The io.launcher vocabulary lets you specify a command line in one of two ways:
"foo bar baz" run-process
{ "foo" "bar" "baz" } run-process

Both have their advantages and disadvantages. The first one is more amenable to user input, the second one is safer and more predictable if some of the arguments are themselves user input and may contain special characters.

Until now, the first form was implemented on Unix by passing the command line to /bin/sh. The shell would take care of tokenizing the command line, parsing quotes and backslashes, etc. This is problematic, for two reasons. The practical reason is that the shell does a hell of a lot of stuff (expanding env vars, tokenizing on $IFS which may change, etc) which is hard to control and predict, hence the regular security advisories we see regarding programs which use system() carelessly. The second reason is philosophical -- Factor should not depend on an external tool for such a trivial task (tokenizing arguments).

Java side-steps the whole issue in a lame way. The Java API also supports launching a process specified as a command line string, or as an array of strings. But their logic for tokenizing a command line string is simplistic, they just split the input on spaces. No quoting or escaping allowed. In Factor, this could be achieved just by calling " " split, but it is lame.

This is Factor, Factor is awesome, and parsing a simple command line string while respecting quoting and escaping should be simple. And it turns out that it is! I haven't used Chris Double's parser-combinators library for anything serious up til now, and I was (pleasantly) surprised at how simple it was. Your grammar really does map pretty much directly to parser combinators, and the clever implementation of the built-in combinators means you pretty much get a parse tree for free, with very little processing of the parse result. Chris did a wonderful job and of course the clever people in the Haskell community who invented parser combinators deserve a lot of praise too.

The command line grammar begins by defining a parser for escaped characters, and another parser for a sequence of escaped and unescaped characters. Note that the former uses &>, so that an escaped character parser result is the actual character itself, without the backslash:

LAZY: 'escaped-char' "\\" token any-char-parser &> ;

LAZY: 'chars' 'escaped-char' any-char-parser <|> <*> ;

Next up, we use the surrounded-by parser combinator to build parsers for quoted strings:
LAZY: 'quoted-1' 'chars' "\"" "\"" surrounded-by ;

LAZY: 'quoted-2' 'chars' "'" "'" surrounded-by ;

The parse result of a surrounded by parser is the result of the parser combinator in the body, so the quotes are stripped off for us automatically. Neat.

Next up, we have a parser for non-whitespace characters used in unquoted tokens:
LAZY: 'non-space-char'
'escaped-char' [ CHAR: \s = not ] satisfy <|> ;

Since we can still escape a space in a unquoted token, we re-use our escaped char parser here.
Now a parser for unquoted tokens:
LAZY: 'unquoted' 'non-space-char' <+> ;

Finally a parser for any type of argument, whether it be quoted or unquoted; we use <@ here to convert the parse result to a string (the result of <*> is an array of parsed elements; the elements are characters because of how we defined our character parsers; we turn this array of characters into a string.)
LAZY: 'argument'
'quoted-1' 'quoted-2' 'unquoted' <|> <|>
[ >string ] <@ ;

Now we have our top-level parser. We define it in a memoized word, so that it is only ever constructed once and the same instance is reused:
MEMO: 'arguments' ( -- parser )
'argument' " " token <+> list-of ;

Finally a word which uses this parser:
: tokenize-command ( command -- arguments )
'arguments' parse-1 ;

The parse-1 word is a utility word which outputs the parse result from the first match (the parse word outputs a lazy list of matches).

Here's an example:
"hello 'world how are you' "\\"today\\"" blah\\ blah" tokenize-command .
{ "hello" "world how are you" "\"today\"" "blah blah" }

The entire parser is 19 lines of code, or 8 lines if we remove blank lines and squish everything together. It reads easily because it is essentially the BNF grammar we're parsing here, just written in a postfix style with some additional fluff. There's no need to use a parser generator or any kind of external tool in the build process; this is just Factor code. Sure beats calling /bin/sh.

Friday, November 23, 2007

Adventures with Cygwin on 64-bit Windows

Warning: this is a rant. If you don't like rants, skip this post and look at my previous post, which is a lot more substantial than this one.

So at FactorCon 2007, Doug and I experimented with getting Factor going on 64-bit Windows XP. The Factor compiler and FFI is ready to go, and it should be an easy port in theory, but the only problem is that we have to compile the VM (which is written in C). And this isn't trivial at all. Here is what we found:
  • First of all, Factor compiled with 32-bit Cygwin runs in 32-bit mode under "WoW emulation", however certain features don't work, for example UDP support. This is a bug in Windows and there's nothing we can do; a C testcase demonstrates the problem quite clearly. The problem is not present in 64-bit mode.
  • Cygwin also ships with broken implementations of certain C runtime functions, such as _osf_openhandle(). We have a testcase which fails in Cygwin and works in Visual Studio (I reported the bug to the Cygwin guys.)
  • 32-bit Mingw is highly unstable on 64-bit Windows -- you need an obscure workaround just to get it to run at all -- however it does not suffer from the _osf_openhandle() bug. The UDP sockets problem is still there since its a Windows bug, not specific to Cygwin or Factor.
  • There's a 64-bit Mingw port in progress, however we were unable to get it to work. Using optimization flags crashes gcc; without optimization flags gcc produces a non-functioning binary, and PE Explorer claims this binary is corrupt.
  • There are no plans at all to port Cygwin to 64-bit Windows.

So the executive summary is that we're pretty much fucked when it comes to 64-bit Windows support, and all because I made a serious miscalculation and used GNU C extensions in the VM. I should have stuck with semi-portable C and then we could use Visual Studio...

If we could compile the Factor VM as a 64-bit binary, then porting the rest of Factor would be trivial; after all, we support Windows, and we support AMD64, and we're awesome programmers. However, the GNU hippies fucked up once again and failed to deliver -- AMD64 chips have been out since 2004 and Windows XP x64 edition was released at the start of 2005, but at the end of 2007, more than two years later, there is still no stable 64-bit GNU toolchain available for Windows.

I realize that this is all just people's hobbies, they work on Cygwin/Mingw in their spare time, I shouldn't expect timely releases or stability of any sort, etc. But hell, if you GNU people want "world domination", if you want us to stop using Mac OS X, Windows and other commercial software, if you want market share greater than 0.00001%, if you want people to take GNU and Linux seriously, get to work and fix your shit. Thank you.

Deploying stand-alone applications on Windows

Everybody loves Microsoft and Windows, and Steve Ballmer is probably the greatest man alive, so for me, a solid Windows port of Factor is a high priority.

The major blocker to getting the deploy tool working on Windows was the lack of <process-stream> support. Doug had some incomplete code written for this earlier. By looking at his code together with the Twisted Python implementation of the same feature, I was able to get it working. The trick is that Windows only supports blocking reads/writes on anonymous pipes, so one has to use named pipes (and resort to stupid tricks like naming them using a random number/milliseconds to ensure uniqueness), which do support non-blocking I/O via completion ports.

So, here is an example. We wish to deploy the Tetris demo, so we type this in the listener:
"tetris" deploy-tool

This opens a window:

Now, we click "Deploy". This spawns a new Factor process which bootstraps and tree shakes a tetris image, dumping output into the current listener window (this is where <process-stream> comes in). After a few minutes, we get a "Tetris" directory which has everything we need:

Note the FreeType files and DLLs; they will go away as soon as we ditch FreeType and start using native font rendering APIs (real soon now), so in the future all you'll have is the executable, the image file, and factor-nt.dll. Double-clicking the executable runs the program:

Now the raptor icon is nice and all, but eventually we'll also have a way to specify custom icons for deployment, on both Mac OS X and Windows.

The completion of the Windows <process-stream> implementation marks a milestone in Factor development. The Windows I/O subsystem has now reached feature parity with the Unix side. This means Factor now offers high-level, cross-platform APIs for the following tasks:
  • Creating, renaming, copying, deleting files and directories
  • Reading and writing files sequentially
  • Memory mapped files
  • TCP/IP sockets
  • UDP/IP sockets
  • IPv4 and IPv6 addressing, host name lookups
  • Launching processes
  • Launching processes with redirected I/O

Where possible, all I/O is non-blocking (the user-level API is presented as a blocking stream API, however under the hood Factor's co-operative thread scheduler yields between threads while waiting on I/O).

On Windows CE, the I/O support is spotty. Windows CE doesn't support pipes or non-blocking I/O, so certain features are not implemented, but we try the best we can.

Factor has an awesome FFI and now it has an awesome I/O library, which will still gain additional features in the future, for example Doug is already working on Windows directory change notification and I'll be doing the Unix side of that soon. A small step for Factor, a giant leap for stack languages!

Saturday, November 17, 2007

Converting a file from big endian to little endian (or vice versa)

Suppose you have a file of 4-byte cells and you wish to reverse each cell. What do you do? Fire up Factor, the latest code from git at least!
USING: io.mmap io.files math.functions splitting sequences ;

: byte-swap ( file -- )
dup file-length 4 align [
4 <sliced-groups> [ reverse-here ] each
] with-mapped-file ;

Given a sequence, 4 <groups> creates a virtual sequence of 4-element slices of the underlying sequence. We apply reverse-here to each slice. Since slices share storage with their underlying sequene, this has the effect of reversing a portion of the underlying sequence in-place. But here, the underlying sequence is a memory-mapped file's bytes.

The interesting thing here is that we're composing two orthogonal features: memory mapped files, and sequence operations.

Also note that error handling is at least half way there: if an error is thrown for whatever reason, the memory mapping is closed and cleaned up for you.

I just got mmap working on Windows CE, too. So the above code will work any supported platform.

Monday, November 12, 2007

CGI support in the Factor HTTP server

Now that io.launcher has the ability to pass environment variables to child processes, I was able to implement CGI support very easily. I realize CGI is slow and obsolete, however it is still handy sometimes, as a lowest common denominator for integrating third-party web apps. I intend to use it to run gitweb.cgi (a git repo browsing tool written in Perl) on factorcode.org.

The CGI support is loaded by default when one does USE: http.server, however to enable it the cgi-root variable must be set to a directory containing CGI scripts. This variable can either be set globally, per vhost, or per responder.

Files whose extension is .cgi are executed and the results are served to the client; other files are served literally.

The implementation of the CGI responder is quite straightforward. First, we have some boilerplate:
! Copyright (C) 2007 Slava Pestov.
! See http://factorcode.org/license.txt for BSD license.
USING: namespaces kernel assocs io.files combinators
arrays io.launcher io http.server http.server.responders
webapps.file sequences strings ;
IN: webapps.cgi

Now, we define the configurable variable:
SYMBOL: cgi-root

Next, we have a word to build the associative mapping of environment variables which we pass to the child process. The HTTP server calls responders after storing various values pertaining to the request in variables; our task here is to convert the HTTP server's representation of this data into a form which is expected by the CGI script. This word breaks the "keep words short" rule, but its definition is so straightforward that it is readable regardless, and breaking it up into smaller words would probably not help:
: post? "method" get "post" = ;

: cgi-variables ( name -- assoc )
[
"SCRIPT_NAME" set

"CGI/1.0" "GATEWAY_INTERFACE" set
"HTTP/1.0" "SERVER_PROTOCOL" set
"Factor " version append "SERVER_SOFTWARE" set
host "SERVER_NAME" set
"" "SERVER_PORT" set
"request" get "PATH_INFO" set
"request" get "PATH_TRANSLATED" set
"" "REMOTE_HOST" set
"" "REMOTE_ADDR" set
"" "AUTH_TYPE" set
"" "REMOTE_USER" set
"" "REMOTE_IDENT" set

"method" get >upper "REQUEST_METHOD" set
"raw-query" get "QUERY_STRING" set

"User-Agent" header-param "HTTP_USER_AGENT" set
"Accept" header-param "HTTP_ACCEPT" set

post? [
"Content-Type" header-param "CONTENT_TYPE" set
"raw-response" get length "CONTENT_LENGTH" set
] when
] H{ } make-assoc ;

Next, we have a word which constructs a launch descriptor for the CGI script. See my post from earlier today about io.launcher improvements to learn about descriptors, which are a new feature. This word uses the cgi-root variable together with the above cgi-variables word:
: cgi-descriptor ( name -- desc )
[
cgi-root get over path+ 1array +arguments+ set
cgi-variables +environment+ set
] H{ } make-assoc ;

Now, the word which does the hard work. It spawns the CGI script, sends it the user input if the request type is a POST, then copies output to the socket:
: (do-cgi) ( name -- )
"200 CGI output follows" response
stdio get swap cgi-descriptor &t;process-stream> [
post? [
"raw-response" get
stream-write stream-flush
] when
stdio get swap (stream-copy)
] with-stream ;

Files whose extension is not .cgi are simply served to the client; we have a little word which fakes a file responder:
: serve-regular-file ( -- )
cgi-root get "doc-root" [ file-responder ] with-variable ;

Now, the main dispatcher. First we weed out invalid inputs, pass files which don't end with .cgi to serve-regular-file, and send everything else to (do-cgi):
: do-cgi ( name -- )
{
{ [ dup ".cgi" tail? not ] [ drop serve-regular-file ] }
{ [ dup empty? ] [ "403 forbidden" httpd-error ] }
{ [ cgi-root get not ] [ "404 cgi-root not set" httpd-error ] }
{ [ ".." over subseq? ] [ "403 forbidden" httpd-error ] }
{ [ t ] [ (do-cgi) ] }
} cond ;

That's it as far as code is concerned. All that's left is to register a responder which calls do-cgi:
global [
"cgi" [ "argument" get do-cgi ] add-simple-responder
] bind

Improved process launcher

The io.launcher vocabulary now supports passing environment variables to the child process.

Launching processes is a complex and inherently OS-specific task. For example, until version 1.5, Java had a bunch of methods in the Runtime class for this, each one taking a different set of arguments:
Process exec(String command)
Process exec(String[] cmdarray)
Process exec(String[] cmdarray, String[] envp)
Process exec(String[] cmdarray, String[] envp, File dir)
Process exec(String command, String[] envp)
Process exec(String command, String[] envp, File dir)

The fundamental limitation here was that it would always launch processes with stdin/stdout rebound to a new pipe; so there was no easy way to launch the process to read/write the JVM's stdin/stdout file descriptors, you had to spawn two threads to copy streams. In 1.5, they added a new ProcessBuilder class, but it doesn't seem to offer any new functionality except for merging stdout with stderr. It still always opens pipes.

The approach I decided to take in Factor is similar to the ProcessBuilder approach, but more lightweight. The run-process word takes one of the following types:
  • A string -- in which case it is passed to the system shell as a single command line
  • A sequence of strings -- which becomes the command line itself
  • An association -- a "process descriptor", containing keys from the below set.

The latter is the most general. The allowed keys are all symbols in the io.launcher vocabulary:
  • +command+ -- the command to spawn, a string, or f
  • +arguments+ -- the command to spawn, a sequence of strings, or f. Only one of +command+ and +arguments+ can be specified.
  • +detached+ -- t or f. If t, run-process won't wait for completion.
  • +environment+ - an assoc of additional environment variables to pass along.
  • +environment-mode+ - one of prepend-environment, append-environment, or replace-environment. This value controls the function used to combine the current environment with the value of +environment+ before passing it to the child process.

While run-process spawns a process which inherits the current one's stdin/stdout, <process-stream> spawns a process reading and writing on a pipe.

Idiomatic usage of run-process uses make-assoc to build the assoc:
[
{ "ls" "/etc" } +arguments+ set
H{ { "PATH" "/bin:/sbin" } } +environment+ set
] H{ } make-assoc run-process


I'm happy with the new interface. With two words it achieves more than Java's process launcher API, which consists of a number of methods together with a class.

Not only is the interface simple but so is the implementation.

The run-process first converts strings and arrays into full-fledged descriptors, then calls run-process* which is an OS-specific hook.

In the implementation of this hook, the fact that any assoc can be treated as a dynamically scoped namespace really into play. The io.launcher implementation has a pair of words:
: default-descriptor
H{
{ +command+ f }
{ +arguments+ f }
{ +detached+ f }
{ +environment+ H{ } }
{ +environment-mode+ append-environment }
} ;

: with-descriptor ( desc quot -- )
default-descriptor [ >r clone r> bind ] bind ; inline

Passing a descriptor and a quotation to with-descriptor calls it in a scope where the various +foo+ keys can be read, assuming their default values if they're not explicitly set in the descriptor.

So, if we look at the Unix implementation for example,
M: unix-io run-process* ( desc -- )
[
+detached+ get [
spawn-detached
] [
spawn-process wait-for-process
] if
] with-descriptor ;

It calls various words, all of which simply use get to read a dynamically scoped variable. These are resolved in the namespace set up by with-descriptor. For example, there is a word
: get-environment ( -- env )
+environment+ get
+environment-mode+ get {
{ prepend-environment [ os-envs union ] }
{ append-environment [ os-envs swap union ] }
{ replace-environment [ ] }
} case ;

It retrieves the environment set by the user, together with the environment of the current process, and composes them according to the function in +environment-mode+. There is no need to pass anything around on the stack; any word can call get-environment from inside a with-descriptor.

Notice that constructing a namespace with make-assoc, and then passing it to a word which binds to this namespace and builds some kind of object, is similar to the "builder design pattern" in OOP. However, it is simpler. Instead of defining a new data type with methods, we essentially just construct a namespace then run code in this namespace.

Windows CE native I/O

I fixed some bitrot in Doug's Windows CE I/O code, and added support for console I/O, and the <file-appender> word.

Recall that Factor uses ANSI C I/O functions (fread, fwrite, on FILE* pointers...) unless there is an OS-specific I/O subsystem implemented.

The ANSI C I/O functions don't perform any buffering with the (already slow) Windows CE console, which results in very slow output. Having console I/O go through a buffered I/O layer which uses native Windows APIs has improved performance considerably. Words such as see no longer make you wait and watch as individual words are printed out.

Also, a native I/O layer means networking (TCP and UDP) may be used. There are other, more minor features as well (creating directories, removing files, etc).

Unfortunately, we don't have non-blocking I/O on Windows CE, because there is no general I/O multiplexer like IO completion ports on Windows or select() on Unix. Console and file I/O is always blocking, and network I/O can go through a Winsock-only multiplexer.

In the future, I'll write a small piece of C code which is linked with the Factor runtime. This code will start a network multiplexer running in its own thread; the thread will post events to an event queue. Then, the Factor event loop can use WaitForMultipleObjects() to multiplex over the I/O event queue and the UI event queue.

However, for now, I'm done with the CE port. There is still a bug in the ARM backend preventing Chris Double from bootstrapping Factor on his Nokia n800; I'm going to look into this next. In Factor 0.92, I will implement mark/sweep/compact garbage collection for the oldest generation, which will reduce memory usage on Windows CE, and I will also look into getting the UI running there.

Factor will be the premier high-level language for Windows CE development!

Network clipboard tool written in Factor

So lately I've been going back and forth between two machines, my Mac and my Windows box. Getting small chunks of text between the machines (stuff other than what goes in the git repository) is a bit inconvenient. So I cooked up a tool to synchronize their clipboards. It is very rough, but you can use it as follows:
  • network-clipboard-tool - opens the tool, showing a list of remote hosts. Each host is a presentation.
  • "foo" add-network-clipboard - adds a remote host to the host list.
  • Clicking Clipboard Server in the window allows other machines to add this host to their network clipboard host list.
  • Right-clicking on a host presentation and invoking Send Clipboard sends the current machine's clipboard to the remote host.
  • Right-clicking on a host presentation and invoking Receive Clipboard sets the current machine's clipboard to that of the remote host.

Since bidirectional transfer is supported, only one of the two machines needs to have a server running, but for convenience I run it on both. This is what it looks like, after right-clicking on the second host in the list:

The code is reasonably compact while still demonstrating TCP/IP networking and a number of UI features, such as models, operations, presentations, and commands. It has several limitations:
  • Only supports Windows and Mac, not X11, since on X11 the UI uses a different set of words for clipboard access.
  • There's no graphical way of editing the host list
  • Perhaps even better, support for something like Apple's ZeroConf
  • There is no security, but if you use it on a private LAN there is no problem

The code is in the repository now, in extra/network-clipboard. To try it out, just grab the latest Factor and do
"network-clipboard" run

The program begins with the usual boilerplate:
! Copyright (C) 2007 Slava Pestov.
! See http://factorcode.org/license.txt for BSD license.
USING: io.server io.sockets io strings parser byte-arrays
namespaces ui.clipboards ui.gadgets.panes ui.gadgets.scrollers
ui.gadgets.buttons ui.gadgets.tracks ui.gadgets ui.operations
ui.commands ui kernel splitting combinators continuations
sequences io.streams.duplex models ;
IN: network-clipboard

A utility word to slurp in the current stream. It differs from the contents word in the io vocabulary in that it operates on stdio and doesn't close it at the end:
: contents ( -- str )
[ 1024 read dup ] [ ] [ drop ] unfold concat ;

Now we can implement a very simple network protocol for getting and setting clipboard contents:
: get-request
clipboard get clipboard-contents write ;

: set-request
contents clipboard get set-clipboard-contents ;

We use the with-server combinator, which sets up a simple multi-threaded accept loop with logging:
: clipboard-port 4444 ;

: clipboard-server ( -- )
clipboard-port internet-server "clip-server" [
readln {
{ "GET" [ get-request ] }
{ "SET" [ set-request ] }
} case
] with-server ;

We declare that this word is an UI command which is nullary (does not operate on a target option) and wishes to display output in a listener:
\ clipboard-server H{
{ +nullary+ t }
{ +listener+ t }
} define-command

A combinator to make a network connection:
: with-client ( addrspec quot -- )
>r <client> r> with-stream ; inline

A host type and a simple protocol where either a string or a host can answer its name:
TUPLE: host name ;

C: <host> host

M: string host-name ;

Sending the current clipboard to another host:
: send-text ( text host -- )
clipboard-port <inet4> [ write ] with-client ;

: send-clipboard ( host -- )
host-name
"SET\n" clipboard get clipboard-contents append swap send-text ;

We define it as an UI operation on hosts:
[ host? ] \ send-clipboard H{ } define-operation

Requesting another host's clipboard:
: ask-text ( text host -- )
clipboard-port <inet4>
[ write flush contents ] with-client ;

: receive-clipboard ( host -- )
host-name
"GET\n" swap ask-text
clipboard get set-clipboard-contents ;

[ host? ] \ receive-clipboard H{ } define-operation

Printing host presentations:
: hosts. ( seq -- )
"Hosts:" print
[ dup <host> write-object nl ] each ;

The UI tool class:
TUPLE: network-clipboard-tool ;

We define a command map for the toolbar, containing a single command:
\ network-clipboard-tool "toolbar" f {
{ f clipboard-server }
} define-command-map

The constructor creates a new network clipboard tool instance, adding a toolbar and a list of hosts there. It takes a model as an input:
: <network-clipboard-tool> ( model -- gadget )
\ network-clipboard-tool construct-empty [
toolbar,
[ hosts. ] <pane-control> <scroller> 1 track,
] { 0 1 } build-track ;

Now, a global model containing clipboard hosts and words to maintain it:
SYMBOL: network-clipboards

{ } <model> network-clipboards set-global

: set-network-clipboards ( seq -- )
network-clipboards get set-model ;

: add-network-clipboard ( host -- )
network-clipboards get [ swap add ] change-model ;

The final word -- it opens a window containing a new instance of this tool, with the global hosts model:
: network-clipboard-tool ( -- )
network-clipboards get
<network-clipboard-tool> "Network clipboard" open-window ;

Friday, November 09, 2007

Windows CE Pocket Console crash when writing long strings

This one was a pain to track down. If you write a string longer than about 1024 characters to the console, it just crashes your application. Test case:

#include <windows.h>

#define SIZE 1124

int main() {
HANDLE h = _fileno(_getstdfilex(1));
char buf[SIZE];
memset(buf,'x',SIZE);
unsigned int count = 0;
printf("About to call WriteFile()\n");
int result = WriteFile(h,buf,SIZE,&count,NULL);
printf("\n");
printf("WriteFile returns %d\n",result);
}

It never gets as far as "WriteFile returns...". If you change SIZE to something small like 124 it works.

Tuesday, November 06, 2007

Asus Eee PC

Looks like I've finally found a laptop that I can like: the Asus Eee PC. I'm not a fan of your average laptop; huge, heavy, runs through the battery in 3 hours, and cooks my balls. This "eee PC" only has a 7" screen, solid state disk, and (presumably) good battery life. I need to check out a few more reviews, but I'm probably going to buy one as soon as its released.

Lie algebra cohomology and cellphones

The Windows CE port is now working again. I can bootstrap and run tests successfully.
There are three main limitations:
  • Memory usage is high because of the copying collector. I will implement mark/sweep/compact collection for the oldest generation soon.
  • Network sockets and non-blocking I/O is not supported yet. I need to finish off Doug's io.windows.ce code for that to work.
  • Factor must be run from inside the command prompt. The command prompt is awkward to install and use. Eventually I will port the Factor UI to Windows CE, in some cut-down form, and you'll get a nice listener instead.

You can download a binary package I threw together. It includes the command prompt, which might not be legal. To run it, you must follow these steps:
  • Unzip the ZIP file on your phone; root directory is the easiest place, otherwise you'll be typing full pathnames all the time (Windows CE doesn't have a "current directory" concept).
  • Run pocketconsole.arm.cab to install the console driver.
  • Open a registry editor such as TRE, and change the HKEY_LOCAL_MACHINE\Drivers\Console\OutputTo to the number 0.
  • Run cmd.exe.
  • In the command prompt, run factor-ce.
  • If you want, copy the core and extra directories from the Factor 0.91 sources, so that you can load external modules.

Annoyingly error-prone and long-winded? You bet! Soon you'll just be able to double-click on factor-ce.exe and it will start an UI listener, but for now the command prompt is as good as it gets.

To help with code input on the small dinky cellphone keyboard, I loaded some shortcut definitions. So now you can write u tools.time instead of USE: tools.time, etc.

Now, where does Lie algebra cohomology come into this? Well, I tried running extra/koszul on the phone, and it works fine. This is probably one of the least practical things I have ever done in my life, but it is a good way of testing whether a non-trivial program can run.

Playing around with extra/koszul and math stuff on the phone made me wonder. A modern cell phone has a much higher resolution (not to mention color) screen compared to a graphing calculator. Furthermore it has more RAM and a faster CPU. For example, the TI Voyage 200 is rather bulky and only has 188K of user-available RAM. My phone has more than 300 times that. The TI-89 is pretty much the same but with a different form factor. The calculators have an advantage when it comes to input, but perhaps predictive completion, handwriting recognition and the iPhone multitouch can narrow the gap there. A good CAS for cellphones and PDAs could cut into HP and TI's business...

Monday, November 05, 2007

Problems with flushing the instruction cache on Nokia n800

Chris Double noted that he was having trouble bootstrapping Factor on his Linux/ARM-based Nokia n800 Internet tablet. The crashes were random and we determined that it was due to the instruction cache not being flushed.

Factor was using the following code to flush the instruction cache:
syscall(__ARM_NR_cacheflush,start,start + len,0);

Now I wasn't checking the return value of this... silly me. After adding a check for the return value, Chris's tablet was returning "operation not permitted" from the above call.

After some Googling, I came across a post on the Parrot mailing list. Here, they were flushing the instruction cache by calling the system call directly, without using the syscall wrapper. I asked Chris to try their code, and it worked. Here it is:
 __asm__ __volatile__ (
"mov r0, %1\n"
"sub r1, %2, #1\n"
"mov r2, #0\n"
"swi " __sys1(__ARM_NR_cacheflush) "\n"
"mov %0, r0\n"
: "=r" (result)
: "r" (start), "r" (start + len)
: "r0","r1","r2");

Now, I'm pretty confused by this. In theory, the syscall function should do the same thing as the above. So either the syscall function in the version of libc shipped by Nokia is broken, or I'm stupid and missing something obvious.

Either way, it appears that Factor works on the n800 now.

Tuesday, October 30, 2007

Playing around with the UI

I'm working on a graphical front-end for Factor's stand-alone app deployment tool. I implemented checkbox and radio button gadgets and threw together a prototype. Check out how the UI is constructed in the source code; it's pretty neat.

Sunday, October 28, 2007

Improved usability of profiler tool

Factor 0.90 introduced a profiler:
enable-profiler

[ ... ] profile

disable-profiler

The problem was that the enable-profiler/disable-profiler calls would recompile all words to add/remove a prologue which incremented the word's call count. The point here was that we only want a profiler prolog when we're actually profiling, and not all the time, because of the time overhead in testing the global profiling flag on every word invocation. However, in practice this meant that I rarely used the profiler, because recompiling all words takes a minute or two.

Today I changed the implementation to be much more clever. Now, words are always compiled with profiler prologues. The trick is that unless the profiler is enabled, the entry point of the word points at just after the profiler prolog. When the profiler is enabled, the entire code heap is scanned, and pointers to words are updated to point to before the prolog.

Seems like a pretty obvious idea in hindsight. I know I'll end up using the profiler a lot more now. I also want to experiment with using it to implement new tools, such as test coverage. Basically you'd run unit tests for a certain vocabulary with the profiler enabled, then look at what percentage of words in that vocabulary were called. This won't test branch coverage, but Factor words tend to be short, so a coarse word-oriented coverage tool can still be useful here.

SourceForge mailing list archive problems

It seems that the SourceForge Factor-talk mailing list archive has stopped archiving posts. The last archived post is from 11 days ago, but there were numerous posts since then.

The Factor project's entire Internet presence was formerly hosted on SourceForge. At some point I moved the web site, then downloads, over to my server. Hopefully soon, the mailing list will follow suit, and I can stop using SourceForge forever.

jEdit was SourceForge project #588. It was okay back then. Now there are 160,000 projects! In the last several years, I've experienced many "SourceForge moments":
  • CVS servers which are slow and almost always down
  • Anonymous CVS which lags behind developer CVS by days
  • Mysterious download corruption
  • Weird outages based on the first letter of the project (eg, "all projects starting with G-R are inaccessible")
  • Mailing lists where posts take days to appear
  • Not being able to log in on some pages
  • Ridiculously low disk quotas for projects
  • Download counters took them at least a year to implement properly

Of course this is a free service, blah blah, so I shouldn't complain. I'm glad there are other choices, like linode.com hosting, which is really cheap and trouble-free.

I'll bet that in the next few years, SourceForge will experience a huge outage and tens of thousands of projects will lose their data forever...

Wednesday, October 24, 2007

Going to Minneapolis

I'm back from Montreal and already looking forward to my next trip. I'm going to Minneapolis (in Minnesota) from November 18th to November 22nd. I'm going to meet the world-famous littledan.

Friday, October 19, 2007

Going to Montreal on Monday

I'm going to Montreal on Monday night to attend the MSLUG meeting. I'm not going to OOPSLA because I don't have the time. If you want to hang out before or after the MSLUG meeting, though, let me know. I look forward to meeting Christopher Diggins and Zach Beane.

Tuesday, October 16, 2007

IBM developerworks is lame

I saw an article on IBM Developerworks about Cusp, an IDE for Lisp. Written by a student of Computer Science who obviously first saw Lisp last semester, it made me chuckle a little bit.

"Its strength is in the processing of lists: AI and symbol mathematics."

"You'll notice that Lisp is not at all like other general programming languages. For example, in most general programming languages, you perform multiplication just as you would on paper: int times = 5 * 5;."

(Sorry, but I don't write "int times = 5 * 5;" on paper... perhaps he meant 5×5?)

"Note that the above method actually defines a type string. Up to now, you've been using Lisp as a largely typeless language. Though the double quotes implicitly types data as strings, the above method explicitly types both the input and output to the concat2 function as strings."

(Me no speak good English? Does anybody proofread this stuff or do they nod and smile, looking all impressed at this kid's Lisp knowledge?)

The lack of attention to detail also becomes apparent when he can't even get a simple recursive function right near the end.

The article misses out on all the important features that Cusp offers which make it a serious alternative to SLIME with Emacs: cross-referencing, code completion, navigation, etc. Instead we get a "AI using recursion and linked lists" article.

Most of the links at the end are not very useful either. You have a link to the wikipedia article on Lisp, a paper from 1979, and about 30 Eclipse links. There's also a link to Paul Graham, who nowadays is more of a crazy rich guy with money than a programmer.

Ironically, the one relevant link, Programming Lisp with Cusp, is actually about a thousand times more informative and better written than the Developerworks article is.

Now, the paranoid Lisp zealots like to chalk things like this up to semi-deliberate attempts at smearing Lisp. However, I know that Developerworks articles on Java are just as bad, if not worse. Hell, any Developerworks article, on any topic, is usually extremely useless, misleading and poorly written.

This is just another case of a "by idiots, for idiots" type situation. if you want to learn how to program like IBM's finest enterprise drones, look no further than IBM Developerworks.

If they want to be taken seriously, they need to put some basic editorial standards in place. There needs to be a distinction between a blog and a site like this; the latter should not just be a mass blog orgy.

Friday, October 12, 2007

New addition to planet-factor blogroll

Samuel Tardieu's weblog is now syndicated on planet-factor. Welcome to the community, Sam. Parsing Sam's atom feed actually exposed a bug in Factor's XML parser; Daniel Ehrenberg fixed it on very short notice. Thanks Dan!

New toys to play with

Doug Coleman sent me his old Cingular 8525 (Windows CE) phone. And my friend Erik in Ottawa gave me his ancient Dell Windows laptop in exchange for a Mac Mini that I haven't used since I bought the G5. Win-win situation: I get a Windows box for development, Erik gets a (faster!) computer which doesn't suffer from spyware/adware/etc. :-)

I'm now going to be in charge of the Windows CE port of Factor, and I can help out a bit with the Windows XP/Vista port too! Doug can now focus on more interesting things. I've been pestering him about finishing the regexp library for a while now... :-)

I installed Cygwin, CeGCC and Microsoft's ActiveSync on the Windows laptop, then I got a command prompt going on the phone using the directions in Chris Double's weblog. Note that Chris doesn't mention this, but Windows CE does not ship with a registry editor; to edit the registry you need to install a third-party registry editor. I used TRE.

Doug originally got the phone cheap with an AT&T contract; I used the Hermes Unlocker tool to CID/SIM unlock the phone (which is illegal, voids the warranty, etc.)

While Factor 0.90 worked on Windows CE and ARM, there have been a few changes since then. First of all is the new two-tiered compilation; I have to implement the ARM backend for the new non-optimizing compiler. Then I have to make some (minor) changes to the ARM optimizing compiler backend.

Once it is in a working state, my first task will be removing the dependency on CMD.EXE. It is a pain to install (requires editing the registry) and is rather unusable on the small screen of the phone. Instead I'll borrow code from PythonCE, which implements a simple input/output console using the Win32 API. This will increase the size of Factor's VM on Windows CE somewhat, however it won't depend on the hard to install CMD.EXE.

Eventually I want to get the Factor UI running on the phone, with a cut-down UI listener and workspace. This way I can experiment with new input methods and interesting ways of writing code on the phone. I think Factor is concise and powerful enough that on-board development will be a practical possibility (of course you'll always be able to develop your Factor apps on a desktop machine and copy them over to your phone, but where's the fun in that?)

The phone only has 64 Mb of memory. While Factor's semispace GC is very efficient on systems with virtual memory and overcommit, on a phone with no swap having that semispace just hang around there sucking up RAM is really, really painful. I will be implementing an incremental mark and sweep GC at some point, which will improve interactive performance on desktop machines and reduce memory usage on the phone.

Wednesday, October 10, 2007

Announcing Planet Factor

I put together an Atom/RSS aggregator in Factor. It is running live on the experimental HTTP server. Parroting the idea behind Planet Lisp, now there's Planet Factor.

It took me about an hour of work in total to put this together. Most of the parts were already there - Chris Double's Atom/RSS library, which uses Daniel Ehrenberg's XML library is the cornerstone here, and of course the HTTP server itself.

A thread fetches the feeds every 30 minutes. Entries are then concatenated and sorted by date, and stored in a global variable. The Planet Factor web app together with the news box on the Factor home page access this variable and present the results in a nice way.

The Planet Factor source is relatively short. (And on a somewhat related note, that pastebin is a Factor web app too...) This is my favorite part:
    dup [
second
dup "Fetching " diagnostic
dup news-get feed-entries
swap "Done fetching " diagnostic
] parallel-map

Ignoring the diagnostic message logging, you basically have
[ news-get ] parallel-map

Where news-get makes a network connection, downloads the XML and parses it. While the XML parsing is not parallelized, the network I/O is. Not many languages make it this easy to fire off overlapping I/O operations...

Passing structs by value on AMD64

The Linux/AMD64 ABI is a bit crazy when it comes to passing structs by value. The rules are as follows:
  • If the structure is greater than 16 bytes, it must be passed on the C stack, even if there are enough registers.
  • If the structure is 16 bytes or less, it is broken up into 8 byte chunks. Each chunk is considered separately.
  • If at least one of the fields in a chunk is an integer, the whole chunk is passed in an integer register.
  • Otherwise, if the chunk is either two float fields, or a double field, it is passed in an SSE register. In particular, this means that if your structure consists of two floats, they must be packed into a single SSE register.

There are additional rules for C99 complex numbers and long double types, but Factor's FFI doesn't support those so I don't have to worry about that.

For a long time I didn't properly implement these semantics. They're implemented now. I don't think anybody has run into this issue, however when Mac OS X Leopard is released with 64-bit Cocoa, it would certainly come up since many Cocoa APIs rely on passing structs (assuming Apple uses the same or similar ABI as Linux on AMD64, which is likely).

I think the FFI is one of Factor's strongest points. Many popular languages have weak FFIs (or are weak in other respects). On the other hand, when implementing an obscure language, one must work twice as hard as everybody else to be taken seriously.

Sunday, September 30, 2007

Improved intrinsics for alien accessors

Chris Double's Ogg Theora video player has served as a constant inspiration for ways to improve Factor's compiler. The key bottleneck is the YUV to RGB conversion.

In Factor 0.89, I implemented interval inference; this let the compiler convert some arithmetic operations to machine arithmetic.

In Factor 0.90, I added open-coded intrinsics for alien accessor words (words for reading/writing C values in memory); this allowed more values in the inner loop to be stored in registers, without incurring calls into the VM together with save/restore of values to the data stack.

In the last few days, I implemented some representation inference, which allows intermediate alien values to be stored as unboxed pointers. In the YUV to RGB inner loop, a pointer is read from a C struct; this pointer is then dereferenced. Prior to this, the pointer would be boxed in an alien instance when read, then unboxed when dereferenced. The compiler is now able to optimize this out.

The improvements in performance over the last year have been amazing. I don't remember how slow the Ogg player was when it was first added, but we're talking unusable slow -- something like 200 ms to convert a 480x320 frame of video from YUV to RGB on my Power Mac G5. When interval inference was added, the time went down to about 100ms per frame. Open-coded alien accessors improved this down to 30ms. And now the recent representation inference has again improved the decode time down to 15ms.

During this time the YUV to RGB conversion algorithm has not changed much; it still sits at the same level of abstraction (generic arithmetic, etc.) The compiler has done all the hard work of making it faster.

I still plan on working on the compiler in the future with the goal of making YUV to RGB faster. Now that the low-hanging fruit have been picked, and the YUV to RGB inner loop no longer calls into the VM or allocates memory, further improvements will come from better register allocation and instruction selection.

Friday, September 28, 2007

First impressions of git

So far, I like git a lot. My favorite feature is branches, and the goodies that come with that, such as git fetch, which pulls patches from a remote repository into a branch. I like using git add to flag files to commit more than the darcs "cherry picking", which is tedious and error-prone. I also like it how git status shows me any files which have not been added yet.

The gitk GUI, while not great, is a nice way to get an overview of what's going on with the repository.

Of course the best thing about git is the speed. The Linux Kernel developers use git, and they have a huge codebase with many developers. I'm confident that I've made the right choice by switching to git.

One thing I don't like about git is the complexity -- there are hundreds of commands and it certainly seems like the user interface could be simplified, for the common operations anyway. I'm slightly worried that the increased learning curve might be a hindrance to getting contributors in the future. However, having to wait 10 minutes to darcs push is a turn-off to contributing in itself.

Thursday, September 27, 2007

A neat trick for refactoring code

So today I'm going to be cleaning up one of the vocabularies involved in code generation in the optimizing compiler. Each vocabulary has a set of words which are used by other vocabularies -- the "public interface", and implementation words, those only used to implement the interface. I want to rewrite the generator.registers vocabulary but keep the public interface as close to its current state as possible. Trouble is, the public interface is not well-defined here; this vocabulary does not use <PRIVATE and PRIVATE>. However, this is not a big problem, since I can just write a tool:
: used-externally ( vocab -- words )
dup words [
usage
[ word? ] subset
[ word-vocabulary = not ] curry* contains?
] curry* subset natural-sort ;

How does this work? It takes the list of words in a vocabulary, then for each word, checks if the usages of that word include any words outside of this vocabulary. If the answer is affirmative, it adds this word to the sequence that is returned.

Using this tool I found out that generator.registers vocabulary only has a handful of words which are used externally, and so it should be easy to do a radical refactoring which doesn't require many changes to other modules.

It is hard to articulate how powerful this is. Not only is the Factor listener is a great way of testing code, it also provides a way to interact with the system on a meta-level. I wonder how much work it would be to implement something like used-externally in a conventional IDE such as Eclipse; I'm sure all the pieces are there (cross-referencing, etc) but it can't be that easy to just glue things together like that.

Saturday, September 22, 2007

How to publish a git repository

Now that I'm using git, all the contributors have to switch, too. Chris Double wrote a blog entry about publishing a git repository. Thanks Chris!

The Factor project switches to git

Why git? Because it is faster than darcs, and has kick-ass branching features.

To check out a repo:
git clone git://factorcode.org/git/factor.git/

You can find documentation at the git website.

A gitweb interface is coming really soon; we have to implement cgi support for the Factor HTTP server first.

The code in the repository runs on the following platforms right now; more will be added very soon:
  • Mac OS X/PowerPC
  • Mac OS X/x86
  • Linux/x86
  • Linux/AMD64
  • FreeBSD/x86

You will need the latest boot images.

The old darcs repository is still there. It contains the last revision of Factor to include an interpreter, and so represents a historical milestone of sorts.

Friday, September 21, 2007

Factor web site redesign

Thanks to Elie Chaftari, we have a new Factor web site design. Check it out before it goes live! (Which will probably happen with the release of Factor 0.91).

The "Latest News" sidebar is just a placeholder; I will replace it with an RSS aggregator soon.

Also if you look carefully, you'll see mention of a git repository. Yes, I'm switching to git! Today I encountered a darcs bug/flaw/limitation while merging in Eduardo's patches (it was not Ed's fault) and decided that this would be the very last time.

Right now the repository contains the in-progress quotation compiler/compiled continuations code, which only runs on Mac OS X/PowerPC, FreeBSD/x86 and (just barely) Mac OS X/x86. When it is more stable and supports a wider range of platforms, I will make an announcement and formally retire the darcs repository.

Contributors: please continue using darcs for the next few days, but be prepared to migrate to git Real Soon Now.

Thursday, September 20, 2007

Monday, September 17, 2007

Bignum libraries

So does anybody know of a simple, efficient, BSD-licensed bignum library where the radix size is equal to the CPU's word size, and not the CPU's word size minus two bits?

Factor x86 calling convention

I'm working on the x86 backend for the quotation compiler. This has given me an opportunity to tweak the calling convention used by compiled Factor code. I will outline the calling convention I use, and highlight the main differences from the standard C calling convention.

Registers are assigned as follows:
  • EAX - volatile scratch
  • ECX - volatile scratch
  • EDX - volatile scratch
  • EBX - volatile scratch
  • EBP - volatile scratch
  • ESI - data stack pointer
  • EDI - retain stack pointer
  • ESP - callstack pointer
Factor's compiler does not generate code which saves values in non-volatile registers, because that would complicate matters for the GC. Instead, the only location where values may be saved between subroutine calls is the data and retain stacks. So I use EBX and EBP as volatile scratch registers, since otherwise they'd be unused.

Stack frames are fixed-size, 16 bytes each, with no explicit back-chain pointers. Stack frames have the following layout:
  • 4 bytes current quotation position - used by code compiled with quotation compiler
  • 4 bytes current quotation - used by code compiled with quotation compiler
  • Pointer to start current of current code block - used by code GC as a root when scavenging the call stack
  • Return address for callee

Unlike typical x86 code which passes parameters on the call stack, I use registers to pass parameters to quotations and words.

Quotations are called with the following values in registers:
  • EAX - a tagged pointer to the quotation itself
  • ECX - a pointer to the start of the quotation's code block. The quotation's compiled definition saves this in the stack frame

Compound definitions and symbols are called with the following values in registers:
  • EAX - the word itself

Primitive words are called with the following values in registers:
  • EDX - the top of the call stack before the primitive call

Compiled words can be called without any parameters in registers.

Compiled words are responsible for a pointer to the start of their own in their stack frame, however the caller of a compiled quotation must do this for the quotation. This is because right now, compiled quotations must be fully position independent without requiring any relocation.

Quick poll of x86 users

Does anybody care about Factor running on chips older than the Pentium Pro? I would like to start using the "conditional move" instructions which are not present on Pentium MMX and older chips.

Thursday, September 13, 2007

Software engineering best practices

I submitted a bug via Apple's bug tracker 4 months ago. The bug is a data corruption issue in the kernel/libc and it is very easy to reproduce. I also suspect it is easy to fix, since it is a case of saving/restoring some registers in signal handlers. Apple has not fixed it, or even acknowledged that it exists. I'm willing to bet $100 that nobody at Apple has even read the bug report. It is very likely the entire public bug reporter is just for show, sort of like Sun's "bug parade".

As for the infamous gcc x86 backend bug, well the one I submitted was closed as a duplicate... of a bug which has been open since 2003.

Update: my friend at Apple tells me my bug was fixed in June, but I'd have to be a Premium or Select ADC member to see this, since apparently even bug fix notifications are covered by the Leopard beta NDA cloud.

Monday, September 10, 2007

Two minor incompatible changes as a result of two-tier compilation

Surpringly such a large overhaul of the VM has not resulted in many incompatible changes. There are two I'm aware of:
  • Quotations are immutable now -- I've mentioned already.
  • Passing f to call or if is an error now. Being able to call f was a hold-over from the cons cell days when f was the empty quotation. These days this has required an unnecessary type dispatch, and it is what I consider a rather awkward pun. So this is a type error now.

Two-tier compilation comes to Factor

Factor 0.90 uses the following execution model:
  • Words with a static stack effect are compiled to machine code by the optimizing compiler.
  • Words which use continuations but otherwise have static stack effects run in the interpreter.
  • Words with dynamic stack effects run in the interpreter.
  • Primitives which modify the code heap are considered to have a dynamic stack effect, only to make sure that words calling them do not compile. This is because adding a new compiled code block can trigger a code GC, and code GC is not smart enough to scan the native call stack to identify return addresses as roots.

The interpreter is a rather inefficient affair, quotations are stored on an interpreter callstack; execution of a quotation proceeds by inspecting each element. If the element is a word, it is executed, if it is a wrapped word, it is unwrapped, and otherwise, it is a literal pushed on the data stack. This entails a runtime type dispatch for each element of the quotation.

What I'm working on right now is a new execution model. The code isn't ready yet and probably won't be for another week, but here is how everything works so far:
  • Words with a static stack effect are compiled to machine code by the optimizing compiler, just like before.
  • Words which use continuations but otherwise have static stack effects are compiled to machine code by the optimizing compiler - this is new!
  • Words with dynamic stack effects are compiled to machine code by the new non-optimizing compiler.
  • The GC now knows about native stack frames and is now able to traverse them when scanning for roots.

This is really hardcore -- Factor does not have an interpreter of any form now, except for the single-stepper in tools.interpreter. The amount of code which is compiled optimized has increased now that I/O can compile (I/O uses continuations under the hood to implement non-blocking, and would run in the interpreter; a significant performance problem). Not only that, but the remaining, truly dynamic code is compiled by a "mop-up" compiler.

Ages ago, quotations used to be linked lists. When I made them a distinct type of their own, I had this form of "quotation compilation" in mind as an eventual possibility. See this old article, and this one. It is good that more than a year later I've finally carried over this transition to its logical conclusion.

Quotations are still sequences, but are now immutable, and have an additional slot holding a pointer to a compiled code block.

The new non-optimizing compiler is written in C due to bootstrap reasons, but it is very trivial: it simply unrolls the interpreter loop, eliminating the dynamic dispatch, and using the native call stack instead of a fake interpreter call stack. Everything it compiles is a concatenation of a fixed number of basic blocks, which are assembled and provided by Factor code. So the C side is essentially just a glorified use-case of memcpy(). For example,
[ 2 + * ]
compiles to very naive machine code which performs the following actions:
  • A stack frame prologue -- this is omitted if the only calls to words are in tail position
  • Push literal 2
  • Call + word
  • Tail call (jump to) * word

While the generated code is only 2x faster than the interpreter, compile time is very fast and I don't expect any performance problems for code which constructs quotations on the fly.

Because Factor now controls the stack frame layout from top to bottom, implementing capture and restoration of compiled call stacks is relatively trivial. Scanning them for GC roots is a bit trickier. Note that Factor's GC is not conservative in any way; it manages to precisely identify all roots. Some language implementors claim that conservative collection is the only way to scavenge the native stack for roots, but this is false.

And not having an interpreter is a political victory of sorts. Factor is not an interpreted language, period -- it's thoroughly natively compiled.

The Self language uses a similar, but more advanced implementation strategy; a fast, non-inlining compiler compiles all code, and a highly optimizing compiler compiles hot-spots. Factor's optimizing compiler compiles all code for which it can infer a stack effect, and it doesn't try to do runtime type feedback.

Sunday, September 02, 2007

Factor is now four years old

Time flies! I started working on Factor early September 2003. I can't believe that it has been this long already. I first announced Factor to the Internet in April 2004, after it had already been in development for some time. Here is the original announcement.

In the last year, we had a stready stream of releases, from Factor 0.84 to Factor 0.90, bringing many improvements to functionality, stability and performance. There are also several new contributors on board, which is great. While Factor 1.0 still seems far away, I feel that I'm making good progress and getting closer to achieving my goals.

Here are some code line counts:
  • Factor VM (written in C): 10285 lines
  • Factor core (written in Factor, core/ directory): 21016 lines
  • Factor libraries (written in Factor, extra/ directory): 58853 lines
  • Unit tests: 10101 lines
  • Documentation: 187405 words

Compared to the stats from last year, we see that:
  • the VM has only grown by 2000 lines (yet has evolved considerably in the time)
  • the core actually shrank because of better coding practices, and because the UI was moved out of the core
  • the amount of contributed libraries has more than doubled!
  • the documentation and unit tests have grown considerably too

I think this shows that we're moving in the right direction. I hope to release Factor 1.0 some time next year, and make it a 1.0 release worth the five year wait: an expressive language, with excellent performance, solid libraries, powerful tools and a robust implementation. And after I get a chance to build some large-scale applications with Factor and draw upon the experience gained, the language will continue to evolve into Factor 2.0. Watch out.

Saturday, September 01, 2007

Factor 0.90 now available

Factor 0.90 is now available. The web site hasn't been updated yet, so you can't browse the docs online right now; so instead just download it and read the docs in the UI!

Lots of changes:
Core
  • New module system. (Eduardo Cavazos)
  • Tuple constructors are defined differently now.
  • Mixin classes implemented; these are essentially extensible unions.
  • New float-array data type implements a space-efficient sequence of floats.
  • Moved <file-appender>, delete-file, make-directory, delete-directory words from libs/io into the core, and fixed them to work on more platforms.
  • New host-name word.
  • The directory word now outputs an array of pairs, with the second element of each pair indicating if that entry is a subdirectory. This saves an unnecessary stat call when traversing directory hierarchies, which speeds things up.
  • IPv6 is now supported, along with Unix domain sockets (the latter on Unix systems only). The stack effects of <client> and <server> have changed, since they now take generic address specifiers; see Networking.
  • The stage 2 bootstrap process is more flexible, and various subsystems such as help, tools and the UI can be omitted by supplying command line switches; see Command line switches for bootstrap.
  • The -shell command line switch has been replaced by a -run command line switch; see Command line switches for general usage.
  • Variable usage inference has been removed; the infer word no longer reports this information.
Tools
  • Stand-alone image deployment in tools.deploy vocabulary .
  • Stand-alone application bundle deployment on Mac OS X in tools.deploy.app vocabulary.
  • New vocabulary browser tool in the UI.
  • New profiler tool in the UI.
Extras
    Most existing libraries were improved when ported to the new module system; the most notable changes include:
  • asn1: ASN1 parser and writer. (Elie Chaftari)
  • benchmarks: new set of benchmarks.
  • cfdg: Context-free design grammar implementation; see http://www.chriscoyne.com/cfdg/. (Eduardo Cavazos)
  • cryptlib: Cryptlib library binding. (Elie Chaftari)
  • cryptlib.streams: Streams which perform SSL encryption and decryption. (Matthew Willis)
  • hints: Give type specialization hints to the compiler.
  • inverse: Invertible computation and concatenative pattern matching. (Daniel Ehrenberg)
  • ldap: OpenLDAP library binding. (Elie Chaftari)
  • locals: Efficient lexically scoped locals, closures, and local words.
  • mortar: Experimental message-passing object system. (Eduardo Cavazos)
  • openssl: OpenSSL library binding. (Elie Chaftari)
  • pack: Utility for reading and writing binary data. (Doug Coleman)
  • pdf: Haru PDF library binding. (Elie Chaftari)
  • qualified: Refer to words from another vocabulary without adding the entire vocabulary to the search path. (Daniel Ehrenberg)
  • roman: Reading and writing Roman numerals. (Doug Coleman)
  • scite: SciTE editor integration. (Clemens Hofreither)
  • smtp: SMTP client with support for CRAM-MD5 authentication. (Elie Chaftari, Dirk Vleugels)
  • tuple-arrays: Space-efficient packed tuple arrays. (Daniel Ehrenberg)
  • unicode: major new functionality added. (Daniel Ehrenberg)

Performance
  • The curry word now runs in constant time, and curried quotations can be called from compiled code; this allows for abstractions and idioms which were previously impractical due to performance issues. In particular, words such as each-with and map-with are gone; each-with can now be written as curry* each, and similarly for other -with combinators.
  • Improved generational promotion strategy in garbage collector reduces the amount of junk which makes its way into tenured space, which in turn reduces the frequency of full garbage collections.
  • Faster generic word dispatch and union membership testing.
  • Alien memory accessors are compiled as intrinsics where possible, which improves performance in code which iteroperates with C libraries.
Platforms
  • Networking support added for Windows CE. (Doug Coleman)
  • UDP/IP networking support added for all Windows platforms. (Doug Coleman)
  • Solaris/x86 fixes. (Samuel Tardieu)
  • Linux/AMD64 port works again.

A big thank you to all the contributors and testers who made this release possible.

Thursday, August 30, 2007

Partially redrawing an OpenGL scene in response to exposure events and state changes

Most OpenGL applications redraw the entire scene every time a refresh has to be performed. There are exceptions, however. For example, to maintain responsiveness and conserve CPU time, an UI toolkit should only redraw the gadgets which need redrawing, either as a result of exposure events or gadget state changes.

It seems that partial redraws together with double buffering are not well-documented or understood by OpenGL programmers. Since partial redraws are very important to the Factor UI, I'll document what I've learned so far in this blog entry.

Factor's UI always sets the viewport dimensions to the entire window, however a scissor rectable is set using glScissor() to clip rendering to the region being redrawn. Gadgets are always rendered to the back buffer. Once the exposed region has been redrawn, the problem is now to bring it into the front buffer.

The GLX spec does not state whether glXSwapBuffers() takes the scissor rectangle into account.

On some OpenGL implementations, for example the NVidia drivers on Windows, Mac OS X and Linux, the current scissor rectangle is taken into account, and so the Factor UI can just request that buffers be swapped, and this only copies the requested region to the front buffer.

On Windows machines with Intel 9xx graphics adapters, this does not work, and SwapBuffers() copies the entire back buffer to the front buffer. As I documented in yesterday's entry, there exists a special WGL API for copying a partial region of the back buffer to the front buffer. This appears to fix the problem.

On my Linux/PPC machine with an ATI Radeon 9200 card and the open source "radeon" driver built in to X.org, glXSwapBuffers() also ignores the back buffer. Now, the MESA OpenGL implementation used has an extension GLX_MESA_copy_sub_buffer() which adds a glXCopySubBufferMESA() function for copying part of the back buffer to the front buffer, however, this function is buggy! It works some of time, but other times makes the entire screen go black, or kills the X server altogether.

There is a general fallback routine for copying part of the back buffer, but this is sometimes slower than redrawing the entire scene, at least on the Linux/PPC box described above. You can use glCopyPixels() to copy pixels from the back buffer to the front buffer, after setting the destination with glRasterPos():
: gl-raster-pos ( loc -- )
first2 [ >fixnum ] 2apply glRasterPos2i ;

: gl-copy-pixels ( loc dim buffer -- )
>r fix-coordinates r> glCopyPixels ;

: swap-buffers-slow ( -- )
GL_BACK glReadBuffer
GL_FRONT glDrawBuffer
GL_SCISSOR_TEST glDisable
GL_ONE GL_ZERO glBlendFunc
clip get rect-bounds { 0 1 } v* v+ gl-raster-pos
clip get flip-rect GL_COLOR gl-copy-pixels
GL_BACK glDrawBuffer
glFlush ;

Note the fidgety calculations to convert Factor UI co-ordinates ((0,0) is top left corner) to OpenGL window co-ordinates ((0,0) is lower left corner).

I'm still unsure if SwapBuffers() respects the scissor region on other platforms and OpenGL implementations, and if not, what is the correct way to copy part of the back buffer to the front buffer in the fastest possible way.

One other solution I need to look into is finding a way to be notified when the back buffer becomes invalid. If we redraw only part of the back buffer, then call SwapBuffers() to flip the entire back buffer with the front buffer, then in many cases, the previous content in the back buffer is still valid. It becomes corrupt when the video card needs to reclaim 3D memory for other applications, though. If there was a notification when volatile image buffers become out of date, I could force a full redraw on the next refresh, even if only a few gadgets need to be redrawn.

Wednesday, August 29, 2007

C/C++

Every time you say "C/C++" and lump two very different languages together as one, you're telling the world that you don't know either language. Idiomatic C code is very different from idiomatic C++. Good C++ code has very little that is actually C. Even the problem domains they are used in are different. Usually "C/C++" comes up in Java apologist rants ("Java is unmaintainable you say? Well I spent 10 years chasing memory leaks in "C/C++", it was far worse!") however sometimes you see informed people saying "C/C++". Please, don't.

OpenGL redraw bug on Windows

A while ago, "glguy" reported an issue with Factor 0.89: the UI was not being redrawn properly on Windows. I borrowed a Windows laptop and reproduced the exact same issue, and it turns out both me and glguy have the same graphics adapter: an Intel Express GM915.

I found a workaround: if I disable double buffering and don't call SwapBuffers after redrawing the window, then the issue disappears, and furthermore, it appears as if double buffering is still performed!

This is really strange, and I don't think this workaround will work elsewhere, unless WGL always double buffers, and uses hardware page flipping if you explicitly ask for it, or something.

If anybody has UI redraw issues and they're not using this adapter, or not running Windows at all, let me know!

Update: I found the right fix. On Windows, the glSwapHintRectWIN function must be called before SwapBuffers when doing a partial redraw. Turns out this function is not necessary with most drivers, but Intel drivers are picky when it comes to this. Good to resolve this one, finally. The fix is in the darcs repository.

Tuesday, August 28, 2007

Units and reversable computation

Since we don't want space probes written in Factor to crash because of calculations on mismatching units, Doug Coleman wrote a units library:
10 feet 30 inches d+ 56 s d/

It can add, multiply and divide dimensioned quantities while converting on the fly and making sure all the units make sense (10 miles 30 teaspoons d+ is a runtime error).

There was one problem with this library though, and it was that converting a dimensioned quantity back to a scalar required some boilerplate; for example, we have the following code:
: inches ( n -- dimensioned ) 2.54 * cm ;
: inches> ( dimensioned -- n )
dup 0 { m } { } =dimensions?*
dimensioned-value 100 * 2.54 / ;

The inches word converts a number of inches to centimeters; cm in turn converts it to meters, which is the canonical representation of distances, then creates a dimensioned quantity. However the inches> word is ugly, and it is all boilerplate since the defintion of inches has all the required information.

Let's take a look at cm:
: cm centi m ;

And its auxilliary words:
: centi 100 / ;
: m ( n -- dimensioned ) { m } { } <dimensioned> ;

As you can see, <dimensioned> is the canonical constructor.

Now, enter Daniel Ehrenberg's inverse library. While Dan originally intended to use it for pattern matching, this library is a perfect fit for auto-generating words for converting dimensioned quantities back to scalars.

Indeed, out of the box, it cannot invert <dimensioned>, because it calls natural-sort which is not an invertible operation. However, we can explicitly define an inverse for <dimensioned> (not an inverse in the mathematical sense, since the original function is not one-to-one; strictly speaking, this is a section, that is, a function which satisfies only one of the two conditions of an inverse).

And now, everything works; here, undo is Dan's inversion combinator:
  100 cm [ inches ] undo .
5000/127

So 100 cm constructs a dimensioned object, after converting the dimension to canonical form (meters in the case of distance):
  100 cm .
T{ dimensioned f 1 { m } { } }

And [ inches ] undo takes a dimensioned quantity, and works backwards in order to obtain a number, which when passed to inches, would give the original quantity:
  [ inches ] undo .
5000/127

Lets take a look at the code generated by [ inches ] undo:
  [ inches ] [undo] .    
[ >dimensioned< { } =/fail { m } =/fail 100 * 127/50 / ]

It takes the dimensioned quantity apart, makes sure it is a distance (and not a time, etc), then multiplies the quantity itself by a conversion factor.

So we can convert a scalar represented by unit to any other unit in this way; we can also do stuff like:
  10 miles 30 km d+ [ yards ] undo .
19205600/381

What is shorter, a quarter of a mile or 400 meters?
  1/4 miles 400 m d< .
f

We also support compound units, such as miles per second, miles per second squared, etc. How many teaspoons in a liter plus a tablespoon?
  1 L 1 tablespoons d+ [ teaspoons ] undo .
77937/379

For volume units, the canonical representation is meters cubed:
  15 teaspoons .
T{ dimensioned f 379/5120000 { m m m } { } }

Here is a unit with a denominator:
  20 knots .
T{ dimensioned f 463/45 { m } { s } }

The inverse library automatically guards against invalid conversions:
  30 km 10 s d/ [ teaspoons ] undo .
Unification failed

This is seriously cool stuff. The implementation of units is very simple and elegant, too; only half of the code has to be written, because we can use inverse!

The inverse library depends on two things; the simple mapping between syntax and semantics (the composition of two functions is their concatenation, therefore the inverse of a concatenation of two functions is a concatenation of their inverses!) and it also depends on quotations being sequences rather than opaque code blocks. Two qualities which are unique to Joy dialects such as Factor.

HTTP server on factorcode.org

The Factor HTTP server on factorcode.org has been running non-stop since May 7th; that's almost 4 months. This site doesn't get a whole lot of traffic, but it seems that the I/O code and garbage collector are in good shape these days.

Monday, August 27, 2007

Back from Austin

I'm back in Ottawa. Doug, Ed, and all of their friends are very cool and lots of fun to hang out with! Once again, my luggage has been delayed en route, and I'll only get it Tuesday at the earliest. I wonder if the baggage handling infrastructure runs on Java, implemented by the regulars from JavaLobby?

Factor 0.90 will be released before the end of the month. I only have two things left on my to do list! I need to test on all platforms, then I have to fix one last bug concerning UI redraw on certain Windows systems.

I think I will aim for shorter release cycles in the future; say, 2 or 3 weeks. I'll limit the amount of new functionality in each release, so that it doesn't take so long for things to settle down. In the long term, I might try to get rid of releases altogether, and instead have daily builds; a daily build will only be uploaded if all unit and integration tests pass, and if all contributed modules load, and maybe even if there are no major performance regressions in benchmarks. It would be cool to automate something like this.

Sunday, August 26, 2007

When to use locals

Some people have wondered why I implemented the locals library, and if in fact I'm turning my back on stack-based programming and "selling out". In fact this is not the case.

First of all, Factor has already had named variables in the core for quite some time; these are found in the namespaces vocabulary, which implements very simple dynamically-scoped namespaces. There are many cases where dynamic scope is useful, and languages which lack it often end up with a lot of code which passes parameters down long function call chains, with only the leaf functions actually using the parameters. Examples of dynamic scope usage in Factor include the current stream, and various idioms for sequence and association construction. Here dynamic scope is the right thing to use, and it makes total sense, and it cleans up code when certain values are not passed around on the stack.

In a handful of places, we also used dynamic variables as temporaries in calculations. I think once the right abstraction is found, stack code is almost always simpler and cleaner than the equivalent applicative code using named temporaries. But sometimes you have a math formula which operates on 10 values at once, or something similar, and it simply makes more sense to name these intermediate values. In these cases, we used dynamic variables. There are a handful of examples of this so far, such as the MD5 algorithm, and a couple of routines in the calendar library.

However when one uses dynamic variables for temporaries, the dynamic scope semantics no longer help and in fact get in the way; anybody who has used a Lisp with all-dynamic scope can testify. So while most values which would be lexically-scoped in a Lisp program are passed on the stack in Factor, sometimes you still want to name temporaries, and named locals with lexical scope are useful here, because they allow you to create closures and safely pass them around without running into the "funarg problem" which had the Lispers stumped through the 60's.

I never opposed having named variables in Factor, because if I did, they wouldn't be there from day one. Until recently, I didn't feel the need for lexical scope much, though, and when asked about it, my standard reply would be that the stack is Factor's equivalent of lexical scope, and that if somebody coded it up, I'd be happy to add it as an extra library.

Eduardo eventually implemented somewhat inefficient lexical scope in his extra/rewrite-closures library, which still exists because unlike extra/locals, it has first-class environments. I decided to code extra/locals because I wanted to see if I could make locals as efficient as stack values, and the experiment was a success.

So to recap, Factor now has three main methods of passing values between words, and storing temporary values within words:
  • The stack. Factor is and always will be a stack-based language; the whole point of the project is to explore stack-based programming. I encourage beginners to learn to think in terms of the stack before jumping into other abstractions, and for most code, the stack works really well. It is really interesting that in a stack language, function composition is expressed as concatenation, and compared to Lisp, Factor code has a lot less nesting and arbitrary introduction of temporaries. Especially now that Factor supports efficient partial function application, quotations can have internal state, and this brings much of the benefits of closures to the world of the stack.
  • Dynamically scoped variables. These are useful for a set of issues totally orthogonal to temporaries: they're used to give words context-specific behavior, and when an algorithm spanning multiple words needs to operate on a common set of values.
  • Lexically scoped variables. These are used to name temporary values inside calculations which otherwise would be hard to express with the stack. Hopefully this set will reduce over time, but for now, it includes any kind of hairy mathematical operation such as found in cryptography, numerical methods, and so on.

Having more abstractions to choose from when writing code is good, and I think a stack language can express the widest range of them in the most natural possible way.

FactorCon wrap-up

Today is my last day in Austin. On Friday, we went jet-skiing; as it turns out, I'm not so good at driving one, so after a brief attempt which climaxed in me tipping over the jet-ski (with all three people on it), I delegated driving responsibilities to Doug's friend, also named Doug (common name in Texas?).

Eduardo had to go back to work. Me and Doug did a did a bit more hacking, Ed-less -- we got non-blocking UDP working on Windows, and Doug started working on a directory entry change notification binding for Windows. This won't be ready until Factor 0.91, but I'm going to implement the Unix side; kqueue on BSD, inotify on Linux, and whatever crap Solaris has on Solaris. Having a cross-platform directory change notification API is something I've wanted for a while; many other language implementations don't bother with features like this, but I consider it essential.

I also optimized the MD5 library a little bit, and improved the locals library (you can have local word definitions, and locals are writable). Doug updated the libs/math library for the new module system. There's a lot of cool code there; quaternions, polynomials, combinatorics, numerical integration... indispensable tools for hard-core programming.

I'd like to elaborate on one point regrading writable locals. Consider the following Lisp function:
(defun counter (n)
(values
(lambda () (prog1 n (setf n (1- n))))
(lambda () (prog1 n (setf n (1+ n))))))

This returns a pair of closures which increment and decrement a shared counter. Each closure returns the current value of the counter before changing it. The initial value of the counter is the value passed to the function, and each pair of closures has a unique internal counter value.

Using the locals library, the Factor equivalent is:
:: counter | n! |
[ n dup 1- n! ]
[ n dup 1+ n! ] ;

The ! suffix in the list of local parameter names indicates the local should be writable; now, the n! local word can be used to write to it. The Factor code is almost identical to the Lisp code, except with fewer parentheses.

In both cases, you see that we differentiate between binding and assignment, and also we are able to write to a local closed over in an outer scope; the n! word does not create a new internal variable in the counter quotation, but modifies the value in the lexical environment of the counter word.

On the other hand, some languages, such as Python, claim to support lexical closures, however the essential property of closures -- that they close over the lexical scope -- is not preserved, and assigning to a local simply creates a new binding in the innermost scope! Guys, that is retarded! Look at the workarounds these people use. It is really no better than Java's anonymous inner classes, and if you want to call those "closures", you may as well say that C# is a "dynamic" language.