Factor: a practical stack language: June 2007

Saturday, June 23, 2007

Converting modules to the new module system

I think I can now say the move to the new module system is done, at least for the core. The last remaining component I had to update was the documentation, which has been neglected as of late. In the end I documented some 400 new words and updated the documentation for hundreds more.

What remains to be done is updating all contributed modules. A few modules such as the HTTP server have been updated already, but a lot remains to be done.

Tomorrow, I'm leaving to New Zealand. I will be there until mid-July. I'll be hanging out with Chris Double in Auckland, and my old school friends in Wellington (where I lived from 1992 to 2002). I'll also stop by in Napier and perhaps the east coast of the North Island; if anybody wants to meet up, let me know by e-mail! During this time I will be taking a break from programming (and everything else); this should give the Factor contributors some time to catch up with the new module system. When I come back, I will take care of any remaining modules myself; I hope most modules will be updated by then.

While I am away, Daniel Ehrenberg will be in change of harvesting patches; if you have patches in an HTTP-hosted repository, please let him know so that he can pull them, and if you plan on submitting patches by e-mail, send them to the Factor-talk mailing list so that he can apply them to his repository. His repository is found at

http://littledan.onigirihouse.com/repos

This ensures that when I return, I can just pull all patches from him and continue working.

Here are some general guidelines which should help contributors update their modules:

Try to rethink the module's division into source files. Prior to the new module system this could be arbitrary, but now it is in one-to-one correspondence with vocabulary names. Try not to have too many vocabularies, but on the other hand and try to avoid having vocabularies which group unrelated functionality together.

Many core words were moved around between vocabularies, especially in the UI. If you find that a source file's USING: form is mostly out of date, you can use the following trick. Paste the following word definition in the listener:
```
USING: parser namespaces vocabs kernel assocs sequences hashtables sorting prettyprint ;

: USING?
    use get [
        dictionary get
        [ nip vocab-words eq? ] assoc-find-with
        2drop
    ] map [ ] subset prune natural-sort in get swap vocabs. ;
    parsing
```
Then, remove the source file's USING: form entirely, and add USING? at the end. Load the source file -- you will be prompted with a bunch of restarts where you can pick vocabularies to add to the search path. Then at the end of the file, the USING? word will execute and print the USING: form that you need based on the restarts you chose.

Circular dependencies between vocabularies are highly discouraged because the vocabulary loader might not behave as you would expect; it will not go into an infinite loop, but it also might not load one element in the cycle. This might be fixed later, but for now, avoid circularity because it is bad design anyway.

Many modules had OS-specific components which were loaded conditionally. Each component would define words in the same vocabulary. This no longer works because source files are mapped to vocabularies, but instead you can use hooks. Enter \ HOOK: help in the listener.

For application modules which have a main entry point, the entry point should be defined on the topmost vocabulary. For example, consider the tetris module, which has already been updated. It has this layout:
```
tetris
tetris.board
tetris.game
tetris.gl
tetris.piece
tetris.tetromino
```
The tetris vocabulary depends on all the others, and provides a MAIN: hook.

When updating modules, it is also a good idea to give the module a general code cleanup, round of testing, maybe write some documentation and unit tests for it too.

I hope these guidelines help.

The new module system will be the last major shakeup of the source tree for quite some time. When I come back from New Zealand, I will spend a few months working on behind the scenes stuff:

compilation of code which constructs quotations with curry
compilation of code which uses continuations
removing the interpreter
UI stuff, tools
Finishing stand-alone application deployment
merging Dan's Unicode library

None of this should break code in a major way like the module system did.

Thursday, June 21, 2007

New vocabulary browser

After spending more than a week on documentation, and a week porting code to the module system before that, I finally got to code something fun. Inspired by CPAN, but with a lot less libraries included, I present the Factor vocabulary browser:

Notice how we support "tagging" (in the Web 2.0 agile SOA sense). Factor itself doesn't care about tags, they're purely there to help the programmer find modules they might otherwise overlook, and provide a richer ontology than just a simple hierarchy:

We also keep track of who wrote each vocabulary:

The list of authors in the index page is very small, because only less than half of all modules have been ported to the new syste.

In the new module system, vocabularies are organized into a hierarchy. Here we're viewing the editors vocabulary, which has some words to jump to word definitions in your favorite editor; it has some child vocabularies implementing support for each editor support by Factor:

Notice how each vocabulary, loaded or not, has meta-data: a one-line summary, authors, and tags. These are stored in text files in the vocabulary's directory: summary.txt, tags.txt and authors.txt. The vocabulary system provides words for reading and writing these files:

vocab-summary ( vocab -- string )
set-vocab-summary ( string vocab -- )
vocab-tags ( vocab -- seq )
set-vocab-tags ( seq vocab -- )
vocab-authors ( vocab -- seq )
set-vocab-authors ( seq vocab -- )

Factor has great support for organizing and browsing code.

Leo: a literate text editor with syntax highlighting derived from jEdit

The Leo literate text editor is written and Python and appears to be using a syntax highlighting engine compatible with jEdit syntax definitions, which are written as XML. This gives them access to more than a hundred existing mode files.

This means that Factor syntax highlighting managed to sneak into another editor.

The actual implementation doesn't look like jEdit's code at all; I suspect they did a clean-room port because jEdit's Java code is GPL licensed, however I wouldn't mind if they looked at my (rather crappy) code.

A similar highlighter could be implemented in Factor, too. Since jEdit mode files use regexps, Doug Coleman would need to finish his pure Factor regex library first, or a PCRE binding could be developed.

Sunday, June 17, 2007

Inverting a permutation in 4 words

I discovered a useless but neat hack today. Suppose we wish to compute the inverse of a permutation specified as a sequence of integers, for example { 3 2 0 1 }.

We first turn the sequence { 3 2 0 1 } into an associative structure (a "mapping") which maps each integer to the corresponding element of the sequence. In Factor, this is done by wrapping the sequence in an enum:

{ 3 2 0 1 } <enum>

Now, we convert this enum into an association list, using the generic >alist word, which works on any associative structure:

{ 3 2 0 1 } <enum> >alist

The result is { { 0 3 } { 1 2 } { 2 0 } { 3 1 } }. We take advantage of the key property of association lists -- they're an ordered associative structure but also a sequence -- to sort this association list by value:

{ 3 2 0 1 } <enum> >alist sort-values

The result is { { 2 0 } { 3 1 } { 1 2 } { 0 3 } }.
Finally, we use the generic word keys, which works on any associative structure:

{ 3 2 0 1 } <enum> >alist sort-values keys .

The result is { 2 3 1 0 }.

To compose these permutations, we use map-with:

{ 2 3 1 0 } { 3 2 0 1 } [ swap nth ] map-with

As one would expect, we get { 0 1 2 3 }, the identity permutation.

Friday, June 15, 2007

Enchilada in Factor

Robbert van Dalen has ported Enchilada to Factor. Enchilada is a purely-functional concatenative language. In addition to the language implementation itself, his work has also yielded a library for immutable sequences. These immutable sequences are similar to finger trees; random access is O(log n), and concatenation of similar-sized sequences is very efficient. He implemented the full range of operations such as union, intersection, sorting, and so on. Yet another data structure Factor programmers can use is a good thing, since each data structure has its own advantages and disadvantages.

His code can be found in the repository, extra/enchilada and extra/isequences.

Thursday, June 14, 2007

Stand-alone images and the Cocoa UI backend

In my previous entry I was testing minimal image generation with the X11 UI backend. The Cocoa UI backend is a bit trickier, because it relies on various dynamic behaviors. After a bit of hacking, I got the deployment tool working with the Cocoa UI backend, but the generated images were too large. For example, Alex Chapman's Factor Tetris with X11 is a 714 Kb image, whereas Tetris with Cocoa was 1.5 Mb. This is because the Cocoa binding builds a database of known Objective C methods in a global hashtable, and the tree shaker cannot pair down this hashtable because it doesn't have enough information. Luckily, the deployment tool is pretty flexible, so after a bit more work I added a custom strip stage which is only run if the Cocoa UI backend is loaded. It performs a bit of static analysis to determine which methods are actually called, then pairs down the method database to only include these methods. The result was a deployed Cocoa Tetris image which is comparable in size to the X11 version, 793 Kb.

Wednesday, June 13, 2007

SSL streams support

Elie Chaftari and Matthew Willis have been collaborating on a Factor binding to the Cryptlib library. This binding includes a number of high level words and a Factor stream wrapper over Cryptlib's SSL sockets, which is very nice. Adding SSL support to Factor's HTTP client and server libraries should be easy now.

Also, congratulations to Matthew who recently started a new job at Apple. It must be a nice place to work.

Selecting text in pane gadgets

A long-overdue Factor UI feature is the ability to select and copy text in pane gadgets. This is now possible in the darcs repository. Prior to this, there was no way to copy UI listener output; this was rather annoying. Factor 0.89 had some preliminary code for this which didn't work to well, and I didn't want to delay the release further so I left it as-is, and didn't mention it in the changelog. Now I've fixed this code up and it is mostly usable.

What makes this code interesting is that a pane gadget is very general. It simply lays out its children, which are added as a result of formatted text being written to a stream. There is no underlying document model; writing a line of text just creates a label gadget, sets its font and color attributes accordingly, and adds it to the pane. More complex layouts such as lists, grids, borders and so on create more complex gadgets, some of them containing child panes.

To make selection work, I first devised a generic protocol whereby a gadget can be asked for a textual representation of itself. For label gadgets, this returns the label's string; for horizontal, vertical and grid layouts, this recursively asks each child for its text then combines it in the appropriate way.

The second component is a simple word which takes a gadget and a point inside the gadget, and returns a path (a sequence of integers) to reach the deepest child gadget at that point.

The final component, and the one which took the most work to get right, takes two such paths from a common parent, and outputs the gadget subtree spanned by these two paths.

Now given these components, pane selection is easy to build: when the user drags the mouse between two ranges, you compute the paths from the pane to the deepest child at each endpoint, then you traverse the gadget tree between these two paths, obtaining the text of each gadget along the way. This text can then be copied to the clipboard when the user invokes the copy action.

All in all, this was a lot trickier to implement than text selection in a plain text editor gadget, however the code for turning gadgets into text and for traversing gadget trees should be usable in its own right, and could form the basis for better support for accessibility in the UI.

Saturday, June 09, 2007

Job trends

Many people like to post job trend graphs in their weblogs, as proof of the superiority of their favorite technology. For example, this, and this.

Well, you guys may as well not bother. Why waste time learning programming at all? Think of all the days you spent reading books, papers, writing code. In the real world, all the jobs are in the fast food sector!

Shut down your IDE, monkey, and go flip some burgers!

Generating minimal images; deploying stand-alone applications

First of all, an update on the new module system. The porting of the core library is almost complete, and in fact the only remaining module I still need to update is the Cocoa UI backend! In the last few days I've been using the X11 UI backend to test things on my Mac. There's no technical difficulty preventing the Cocoa binding from being ported; it is just a matter of priorities.

I still haven't pushed the module system patches to the main repository. For now, you can grab the experimental repository:

http://factorcode.org/experimental/ - darcs repository
http://factorcode.org/images/poo/ - boot images

Now that the module system is relatively stable, I've been able to attack one of the major features planned for Factor 1.0.

Factor's images are quite large these days. A fully-loaded 0.90 image with the compiler, UI and tools is 9.4 Mb on Mac OS X/PowerPC. This has two undesirable consequences:

On resource-constrained systems, such as cellphones running Windows CE, a full image might use too much memory, or not even start at all.

Stand-alone application deployment becomes impractical; this is "the Java problem"; the user must endure a lengthy download and install a bulky runtime environment before being able to use any software written in that language.

In the latest sources, I have addressed the first issue by adding a couple of new command line switches to the bootstrap process. The two switches are -include and -exclude, and take a list of components to load into the final image. The default value for -include is

compiler help tools ui ui.tools io

The default value for -exclude is empty. During bootstrap, all components appearing in the included list but not in the excluded list are loaded. So for example, an image with the compiler only can be bootstrapped as follows:

./factor -i=boot.image.x86 -include=compiler

If you would like almost all components except native I/O,

./factor -i=boot.image.x86 -exclude=io

Here are some image sizes on PowerPC:

Options	Size
Minimal	1.2 Mb
Compiler only	4.4 Mb
Compiler, tools	4.6 Mb
Everything except for the UI	6.0 Mb
Everything	9.4 Mb

This additional flexibility at bootstrap time allows one to develop code in resource-constrained environments, however it won't do you any good if you want to deploy a graphical application written in Factor; the UI image is 9.4Mb.

Factor images include a lot of information that is only useful for developers, such as cross-referencing data. Furthermore, a typical application only uses a fraction of the functionality in the image; most would never need to invoke the parser at run time, for example.

The standard solution for deployment in the Lisp world is the "tree shaker", which creates an image containing only those functions referenced from a specified "main" entry point. I decided to give this approach a go. The tree shaker clears out most global variables; for example, the variable holding the dictionary of words, as well as cross-referencing data. It also clears out word names and other such data. It constructs a startup routine which initializes Factor and then executes the main entry point of a specified module. Then, the garbage collector is run. This has the effect of reclaiming all words not directly referenced from the main entry point; after this operation completes, the image is saved and Factor exits. The result is a stripped-down image which is much smaller than the default development image.

The tree-shaker is used as follows:

USE: tools.deploy
deploy-ui? on
"bunny.image" "demos.bunny" deploy

There are various variables one can set (see the the source) before calling the deploy word, which takes an image name and a module name as input.

Here are some figures for the deployed image size of various demonstration programs shipped with Factor:

Program	Image size
Hello world (interpreted)	81 Kb
Hello world (compiled)	283 Kb
Hello world (graphical)	680 Kb
Maze demo	687 Kb
Bunny demo	824 Kb
Contributors	557 Kb

Note that in addition to the image size, there is also the 150kb Factor VM.

All these programs are quite trivial, however some of these do pull in non-trivial dependencies (UI, XML, HTTP). Also, the tree shaker is not as good as it could be; in the future these images will become smaller.

The tree shaker is also configurable; if you want to, you can leave the parser and dictionary intact, allowing runtime source modification and interactive inspection of your application, however this negates most of the space savings.

This code is not really in a usable state right now, and needs a lot of polishing and debugging. However, it is a good first step.

So far, the deploy tool doesn't directly address the issue of generating stand-alone, double-clickable binaries. On Mac OS X, creating a .app bundle consisting of the Factor VM and image is very easy, and I will write a tool in Factor which does this; it will also be able to emit the XML property list file and therefore allow you to customize the application name, icon, etc. On other platforms, I'm still not sure how to proceed; it might be good enough to spit out a shell script for Unix and a batch file for Windows.

Friday, June 08, 2007

Amusing discussion in #forth


DocPlatypus: even if Microsoft released the next Windows version after Vista under the
GPL... I probably would *still* have to wait a year for all the bugs to finally get
solved for good
DocPlatypus: I'm not saying most free software programmers would relish the chance to
fix Windows. I think enough of them would that we'd see a difference within the first
few weeks
Quartus: I've heard of rose-coloured glasses, but this is the first time I've seen
anyone wearing solid red opaque discs

Monday, June 04, 2007

New Factor blogger

Corentin Plouët is blogging about his experiences learning Factor.

Factor: a practical stack language