Saturday, June 09, 2007

Generating minimal images; deploying stand-alone applications

First of all, an update on the new module system. The porting of the core library is almost complete, and in fact the only remaining module I still need to update is the Cocoa UI backend! In the last few days I've been using the X11 UI backend to test things on my Mac. There's no technical difficulty preventing the Cocoa binding from being ported; it is just a matter of priorities.

I still haven't pushed the module system patches to the main repository. For now, you can grab the experimental repository:
  • darcs repository
  • boot images

Now that the module system is relatively stable, I've been able to attack one of the major features planned for Factor 1.0.

Factor's images are quite large these days. A fully-loaded 0.90 image with the compiler, UI and tools is 9.4 Mb on Mac OS X/PowerPC. This has two undesirable consequences:
  • On resource-constrained systems, such as cellphones running Windows CE, a full image might use too much memory, or not even start at all.

  • Stand-alone application deployment becomes impractical. This is "the Java problem": the user must endure a lengthy download and install a bulky runtime environment before being able to use any software written in the language.

In the latest sources, I have addressed the first issue by adding a couple of new command line switches to the bootstrap process. The two switches are -include and -exclude, and take a list of components to load into the final image. The default value for -include is
compiler help tools ui io

The default value for -exclude is empty. During bootstrap, all components appearing in the included list but not in the excluded list are loaded. So for example, an image with the compiler only can be bootstrapped as follows:
./factor -i=boot.image.x86 -include=compiler

If you would like all of the default components except native I/O:
./factor -i=boot.image.x86 -exclude=io
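
To make the selection rule concrete, here is a rough sketch in Python (not Factor, and not the actual bootstrap code). The function name `components_to_load` is made up for illustration; the only things taken from the text above are the default include list and the rule that a component is loaded if it is included and not excluded.

```python
# Hypothetical illustration of the -include / -exclude rule described above.
# This is not Factor's bootstrap code; the function name is invented.
DEFAULT_INCLUDE = ["compiler", "help", "tools", "ui", "io"]


def components_to_load(include=None, exclude=None):
    """Return the included components that are not excluded, in order."""
    include = DEFAULT_INCLUDE if include is None else include
    excluded = set(exclude or [])
    return [name for name in include if name not in excluded]


# Mirrors: ./factor -i=boot.image.x86 -include=compiler
print(components_to_load(include=["compiler"]))  # ['compiler']

# Mirrors: ./factor -i=boot.image.x86 -exclude=io
print(components_to_load(exclude=["io"]))  # ['compiler', 'help', 'tools', 'ui']
```
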

Here are some image sizes on PowerPC:
Minimal                          1.2 Mb
Compiler only                    4.4 Mb
Compiler, tools                  4.6 Mb
Everything except for the UI     6.0 Mb
Everything                       9.4 Mb
This additional flexibility at bootstrap time allows one to develop code in resource-constrained environments. However, it won't do you any good if you want to deploy a graphical application written in Factor; the UI image is 9.4 Mb.

Factor images include a lot of information that is only useful for developers, such as cross-referencing data. Furthermore, a typical application only uses a fraction of the functionality in the image; most would never need to invoke the parser at run time, for example.

The standard solution for deployment in the Lisp world is the "tree shaker", which creates an image containing only those functions referenced from a specified "main" entry point. I decided to give this approach a go. The tree shaker clears out most global variables; for example, the variable holding the dictionary of words, as well as cross-referencing data. It also clears out word names and other such data. It constructs a startup routine which initializes Factor and then executes the main entry point of a specified module. Then, the garbage collector is run. This has the effect of reclaiming all words not reachable, directly or indirectly, from the main entry point. After this operation completes, the image is saved and Factor exits. The result is a stripped-down image which is much smaller than the default development image.
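
The core idea can be sketched as a reachability computation. The following Python toy is only an illustration under simplifying assumptions: the "image" is a plain dictionary mapping each word to the words it references, the word names are invented, and the real tree shaker relies on the garbage collector rather than an explicit traversal like this one.

```python
# Toy model of tree shaking: keep only the words reachable from the entry point.
# The data structure and word names are hypothetical, not Factor internals.
def shake(words, entry_point):
    """Return the subset of `words` reachable from `entry_point`.

    `words` maps each word name to the list of word names it references.
    """
    live = set()
    pending = [entry_point]
    while pending:
        word = pending.pop()
        if word in live:
            continue
        live.add(word)
        # Everything this word references must be kept as well.
        pending.extend(words.get(word, []))
    return {name: refs for name, refs in words.items() if name in live}


image = {
    "main": ["draw-bunny"],
    "draw-bunny": ["gl-begin"],
    "gl-begin": [],
    "parse-file": ["scan-token"],  # never referenced from main
    "scan-token": [],
}

stripped = shake(image, "main")
print(sorted(stripped))  # ['draw-bunny', 'gl-begin', 'main']
```

In this toy, the parser words are dropped because nothing reachable from main refers to them, which is the same reason a deployed application that never invokes the parser does not need to carry it.
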

The tree-shaker is used as follows:
USE: tools.deploy
deploy-ui? on
"bunny.image" "demos.bunny" deploy

There are various variables one can set (see the source) before calling the deploy word, which takes an image name and a module name as input.

Here are some figures for the deployed image size of various demonstration programs shipped with Factor:
Program                        Image size
Hello world (interpreted)      81 Kb
Hello world (compiled)         283 Kb
Hello world (graphical)        680 Kb
Maze demo                      687 Kb
Bunny demo                     824 Kb
Contributors                   557 Kb
Note that in addition to the image size, there is also the 150 Kb Factor VM.

All these programs are quite trivial; however, some of them do pull in non-trivial dependencies (UI, XML, HTTP). Also, the tree shaker is not as good as it could be; in the future these images will become smaller.

The tree shaker is also configurable. If you want to, you can leave the parser and dictionary intact, allowing runtime source modification and interactive inspection of your application; however, this negates most of the space savings.

This code is not really in a usable state right now, and needs a lot of polishing and debugging. However, it is a good first step.

So far, the deploy tool doesn't directly address the issue of generating stand-alone, double-clickable binaries. On Mac OS X, creating a .app bundle consisting of the Factor VM and image is very easy, and I will write a tool in Factor which does this; it will also be able to emit the XML property list file and therefore allow you to customize the application name, icon, etc. On other platforms, I'm still not sure how to proceed; it might be good enough to spit out a shell script for Unix and a batch file for Windows.
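
To give an idea of what such a bundling tool has to do, here is a rough Python sketch (the real tool would be written in Factor and does not exist yet). It lays out the standard .app directory structure, copies a VM binary and an image into it, and writes an XML Info.plist. The function name, the file paths, and the choice to put the image under Contents/Resources are all assumptions for illustration; in particular, how the VM would locate the image inside the bundle is not addressed here.

```python
import os
import plistlib
import shutil


def make_app_bundle(dest_dir, app_name, vm_path, image_path, icon_file=None):
    """Create dest_dir/app_name.app containing the VM, the image and Info.plist.

    Hypothetical sketch: names and image placement are illustrative assumptions.
    """
    bundle = os.path.join(dest_dir, app_name + ".app")
    macos_dir = os.path.join(bundle, "Contents", "MacOS")
    resources_dir = os.path.join(bundle, "Contents", "Resources")
    os.makedirs(macos_dir)
    os.makedirs(resources_dir)

    # The VM becomes the bundle's executable; it must be marked executable.
    executable = os.path.join(macos_dir, app_name)
    shutil.copy(vm_path, executable)
    os.chmod(executable, 0o755)

    # Assumption: the deployed image is stored as a bundle resource.
    shutil.copy(image_path, os.path.join(resources_dir, os.path.basename(image_path)))

    # Standard property list keys; the icon is optional.
    info = {
        "CFBundleName": app_name,
        "CFBundleExecutable": app_name,
        "CFBundlePackageType": "APPL",
    }
    if icon_file is not None:
        info["CFBundleIconFile"] = icon_file

    with open(os.path.join(bundle, "Contents", "Info.plist"), "wb") as f:
        plistlib.dump(info, f)  # writes XML by default
    return bundle
```

A call such as `make_app_bundle(".", "Bunny", "./factor", "bunny.image")` would then produce a `Bunny.app` directory, with the application name and icon customizable through the arguments.
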


shaurz said...

You could emit an image wrapped in a minimal object file with a single symbol, say _factor_image, pointing to the image. Then link the image object, a minimal stub program and the VM. The stub program would pass the address of the image to the VM.

You might prefer to construct the .exe manually, so you don't depend on external linkers, or to have full control over the result. It's not very hard if you ignore most of the useless features (ELF is much easier than PE). I have done something similar for a little Forth system I wrote a few years ago (hardcoded in assembly source ;-)

Daniel Ehrenberg said...

You know, you can already make somewhat of a clickable executable for limited purposes. Just get the UI into a particular state, save the image, and you have something. I made a program like this for my statistics teacher, and it's not pretty, but it works for something.

When you make executables like you're talking about, will the image, all dynamically linked libraries and the factor runtime all be in the same file, or will you just have a bunch of files?

Slava Pestov said...

Dan, sure, what you did is one way to simulate this, but the deployment tool has the advantage that the generated image is smaller, and the process is automated, and doesn't require the developer to get the UI into a particular state.

So far, this generates an image file only. Soon I will write a tool which bundles this image file with the Factor VM, and any required native shared libraries, together in a Mac OS X application bundle.

I haven't decided what to do on other platforms yet, as far as generating truly stand-alone executables goes.

Anonymous said...

Perhaps you can make the executable compressed, so it takes less space.
I thought stack-based languages produce really small code. How come these simple examples are so huge?

also, modern linkers, afaik, only take from a library the functions that were actually used. I think you can do the same when generating the image. (or is the tree-shaker exactly that?)

Slava Pestov said...

The tree shaker is exactly that.

Slava Pestov said...

Also, stack language source is typically very small, but there is no reason the compiled code would be, unless the compiler optimizes for space (Factor's compiler optimizes for speed).