Monday, January 25, 2010

Replacing GNU assembler with Factor code

Lately a major goal of mine has been to get the Factor VM to be as close as possible to standard C++, and to compile it with at least one non-gcc compiler, namely Microsoft's Windows SDK.

I've already eliminated usages of gcc-specific features from the C++ source (mostly register variables) and blogged about it.

The remaining hurdle was that several of the VM's low-level subroutines were written directly in GNU assembler, and not C++. Microsoft's toolchain uses a different assembler syntax, and I don't fancy writing the same routines twice, especially since the code in question is already x86-specific. Instead, I've decided to eliminate assembly code from the VM altogether. Factor's compiler infrastructure has perfectly good assembler DSLs for x86 and PowerPC written in Factor itself already.

Essentially, I rewrote the GNU assembler source files in Factor itself. The individual assembler routines have been replaced by new functionality added to both the non-optimizing and optimizing compiler backends. This sidesteps the GNU assembler -vs- Microsoft assembler syntax issue entirely. Now all assembly in the implementation is consistently written in the same postfix Factor assembler syntax.

The non-optimizing compiler already supports "sub-primitives": words whose definitions consist of assembler opcodes, inlined at each call site. I added new sub-primitives to the non-optimizing compiler backends to replace some of the VM's assembly routines.

A few entry points are now generated by the optimizing compiler, too. The optimizing compiler has complicated machinery for generating arbitrary machine code. I extended this with a Factor language feature similar to C's inline assembly, where Factor's assembler DSL is used to generate arbitrary assembly within a word's compiled definition. This is more flexible than the non-optimizing compiler's sub-primitives, and has wide applications beyond the immediate task of replacing a few GNU assembler routines with Factor code.

Factor platform ABI

Before jumping into a discussion of the various assembly routines in the VM, it is important to understand the Factor calling convention first, and how it differs from C.

Factor's machine language calling convention in a nutshell:

  • Two registers are reserved, for the data and retain stacks, respectively.

  • Both registers point into an array of tagged pointers. Quotations and words pass and receive parameters as tagged pointers on the data stack.

  • On PowerPC and x86-64, an additional register is reserved for the current VM instance.

  • The call stack register (ESP on x86, r1 on PowerPC) is used as in C, with call frames stored contiguously.

  • Call frames must have a bit of metadata so that the garbage collector can mark code blocks that are referenced via return addresses. This ensures that currently-executing code is not deallocated, even if no other references to it remain.

  • This GC metadata consists of three things: the stack frame's size in bytes, a pointer to the start of a compiled code block, and a return address inside that code block. Since every frame records its size and the next frame immediately follows it, the garbage collector can trace and update all return addresses accurately.

  • A call frame can have arbitrary size, but the garbage collector does not inspect the additional payload; it can be any blob of binary data at all. The optimizing compiler generates large call frames in a handful of rare situations when a scratch area is needed for raw data.

  • Quotations must be called with a pointer to the quotation object in a distinguished register (even on x86-32, where the C ABI does not pass parameters in registers). Remaining registers do not have to be preserved, and can be used for any purpose in the compiled code.

  • Tail calls to compiled words must load the program counter into a special register (EBX on x86-32). This allows polymorphic inline caches at tail call sites to patch the call address if the cache misses. A non-tail-call PIC can look at the return address on the call stack, but a tail call does not push one, so to make inline caching work in all cases, tail calls have to pass this address directly. The only compiled blocks that read the value of this register on entry are tail call PIC miss stubs.

  • All other calls to compiled words are made without any registers having defined contents at all, so effectively all registers that are not reserved for a specific purpose are volatile.

  • The call stack pointer must be suitably aligned so that SIMD code can spill vector data to the call frame. This is already the case in the C ABI on all platforms except non-Mac 32-bit x86.
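
The frame-walking scheme described above can be sketched in a few lines of Python. This is a toy model: frames live contiguously, each frame records its own size, and a collector hops frame to frame marking the code block referenced by each return address. The layout and names are illustrative, not the Factor VM's actual representation.

```python
def trace_call_stack(stack, mark):
    """stack: a flat list of slots; each frame is laid out as
    [size, code_block, return_address, *payload] with exactly `size` slots."""
    i = 0
    while i < len(stack):
        size, code_block, return_address = stack[i], stack[i + 1], stack[i + 2]
        mark(code_block)   # keep currently-executing code alive
        i += size          # the payload is opaque; skip straight to the next frame

marked = []
trace_call_stack(
    [3, "block-A", 0x10,            # minimal frame, no payload
     5, "block-B", 0x20, "x", "y"], # frame with a two-slot scratch area
    marked.append)
print(marked)   # → ['block-A', 'block-B']
```

Note how the collector never interprets the payload slots; it only needs the size field to find the next frame.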

Replacing the c_to_factor() entry point

There are two situations where C code needs to jump into Factor: when Factor is starting up, and when C functions invoke function pointers generated by alien-callback.

There used to be a c_to_factor() function, defined in a GNU assembly source file as part of the VM, which took care of translating from the C ABI to the Factor ABI. C++ code can call assembly routines that obey the C calling convention directly.

Now that the special assembly entry point is gone from the VM, a valid question is how it is even possible to switch ABIs and jump out into Factor-land without stepping outside the bounds of C++, or at least writing some inline assembler. It seems like an impossible dilemma.

The Factor VM generates the transition stub that C code uses to call into Factor seemingly out of thin air. It turns out the only unportable C++ feature you really need when bootstrapping a JIT is the ability to cast a data pointer into a function pointer, and call the result.

The new factor_vm::c_to_factor() method, called on VM startup, looks for a function pointer in a member variable named factor_vm::c_to_factor_func. Initially, the value is NULL, and if this is the case, it dynamically generates the entry point and then calls the brand-new function pointer:
void factor_vm::c_to_factor(cell quot)
{
	/* First time this is called, wrap the c-to-factor sub-primitive inside
	of a callback stub, which saves and restores non-volatile registers
	as per platform ABI conventions, so that the Factor compiler can treat
	all registers as volatile */
	if(!c_to_factor_func)
	{
		tagged<word> c_to_factor_word(special_objects[C_TO_FACTOR_WORD]);
		code_block *c_to_factor_block = callbacks->add(c_to_factor_word.value(),0);
		c_to_factor_func = (c_to_factor_func_type)c_to_factor_block->entry_point();
	}

	c_to_factor_func(quot);
}


All machine code generated by the Factor compiler is stored in the code heap, where blocks of code can move. But c_to_factor() needs a stable function pointer to make the initial jump out of C and into Factor. As I briefly mentioned in a blog post about mark sweep compact garbage collection, Factor has a separate callback heap for allocating unmovable function pointers intended to be passed to C.

This callback heap is used for the initial startup entry point, as well as for callbacks generated by alien-callback.

As mentioned in the comment, the callback stub now takes care of saving
and restoring non-volatile registers, as well as aligning the stack frame. You can see how callback stubs are defined with the Factor assembler by grepping for callback-stub in the non-optimizing compiler backend.

The new callback stub covers part of what the old assembly c_to_factor() entry point did. The remaining component is calling the quotation itself, and this is now done by a special word c-to-factor.

The c-to-factor word loads the data stack and retain stack pointers and jumps to the quotation's compiled definition. Grep for c-to-factor-impl in the non-optimizing compiler backend.

In effect, by abusing the non-optimizing compiler's support for "sub-primitives", machine code for the C-to-Factor entry point can be generated by the VM itself.

Callbacks generated by alien-callback in the optimizing compiler used to contain a call to the c_to_factor() assembly routine. The equivalent machine code is now generated directly by the optimizing compiler.

Replacing the throw_impl() entry point

When Factor code throws an error, a continuation is popped off the catch stack, and resumed. When the VM needs to throw an error, it has to go through the same motions, but perform a non-local return to unwind any C++ stack frames first, before it can jump back into Factor and resume another continuation.

There used to be an assembly routine named throw_impl() which would take a quotation and a new value for the stack pointer.

This is now handled in a similar manner to c_to_factor(). The unwind-native-frames word in kernel.private is another one of those very special sub-primitives that uses the C calling convention for receiving parameters. It reloads the data and retain stack registers, and changes the call stack pointer to the given parameter. The call is coming in from C++ code, and the contents of these registers are not guaranteed since they play no special role in the C ABI. Grep for unwind-native-frames in the non-optimizing compiler backend.

Replacing the lazy_jit_compile_impl() and set_callstack() entry points

As I discussed in my most recent blog post on the implementation of closures in Factor, quotation compilation is deferred, and initially all quotations point to the same shared entry point. This shared entry point used to be an assembly routine in the VM. It is now the lazy-jit-compile sub-primitive.

The set-callstack primitive predates the existence of sub-primitives, so it was implemented as an assembly routine in the VM for historical reasons.

These two entry points are never called directly from C++ code in the VM, so unlike c_to_factor() and throw_impl(), there is no C++ code to fish out generated code from a special word and turn it into a function pointer.

Inline assembly with the alien-assembly combinator

I added a new word, alien-assembly. In the same way as alien-invoke, it generates code which marshals Factor data into C values, and passes them according to the C calling convention; but where alien-invoke would generate a subroutine call, alien-assembly just calls the quotation at compile time instead, no questions asked. The quotation can emit any machine code it desires, but the result has to obey the C calling convention.

Here is an example: a horribly unportable way to add two floating-point numbers that only works on x86-64:
: add ( a b -- c )
    double { double double } "cdecl"
    [ XMM0 XMM1 ADDSD ]
    alien-assembly ;

1.5 2.0 add .

The important thing is that, unlike assembly code in the VM, assembly code written with this feature works the same regardless of whether the Factor VM was compiled with gcc or the Microsoft toolchain!

Replacing FPU state entry points

The remaining VM assembly routines were used to save and restore FPU state used for advanced floating point features. The math.floats.env vocabulary would call these assembly routines as if they were ordinary C functions, using alien-invoke. After the refactoring, the optimizing compiler now generates code for these routines using alien-assembly instead. A hook dispatching on CPU type is used to pick the implementation for the current CPU.

See the CPU-specific backend files for details.

On PowerPC, using non-gcc compilers is not a goal, so these routines remain in the VM, and the vm/cpu-ppc.S source file still exists. It contains these FPU routines along with one other entry point for flushing the instruction cache (this is a no-op on x86).

Compiling Factor with the Windows SDK

With this refactoring out of the way, the Factor VM can now be compiled using the Microsoft toolchain, in addition to Cygwin and Mingw. The primary benefit of using the Microsoft toolchain is that it has allowed me to revive the 64-bit Windows support.

Last time I managed to get gcc working successfully on Win64, the Factor VM was written in C. The switch to C++ killed the Win64 port, because the 64-bit Windows Mingw port is a huge pain to install correctly. I never got gcc to produce a working executable after the C++ rewrite of the VM, and as a result we haven't had a binary package on this platform since April 2009.

Microsoft's free (as in beer) Windows SDK includes command-line compilers for x86-32 and x86-64, together with various tools, such as a linker, and nmake, a program similar to Unix make. Factor now ships with an Nmakefile that uses the SDK tools to build the VM:
nmake /f nmakefile

After fixing some minor compile errors, the Windows SDK produced a working Win64 executable. After updating the FFI code a little, I quickly got it passing all of the compiler tests. As a result of this work, Factor binaries for 64-bit Windows will be available again soon.

Monday, January 18, 2010

How Factor implements closures

The recent blog post from the Clozure CL team on Clozure CL's implementation of closures inspired me to do a similar writeup about Factor's implementation. It is often said that "closures are a poor man's objects", or "objects are a poor man's closures". Factor takes the former view, because as you will see they are largely implemented within the object system itself.


First, let us look at quotations, and what happens internally when a quotation is called. A quotation is a sequence of literals and words. Quotations do not close over any lexical environment; they are entirely self-contained, and their evaluation semantics only depend on their elements, not any state from the time they were constructed. So quotations are anonymous functions but not closures.

Internally, a quotation is a pair, consisting of an array and a machine code entry point. The array stores the quotation's elements, and when you print a quotation with the prettyprinter, this is how Factor knows what its elements are:
( scratchpad ) [ "out.txt" utf8 [ "Hi" print ] with-file-writer ] .
[ "out.txt" utf8 [ "Hi" print ] with-file-writer ]

The machine code entry point refers to a code block in the code heap containing the quotation's compiled definition.

The call generic word calls quotations. This word can take any callable, not just a quotation, and has a method for each type of callable. The callable class includes quotations, as well as curry and compose which are discussed below. This means that closure invocation is implemented on top of method dispatch in Factor.

For reasons that will become clear in the last section on compiler optimizations, most quotations never have their entry points called directly, and so it would be a waste of time to compile all quotations as they are read by the parser.

Instead, all freshly-parsed quotations have their entry points set to the lazy-jit-compile primitive from the kernel.private vocabulary.

The call generic word has a method on the quotation class. This method invokes the (call) primitive from the kernel.private vocabulary. The (call) primitive does not type check, since by the time it is called it is known that the input is in fact a quotation. This primitive has a very simple machine code definition:
mov eax,[esi]    ! Pop datastack
sub esi,4
jmp [eax+13] ! Jump to quotation's entry point
Note that the quotation is stored in the EAX register; this is important. Recall that initially, a quotation's entry point is set to the lazy-jit-compile word, and that all quotations initially share this entry point.

This word, which is not meant to be invoked directly, compiles quotations the first time they are called. Since all quotations share the same initial entry point, it needs to know which quotation invoked it. This is done by passing the quotation to this word in the EAX register. The lazy-jit-compile word compiles this quotation, sets its entry point to the new compiled code block, and then calls it again. On subsequent calls of the same quotation, the new compiled definition will be jumped to directly, and the lazy-jit-compile entry point is not involved.
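
The patch-on-first-call mechanism described above can be sketched in Python: every fresh quotation shares one entry point, which compiles the quotation on its first call, patches the quotation to point at the compiled code, and re-invokes it. All names here are illustrative; the real mechanism operates on machine code, not Python functions.

```python
class Quotation:
    def __init__(self, elements):
        self.elements = elements
        self.entry_point = lazy_jit_compile   # shared by all fresh quotations

    def __call__(self, stack):
        return self.entry_point(self, stack)

def compile_quotation(quot):
    # Stand-in "compiler": produce a function that pushes literals and
    # applies callables to an explicit data stack.
    def compiled(quot, stack):
        for elt in quot.elements:
            elt(stack) if callable(elt) else stack.append(elt)
        return stack
    return compiled

def lazy_jit_compile(quot, stack):
    quot.entry_point = compile_quotation(quot)   # patch the entry point...
    return quot.entry_point(quot, stack)         # ...and call the fresh code

def add(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

q = Quotation([1, 2, add])
print(q([]))   # → [3]; later calls bypass lazy_jit_compile entirely
```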

If you're interested in the definition of lazy-jit-compile, search for it in the non-optimizing compiler backend files.

The curry and compose words


The curry word is the fundamental constructor for making closures. It takes a value and a callable, and returns something that prints out as if it were a new quotation:
( scratchpad ) 5 [ + 2 * ] curry .
[ 5 + 2 * ]

This is not a quotation though, but rather an instance of the curry class. Instances of this class are pairs of elements: an object, and another callable.

Instances of curry are callable, because the call generic word has a method on curry. This method pushes the first element of the pair on the data stack, then recursively calls call on the second element.

Calls to curry can be chained together:
( scratchpad ) { "a" "b" "c" } 1 2 [ 3array ] curry curry map .
{ { "a" 1 2 } { "b" 1 2 } { "c" 1 2 } }

Note that using curry, many callables can be constructed which share the same compiled definition; only the data value differs.

Technically, curry is just an optimization -- it would be possible to simulate it by using sequence words to construct a new quotation with the desired value prepended, but this would be extremely inefficient. Prepending an element would take O(n) time, and furthermore, result in the new quotation being compiled the first time the result was called. Indeed, in some simpler concatenative languages such as Joy, quotations are just linked lists, and they execute in the interpreter, so partial application can be done by using the cons primitive for creating a new linked list.
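
For contrast, the Joy-style representation mentioned above can be sketched in Python: if a quotation is just a linked list run by an interpreter, partial application is a single constant-time cons. This is a hypothetical sketch of the simpler scheme, not how Factor represents quotations.

```python
def cons(head, tail):
    # A linked-list cell: (element, rest-of-list)
    return (head, tail)

def to_list(quot):
    out = []
    while quot is not None:
        head, quot = quot
        out.append(head)
    return out

quot = cons("+", cons("2", cons("*", None)))   # the quotation [ + 2 * ]
curried = cons(5, quot)                        # partial application is O(1)
print(to_list(curried))   # → [5, '+', '2', '*']
```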


The compose word takes two callables and returns a new instance of the compose class. Instances of this class are pairs of callables.
( scratchpad ) [ 3 + ] [ sqrt ] compose .
[ 3 + sqrt ]

As with curry, the call generic word has a method on the compose class. This method recursively applies call to both elements.
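
The scheme just described can be rendered in Python: curry and compose are plain two-slot objects, and a generic "call" dispatches on their type. Quotations are modeled as Python functions on an explicit data stack; all names are illustrative.

```python
class Curry:
    def __init__(self, obj, quot):
        self.obj, self.quot = obj, quot

class Compose:
    def __init__(self, first, second):
        self.first, self.second = first, second

def call(callable_, stack):
    if isinstance(callable_, Curry):       # push the object, call the rest
        stack.append(callable_.obj)
        call(callable_.quot, stack)
    elif isinstance(callable_, Compose):   # call both halves in order
        call(callable_.first, stack)
        call(callable_.second, stack)
    else:                                  # a plain "quotation"
        callable_(stack)

def plus(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

def double_top(stack):
    stack.append(stack.pop() * 2)

# 5 [ + 2 * ] curry  ≈  Curry(5, Compose(plus, double_top))
stack = [10]
call(Curry(5, Compose(plus, double_top)), stack)
print(stack)   # → [30]
```

Closure invocation really is just method dispatch here, exactly as in the Factor object system.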

It is possible to express the effect of compose using curry and dip:
: my-compose ( quot1 quot2 -- compose )
[ [ call ] dip call ] curry curry ; inline

The main reason compose exists as a distinct type is to make the result prettyprint better. Were it defined as above, the result would not prettyprint in a nice way:
( scratchpad ) [ 3 + ] [ sqrt ] [ [ call ] dip call ] curry curry .
[ [ 3 + ] [ sqrt ] [ call ] dip call ]

The curry and compose constructors are sufficient to express all higher-level forms of closures.

An aside: compose and dip

It is possible to express compose using curry and dip. Conversely, it is also possible to express dip using compose and curry.
: my-dip ( a quot -- ) swap [ ] curry compose call ;

Here is an example of how this works. Suppose we have the following code:
123 321 [ number>string ] my-dip

Using the above definition of my-dip, we get
123 321 [ number>string ] swap [ ] curry compose call  ! substitute definition of 'my-dip'
123 [ number>string ] 321 [ ] curry compose call ! evaluate swap
123 [ number>string ] [ 321 ] compose call ! evaluate curry
123 [ number>string 321 ] call ! evaluate compose
123 number>string 321 ! evaluate call
"123" 321 ! evaluate number>string

Fry syntax

The fry syntax provides nicer syntax sugar for more complicated usages of curry and compose. Beginners learning Factor should start with fry syntax, and probably don't need to know about curry and compose at all; but this syntax desugars trivially into curry and compose, as explained in the documentation.

Lexical variables

Code written with the locals vocabulary can create closures by referencing lexical variables from nested quotations. For example, here is a word from the compiler which computes the breadth-first order on a control-flow graph:
:: breadth-first-order ( cfg -- bfo )
<dlist> :> work-list
cfg post-order length :> accum
cfg entry>> work-list push-front
work-list [
[ accum push ]
[ dom-children work-list push-all-front ] bi
] slurp-deque
accum ;

In the body of the word, three lexical variables are used; the input parameter cfg, and the two local bindings work-list and accum.

When the parser reads the above definition, it creates several quotations whose bodies reference local variables, for example, [ accum push ]. Before defining the new word, however, the :: parsing word rewrites this into concatenative code.

Suppose we were doing this rewrite by hand. The first step would be to make closed-over variables explicit. We can do this by currying the values of accum and work-list onto the two quotations passed to bi:
:: breadth-first-order ( cfg -- bfo )
<dlist> :> work-list
cfg post-order length :> accum
cfg entry>> work-list push-front
work-list [
accum [ push ] curry
work-list [ [ dom-children ] dip push-all-front ] curry bi
] slurp-deque
accum ;

And now, we can do the same transformation on the quotation passed to slurp-deque:
:: breadth-first-order ( cfg -- bfo )
<dlist> :> work-list
cfg post-order length :> accum
cfg entry>> work-list push-front
work-list accum work-list [
[ [ push ] curry ] dip
[ [ dom-children ] dip push-all-front ] curry bi
] curry curry slurp-deque
accum ;

Note how three usages of curry have appeared, and all lexical variable usage now only occurs at the same level where the variable is actually defined. The final rewrite into concatenative code is trivial, and involves some tricks with dup and dip.

A lexical closure closing over many variables will be rewritten into code which builds a long chain of curry instances, essentially a linked list. This is less efficient than closure representations used in Lisp implementations, where typically all closed-over values are stored in a single array. However in practice this is not usually a problem, because of optimizations outlined below.

Mutable local variables

Mutable local variables are denoted by suffixing their name with !. Here is an example of a loop that counts a mutable variable up to 100:
:: counted-loop-test ( -- )
0 :> i!
100 [ i 1 + i! ] times ;

Factor distinguishes between binding (done with :>) and assignment (done with foo! where foo is a local previously declared to be mutable). At a fundamental level, all bindings and closures are immutable; code using mutable locals is rewritten to close over a mutable heap-allocated box instead, and reads and writes involve an extra indirection. The previous example is roughly equivalent to the following, where we use an immutable local variable holding a mutable one-element array:
:: counted-loop-test ( -- )
{ 0 } clone :> i
100 [ i first 1 + i set-first ] times ;
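
The same boxing rewrite can be rendered in Python, where an immutable binding to a one-element list stands in for the mutable local, and every read and write goes through the box:

```python
def counted_loop_test():
    i = [0]                 # immutable binding, mutable box
    for _ in range(100):
        i[0] = i[0] + 1     # the extra indirection on each access
    return i[0]

print(counted_loop_test())  # → 100
```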

For this reason, code that uses mutable locals does not optimize as well, and iterative loops that use mutable local variables can run slower than tail-recursive functions which use immutable local variables. This might be addressed in the future with improved compiler optimizations.


Quotation inlining

If after inlining, the compiler sees that call is applied to a literal quotation, it inlines the quotation's body at the call site. This optimization works very well when quotations are used as "downward closures", and this is why most quotations never have their dynamic entry point invoked at all.

Escape analysis

Since curry and compose are ordinary tuple classes, any time some code constructs instances of curry and compose, and immediately unpacks them, the compiler's escape analysis pass can eliminate the allocations entirely.

The escape analysis pass has no special knowledge of curry and compose; it applies an optimization intended for object-oriented code.

Again, this helps optimize code with "downward closures", with the result that most usages of curry and compose will never allocate any memory at run time.

For more details, see my blog post about Factor's escape analysis algorithm.

Sunday, January 10, 2010

Factor's bootstrap process explained

Separation of concerns between Factor VM and library code

The Factor VM implements an abstract machine consisting of a data heap of objects, a code heap of machine code blocks, and a set of stacks. The VM loads an image file on startup, which becomes the data and code heap. It then begins executing code in the image, by calling a special startup quotation.

When new source files are loaded into a running Factor instance by the developer, they are parsed and compiled into a collection of objects -- words, quotations, and other literals, along with executable machine code. The new data and code heaps can then be saved into another image file, for faster loading in the future.

Factor's core data structures, object system, and source parser are implemented in Factor and live in the image, so the Factor VM does not have enough machinery to start with an empty data and code heap and parse Factor source files by itself. Instead, the VM needs to start from a data and code heap that already contains enough Factor code to parse source files. This poses a chicken-and-egg problem: how do you build a Factor system from source code? The VM can be compiled with a C++ compiler, but the result is not sufficient by itself.

Some image-based language systems cannot generate new images from scratch at all, and the only way to create a new image is to snapshot an existing session. This is the simplest approach but it has serious downsides -- it lacks determinism and reproducibility, and it is difficult to make big changes to the system.

While Factor can snapshot the current execution state into an image, it also has a tool to generate a "fresh" image from source. While this tool is written in Factor and runs inside of an existing Factor system, the resulting image depends as little as possible on the particular state of the system that generated it.

Stage 1 bootstrap: creating a boot image

The initial data heap comes from a boot image, which is built from an existing Factor system, known as the host. The result is a new boot image which can run on the target system. Boot images are created using the bootstrap.image tool, whose main entry point is the make-image word. This word can be invoked from the listener in a running Factor instance:
"x86.32" make-image

The make-image word parses source files using the host's parser, and the code in those source files forms the target image. This tool can be thought of as a form of cross-compiler, except boot images only contain a data heap, and not a code heap. The code heap is generated on the target by the VM, and later by the target's optimizing compiler in Factor.

The make-image word runs the source file core/bootstrap/stage1.factor, which kicks off the bootstrap process.

Building the embryo for the new image

The global namespace is the most important object stored in an image file. It contains various global variables used by the parser, among them the dictionary. The dictionary is a hashtable mapping vocabulary names to vocabulary objects. Vocabulary objects contain various bits of state, among them a hashtable mapping word names to word objects. Word objects store their definitions as quotations. The dictionary is how code is represented in memory in Factor; it is built and modified by loading source files from disk.
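
A toy model of these structures: the dictionary maps vocabulary names to vocabulary objects, each vocabulary maps word names to word objects, and a word stores its definition as a quotation (modeled here as a list). Field names are illustrative, not the Factor VM's actual slot names.

```python
class Word:
    def __init__(self, name, definition):
        self.name = name
        self.definition = definition      # a quotation

class Vocabulary:
    def __init__(self, name):
        self.name = name
        self.words = {}                   # word name -> Word

dictionary = {}                           # vocabulary name -> Vocabulary

def define(vocab_name, word_name, quotation):
    vocab = dictionary.setdefault(vocab_name, Vocabulary(vocab_name))
    vocab.words[word_name] = Word(word_name, quotation)

define("math", "sq", ["dup", "*"])
print(dictionary["math"].words["sq"].definition)   # → ['dup', '*']
```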

One of the tasks performed by stage1.factor is to read the source file core/bootstrap/primitives.factor. This source file creates a minimal global namespace and dictionary that target code can be loaded into. This initial dictionary consists of primitive words corresponding to all primitives implemented in the VM, along with some initial state for the object system, consisting of built-in classes such as array. The code in this file runs in the host, but it constructs objects that will ultimately end up in the boot image of the target.

A second piece of code that runs in order to prepare the environment for the target is the CPU-specific backend for the VM's non-optimizing compiler. Again, these are source files which run on the host.

The non-optimizing compiler does little more than glue chunks of machine code together, so the backends are relatively simple and consist of several dozen short machine code definitions. These machine code chunks are stored as byte arrays, constructed by Factor's x86 and PowerPC assemblers.

Loading core source files

Once the initial global environment consisting of primitives and built-in classes has been prepared, source files comprising the core library are loaded in. From this point on, code read from disk does not run in the host, only in the target. The host's parser is still being used, though.

Factor's vocabulary system loads dependencies automatically, so stage1.factor simply calls require on a few essential vocabularies which end up pulling in everything in the core vocabulary root.

During normal operation, any source code at the top level of a source file, not in any definition, is run when the source file is loaded. During stage1 bootstrap, top-level forms from source files in core are not run on the host. Instead, they need to be run on the target, when the VM is launched with the new boot image.

After loading all source files from core, a startup quotation is constructed. The startup quotation begins by calling top-level forms in core source files in the order in which they were loaded, and then runs basis/bootstrap/stage2.factor.
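
The behavior of this startup quotation can be sketched as follows: run each recorded top-level form in load order, then hand off to stage2. The names here are hypothetical; the real quotation is built from Factor code.

```python
def make_startup_quotation(top_level_forms, run_stage2):
    def startup():
        for form in top_level_forms:   # in the order the files were loaded
            form()
        run_stage2()
    return startup

log = []
startup = make_startup_quotation(
    [lambda: log.append("core-form-1"), lambda: log.append("core-form-2")],
    lambda: log.append("stage2"))
startup()
print(log)   # → ['core-form-1', 'core-form-2', 'stage2']
```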

Serializing the result

At this point, stage1 bootstrap has constructed a new global namespace consisting of a dictionary, object system meta-data, and other objects, together with a startup quotation which can kick off the next stage of bootstrap.

Data heap objects that the VM needs to know about, such as the global namespace, startup quotation, and non-optimizing compiler definitions, are stored in an array of "special objects". Entries are defined in vm/objects.hpp, and in the image file they are stored in the image header.

This object graph, rooted at the special objects array, is now serialized to disk into an image file. The bootstrap image generator serializes objects in the same format in which they are stored in the VM's heap, but it does this without dumping the VM's memory directly. This allows object layouts to be changed relatively easily: first update the bootstrap image tool, generate an image with the new layouts, then update the VM and run the new VM with the new image.

The bootstrap image generator also takes care to write the resulting data with the correct cell size and endianness. Along with the CPU-specific machine code templates for the non-optimizing compiler, this is what makes boot images platform-specific.
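
To illustrate why the image writer must choose cell size and byte order explicitly, the same cell value serializes to different bytes depending on the target. This shows the general issue, not Factor's actual image format.

```python
import struct

cell = 0x12345678
print(struct.pack("<I", cell).hex())   # 32-bit little-endian: 78563412
print(struct.pack(">I", cell).hex())   # 32-bit big-endian:    12345678
print(struct.pack("<Q", cell).hex())   # 64-bit little-endian: 7856341200000000
```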

Stage 2 bootstrap: fleshing out a development image

At this point, the host system has written a boot image file to disk, and the next stage of bootstrap can begin. This stage runs on the target, and is initiated by starting the Factor VM with the new boot image:
./factor -i=boot.x86.32.image
The VM reads the new image into an empty data heap. At this point, it also notices that the boot image does not have a code heap, so it cannot start executing the boot quotation just yet.

Early initialization

Boot images have a special flag set in them which kicks off the "early init" process in the VM. This only takes a few seconds, and entails compiling all words in the image with the non-optimizing compiler. Once this is done, the VM can call the startup quotation. Quotations are also compiled by the non-optimizing compiler the first time they are called.

This startup quotation was constructed during stage1 bootstrap. It runs top-level forms in core source files, then runs basis/bootstrap/stage2.factor.

Loading major subsystems

Whereas the goal of stage1 bootstrap is to generate a minimal image that contains barely enough code to be able to load additional source files, stage2 creates a usable development image containing the optimizing compiler, documentation, UI tools, and everything else that makes it into a recognizable Factor system.

The major vocabularies loaded by stage2 bootstrap include the optimizing compiler, documentation, and UI tools.

Finishing up

The last step taken by stage2 bootstrap is to install a new startup quotation. This startup quotation does the usual command-line processing; if no switches are specified, it starts the UI listener, otherwise it runs a source file or vocabulary given on the command line.

Once the new startup quotation has been installed, the current session is saved to a new image file using the save-image-and-exit word.