Factor: a practical stack language: August 2006

Thursday, August 31, 2006

Distributed concurrency in Factor

Chris Double writes about distributed concurrency in Factor. This is built on top of his existing Erlang-like concurrency library, and uses serialization to pass messages between nodes. Cool stuff.

Wednesday, August 30, 2006

Scheme-based real time music programming environment for OS X

Impromptu looks interesting. I just wonder why it wasn't implemented in Scheme, since they already use it for scripting.

Object serialization

Long-time Factor contributor Chris Double has dusted off an incomplete serialization library and polished it up. You can find the code in contrib/serialize/. Now Factor can serialize objects to binary and back again, preserving object identity, shared structure and circularity. He mentioned he will be using this to extend the concurrency library to support distributed message passing. That is just too cool.

Monday, August 28, 2006

New stack-based language

Robbert van Dalen is working on a new stack-based language called The whole enchilada.

Friday, August 25, 2006

Generic editor hook

For a while now Factor had a jedit word in the core which would jump to a word definition in a running jEdit instance. Well since then, emacs and vim integration has appeared in contrib/, so I thought it is time to make this hook generic. Now there is a new edit word which calls a quotation stored in the edit-hook global variable with a file name and line number on the stack. The jEdit code has been moved to contrib/jedit/. Loading either the jedit, vim or emacs module sets the edit hook. There is also a new feature: the :edit word starts your editor at the location of the most recent parser error.

Thursday, August 24, 2006

New commands in the UI listener

Soon, there will be a CLIM-like unified command framework and keyboard help. For now, I'll just have to write a blog entry about the new keyboard commands:

Command	Shortcut
Stack effect of entered quotation	C+i
Single-step entered quotation	C+w
Completions for word at caret (used vocabs)	TAB
Completions for word at caret (all vocabs)	A+a
Reload word at caret	A+r
jEdit word at caret	A+j
See word at caret	A+s
See word at caret	A+s
Use first vocab containing word name at caret	A+u

The single stepper has some shortcuts too:

Step	s
Step in	i
Step out	o
Continue	a
Step back	b

"Step back" feature in single stepper

The new graphical single stepper in Factor 0.84 can now travel back in time and "step back" through a quotation. This only restores the stack contents, it does not undo side effects. Still, even though the time travel is far from perfect, it is still useful, especially for beginners learning the language and figuring out how stuff works. They can step over some words, then rewind and step in to look at things in more detail.

Friday, August 18, 2006

Even faster bootstrap

After the recent set of changes, Factor 0.84 bootstraps in 2 minutes 9 seconds on my Power Mac G5. This is an improvement of 21 seconds over Factor 0.83.

Wednesday, August 16, 2006

Better emacs integration

Eduardo Cavazos submitted a patch which adds a contrib/emacs.factor module with an emacs word, so you can jump to definitions in a running emacs instance:

\ reverse emacs
{ editor draw-gadget* } emacs

Eduardo also improved the factor.el emacs mode, adding keyword highlighting.

If the emacs integration keeps improving and gains SLIME-like features, I'll probably switch over. I'm getting sick of maintaining the jEdit Factor plugin -- the codebase is a mess (it descends from JFactor) and it feels like a waste of time since eventually Factor will have a structure editor in the image anyway.

More about stack effects

The see word shows the stack effect now. Also -- this patch isn't pushed yet -- but HELP: syntax is now different. Before when documenting words, you would write:
HELP: my-word "( a b -- c )" { $values { "a" "..." } { "b" "..." } } { $description "Foo" }
Now, the new style is either:
HELP: my-word { $values ...
or:
HELP: my-word ( a b -- c ) { $values ...
That is, the stack effect comment is no longer given as a string, but written out literally, and it is optional. If no stack effect comment is specified in the help, the stack effect declared for the colon definition is used.

One thing about stack effect declarations I forgot to mention in my previous post is that if a word unconditionally throws an error, you need to declare this for the word to compile:
: foo-error ( -- * ) "Foo!!!" throw ;
If the output values list consists of a single *, then in fact the word can be compiled in a context where any number of outputs are expected; since it throws an error and execution does not continue past throw, the stack height is irrelevant by the time control flow reaches the word.

You cannot lie to the compiler, at least in a way which breaks stack effect inference and the optimizer. If you specify an incorrect stack effect for a recursive word, inference will fail with an unbalanced branches error. For non-recursive words, a stack effect check is performed at the end to ensure the inferred effect matches the declaration.

What you can do is declare an effect which is too deep:
: foobar ( a b c d -- a b c ) + ;
This word claims to touch 4 stack elements but it only touches 2. However the change in stack height for both the declared effect and the inferred effect is the same, and the word does not touch any values it does not declare, so the compiler lets this one slide. All it does is reduce potential for optimization in places where the word is called, since certain values will be assumed to have changed when in fact they have not.

Factor is still evolving quickly. However it feels like the core of the language is slowly coming together in a way I can be truly happy with, and this was true even before the big cons cell removal/first class quotations change in 0.83. I've mostly been concentrating on the compiler and UI lately and this is where most of the work remains.

Tuesday, August 15, 2006

Formal stack effect declarations

Factor has had a stack effect inference feature for a long time now, since the JFactor days. Previously I used a rather elaborate algorithm to infer the stack effects of recursive words. In the last few days I found some cases where the algorithm would fail. Rather than fix it and make the compiler even more complicated, I've decided to take a different approach.

After all, why make the computer work really hard to figure out what the user wrote in their stack effect comments, when instead the compiler can just read them? ( ... ) is formal syntax now, and may only follow a word definition:

: foo ( a b -- c ) 2 + * ;

If a word is not recursive, the stack effect declaration is optional. If the inferred stack effect does not match the declared stack effect, the compiler issues an error. On the other hand, recursive words now require a stack effect declaration in order to compile. Stack effects of arbitrary quotations can still be inferred as before.

Note that because of how delegation is implemented, all generic words are recursive internally, and need a stack effect declaration in order to compile. And finally, if you add a stack effect declaration to a word that does not have a static stack effect, for whatever reason, the declaration is ignored.

Saturday, August 12, 2006

Smarter 'apropos' tool

Inspired by SLIME, I improved Factor's apropos tool to use an edit distance algorithm with scoring instead of just a substring match as in the classical Common Lisp 'apropos':

Friday, August 11, 2006

Automatically recompiling words

In Factor, new word definitions run in the interpreter until they are explicitly compiled. Furthermore, if a compiled word is redefined, then all words which call it must be recompiled to pick up the new definition.

Previously, when a word was redefined all words which call it would immediately revert to their interpreted definition. This caused two problems. The first one is only relevant to me: if I redefine certain words which are depended on by alien-invoke, then all C library calls will become decompiled, and since C library calls cannot be made from the interpreter, this would usually crash the I/O system. I hacked around this by simply not decompiling words that call alien-invoke, but the same problem exists for other words which are compiler-only, such as alien-callback. The second issue with the decompiling behavior is more annoying: if you load a module which defines methods on core generic words such as nth which are called pretty much everywhere -- for instance contrib/math/ does this, then you're in for a lengthy compile time after the module is loaded, since most words in the system would revert to their interpreter definition and the compiler would run slowly.

Factor 0.84 works differently in this regard. When a word is redefined, the word and all words that call it are added to a "changed words" set, but no further action is taken. At a later point, the user can call recompile, which recompiles the changed words. At no point does any word other than the word being immediately redefined revert to its interpreted definition, so this is faster than the old way. Also there is no problem with alien-invoke and similar compiler-only words; no special action has to be taken.

Words defined in the listener are not automatically compiled. I figured this might be too annoying, given that the compiler is somewhat verbose and all. If people want this feature, I will implement it though.

I made run-file call recompile, so now loading a source file will automatically compile any words defined in that file, together with words which call those words. When loading modules, only one call to recompile is made at the end.

Not all words in the library compile, still. Compiled continuations will increase this number, and at some point I might implement a non-optimizing compiler for words without a static stack effect. The performance gain there might not be worth the added complexity, though. It would make Factor appear cleaner to beginners, though -- I regularly have to explain that the inference errors shown during bootstrap are benign.

Of course the extra compiler activity will mean that Factor will eventually run out of code heap space. This puts pressure on me to implement code GC, which will be coming in 0.85.

I haven't pushed the automatic recompile patches to the main repository yet because I have to work out a few bugs. Nothing serious.

Wednesday, August 09, 2006

C function pointer syntax

I don't like C syntax. Every time I have to use a function pointer in the Factor runtime, I need to go look it up...

I suspect people who dismiss any language that doesn't look like C as being "unintuitive" and "exotic" simply don't know what they're talking about, and most likely are just one hit wonders (only managed to learn a single language, after a great effort. Which would be fine, if only they'd stop posting all over sites like Lambda the Ultimate complaining about Lisp's parenthesis or whatnot).

Reworking the compiler, again

This time I'm refactoring the lowest levels of the compiler, and that is the interface with the runtime system and code heap management. Previously the assembler and compiler would write compiled code directly to the code heap. This originally made the implementation simpler, however there were many problems with this approach:

The assembler was impossible to unit test
There was duplication between the code to fix up forward references in the compiler, and the code to relocate compiled blocks when loading the image in the runtime
Because the compiler could leave the code heap in an inconsistent state, great care had to be taken to deal with literals referenced from compiled code. In particular, they were centralized in one table which could fill up. And code GC was out of the question.

In the new design, the compiler calls a runtime primitive, passing it a bunch of vectors holding machine code and relocation information. This has simplified the Factor side of things and removed the duplication, but made the runtime somewhat more complex. However now that the compiler no longer accesses the code heap using pointer arithmetic primitives, these primitives can be moved out of the runtime and become compiled intrinsics. This open coding of memory access, boxing and unboxing should improve performance of C library calls, among other things. Looking further into the future, the improved code heap structure will allow me to implement call traces for errors thrown from compiled code, a garbage collector for compiled code, and compiled continuations.

Perhaps this refactoring allows cross-compilation, even though I have no plans to implement such a feature at this stage; since we're not writing to a code heap, but simply constructing vectors of numbers, the compiled code could easily be dumped to a file.

Here is a fun example: we can look at the machine code corresponding to assembler sequences -- a PowerPC snippet follows:

  [ 0 1 72 LWZ 24 3 LI ] { } make .
{ 2147549256 945815576 }

In particular the assembler now "assembles" machine code by calling the , word just as in traditional Forth.

Monday, August 07, 2006

Gadget models

Chris Double writes about using models to implement a clock. I've been working on a simple dataflow system based around "models" and "connections" for Factor's UI. When the design settles down, I'll document it, but basically you have your typical observable value cells -- these are "models" -- together with wrapper models which combine the values of other modules or change them in some way. When one model updates the change can propagate and affect other models which can eventually change what some gadget is showing.

Apple announcements

The Mac Pro looks nice, but my G5 won't start feeling slow until there's an 8 processor Intel Mac. More interestingly, OS X 10.5 will support 64-bit Carbon and Cocoa, on both Intel Xeon and PowerPC G5 architectures. I guess none of the features in 10.5 are terribly compelling, but I'll probably buy a copy as soon as it is out -- so that I can port Factor to 64-bit PowerPC! Bigger fixnums means less bignum usage and less consing.

JSON parser

Chris Double has coded up a JSON reader and writer in Factor. The implementation is nice and simple.

A funny coincidence is that a Java implementation was announced on the JRoller front page today. Their code is about 10 times more complex, plus it has a bunch of dependencies like Log4J and every single commons-* package ever to come out of Apache.

Sunday, August 06, 2006

Lambda calculus interpreter in Factor

Matthew Willis has implemented a lambda calculus interpreter in Factor. You can enter and evaluate lambda expressions. To start the evaluator, do:

  "lambda" require
  USE: lambda
  lint-boot lint

Now try evaluating some expressions. First, an identity function:

  ((x.x) 5)
OK:5

Something more complicated:

  (((x.(y.(y x))) 1) (x.x))
OK:1

Friday, August 04, 2006

New web framework

Chris Double has been working on a new web framework in Factor. From his summary:

I'm working on a new web framework for Factor. My plan is to have web applications built as a collection of concurrent processes running on the server and the client browser. Processes on the server can send messages to the browser (using a 'comet' style connection) and vice versa.

Among other things, this includes a JSON parser/unparser, implemented using his own parser combinators library. Cool stuff.

Thursday, August 03, 2006

VIM integration

Doug Coleman has implemented a little hack which allows you to jump to definitions in VIM. The code is in the darcs repository. First load the "vim" module and (optionally) save your image:

"vim" require
save

Then you can pass definition specifiers to a "vim" word:

\ = vim
{ editor draw-gadget* } vim
"handbook" <link> vim

Wednesday, August 02, 2006

Definition protocol

For a long time now, you've been able to look at a word definition in the listener:
\ append see
Jump to its definition in jEdit:
\ append jedit
Or reload the source file containing it:
\ append reload
Or even remove the word definition -- just don't do this with core words:
\ testing forget
Well, I've stolen yet-another-InterLisp-feature, and added a generic protocol whereby other types of definitions can be used with the above four words. You can jump to method definitions, even if they're defined in a different source file than either the class or the generic word:
{ string = } jedit
You can also jump to help article definitions:
"handbook" <link> jedit
And of course, see, reload and forget also works with methods and help links.

Previously I used Factor's tools to jump to word definitions, but had to resort to jEdit's directory search feature to find help articles and method definitions. Not that directory search is bad, but even a savings of a second here and two seconds there adds up.

Can your programming language do this?

Joel Spolsky is trying to explain why functional programming matters to mainstream programmers. I think he's wasting his breath though -- its hard to explain to a certified Struts engineer why copy and paste coding is bad. He makes a funny comment about Java's anonymous inner classes being used as poor man's closures:

If your programming language requires you to use functors, you're not getting all the benefits of a modern programming environment. See if you can get some of your money back.

The irony is that we have a company (IntelliJ or whatever they're called these days) whose entire business model is based around selling nothing more than a 400$ text editor which has some auto-expand templates for common Java boilerplate, like iterating over an array.

Tuesday, August 01, 2006

#concatenative IRC channel

If you look at the channel logs, you'll see our IRC channel has been pretty active lately. It is good to see there are several new people interested in Factor.

Factor can :help you

I added a new word, :help. It shows documentation for the most recent error. Of course the longest part was not writing this word definition, but documenting all the errors that can occur in more detail. Here is a screenshot:

Try doing this in a language without an interactive environment...

Self 4.3

There is a new release of Self out. It is not that new, but I have just heard of it. Self is one of the original prototype-based languages. It features a runtime optimizing compiler which profiles your code and inlines the methods called most; also its Morphic UI heavily inspired Squeak's Morphic UI. Self comes from Sun labs. While Java's HotSpot compiler is heavily influenced by Self, the UI and interactive development environment never made it into any of the mainstream languages.