Factor: a practical stack language: March 2007

Friday, March 30, 2007

Adding named parameters to Factor

Christopher Diggins shows some C# code to convert code that uses named parameters into stack code. The same algorithm can be implemented in Factor:

! Based on
! http://cat-language.googlecode.com/svn/trunk/CatPointFreeForm.cs
USING: kernel namespaces arrays sequences sequences-internals
math inference parser words quotations ;
IN: locals

: arg-list, ( n -- )
    dup , [ f <array> ] %
    <reversed> [ , [ swapd pick set-nth-unsafe ] % ] each ;

: arg-n, ( n -- ) , [ r> dup >r nth-unsafe ] % ;

: localize ( args obj -- )
    tuck swap index dup -1 > [ nip arg-n, ] [ drop , ] if ;

: point-free ( quot args -- newquot )
    [
        dup length arg-list,
        \ >r , swap [ localize ] each-with [ r> drop ] %
    ] [ ] make ;

: with-locals ( quot locals -- ) point-free call ;

\ with-locals 2 [ point-free ] define-transform

DEFER: |LOCALS delimiter

: LOCALS|
    #! Syntax: LOCALS| a b c | ... a ... b ... c ... |LOCALS
    "|" parse-tokens
    [ create-in dup define-symbol ] map >r
    \ |LOCALS parse-until >quotation parsed
    r> parsed
    \ with-locals parsed ; parsing

PROVIDE: libs/locals ;

The fact that this is so easy to implement inside the language itself is no fluke. The above code uses parser features only found in Factor 0.89, so you need the latest darcs code if you want to run it; I added it to the library as libs/locals.

It works by placing all the inputs to a LOCALS| ... | ... |LOCALS form in an array, which is moved to the retain stack. Named parameter accesses are rewritten into code which moves this parameter array to the data stack, pulls out the right element, and moves it back to the retain stack.

Here is a usage example:

: quadratic LOCALS| x a b c | a x sq * b x * + c + |LOCALS ;

Factor already has (dynamically scoped) variables in the core; using Eduardo's libs/vars syntax sugar, we could write the above code as:

VARS: x a b c ;
: quadratic [ >c >b >a >x  a x sq * b x * + c + ] with-scope ;

However LOCALS| ... |LOCALS is more efficient (and less flexible) because it performs all lookup at compile time -- using variables entails hashtable creation and access. Also, with LOCALS| don't have to define the parameters as symbols first; the LOCALS| parsing word does that for you.

Personally I don't use local parameters in Factor. I think they are a crutch and result in poor reusability. However I won't stop people from using whatever coding style they find most satisfying, and I think having more options is good; especially since it is easy to define new abstractions in library code, without complicating the core language.

Evaluating a quadratic polynomial

Christopher Diggins writes:

I am about to release a version of Cat which supports optional named parameters. The turning point in making this decision to add support for named parameters to a version of Cat occured when I tried to implement a simple quadratic equation with names.

He shows the following program for evaluating a quadratic polynomial f(x)=ax+bx+c, which takes the values x, a, b, c as input:

define quadratic_point_free
{
    [dupd [sqr] dip swap [*] dip] dipd [* +] dip +
}

This is not good stack code.

I'm not writing this to bash Christopher, but I think he has not had much experience actually programming code in a stack language. Implementing an interpreter for any language doesn't force one to practice the idioms and problem solving techniques employed when actually programming in the language; likewise, one doesn't need a deep understanding of the implementation of their language to use it. So I think Chris is committing the classical newbie mistake when learning a language -- trying to relate features of the language to what they already know, and are somewhat surprised to find that they don't map one to one. In Christopher's case, however, he has the freedom to change the language as he sees fit. Since I don't know Cat very well, I can't say if named parameters would be a good thing or not. However, I think a feature like this needs a more compelling use-case than encoding polynomial evaluations.

So why is this code in bad taste? First, there's no stack effect comment. It wasn't hard for me to see that the stack effect is ( x a b c -- f(x) ), but it should be documented. However, the real problem is that this code operates at a level of abstraction which is too low.

Since he is a compiler developer, I imagine Christopher wrote the above code by manually "compiling" it in his head. This is the wrong way to approach a stack language. Instead, I have found two alternative approaches to be far more fruitful:

Write your code to take advantage of stack language idioms, and combine existing words express your problem in a simple way. That is, implement a Domain Specific Language (DSL)!
Write code which writes tedious code and performs this compilation process for you. That is, implement a DSL with more hair!

I'll show how to apply both approaches to solving this problem in Factor. In both cases we use Factor features not found in Cat, but they would be trivial to add. Cat, Joy and Factor are all very closely related, and share the same evaluation model and representation of code as quotations. Factor is compiled and it's compiler transforms are unique among concatenative languages, but a hypothetical compiled Joy or Cat could implement the same functionality with very little effort. Almost everything I write below also applies to low-level Forth.

Without further ado, I'll start with the first approach.

Just because the formula at hand has three additions and two multiplications, doesn't mean the symbols in the program must map one-to-one. The set of polynomials with real (or complex) coefficients forms a vector space. Polynomials are vectors. Polynomial evaluation is the dot product:
ax^2+bx+c = (a,b,c).(x^2,x,1)
Integers are sequences in Factor, and generating a sequence of the form { 1 x x^2 ... } is easy by mapping over an integer with the ^ word:

: powers ( x n -- seq ) [ ^ ] map-with ;

As long as the coefficients of the quadratic polynomial are given as a single object on the stack, a sequence { c b a }, we are done:
: quadratic ( c,b,a x -- f(x) ) 3 powers v. ;
However, we may as well generalize to evaluating a polynomial of any degree;
: poly-eval ( x coeff -- f(x) ) [ length powers ] keep v. ;
For example, to compute f(x) = 1024x^3 + 115x^2 - 71x + 576, you write
{ 576 -71 115 1024 } poly-eval
Certainly a lot simpler than the equivalent using the + and * words directly. And of course it will work with bignums, rationals, floats, and complex numbers. Also the polynomial is now a first-class entity in the language; even though it is just a sequence of numbers, it still stands that we have identified the essence of a polynomial. We are not threading the polynomial through the expression as a series of scalars. This means we can now pass the polynomial around as a single entity, and manipulate it with sequence words; we can add two polynomials together, multiply a polynomial by a scalar, juggle the coefficients around.

So to sum things up, I would consider the following a better implementation of Christopher's function:

: powers ( x n -- seq ) [ ^ ] map-with ;
: poly-eval ( x coeff -- f(x) ) [ length powers ] keep v. ;

This is useful as a general library routine; and it is very simple considering what it does. Also note that the way we approached the problem was different. We solved a general problem and the special case fell out for free. (In fact, the libs/math library already implements polynomial evaluation. This whole discussion is moot.)

Now, let us consider the second approach to problem solving: metaprogramming.

Back to Chris's function; here is a literal translation into Factor:

: quadratic-eval ( x a b c -- f(x) )
    >r >r dupd >r sq r> swap >r * r> r> r> >r * + r> + ;

This looks like the output of an infix DSL compiler. It would be easy to implement an infix math DSL in Factor (easier than embedding Factor features into applicative languages!) and it would probably generate something almost identical to the above output, given input like
INFIX" a*x*x+b*x+c"
However I won't show you how to build an infix DSL. Instead, I'll solve a more limited problem; polynomial evaluation using Horner's Rule. The result will be a poly-eval word called in the exact same manner:
{ 576 -71 115 1024 } poly-eval
But instead, a compile-time transform will generate efficient code which does not actually create or iterate over arrays, or call the ^ word at run time. Such an approach is overkill in most situations. However, if your program needs to evaluate a lot of polynomials, it can speed things up. It also shows that the Factor compiler is very easy to extend, because it is written in, and compiles, a stack language.

Since we will be defining a compile-time transform, we need to write a word which takes a sequence of coefficients, and generates a quotation -- the quotation receives the input on the stack and evaluates the polynomial at that input. This word will be recursive. Consider the base case first. We are evaluating a constant polynomial;

: poly-eval-base ( c -- quot )
    [ nip ] curry ;

For example, the code to evaluate the constant polynomial { 3.5 } is 3.5 nip; keep in mind the top of the stack is the value of x (which is disregarded by constant functions).

Now if we have a way of computing some polynomial g(x), we write a word which computes x*g(x) + c:

: poly-eval-step ( quot c -- newquot )
    [ >r dupd call * r> + ] curry curry ;

The recursion puts the two together:

: poly-eval-quot ( seq -- quot )
    dup length 1 = [
        first poly-eval-base
    ] [
        unclip >r poly-eval-quot r> poly-eval-step
    ] if ;

The above three words primarily take sequences apart and create new ones. The recursion is really a left fold; a left fold combinator could be added to the Factor library, but I haven't needed it yet.

We can test this code first:

  3 { 1 2 2 } poly-eval-quot call .
25

In fact, it is true that 25 = 1 + 2*3 + 2*3*3. As it stands, this new code generating implementation is not any more efficient. Instead of creating an intermediate array of exponents, we create an intermediate quotation of code. The trick is that if the sequence of coefficients is known at compile time, we can generate the quotation at compile time too:

: poly-eval poly-eval-quot call ;

\ poly-eval 1 [ poly-eval-quot ] define-transform

Consider the following word:

: foo { 11.4 23.7 78.22 99.6 -174.67 } poly-eval ;

We can define a specializer for it. This tells the compiler that while the word can take any type of object as input, a certain type is anticipated to happen more than most others; in our case, we optimize for the case where the input is a floating point value:
\ foo { float } "specializer" set-word-prop
Now, take a look at the PowerPC disassembly of the case where the input is indeed a floating point value:

0x059ea500:     lwz     r11,0(r14)
0x059ea504:     lfd     f0,3(r11)
0x059ea508:     lis     r11,1438
0x059ea50c:     ori     r11,r11,42632
0x059ea510:     lwz     r11,0(r11)
0x059ea514:     lfd     f1,3(r11)
0x059ea518:     fmul    f0,f0,f1
0x059ea51c:     lis     r11,1438
0x059ea520:     ori     r11,r11,42648
0x059ea524:     lwz     r11,0(r11)
0x059ea528:     lfd     f1,3(r11)
0x059ea52c:     fadd    f0,f0,f1
0x059ea530:     lwz     r11,0(r14)
0x059ea534:     lfd     f1,3(r11)
0x059ea538:     fmul    f1,f1,f0
0x059ea53c:     lis     r11,1438
0x059ea540:     ori     r11,r11,42644
0x059ea544:     lwz     r11,0(r11)
0x059ea548:     lfd     f0,3(r11)
0x059ea54c:     fadd    f1,f1,f0
0x059ea550:     lwz     r11,0(r14)
0x059ea554:     lfd     f0,3(r11)
0x059ea558:     fmul    f0,f0,f1
0x059ea55c:     lis     r11,1438
0x059ea560:     ori     r11,r11,42640
0x059ea564:     lwz     r11,0(r11)
0x059ea568:     lfd     f1,3(r11)
0x059ea56c:     fadd    f0,f0,f1
0x059ea570:     lwz     r11,0(r14)
0x059ea574:     lfd     f1,3(r11)
0x059ea578:     fmul    f1,f1,f0
0x059ea57c:     lis     r11,1438
0x059ea580:     ori     r11,r11,42636
0x059ea584:     lwz     r11,0(r11)
0x059ea588:     lfd     f0,3(r11)
0x059ea58c:     fadd    f1,f1,f0
0x059ea590:     lis     r12,1
0x059ea594:     ori     r12,r12,39104
0x059ea598:     lwz     r12,0(r12)
0x059ea59c:     lwz     r11,4(r12)
0x059ea5a0:     addi    r11,r11,16
0x059ea5a4:     stw     r11,4(r12)
0x059ea5a8:     addi    r11,r11,-16
0x059ea5ac:     li      r12,43
0x059ea5b0:     stw     r12,0(r11)
0x059ea5b4:     stfd    f1,8(r11)
0x059ea5b8:     ori     r12,r11,5
0x059ea5bc:     stw     r12,0(r14)
0x059ea5c0:     lis     r12,1
0x059ea5c4:     ori     r12,r12,4492
0x059ea5c8:     mtlr    r12
0x059ea5cc:     blrl

The generated assembly will look like gibberish to people who are not familiar with the internals of the Factor compiler (anyone except for me?), so you'll have to take my word for this, but in fact we have the following:

The compiler doesn't do a good job of eliminating redundant loads from memory, and the above code could be a lot better.
But...
There is no recursion, sequence iteration, or sequence allocation in the above assembly.
The only place where a floating point value is boxed on the heap is at the end, after the final value has been computed. The allocation is done with customized inline assembly.
There are no subroutine calls until the end, where the GC check is called.
The input value is unboxed (too many times) and the constant coefficient values are unnecessarily unboxed, because the compiled code refers to boxed float objects not unboxed float values. This could be improved. However unboxing a float is inlined, and only takes two instructions.
All this is despite the fact that the code was generated from the following high-level description of the problem:
{ 11.4 23.7 78.22 99.6 -174.67 } poly-eval

We can compare the performance of the two implementations using the following benchmark:

:my-poly { 11.4 23.7 78.22 99.6 -174.67 } poly-eval ;
\ my-poly { float } "specializer" set-word-prop
: benchmark 1000000 [ >float my-poly drop ] each ;

It evaluates a fixed polynomial on every integer between 0 and 1000000, discarding the results. All arithmetic is done in floating point. Here are the results:

Implementation	Run time (ms)
Naive	2713
Compile-time transform	102

This is a 27x speed improvement!

Here is the grand finale. Suppose the polynomial coefficients are not fixed at compile time, but compile from some input file which is loaded by the user while the program runs. The user might not be aware there is such a thing as a compiler. Well, in Factor, compile-time is any-time. At the time the file is loaded, you can generate the evaluation quotations and compile them just by calling compile-quot. This technique is widespread among high-level languages. There are documented cases where Common Lisp code can beat optimized C by using information only available at run-time to optimize code. In stack languages, higher-order functions and code generation come even more naturally than in Lisp.

Nothing we did called for named parameters. Instead, one has to learn to "context switch" into the stack language mindset. It is a very different way of looking at the world.

Saturday, March 24, 2007

Documenting user interfaces the smart way

One thing that was always painful with jEdit was keeping the code and documentation in sync. For example, take keyboard shortcuts. For every command that I documented, I would manually enter the shortcut in the documentation, and before every release, I'd have to go through documentation by hand and ensure the shortcuts were up to date. Extremely tedious busy-work. At the time, I wasn't smart enough to understand what Emacs really meant by "self-documenting", but now that I've been exposed to more development tools and language paradigms, I'm starting to come around.

For a while now the Factor UI has had "commands", which are a concept found in many other UI toolkits, as well as jEdit, Emacs, etc. A command is an abstraction over an operation which can be performed by the user, and is often associated with a keyboard shortcut, can be added to toolbars and menus, and so on. in Factor 0.89, I refactored things so that commands are just words with some properties set.

Consider the article on the UI walker. For a long time now, the shortcut/command table there is automatically generated, with a markup element which refers to a command map by name, like the following:
{ $commands walker-gadget "toolbar" }
However the keyboard shortcut "C+A+w" is hard-coded there, and in fact as of Factor 0.88, it is out of date; it is just "C+w" now. The problem was that while command maps were "well-known" and were easy to identify by name, commands were not, they were just tuples floating around in hashtables (with no obvious "primary key" for you database people). Now that commands are words, I defined a new markup element which is rendered as the shortcut associated to a given command in a specific command map. For example,
{ $command interactor "quotation" walk }
The result looks exactly the same, but it is kept up to date when command maps change.

This is just the beginning. I want the Factor UI to feel like an extension of the language. This quality is hard to put in words, but perhaps CLIM comes closest to achieving this goal.

Wednesday, March 21, 2007

New inspector

There is a new inspector tool in the UI which can edit slot values now.

It is useful for debugging, as well as prototyping -- you can build a very ad-hoc GUI this way.

For example, consider the following interactive session:

We set variable to a hashtable holding prices of various foodstuffs, then we define a word which takes a list of foodstuffs, computes the total cost, and prints it.

Unfortunately, the price of sausages has gone up, so the prices have to be adjusted accordingly. No problem. First, print out the hashtable holding the prices; when printed it becomes a clickable presentation:

Now, click it, which opens the inspector tool. The ellipsis to the left of every slot is a button which opens the slot editor:

Here we are in the slot editor:

After clicking 'Commit' and returning to the listener, we see the price has gone up:

The inspector can edit sequences, tuples, strings... you can also input complex structures (the slot editor text area is parsed in the parser, it can take a lot more than just numbers).

One limitation is that so far you cannot add new elements to objects which allow that (eg, hashtables, resizable sequences). However the listener-based inspector supports this feature now, and it won't be long before the UI inspector does too.

Mmmmm, hot dogs.

Number of unit tests

The Factor core consists of 34,000 lines of Factor code. The test suite used to test the core is just a dozen or so unit tests shy of 2000. I don't know if this is considered enough tests or not, but it sure seems like a lot. Lately I've been working on the UI and trying to unit test every new feature and fix I apply. This is a bit difficult and has required some refactoring, but I think it is worth it.

Thursday, March 15, 2007

Improved unit testing framework

I noticed a time-waster in my workflow recently. After making changes to code, I run unit tests. The old test framework would run all tests, collecting errors, printing a report at the end. However this report was not granular enough. It would only print which file failed and what the error was, not the specific test that failed. And there was no way to get a traceback after the fact. So I would often invoke test-modules, wait for the tests to finish, look at the report of failed files, and run each one individually to see what test failed, and where it failed, by using :c.

The situation is better now. More detailed information is collected at the end of a test run -- here I intentionally broke the ^ word to return incorrect results, it throws an error when the exponent is zero, otherwise always returns 3:

Now I click "Traceback", which is a presentation of a continuation. The UI presents the continuation in a window:

The code to display a continuation in this manner was taken from the UI single stepper tool. It is a generic gadget now, used by both the single stepper and for presenting continuations.

The best part? The entire unit test framework is 59 lines of code. The gadget to present a continuation (keep in mind this gadget is reusable) is 31 lines. So in total, it only takes 90 lines of code to build a useful unit testing tool in Factor.

Despite the small size, I can't think of many ways I could improve it. Some unit test frameworks out there can generate fancy HTML reports, but if I wanted that, all I have to do is HTTP server module and wrapping the call to test-modules in with-html-stream. Then I'd get clickable links for browsing word definitions in the output.

Tuesday, March 13, 2007

Topological sort

Today I had to implement a topological sort for arrays of classes under the partial order of class inclusion. Since the arrays in question are never large I used a rather naive algorithm. I think it looks nice:

: largest-class ( seq -- n elt )
    dup [ swap [ class< ] subset-with length 1 = ] find-with ;

: (sort-classes) ( vec -- )
    dup empty?
    [ drop ]
    [ dup largest-class , over delete-nth (sort-classes) ] if ;

: sort-classes ( seq -- newseq )
    [ >vector (sort-classes) ] { } make ;

The first word takes a sequence and finds a class which is only a subclass of itself. Such a class always exists, since classes form a lattice.

The second word is the main iterative loop here. It repeatedly removes the largest class and tucks it away by calling , - then deletes it from the sequence. The recursion stops when the sequence is empty.

The last word does the setup and teardown for the algorithm. It copies the input sequence into a fresh mutable vector, and wraps the whole thing in a make so that objects passed to , are collected into a new sequence which is returned at the end.

Multimedia library bindings

Adam Wendt is working on a binding to libMAD, an MP3 decoder. He is also working on a library to parse ID3 tags. Chris Double is working on Ogg Vorbis and Ogg Theora bindings. Cool stuff.

Prettier prettyprinting

So what have I been working on for the last few days? The prettyprinter, which is invoked any time you use a word such as . or see.

This concept does not even exist in many languages, but it is central to Factor. The idea is that given an object, you can print it in a form which can then be parsed back in. Doing this is easy enough. The hard part is formatting the result nicely so that it is useful for human as well as machine consumption.

The prettyprinter is still not as good as a human programmer, but my goal is to make it as close as possible -- when the structure editor is implemented, it will use the prettyprinter to display the definitions being edited. There are more improvements I plan on making, but at some point any the only word definitions which are still formatted badly will be the longer ones, and they can be factored and cleaned up.

Now I will describe some of the improvements I have made.

Keeping quotations together
A very common idiom is to push a pair of quotations and then pass them to a combinator, for example if. The old prettyprinter didn't do anything special in this case, it would just insert line breaks wherever it had to so that the whole thing would fit within the margin:

IN: sequences : head? ( seq begin -- ? ) 
    2dup [ length ] 2apply < [ 2drop f ] 
    [ [ length head-slice ] keep sequence= ] if ;

Now it tries to avoid leaving a quotation at the end of the line:

IN: sequences
: head? ( seq begin -- ? )
    2dup [ length ] 2apply <
    [ 2drop f ] [ [ length head-slice ] keep sequence= ] if ;

Grouping lines by stack flow
While well-factored definitions tend to be short, sometimes a longer one crops up. This happens often in the compiler backend, where you have words which emit a series of assembly instructions, with no good way of splitting this up into several words. The old prettyprinter would turn everything into a big soup:

\ (%dispatch) see
IN: generator : (%dispatch) ( word-table# -- ) 
    "n" operand dup 1 SRAWI 0 "scratch" operand LOAD32 
    rc-absolute-ppc-2/2 rel-dispatch "n" operand dup "scratch" 
    operand ADD "n" operand dup 0 LWZ ;

The new prettyprinter looks at word stack effects and tries to lay things out using a variety of heuristics:

IN: generator
: (%dispatch) ( word-table# -- )
    "n" operand dup 1 SRAWI 0 "scratch" operand LOAD32
    rc-absolute-ppc-2/2 rel-dispatch
    "n" operand dup "scratch" operand ADD
    "n" operand dup 0 LWZ ;

Support for C:
The prettyprinter used to display tuple constructors in their internal, expanded form:

IN: parser : <parse-error> ( msg -- error ) 
    \ parse-error 6 <tuple> file get over set-parse-error-file 
    line-number get over set-parse-error-line column-number get 
    over set-parse-error-col line-text get over 
    set-parse-error-text [ set-delegate ] keep ;

This confused a few people in the last (what is the <tuple> word? What does that number mean? etc). Now, the prettyprinter knows what is a constructor and displays it properly; also note the improved layout:

IN: parser
C: parse-error ( msg -- error )
    file get over set-parse-error-file
    line-number get over set-parse-error-line
    column-number get over set-parse-error-col
    line-text over set-parse-error-text
    [ set-delegate ] keep ;

Better display of >r / r>
The prettyprinter now tries to place a block enclosed in >r and r> on one line if possible.

Before:

M: live-search handle-gesture* 
    drop over search-gesture dup [ 
        over find-workspace hide-popup >r live-search-list 
        list-value r> invoke-command f
    ] [ 2drop t ] if ;

After:

M: live-search handle-gesture*
    drop over search-gesture dup [
        over find-workspace hide-popup
        >r live-search-list list-value r>
        invoke-command f
    ] [ 2drop t ] if ;

Factor: a practical stack language