Wednesday, December 27, 2006

Announcing apps/help-lint

Lately I've been building up a small library of routines for automated checking of Factor's documentation. My most recent effort was the doctest clone for testing code examples. Well, today I cleaned things up, added some new inspections, and put everything together into a new apps/help-lint module. It now performs the following checks against word documentation:
  • ensures examples run and produce stated output
  • ensures $see-also elements don't contain duplicate entries (I always make this mistake!)
  • ensures $module-link elements point to modules which actually exist
  • ensures that $values match the stack effect declaration
  • ensures that word help articles render (this catches broken links, improper nesting, etc)

Some of these checks will make their way into the core of the help system over time, others will remain as stand-alone tools.
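
As a rough illustration, the duplicate-$see-also check boils down to comparing the length of each $see-also element before and after pruning duplicates. The word names below are hypothetical and the real module surely does more; treat this as a sketch:

! Sketch only: assumes the word is documented (word-help is not f).
: duplicate-entries? ( seq -- ? )
    dup length swap prune length = not ;

: bad-see-also? ( word -- ? )
    word-help \ $see-also swap elements
    [ 1 tail duplicate-entries? ] subset empty? not ;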

factorcode.org now running latest darcs Factor; UI documentation

The Factor web site is now running the latest code. In particular, this means you can browse the latest Factor documentation online. This includes documentation for the UI, which is something I've been working on for the last couple of weeks. The documentation is always a work in progress, but I think the fundamentals of the UI are now well-documented and people should have a much easier time writing their own gadgets.

The previous HTTPD instance running on Factor 0.85 never crashed, and ran continuously for a month and a half. I hope 0.87 keeps it up, but who knows, the web site might be down by the time you read this!

The parse time -vs- run time duality

In Factor, anything which can be done via syntax in a source file can also be done at runtime. This lets you construct code on the fly, and define new words with the new code, just as if they were written in a source file:
Task                      Parsing word   Ordinary word
Defining a word           :              define-compound
Defining a symbol         SYMBOL:        define-symbol
Defining a generic word   GENERIC:       define-generic
Defining a generic word   G:             define-generic*
Defining a method         M:             define-method
Defining a tuple class    TUPLE:         define-tuple
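
For instance, here is a hedged sketch of defining a word at runtime. It assumes create has the stack effect ( name vocab -- word ) and define-compound has ( word quot -- ); the exact names and effects may differ between versions, so read it as an illustration of the duality rather than a recipe:

! Sketch: assuming create ( name vocab -- word )
! and define-compound ( word quot -- ).
"greet" "scratchpad" create
[ "Hello from a runtime-defined word" print ] define-compound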

Monday, December 25, 2006

Another web service example in Factor

Daniel Ehrenberg has an example of using libs/xml and libs/http-client to search the web using the Yahoo! REST API. The code is very short; I like it. Factor makes XML usable.

Saturday, December 23, 2006

Scheme 48 1.4 released

Scheme 48 is a notable Scheme implementation. The VM is implemented in a low-level Scheme-like language, PreScheme, which is translated to C. This is the same implementation technique used by Squeak.

Friday, December 22, 2006

Funny quote

Seen on JavaLobby. This is simply the best thing I have ever read in my entire life! It made my day, no, it made my week! I love it. Java programmers are awesome.

"Closures are the OO euphemism for function pointers which in OO terms is a dirty word because pointer is a dirty word."

Tuesday, December 19, 2006

Google removes SOAP search API

This is sad. I understand why they did this (there's no ad revenue from web services), but it comes just as Chris Double added a Google search client to apps/. It would have been nice to use this client to implement searching for web sites written in Factor, such as Matthew's upcoming weblog app, parsing the results of the SOAP call and displaying them as HTML. But now we'll have to find another way to do that.

JazzScheme: Scheme environment and IDE for Windows

One of the members of MSLUG has open-sourced his JazzScheme project, which he has been developing for 10 years now.

Weblog software in Factor

Matthew Willis has been working on a weblog implemented in Factor. He's using the following libraries:
  • /libs/httpd/ to serve pages
  • /libs/furnace/ for action handling and templating
  • /libs/tupledb/ for O/R mapping
  • /libs/sqlite/ for persistence

Monday, December 18, 2006

Solved a conjecture

I proved Conjecture 5.9 on page 14 of Barry Jessup's paper in the case of two-step nilpotent Lie algebras. I had a proof for the case where the dimension of the center was equal to 2 a few weeks ago, and discovered a nice generalization this morning after my supervisor suggested some simplified notation to clarify the argument. The proof is nothing special; there are some tricky bookkeeping details, and it runs to about 5 pages since it is a double induction, but it does not use any advanced tools from algebraic topology. I'm still proud I managed to prove it, because I've been working on this since April. Generalizing it to the case of an arbitrary nilpotent Lie algebra (or finding a counter-example) should be fun, and might require the use of spectral sequences. The current proof relies on what is essentially a spectral sequence collapsing at the first term. This is the second original result of my thesis. My previous result shows that the central representation is faithful for a certain class of two-step algebras. This latest result shows that it is non-trivial for all two-step algebras (though we know of two-step algebras where the central representation is not faithful).

No math

I find it interesting that relatively few Factor words in the standard library directly call arithmetic words.
  { + - * / /f /i < <= > >= shift bitand bitor bitxor bitnot neg recip }
[ usage ] map concat prune length .
405
all-words length .
6561
6561 405 /f .
16.2

That is, less than 1 out of 16 words directly calls an arithmetic word.

What about APL-style vector math?
  { v+ v- v* n*v v*n v/ v. norm vneg vmax vmin }
[ usage ] map concat prune length .
55

Right now vector math is mostly used in the UI, but there it is used to great effect. Layout calculations become much simpler.
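
For example, translating a point by an offset is a single word; this listener snippet is just an illustration:

{ 1 2 } { 3 4 } v+ .
{ 4 6 }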

Many (most?) source files don't even USE: the math vocabulary.

Kind of makes the arguments about postfix arithmetic not being intuitive a moot point.

UI documentation

Documentation for the UI library is something which many people have asked for and it is long overdue. I have been working on it for the past week; so far I've managed to document most of the relevant words, and I'm now writing help articles which tie everything together. This is the last major piece of work I plan to finish before 0.87.

I implemented something else today which I was previously planning to defer until 0.88: stack traces for errors thrown from compiled code. I had some half-working code for this from a few weeks ago; I dusted it off and made it more robust. It still needs testing on Windows and Linux/PPC (the set of people who use Linux/PPC and follow Factor seems to have cardinality one right now, though).

Interestingly, the old Java implementation of Factor could produce stack traces for exceptions thrown by compiled words. This was done by wrapping every compiled word in a try/catch which would cons the word onto a list stored in a global variable in the catch handler. Then when the stack was unwound to the top-level interpreter loop, this variable would be passed on to Factor code which handles the error. CFactor does this in a much more hardcore way, by walking the native stack after the error is thrown.

Cleaned up embedded.factor

Matthew Willis reported a bug against the embedded.factor templating library in the Factor HTTP server, which I have now fixed.

Recall that embedded.factor evaluates Factor code between <% and %>, and outputs the rest as text.

I wanted to post the code because I also cleaned it up and now it looks really nice. Factor is all about writing simple code to get useful things done, and I don't think many languages manage to implement a web templating solution with less complexity than Factor:
! Copyright (C) 2005 Alex Chapman
! Copyright (C) 2006 Slava Pestov
! See http://factorcode.org/license.txt for BSD license.
IN: embedded
USING: sequences kernel parser namespaces io html errors ;

: process-html ( parse-tree string -- parse-tree )
    dup empty? [ drop ] [ parsed \ write-html parsed ] if ;

: process-embedded ( parse-tree string -- string parse-tree )
    "<%" split1 >r process-html r> "%>" split1 >r (parse) r> ;

: (parse-embedded) ( parse-tree string -- parse-tree )
    dup empty?
    [ drop ] [ process-embedded (parse-embedded) ] if ;

: parse-embedded ( string -- quot )
    [ f swap (parse-embedded) >quotation ] with-parser ;

: eval-embedded ( string -- ) parse-embedded call ;

: run-embedded-file ( filename -- )
    [
        [
            file-vocabs
            dup file set ! so that reload works properly
            dup <file-reader> contents eval-embedded
        ] with-scope
    ] assert-depth drop ;

: embedded-convert ( infile outfile -- )
    <file-writer> [ run-embedded-file ] with-stream ;
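
Rendering a template to static HTML then looks like this (the file names here are hypothetical):

"hello.fhtml" "hello.html" embedded-convert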

Bear in mind that this implementation properly supports mixing combinators with templating, so for example you can write the following in the middle of your template:
<% 10 [ %> <p>Hello</p> <% ] times %>
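
Since the literal text between %> and <% is compiled into write-html calls inside whatever quotation is open at that point, the template above parses into code roughly equivalent to this (modulo exact whitespace handling):

10 [ " <p>Hello</p> " write-html ] times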

And as a final note, I'm aware that mixing markup and logic is a bad thing. However, Factor encourages factoring problems into many small words, so the template should not do any heavy processing; instead it should only call a handful of words here and there, achieving the same effect as JSP taglibs and the like, but with less complexity and more elegance.

Wednesday, December 13, 2006

'doctest' in Factor

The doctest module for Python scans your source file for code examples in docstrings, and evaluates them to check against the correct result. The documentation for this module seems to suggest it is a rather non-trivial affair (and looking at the code confirms this). In comparison, implementing this feature in Factor takes a few minutes:

: all-word-help ( -- seq )
    all-words [ word-help ] map [ ] subset ;

: all-examples ( words -- elements )
    [ \ $example swap elements ] map [ empty? not ] subset ;

: example-valid? ( elements -- ? )
    1 tail
    [ 1 head* "\n" join eval>string "\n" ?tail drop ] keep
    peek = ;

: incorrect-examples ( words -- elements )
    all-examples
    [ [ example-valid? ] all? not ] subset ;

all-word-help incorrect-examples [ . ] each


Factor's help is written in a structured markup language where examples are explicitly denoted as such (and in the UI, you can click them to evaluate them). So all you have to do is iterate over all words in the dictionary, extract their documentation, collect $example markup elements, and ensure they do the right thing. Note that $example elements look like so:
{ $example "2 2 + ." "4" }

After the $example marker, all but the last element are strings which are joined together with newlines to form the expression to be tested; the last string is the expected output.
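
Concretely, for the element above the check amounts to evaluating the expression, stripping the trailing newline from the captured output, and comparing it with the expected string, using the same words as in the code above:

! Prints t if the example's output matches its expected result.
"2 2 + ." eval>string "\n" ?tail drop "4" = .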

Using this tool I actually found some typos in the Factor documentation. I think the idea is great, and while Python makes it easier than something like Java (where you'd have to write a Doclet and use BeanShell, or something), Factor makes it even easier because it has a help system with a markup language, not just docstrings.

I have some other bits of code sitting around for checking consistency of parameter names in documentation, and checking for broken links and markup. One day I'm going to put them together into a single 'lint'-style tool for documentation.

Factor module system considered harmful?

Eduardo Cavazos doesn't like the current Factor module system, and has a proposal for a replacement. In a nutshell, he wants to unify the concept of a module with a vocabulary, and remove the conceptual overhead that goes with the distinction. I agree with most of his ideas. I wrote a lengthy post to the Factor-talk mailing list in response to an e-mail from Eduardo which discusses Eduardo's module system idea, and suggests some additions/improvements which would fix even more shortcomings of the current module system. I've reproduced the post below:
Hi Eduardo,

The more I think about it the more I like your idea. It would solve some
problems I perceive with the current setup. The first one, I think, is
the one you're focusing on, but the others strike me as easy
consequences of the design:

1) Too many concepts. Vocabularies, modules, source files... USING:,
REQUIRES:, IN:, PROVIDE:. The module system feels hacked on to an
existing core which has a notion of vocabularies, and this is exactly
what happened. I implemented the module system initially as a half-hour
hack to make it easier to load libraries from contrib/ without manually
loading dependencies. It did this job well, but now we both want to use
it to /structure code/, and later on to edit code in the UI, which is
something else entirely.

2) Suppose you have a source file where you write,

IN: foo
: a ;
: b a ;

You load the source file into Factor. Then you add a new word definition
: c b ; but you add it _before_ the definition of b. The file still
loads, however, because the definition of b is present in the vocabulary
when c is parsed.

But if you start Factor with a clean image, the source file won't load
(and bottom-up ordering of definitions is a good thing; in CL,
automatically interning symbols in the wrong package can be a source of
problems).

--

The new module system can fix the problem (which is that the modified
source file could load without an error in the first place) as follows.
You start by marking words in a module with a special flag before
reloading the module; then as each definition is read, the word's flag
is reset. If a definition refers to a word with the flag still set, a
parse-time error is raised. When the module is finished loading, the
module system can also check if any words which have the flag set are
still referenced by words in other modules; if so, a message can be
printed instructing the programmer to either reintroduce the word, or
refactor those modules, etc.

3) Suppose you have a source file X.factor where you write,

IN: x
: a ;
: b a ;

And another file Y.factor,

USING: x ;
IN: y
: c b ;

You load X.factor and Y.factor, then you decide to move the b word from
the x vocabulary to the y vocabulary. So you move the word and reload
the two files, but the definition of b in x is still in the dictionary
and this can cause confusion when testing the code, or if y has a USING:
x ; after the IN: y.

--

The new module system can fix this, again, by using 'obsolete' flags on
words. After loading a module, any words which are marked as being old
have not been redefined by the updated source file, and thus they can be
forgotten.

4) Source files are allowed to define words in arbitrary vocabularies.
By unifying the notion of a source file and a vocabulary, you completely
do away with this 'feature'.

--

Redefining words in other vocabularies is not a problem in itself, but
it is something I want to move away from. Right now for example you
cannot load the Cocoa and X11 UI backends at the same time, and use the
Cocoa instance while running Factory, because the two UI backends will
redefine the same DEFER:'d words.

Running the Cocoa UI and Factory at the same time is a contrived use case,
but with a structure editor you want to load code before editing it, and
not being able to load certain modules is totally unacceptable. So
instead, I want to restructure the UI code, as well as I/O and compiler
backends, so that instead of redefining words they use generic words,
dispatching on a singleton tuple representing the "backend" in question.

If the module system enforced such a design, it would motivate me to do
this refactoring faster :)

Now I'm completely against a language which enforces *arbitrary* design
decisions, because it requires bending over backwards to implement
certain things; for example, something that only supports
message-passing OOP is too restrictive in my opinion. But enforcing
design decisions which demonstrably lead to code which is cleaner and
easier to write is a good thing, for example memory safety and garbage
collection. (I'm not going to get into whether I think static typing
enforces arbitrary design decisions or improves code quality, and I
don't want any replies to this thread to turn into a dynamic -vs- static
discussion, please, since this has nothing to do with the module
system.)

5) A continuation of the theme in 1/2/3/4. I've run into this problem
from time to time. Suppose you have a file a.factor,

IN: jelly

: toast ... ;

And another file, b.factor,

IN: jelly

: toast ... ;

You forgot that a 'toast' word already exists, and redefined it with a
different meaning! Then you test some caller of 'toast' in a.factor only
to discover it doesn't work. Of course unless you decide to put the
words in a.factor and b.factor in different vocabularies, the new module
system as Eduardo describes it won't solve this issue outright, since
the naive programmer could simply concatenate a.factor and b.factor into
one:

IN: jelly

: toast ... ;

... words using toast #1

! Oops, didn't notice this is a redefinition...
: toast ... ;

... words using toast #2

However, just like referring to a word with the 'obsolete' bit still set
could be a parse time error, /redefining/ a word with the 'obsolete' bit
/not/ set could be an error too, unless the old definition was a DEFER:.

6) Documentation will be simplified. Right now we have several parallel
conceptual 'trees' in Factor;

- vocabularies contain words
- modules contain source files which contain definitions (where definitions include words, methods, etc.)
- the documentation for the core library is rooted at a 'handbook' page which links to multiple articles, each one documenting a major subsystem of Factor, but many such top-level articles don't really correspond to modules and vocabularies

If modules and vocabularies were unified, I could redo the core
documentation in a more structured way, with each vocabulary/module
having a main documentation page, containing as subsections more
articles, word documentation, and so on. The handbook could still start
with an 'entry points' page with links to the most commonly needed
topics, much like the 'cookbook' now.

Note that all this could be done without drawing a distinction between
'loading' and 'reloading' from the viewpoint of the user, and even more
importantly it would not require extra bookkeeping beyond tracking
loaded modules and words (for instance, implementing the first 3 items
can be done with the current setup, by recording a list of definitions
loaded from every source file in a new data structure, but maintaining
this would be an extra burden).

7) What should be done with unsafe words in -internals vocabularies is
an open question. I'm against having private words and such, because I
believe any word should be interactively testable. One way to get around
this is to have private words, but also have another parsing word
FRIEND: which is like USING: but makes private words available. But this
would complicate the implementation (and description!) of the vocabulary
search path. Instead, it might be best to simply split off -internals
vocabs into separate files and treat them like normal
modules/vocabularies.

Let me know what you guys think,

Slava

Monday, December 11, 2006

Compiling Factor to JavaScript

Chris Double writes about compiling a subset of Factor to JavaScript. I'd like this to evolve into something which lets me write web apps which use client side JavaScript to manipulate the DOM and communicate with the server ("AJAX"), without having to write JavaScript by hand, or write code which constructs JavaScript code by concatenating strings. Instead, I want to write the whole web app in Factor, with some code running client side and some server side. Perhaps Chris Double's concurrency library can be used to communicate between Factor on the server and JavaScript Factor too.

Blogger sucks

I know many people idolize Google (and this seems to give their employees unlimited license to blog with complete ignorance), but other than the search engine and Google Maps, they don't have any good products. Everything else is typical Java web trash. Blogger seems to be suffering from more reliability problems lately, and the "publishing" process is annoying. Also, I don't like the way newlines in the blog entry editor are converted to <br>'s; this makes it hard to paste HTML from another source, since you have to merge it all into one line first (but there might be a way to disable this line break behavior). All in all, Blogger might be great if you just want to post rants about how dynamically typed languages don't support refactoring (ahem), but it is a poor product overall. Still beats JRoller, though.

Sunday, December 10, 2006

Live music performance with Squeak

Craig Latta, lead developer of Spoon (a Squeak fork with an emphasis on running developer tools in a different image from the application being developed) uses Squeak for live music performance. Cool stuff.

Thursday, December 07, 2006

Generating a minimal image

I played around with bootstrap and the module system today, and got Factor to generate a 600kb factor.image. The image does not include the compiler and UI, however it does display a listener, and has all the math and collections words in it. If there is any interest in this, I will clean this facility up and make it easier to use. Personally I don't have much interest in making images small. I think the current size of around 6mb is fine, considering it includes full documentation and developer tools.

Division by zero hoax

James Anderson is perpetrating an elaborate hoax, claiming to have finally "solved" the issue of division by zero in the real numbers. The BBC is apparently unaware that this is a joke.

As so many others who have tried to "solve" division by zero have found out in the past, the issue here, of course, is not that one can't divide by zero; you can modify the axioms of the real line and define division by zero in any way you want. The issue is that changing these axioms means the real line no longer satisfies the axioms of a ring, so much of algebra and topology no longer applies (for instance, if 0 had a multiplicative inverse z, then 1 = 0 * z = 0, since anything multiplied by zero is zero in a ring). Of course one is free to invent their own ring theory and topology, but why bother? A century of research into abstract algebra will no longer hold, and it is not clear what good such a number system is.

All in all, James does a great job of parodying how out of touch with reality computer science professors really are. The paper is written in Microsoft Word, not LaTeX -- check out the unreadable notation and poor formatting; it's hilarious.

Tuesday, December 05, 2006

Recent UI improvements

Since the dawn of time, Factor's prettyprinter would print a "#" in place of a sequence when nesting was too deep, and "&" in place of a circular reference. I have enhanced this so that these symbols are printed as presentations of the object in question. So if you print an object while nesting depth is limited, you can still descend deeper into the object's structure simply by clicking. Of course the inspector has always offered this and it is still around, but it is useful to be able to do this with prettyprinted output as well.

Another feature I implemented is proper multi-line input handling. In the UI listener, if you type the start of an incomplete expression, for example : foo, pressing Return automatically inserts a newline. What the UI listener does is attempt to parse the input, and if it does not parse completely, it simply retains the contents of the input area, instead of passing it to the evaluator. This is very nice for entering multi-line expressions.

Factor 0.86 and previous versions already supported multi-line input in the UI listener but in a more awkward way; if you pressed Return with a partial expression entered, it would go to the evaluator and disappear from the input area, at which point you could still continue entering the remainder of the expression, but you could not edit what had already been parsed. You could also explicitly insert a newline with Shift+Return. I think the new approach is far more usable, especially together with the improved parse error handling as described below.

If you get a parse error, instead of simply dumping the parse error to the output area and forcing you to pick the previous input from the history, fix it, and try again, the listener now positions the caret at the location of the parse error and displays the parse error in a popup (which idiotically obscures the input field right now, but that's going to get fixed).

Finally, if there are any restarts, for instance offering to add a vocabulary to the vocabulary search path so that a word can be found, they are presented in a keyboard-navigable list. No more trying to type 0 :res, mistyping it as 0 ;res, and losing your restarts, forcing a trip to the history and another attempt at entering the input! Instead, you use the Up/Down arrow keys and the Return key to pick the relevant restart.

I'd post some screenshots, but the appearance is not finalized and it's hard to get a feel for the interactive behavior of the new listener from screenshots. I think when 0.87 is out, I'll try my hand at making a screencast instead.

I think the UI listener is now rather sophisticated, offering better usability than most other interactive evaluators I've seen. The big hole in the functionality right now is that you cannot select and copy text from the output area. This is not trivial to implement, since the output area actually consists of gadgets, one for every piece of styled text or presentation which was printed. So to be able to select and copy, I would need to implement some kind of cross-gadget text selection protocol. However, McCLIM already does something like this; it's not rocket science, and it shouldn't be too hard to implement.

Factor 0.87 approaching

Factor 0.85 introduced code GC, and 0.86 introduced the local roots refactoring. Both of these changes had their fair share of bugs. I think 0.87 shakes most of them out; I used a simple looping regression test:
: foo \ = reload foo ; foo

What this does is reload the source file defining the = word, which in turn causes most of the library to be recompiled. The word then calls itself, causing an infinite loop. Eventually the code heap fills up and code GC is performed.

Factor 0.86 would crash pretty quickly with this test, because of problems with local root recording. I would modify the runtime to trigger a GC after every heap allocation, which always moves objects, and thus always breaks code which does not record local roots properly. At first this would crash Factor after a few seconds, but after working these issues out, I hit a new one: Doug Coleman reported that this test crashed Factor on Windows after a few hours. I managed to reproduce it and the cause was surprising: if the last block in the code heap was allocated and not free, the free list would be corrupted and all blocks from the last free block onwards would be marked free. Oops!

I don't claim Factor is bug-free, but I think 0.87 is a lot more stable than 0.86. Every time I make a major change to memory management, it takes a few point releases to shake the bugs out; the last time this happened was with generational GC some time in the 0.7x series. Overall I've had virtually no "Heisenbugs" which were impossible to reproduce, unlike my experience with jEdit. In fact most Factor bugs I've had fell into one of several categories:
  • Insufficient coverage of corner-cases; this describes the code GC bug above.
  • Portability breakage: stuff that works on one platform but breaks on another because some API call doesn't behave as one would expect.
  • Insufficient validation: some runtime primitives and low-level words would not validate their inputs enough, and would create inconsistent objects and/or cause some form of corruption when handed bad values.

Doug Coleman has continued enhancing his random compiler tester in order to catch the last type of bug. When he first started his project I was somewhat skeptical, however as the tester has grown in sophistication it has uncovered some real bugs.

One thing I have noticed is that I rarely run into runtime type errors, or stack balancing errors. Logic errors certainly account for the bulk of my bugs. However I have found that compiler-checked stack comments are nice to have. I am still considering adding mandatory static stack effects, and optional static typing, at some point in the distant future, however I don't have any concrete plans yet.