Sunday, October 30, 2005

Incompatible changes in Factor 0.79

Factor 0.79 is almost ready for release. There is a minor prettyprinter bug, and a show-stopper OpenGL failure on Windows that no doubt will be resolved soon. After that, I have to update the documentation, and it will be released. This release shakes things up a little, so I thought it would be worthwhile to post an entry about the most noticable changes, now that we have a number of active contributors.

First of all, the ifte combinator has been renamed to if.

Next up, literal syntax for various types has changed:


Data typeOld syntaxNew syntax
Vector{ 1 2 3 }V{ 1 2 3 }
ArrayNone{ 1 2 3 }
Hashtable{{ [[ key value ]] ... }}H{ [[ key value ]] ... }
Tuple<< class slots ... >>T{ class slots }
Complex number#{ real imaginary }#C{ real imaginary }

In particular, note that in 0.79, arrays play a much larger role in the language. Formely, arrays were an implementation detail; they lived in the kernel-internals vocabulary, did not perform bounds checking, and were only used internally to implement vectors and hashtables. In 0.79, arrays have a literal syntax, are fully safe, and live in the arrays vocabulary. Arrays are not resizable like vectors are, but are slightly more efficient. This is the only real difference; the same operations work on both, since Factor's sequences are fully generic. Arrays are now the preferred data type for literal sequences. In fact, literal vectors are extremely rare; there's really only two cases where they are needed:

V{ } clone -- make a new, empty vector
[ 1 , 2 , 3 , ] V{ } make -- implicit sequence construction

This is why arrays now have the old vector syntax, and vectors now have a slightly more verbose syntax.

In particular, these changes have made the inspector display much more readable.

Finally, I also wanted to touch on a topic that I have discussed on IRC, but never put in writing. Lists and conses are being phased out over time. This does not affect the upcoming release of 0.79, but eventually I hope that quotations can become a first-class immutable type with underlying array storage; they will reuse the [ ... ] syntax. Code that uses conses to store data will be refactored to use arrays and vectors instead. There are several reasons for this:
  • Conses entail a 2x space overhead, and make the inner interpreter needlessly slow as it traverses the 'cdr' pointer while interpreting code. While the compiler makes this irrelevant, not all code is compiled. During bootstrap, interpretation overhead accounts for a significant portion of run time.
  • If quotations become arrayed, first-class types, then the debugger will be radically improved, and return stack traces will become much more readable due to extra information available at run time. Anybody who has looked at :r output knows what I mean. Its hard to understand if it is just a quotation soup, without any idea of what word each quotation came from.
  • Conses require their own implementations of algorithms such as each, map, head, and so on. Already, some combinators like 2map are only implemented for arrays, and result in a quadratic performance degradation with conses. I realize the "iterator" design pattern is supposed to eliminate this type of duplication, but iterators have performance and complexity of implementation problems.
  • The Factor programming style, which tends to favor the creation of new sequences from old ones rather than direct mutation, virtual sequences, and using combinators instead of explicit recursion, does not really require conses. Most algorithms can be expressed effectively using arrays and vectors instead.
  • Factor's conses are already quite constrained by being immutable. Thus circular lists cannot be implemented, which the main application of conses that is difficult to achieve with arrays.
  • If a specific problem calls for conses, then nothing prevents anybody from implementing a library for working with them.
  • Removal of conses would remove the requirement for the object memory manager to handle headerless objects. This will allow switching to a generational mark-sweep-compact garbage collector, which uses less memory than the generational copying collector we have now. Also, giving each object a header allows incremental GC to be implemented.

The total removal of conses from the core library is quite a major change. However, to keep the language lean and mean, this is something that should be done. Some people might find it surprising that I am still considering major changes to Factor, even as it is more than two years old. However, I did not have the benefit of extensive research and experience in language design before I began this effort, so I'd rather streamline my design while I still can rather than accumulate a large body of historical cruft and scar tissue.

1 comment:

Anonymous said...

Is it possible to write parsing words to consume a quotation and not just a lexical token? All the with-* syntax has always looked backward to my eyes, but with a word that can consume a quotation, one would be able to write something like:

"/foo/bar" with-open-file: [ do stuff with file ]

In fact, reading any arbitrary datatype, perhaps even choosing among several patterns to match, that would make parsing a lot more flexible. The so-called "parser" of forth has always been something of a joke, and limits its ability to create non-trivial DSL's. Factor carries this damage forward from forth. Lisp macros tend to steer you into sexps (LOOP notwithstanding) but they're able to consume any number of forms. Something like prolog DCG's might be a better fit.