Wednesday, April 01, 2009

Sup dawg, we heard you like Smalltalk so we put Smalltalk in your Factor so you can send messages while you roll

A week ago, I went to the PyCon VM summit in Chicago. It was great fun and I got to meet various Internet celebrities such as John Rose (Sun JVM), Evan Phoenix (Rubinius), Charles Nutter (JRuby), Avi Bryant (Seaside), and Allison Randal (Parrot).

Factor's own Daniel Ehrenberg and Doug Coleman were also there, and I've heard rumors that the world-famous C++ programmer Joe Groff may have tagged along for the ride.

Every VM implementor gave a short 10 minute talk. Mine was an overview of the optimizations and implementation techniques used in Factor; you can look at the slides in a recent Factor by running "chicago-talk" run in the listener. A common theme at the summit was the distinction between platforms versus languages. The JVM, Parrot and PyPy people are all vying for language designers to implement their languages on top of these VMs. The idea is that the platform developers take care of the "hard" stuff such as the optimizing compiler, garbage collector, and libraries, whereas the language designer feeds a grammar into a parser generator, and bingo, instant programming language.

While discussing Smalltalk minutiae with Dan and Avi Bryant, we hit upon the idea of trying to compile Smalltalk down to Factor. After all, Factor already has everything you'd want in a platform for hosting languages. We have a great optimizing compiler, a generational garbage collector, C FFI, a ton of libraries, and so on; in theory we've already done most of the hard work for someone who wants to make a language without dealing with low-level details. So while we've written countless DSLs in Factor since the very beginning, why not try implementing a more complete language?

That's exactly what I did, and the code is in the repository under extra/smalltalk. Run it by issuing the following command in the Factor listener:
"smalltalk.listener" run

Then try a Smalltalk expression such as 1 to: 10 do: [:i|i print]., or look at extra/smalltalk/eval/fib.st for a more complete example. Incidentally, this Smalltalk fib runs much slower than a Factor fib, because of Smalltalk return semantics; I discuss the problem below as well as my planned solution to eliminate the overhead.

In theory, there is a simple mapping from Smalltalk to Factor:
  • A Smalltalk class can be a Factor tuple class. Since ivars are pre-declared, the performance complications that Ruby and Python implementations must deal with are avoided.
  • Every Smalltalk selector can map into a Factor generic word; all selectors will be defined in a single vocabulary.
  • Every Smalltalk method can correspond to a Factor method on the selector's generic word, specializing on the class in question.
  • Smalltalk local variables can be translated to Factor's locals. Again, Smalltalk has pretty static scoping semantics, and the fact that locals are pre-declared maps very well onto Factor's model.
  • Smalltalk blocks can map onto Factor quotations.
  • Non-local returns can be implemented using continuations.

I threw together a simple compiler on the car ride back from Chicago to Minneapolis (no, I wasn't the one driving). The compiler takes a Smalltalk AST as input and produces a Factor quotation. The AST is pretty simple:
SINGLETONS: nil self super ;

TUPLE: ast-comment { string string } ;
TUPLE: ast-block { arguments array } { temporaries array } { body array } ;
TUPLE: ast-message-send receiver { selector string } { arguments array } ;
TUPLE: ast-message { selector string } { arguments array } ;
TUPLE: ast-cascade receiver { messages array } ;
TUPLE: ast-name { name string } ;
TUPLE: ast-return value ;
TUPLE: ast-assignment { name ast-name } value ;
TUPLE: ast-local-variables { names array } ;
TUPLE: ast-method { name string } { body ast-block } ;
TUPLE: ast-class { name string } { superclass string } { ivars array } { methods array } ;
TUPLE: ast-foreign { class string } { name string } ;
TUPLE: ast-sequence { temporaries array } { body array } ;

The parser took more effort. I used Chris Double's PEG library to implement it, and followed the grammar from Smalltalk in one page as a guide. The grammar required some restructuring to parse with PEGs because of how the "ordered choice" works. Also sorting out some more obscure details such as message cacades took some experimentation with Squeak and back-and-forth on the PEG grammar. Contrary to popular belief, while Smalltalk has a relatively simple syntax, but its not as simple as Lisp or Forth.

Here is the finished PEG parser for Smalltalk. Most of the file is written in the PEG EBNF DSL (acronym soup!). There are bits of Factor here and there to construct ASTs.

One place I had to improvise was the syntax to define classes and methods. In traditional Smalltalk 80, there's no real concrete syntax for these definitions at all, since they're entered as distinct entities in the graphical browser. I invented a silly syntax which doesn't fit very well with the rest of the grammar; I might change it later.

Once I had the parser and compiler going, I threw together a simple real/eval/print loop and did more testing. Basic features work and there's a small set of built-in Smalltalk selectors mapped onto Factor words. What remains to be done? Quite a lot:
  • Class methods and metaclasses.
  • Adding methods to existing Smalltalk classes.
  • Implementing super.
  • More built-in selectors need to be provided.

All of the first three are easy to map to Factor features, and the last one is just busy-work.

The one language feature that doesn't translate so well is Smalltalk's non-local return. I convert usages of this feature into continuations, which are slow relative to how non-local returns are implemented in production Smalltalk VMs. Usages of continuations which are downward-only need to be optimized. A speedup here would make Factor's exception handling faster too, so it would be very valuable overall.

Because Smalltalk selectors are Factor generic words, and Smalltalk classes are Factor classes, Factor's optimizer can work its magic on Smalltalk code, which is nice. I have plans to speed up Factor's downward continuations and implement inline caching for Factor generic words, and this will directly benefit the Smalltalk implementation too.

Since this implementation is just a demo, I consider it mostly done, but I'll keep hacking on it from time to time. I don't intend to turn it into a serious Smalltalk implementation, but it would be great if someone else took over and did so.

It weighs in at only 1217 lines of code, and that includes 423 lines of unit tests. I'm not sure exactly how much time I spent on it, but I'm guessing around 24 hours in total over the last week, and the bulk of that was tweaking the parser to parse various odd syntax. In particular, the AST to Factor compiler was very easy to write.

So the moral of the story is that implementation-wise, Factor is a modern VM which has a lot of the features that scripting language people wish they had, and plan on implementing sometime next year. It is also very easy to target languages to Factor by parsing them with our PEG library and compiling the AST down to Factor quotations which are handed off to the optimizing compiler. If you want to experiment with language design but don't want to spend five years implementing a language like we did, you should consider using an existing system to host your language on, and Factor is one of several great choices in this regard.

3 comments:

Shrutarshi Basu said...

This is really interesting especially for someone like me interested in languages. In fact I'm thinking on asking my profs to include stack languages as part of our Programming Languages course.

Anonymous said...

Coooool

Jon said...

Are all Factor classes available from within this Smalltalk? Will it be possible to write Smalltalk code that makes use of the Factor GUI? Some simple demo code could be nice ...