Monday, March 31, 2008

Inheritance, tuple reshaping, cleavers, and language evolution

About a year ago I implemented smart tuple reshaping, which updated instances of a class in memory when the class was redefined. Now that I'm working on inheritance, one of the most important sub-projects is updating this feature. The old implementation was already quite hairy, and I thought that updating it to support inheritance would make it intolerably so, however after some refactoring I simplified it down considerably.

Tuple reshaping and inheritance

The best way to demonstrate the feature is with live code examples. Note that they work with Factor from git; in fact, inheritance is mostly in place, the only thing that needs to be updated is method dispatch -- of course inheritance is useless without methods, but with all the other changes in place this remaining task is comparatively easy.

First a datatype:
TUPLE: bytes n ;
: bytes \ bytes construct-boa ;
: megabytes 20 shift bytes ;
: gigabytes 30 shift bytes ;

Let's define a class to represent computers; persumably, this is a high-end, million-dollar enterprise systems management app:
( scratchpad ) TUPLE: computer ram cpu ;

Now a subclass for servers:
( scratchpad ) TUPLE: server < computer ;

A constructor:
( scratchpad ) C: <server> server

This is the 21st century and we have some pretty high-end gear:
64 megabytes "Pentium II" <server> server set

Let's check out our server in the inspector:
( scratchpad ) server get describe
server instance
"delegate" f
"ram" T{ bytes f 67108864 }
"cpu" "Pentium II"

Is our server a computer?
( scratchpad ) server get computer?

Oh crap, we forgot to define a slot for the disk. No problem, let's redefine the server type:
TUPLE: server < computer disk ;

What happened to our server?
( scratchpad ) server get describe
sserver instance
"delegate" f
"ram" T{ bytes f 67108864 }
"cpu" "Pentium II"
"disk" f

Notice the new slot that appeared. Let's give it a value:
( scratchpad ) server get 2 gigabytes >>disk drop

But hey, the disk slot should really go in the computer type. All computers have disks, not just servers. That's not a problem:
( scratchpad ) TUPLE: computer disk cpu ram ; TUPLE: server < computer ;

What happened to our server?
( scratchpad ) server get describe
server instance
"delegate" f
"disk" T{ bytes f 2147483648 }
"ram" T{ bytes f 67108864 }
"cpu" "Pentium II"

Now we want to add a new type for electronic equipment.
( scratchpad ) TUPLE: electronic-device voltage ;

Is our server an electronic device?
( scratchpad ) server get electronic-device?

Well, no it isn't, there's not type relation here. But we can redefine our computer type to inherit from the electronic device:
TUPLE: computer < electronic-device disk cpu ram ;

Tada! The object is now an instance of a type it wasn't an instance of before:
( scratchpad ) server get electronic-device?

Let's take one last look at that server:
( scratchpad ) server get describe
server instance
"delegate" f
"voltage" f
"disk" T{ bytes f 2147483648 }
"ram" T{ bytes f 67108864 }
"cpu" "Pentium II"

Now we could go on and assign it a voltage, then redefine a class again permuting the slots, etc. Of course people who've been following Factor for a while take this type of thing for granted, but let's take a step back and think about what happened here: we created an instance of a class, then proceeded to redefine the class, change its position in the inheritance hierarchy, and move slots up and down the hierarchy. The instance remained consistent with the current definition of the class at each step and Factor tried its best to preserve slot values when things were being moved around.

This is of course trivial in languages which use hashtables to represent objects but Factor uses a more efficient representation: objects really are just tuples, with slot values stored contiguously in memory. You get very fast slot access because of this, but the garbage collector and object system work together to give the illusion that these objects are a lot more malleable than they really are.

Cleave and spread combinators

The next thing I want to touch on is the fact that the tuple reshaping implementation (and indeed, the entire tuple class implementation in general) is a lot cleaner than the old code, primarily because I've been using the new "cleave" and "spread" family of combinators designed by Eduardo Cavazos. These combinators are in the kernel vocabulary now, and they're documented; enter "kernel" about in the UI listener to learn more. I won't repeat the documentation here, just give an overview.

These combinators help structure your data flow by grouping parts of computations into one-to-many, many-to-many and many-to-one designs. I've never seen a programming language make this explicit before and it seems to be a genuinely new idea now present in either applicative or stack-based languages. (More academically-minded folk might wish to ponder the implications of this approach when combined with linear type systems.)

The basic picture is that we have three groups of combinators:
  • The "cleave" family take m values and n quotations. They apply each quotation to all m values in turn.
  • The "spread" family take m*n values and n quotations. They apply each quotation to the corresponding group of m values in turn.
  • The "apply" family take m*n values and 1 quotation. They apply the quotation to each m-element group of values in turn.

The naming convention is simple. The suffix on the combinator name determines which family it belongs to. No suffix denotes a "cleave", a * suffix denotes a "spread", and a @ suffix denotes an "apply". If the combinator name does not start with a digit, m is 1. Otherwise, the first digit is the value of m. Finally, the rest of the name, except for the prefix and suffix, determines n. This name is either bi, for n=2 or tri, for n=3.

Here is the full list:
  • bi: 1 value, 2 quotations, quotation has arity 1
  • tri: 1 value, 3 quotations, quotation has arity 1
  • 2bi: 2 values, 2 quotations, quotation has arity 2
  • 2tri: 2 values, 3 quotations, quotation has arity 2
  • 3bi: 3 values, 2 quotations, quotation has arity 3
  • 3tri: 3 values, 3 quotations, quotation has arity 3
  • bi* - 2 values, 2 quotations, quotation has arity 1
  • tri* - 3 values, 3 quotations, quotation has arity 1
  • 2bi* - 4 values, 2 quotations, quotation has arity 2
  • bi@ - 2 values, 1 quotation, quotation has arity 1
  • tri@ - 3 values, 1 quotation, quotation has arity 1
  • 2bi@ - 4 values, 1 quotation, quotation has arity 2

Note that 2tri*, 3bi* and 3tri* do not exist: if they did, they would take 6, 6 and 9 input values, respectively, and if your code needs that many values on the stack, then you should either use sequences, assocs, tuples, or variables.

Finally, there is a generalized versions of bi and bi*: cleave and spread take a sequence of quotations.

Because the naming convention is simple and consistent, the names of these combinators are easier to remember than the names of the stack shuffles. More interestingly, I've found that after refactoring code to use the cleave combinators, the only shuffle words I really need are dup, drop, and very rarely, a swap.

For example, I recently converted our serialization library to use these combinators. The result is extremely straightforward: there is almost no stack manipulation, except for the occasional dup and drop, but it doesn't use named locals either. Thanks to the cleave combinators, all values just happen to be in the right place at the right time, and everything composes together very nicely, like lego bricks.


Factor has changed greatly since its inception. Recent language changes such as new slots and the addition of the cleave combinators, as well changes I'm working on right now, such as inheritance and multimethods, should be the last major changes before Factor 1.0.

Last year, Daniel Ehrenberg published a roadmap for Factor 1.0. It seems that we're on track at this stage. After the language features are done, I'm going to review the entire core library and incorporate new idioms and features where possible. To put this in context, I don't think the Java implementation is far from exemplary Java code, the SBCL source code far from exemplary Common Lisp code, and Squeak's core is far from exemplary Smalltalk code. I don't want Factor to follow this trend and release 1.0 with a huge, bloated, incomprehensible implementation: I want the code to be simple, clean, clear and consistent. With the new language features and the new idioms I've been learning along the way, this goal should be achievable; in fact the core is for the most part pretty well-designed, except for some dark corners here and there.

Because Factor is a new language, implementing Factor is more work than implementing, say, Scheme or Common Lisp: when you're working against a fixed spec, for a language that has had a wealth of programs written in it, representing decades of experience, there isn't so much need for experimentation, trying different approaches. Factor is different. Factor is a new language, and the job of implementing Factor isn't a race to a certain finish-line that we can see from the start, but rather a mind-expanding journey, full of learning opportunities, joy, frustrations, obstacles and wrong turns.

On the flip-side, I can change the language to suit the implementation; if a feature is ugly to implement and makes the system more fragile, then it is better not to include it, and I have that freedom.

Once inheritance is done, I'm going to return to working on the web framework, and hopefully you will see more practical blog posts from me again, instead of philosophical junk!


Anonymous said...

New combinators is simply "Wow!"

I'm currently developing DNS library, using latest stable Factor. Serialisation of complex tuples into message was pain in the ass because of hevvy stack shuffling. So I used someting like "[ call ] 2each" on arrays of quotations. With new combinators it should be much simplier.


Anonymous said...

PS Excuse me for bad english. It is not my primary language, and I had very little practice during last years.


Slava Pestov said...

Hi Max,

Your English is great. I'm very interested in your DNS library; you should go to the IRC channel or post to the mailing list.

zenhacker_rouan said...

Nice article Slava. Always good to hear where we are with Factor and what features are being worked on.