Tuesday, April 03, 2007

Evolving object shapes

In Factor, the fundamental class of user-defined objects is the class of "tuples". A tuple is essentially an array, except elements have names.

In previous Factor releases, redefining the order or number of slots in a tuple class would unintern the word corresponding to the tuple class, and define a new word. This had the effect of introducing a new class into the system; instances of the old class, with the old slot arrangement, and instances of the new class, with the new slot arrangement, were not interchangeable. Any source files defining methods on the tuple class had to be reloaded, to also define those methods on the new class. Old instances would often cause problems when they showed up in calculations. This was all so annoying that when I had to add or remove slots to tuples defined in the core, more often than not I would just bootstrap again to avoid any problems with the old phantom class and its instances hanging around.

This is not an acceptable state of affairs for a language like Factor, which aims to be interactive and robust. Today I implemented a new approach, essentially borrowed from Squeak. When slots are added to a tuple class, existing instances are updated in-place to have the new slot with a value of f; when slots are removed, existing instances are updated and the slot is removed.

The implementation is interesting. First, I added a new low-level become word, which is only intended to be used for the purpose of reshaping tuples, which takes two arrays. The first array is an array of old objects, the second is an array of new objects. This primitive installs forwarding pointers for every old object pointing to the corresponding new object, then runs a GC; this has the side effect of updating all pointers to the old objects to transparently point to the new objects!

On top of this I built the tuple reshaping code. When a tuple class is redefined, it uses the instances word to get all instances of the tuple class, reshapes each one, creating a new tuple in the process, then calls become to perform a bulk update.

Here is a transcript of this in action.

I define a new tuple class with one slot, just a person's name:
( scratchpad ) TUPLE: person name ;
Then I create an instance and inspect it:
( scratchpad ) "Slava" <person>
( scratchpad ) dup describe
an instance of the person class
person-name "Slava"
This instance will remain on the stack the whole time. Now watch as it literally morphs as the class is being redefined.

We add an 'address' slot:
( scratchpad ) TUPLE: person name address ;
*** Data GC (0 minor, 0 cards)
*** Data GC (0 minor, 0 cards)

Note that the reshaping process triggers two garbage collections, so it is not particularly efficient. Obviously you wouldn't do anything stupid like re-define a tuple class inside a tight loop with different slots. Let's take a look at the same old person object we created earlier, that is still on the stack:
( scratchpad ) dup describe
an instance of the person class
person-name "Slava"
person-address f
A new slot has appeared! Let's give it a value:
( scratchpad ) "10 foo st" over set-person-address
( scratchpad ) dup describe
an instance of the person class
person-name "Slava"
person-address "10 foo st"

Now let's add an 'id' slot:
( scratchpad ) TUPLE: person id name address ;
*** Data GC (0 minor, 0 cards)
*** Data GC (0 minor, 0 cards)

Note that in our person instance we have sitting around, all slots were shifted down by one to make room for the new slot:
( scratchpad ) dup describe
an instance of the person class
person-id f
person-name "Slava"
person-address "10 foo st"
We give it a value:
( scratchpad ) 123 over set-person-id
( scratchpad ) dup describe
an instance of the person class
person-id 123
person-name "Slava"
person-address "10 foo st"
Now we redefine the class, removing the 'address' slot, and changing the order of the other two slots:
( scratchpad ) TUPLE: person name id ;
*** Data GC (0 minor, 0 cards)
*** Data GC (0 minor, 0 cards)
Looks like everything worked! Keep in mind that this is the same object you're seeing here; it has morphed several times:
( scratchpad ) dup describe
an instance of the person class
person-name "Slava"
person-id 123
This type of feature is very powerful; by eliminating unnecessary restarts, the programmer can very rapidly prototype software, test changes and try new approaches to solving problems.

A commonly-cited benefit of "objects are hashtables" languages such as Ruby is that adding and removing instance variables doesn't disrupt your program's execution or render existing objects invalid. Classically, languages were objects are instead arrays with slots stored at fixed objects have suffered when it comes to on-the-fly object schema evolution. For example, I don't know if Java supports adding instance variables on the fly yet, even in special "debugging" and "hot swap" modes. In contrast, Factor gives you the best of both worlds. Looking up a slot value is an O(1) operation, but classes can still evolve in a very fluid manner. All it takes is some clever garbage collector trickery.

3 comments:

Anonymous said...

Java "hot debugging", or what they call it, can't adapt classes, it can only add methods, it seems. When I add any instance variables in Eclipse, the debugged program has to be restarted.

Nice new feature!

Anonymous said...

It is possible for "objects are hashtables" languages to provide "both worlds" as well. For example, Io uses hashtables for objects, but lookup is a O(1) operation thanks to cuckoo hashing.

Slava Pestov said...

I literally meant using plain hashtables for objects, since such a design is comparable in complexity to Factor's object system implementation.

Sure, with cuckoo hashing, or even better, Self/Slate-style "maps", one can have O(1) lookup but still be able to change object shapes, on a per-object basis, at run time. These implementation techniques are more complex than the approach Factor (and Squeak) take. For example, the Factor compiler is able to perform a very straightforward optimization: when it is able to infer types, reading a tuple's slot compiles to a single instruction. Self can do this too, but it takes a vastly more sophisticated compiler.