Saturday, June 07, 2008

Syntax proposal for multi-methods

At the end of March, I outlined the future direction for Factor. In that list of major planned features, two are already done: inheritance and singletons. The new tuple syntax is relatively minor (except for updating existing code), so that leaves multi-methods. I really want to get multi-methods done by the end of June, or sometime in July perhaps, and then focus on the compiler, documentation, bug fixes, and libraries. I'm going to draw a line in the sand soon; no more language changes until 1.0.

While I've implemented a rough draft of multi-methods already, I'm not happy with the syntax. When I revisit multi-methods, which I will soon, I will implement a radically different syntax that Eduardo and I came up with today.

Code will have to be updated for multi-methods anyway; I didn't want to push out a lackluster syntax and have to change it later. I think now I've come up with a very good solution. There are still some minor details to work out, but I'm confident that they're solvable.

First, a bit of background. Here is a word definition in Factor:
: v- ( seq n -- seq' ) [ - ] curry map ;

Here is a generic word definition with the single-dispatch core generics:
GENERIC: present ( obj -- str )

M: string present ;

M: real present number>string ;

M: word present word-name ;

M: object present drop "an object" ;

Here is the equivalent with extra/multi-methods:
USE: multi-methods

GENERIC: present ( obj -- str )

METHOD: present { string } ;

METHOD: present { real } number>string ;

METHOD: present { word ] word-name ;

Here is the syntax we thought of and will most likely implement soon:
: present ( |string -- string ) ;

: present ( |real -- string ) number>string ;

: present ( |word -- string ) word-name ;

Whoa! Those look like ordinary colon definitions! Well, here's the kicker. All words would be generic (and the distinct notion of a "generic word" won't exist; it will just be the default and only type of word). Indeed, any entries in the stack effect which contain | are specializers, with the text after | being the class name to specialize on.

Right now, redefining the same word or method multiple times is an error. This won't change. The following will be invalid:
: write-content ( text -- ) ... ;

: write-content ( text -- ) ... ;

The following will be OK though:
: write-content ( text|string -- ) ... ;

: write-content ( text|xml -- ) ... ;

If you specify a method with no specializers, it is just the default method:
: write-content ( text -- ) ... ;

And if you don't care to make a word generic, you just use : ... ; as before; a generic word with one default method behaves exactly like a normal word.

All stack effects of methods will need to be 'congruent'; the following is not OK:
: combine ( s1|sequence s2|sequence -- result ) ... ;

: combine ( s1|sequence s2|sequence s3|sequence -- result ) ... ;

Now a big selling point of generic words is that you can put methods on generic words in other vocabularies; otherwise, they'd really be no different from pattern matching in ML. Right now, : always defines a word in the same vocabulary. But this will change, and you will be able to refer to qualified word names:
TUPLE: my-seq ... ;

: sequences#length ( s|my-seq -- len ) ... ;

: sequences#nth ( i|integer s|my-seq -- value ) ... ;

Right now, we already have a qualified library for disambiguating the search path with explicit vocabulary names. You first explicitly request qualified access to a vocabulary,
USE: qualified
QUALIFIED: sequences

Then proceed to refer to qualified names:
3 "hello world" sequences:nth

Well, I don't like the use of : there; it was hastily thought out in my opinion (no offence, Dan :-) because it clashes with our usage of : as a word definition operator. So instead I propose #; it has precedent in HTML (anchors in URLs) and Ocaml (method references).

Whereas with :, you will be able to define new words in the current vocabulary, and add methods to words in the current vocabulary, when referring to a word in another vocabulary you will only be allowed to refer to words that already exist. This is to catch mistakes like the following:
: seqences#lgenth ( i|integer s|my-seq -- value ) ... ;

We don't want people to pollute existing vocabularies with their typos, and we want a parse-time error if a generic word in another vocabulary is renamed. Within the same vocabulary, you will be able to make a mistake like the following, which you cannot do right now:
: sheeple ( s|integer -- ) ... ;

: shepele ( s|string -- ) ... ;

Creating two words instead of one word with two methods. The parser won't be able to catch this mistake anymore, because whereas before M: would complain if the generic word didn't exist, : creates the word if necessary. However, if qualified word creation is prohibited, this potential problem is limited in scope to one vocabulary and good unit testing will catch it.

Sometimes you want to define a generic word with no methods; the methods, and the concrete data types they specialize on, are defined in other vocabularies. Right now this is easy;
GENERIC: foo

How would you do this in the new system you ask? Well, since we're merging M: with :, we should merge GENERIC: with its closest non-generic equivalent, which happens to be DEFER:. The DEFER: word basically means, "create a word with this name, so that I can refer to it, but don't give it a definition; I'll supply the definition later". DEFER: is rarely used right now, its mostly just useful for mutually recursive words, which don't come up often. If all words become generic, then DEFER: basically means "give me a generic word, but don't put any methods on it yet". Neat, huh?

Another problem this solves is composable word definition types. Right now we have : for normal words, GENERIC: for generic words and M: for methods. In extra/, there is :: for locals, MEMO: for memoized words, and MACRO: for macros. But what if you want a macro with named parameters and locals? Or a memoized generic? Well, MACRO:: happens to exist, but not every combination is there, and in general there is a combinatorial explosion with the current system. We will still have ::, but it will subsume the old :: as well as M::. Furthermore, we won't need local variants of MEMO: and MACRO: at all. Instead, MEMO: foo ( a b -- c ) will be like DEFER:; it will create a generic word with no methods, but with the additional perk that it memoizes on its inputs.

This removes the double-up of defining words for variants with named parameters; we don't need MEMO::, MACRO::, M::, etc.

So it ends up looking a bit more verbose. Current Factor:
MEMO: fib ( n -- result )
dup 1 > [ [ 1 - fib ] [ 2 - fib ] bi + ] when ;

Proposed definition syntax:
MEMO: fib ( n -- result )

: fib ( n -- result )
dup 1 > [ [ 1 - fib ] [ 2 - fib ] bi + ] when ;

However, macros and memoized words don't come up that frequently, and this not only simplifies the existing implementation by removing defining words, it opens up new possibilities: generic macros, generic memoized words, a generic memoizing word where some methods use named parameters, etc. Cool stuff.

There's a new idiom that this enables here, and that is a word which conceptually isn't generic (even though all words are, you don't actually want to override every word necessarily), it simply type checks its arguments. For example, right now I have a create-method word which does this:
TUPLE: check-method class generic ;

: check-method ( class generic -- class generic )
over class? over generic? and [
\ check-method boa throw
] unless ; inline

: <method> ( class generic -- method )
check-method
[ method-word-props ] 2keep
method-word-name f <word>
[ set-word-props ] keep ;

That's pretty disgusting. Here is the new style:
: <method> ( |class |generic -- method )
[ method-word-props ] 2keep
method-word-name f <word>
[ set-word-props ] keep ;

Scrap the boilerplate!

This formalizes a pattern that's been coming up frequently, and also allows the compiler to perform better optimizations.

I also plan on allowing the type of the return values to be annotated. While this won't be used for dispatching (since its impossible unless you predict the future), the compiler will insert runtime checks to ensure that the returned values are indeed of the asserted type (optimizing them out if necessary). Furthermore, the compiler can assume the return type at call sites, which will enable further optimizations without inlining.

Finally, parts of the documentation can be auto-generated. Here is the help for the append word:
HELP: append
{ $values { "seq1" sequence } { "seq2" sequence } { "newseq" sequence } }
{ $description "Outputs a new sequence of the same type as " { $snippet "seq1" }
" consisting of the elements of " { $snippet "seq1" } " followed by " { $snippet
"seq2" } "." }
{ $errors "Throws an error if " { $snippet "seq2" } " contains elements not
permitted in sequences of the same class as " { $snippet "seq1" } "." } ;

If instead it were defined as
: append ( seq1|sequence seq2|sequence -- newseq|sequence )

We wouldn't need the $values element in the help at all in this case at least.

It would also make it easier to write generic code. You'd still have to think of good protocols to expose, and stuff like that; there's no silver bullet to make code automagically generic and reusable; but you'd pay less of a penalty for making code more flexible. Right now, the basic operations on sequences, such as nth and length are generic, but more complex ones, such as push, append, and so on, are not. This sucks if someone wants to implement a custom sequence type; a rope, for example; where append can be implemented more efficiently than the current implementation. While right now I could make append generic (along with all other sequence operations) in the future I won't need to. Instead you'll just be able to override any sequence operation, within reason. So this,
GENERIC: append ( seq1 seq2 -- newseq )

M: object append ... default implementation ... ;

Will collapse down to:
: append ( seq1 seq2 -- newseq ) ... default implementation ... ;


A final point to mention is hooks. Right now, core generics support a type of generic word known as a "hook". A hook is a generic word which dispatches on the value of a dynamically-scoped variable rather than a stack position. The multi-methods vocabulary generalizes this so that method definitions can be declared to dispatch on arbitrary variables. Well, when any word is able to be generic, you will be able to override any word with a new behavior that takes effect if a given variable has a certain type! This is insanely powerful and ripe for abuse, so the documentation will make it clear that this should only be done for words which are documented to have a contract that is compatible with hook overriding. In addition to being used where hooks are used now (I/O and UI backends, stuff like that) I believe this feature will be useful when unit testing code. If I'm working on a vocabulary, I don't need to explicitly make protocols for the sole purpose of mocking out some objects. I still need to write code in a style that's amenable to unit testing, but doing so becomes easier if the unit tests can specialize a word on a dynamic variable!

As far as efficiency goes, I don't anticipate any slowdowns there. The two generic word systems we have now already optimize the case where you have a generic word with a sole method on object. So the following two definitions compile to the exact same machine code:
! Version 1
: foo ... ;

! Version 2
GENERIC: foo

M: object foo ... ;

You only pay a price if you begin to specialize on different types. In addition, multiple dispatch will replace instances of hand-coded double dispatch; this will yield a performance boost. Finally, being able to declare input types on some words will not only be great for documentation purposes, it will allow the compiler's dispatch elimination machinery to kick it up a notch. Right now it only works within a single word, so the optimizer first has to inline a lot before it can pay off. This increases compile time. If it can optimize types across word boundaries, a whole new set of program analysis techniques are possible.

Let's put this another way. The following parsing words will be eliminated:
  • M:
  • GENERIC:
  • MEMO::
  • MACRO::
  • M::
  • GENERIC#
  • HOOK:

In return, you gain a huge amount of expressive power: multiple dispatch. Wow!

My main priority right now is still the web framework. One the new wiki, pastebin and blog are in good shape I will blog about them in detail and deploy them on factorcode.org. After things settle down with the new web site, I will return to language work and implement the above!

No comments: