Tuesday, April 08, 2008

Multi-methods and hooks

For a while now Factor has had 'hooks', which are generic words dispatching on a dynamically scoped variable. Hooks can be used with variables which are essentially global: the current OS, current CPU, UI backend, etc -- or variables which are truly context-specific, such as the current database connection.

I added support for hooks to the extra/multi-methods library, which is going in the core soon. While doing this I was able to significantly generalize the concept. Suppose we want to define a piece of functionality which depends on both the operating system and processor type.

We can begin with defining an ordinary generic word:
GENERIC: cache-size ( -- l1 l2 )

Notice that it is defined as taking no inputs from the stack.

I don't really know the APIs involved here, but suppose that Linux gives us a way to get this info that works across all CPUs. So we define a method specializing on the os dynamic variable (which is really treated like global):
METHOD: cache-size { { os linux } } ... read something from /proc ... ;

Now suppose on Mac OS X we use different APIs per CPU:
METHOD: cache-size { { os macosx } { cpu ppc } } ... ;

METHOD: cache-size { { os macosx } { cpu x86.32 } } ... ;

METHOD: cache-size { { os macosx } { cpu x86.64 } } ... ;

But perhaps on ARM CPUs, we just use an instruction to read the cache size without any OS-specific calls at all:
METHOD: cache-size { { cpu arm } } ... ;

Now in this case, you have an issue where if you're on Linux and ARM, the method that ends up being called depends on the order in which they were defined. If you wanted to explicitly resolve this ambiguity, you would define a new method on { { os linux } { cpu arm } }; because it is more specific than the other two, it is always called first.

The powerful thing about this new implementation of hooks is that not only can you dispatch on multiple variables, but you can add methods to any old generic which dispatches on a variable and the original designer of the generic does not have to explicitly take this into account.

For example, Factor's compiler currently has a large number of hooks dispatching on the CPU type (as an aside, Phil Dawes wrote an excellent introduction to the Factor compiler recently). If those hooks need to be further refined by OS, as is often the case with FFI-related components, the method implementation on the hook needs to perform its own dispatch; this is the "double dispatch" pattern and design patterns are something to be avoided if one wants to write quality code. When multi-methods go in the core, the compiler will simply define a series of generic words taking no inputs from the stack, and each method will specialize on the CPU, and maybe an OS too.

Another new capability is dispatching off stack values and variables in the same method. Among other things, this will be useful in eliminating a case of double dispatch in the core right now, where the <client> word for opening a client socket has to dispatch off the address type on the stack, and then call another hook which dispatches on the I/O backend stored in a variable. This can be combined into a single generic word where some methods dispatch on stack values, and others dispatch on the I/O backend variable.

The other nice thing about this is that the multi-method GENERIC: word unifies and generalizes four words in the core, GENERIC:, GENERIC# for dispatching on a value further down on the stack, HOOK: for dispatching on a variable, and MATH: which performs double dispatch on numbers only.

One of my goals for Factor 1.0 is to get the object system "right", and with the new-style slot accessors, inheritance, and singletons, we're almost there. All that remains to be done is to merge the multi-methods code. The code is still not quite ready to go in the core, though. The only feature that single dispatch has and multiple-dispatch lacks is call-next-method, which is easy to implement. A bigger hurdle to clear is performance; right now multi-methods are implemented in a naive way, where the dispatch time is O(mn) with m methods and n stack positions and variables to dispatch on. This can be improved significantly and I will find time reading the literature on optimizing method dispatch over the next few weeks.

3 comments:

Anonymous said...

"right now multi-methods are implemented in a native way", do you mean naive?

Slava Pestov said...

By 'naive', I mean the effective method is determined by a linear search.

Anonymous said...

Yea, so "naTive" was a typo. :)