Sunday, August 26, 2007

When to use locals

Some people have wondered why I implemented the locals library, and if in fact I'm turning my back on stack-based programming and "selling out". In fact this is not the case.

First of all, Factor has already had named variables in the core for quite some time; these are found in the namespaces vocabulary, which implements very simple dynamically-scoped namespaces. There are many cases where dynamic scope is useful, and languages which lack it often end up with a lot of code which passes parameters down long function call chains, with only the leaf functions actually using the parameters. Examples of dynamic scope usage in Factor include the current stream, and various idioms for sequence and association construction. Here dynamic scope is the right thing to use, and it makes total sense, and it cleans up code when certain values are not passed around on the stack.

In a handful of places, we also used dynamic variables as temporaries in calculations. I think once the right abstraction is found, stack code is almost always simpler and cleaner than the equivalent applicative code using named temporaries. But sometimes you have a math formula which operates on 10 values at once, or something similar, and it simply makes more sense to name these intermediate values. In these cases, we used dynamic variables. There are a handful of examples of this so far, such as the MD5 algorithm, and a couple of routines in the calendar library.

However when one uses dynamic variables for temporaries, the dynamic scope semantics no longer help and in fact get in the way; anybody who has used a Lisp with all-dynamic scope can testify. So while most values which would be lexically-scoped in a Lisp program are passed on the stack in Factor, sometimes you still want to name temporaries, and named locals with lexical scope are useful here, because they allow you to create closures and safely pass them around without running into the "funarg problem" which had the Lispers stumped through the 60's.

I never opposed having named variables in Factor, because if I did, they wouldn't be there from day one. Until recently, I didn't feel the need for lexical scope much, though, and when asked about it, my standard reply would be that the stack is Factor's equivalent of lexical scope, and that if somebody coded it up, I'd be happy to add it as an extra library.

Eduardo eventually implemented somewhat inefficient lexical scope in his extra/rewrite-closures library, which still exists because unlike extra/locals, it has first-class environments. I decided to code extra/locals because I wanted to see if I could make locals as efficient as stack values, and the experiment was a success.

So to recap, Factor now has three main methods of passing values between words, and storing temporary values within words:
  • The stack. Factor is and always will be a stack-based language; the whole point of the project is to explore stack-based programming. I encourage beginners to learn to think in terms of the stack before jumping into other abstractions, and for most code, the stack works really well. It is really interesting that in a stack language, function composition is expressed as concatenation, and compared to Lisp, Factor code has a lot less nesting and arbitrary introduction of temporaries. Especially now that Factor supports efficient partial function application, quotations can have internal state, and this brings much of the benefits of closures to the world of the stack.
  • Dynamically scoped variables. These are useful for a set of issues totally orthogonal to temporaries: they're used to give words context-specific behavior, and when an algorithm spanning multiple words needs to operate on a common set of values.
  • Lexically scoped variables. These are used to name temporary values inside calculations which otherwise would be hard to express with the stack. Hopefully this set will reduce over time, but for now, it includes any kind of hairy mathematical operation such as found in cryptography, numerical methods, and so on.

Having more abstractions to choose from when writing code is good, and I think a stack language can express the widest range of them in the most natural possible way.

2 comments:

Dan said...

Slava,

Why is it necessary to mark which locals are assigned? I looked at the code in factorcode.org/repos/Factor/extra/locals, but I'm still learning to read stack code, so I only sort of understand the code. Nevertheless, this is very interesting stuff.

Slava Pestov said...

It is necessary because I was too lazy to have the code infer which locals are written to. It could be changed to be implicit.

Some people might say explicit declaration of mutable locals is a feature though, since writing to locals in a closure can have funny semantics, and it helps to know what's going on.