Christopher Diggins writes:
I am about to release a version of Cat which supports optional named parameters. The turning point in making this decision to add support for named parameters to a version of Cat occurred when I tried to implement a simple quadratic equation with names.
He shows the following program for evaluating a quadratic polynomial f(x) = ax^2 + bx + c, which takes the values x, a, b, c as input:
define quadratic_point_free
{
[dupd [sqr] dip swap [*] dip] dipd [* +] dip +
}
This is not good stack code.
I'm not writing this to bash Christopher, but I think he has not had much experience actually
programming in a stack language. Implementing an interpreter for a language doesn't force one to practice the idioms and problem-solving techniques employed when actually programming in it; likewise, one doesn't need a deep understanding of a language's implementation to use it. So I think Chris is committing the classic newbie mistake when learning a language -- trying to relate its features to what he already knows, and being somewhat surprised to find that they don't map one-to-one. In Christopher's case, however, he has the freedom to change the language as he sees fit. Since I don't know Cat very well, I can't say whether named parameters would be a good thing or not. However, I think a feature like this needs a more compelling use-case than encoding polynomial evaluations.
So why is this code in bad taste? First, there's no stack effect comment. It wasn't hard for me to see that the stack effect is
( x a b c -- f(x) )
, but it should be documented. However, the real problem is that this code operates at a level of abstraction which is too low.
Since he is a compiler developer, I imagine Christopher wrote the above code by manually "compiling" it in his head. This is the wrong way to approach a stack language. Instead, I have found two alternative approaches to be far more fruitful:
- Write your code to take advantage of stack language idioms, and combine existing words to express your problem in a simple way. That is, implement a Domain Specific Language (DSL)!
- Write code which writes tedious code and performs this compilation process for you. That is, implement a DSL with more hair!
I'll show how to apply both approaches to solving this problem in Factor. In both cases we use Factor features not found in Cat, but they would be trivial to add. Cat, Joy and Factor are all very closely related, and share the same evaluation model and representation of code as quotations. Factor is compiled, and its compiler transforms are unique among concatenative languages, but a hypothetical compiled Joy or Cat could implement the same functionality with very little effort. Almost everything I write below also applies to low-level Forth.
Without further ado, I'll start with the first approach.
Just because the formula at hand has two additions and three multiplications doesn't mean the symbols in the program must map to them one-to-one. The set of polynomials with real (or complex) coefficients forms a vector space. Polynomials are vectors. Polynomial evaluation is the dot product:
ax^2+bx+c = (a,b,c).(x^2,x,1)
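For example, with f(x) = 2x^2 + 2x + 1 at x = 3, this reads (2,2,1).(9,3,1) = 2*9 + 2*3 + 1*1 = 25.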
Integers are sequences in Factor, and a sequence of the form { 1 x x^2 ... } is easily generated by mapping over an integer with the ^ word:
: powers ( x n -- seq ) [ ^ ] map-with ;
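For example, with x = 2 and n = 5 (recall that the integer n acts as the sequence 0 1 2 ... n-1):
2 5 powers .
{ 1 2 4 8 16 }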
As long as the coefficients of the quadratic polynomial are given as a single object on the stack, a sequence
{ c b a }
, we are done:
: quadratic ( c,b,a x -- f(x) ) 3 powers v. ;
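For instance, evaluating 2x^2 + 2x + 1 at x = 3:
{ 1 2 2 } 3 quadratic .
25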
However, we may as well generalize to evaluating a polynomial of any degree:
: poly-eval ( x coeff -- f(x) ) [ length powers ] keep v. ;
For example, to compute f(x) = 1024x^3 + 115x^2 - 71x + 576 with the value of x on the stack, you write:
{ 576 -71 115 1024 } poly-eval
Certainly a lot simpler than the equivalent using the + and * words directly. And of course it will work with bignums, rationals, floats, and complex numbers. Also, the polynomial is now a first-class entity in the language; even though it is just a sequence of numbers, we have identified the essence of a polynomial, instead of threading it through the expression as a series of scalars. This means we can pass the polynomial around as a single entity, and manipulate it with sequence words: we can add two polynomials together, multiply a polynomial by a scalar, or juggle the coefficients around.
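For example, assuming the standard vector words v+ (coefficientwise addition) and n*v (scalar multiplication):
{ 1 0 1 } { 0 2 0 } v+ .
{ 1 2 1 }
2 { 1 2 1 } n*v .
{ 2 4 2 }
The first adds x^2 + 1 and 2x, the second doubles x^2 + 2x + 1.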
So to sum things up, I would consider the following a better implementation of Christopher's function:
: powers ( x n -- seq ) [ ^ ] map-with ;
: poly-eval ( x coeff -- f(x) ) [ length powers ] keep v. ;
This is useful as a general library routine, and it is very simple considering what it does. Also note that the way we approached the problem was different: we solved a general problem, and the special case fell out for free. (In fact, the
libs/math
library already implements polynomial evaluation. This whole discussion is moot.)
Now, let us consider the second approach to problem solving: metaprogramming.
Back to Chris's function; here is a literal translation into Factor:
: quadratic-eval ( x a b c -- f(x) )
>r >r dupd >r sq r> swap >r * r> r> r> >r * + r> + ;
This looks like the output of an infix DSL compiler. It would be easy to implement an infix math DSL in Factor (easier than embedding Factor features into applicative languages!) and it would probably generate something almost identical to the above output, given input like
INFIX" a*x*x+b*x+c"
However, I won't show you how to build an infix DSL. Instead, I'll solve a more limited problem: polynomial evaluation using Horner's rule. The result will be a
poly-eval
word called in the exact same manner:
{ 576 -71 115 1024 } poly-eval
But this time, a compile-time transform will generate efficient code which does not actually create or iterate over arrays, or call the
^
word at run time. Such an approach is overkill in most situations. However, if your program needs to evaluate a lot of polynomials, it can speed things up. It also shows that the Factor compiler is very easy to extend, because it is written in, and compiles, a stack language.
Since we will be defining a compile-time transform, we need to write a word which takes a sequence of coefficients and generates a quotation -- the quotation receives the input on the stack and evaluates the polynomial at that input. This word will be recursive. Consider the base case first: we are evaluating a constant polynomial:
: poly-eval-base ( c -- quot )
[ nip ] curry ;
For example, the code to evaluate the constant polynomial
{ 3.5 }
is
3.5 nip
; keep in mind the top of the stack is the value of
x
(which is disregarded by constant functions).
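Concretely, constructing and printing this quotation:
3.5 poly-eval-base .
[ 3.5 nip ]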
Now if we have a way of computing some polynomial g(x), we write a word which computes x*g(x) + c:
: poly-eval-step ( quot c -- newquot )
[ >r dupd call * r> + ] curry curry ;
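For example, stepping the constant quotation for g(x) = 3.5 with c = 2 yields code which computes x*3.5 + 2; calling it with x = 4:
4 [ 3.5 nip ] 2 poly-eval-step call .
16.0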
The recursion puts the two together:
: poly-eval-quot ( seq -- quot )
dup length 1 = [
first poly-eval-base
] [
unclip >r poly-eval-quot r> poly-eval-step
] if ;
The above three words primarily take sequences apart and create new ones. The recursion is really a left fold; a left fold combinator could be added to the Factor library, but I haven't needed it yet.
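To see the nested structure, we can print the quotation generated for { 1 2 2 } (the exact printed form may differ between versions):
{ 1 2 2 } poly-eval-quot .
[ [ [ 2 nip ] 2 >r dupd call * r> + ] 1 >r dupd call * r> + ]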
We can test this code first:
3 { 1 2 2 } poly-eval-quot call .
25
In fact, 25 = 1 + 2*3 + 2*3*3. As it stands, this new code-generating implementation is not any more efficient: instead of creating an intermediate array of powers, we create an intermediate quotation of code. The trick is that if the sequence of coefficients is known at compile time, we can generate the quotation at compile time too:
: poly-eval ( x coeff -- f(x) ) poly-eval-quot call ;
\ poly-eval 1 [ poly-eval-quot ] define-transform
Consider the following word:
: foo { 11.4 23.7 78.22 99.6 -174.67 } poly-eval ;
We can define a specializer for it. This tells the compiler that while the word can take any type of object as input, a certain type of input is anticipated more often than the others; in our case, we optimize for floating point input:
\ foo { float } "specializer" set-word-prop
Now, take a look at the PowerPC disassembly of the case where the input is indeed a floating point value:
0x059ea500: lwz r11,0(r14)
0x059ea504: lfd f0,3(r11)
0x059ea508: lis r11,1438
0x059ea50c: ori r11,r11,42632
0x059ea510: lwz r11,0(r11)
0x059ea514: lfd f1,3(r11)
0x059ea518: fmul f0,f0,f1
0x059ea51c: lis r11,1438
0x059ea520: ori r11,r11,42648
0x059ea524: lwz r11,0(r11)
0x059ea528: lfd f1,3(r11)
0x059ea52c: fadd f0,f0,f1
0x059ea530: lwz r11,0(r14)
0x059ea534: lfd f1,3(r11)
0x059ea538: fmul f1,f1,f0
0x059ea53c: lis r11,1438
0x059ea540: ori r11,r11,42644
0x059ea544: lwz r11,0(r11)
0x059ea548: lfd f0,3(r11)
0x059ea54c: fadd f1,f1,f0
0x059ea550: lwz r11,0(r14)
0x059ea554: lfd f0,3(r11)
0x059ea558: fmul f0,f0,f1
0x059ea55c: lis r11,1438
0x059ea560: ori r11,r11,42640
0x059ea564: lwz r11,0(r11)
0x059ea568: lfd f1,3(r11)
0x059ea56c: fadd f0,f0,f1
0x059ea570: lwz r11,0(r14)
0x059ea574: lfd f1,3(r11)
0x059ea578: fmul f1,f1,f0
0x059ea57c: lis r11,1438
0x059ea580: ori r11,r11,42636
0x059ea584: lwz r11,0(r11)
0x059ea588: lfd f0,3(r11)
0x059ea58c: fadd f1,f1,f0
0x059ea590: lis r12,1
0x059ea594: ori r12,r12,39104
0x059ea598: lwz r12,0(r12)
0x059ea59c: lwz r11,4(r12)
0x059ea5a0: addi r11,r11,16
0x059ea5a4: stw r11,4(r12)
0x059ea5a8: addi r11,r11,-16
0x059ea5ac: li r12,43
0x059ea5b0: stw r12,0(r11)
0x059ea5b4: stfd f1,8(r11)
0x059ea5b8: ori r12,r11,5
0x059ea5bc: stw r12,0(r14)
0x059ea5c0: lis r12,1
0x059ea5c4: ori r12,r12,4492
0x059ea5c8: mtlr r12
0x059ea5cc: blrl
The generated assembly will look like gibberish to people who are not familiar with the internals of the Factor compiler (anyone except me?), so you'll have to take my word for it, but in fact we have the following:
- The compiler doesn't do a good job of eliminating redundant loads from memory, and the above code could be a lot better.
- But...
- There is no recursion, sequence iteration, or sequence allocation in the above assembly.
- The only place where a floating point value is boxed on the heap is at the end, after the final value has been computed. The allocation is done with customized inline assembly.
- There are no subroutine calls until the end, where the GC check is called.
- The input value is unboxed (too many times) and the constant coefficient values are unnecessarily unboxed, because the compiled code refers to boxed float objects, not unboxed float values. This could be improved. However, unboxing a float is inlined, and only takes two instructions.
- All this is despite the fact that the code was generated from the following high-level description of the problem:
{ 11.4 23.7 78.22 99.6 -174.67 } poly-eval
We can compare the performance of the two implementations using the following benchmark:
: my-poly { 11.4 23.7 78.22 99.6 -174.67 } poly-eval ;
\ my-poly { float } "specializer" set-word-prop
: benchmark 1000000 [ >float my-poly drop ] each ;
It evaluates a fixed polynomial at each integer from 0 up to 1000000, first converting it to a float and discarding the result, so all arithmetic is done in floating point. Here are the results:
| Implementation | Run time (ms) |
|---|---|
| Naive | 2713 |
| Compile-time transform | 102 |
This is a 27x speed improvement!
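The naive figure presumably comes from timing a word which bypasses the transform and calls the run-time implementation directly; a sketch of such a counterpart (word names are hypothetical):
: poly-eval-naive ( x coeff -- f(x) ) [ length powers ] keep v. ;
: my-poly-naive ( x -- f(x) ) { 11.4 23.7 78.22 99.6 -174.67 } poly-eval-naive ;
: benchmark-naive 1000000 [ >float my-poly-naive drop ] each ;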
Here is the grand finale. Suppose the polynomial coefficients are not fixed at compile time, but come from some input file which is loaded by the user while the program runs. The user might not even be aware there is such a thing as a compiler. Well, in Factor, compile-time is any-time. At the time the file is loaded, you can generate the evaluation quotations and compile them just by calling
compile-quot
. This technique is widespread among high-level languages. There are documented cases where Common Lisp code can beat optimized C by using information only available at run-time to optimize code. In stack languages, higher-order functions and code generation come even more naturally than in Lisp.
Nothing we did called for named parameters. Instead, one has to learn to "context switch" into the stack language mindset. It is a very different way of looking at the world.