Wednesday, December 28, 2005

Help system preview

Here is a preview of the new help system:

Right now I'm in the process of adding documentation for all built-in words, using the handbook and documentation comments as much as possible.
I've decided that eventually I want all documentation strings moved out of the source code and into help markup. Later on, the structure editor will be introduced, along with a graphical editor for help markup.

Sunday, December 25, 2005

"Programming language paradigms" are meaningless

Labels such as "object oriented language" and "functional language" (and "logic language", etc) have so many conflicting accepted interpretations that they are almost totally devoid of meaning.

The programming mainstream has a weasel word for talking about paradigms: "purity". To illustrate: Smalltalk code uses block closures extensively, perhaps more frequently than Common Lisp code does. Common Lisp, on the other hand, has object orientation facilities more advanced than Smalltalk's; multiple dispatch and multiple inheritance are supported. Yet mainstream programmers hold the belief that Common Lisp is "functional", while at the same time you'd be hard-pressed to find a programmer who would deny that Smalltalk is "object oriented". Of course, one can say that both Smalltalk and CL are "hybrid" OOP/functional languages, and are not "pure".

So what is a "pure" OOP language then? Perhaps Java, C#, or C++? Of course this would be yet another, totally arbitrary re-interpretation of the term "OOP"; and we would find that such a paradigm is an evolutionary dead end, lacking the expressive power needed to solve many problems elegantly. Along similar lines, what the mainstream refers to as "pure" functional languages, namely those without mutable state, are also insufficient -- although the drawbacks of statelessness are overblown, there are definite drawbacks, such as the difficulty of implementing an interactive development environment in such a language.

Languages which confine themselves to one paradigm (or the mainstream definition thereof) also suffer in terms of implementation elegance, and invariably must step outside the language to describe themselves. Java cannot express its own control structures; purely functional languages have a hard time modeling low-level system-building constructs such as dynamic compilers and code replacement. So taking a mainstream concept of a "paradigm" is certainly not the path to follow for a language implementor.

There is really only one context where such labels make sense, and that is the history of programming languages. The thread of research that gave us Simula, Smalltalk, C++, CLOS, Self, StrongTalk and Java could be described as "the object oriented movement", even though the languages on that list are as varied as a list consisting of ML, Prolog, SQL and FORTH. Certainly any language being developed today, unless it manages to be a total clone of a predecessor, does not belong to the part of history referred to by any existing programming paradigm in common use.

New language implementations (the interesting ones, anyway) instead aspire to implement the designer's vision of what good programming is; not a pre-existing set of ideas. I'm always looking around at new language projects, and there are many interesting ones out there. However when I see something with C-derived syntax, I groan. It probably means the semantics don't offer anything new either.

Factor runs on AMD64

The compiler backend is pretty much done; everything except for the C library interface works. A bootstrapped AMD64 factor.image is 5.4 MB. This is much larger than the x86 image (3.3 MB) or the PowerPC image (3.7 MB).

Finishing the AMD64 port took a while, for several reasons. I took a break to work on the help system, and also I spent some time doing several compiler refactorings which were not required to get the backend operational, but were good for the longer-term health of the code. Whenever I revisit old code, I invariably end up refactoring or in many cases rewriting it. I hope that Factor's implementation gets tighter and cleaner over time, and not the other way around like with most code bases.

Implementing the C library interface should be a trivial but technical task; it's only a few lines of code to write, but testing and debugging is always a pain, and gdb is no use with low-level issues. Once this is done I will return to working on the help system. Finishing the help system is not a high priority for 0.80; however, I do need to implement enough to make the tutorial usable again, so expect a 0.80 release around the new year.

Thursday, December 22, 2005

Approaching functional programming from a procedural mindset

I found an amusing article about allegedly redundant methods in Ruby's Array class that demonstrates the difficulty some people have when first trying to understand functional programming. From reading the author's weblog, it seems his main (only?) programming experience comes from Java. Being accustomed to Java's collections and arrays, he does not see the value of a powerful collections library like Ruby's, precisely because he is used to writing explicit loops and iteration patterns rather than combining bulk collection operations in abstract ways. While I don't know much about Ruby, I instantly recognized many of the methods he deemed "redundant" as standard functional programming idioms that I take for granted in Factor and couldn't live without: they are the utility glue for reshaping data between invocations of higher-order functions. Most of the "redundant" methods have direct analogues in Factor, and I found enough compelling uses of them to warrant attention.

So let's go through his list; I've skipped some of the entries because I do not know Ruby well enough to make an informed comment:
Array.new(size=0, obj=nil)
This method creates an array that has size copies of the object. Why? Have you ever needed to create an array/list that begins its life containing more than one copy of the same object? I don't think I have. I can imagine a few use cases, especially if Ruby doesn't distinguish primitive types from objects like Java does; but I don't think they're nearly common enough to justify explicit support in the API. On the rare occasions when you need this, there's a fill method you can call after constructing the list.

Actually creating a repeated sequence of the same element is very useful in practice. Here are some uses of repetitions in the Factor library:
  • Creating mathematical zero vectors (see the sketch after this list). Factor's UI library uses vector math extensively to avoid loops, simplify layout management code, and avoid duplicating logic whose only difference is whether the orientation is horizontal or vertical (consider split panes, scroll bars, and so on).
  • The prettyprinter uses repetitions to indent source code in a natural way.
  • The type inference code in the compiler uses repetitions to avoid code duplication.
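
Here is that zero-vector sketch. It is purely hypothetical: <repeated> stands in for whatever repetition constructor the library actually provides, with an assumed stack effect of ( n elt -- seq ).

! hypothetical sketch -- <repeated> ( n elt -- seq ) is an assumed word name
4 0 <repeated>    ! a length-4 zero vector, ready for the UI library's vector math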


array.each_index {|index| block } -> array
"Same as Array#each, but passes the index of the element instead of the element itself." Am I missing something, or is this just a for loop across the array indexes? Why not just use a for loop, or whatever Ruby's equivalent is?

If you want to argue that any concept is redundant, it is the "for" loop: higher-order functions can express iteration in a much nicer way. Both iteration over elements and iteration over indices are useful. In Factor, integers are sequences of their predecessors, so you can iterate over elements with [ ... ] each and iterate over indices with length [ ... ] each.
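
For example, here is a minimal sketch of both forms; the output is described in the comments:

3 [ . ] each              ! prints 0, 1 and 2 -- an integer is the sequence of its predecessors
"abc" length [ . ] each   ! the same idiom iterating over the valid indices of a string
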
array.at(index) -> obj or nil
As near as I can figure, this is exactly the same as using array brackets except it uses a method call. If there is a difference, it's pretty subtle. DRY, anyone?

My wild guess is that the method exists because of Ruby's superior reflection support. You can get a method object corresponding to 'at' at runtime, pass it somewhere, and invoke it. Try getting a method object for the '[]' or '+' operator in Java for reflective use. You can't. Here is an example where this causes major pain.

array.fetch(index) -> obj
This one is mostly the same as array brackets, but there is one difference. It will throw an IndexError rather than returning nil if you go outside the range. But will anyone remember that difference and which does which when reading code? Sometimes when presented with different possible options, it's best just to pick one. Don't provide every possible option in an API.

Both bounds-checking and silently-failing sequence access are useful. Factor has both, in the form of the nth and ?nth words. Neither word is used very frequently, because Factor encapsulates most collection operations in generic combinators; however, having both words makes code clearer and shorter than having one alone.
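
Here is a quick sketch of the difference; I'm assuming the index is given before the sequence, and that ?nth yields f when the index is out of range:

1 "abc" nth .     ! prints 98, the code of the character at index 1
9 "abc" ?nth .    ! prints f -- out of range, but no error is thrown
9 "abc" nth       ! this one throws an out-of-bounds error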

array.pop and array.push(obj, ... )
Someone pushed a stack into the list. Didn't we learn from that mistake in Java? (OK. push is really just append, but still, pop? And if you have pop why do you need last? Does anyone here remember DRY?)

Why clutter the code with a stack class when you can use an array? Factor has push, peek, pop and pop* words for working with arrays-that-are-stacks (pop* removes the last element without returning it). If I remove pop just because peek already exists, then I'll just have to change all the usages of pop to duplicate the body of the definition of pop; and if somebody wants to argue that code duplication is not that bad, they might as well stop reading here.
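
Here is a rough sketch of these stack words in action, assuming <vector> ( capacity -- vector ) creates an empty growable vector and push takes the element below the sequence:

10 <vector>    ! an empty vector, used as a stack
1 over push    ! vector now contains 1
2 over push    ! vector now contains 1 2
dup peek .     ! prints 2; the element stays in the vector
dup pop .      ! prints 2; the element is removed
dup pop*       ! removes 1 without returning it
drop           ! discard the now-empty vector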

array.rassoc(key)
Is it a list? Is it a stack? Is it a map? It's three, three, three data structures in one!

Yes, and there's nothing wrong with that. Association lists, while they suffer from linear lookup time, are occasionally useful where you want to have a mapping together with a certain intrinsic order. For example, Factor's generic word code takes the hashtable of methods defined on a generic, converts this to an association list, then sorts the association list according to the partial ordering of classes. Finally, the sorted association list is used to generate code.

array.transpose --> an_array
And now it's a matrix! What's next? A B-tree?

It might surprise the author to learn that mathematically, a matrix is a two-dimensional array. In Factor, you can perform mathematical operations on vectors, such as vector addition, dot product, and so on. The contributed math library defines even more: vector projection, matrix addition and multiplication, polynomials, quaternions, and so on. And everything uses arrays of numbers, because mathematically that's what these structures are. Polynomials in one variable form a vector space over the complex numbers; since multiplying a vector by a scalar is the same concept as multiplying a polynomial by a scalar, why implement it twice? Ditto for matrices.
With the math rant over, I can list additional non-mathematical usages of a matrix transpose in the Factor library:
  • The hash>alist word, which converts a hashtable to the oft-lamented association list, works by extracting an array of keys from the hash, then extracting an array of values, then it creates a two-element array from these, and finally, it performs a matrix transpose to get an array of key/value pairs:
    : hash>alist ( hash -- assoc ) 
    dup hash-keys swap hash-values 2array flip ;
    This approach to algorithm design requires some practice, but the rewards include more elegant code, and a clearer mental framework for thinking about bulk operations with collections.
  • The inspector uses a similar technique. It builds up a table of strings represented as an array of columns, pads the strings in each column to the same width, transposes the matrix so that now it is an array of rows, and outputs each row.
  • I found four usages of flip in the compiler, all dealing with technical aspects such as value unification and recursive blocks.
  • The layout manager code in Factor's GUI toolkit does something similar to the inspector's padding of text columns; the frame layout uses transposition and reduction to ensure that each gadget is padded with respect to its row and column. I don't even want to think about what this code looks like when written in a procedural style.


array.reverse and array.reverse!
Have you ever needed to reverse a list outside of a job interview? Iterate through the list in reverse order, yes; reverse the list, no.

This is just silly. Factor reverses sequences, and not just for iteration, in a number of places:
  • In the parser. Syntax tokens are consed onto the head of the current node of the parse tree; when parsing of the node is finished it gets reversed. This is a very, very common idiom in Lisp; Factor only uses it occasionally.
  • The C library interface reverses an array of register numbers to handle differing low-level assembly calling conventions.
  • The library includes a functional queue with O(1) amortized enqueue/dequeue operations. Of course, the implementation of this involves a reverse.
  • The routine to convert an integer to a string repeatedly divides the integer by the numerical base, adding digits to a string; the string is reversed at the end, then returned.
  • The routine to convert an integer to a big-endian byte string is simply defined as follows:
    : >be ( x n -- string ) >le reverse ;
    This is an exceptionally elegant implementation: it converts the integer to little-endian, then reverses the result.

Reverse iteration in Factor is implemented by first creating a virtual reversal sequence using reverse-slice, and then applying one of the standard combinators such as each, map, and so on. I'm not sure if Ruby is the same, but in either case, reverse iteration can be elegantly implemented by reversing your sequence first; no need to implement all your combinators twice, with forward and backward direction.
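
For instance, a reverse traversal is just the usual combinator applied to the reversed view:

"hello" reverse-slice [ . ] each    ! visits the character codes from last to first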

Do you really need any of this getting in your way? Possibly you need one or two of these methods; but how often, and how many? These methods are all way beyond the 80/20 point. I promised today I was going to show you all the methods you could take out of the Array class and still have a humane, indeed a more humane interface. I'm afraid I haven't fulfilled my promise. I got tired before I got halfway through the class and started skipping some cases. Try reading the entire documentation yourself from start to finish, and you'll see what I mean. It's just too damn big. A smaller, simpler API would be more humane.

As you can see, most of the author's complaints stem from a lack of understanding of functional programming style. In a language without higher-order functions, the programmer writes loop after loop after loop, taking care to set up indices, always pulling elements out of collections and storing them back one by one. The result is error-prone and full of duplicated effort. The duplication goes unnoticed because there is so much boilerplate anyway that attempting good programming practice makes little difference. However, one should not let one's past experience with such broken tools cloud one's judgement when it comes to evaluating something new that one does not yet fully grasp.

Monday, December 19, 2005

Markup language for online help system

My implementation of a markup language, plus rendering code for the UI, is now in a minimally usable state. Here is some sample output, showing a nonsensical help article being rendered:

Here is what the markup language looks like.

Next up for the help system is updating the HTML stream for the extended stream output protocol, and implementing the help article database. However, I'm going to put the help system aside for now, and finish the AMD64 port.

Saturday, December 17, 2005

More work on CLIM-like presentation output streams

I've been working on an online help system for Factor, and various components are already complete. One aspect that I finished implementing and testing today is some additional CLIM-like features for styled text output in the UI. The new with-nesting ( style quot -- ) combinator takes a style specification hashtable and a quotation. It calls the quotation inside a nested scope where the default stream is rebound to a nested pane; the style information is applied to the pane, and the new pane is written to the current pane. The style information allows border colors, border width and word wrap to be specified. Here is a screenshot showing two examples: the first is the entire text of Factor's README.txt, with all line breaks removed, output as one long line into a nested pane with word wrap enabled; the second is a word definition output into a nested pane with a border:

Note that word wrap works with proportional-width fonts too, not just monospaced fonts. Tomorrow I'm going to update the markup language that I developed a couple of weeks ago to support this new output stream protocol, and then I can start work on an article database with cross referencing. At some point I will also update the HTML stream in the HTTP server to support the new protocol. With these components in place, the help system will be done, and all that will remain is converting the documentation from LaTeX into this new markup language. The new documentation will be browsable online on the Factor web site, and also from within Factor in the UI.

Thursday, December 15, 2005

Finished my degree; Factor status update

I have finally completed all the courses required to obtain a BSc in mathematics. Next January, I begin graduate studies in mathematics (more specifically, algebraic topology, or even more specifically, axiomatic homotopy theory via model categories). I'm still working part-time doing Objective C development; this job will wind down in February or March next year. After that, I will work as a teaching assistant, marking and running problem-solving tutorials for undergraduate students.

On the Factor front, the Factor web site now runs on the Factor HTTP server. This is long overdue; the HTTP server has been stable for a while, but I was still using SourceForge. Now I have a hosting plan with Linode, which gives you a user-mode Linux instance where you can run whatever server software you want. Tomorrow or the day after, DNS will propagate and you will be able to access the new web site at factorcode.org. Until then, you can access it at 69.93.127.154. Let's see how long it stays up.

The AMD64 compiler backend is almost complete. Integer overflow checks need debugging, there's a crash I haven't resolved yet, and the C library interface remains to be ported. I'm also working on a new prettyprinter for source code which does away with syntax noise entirely:

More details soon...

Wednesday, December 07, 2005

AMD64 compiler backend in progress

For the last week or so I've been working on an AMD64 compiler backend. Since the AMD64 architecture is so close to x86, my approach has been to refactor the existing x86 backend into a state where it can support both 32- and 64-bit code generation. So far, this has been a huge success -- if you look at the AMD64 directory in CVS, you will notice it contains a trivial amount of code; namely, an architecture description file, and an assembler extension file. This is in sharp contrast to SBCL, GCC, and others, which chose to implement AMD64 support by making a copy of the x86 source, and changing bits and pieces as required. This results in a massive amount of code duplication.

The first thing I did was add support for the 64-bit instruction set extensions to Factor's x86 assembler. Traditional x86 has 8 directly-addressable registers: EAX, ECX, EDX, EBX, ESI, EDI, EBP, and ESP. The instruction encoding has a byte known as the "MOD-R/M" byte. It reserves 3 bits each for the source and destination registers, with the remaining 2 bits indicating the addressing mode of the destination -- this includes indirect addressing (MOV [EAX],ECX stores the contents of ECX at the address stored in EAX), indirect addressing with a displacement (MOV [ESI+4],EAX), and other more esoteric variations (the so-called scale/index/displacement mode). If the source, rather than the destination, uses a complicated addressing mode, then the direction bit in the opcode byte is set, effectively reversing the roles of the fields of the MOD-R/M byte. This explains why you cannot do memory-to-memory moves on x86; MOV [EAX],[EDX] is not valid, since only one of the two operands can have a complex addressing mode.

The AMD64 architecture extends the 8 traditional x86 registers to 64 bits; the 64-bit registers have an R prefix rather than E (x86 old-timers will recall the E prefix was introduced during the move from 16 to 32 bits; in 16-bit mode, AX, CX, DX and so on are your 8 16-bit registers). AMD64 also introduces 8 new registers, numbered R8 to R15. In typical x86 fashion, these new registers are shoehorned into the instruction set in a pretty bizarre way. A special prefix (the REX prefix) indicates that an instruction uses 64-bit operands. Since the MOD-R/M byte can only encode 8 distinct registers but AMD64 has 16, there are two bits in the prefix to indicate which of the two operands are extended registers.

This sounds rather convoluted, however the code changes required to the assembler were rather minor. Basically there is a set of 3 words to compute the relevant prefix (or no prefix at all, if the operands are 32-bit registers, or the instruction is one of a small set which do not require a prefix, despite having 64-bit operands). Of course the assembler works exactly as before if no 64-bit registers or instructions are used.
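
The bit layout of the prefix itself is simple. Here is a rough sketch of the idea in Factor -- this is not the actual assembler code; rex-byte is a hypothetical word name, and the flags are passed as 0 or 1:

! hypothetical sketch: build a REX prefix from 0/1 flags for the W, R and B bits
! (the X bit, which extends the SIB index register, is ignored here)
: rex-byte ( w r b -- byte )
    swap 2 shift bitor    ! fold in R at bit 2 (reg field names R8-R15)
    swap 3 shift bitor    ! fold in W at bit 3 (64-bit operand size)
    HEX: 40 bitor ;       ! the fixed 0100 high nibble

1 0 1 rex-byte .          ! prints 73, that is HEX: 49 -- REX.W plus REX.B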

One aspect of the AMD64 architecture which is somewhat annoying is that most instructions still take 32-bit immediates. In fact, the only instruction that can take a 64-bit immediate is the simple form of MOV, where the destination is a register. So in effect,
MOV RAX,1234123412341234    ;; b8 variant permits a 64-bit constant -- valid
MOV [RAX],1234123412341234  ;; c7 variant only permits a 32-bit constant -- INVALID

This means, among other things, that a JMP to an address further than 2^31 bytes away from the instruction pointer has to be encoded as two instructions; one to load the address, one to jump (recall that on x86, direct jump and branch instructions take a relative address, hence "further away").

Once I had the assembler working and various issues such as the above ironed out, I proceeded to refactor the generators for the various linear IR nodes (VOPs). The old x86 backend had a lot of hard-coded registers. There were two reasons for this:
  • x86 weirdness. Some instructions, such as those for multiplication and division, use implicit operands. For example IMUL takes one operand only, and multiplies it by the value in EAX (RAX in 64-bit mode).
  • Scratch registers. My compiler framework had no provision for scratch registers used in VOPs; I would just pull an unused register out of thin air and use it. Of course, this had a huge problem: the VOP's implementation now depended on the VOP's inputs and outputs being tied to certain registers.

I solved the first problem by defining words that would push an assembler register symbol on the stack, then I would call this word from the implementation of various VOPs, instead of referencing the register directly. Then I'd just define the word differently in the x86 and AMD64 architecture description files.

The second problem is solved by a new scratch register framework. Calling it a framework is a bit of a stretch, since it only involves 3 new words. Basically, when a VOP is being generated, the compiler builds a sequence of all registers the VOP takes as input and output. It then takes a list of all scratch registers provided by the architecture, and removes the ones referenced by the VOP. The result is a set of scratch registers the implementation of the VOP can use for whatever purpose.
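
In pseudo-Factor the idea looks something like this; every word name below except append is a hypothetical stand-in, not the real compiler vocabulary:

! hypothetical sketch of the scratch register computation -- vop-inputs,
! vop-outputs, scratch-registers and seq-diff are made-up names; seq-diff is
! assumed to remove the first sequence's elements from the second
: vop-registers ( vop -- seq )
    dup vop-inputs swap vop-outputs append ;     ! every register the VOP reads or writes
: free-scratch ( vop -- seq )
    vop-registers scratch-registers seq-diff ;   ! architecture scratch set minus those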

There is a minor difficulty with this type of random register clobbering. Some VOPs are "basic block" VOPs: basically, any VOP that cannot result in a change in the flow of control. Consecutive basic block VOPs are merged into a basic block, and more aggressive optimization is performed on this block. For example, basic block VOPs are assumed not to touch any registers other than their inputs and outputs, and the basic block optimizer can use this assumption to eliminate dead loads and stores. Of course, basic block VOPs thus cannot use the new scratch register framework. This has not been a problem yet, since basic block VOPs tend to be simple, and the only VOPs that use scratch registers are the more complicated ones (for example, overflow-checking fixnum arithmetic) which are not basic block VOPs.

Finally, I had to fix some code in the x86 backend that assumed the word size was 32 bits. Surprisingly, there was very little code that made this assumption. In fact, it came up in only a handful of places, where a tagged fixnum was being converted into an object slot offset. Tagged fixnums are cells with the low 3 bits set to zero; the value of the fixnum is the cell shifted to the right by 3 bits. Object slot offsets are multiples of the word size. On x86, converting from a tagged fixnum to a slot offset is in effect a division by two, or a right shift by 1. On AMD64, it is a no-op. Again, I simply abstracted out the call to the assembler sequence that shifts a register to the right by 1 in the relevant places, replacing it with a word call. This word was defined in the old way in the x86 architecture description, and as a no-op in the AMD64 description.
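
To make the arithmetic concrete, here is the conversion worked through with Factor's shift word (a negative shift count shifts right); the numbers are just an example:

5 3 shift .      ! 40 -- the tagged representation of the fixnum 5
40 -1 shift .    ! 20 -- slot 5 as a byte offset with 4-byte cells (x86: shift right by 1)
                 ! with 8-byte cells (AMD64), 40 is already the byte offset -- a no-op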

The port is not finished yet. While basic code compiles and runs, more complicated words will raise sig11 errors or hang. There are some bugs to work out yet, but it should be pretty easy. Also, I have not yet implemented C library calls, fixnum overflow checks, or code relocation. Here is the status of each of these three items:
  • C library calls will actually require new code to be written; the existing x86 code cannot be adapted to this task. The reason is that the AMD64 calling convention is rather different from x86: function parameters are passed in registers, not on the stack, and stack frames have an elaborate structure that must be set up correctly, whereas on x86 pushing a return address suffices.
  • Fixnum overflow checks actually do work; however, the action they must take on overflow, upgrading fixnums to bignums, does not. The reason is, of course, that upgrading a fixnum to a bignum involves calling into the Factor runtime, and the Factor runtime is written in C, so the C library interface must be implemented first.
  • The final item, code relocation, concerns saving compiled code to disk. When the image is saved, compiled code blocks require associated relocation information, so that if later the image is loaded at a different base address, any absolute addresses and references to symbols in the Factor runtime or C libraries can be fixed up. The runtime currently only supports word-size relocation; so far this has been sufficient, but for AMD64 the ability to fix up both 32-bit and 64-bit addresses is required. This will not involve a lot of code, but might be a pain to debug; code relocation is always tricky because it is easy to screw up, and not always obvious how to fix things when they go wrong -- the image just crashes on startup because some random instruction got clobbered.


I hope to finish the port within the next week or so; after this is done, Factor will compile native code on three CPU architectures, and I will have mastered the assembly language for all three. Compiler-wise, my next project is rewriting the translation from dataflow IR to linear IR (VOPs); right now a lot of information in the dataflow IR is lost, resulting in sub-optimal register usage and VOP selection. Once this is done, I will implement floating point and complex number VOPs using the SSE2 vector math extensions for x86 and AMD64; this should result in a substantial performance gain for numerical code.

However, first I would like to finish the new help system that I started and then put aside. I put it aside for two reasons: I wanted to finish the AMD64 port first, so that I could develop Factor on my Athlon instead of the slow-as-fuck Mac Mini, and I was stuck on how best to design an extended formatted stream protocol. This concerns a generic way to perform word wrap, indentation, and tabular output. Right now the prettyprinter and inspector have some ad-hoc code to do this, but I don't want to duplicate it yet again for the help system's styled text engine; I'd rather consolidate it in one place, with a generic protocol allowing textual, HTML and GUI output.

Thursday, December 01, 2005

New web site address: factorcode.org

I registered the factorcode.org domain name, and set up virtual hosting with SourceForge, so going to that address now takes you to the same place as factor.sourceforge.net. By the end of this year, I will set up proper web hosting and finally be able to use the Factor HTTP server to run the Factor web site.

X11 window manager written in Factor

Eduardo Cavazos has been working on an X11 bindings library for Factor, and a window manager named Factory. Here is a screenshot. Cool stuff!

Doug Coleman is working on Win32 bindings, and I will later start work on Carbon (Mac OS X) bindings. These three libraries will be used to build platform-specific backends for Factor's UI, replacing the SDL library we use now.

My hat goes off to all the contributors; there are now 6 people (not including me) hacking Factor, which is ideal at this stage of the project.