Wednesday, December 28, 2005

Help system preview

Here is a preview of the new help system:

Right now I'm in the process of adding documentation for all built-in words, using the handbook and documentation comments as much as possible.
I've decided that eventually, I want all documentation strings out of the source code and into help markup. Eventually the structure editor will be introduced, along with a graphical editor for help markup.

Sunday, December 25, 2005

"Programming language paradigms" are meaningless

Labels such as "object oriented language" and "functional language" (and "logic language", etc) have so many conflicting accepted interpretations that they are almost totally devoid of meaning.

In the programming mainstream there is a weasel word when talking about paradigms, and that is "purity". To illustrate: Smalltalk code uses block closures extensively, perhaps more frequently than Common Lisp. Common Lisp on the other hand has object orientation facilities more advanced than Smalltalk; multiple dispatch and multiple inheritance are supported. Yet, mainstream programmers hold the belief that Common Lisp is "functional", while at the same time you'd be hard-pressed to find a programmer who would deny that Smalltalk is "object oriented". Of course, one can say that both Smalltalk and CL are "hybrid" OOP/functional languages, and are not "pure".

So what is a "pure" OOP language then? Perhaps Java, C#, or C++? Of course this would be yet another, totally arbitrary re-interpretation of the term "OOP"; and we would find that such a paradigm was an evolutionary dead-end, lacking expressive power needed to solve many problems elegantly. Along similar lines, what the mainstream refers to as "pure" functional language, namely those without mutable state, are also insufficient -- although the drawbacks of statelessness are overblown, there are definite drawbacks, such as the difficulty of implementing an interactive development environment in such a language.

Languages which confine themselves to one paradigm (or the mainstream definition thereof) also suffer in terms of implementation elegance, and invariably must step outside the language to describe themselves. Java cannot express its own control structures; purely functional languages have a hard time modeling low-level system-building constructs such as dynamic compilers and code replacement. So taking a mainstream concept of a "paradigm" is certainly not the path to follow for a language implementor.

There is really only one context where such labels make sense, and that is if we look at the history of programming languages. Certain ideas and the continuation of threads of research that gave us Simula, Smalltalk, C++, CLOS, Self, StrongTalk and Java could be described as "the object oriented movement", even if the languages in the list are as varied as a list consisting of ML, Prolog, SQL and FORTH. Certainly any language being developed today, unless it manages to be a total clone of a predecessor, does not belong to the part of history referred to by any existing programming paradigm in common use.

New language implementations (the interesting ones, anyway) instead aspire to implement the designer's vision of what good programming is; not a pre-existing set of ideas. I'm always looking around at new language projects, and there are many interesting ones out there. However when I see something with C-derived syntax, I groan. It probably means the semantics don't offer anything new either.

Factor runs on AMD64

The compiler backend is pretty much done; everything except for the C library interface works. A bootstrapped AMD64 factor.image is 5.4 Mb. This is much larger than the x86 image (3.3 Mb) or the PowerPC image (3.7 Mb).

Finishing the AMD64 port took a while, for several reasons. I took a break to work on the help system, and also I spent some time doing several compiler refactorings which were not required to get the backend operational, but were good for the longer-term health of the code. Whenever I revisit old code, I invariably end up refactoring or in many cases rewriting it. I hope that Factor's implementation gets tighter and cleaner over time, and not the other way around like with most code bases.

Implementing the C library interface should be a trivial but technical task; its only a few lines of code to write, but testing and debugging is always a pain, and gdb is no use with low-level issues. Once this is done I will return to working on the help system. Finishing the help system is not a high priority for 0.80, however I do need to implement enough to make the tutorial usable again, so expect a 0.80 release around the new year.

Thursday, December 22, 2005

Approaching functional programming from a procedural mindset

I found an amusing article about allegedly redundant methods in Ruby's Array class that demonstrates the difficulty some people have when first trying to understand functional programming. From reading the author's weblog, it seems his main (only?) programming experience comes from Java. Being accustomed to Java's collections and arrays, but he does not understand the value of a powerful collections library like Ruby's precisely because he is accustomed to writing explicit loops and iteration patterns, rather than combining bulk collection operators in abstract ways. While I don't know much about Ruby, I instantly recognized many of the methods he deemed "redundant" as standard functional programming idioms that I take for granted in Factor, and couldn't live without. They are the utility glue for reshaping data between invocations of higher-order functions. So while I'm not a Ruby expert, most of the "redundant" methods have direct analogues in Factor, and I found enough compelling usage of them to warrant attention.

So lets go through his list; I've skipped some of the entries because I do not know Ruby well enough to make an informed comment:, obj=nil)
This method creates an array that has size copies of the object. Why? Have you ever needed to create an array/list that begins its life containing more than one copy of the same object? I don't think I have. I can imagine a few uses cases, especially if Ruby doesn't distinguish primitive types from objects like Java does; but I don't think they're nearly common enough to justify explicit support in the API. On the rare occasions when you need this, there's a fill method you can call after constructing the list.

Actually creating a repeated sequence of the same element is very useful in practice. Here are some uses of repetitions in the Factor library:
  • Creating mathematical zero vectors. Factor's UI library uses vector math extensively to avoid loops, simplify layout management code, and avoid duplicating code that performs the same logic, except with horizontal or vertical orientation being a parameter (consider split panes, scroll bars, and so on).
  • The prettyprinter uses repetitions to indent source code in a natural way.
  • The type inference code in the compiler uses repetitions to avoid code duplication.

array.each_index {|index| block } -> array
"Same as Array#each, but passes the index of the element instead of the element itself." Am I missing something, or is this just a for loop across the array indexes? Why not just use a for loop, or whatever Ruby's equivalent is?

If you want to argue that any concept is redundant, it is the "for" loop. Higher-order functions can express iteration in a much nicer way. Both iteration over elements and iteration over indices is useful: in Factor, integers are sequences of predecessors, so you can iterate over elements with [ ... ] each and iterate over indices with length [ ... ] each -> obj or nil

As near as I can figure, this is exactly the same as using array brackets except it uses a method call. If there is a difference, it's pretty subtle. DRY, anyone?
My wild guess is that the method exists because of Ruby's superior reflection support. You can get a method object corresponding to 'at' at runtime, pass it somewhere, and invoke it. Try getting a method object for the '[]' or '+' operator in Java for reflective use. You can't. Here is an example where this causes major pain.

array.fetch(index) -> obj
This one is mostly the same as array brackets, but there is one difference. It will throw an IndexErr rather than returning nil if you go outside the range. But will anyone remember that difference and which does which when reading code? Sometimes when presented with different possible options, it's best just to pick one. Don't provide every possible option in an API.

Both bounds-checking and silently-failing sequence access is useful. Factor has both in the form of the nth and ?nth words. Neither word is used very frequently, because Factor encapsulates most collection operations in generic combinators; however having both words make code clearer and shorter than one alone.

array.pop and array.push(obj, ... )
Someone pushed a stack into the list. Didn't we learn from that mistake in Java? (OK. push is really just append, but still, pop? And if you have pop why do you need last? Does anyone here remember DRY?)

Why clutter the code with a stack class when you can use an array? Factor has push, peek, pop and pop* words for working with arrays-that-are-stacks (pop* removes the last element without returning it). If I remove pop just because peek already exists, then I'll just have to change all the usages of pop to duplicate the body of the definition of pop; and if somebody wants to argue that code duplication is not that bad, they might as well stop reading here.

Is it a list? Is it a stack? Is it a map? It's three, three, three data strcutures in one!

Yes, and there's nothing wrong with that. Association lists, while they suffer from linear lookup time, are occasionally useful where you want to have a mapping together with a certain intrinsic order. For example, Factor's generic word code takes the hashtable of methods defined on a generic, converts this to an association list, then sorts the association list according to the partial ordering of classes. Finally, the sorted association list is used to generate code.

array.transpose --> an_array
And now it's a matrix! What's next? a B-tree?
It might surprise the author to learn that mathematically, a matrix is a two-dimensional array. In Factor, you can perform mathematical operations on vectors, such as vector addition, dot product, and so on. The contributed math library defines even more: vector projection, matrix addition and multiplication, polynomials, quaternions, and so on. And everything uses an arrays of numbers, because mathematically that's what these structures are. Polynomials in one variable form a vector space over the complex numbers; since multiplying a vector by a scalar is the same concept as multiplying a polynomial by a scalar, why implement it twice? Ditto for matrices.
With the math rant over, I can list additional non-mathematical usages of a matrix transpose in the Factor library:
  • The hash>alist word, which converts a hashtable to the oft-lamented association list, works by extracting an array of keys from the hash, then extracting an array of values, then it creates a two-element array from these, and finally, it performs a matrix transpose to get an array of key/value pairs:
    : hash>alist ( hash -- assoc ) 
    dup hash-keys swap hash-values 2array flip ;
    This approach to algorithm design requires some practice, but the rewards include more elegant code, and a clearer mental framework for thinking about bulk operations with collections.
  • The inspector uses a similar technique. It builds up a table of strings represented as an array of columns, pads the strings in each column to the same width, transposes the matrix so that now it is an array of rows, and outputs each row.
  • I found four usages of flip in the compiler, all dealing with technical aspects such as value unification and dealing with recursive blocks.
  • The layout manager code in Factor's GUI toolkit does something similar to the inspector's padding of text columns; the frame layout uses transposition and reduction to ensure that each gadget is padded with respect to its row and column. I don't even want to think about what this code looks like when written in a procedural style.

array.reverse and array.reverse!
Have you ever needed to reverse a list outside of a job interview? Iterate through the list in reverse order, yes; reverse the list, no.
This is just silly. Factor reverses sequences, and not just for iteration, in a number of places:
  • In the parser. Syntax tokens are consed onto the head of the current node of the parse tree; when parsing of the node is finished it gets reversed. This is a very, very common idiom in Lisp; Factor only uses it occasionally.
  • The C library interface reverses an array of register numbers to handle differing low-level assembly calling conventions.
  • The library includes a functional queue with O(1) amortized enqueue/dequeue operations. Of course, the implementation of this involves a reverse.
  • The routine to convert an integer to a string repeatedly divides the integer by the numerical base, adding digits to a string; the string is reversed at the end, then returned.
  • The routine to convert an integer to a big-endian byte string is simply defined as follows:
    : >be ( x n -- string ) >le reverse ;
    This is an exceptionally elegant implementation: it converts it to little-endian, then reverses it.

Reverse iteration in Factor is implemented by first creating a virtual reversal sequence using reverse-slice, and then applying one of the standard combinators such as each, map, and so on. I'm not sure if Ruby is the same, but in either case, reverse iteration can be elegantly implemented by reversing your sequence first; no need to implement all your combinators twice, with forward and backward direction.

Do you really need any of this getting in your way? Possibly you need one or two of these methods; but how often, and how many? These methods are all way beyond the 80/20 point. I promised today I was going to show you all the methods you could take out of the Array class and still have a humane, indeed a more humane interface. I'm afraid I haven't fulfilled my promise. I got tired before I got halfway through the class and started skipping some cases. Try reading the entire documentation yourself from start to finish, and you'll see what I mean. It's just too damn big. A smaller, simpler API would be more humane.
As you can see, most of the author's complaints stem from a lack of understanding of functional programming style. In a language without higher order functions, the programmer writes loop after loop after loop, taking care to set up indices, always pulling elements out of collections and storing them back one by one. The result is error-prone, and full of duplication of effort. The duplication goes unnoticed because there is so much boilerplate anyway, that attempting good programming practice makes little difference. However, one should not let their past experience with such broken tools cloud their judgement when it comes to evaluating something new they do not fully grasp yet.

Monday, December 19, 2005

Markup language for online help system

My implementation of a markup language, plus rendering code for the UI, is now in a minimally usable state. Here is some sample output, showing a non-sensical help article being rendered:

Here is what the markup language looks like.

Next up for the help system is updating the HTML stream for the extended stream output protocol, and implementing the help article database. However, I'm going to put the help system aside for now, and finish the AMD64 port.

Saturday, December 17, 2005

More work on CLIM-like presentation output streams

I've been working on an online help system for Factor, and various components are already complete. One aspect that I finished implementing and testing today is some additional CLIM-like features for styled text output in the UI. The new with-nesting ( style quot -- ) combinator takes a style specification hashtable and a quotation. It calls the quotation inside a nested scope where the default stream is rebound to a nested pane; the style information is applied to the pane, and the new pane is written to the current pane. The style information allows border colors, border width and word wrap to be specified. Here is a screenshot showing two examples; the first one is the entire text of Factor's README.txt with all line breaks removed output into a nested pane with word wrap enabled as one line, and the second is a word definition output into a nested pane with a border:

Note that word wrap works with proportional-width fonts too, not just monospaced fonts. Tomorrow I'm going to update the markup language that I developed a couple of weeks ago to support this new output stream protocol, and then I can start work on an article database with cross referencing. At some point I will also update the HTML stream in the HTTP server to support the new protocol. With these components in place, the help system will be done, and all that will remain is converting the documentation from LaTeX into this new markup language. The new documentation will be browsable online on the Factor web site, and also from within Factor in the UI.

Thursday, December 15, 2005

Finished my degree; Factor status update

I have finally completed all the courses required to obtain a BSc Math. Next January, I begin graduate studies in mathematics (more specifically, algebraic topology, or even more specifically, axiomatic homotopy theory via model categories). I'm still working part-time doing Objective C development. This job will wind down in February or March next year. After this my next job will be working as a teaching assistant, doing marking and problem solving tutorials with undergraduate students.

On the Factor front, the Factor web site now runs on the Factor HTTP server. This is long-overdue; the HTTP server has been stable for a while, but I was using SourceForge. Now I have a hosting plan with Linode which gives you a user-mode Linux instance where you can run whatever server software you want. Tomorrow or the day after, DNS will propogate and you will be able to access the new web site at Until then, you can access it at Let's see how long it stays up.

The AMD64 compiler backend is almost complete. Integer overflow checks need debugging, there's a crash I haven't resolved yet, and the C library interface remains to be ported. I'm also working on a new prettyprinter for source code which does away with syntax noise entirely:

More details soon...

Wednesday, December 07, 2005

AMD64 compiler backend in progress

For the last week or so I've been working on an AMD64 compiler backend. Since the AMD64 architecture is so close to x86, my approach has been to refactor the existing x86 backend into a state where it can support both 32- and 64-bit code generation. So far, this has been a huge success -- if you look at the AMD64 directory in CVS, you will notice it contains a trivial amount of code; namely, an architecture description file, and an assembler extension file. This is in sharp contrast to SBCL, GCC, and others, which chose to implement AMD64 support by making a copy of the x86 source, and changing bits and pieces as required. This results in a massive amount of code duplication.

The first thing I did was add support for the 64-bit instruction set extensions to Factor's x86 assembler. Traditional x86 has 8 directly-addressable registers: EAX, ECX, EDX, EBX, ESI, EDI, EBP, and ESP. The instruction encoding has a byte known as the "MOD-R/M" byte. It reserves 3 bits for the source and destination registers, with 2 bits left over indicating the addressing mode of the destination -- this includes indirect addressing (MOV [EAX],ECX stores the contents of ECX at the address stored in EAX), indirect addressing with a displacement (MOV [ESI+4],EAX), and other more esoteric variations (the so-called scale/index/displacement mode). If the source, rather than the destination, uses a complicated addressing mode, then the direction bit in the opcode byte is set, effectively reversing the roles of the fields of the MOD-R/M byte. This explains why you cannot do memory-to-memory moves on x86; MOV [EAX],[EDX] is not valid since only one of the two operands can have a complex addressing mode.

The AMD64 architecture extends the 8 traditional x86 registers to 64 bits; the 64-bit registers have an R prefix rather than E (x86 old-timers will recall the E prefix was introduced during the move to 32 bits from 16; in 16 bit mode, AX, CX, DX etc are your 8 16-bit registers). AMD64 also introduces 8 new registers, numbered R8 to R15. In typical x86 fashion, these new registers are shoehorned into the instruction set in a pretty bizzare way. A special prefix is used to indicate that an instruction uses 64-bit operands. Since the MOD-R/M byte can only encode 8 distinct registers but AMD64 has 16, there are two bits in the prefix to indicate which of the two operands are extended registers.

This sounds rather convoluted, however the code changes required to the assembler were rather minor. Basically there is a set of 3 words to compute the relevant prefix (or no prefix at all, if the operands are 32-bit registers, or the instruction is one of a small set which do not require a prefix, despite having 64-bit operands). Of course the assembler works exactly as before if no 64-bit registers or instructions are used.

One aspect of the AMD64 architecture which is somewhat annoying is that most instructions still take 32-bit immediates. In fact, the only instruction that can take a 64-bit immediate is the simple form of MOV, where the destination is a register. So in effect,
MOV EAX,1234123412341234    ;; b8 variant permits a 64-bit constant -- valid
MOV [EAX],1234123412341234 ;; c7 variant only permits 32-bit constant -- INVALID

This means, among other things, that a JMP to an address further than 2^31 bytes away from the instruction pointer has to be encoded as two instructions; one to load the address, one to jump (recall that on x86, all jump and branch instructions take a relative address, hence "further away").

Once I had the assembler working, and various issues such as the above ironed out, I proceeded to refactor the generators for the various linear IR nodes (VOPs). The old x86 backend code had a lot of hard-coded registers in the code. There were two reasons for this:
  • x86 weirdness. Some instructions, such as those for multiplication and division, use implicit operands. For example IMUL takes one operand only, and multiplies it by the value in EAX (RAX in 64-bit mode).
  • Scratch registers. My compiler framework had no provision for scratch registers used in VOPs; I would just pull an unused register out of thin air and use it. Of course, this had a huge problem: the VOP's implementation now depended on the VOP's inputs and outputs being tied to certain registers.

I solved the first problem by defining words that would push an assembler register symbol on the stack, then I would call this word from the implementation of various VOPs, instead of referencing the register directly. Then I'd just define the word differently in the x86 and AMD64 architecture description files.

The second problem is solved by a new scratch register framework. Calling it a framework is a bit of a stretch, since it only involves 3 new words. Basically, when a VOP is being generated, the compiler builds a sequence of all registers the VOP takes as input and output. It then takes a list of all scratch registers provided by the architecture, and removes the ones referenced by the VOP. The result is a set of scratch registers the implementation of the VOP can use for whatever purpose.

There is a minor difficulty with this type of random register clobbering. Some VOPs are "basic block" VOPs; basically, any VOP that cannot not result in a change in the flow of control. Consecutive basic block VOPs are merged into a basic block and more aggressive optimization is performed on this basic block. For example, basic block VOPs are assumed not to touch any registers other than their inputs and outputs, and as such, the basic block optimizer can use this assumption to eliminate dead loads and stores. Of course, basic block VOPs thus cannot use the new scratch register framework. This has not been a problem yet, since basic block VOPs tend to be simple, and the only VOPs that use scratch registers are the more complicated ones (for examples, overflow-checking fixnum arithmetic) which are not basic block VOPs.

Finally, I had to fix some code in the x86 backend that would assume the word size was 32 bits. Surprisingly, there was very little code that made this assumption. In fact, it was only used in a handful of places, where a tagged fixnum was being converted into a an object slot offset. Tagged fixnums are cells with the low 3 bits set to zero; the value of the fixnum is the cell shifted to the right by 3 bits. Object slot offsets are mutiples of the word size. On x86, converting from a tagged fixnum to a slot offset is in effect division by two, or a right shift by 1. On AMD64, it is a no-op. Again, I simply abstracted out the call to the assembler sequence to shift a register to the right by 1 in the relevant places, replacing them with a word call. This word was defined in the old way in the x86 architecture description, and as a no-op in the AMD64 description.

The port is not finished yet. While basic code compiles and runs, more complicated words will raise sig11 errors or hang. There are some bugs to work out yet, but it should be pretty easy. Also, I have not yet implemented C library calls, fixnum overflow checks, or code relocation. Here is the status of each of these three items:
  • C library calls will actually require new code to be written; the existing x86 code cannot be adopted to this task. The reason is that the AMD64 calling convention is rather different from x86; function parameters are passed in registers, not on the stack, and stack frames have an elaborate structure that must be set up correctly; on x86 pushing a return address suffices.
  • Fixnum overflow checks actually do work, however the action they must take on overflow, upgrading fixnums to bignums, does not work. The reason is, of course, upgrading a fixnum to a bignum involves calling into the Factor runtime, and the Factor runtime is written in C, thus the C library interface must be implemented first.
  • The final item, code relocation, concerns saving compiled code to disk. When the image is saved, compiled code blocks require associated relocation information, so that if later the image is loaded at a different base address, any absolute addresses and references to symbols in the Factor runtime or C libraries can be fixed up. The runtime currently only supports word-size relocation; so far this has been sufficient, but for AMD64 the ability to fix up both 32-bit and 64-bit addresses is required. This will not involve a lot of code, but might be a pain to debug; code relocation is always tricky because it is easy to screw up, and not always obvious how to fix things when they go wrong -- the image just crashes on startup because some random instruction got clobbered.

I hope to finish the port within the next week or so; after this is done, Factor will compile native code on three CPU architectures, and I will have mastered the assembly language for all three. Compiler-wise, my next project is rewriting the translation from dataflow IR to linear IR (VOPs); right now a lot of information in the dataflow IR is lost, resulting in sub-optimal register usage and VOP selection. Once this is done, I will implement floating point and complex number VOPs using the SSE2 vector math extensions for x86 and AMD64; this should result in a substantial performance gain for numerical code.

However, first, I would like to finish the new help system that I started, but put aside. I put it aside for two reasons; I wanted to finish the AMD64 port first, so that I can develop Factor on my Athlon instead of the slow-as-fuck Mac Mini, and also, because I am stuck on how best to design an extended formatted stream protocol. This concerns a generic way to perform word wrap, indentation, and tabular output. Right now the prettyprinter and inspector have some ad-hoc code to do this, however I don't want to duplicate it yet again for the help system's styled text engine; I'd rather consolidate it in one place with a generic protocol allowing textual, HTML and GUI output.

Thursday, December 01, 2005

New web site address:

I registered the domain name, and set up virtual hosting with SourceForge, so going to that address now takes you to the same place as By the end of this year, I will set up proper web hosting and finally be able to use the Factor HTTP server to run the Factor web site.

X11 window manager written in Factor

Eduardo Cavazos has been working on an X11 bindings library for Factor, and a window manager named Factory. Here is a screenshot. Cool stuff!

Doug Coleman is working on Win32 bindings, and I will later start work on Carbon (Mac OS X) bindings. These three libraries will be used to build platform-specific backends for Factor's UI, replacing the SDL library we use now.

My hat goes off to all the contributors; there's 6 people (not including me) hacking Factor now, which is ideal this stage of the project.

Monday, November 28, 2005

New hashtable implementation brings 400kb reduction in image size

I put Factoroids aside for now. I got it in a further state than shown in the screenshot, but I wanted to return to Factor hacking.

I reimplemented the hashtable to use open addressing instead of bucket chaining. With bucket chaining, the hashtable is backed by an array of buckets; keys are sorted by hashcode into buckets, where each bucket is an association list. With open addressing, the hashcode determines the starting index in the array; if the index is occupied by another key, the next index is searched, and so on.

On a 32-bit system, the old implementation, each key/value pair stored in a hashtable required two cons cells -- that's 16 bytes -- and each bucket required 4 bytes of storage. Since the number of buckets was kept at roughly the same as the number of key/value pairs, that is 20 bytes per hashtable entry. With the new implementation, a hashtable entry is just a pair of consecutive array elements -- 8 bytes. The underlying arrays are kept larger than before, so the end result is a savings of not quite 12 bytes per key/value pair, but enough to reduce the image size by 400Kb, and the time to do a full garbage collection by 10ms (on a 1.25 Ghz PowerPC G4).

There were some changes to the hashtable words as a result.

hash* ( key hash -- [[ key value ]] ) is now hash* ( key hash -- value ? ). Unlike hash ( key hash -- value ), this word can detect the difference between a value of f, and a key that is not present at all. However, the old stack effect was not only clumsy -- most callers would call cdr on it after checking if it is not f; but it makes no sense with the new implementation, and would require allocating a cons cell. So the new implementation returns the value, but also pushes a boolean that denotes if the key is in the hashtable.

Along similar lines, the hashtable combinators that would formely pass a cons holding a key/value pair to their quotation now pass the key and value separately on the stack:

Also, association lists are, for all intents and purposes, gone. An association list is a list of cons cells where the car is a key and the cdr is a value. They were used in a few places as "lighter weight" mappings than hashes. However, now there's no reason to use them anymore. This means the following words are now gone:

One final change is the literal syntax. Here is the new syntax:
{ "garlic bread" "appetizer" }
{ "steak" "main" }
{ "ice cream" "desert" }

The difference is that key/value pairs are entered with { ... } syntax, not [[ ... ]]. I always thought the old syntax looked kind of cluttered, and now the reason to use cons cells here is gone.

The other thing I've worked on lately is fixing some compiler bugs; one that was reported recently, and a handful of long-standing ones. All of them turned out to be related by a common thread: inlined recursive words. This is always tricky because the compiler must take care to note which values remain constant between recursive calls, and which do not. While the idea is simple there are a lot of tricky details, and we had a number of problems where values would be assumed to be literal when they were not, resulting in inappropriate optimizations being performed.

In other news, I now have an dual core AMD64 machine. I will begin porting Factor to this architecture very soon. I know a few people have been looking forward to running Factor on their AMD64 machines, and I hope to make them happy.

Tuesday, November 15, 2005

Working on a 3D game in Factor

I started work on Factor 0.80 in CVS, but decided to take a break from working on the Factor core, and instead, try writing a 3D game! I only did a few hours hacking on it so far, just fleshing out the bare necessities of a game engine. Here is what I have so far; you can move the player around the rather dull-looking world, and shoot projectiles which fly off into the distance:

Sunday, November 06, 2005

Factor 0.79 released

Factor 0.79 is now out after two months of development.
  • Incompatible changes:
    • The ifte combinator has been renamed to if.
    • Syntax changes:

      { 1 2 3 } ! arrays
      V{ 1 2 3 } ! vectors
      H{ [[ key value ]] ... } ! hashtables
      C{ 1.02 -7.44 } ! complex numbers
      T{ class slots ... } ! tuple
  • Compiler:
    • New basic block optimizer performs more aggressive dead load and store elimination.
    • Stack shuffles are compiled more efficiently.
    • Pushing literals on either side of a stack shuffle is now compiled more efficiently.
    • Fixed a problem with relocation of compiled code on PowerPC.
    • Fixed various FFI issues on Mac OS X.
  • User interface:
    • Graphics rendering is now done using OpenGL and text rendering is done via FreeType. SDL_gfx and SDL_ttf libraries are no longer required.
    • Added expandable outliners. Used by the inspector, .s, usage., uses., vocabs., and various other words.
    • Added word completion to the listener pane; press TAB.
    • Added word navigation shortcuts to the listener pane; press C+LEFT and C+RIGHT to move a word at a time, and C+BACKSPACE and C+DELETE to delete the previous and next word, respectively.
    • Added mouse-over help for presentations.
    • Previously-entered output is now clickable in the listener.
    • New, better-looking widget theme.
  • Collections:
    • Faster map, 2each and 2map.
    • Arrays are now better supported and should be used instead of vectors where resizing is not desired.
    • Some new sequence words that do not bounds check: nth-unsafe and set-nth-unsafe. These should only be used in special circumstances, such as inner loops (each, map and so on use them).
    • New replace-slice ( new from to seq -- seq ) word replaces a slice of a sequence with another sequence.
    • Hashtables did not obey the rule that equal objects must have equal hashcodes, so using hashtables as hashtable keys did not work.
    • New mismatch ( seq seq -- i ) word returns first index where two sequences differ, or -1 if they have equal elements.
    • New drop-prefix ( seq seq -- seq seq ) word returns a pair of sequences which have the same elements as the two input sequences, with the common prefix removed.
  • Everything else:
    • On Mac OS X, signal 11 errors caused by stack under/overflow no longer trigger the OS X crash reporter dialog.
    • Easier exception handling. The cleanup ( try cleanup -- ) word encapsulates the following idiom:

      [ A ] [ B rethrow ] catch
      [ A ] [ B ] cleanup
      The recover ( try recover -- ) word encapsulates the following idiom:

      [ A ] [ [ B ] when* ] catch
      [ A ] [ B ] recover
      The catch ( try -- error/f ) word no longer takes a quotation that receives the error caught; rather, it just pushes the error that was caught on the stack. That is, the old way:

      [ A ] [ B ] catch

      [ A ] catch B
      However, most uses of catch can be replaced by cleanup and recover.
    • The distinct t type is gone. Now, the t object is just a symbol.
    • A new with-server ( port ident quot -- ) combinator takes care of listening on a network socket, logging client connections, spawning client handler threads, and error handling. The quotation is called for each client, in a new scope with the client socket as the default stream.
    • New 1+, 1- words are a bit faster than 1 + and 1 - since there is one less generic dispatch involved.
    • On Windows, FFI errors would raise a signal 11 instead of reporting a nice message.
  • Contributed code:
    • The HTTP server and client has been moved from library/httpd/ to library/contrib/.
    • Intel 8080 CPU and Space Invaders emulator in contrib/space-invaders (Chris Double)
    • AOL Instant Messenger chat client library in contrib/aim (Doug Coleman)
    • Cairo graphics library binding in contrib/cairo. (Sampo Vuori)
    • Advanced math library with quaternions, matrices, polynomials, statistics and various functions in contrib/math/. (Doug Coleman)
    • Dimensioned units in contrib/units/. (Doug Coleman)
    • X11 binding in contrib/x11/ (Eduardo Cavazos)

Sunday, October 30, 2005

Incompatible changes in Factor 0.79

Factor 0.79 is almost ready for release. There is a minor prettyprinter bug, and a show-stopper OpenGL failure on Windows that no doubt will be resolved soon. After that, I have to update the documentation, and it will be released. This release shakes things up a little, so I thought it would be worthwhile to post an entry about the most noticable changes, now that we have a number of active contributors.

First of all, the ifte combinator has been renamed to if.

Next up, literal syntax for various types has changed:

Data typeOld syntaxNew syntax
Vector{ 1 2 3 }V{ 1 2 3 }
ArrayNone{ 1 2 3 }
Hashtable{{ [[ key value ]] ... }}H{ [[ key value ]] ... }
Tuple<< class slots ... >>T{ class slots }
Complex number#{ real imaginary }#C{ real imaginary }

In particular, note that in 0.79, arrays play a much larger role in the language. Formely, arrays were an implementation detail; they lived in the kernel-internals vocabulary, did not perform bounds checking, and were only used internally to implement vectors and hashtables. In 0.79, arrays have a literal syntax, are fully safe, and live in the arrays vocabulary. Arrays are not resizable like vectors are, but are slightly more efficient. This is the only real difference; the same operations work on both, since Factor's sequences are fully generic. Arrays are now the preferred data type for literal sequences. In fact, literal vectors are extremely rare; there's really only two cases where they are needed:

V{ } clone -- make a new, empty vector
[ 1 , 2 , 3 , ] V{ } make -- implicit sequence construction

This is why arrays now have the old vector syntax, and vectors now have a slightly more verbose syntax.

In particular, these changes have made the inspector display much more readable.

Finally, I also wanted to touch on a topic that I have discussed on IRC, but never put in writing. Lists and conses are being phased out over time. This does not affect the upcoming release of 0.79, but eventually I hope that quotations can become a first-class immutable type with underlying array storage; they will reuse the [ ... ] syntax. Code that uses conses to store data will be refactored to use arrays and vectors instead. There are several reasons for this:
  • Conses entail a 2x space overhead, and make the inner interpreter needlessly slow as it traverses the 'cdr' pointer while interpreting code. While the compiler makes this irrelevant, not all code is compiled. During bootstrap, interpretation overhead accounts for a significant portion of run time.
  • If quotations become arrayed, first-class types, then the debugger will be radically improved, and return stack traces will become much more readable due to extra information available at run time. Anybody who has looked at :r output knows what I mean. Its hard to understand if it is just a quotation soup, without any idea of what word each quotation came from.
  • Conses require their own implementations of algorithms such as each, map, head, and so on. Already, some combinators like 2map are only implemented for arrays, and result in a quadratic performance degradation with conses. I realize the "iterator" design pattern is supposed to eliminate this type of duplication, but iterators have performance and complexity of implementation problems.
  • The Factor programming style, which tends to favor the creation of new sequences from old ones rather than direct mutation, virtual sequences, and using combinators instead of explicit recursion, does not really require conses. Most algorithms can be expressed effectively using arrays and vectors instead.
  • Factor's conses are already quite constrained by being immutable. Thus circular lists cannot be implemented, which the main application of conses that is difficult to achieve with arrays.
  • If a specific problem calls for conses, then nothing prevents anybody from implementing a library for working with them.
  • Removal of conses would remove the requirement for the object memory manager to handle headerless objects. This will allow switching to a generational mark-sweep-compact garbage collector, which uses less memory than the generational copying collector we have now. Also, giving each object a header allows incremental GC to be implemented.

The total removal of conses from the core library is quite a major change. However, to keep the language lean and mean, this is something that should be done. Some people might find it surprising that I am still considering major changes to Factor, even as it is more than two years old. However, I did not have the benefit of extensive research and experience in language design before I began this effort, so I'd rather streamline my design while I still can rather than accumulate a large body of historical cruft and scar tissue.

Friday, October 28, 2005

Factor 0.79 almost ready

Its been almost two months since Factor 0.78, but CVS is looking very good and its almost ready for release. I only have a few items on my list. Here is a screenshot showing off the working FreeType font rendering, and new GUI theme. The OpenGL rendering backend has improved performance considerably, however it is still not as snappy as I'd like. However, this will all get solved over time as the compiler matures, and we replace SDL with OS-specific windowing substrate.

Some other developments include Doug Coleman's extended math library, which adds many functions not provided in the core math library. This includes dimensioned units, combinatorial functions, polynomials, statistics, and number theory. Also, an X11 binding is in development. This will replace SDL on *nix platforms as the windowing substrate for the UI (except for OS X, which will use a Carbon binding that I will develop).

Sunday, October 23, 2005

Rendering text with FreeType and OpenGL

The port of the UI to OpenGL is almost complete. I have finished implementing the last major missing component, text rendering. Text is now drawn by calling FreeType directly; there is no SDL_ttf dependency any more. Each glyph is rendered to a display list that first draws a textured quad, and then translates the origin by the width of the glyph. This allows a string to be drawn simply by calling each glyph's display list in turn. Here is a screenshot:

Thursday, October 13, 2005

A random status update

First, hats off to Chris Double who continues to do cool stuff with his continuation-based web framework. His latest hack is a to do list web app that uses a capability-based security model. Instead of having to remember a username/password, you bookmark a randomly-generated URL.

Last week I acquired an Apple Mac. This has given me a chance to fix some long-standing problems in Factor on Mac OS X. I got the UI running, and I also fixed a couple of relocation bugs in the compiler.

I have also started overhauling the UI to use OpenGL for rendering instead of SDL. So far, I've implemented all graphics primitives except for text, developed a FreeType binding, and started writing on some code to display FreeType-rendered glyphs as OpenGL textures.

Moving to OpenGL will provide increased performance and better eye-candy, and is one step in my plan to get rid of SDL altogether. SDL has some annoying limitations, like only supporting one top level window, not providing any way to access the clipboard, and more. Instead of SDL, the UI will have platform-specific backends; the set I'm looking at so far is Carbon, X11 and Windows. I will personally develop the Carbon backend -- I ordered a book on this topic and it is coming in the mail. I might also do the X11 backend later if nobody else wants to. As for Windows, a couple of people have already expressed interest in helping out.

Here is a screenshot of what the OpenGL rendering looks like os far:

Not very pretty -- there is a rendering flaw with polygon outlines, and of course, text is missing.

Tuesday, October 04, 2005

Word name completion in listener, and quaternions

I added completion popups to the listener. Keyboard selection of possibilities has not been implemented yet. You have to use the mouse.

I also added support for quaternions to Factor's math library. Quaternions are useful for certain geometrical calculations; a 3-dimensional rotation around an axis can be represented as multiplication by a quaternion.

Quaternions are represented as pairs of real numbers. Literal syntax is Q{ a b c d }Q, where a, b, c and d are real numbers.

Many quaternion operations can be done using Factor's existing vector algebra words; this follows from the mathematical fact that the quaternions are a 2-dimensional vector space over the complex numbers:

Addition: v+
Subtraction: v-
Negation: vneg
Norm: norm
Multiplication by a complex number on the left: n*v

A few new words are also needed:

Convert complex number to quaternion: c>n
Multiply by a complex number on the right: q*n
Quaternion multiplication: q*
Quaternion division: q/
Quaternion conjugate: qconjugate

Sunday, October 02, 2005

Tuesday, September 27, 2005

Surprising reduction in image size, and more outliner goodness

In Factor, each word has a word properties hashtable associated with it. Word properties can be read and written with the word-prop and set-word-prop words. When browsing around in the inspector, I noticed that a lot of word properties simply had the value f; denoting no value. So I tweaked set-word-prop to simply call remove-hash if the new value is f, thus removing a hashtable key instead of storing a key/value pair with f as the value. This elimination of redundant key/value pairs has reduced image size by a whopping 500Kb! So a full factor.image is only 3.1mb now. This is a 1.3Mb reduction since Factor 0.77, and a 700Kb reduction since Factor 0.78.

I wrote a couple of days ago about a new outliner gadget in the UI. I have cleaned it up, made it look better and decided on a protocol for calling it.

The new protocol for outputting outliners is decoupled from the UI itself and can be used in plain text streams (of course no outliner is displayed in that case). To output an outliner, you call the write-outliner ( string object quot -- ) word. When the outliner is expanded, the quotation is called with the default stream rebound to another stream; any output written to the stream is displayed in the outliner's expanded area. This quotation can therefore output more outliners, creating a nested tree.

Here is a example implementing a call hierarchy outliner; notice how simple it is, with no calls into the UI implementation, or event handling. First we have the utility object-outline word, which takes a sequence of objects, and a quotation that can expand an outline for each object. It prints a set of outliners, one for each object in the sequence, with the quotation as the callback. The quotation receives the object on the stack.

: simple-outliner ( seq quot -- | quot: obj -- )
swap [
[ unparse-short ] keep rot dupd curry write-outliner
] each-with ;

Now, the usage. word uses simple-outliner to display a hierarchy of words that call a given word:

: usage. ( word -- )
#! Outlining usages browser.
usage [ usage. ] simple-outliner ;

Here is what it looks like after a few nested outliners have been expanded. Note that each word in the outline is a live presentation that can be clicked to display a command menu:

I have similarly updated the inspector and stack display to use outlining. As previously mentioned, the new vocabs. displays an outliner for browsing vocabularies and word definitions.

As you can see in the screenshot, scroll bars look a bit nicer now, and have up/down arrows. There is a clipping flaw in the arrows; its either an sdl-gfx problem or with how I'm calling it, but in any case it should be easy to fix.

I have been studying the part of the CLIM spec for accepting presentations, input editing, and completion. This is a major omission from the Factor UI toolkit at this point. I'll probably implement something rather different from the CLIM way, however understanding how others have solved similar problems helps a lot.

Sunday, September 25, 2005

CLIM-like features in Factor UI

(First, I have to say that's user interface is much nicer than JRoller. It supports WYSIWYG editing of weblog posts and makes great use of AJAX for a snappier user interface.)

While my understanding of CLIM is incomplete, the most striking aspect is how central input/output streams are. CLIM extends output streams to support marked up text and graphics. To output a set of widgets with a vertical layout, you write them to an extended output stream, with a line break between each widget; horizontal layout is the same except with no intermediate line breaks. The extended output stream is connected to a pane which displays the output. Live presentations may be written to the output stream; they may be clicked and used as input to commands. CLIM supports a notion of extended input streams, which support input of presentations, together with completion.

The Factor UI already supports a pane gadget, with extended stream output protocol, and (currently) only plain text input. Today, I spent some time cleaning up the pane gadget to be more general purpose, and changed the stack display widget in the listener to use a pane instead of constructing a vertical layout by hand.

I implemented an outliner gadget that uses CLIM concepts; it is similar to "Tree" gadgets in other GUI toolkits, but there is a fundamental difference. The outliner gadget is given a quotation; when the outliner is expanded, the quotation is run inside a dynamic scope whose standard output is bound to a new pane. The quotation can then write attributed text and live gadgets to the stream; some of those gadgets might themselves be outliners. This results in much simpler code than a typical "tree view"/"tree model" separation. Of course there is a limitation in that it is not easy to remove individual tree nodes; instead, you must regenerate the outliner view. However I don't expect this to be a problem.

This screenshot shows the outliner in action, with a simple word browser. I know it is ugly; I will be working on appearance later:

The code is trivial:

: word. ( word -- )
swap [ literalize , \ see , ] [ ] make outliner. ;

: vocab. ( vocab -- )

Here is another one. This time its a messier hack and I won't show the source; I still have to think about a few things and integrate it in a clean manner. This shows expandable slots in an outlining inspector.

The new outliner allows a very powerful class of UI tools to be developed; however the tools will continue to work on the command line, albeit with reduced functionality.

Saturday, September 24, 2005