Monday, July 31, 2006

GUI single stepper

Using the old listener-based single stepper as a starting point, I implemented a much improved graphical tool for walking through code. It is easier to use than the old single stepper, automatically shows the stacks at each point, supports a new 'step out' action which executes until the end of the current quotation, and has better support for stepping through code which throws errors. This last part is still pretty broken, however. Also stepping through code which works with continuations doesn't do the right thing at all. I plan on fixing both issues before 0.84.

Friday, July 28, 2006 HTTPD crashes after 1 month of uptime

Looks like there's another I/O bug. I think this one was introduced recently. Still, this is the longest it has stayed up, IIRC. After the next round of I/O debugging and optimizations, I intend to move the downloads from SourceForge to the Factor server to give it more of a workout.

Factor 0.83 now available

Factor 0.83 is now available. Here is the change log.

Update: Mac OS X Intel and Windows binaries are now available, thanks to Doug Coleman. The Windows port does not support native I/O this time round, so there is no multithreaded I/O, and HTTPD and the Factor plugin for jEdit are not functional. This will be rectified in 0.84.

Thursday, July 27, 2006

Poor I/O performance

When I saw this weblog entry, I decided to port the reverse complement benchmark to Factor. The benchmark took a whopping 51 minutes to run on my G5, compared to 3 seconds for the Java version. There are definitely some low-hanging fruit in the I/O code. I'll start working on I/O performance in Factor 0.84, and hopefully improve on the benchmark result by several orders of magnitude.

Update: I ported the benchmark wrong. The initial buffer size was wrong; 3mb versus 30mb, so there was a lot of copying going on. Fixing this, the run time decreased to 170 seconds. Still poor, but not nearly as bad. 120 seconds are spent simply reading the file, line by line. This will be the first thing I will optimize in 0.84.

Here is the benchmark code. It seems to be clearer and simpler than most of the implementations on that page.

Tuesday, July 25, 2006


I've mentioned InterLisp a few times, when talking about Factor's future structure editor and code analysis tools. Most people probably have not heard of InterLisp since it is quite dead. You can read about it at the computer history project, in particular the environment manual.

Java-style university degrees

I received an amusing spam today, the subject line: "A Genuine University Degree in 4-6 weeks! BA BSc MA MSc MBA PhD, No Study Required!" This is where our society is heading, and it is the same sort of attitude as you see in the jokers who list every possible web buzzword on their CV, along with 10 years .NET "experience", yet the only code they can come up with is copy and pasted from tutorials.

Shortest path between two words and other goodness

I got frustrated with an obscure stack effect inference bug today, so I decided to work on something completely different, the beginnings of a system like InterLisp's MasterScope. In MasterScope, you can perform queries like the following:
  • Which functions call my function?
  • Which functions does my function call?
  • What's the shortest call path between two functions?
  • What dynamic variables does my function read or write?

Factor can already tell you the first two, and has been able to for quite some time. Now I implemented a program that computes the shortest path between two words. Its a little inefficient since I'm missing a fast priority queue, and the code is a bit rough so I'm not putting it in the core yet. It will go in soon, though. Here are some examples:
  \ nth \ parse shortest-path .
{ parse parse-lines set namespace peek nth }
\ set-nth \ each shortest-path .
{ }

In the first example a call path was found. In the second example, the set-nth word is never called by any word called from each.
I also implemented uses/usages at the vocab level. You can ask what vocabularies does a given vocabulary depend on (when one considers the totality of words defined therein):
  "opengl" vocab-uses .
"sequences" "kernel-internals" "generic" "alien" "math"
"io" "kernel" "opengl"

Also, you can ask what vocabularies call words from this one:
  "gadgets-text" vocab-usage .
"gadgets" "gadgets-listener" "gadgets-search"
"gadgets-text" "gadgets-theme" "io" "models"

Right now there is a lot of circular dependency between vocabularies, but over time I hope to reduce this and tools such as the above will help.
What will be really neat is wrapping these cross-referencing tools in an UI frontend.

Monday, July 24, 2006

Clipboard access: Cocoa compared to X11

Compare the Cocoa clipboard code with the X11 clipboard code. Makes me wonder how crappy X11 drag and drop will be (and yes, I do plan on supporting drag and drop in the UI).

Faster bootstrap

Factor now bootstraps in 2 minutes 30 seconds on my 2.5 GHz Power Mac G5. Just 13 days ago when I did some benchmarks, this process would take 6 minutes and 19 seconds. All I did was make hashtable lookup faster, and make a minor change to the code that computes the class precedence cache.

Saturday, July 22, 2006

Lazy lists in Factor

I'd like to welcome a new contributor, Matthew Willis. He updated Chris Double's lazy lists library -- one of the oldest contributed libraries -- for the latest changes in Factor 0.83, namely the removal of cons cells. Thanks, Matt!

Wednesday, July 19, 2006

A status update

The Windows port is in the process of being updated for Factor 0.83 by Doug Coleman. The UI works; native I/O does not, yet. As soon as this is working, I will release 0.83. I've been working on a multi-line text editor gadget. Its mostly functional now, and the old single-line editor gadget has been removed. Later on I might try building a text editor around this gadget, athough it will not be as big and complex as jEdit.

Friday, July 14, 2006

Gaussian elimination in Factor

I needed to compute the rank and nullspace of a matrix, so I implemented the Gaussian elimination algorithm in Factor. This implemention is rather inefficient. While I intend to optimize it somewhat, it will probably remain somewhat inefficient since I want it to serve as a benchmark of Factor's sequences.
! Copyright (C) 2006 Slava Pestov.
! See for BSD license.
IN: matrices
USING: kernel math namespaces parser sequences ;

SYMBOL: matrix

: with-matrix ( matrix quot -- )
[ swap matrix set call matrix get ] with-scope ; inline

: nth-row ( row# -- seq ) matrix get nth ;

: nth-col ( col# ignore-rows -- seq )
matrix get tail-slice [ nth ] map-with ;

: change-row ( row# quot -- | quot: seq -- seq )
matrix get -rot change-nth ; inline

: exchange-rows ( row# row# -- ) matrix get exchange ;

: rows ( -- n ) matrix get length ;

: cols ( -- n ) 0 nth-row length ;

: first-col ( row# -- n )
#! First non-zero column
0 swap nth-row [ zero? not ] skip ;

: clear-scale ( col# pivot-row i-row -- n )
>r over r> nth dup zero? [
3drop 0
] [
>r nth dup zero? [
r> 2drop 0
] [
r> swap / neg
] if
] if ;

: (clear-col) ( col# pivot-row i -- )
[ [ clear-scale ] 2keep >r n*v r> v+ ] change-row ;

: (each-row) ( row# -- slice )
rows dup <slice> ;

: each-row ( row# quot -- )
>r (each-row) r> each ; inline

: clear-col ( col# row# -- )
[ nth-row ] keep 1+
[ >r 2dup r> (clear-col) ] each-row
2drop ;

: do-row ( exchange-with row# -- )
[ exchange-rows ] keep
[ first-col ] keep
clear-col ;

: find-row ( row# quot -- i elt )
>r (each-row) r> find ; inline

: pivot-row ( col# row# -- n )
[ dupd nth-row nth zero? not ] find-row 2nip ;

: (row-reduce) ( -- )
0 cols rows min [
over pivot-row dup
[ over do-row 1+ ] [ drop ] if
] each drop ;

: row-reduce ( matrix -- matrix' )
[ (row-reduce) ] with-matrix ;

: null/rank ( matrix -- null rank )
row-reduce [ [ [ zero? ] all? ] subset ] keep
[ length ] 2apply over - ;

JRoller error

Don't you love error messages from Java web apps? Usually they consist of exceptions nested 10 levels deep.
javax.servlet.jsp.JspException: ServletException in '/main.jsp': An error
occurred while evaluating custom action attribute "value" with value "${post.category.path}":
An error occurred while getting property "path" from an instance of class
org.roller.pojos.WeblogCategoryData (java.lang.OutOfMemoryError: Java heap

Their web site might not be working properly, but at least they're POJO-compliant.

Tuesday, July 11, 2006

Factor performance shootout

I ran a number of Factor benchmarks on the following machines:
  • 2.8 GHz Pentium 4, running Debian Linux, proprietary NVidia driver
  • 2.4 GHz Athlon 64 (dual processor), running Ubuntu Linux, proprietary NVidia driver
  • Apple MacBook Pro - 2.0 GHz Intel Core Duo (dual processor) running Mac OS X
  • Apple PowerMac - 2.5 GHz G5 (quad processor) running Mac OS X

The benchmarks were:
  • Stage 2 bootstrap - this involves compiling all words, building the online help search index, and loading the UI backend.
  • Fibonacci - this tests procedure activation overhead and integer math.
  • Mandelbrot - this tests complex number arithmetic.
  • Raytracer - this tests floating point arithmetic.
  • Evaluating [ tests compiler-tests ] time in the UI listener - the unit tests themselves include a lot of performance-intensive code, and this also measures how fast the UI can relayout and redraw, since the tests generate a lot of output.

The results were interesting. The best result in each row is shown in italics.
Pentium 4AMD64MacBook ProPowerMac G5
Bootstrap (mins:secs)4:564:404:386:19
Fibonacci (ms)360320470830
Mandelbrot (secs)
Raytracer (secs)18211842
Unit tests (secs)24222038

The results indicate there is a lot more work to do on the PowerPC compiler backend. The Core Duo's good performance is rather surprising.

Nasty bug in AMD64 compiler backend fixed

In Factor 0.82 I introduced a nasty bug which I have only just now managed to fix. It would manifest as random crashes when running code compiled with SSE2 float intrinsics. Without float intrinsics, everything would work fine.

I finally spent some time with gdb, fiddled with the disassembler and looked at memory dumps. Turns out the cause of the problem was that the inline float allocation intrinsic was compiling 32-bit instructions. So when depositing a boxed float in the heap, it would only write to the lower 32 bits of the header cell. If the high 32 bits contained garbage data, the resulting object would be corrupt. The thing is, regression testing did not catch this error, since I supposed in most cases inline float allocation was hitting memory which happened to be zeroed.

Sunday, July 09, 2006

Onyx programming language

I just discovered Onyx, another stack-based language.

Differential graded Hopf algebras in Factor

The number of people interested in both Algebraic Topology and Factor is probably exactly zero, however here goes anyway. If you are interested in both topics, please let me know, we have a lot to discuss.

I implemented a library for working with finitely-generated connected Hopf algebras over the complex numbers. The code can be found in the darcs repository. While Hopf algebras can be quite abstract, finitely-generated connected Hopf algebras can be realized as a tensor product of a polynomial algebra together with a exterior algebra.

Elements are added using h+, multiplied using /\ and printed using h.. The unit is 1 and the co-unit is co1. I have not implemented co-multiplication yet, since I'm not sure how to represent an element of the tensor product of two Hopf algebras.

One starts by defining some generators, and their degrees:
  SYMBOLS: a b c u ;
1 a deg=
1 b deg=
1 c deg=
2 u deg=

Every Hopf algebra embeds a copy of the ground field:
  3 4 h+ h.
10 2 /\ h.

Even degree generators behave like polynomial indeterminites:
  u u /\ 3 /\ u 4 /\ h+ h.
4u + 3u/\u

Odd degree elements "anti-commute:"
  b a /\ h.
- a/\b

You can also define a differential -- this is of most interest to me personally, since my masters thesis will most likely focus on cohomology of Lie algebras.

Suppose we want the derivative of u to be abc:
  a b /\ c /\ u d=

The derivative of a product is evaluated using the Liebnitz rule (d(u/\v)=du/\v + (-1)^deg(v) u/\dv).

Now we can differentiate some expressions:
  u u /\ d h.

An expression whose derivative is zero is known as a cycle:
  u a /\ d h.

An element in the range of the differential is known as a boundary. It is a theorem that all boundaries are cycles; that is, applying d twice yields zero:
  u u /\ d d h.

New release of IsForth

Mark Manning has released a new version of IsForth, a Forth system for Linux with a nice ncurses-based user interface.

Friday, July 07, 2006

Big cleanup of Factor's C sources

Taking some cues from SBCL, I did a big cleanup of Factor's C runtime. I eliminated the majority of the #ifdefs in favor of having files like os-unix.c, cpu-x86.h, cpu-ppc.h, and so on; now all the preprocessor conditionals are concentrated in a single header file. I reorganized the sources to be more logical. I cleaned up the headers. I also simplified the build system; instead of one huge Makefile there are configuration files for each build target, such as Config.linux.ppc, Config.macosx, and so on. You can browse the code.

Thursday, July 06, 2006

Factor running on Intel Macs

After a non-trivial porting effort, Factor can now run on Intel Mac OS. Here is a screenshot.

I encountered a number of issues:

  • The mach exception handler code was PowerPC-specific. It needed some changes to work in i386, among them a non-trivial one which took me a while track down: when the exception handler thread receives a mach exception, it changes the program counter of the main Factor thread to the signal handler. However on x86, we also need to align the stack on a 16 byte boundary, since there is no guarantee this will be so when the exception occurs.

  • The ABI is different. As I said above, the stack must be aligned before every procedure call, so I had to implement a new set of compiler backend operations to support Intel Macs.

  • Getting the Objective C bridge working was the toughest of all. On PowerPC, any message returning a struct must be called with objc_msgSend_stret. On Intel, structs 8 bytes and smaller are returned in registers. This took a while to pinpoint, since it would manifest as memory corruption and random lossage. Also, when calling a message which returns a struct larger than 8 bytes, Apple's convention is that the callee drops one of the arguments from the C stack. This is completely absurd. It took me ages to figure out what's going on; I finally caught on after studying the disassembly of several Cocoa methods and noticing that the last opcode was RET 4 and not RET.

There are two issues I still have not fixed:

  • While calling Objective C methods which return structs now works, methods defined in Factor which return structs do not. There are extra complications to handle here, all because of Apple's funky ABI.

  • The mach exception handler does not report the faulting address anymore, so instead of "Data stack underflow" or similar you see a generic "Operating System Signal 11" error. I'm not sure what's going on here, since I barely understand the Mach code (I snarfed it from OpenMCL) and everything works fine on PowerPC.

Neither of these issues are major, so I'll probably release 0.83 with the Mac Intel port as-is, and fix the issues later.

In general, I hate how cruddy and convoluted the whole infrastructure around C is. Mac OS X on Intel seems particularly bad; I have no idea why Apple decided to screw up struct returns and make everything ten times more complicated than it has to be. And of course Mach continues to suck like it always has and always will.

On a final note, it would be nice if somebody volunteered to do Mac Intel binary packages for Factor. The process is trivial (make clean macosx macosx.dmg) and I do not actually own an Intel Mac, I just borrowed it for the purposes of doing the port.

Wednesday, July 05, 2006

Bug in Intel Core Duo CPU?

Check it out. The following test case prints "The result is 2" on a MacBook Pro (Intel Core Duo) but "The result is 1" on a Pentium 4:

int main(char **argv, int argc)
int result;

__asm__("movl $0x10000000,%%eax; shl $3,%%eax; jno x; movl $1,%0; jmp end; x: movl $2,%0; end:"

printf("The result is %d\n",result);

The SHL instruction is not setting the overflow flag on the Core Duo.

Update: Andy Hefner in #lisp says "overflow flag is undefined if the shift count is not equal to 1". So its not a bug after all.

Embedded scripting in HTML pages

I finally put two and two together, and integrated Alex Chapman's contrib/embedded.factor with the HTTP server. Now when serving any file with the .fhtml suffix, the HTTP server preprocesses it to evaluate blocks of Factor code delimited by <% and %>.

You can include other pages relative to the document root of the current site by calling include-page:
<% "templates/foo.html" include-page %>

Don't forget to put this near the top of your .fhtml file though:
<% USE: file-responder %>

Monday, July 03, 2006

Working on Mac Intel support

Factor 0.83 will be able to run natively on Intel Macs. So far, I've gotten the runtime to compile. This required changing some PowerPC exception handling code. Also, for whatever reason Factor would crash with an illegal instruction error until I changed the retain stack pointer register from EBX to ESI. I'm not sure why EBX cannot be used for this purpose on Mac OS X.

I'm in the process of updating the compiler backend to support the Mac OS X Intel ABI. This requires setting up stack frames in much the same way as the PowerPC and AMD64 backends already do; so a lot of code can be reused. During bootstrap, the x86 image will load the appropriate ABI support depending on what OS it is running on.