Monday, June 26, 2006

Factor web site now running latest HTTP server code

The Factor web site was previously served by a pre-release Factor 0.81 snapshot, with various random bug fixes "patched in" live by me copy and pasting the relevant colon definitions in the listener. I finally got around to updating the server to run Factor 0.83.

You will notice an increase in the documentation available for browsing. HTML formatting of Factor markup is still not ideal, however I am working on it.

I hope this new version is as stable as the previous one was. For a long time I was having I/O bugs, but in the last few weeks the 0.81 code did not crash at all. Lets hope there are no regressions in 0.83.

Sunday, June 18, 2006

Module system

I got sick of having to load dependencies in the right order when running stuff in contrib/, so I coded a simple module system.

You can load a module and all dependencies in the listener using the REQUIRE: parsing word:
REQUIRE: crypto

This searches for the following files:
contrib/crypto.factor
contrib/crypto/load.factor

A module definition looks like so:
REQUIRES: math ;

PROVIDE: crypto {
"common.factor"
"base64.factor"
"barrett.factor"
"montgomery.factor"
"random.factor"
"miller-rabin.factor"

! Rngs
"blum-blum-shub.factor"

! Hash
"crc32.factor"
"md5.factor"
"sha1.factor"

! Block ciphers
"rc4.factor"

! Public key
"rsa.factor"

} ;

The REQUIRES: parsing word lists required modules which are loaded first if they are not already loaded, and PROVIDE: defines a module along with its consitutuent files, which are then loaded.

Note that REQUIRE: takes one module name and is designed for interactive use, because it recompiles all words after the module has been loaded. REQUIRES: does no such thing, and takes multiple module names; it is designed for use in a load.factor file.

You can force a module to reload during development:
"x11" reload-module

Another advanced feature is that PROVIDE: can actually take a second list of files, which define unit tests:
PROVIDE: concurrency
{ "concurrency.factor" }
{ "concurrency-examples.factor" "concurrency-tests.factor" } ;

Tests can then be run with "concurrency" test-module or just test-modules to test every loaded module.

Future improvements that I plan to implement is using the module system to organize the core library (the list of files in boot-stage1.factor is getting rather long), and an UI tool for browsing loaded modules and loading new modules.

In the long term, the number of contributed libraries may or may not grow to a point where we can switch to a decentralized system, perhaps using a Wiki to store module meta-data and download URLs, together with an automatic downloader tool in Factor (along the lines of ASDF-Install for Common Lisp).

Factor 0.83 looks like a really nice release. I'm disappointed that there won't be a Windows package this time round -- Doug Coleman is on vacation. If you are interested in seeing Factor run better on Windows, please contribute!

Cat programming language

Christopher Diggins has been working on a new stack-based language he calls Cat. Cat borrows heavily from Joy, and is implemented in C#.

Saturday, June 17, 2006

Factory window manager

I've written about the Factory window before. It has been improving at a steady rate.

Here is a screenshot showing Factory running in Factor 0.83.

Levenshtein edit distance algorithm in Factor

Here is a Factor implementation of the Levenshtein edit distance algorithm:
! Copyright (C) 2006 Slava Pestov.
! See http://factorcode.org/license.txt for BSD license.
USING: arrays kernel math namespaces sequences test ;
IN: levenshtein

: <matrix> ( m n -- matrix )
[ drop 0 <array> ] map-with ; inline

: matrix-> nth nth ; inline
: ->matrix nth set-nth ; inline

SYMBOL: d

: ->d ( n i j -- ) d get ->matrix ; inline
: d-> ( i j -- n ) d get matrix-> ; inline

SYMBOL: costs

: init-d ( str1 str2 -- )
[ length 1+ ] 2apply 2dup >matrix> d set
[ 0 over ->d ] each
[ dup 0 ->d ] each ; inline

: compute-costs ( str1 str2 -- )
>array [
swap >array [ = 0 1 ? ] map-with
] map-with costs set ; inline

: levenshtein-step ( i j -- )
[ 1+ d-> 1+ ] 2keep
[ >r 1+ r> d-> 1+ ] 2keep
[ d-> ] 2keep
[ costs get matrix-> + min min ] 2keep
>r 1+ r> 1+ ->d ; inline

: levenshtein-result ( -- n ) d get peek peek ; inline

: levenshtein ( str1 str2 -- n )
[
2dup init-d
2dup compute-costs
[ length ] 2apply [
swap [ swap levenshtein-step ] each-with
] each-with
levenshtein-result
] with-scope ;

[ 3 ] [ "sitting" "kitten" levenshtein ] unit-test
[ 3 ] [ "kitten" "sitting" levenshtein ] unit-test
[ 1 ] [ "freshpak" "freshpack" levenshtein ] unit-test
[ 1 ] [ "freshpack" "freshpak" levenshtein ] unit-test

I've decided not to use it in the online help search code at this stage, since it yields a lot of false positives so I'd need to implement a better ranking system first. So for now I'm just putting it in examples/.

Thursday, June 15, 2006

Full text search in help now works

This news is a couple of days old, but I should mention that Factor's online help now supports full text search. At the moment, it performs exact matches after stemming, and uses a simple ranking algorithm to rank results based on the number of search terms that appear, with a special bonus given to terms appearing in the title. In the next round of improvements, I will investigate better ranking algorithms, and implement Levenshtein edit distance matching. For now this is good enough, and I'm going to focus on bug fixes as I get ready to release Factor 0.83.

Wednesday, June 14, 2006

GC bug

I found and fixed a bug in the garbage collector today. It seems to have been introduced when quotations became arrays. I triggered it by following the following steps:
  • Running 'tests' in the UI
  • Redefining a core word
  • Recompiling everything
  • Invoking full-gc

At this point the runtime crashed with a memory corruption error. Further investigation revealed that some code was holding on to an address which appears to have been moved by the GC. Several hours later, I uncovered the suspect code.

When a callback is called, the current interpreter state is saved in a stack_state struct, and these structures are chained. This includes a copy of all stacks, and the currently executing quotation. This is because each nested callback runs with its own isolated data and call stacks. When leaving the callback, the topmost entry in the linked list is removed, and the saved state is restored. The top-level stack_state struct does not contain a valid saved current quotation field, since there was no current quotation when the top level object was created. So the garbage collector would not consider the saved quotation there a valid root, and would not copy this object. However, the test for this case was wrong; it would ignore the saved current quotation in the first stack_state, and not the last. The result was that if callbacks were never used, then the first and last elements of the linked list would coincide and there would be no problem. However when the garbage collector was invoked from within a nested callback, the correct pointer would not be updated, and thus the interpreter would continue executing at an address which was not even valid anymore.

Sunday, June 11, 2006

Full text search in help

I'm working on implementing a search engine for the Factor documentation. As a first step, I wrote a Factor version of the Porter stemmer algorithm, using the CL version as a reference point. Personally, I find the Factor code clearer than all the other versions on that page.

Sunday, June 04, 2006

Nearly All Binary Searches and Mergesorts are Broken

The naive implementation of a binary search in a language with machine arithmetic (C, Java, ...) is prone to an overflow error. You can read about it at Joshua Bloch's weblog. Bloch writes:
"The binary-search bug applies equally to mergesort, and to other divide-and-conquer algorithms. If you have any code that implements one of these algorithms, fix it now before it blows up. The general lesson that I take away from this bug is humility: It is hard to write even the smallest piece of code correctly, and our whole world runs on big, complex pieces of code."

It is even harder to write correct code if your language is broken. It is sad that three decades(?) after the "number tower" became a common fixture of Lisp systems, we still have programmers struggling with integer overflow issues.

Why is Groovy is big?

Sorry to beat a dead horse, but I remember several months ago somebody in #concatenative (eiz, perhaps) mentioning that Groovy has reached 120,000 lines of Java code... they asked how can a scripting language get so big, especially as it doesn't provide its own runtime services and runs on the JVM? Well if you look at the Groovy CVS today, you'll see it has in fact managed to grow to 300,000 lines.

In comparison, Factor consists of 8400 lines of C and 29,000 lines of Factor. This includes a lot of code that a JVM-based language does not have to implement, such as platform-specific GUI bindings, native code generation, and the UI.

I tend to think the majority of code people write is overly complicated, full of redundancy, and designed for such flexibility that in practice is not needed at all. I hope one day this trend reverses.

Saturday, June 03, 2006

Starting copy and paste support

I implemented some simple clipboard handling code for X11 and Cocoa. So far you can paste text from other applications, but you cannot copy text yet. This will wait for selection support in the UI -- I not only want text inside editor gadgets to be selectable, but all text in the UI, including pane output.