Saturday, September 30, 2006

Cat programming language 0.7.1

Christopher Diggins has released Cat 0.7.1. Cat is a statically-typed stack language implemented in C#. This release adds working type inference.

Annoying CS PhD students

Every few weeks, I receive an e-mail much like the following:
I came across the jEdit project at

I am a doctoral student at the Department of Computer Science, Florida
State University. As a part of my dissertation, I have developed a set
of metrics and a mechanism to track the effects of changing
requirements on software design - we already have several
publications on these ideas. We are looking to apply the mechanism as
a case study on your project. In addition to the material available at
SourceForge could you provide the requirement and design documentation
to help us understand the system better ?

If this were a one-time thing, it wouldn't bother me at all. But it seems that large numbers of PhD students simply spam the entire SourceForge web site with queries such as:
  • Can I have design documentation?
  • Does your team use XP, Agile, SOA methodology?
  • Can you please fill out this survey?
  • etc

I will not write your thesis for you. Tell your supervisor that spamming open source projects is not the right way to do research. If you want design documentation, read the fucking source code, ask questions about how it works if you need to, but don't expect me to do your work for you.

Or at least, don't tell me that you're wasting 4 years of your life doing a PhD in computer science.

Thursday, September 28, 2006

Code GC works

Mark and sweep GC is not very hard. The only thing that remains to be done is to automatically invoke code GC when the code heap is full. Presently there is a code-gc word one must call.

Sunday, September 24, 2006

Code GC

I've started work on a garbage collector for the compiled code heap. A code block is eligible for reclamation when the word owning the code block is no longer referenced in the data heap, and when no other compiled code blocks reference the code block. So if you redefine a compiled word, or forget a word, the old compiled block will be reclaimed.

Code GC will use the mark and sweep algorithm, without moving the code blocks around, in contrast to the generational copying GC used for data. The reason code cannot be moved around reliably is callbacks -- C code may hold on to pointers to Factor code. The heap will be compacted when the image is saved.

Of course if a callback refers to a word which has been GC'd, your program will crash if it attempts to invoke the callback. The solution is "don't do it, then" -- if a callback is actively held on to by C code, the consequences of forgetting or redefining it are undefined.
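To make the mark and sweep idea concrete, here is a minimal sketch in Python rather than Factor. It is not the Factor runtime's code: CodeBlock, collect_code and the field names are hypothetical, and the roots stand in for code blocks whose owning words are still referenced from the data heap.

```python
# Hypothetical model of a non-moving mark-and-sweep pass over a code heap.
# None of these names come from the Factor runtime.

class CodeBlock:
    def __init__(self, name, references=()):
        self.name = name
        self.references = list(references)  # other code blocks this one refers to
        self.marked = False
        self.free = False


def collect_code(code_heap, roots):
    """Mark every block reachable from roots, then sweep the rest in place."""
    for block in code_heap:
        block.marked = False

    # Mark phase: iterative traversal starting from the live blocks.
    stack = list(roots)
    while stack:
        block = stack.pop()
        if block.marked:
            continue
        block.marked = True
        stack.extend(block.references)

    # Sweep phase: unmarked blocks are flagged as free. Nothing is moved,
    # so the position of every surviving block stays the same.
    reclaimed = []
    for block in code_heap:
        if not block.marked and not block.free:
            block.free = True
            reclaimed.append(block.name)
    return reclaimed


# Redefining a word leaves its old compiled block unreferenced.
helper = CodeBlock("helper")
old_square = CodeBlock("square (old definition)", [helper])
new_square = CodeBlock("square (new definition)", [helper])
main = CodeBlock("main", [new_square])
code_heap = [helper, old_square, new_square, main]

reclaimed = collect_code(code_heap, roots=[main])
print(reclaimed)
```

Only the old definition is swept here; helper survives because the new definition still refers to it.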

Friday, September 22, 2006

JNI bridge

Chris Double is working on a JNI bridge for Factor. So far it only provides wrappers for JNI library functions, but a high-level interface will follow. This means Factor can now interface with three other languages directly:
  • C
  • Objective C
  • Java

Wednesday, September 20, 2006

Long computation

It took Factor 10 and a half hours to row reduce a matrix with 24 million entries, using exact rational math. The algorithm I use is pretty naive; see this file.
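For readers who have not seen row reduction over exact rationals, a naive Gauss-Jordan elimination can be sketched as follows. This is in Python, not Factor, and it is not the code from the linked file; row_reduce is a hypothetical name and the textbook algorithm is only assumed to be similar in spirit.

```python
from fractions import Fraction


def row_reduce(matrix):
    """Return the reduced row echelon form, using exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == len(rows):
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next(
            (r for r in range(pivot_row, len(rows)) if rows[r][col] != 0),
            None,
        )
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        # Scale the pivot row so the leading entry becomes 1.
        lead = rows[pivot_row][col]
        rows[pivot_row] = [x / lead for x in rows[pivot_row]]
        # Clear this column in every other row.
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b
                           for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return rows


reduced = row_reduce([[2, 1, 5], [1, 3, 5]])
print(reduced)
```

Because every entry is an exact fraction, there is no rounding error, but numerators and denominators can grow large, which is one reason this kind of computation gets slow on big matrices.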

Monday, September 18, 2006

Cat programming language 0.6

Statically-typed stack-based languages are few and far between. In fact the only two I know of are:
  • StrongForth - based on Forth
  • Cat - based on Joy, and not really fully statically typed yet

A new version of Cat was released recently. It would be interesting to see these concepts ported to Factor in the form of a soft type checker, along the lines of MZScheme's MzFlow. Factor's stack effect checker is already most of the way there, and non-core libraries in contrib can extend it in various ways.

Saturday, September 16, 2006

Adding a new compiler optimization

I noticed that after inlining, the following type of code was coming up in a few words:
dup array? [ dup array? [ ... ] [ ... ] if ] [ ... ] if
That is, a value being tested against a type twice. Implementing a compiler optimization to deal with this was easy:
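! True if the node's first input has been narrowed down to a single type.
: known-type? ( node -- ? )
    0 node-class# types length 1 number= ;

! Replace the type query with that single type, as a literal.
: fold-known-type ( node -- node )
    dup 0 node-class# types first 1array inline-literals ;

! Register the optimization for the type word.
\ type [
    { [ dup known-type? ] [ fold-known-type ] }
] define-optimizers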
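The effect on the nested array? tests can be illustrated with a toy version of the same idea. This is a sketch in Python on a made-up tree representation, not how Factor's optimizer is structured: the "if-type" tuples and the simplify function are hypothetical, and the sketch assumes that distinct type names denote disjoint types.

```python
# Toy redundant-type-test elimination on a hypothetical tree format:
#   ("if-type", variable, type_name, then_branch, else_branch)
# Anything else is treated as an opaque leaf.

def simplify(node, known=None):
    """Drop type tests whose outcome is implied by an enclosing test."""
    known = known or {}
    if not (isinstance(node, tuple) and node and node[0] == "if-type"):
        return node
    _, var, type_name, then_branch, else_branch = node
    if var in known:
        # The type is already known here, so only one branch can run.
        taken = then_branch if known[var] == type_name else else_branch
        return simplify(taken, known)
    # Inside the true branch the variable is known to have this type.
    narrowed = {**known, var: type_name}
    return ("if-type", var, type_name,
            simplify(then_branch, narrowed),
            simplify(else_branch, known))


nested = ("if-type", "x", "array",
          ("if-type", "x", "array", "A", "B"),
          "C")
simplified = simplify(nested)
print(simplified)
```

The inner test disappears because its answer is already known inside the outer true branch, which is the same observation the known-type? check relies on.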

Faster I/O performance on Unix

The time to read a 606 MB file, 64 bytes at a time using the read word, went down from 458 seconds to 45 seconds (roughly a tenfold speedup) after I made a few improvements to the Unix I/O code.

I've been doing a bit of work on I/O performance ever since the Factor I/O performance sucks entry. I expect another big speedup to arrive when I implement faster generic method dispatch, and compiled continuations. Hopefully by then Factor's I/O performance will be competitive with other languages.

Wednesday, September 13, 2006

Warning: User mode Linux mmap() is broken

On my linode, this program hangs instead of dying with a segfault:
#include <sys/mman.h>
#include <stdlib.h>

int main()
int pagesize = getpagesize();

int *array = mmap((void*)0,pagesize,



This is a serious issue. It causes Factor to hang on stack underflow if the underflow was caused by a memory read (eg, dup with an empty stack).

Of course I don't use stack underflow errors for control flow, but it makes debugging on the server rather annoying. Perhaps it's time to find another hosting provider.

Tuesday, September 12, 2006

Patterns in programming languages

Another critique of "design patterns".
Patterns are signs of weakness in programming languages.

When we identify and document one, that should not be the end of the story. Rather, we should have the long-term goal of trying to understand how to improve the language so that the pattern becomes invisible or unnecessary.

Monday, September 11, 2006

Strongtalk is now open source

Strongtalk is a very fast Smalltalk implementation with an interactive environment and optional static typing. It only ran on Windows. Sun acquired the company in the 90's, used the technology to implement the Java HotSpot JIT and killed Strongtalk. But now it is open source and hopefully can be maintained again, or at least inspire other language implementations.

Sunday, September 10, 2006

Visualizing stack code

I implemented a nifty hack today:

It still looks ugly (both in presentation and code) and needs some tweaks.

Friday, September 08, 2006

TextMate integration

Benjamin Pollack created a TextMate bundle together with a Factor editor hook for TextMate. His work is in contrib/textmate now.

Wednesday, September 06, 2006

Better development workflow

Previously when you made changes to source files you were working on, you had to run-file them all by hand. There was also the reload word, which saved some keystrokes because it would reload the source file containing a definition. But now there's a new way which beats both.

Factor now tracks the modification time of every source file in every loaded module. So you can change source files in a loaded module to your heart's content, and then invoke reload-modules in the listener, or even better, just press F7 in the UI. This reloads all changed source files, in the correct order.
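The bookkeeping behind this can be sketched in a few lines. The following is Python, not the Factor implementation: ModuleTracker and its methods are hypothetical names, and "the correct order" is assumed here to mean the order in which the files were originally loaded.

```python
import os
import tempfile


class ModuleTracker:
    """Remember each loaded source file's modification time."""

    def __init__(self):
        # path -> mtime at last load; dict order doubles as load order.
        self.loaded = {}

    def load(self, path):
        self.loaded[path] = os.path.getmtime(path)

    def changed_files(self):
        return [path for path, seen in self.loaded.items()
                if os.path.getmtime(path) != seen]

    def reload_modules(self, run_file):
        """Re-run every changed file in load order and record the new times."""
        changed = self.changed_files()
        for path in changed:
            run_file(path)
            self.loaded[path] = os.path.getmtime(path)
        return changed


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name in ("a.factor", "b.factor"):
        path = os.path.join(tmp, name)
        with open(path, "w") as f:
            f.write("! placeholder source\n")
        paths.append(path)

    tracker = ModuleTracker()
    for path in paths:
        tracker.load(path)

    # Simulate an edit to b.factor by pushing its modification time forward.
    stamp = os.path.getmtime(paths[1]) + 10
    os.utime(paths[1], (stamp, stamp))

    reloaded = tracker.reload_modules(run_file=lambda p: None)
    reloaded_names = [os.path.basename(p) for p in reloaded]
    still_changed = tracker.changed_files()

print(reloaded_names)
```

Only the touched file is reloaded, and a second check right afterwards finds nothing left to do.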

Tuesday, September 05, 2006

Serializable space invaders

Chris Double has got the serialization code to the point where it can be used to transport UI gadgets between Factor instances. He managed to serialize a running space invaders instance, and another participant on IRC got it running on their machine. Cool stuff!

As for me, I've started working on 0.85 with a few mundane cleanups and bug fixes. Nothing I do is cool anymore :)

On learning

Mike Vanier writes:
As somebody who loves to learn new programming languages and paradigms, I hate to admit this, but one of the biggest reasons bad languages persist is that most people hate learning new programming languages. They would rather stick to a shitty language that they more-or-less understand than take the one month or so to become familiar with a new language. This is an example of a very, very general phenomenon both in computing and elsewhere, which is that nobody ever wants to learn anything new.

Monday, September 04, 2006

Interesting weblog

A blogger calling himself "sigfpe" writes a fascinating weblog about mathematics and Haskell. I like his implementation of the Clifford algebras.

Sunday, September 03, 2006

Factor 0.84

Factor 0.84 is now available. Check out the change log; it is rather extensive.

Is American high school education really that bad?

Some of the responses here made me chuckle:

  • What is the number less than one closest in value to one?
  • I haven't actually read the proof, but I'm not convinced by it. 0.999... is definitely smaller than 1, or it would be called 1.

    I believe it is just common sense.

Illiteracy might no longer be a problem in the developed world, but innumeracy is rampant. :-)
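For reference, the usual argument treats 0.999... as a geometric series with first term 9/10 and ratio 1/10:

```latex
0.999\ldots \;=\; \sum_{k=1}^{\infty} \frac{9}{10^{k}}
          \;=\; \frac{9/10}{1 - 1/10}
          \;=\; 1
```

As for the first question, no such number exists: for any x < 1, the midpoint (x + 1)/2 is also less than one and closer to it.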

Friday, September 01, 2006

Recent UI changes

I implemented a CLIM-like commands framework. There is now a keyboard help window (press F1) and a generic "operations" framework. For example, it allows commands for working with words to be applied to an input gadget, in which case the word is extracted from the caret position. This is conceptually similar to CLIM presentation translators.

Also, the UI now builds all tools into a single workspace window. Multiple workspaces can be opened.

Here is a rather large screenshot.

Factor is now three years old

That's right, and if you read the blog post from a year ago, you'll see Factor has made a lot of progress.

Just for kicks I downloaded Factor 0.77. It took 34 minutes to bootstrap on my x86 machine. Current Factor releases bootstrap in 2 minutes 30 seconds on that machine. Not only has the performance of the compiler improved drastically, but the number of optimizations it performs -- not to mention the volume of code being compiled -- has increased too.

A quick overview of just some of the major improvements in the last year:
  • The UI has improved a lot: OpenGL rendering, multi-window support, browser, graphical single stepper, etc
  • Hypertext online help, full text search
  • AMD64 port
  • Intel Mac port
  • Objective C library interface
  • Callbacks from C to Factor
  • Restartable exceptions
  • Formal stack effect declarations

In my Factor is two years old post, I gave some line counts for Factor at the time:
  • Factor runtime, written in C: 7326 lines
  • Factor library, written in Factor: 17591 lines
  • Unit test suite, written in Factor: 4160 lines
  • Contributed code, written in Factor: 6598 lines

Here are the stats as of today:
  • Documentation, written in a Factor DSL: 94347 words
  • Factor runtime, written in C: 8261 lines
  • Factor library, written in Factor: 29555 lines
  • Unit test suite, written in Factor: 4772 lines
  • Contributed code, written in Factor: 24737 lines

Good to see the contributed section growing fastest of all. I hope the core library doesn't get too large, and that a year from now I can look back and say that Factor has again made a lot of progress.