Wednesday, July 25, 2007

Smoother stand-alone app deployment

I polished the deployment tool a bit. Now all it takes is a vocab name:
USE: deploy.app
"lsys.ui" deploy.app

Each vocabulary has a deployment configuration. The configuration can be inspected and set with deploy-config and set-deploy-config:
( scratchpad ) "lsys.ui" deploy-config .
V{
{ strip-word-props? t }
{ strip-word-names? t }
{ strip-dictionary? t }
{ strip-debugger? t }
{ strip-c-types? t }
{ deploy-math? t }
{ deploy-compiled? t }
{ deploy-io? f }
{ deploy-ui? t }
{ "bundle-name" "Lindenmayer Systems.app" }
}

You can also change an individual flag with set-deploy-flag:
f strip-word-names? "lsys.ui" set-deploy-flag

The configuration is stored in a deploy.factor file in the vocabulary's directory.

Deployment has been tested to work with the following vocabs; they all include sensible deploy configs now:
  • automata.ui
  • boids.ui
  • bunny
  • color-picker
  • gesture-logger
  • golden-section
  • hello-ui
  • lsys.ui
  • maze
  • nehe
  • tetris

To deploy your own app, you should only have to set two flags:
  • deploy-ui? - set this to true if your app uses the UI
  • deploy-io? - set this to true if you want non-blocking I/O or sockets

Specialized float arrays

I added float arrays to Factor. A float array behaves like an array of floats, except the representation is more efficient; individual elements are not boxed.

Literal float arrays look like F{ 0.64 0.85 0.43 0.16 0.37 }.
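For readers who have not met unboxed arrays before, here is a rough analogy in Python rather than Factor. It is only a sketch of the storage difference, not of how Factor implements float arrays: a Python list holds references to separately allocated float objects, while array("d") from the standard library stores the raw 8-byte doubles side by side. The helper name v_add is made up for this illustration.

```python
from array import array


def v_add(xs, ys):
    """Add two equal-length sequences pairwise, keeping the container type of xs."""
    sums = (x + y for x, y in zip(xs, ys))
    if isinstance(xs, array):
        # Results go straight back into packed, unboxed storage.
        return array(xs.typecode, sums)
    return list(sums)


# A list of boxed floats: each element is its own heap object.
boxed = [0.64, 0.85, 0.43, 0.16, 0.37]

# The same values stored unboxed as C doubles in one contiguous buffer.
unboxed = array("d", boxed)

print(unboxed.itemsize)          # bytes used per element
print(v_add(boxed, boxed))       # a new list
print(v_add(unboxed, unboxed))   # a new array("d", ...)
```

Both containers behave like sequences of floats from the outside; only the representation differs.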

Float arrays can provide a performance benefit if the compiler is able to infer enough information to unbox float values. For example, consider the following word:
: v+ [ + ] 2map ;

It takes two arrays and adds elements pairwise. Let's try timing the performance of this word with normal arrays:
( scratchpad ) { 0.64 0.85 0.43 0.16 0.37 0.64 0.85 0.43 0.16 0.37 } dup [ 1000000 [ 2dup v+ drop ] times ] time
3200 ms run / 25 ms GC time

Now float arrays:
( scratchpad ) F{ 0.64 0.85 0.43 0.16 0.37 0.64 0.85 0.43 0.16 0.37 } dup [ 1000000 [ 2dup v+ drop ] times ] time
3653 ms run / 70 ms GC time

It is actually slower! This is because each element access has to allocate a new float on the heap. But now, let's use the new hints vocabulary to tell the compiler that v+ should be optimized for float arrays:
HINTS: v+ float-array float-array ;

This has the effect of compiling a version of this word specialized to float arrays. Here, the compiler can work some magic and eliminate boxing altogether:
( scratchpad ) F{ 0.64 0.85 0.43 0.16 0.37 0.64 0.85 0.43 0.16 0.37 } dup [ 1000000 [ 2dup v+ drop ] times ] time
974 ms run / 10 ms GC time

I used float arrays to make the spectral norm benchmark faster: the run time went from 120 seconds to 30 seconds. The raytracer was not improved by float arrays, though; I need to investigate why.

Also, float array operations are only compiled efficiently on PowerPC right now. I need to code some new assembly intrinsics for the other platforms.

Float arrays can be passed to C functions directly. Long-term, somebody should look into using SSE2 and AltiVec to optimize vector operations on float arrays. That would really rock.

Sunday, July 22, 2007

execve() returning ENOTSUP on Mac OS X

Apple's man page for execve() does not list ENOTSUP as a possible error code. This problem had me stumped for hours. After I was convinced I wasn't doing anything wrong, and there was no bug with the Factor FFI, I did some Googling and finally stumbled across this.

Turns out execve() returns this error if your program is multi-threaded. Makes perfect sense, but I wish Apple had kept the man page up to date!

In any case, I'm posting it here so that any other poor sucker who runs into this can find the solution with Google in less time than I did.

Saturday, July 21, 2007

Support for IPv6 and Unix domain sockets

Some very new code, not completely tested.

The <client> and <server> words used to take a host/port pair and a port number, respectively. This was too limiting. Now both words take address specifiers, of which there are four types:
  • "/path/to/socket" <local> - an IPC socket with the given path (Unix domain sockets on Unix)
  • "dotted.quad" 123 <inet4> - an IPv4 address
  • "ipv6.addr" 1234 <inet6> - an IPv6 address
  • "hostname" "http" <inet> - Internet host named by DNS entry; will be either IPv4 or IPv6

Note that in most cases, <client> will be called with an instance of inet; this invokes the domain name resolver, which may produce a list of multiple IPv4 and IPv6 addresses. Factor tries each one in turn until a connection succeeds. This is the expected behavior for client sockets, since users generally input host names and not IP addresses, and don't care if the connection is made over IPv4 or IPv6.
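The "try each address in turn" behavior can be sketched outside Factor as follows. This is a Python illustration of the general technique, not Factor's actual implementation; the function name connect_first and the timeout parameter are assumptions made for the example.

```python
import socket


def connect_first(host, port, timeout=5.0):
    """Resolve host and return a socket connected to the first address that accepts."""
    last_error = None
    # The resolver may return several IPv4 and IPv6 candidates for one name.
    for family, socktype, proto, _name, addr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(addr)
            return sock  # first success wins; the caller never sees the family
        except OSError as error:
            last_error = error
            sock.close()
    raise last_error if last_error else OSError("no usable address for %r" % (host,))
```

Python's standard socket.create_connection follows the same pattern, which is why callers there also pass a host name and never choose between IPv4 and IPv6 themselves.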

The <server> word requires an instance of the more specific inet4 or inet6 class. In most cases, a server interested in connections on, say, port 8080 wants to receive them over both IPv4 and IPv6, so the with-server combinator should be used instead of calling <server> directly. Here's a usage example which waits for connections on port 8080 and sends "Hello, client." to each one:
8080 internet-server [ "Hello, client." print ] with-server

If your system is configured for IPv6, this example will spawn two threads, for IPv4 and IPv6 connections.
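To make the "one thread per address family" idea concrete, here is a minimal Python sketch of the same arrangement. It is an analogy under stated assumptions, not a description of what with-server does internally; the names open_listeners, serve and start_server are hypothetical.

```python
import socket
import threading

GREETING = b"Hello, client.\n"


def open_listeners(port):
    """Open one listening socket per address family the system offers."""
    listeners = []
    infos = socket.getaddrinfo(None, port, type=socket.SOCK_STREAM,
                               flags=socket.AI_PASSIVE)
    for family, socktype, proto, _name, addr in infos:
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError:
            continue  # this family is not supported on this machine
        try:
            if family == socket.AF_INET6:
                # Keep the IPv6 socket from also claiming IPv4 traffic.
                sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1)
            sock.bind(addr)
            sock.listen()
        except OSError:
            sock.close()
            continue  # family exists but cannot be bound here
        listeners.append(sock)
    return listeners


def serve(listener):
    """Accept connections forever, greeting each client and then hanging up."""
    while True:
        try:
            conn, _peer = listener.accept()
        except OSError:
            return  # the listener was closed
        with conn:
            conn.sendall(GREETING)


def start_server(port):
    """Start one daemon thread per listener and return the listeners."""
    listeners = open_listeners(port)
    for listener in listeners:
        threading.Thread(target=serve, args=(listener,), daemon=True).start()
    return listeners
```

Calling start_server(8080) on a machine with IPv6 configured should yield two listeners and two threads; on an IPv4-only machine it should yield one of each.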

Server sockets can now be restricted to the loopback network interface by passing 8080 local-server instead of 8080 internet-server. Previously, this was provided by the Unix-only loopback-server module, which is now obsolete.

The domain name resolver can now be invoked directly with the resolve-host word. Given a host name and port name or number, it returns a list of address specifiers.

UDP/IP works as before; words which used to take host/port pairs now take address specifiers. Note that the inet address specifier is not supported for UDP/IP; clients must first resolve the destination host name, and pick the appropriate address specifier, whether it be an IPv4 or IPv6 address, from the result.
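The resolve-then-pick step for UDP looks roughly like this when written in Python. It is a sketch of the general pattern only; send_datagram is a hypothetical helper, and always taking the first resolved address is an assumption made to keep the example short.

```python
import socket


def send_datagram(payload, host, port):
    """Resolve host, pick the first datagram-capable address, and send to it."""
    candidates = socket.getaddrinfo(host, port, type=socket.SOCK_DGRAM)
    # The caller decides which result to use; here we simply take the first.
    family, socktype, proto, _name, addr = candidates[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.sendto(payload, addr)
    return family, addr
```

Unlike a stream client, there is no connection attempt to tell the sender that an address was unreachable, which is one reason the choice is left to the caller.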

Unix domain sockets are accessed with the local address specifier.

Stream Unix domain sockets are very commonly used, but a little-known fact is that datagram Unix domain sockets are supported too. Here is some proof.

Enter this in one listener:
"/tmp/dgram-dst" <local> <datagram>
dup receive
rot stream-close

Then this in another:
"/tmp/dgram-src" <local> <datagram>
B{ 1 2 3 4 5 } "/tmp/dgram-dst" <local> pick send
stream-close

The first listener should now have the following two objects on the stack:
B{ 1 2 3 4 5 }
T{ local f "/tmp/dgram-src" }
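The same experiment can be reproduced at the operating system level without Factor. The following Python sketch runs both ends in one process for brevity; the function name datagram_roundtrip and the use of a caller-supplied directory are assumptions for the example, and it requires a Unix-like system.

```python
import os
import socket


def datagram_roundtrip(directory, payload):
    """Send payload between two Unix domain datagram sockets bound in directory."""
    dst_path = os.path.join(directory, "dgram-dst")
    src_path = os.path.join(directory, "dgram-src")
    dst = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    src = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        dst.bind(dst_path)
        # Binding the sender too gives it an address the receiver can report.
        src.bind(src_path)
        src.sendto(payload, dst_path)
        # Returns the packet and the sender's socket path.
        return dst.recvfrom(4096)
    finally:
        dst.close()
        src.close()
        for path in (dst_path, src_path):
            if os.path.exists(path):
                os.unlink(path)
```

As in the Factor session, the receiver gets back both the bytes and the path of the socket that sent them.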

I still have to fix assorted bugs in this code. Also, Windows network support is lagging; not only does it have to be updated to support address specifiers, but it is also missing UDP hooks at this stage.

I won't be adding more features to networking for a little while, but the next thing I will do is probably raw sockets.

Thursday, July 19, 2007

Generating stand-alone Mac OS X applications from Factor

This is all subject to change, but right now, the following works:
USE: tools.deploy
USE: tools.deploy.app
"Tetris.app" "tetris" H{ { deploy-compiled? t } { deploy-ui? t } } deploy.app

After a rather slow bootstrap and deploy process, you're left with Tetris.app, 2.2 MB in size. Of this, 1.3 MB is the FreeType font rendering library; soon, Factor will use native font rendering on Mac OS X, so we won't have to ship FreeType anymore.

A lot remains to be done: we need a graphical way to configure the deployment tool (there are a lot of switches which can be toggled in the hashtable passed to deploy.app), and there needs to be an automated way of packaging required resources and native libraries with the app. And of course, deployment of native Linux and Windows binaries is a whole other kettle of fish. But over time, all of this will be implemented.