Monday, November 12, 2007

Improved process launcher

The io.launcher vocabulary now supports passing environment variables to the child process.

Launching processes is a complex and inherently OS-specific task. For example, until version 1.5, Java had a bunch of methods in the Runtime class for this, each one taking a different set of arguments:
Process exec(String command)
Process exec(String[] cmdarray)
Process exec(String[] cmdarray, String[] envp)
Process exec(String[] cmdarray, String[] envp, File dir)
Process exec(String command, String[] envp)
Process exec(String command, String[] envp, File dir)

The fundamental limitation here was that it would always launch processes with stdin/stdout rebound to a new pipe; so there was no easy way to launch the process to read/write the JVM's stdin/stdout file descriptors, you had to spawn two threads to copy streams. In 1.5, they added a new ProcessBuilder class, but it doesn't seem to offer any new functionality except for merging stdout with stderr. It still always opens pipes.

The approach I decided to take in Factor is similar to the ProcessBuilder approach, but more lightweight. The run-process word takes one of the following types:
  • A string -- in which case it is passed to the system shell as a single command line
  • A sequence of strings -- which becomes the command line itself
  • An association -- a "process descriptor", containing keys from the below set.

The latter is the most general. The allowed keys are all symbols in the io.launcher vocabulary:
  • +command+ -- the command to spawn, a string, or f
  • +arguments+ -- the command to spawn, a sequence of strings, or f. Only one of +command+ and +arguments+ can be specified.
  • +detached+ -- t or f. If t, run-process won't wait for completion.
  • +environment+ - an assoc of additional environment variables to pass along.
  • +environment-mode+ - one of prepend-environment, append-environment, or replace-environment. This value controls the function used to combine the current environment with the value of +environment+ before passing it to the child process.

While run-process spawns a process which inherits the current one's stdin/stdout, <process-stream> spawns a process reading and writing on a pipe.

Idiomatic usage of run-process uses make-assoc to build the assoc:
{ "ls" "/etc" } +arguments+ set
H{ { "PATH" "/bin:/sbin" } } +environment+ set
] H{ } make-assoc run-process

I'm happy with the new interface. With two words it achieves more than Java's process launcher API, which consists of a number of methods together with a class.

Not only is the interface simple but so is the implementation.

The run-process first converts strings and arrays into full-fledged descriptors, then calls run-process* which is an OS-specific hook.

In the implementation of this hook, the fact that any assoc can be treated as a dynamically scoped namespace really into play. The io.launcher implementation has a pair of words:
: default-descriptor
{ +command+ f }
{ +arguments+ f }
{ +detached+ f }
{ +environment+ H{ } }
{ +environment-mode+ append-environment }
} ;

: with-descriptor ( desc quot -- )
default-descriptor [ >r clone r> bind ] bind ; inline

Passing a descriptor and a quotation to with-descriptor calls it in a scope where the various +foo+ keys can be read, assuming their default values if they're not explicitly set in the descriptor.

So, if we look at the Unix implementation for example,
M: unix-io run-process* ( desc -- )
+detached+ get [
] [
spawn-process wait-for-process
] if
] with-descriptor ;

It calls various words, all of which simply use get to read a dynamically scoped variable. These are resolved in the namespace set up by with-descriptor. For example, there is a word
: get-environment ( -- env )
+environment+ get
+environment-mode+ get {
{ prepend-environment [ os-envs union ] }
{ append-environment [ os-envs swap union ] }
{ replace-environment [ ] }
} case ;

It retrieves the environment set by the user, together with the environment of the current process, and composes them according to the function in +environment-mode+. There is no need to pass anything around on the stack; any word can call get-environment from inside a with-descriptor.

Notice that constructing a namespace with make-assoc, and then passing it to a word which binds to this namespace and builds some kind of object, is similar to the "builder design pattern" in OOP. However, it is simpler. Instead of defining a new data type with methods, we essentially just construct a namespace then run code in this namespace.

1 comment:

stesch said...

Couldn't compile it on Linux x86.32 because of a missing definition of environ. environ is defined in unistd.h when __USE_GNU is set. To define __USE_GNU, _GNU_SOURCE has to be defined:

$ git diff vm/Config.linux
diff --git a/vm/Config.linux b/vm/Config.linux
index 5f83cc8..6ebc493 100644
--- a/vm/Config.linux
+++ b/vm/Config.linux
@@ -1,4 +1,4 @@
include vm/Config.unix
PLAF_DLL_OBJS += vm/os-genunix.o vm/os-linux.o
-CFLAGS += -export-dynamic
+CFLAGS += -export-dynamic -D_GNU_SOURCE
LIBS = -ldl -lm $(X11_UI_LIBS)