Monday, May 05, 2008

I/O changes, and process pipeline support

I made some improvements to the I/O system today.

Default stream variables


The stdio variable has been replaced by input-stream and output-stream, and there are four new words:
  • with-input-stream
  • with-output-stream
  • with-input-stream*
  • with-output-stream*

The first two close the stream after, the latter do not. The with-stream and with-stream* words are still around, they expect a duplex stream, unpack it, and bind both variables.

I've changed many usages of with-stream to use one of the unidirectional variants instead. This means that you can now write code like the following:
"foo.txt" utf8 [
"blah.txt" utf8 [
... read from the first file, write to the second file,
using read and write ...
] with-file-writer
] with-file-reader

Before you had to use this really ugly "design pattern":
"foo.txt" utf8 <file-reader> [
"blah.txt" utf8 <file-writer> [
<duplex-stream> [
...
] with-stream
] with-disposal
] with-disposal

Speaking of duplex streams, because they're not used by anything in the core anymore I have moved them to extra. They are still used by <process-stream> and <client>. I added a <process-reader> and <process-writer> word for those cases where you only want a pipe in one direction; they express intention better.

Pipes


The <process-stream> word has been around for a while, and this word used pipes internally, but they were not exposed in a nice, cross-platform way, until now.

The io.pipes vocabulary contains two words:
  • <pipe> creates a new pipe and wraps a pair of streams around it. The streams are packaged into a single duplex stream; any data written to the stream can be read back from the same stream (presumably, in a different thread). This is actually implemented with native pipes on Unix and Windows.
  • The run-pipeline word takes a sequence of quotations or launch descriptors, and runs them all in parallel with input wired up as if it were a Unix shell pipe. For example,
    { "cat foo.txt" "grep x" "sort" "uniq" } run-pipeline

    Corresponds to the following shell command:
    cat foo.txt | grep x | sort | uniq

    In addition, being able to place process objects and quotations in the pipeline gives you a lot of expressive power.

Appending process output to files


The io.launcher vocabulary supports the full array of input and output redirection features, and now, pipelines. There was one missing component: redirecting process output to a file opened for appending. Now this is possible. The following Factor code:
<process>
"do-stuff" >>command
"log.txt" <appender> >>stderr
run-process

Corresponds to this shell command:
do-stuff 2>> do-stuff.txt

Of course, shell script is a DSL for launching processes so it is more concise than Factor. However, Factor's io.launcher library now supports all of the features that the shell does, and its pretty easy to build a shell command parser using Chris Double's PEG library, which translates shell commands to sequences of Factor process descriptors in a pipeline.

And now, I present a concise illustration of the difference between the Unix philosophy and the Windows philosophy. Here we have two pieces of code, which do the exact same thing: create a new pipe, open both ends, return a pair of handles.

Unix:
USING: system alien.c-types kernel unix math sequences
qualified io.unix.backend io.nonblocking ;
IN: io.unix.pipes
QUALIFIED: io.pipes

M: unix io.pipes:(pipe) ( -- pair )
2 "int" <c-array>
dup pipe io-error
2 c-int-array> first2
[ [ init-handle ] bi@ ] [ io.pipes:pipe boa ] 2bi ;


Windows:
USING: alien alien.c-types arrays destructors io io.windows libc
windows.types math.bitfields windows.kernel32 windows namespaces
kernel sequences windows.errors assocs math.parser system random
combinators accessors io.pipes io.nonblocking ;
IN: io.windows.nt.pipes

: create-named-pipe ( name -- handle )
{ PIPE_ACCESS_INBOUND FILE_FLAG_OVERLAPPED } flags
PIPE_TYPE_BYTE
1
4096
4096
0
security-attributes-inherit
CreateNamedPipe
dup win32-error=0/f
dup add-completion
f <win32-file> ;

: open-other-end ( name -- handle )
GENERIC_WRITE
{ FILE_SHARE_READ FILE_SHARE_WRITE } flags
security-attributes-inherit
OPEN_EXISTING
FILE_FLAG_OVERLAPPED
f
CreateFile
dup win32-error=0/f
dup add-completion
f <win32-file> ;

: unique-pipe-name ( -- string )
[
"\\\\.\\pipe\\factor-" %
pipe counter #
"-" %
32 random-bits #
"-" %
millis #
] "" make ;

M: winnt (pipe) ( -- pipe )
[
unique-pipe-name
[ create-named-pipe dup close-later ]
[ open-other-end dup close-later ]
bi pipe boa
] with-destructors ;

The Windows API makes things difficult for no reason. Anonymous pipes do not support overlapped I/O, so you have to open a named pipe with a randomly-generated name (I'm not making this up, many other frameworks do the same thing and Microsoft even recommends this approach).

On the flipside, the nice thing about supporting both Unix and Windows is that I get to come up with true high-level abstractions that make sense, instead of being able to get away with having thin wrappers over Unix system calls as so many other language implementations do. For example, Ocaml's idea of "high-level I/O" is some basic POSIX bindings, together with incomplete emulation of Unix semantics on Windows. Look forward to writing a page of code if you want to map a file into memory or run a process and read its output into a string. And of course Java doesn't support pipes, I/O redirection for launching processes, or file system change monitoring, at all.

4 comments:

Anonymous said...

Process#get{Input,Output,Error}Stream() ?

Unknown said...

You probably mean

do-stuff 2>> log.txt

Anonymous said...

Is there a way to simulate popen()?

Slava Pestov said...

Anonymous: the <pipe> word does that.