Monday, November 12, 2007

CGI support in the Factor HTTP server

Now that io.launcher has the ability to pass environment variables to child processes, I was able to implement CGI support very easily. I realize CGI is slow and obsolete, however it is still handy sometimes, as a lowest common denominator for integrating third-party web apps. I intend to use it to run gitweb.cgi (a git repo browsing tool written in Perl) on factorcode.org.

The CGI support is loaded by default when one does USE: http.server, however to enable it the cgi-root variable must be set to a directory containing CGI scripts. This variable can either be set globally, per vhost, or per responder.

Files whose extension is .cgi are executed and the results are served to the client; other files are served literally.

The implementation of the CGI responder is quite straightforward. First, we have some boilerplate:
! Copyright (C) 2007 Slava Pestov.
! See http://factorcode.org/license.txt for BSD license.
USING: namespaces kernel assocs io.files combinators
arrays io.launcher io http.server http.server.responders
webapps.file sequences strings ;
IN: webapps.cgi

Now, we define the configurable variable:
SYMBOL: cgi-root

Next, we have a word to build the associative mapping of environment variables which we pass to the child process. The HTTP server calls responders after storing various values pertaining to the request in variables; our task here is to convert the HTTP server's representation of this data into a form which is expected by the CGI script. This word breaks the "keep words short" rule, but its definition is so straightforward that it is readable regardless, and breaking it up into smaller words would probably not help:
: post? "method" get "post" = ;

: cgi-variables ( name -- assoc )
[
"SCRIPT_NAME" set

"CGI/1.0" "GATEWAY_INTERFACE" set
"HTTP/1.0" "SERVER_PROTOCOL" set
"Factor " version append "SERVER_SOFTWARE" set
host "SERVER_NAME" set
"" "SERVER_PORT" set
"request" get "PATH_INFO" set
"request" get "PATH_TRANSLATED" set
"" "REMOTE_HOST" set
"" "REMOTE_ADDR" set
"" "AUTH_TYPE" set
"" "REMOTE_USER" set
"" "REMOTE_IDENT" set

"method" get >upper "REQUEST_METHOD" set
"raw-query" get "QUERY_STRING" set

"User-Agent" header-param "HTTP_USER_AGENT" set
"Accept" header-param "HTTP_ACCEPT" set

post? [
"Content-Type" header-param "CONTENT_TYPE" set
"raw-response" get length "CONTENT_LENGTH" set
] when
] H{ } make-assoc ;

Next, we have a word which constructs a launch descriptor for the CGI script. See my post from earlier today about io.launcher improvements to learn about descriptors, which are a new feature. This word uses the cgi-root variable together with the above cgi-variables word:
: cgi-descriptor ( name -- desc )
[
cgi-root get over path+ 1array +arguments+ set
cgi-variables +environment+ set
] H{ } make-assoc ;

Now, the word which does the hard work. It spawns the CGI script, sends it the user input if the request type is a POST, then copies output to the socket:
: (do-cgi) ( name -- )
"200 CGI output follows" response
stdio get swap cgi-descriptor &t;process-stream> [
post? [
"raw-response" get
stream-write stream-flush
] when
stdio get swap (stream-copy)
] with-stream ;

Files whose extension is not .cgi are simply served to the client; we have a little word which fakes a file responder:
: serve-regular-file ( -- )
cgi-root get "doc-root" [ file-responder ] with-variable ;

Now, the main dispatcher. First we weed out invalid inputs, pass files which don't end with .cgi to serve-regular-file, and send everything else to (do-cgi):
: do-cgi ( name -- )
{
{ [ dup ".cgi" tail? not ] [ drop serve-regular-file ] }
{ [ dup empty? ] [ "403 forbidden" httpd-error ] }
{ [ cgi-root get not ] [ "404 cgi-root not set" httpd-error ] }
{ [ ".." over subseq? ] [ "403 forbidden" httpd-error ] }
{ [ t ] [ (do-cgi) ] }
} cond ;

That's it as far as code is concerned. All that's left is to register a responder which calls do-cgi:
global [
"cgi" [ "argument" get do-cgi ] add-simple-responder
] bind

No comments: