Monday, April 16, 2007

Embedding Factor into C applications

I implemented a simple way of embedding Factor into C applications and evaluating Factor code from C through a small C API. This is very preliminary and needs a lot of work, but it is a good first step.

First, the build process for the VM has changed. It now produces two files, a VM engine library and a VM executable.

On Windows and Mac OS X, the library is built as a shared library; on other Unices it is built as a static library. The reason is that on Linux, there is no way to build an executable which looks for a required shared library in the same directory as the executable itself. The only alternative is to install the shared library in a known location, such as /usr/lib, or to set the LD_LIBRARY_PATH environment variable. This is unacceptable since it complicates matters for people who want to try Factor. It should be possible to just run make and then run Factor from the current directory. So, no shared library on Linux.

The VM executable is very small; in fact, it consists of a single source file:
#include "factor.h"

int main(int argc, char **argv)
{
    init_factor_from_args((char*)0,argc,argv,false);
    return 0;
}

The factor.h file defines the exported entry points into the Factor VM library. So far, there are only a small handful of those:
void init_factor_from_args(char *image, int argc, char **argv, bool embedded);
char *factor_eval_string(char *string);
void factor_eval_free(char *result);
void factor_yield(void);

Here is a description of each:
  • init_factor_from_args() initializes Factor. C applications embedding Factor should always set the embedded flag to true; this causes init_factor_from_args() to return as soon as Factor has been initialized.
  • factor_eval_string() evaluates a Factor expression and captures any output it performs to a new string. This string is then returned. The expression must not take any inputs from the stack, or leave values on the stack.
  • factor_eval_free() frees a string returned by factor_eval_string().
  • factor_yield() yields a time-slice to any Factor threads. This should be called if you evaluate an expression which spawns a thread with in-thread or a similar Factor word.

Here is an example:
#include <stdio.h>
#include "factor.h"

int main(int argc, char **argv)
{
    /* embedded = true, so this returns once Factor has been initialized */
    init_factor_from_args(NULL,argc,argv,true);
    char *result = factor_eval_string("2 2 + .");
    printf("%s",result);
    factor_eval_free(result);
    return 0;
}
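
The example above never calls factor_yield(). As a minimal sketch of how the four entry points might be combined, the helper below evaluates an expression, prints the captured output, and then yields a caller-chosen number of times. The name eval_print_and_yield, the HAVE_FACTOR macro, the NULL check, and the stand-in stubs are all assumptions made for illustration; none of them are part of Factor. The stubs exist only so the sketch compiles without the VM library, and how many yields a given expression actually needs is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#ifdef HAVE_FACTOR
#include "factor.h"   /* real declarations from the Factor VM library */
#else
/* Stand-in stubs, NOT part of Factor: they only let this sketch build
   and run without the VM library. */
#include <stdlib.h>
#include <string.h>

static int stub_yield_count = 0;

void init_factor_from_args(char *image, int argc, char **argv, bool embedded)
{
    (void)image; (void)argc; (void)argv; (void)embedded;
}

char *factor_eval_string(char *string)
{
    /* Pretend the "output" is simply a copy of the source text. */
    char *copy = malloc(strlen(string) + 1);
    if (copy != NULL)
        strcpy(copy, string);
    return copy;
}

void factor_eval_free(char *result) { free(result); }

void factor_yield(void) { stub_yield_count++; }
#endif

/* Hypothetical helper: evaluate `code`, write the captured output to `out`,
   then call factor_yield() `yields` times so that any Factor threads the
   expression spawned get a chance to run.
   Returns 0 on success, -1 if no result string came back (a defensive
   check; whether the real API can return NULL is an assumption). */
int eval_print_and_yield(char *code, FILE *out, int yields)
{
    assert(code != NULL && out != NULL);

    char *result = factor_eval_string(code);
    if (result == NULL)
        return -1;

    fputs(result, out);
    factor_eval_free(result);   /* the string belongs to Factor; free it there */

    for (int i = 0; i < yields; i++)
        factor_yield();
    return 0;
}
```

A caller would run init_factor_from_args() with the embedded flag set to true once at startup, then use something like eval_print_and_yield("2 2 + .", stdout, 1) wherever it needs to run Factor code.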

This API has a number of limitations:
  • On Unix, Factor takes over a number of signal handlers. Signal handlers suck for this reason -- but Factor really does need to use signals.
  • Only one Factor instance can exist per C process, and there's no way to de-initialize a Factor instance and free its resources. This will be addressed at some point in the future.
  • The Factor instance can only be accessed from a single native thread for its entire lifetime -- this is because the Factor runtime is not thread-safe. This will be addressed in Factor 2.0, which will bring first-class support for native threading.
  • The default Factor image is quite large (~7 MB) and building minimal images involves having a load file. This will be addressed soon; not only for embedding, but also for deployment, it makes sense to be able to build minimal images containing only a certain set of modules.
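
Until the one-instance and single-thread restrictions are lifted, an embedding application may want to catch violations early. The following is a rough sketch of a debugging guard, assuming a POSIX system with pthreads. The names factor_guard_start and factor_guard_ok are hypothetical and are not part of the Factor API. The guard itself is not thread-safe; it is only meant to be consulted in assertions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical bookkeeping, not part of Factor: which native thread
   initialized the (single) Factor instance. */
static pthread_t factor_owner;
static bool factor_started = false;

/* Call once, on the thread that is about to initialize Factor.
   Returns false if an instance was already recorded, since only one
   Factor instance can exist per process. */
bool factor_guard_start(void)
{
    if (factor_started)
        return false;
    factor_owner = pthread_self();
    factor_started = true;
    return true;
}

/* Returns true only when called on the thread that initialized Factor. */
bool factor_guard_ok(void)
{
    return factor_started && pthread_equal(pthread_self(), factor_owner);
}
```

The intended use would be to call factor_guard_start() immediately before init_factor_from_args(), and to place assert(factor_guard_ok()) in front of every later call into the Factor library.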

As you can see, right now this is more of a novelty than a useful feature, but over time I plan on improving this interface and making Factor a viable choice for scripting C applications -- you will be able to build minimal images containing only the code your application needs. Unlike Lua and Python, Factor is natively compiled, and Factor's FFI for calling back into C is extremely powerful.

In fact, I didn't even plan on working on a C embedding API at this point; however, a seemingly unrelated task required it -- Doug is porting Factor to Windows CE, and on Windows CE, .exe files cannot dynamically look up their own symbols. Factor's compiler relies on such lookups because generated code often has to call into various VM routines -- so we went for the simplest fix, moving the entire VM into a DLL and only leaving a small stub function in the executable. I polished this a bit and made it minimally useful in other contexts, resulting in the above embedding API.

4 comments:

orib said...

What you say about Linux is not exactly true. First, you can have your own static library loader dlopen() the library in the proper place. A hack, but it works.

But you can also instruct the linker to look at the current directory by using the '-R$ORIGIN' directive while linking the executable -- see http://docsun.cites.uiuc.edu/sun_docs/C/solaris_9/SUNWdev/LLM/p70.html for an example.

For some reason, the incredibly useful $ORIGIN is almost unknown.

fawcett said...

On Windows and Mac OS X, the library is built as a shared library; on other Unices it is built as a static library. The reason is that on Linux, there is no way to build an executable which looks for a required shared library in the same directory as the executable itself.

Are you sure, Slava? I'm no expert, but as an example, Chicken Scheme compiles its extension modules as shared libraries, and on Linux they can be loaded from the current directory. Perhaps there's some trick to it, but it's not based on LD_LIBRARY_PATH.

Slava Pestov said...

Thanks guys! Passing '-rpath $ORIGIN' to the linker seems to work!

Anonymous said...

Nice info from orib. The "fact" that Linux wouldn't look for libraries in the app directory always bugged me. I knew about the suboptimal dlopen hack, but never knew about the linker trick.

Good stuff.