Friday, April 11, 2008

Improvements to io.monitors; faster refresh-all

Factor's io.monitors library previously supported Mac OS X, Windows and Linux. It now also supports BSD, though in a much more restricted fashion than the other platforms: only individual files can be monitored, not directories. This is because kqueue() provides very limited functionality in this regard. Still, something is better than nothing, and what the BSDs provide is useful for monitoring log files and the like.

On Linux, inotify does not directly support monitoring directory hierarchies recursively, so Factor's monitors did not support recursive monitoring either. However, a mailing list post by Robert Love describes how to solve this in user space, and I used his solution to implement recursive monitors on Linux.
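The general user-space idea can be sketched as follows. Because inotify watches are per-directory, a recursive monitor adds one watch per directory in the tree. Whenever a new subdirectory appears, it watches that subdirectory and rescans it, so that entries created before the watch existed are not missed. This is a minimal Python sketch under stated assumptions, not Factor's implementation. The class `RecursiveWatcher` and its `add_watch`/`on_created` methods are hypothetical. `add_watch` only simulates inotify_add_watch() by handing out integer watch IDs, which lets the sketch run without real inotify.

```python
import os


class RecursiveWatcher:
    """Hypothetical user-space recursive watcher (simulation only).

    add_watch() stands in for inotify_add_watch(): it records the path
    and returns an integer watch descriptor instead of calling the kernel.
    """

    def __init__(self):
        self.wd_to_path = {}
        self.path_to_wd = {}
        self._next_wd = 1

    def add_watch(self, path):
        # Like inotify, watching the same path twice yields the same wd.
        if path in self.path_to_wd:
            return self.path_to_wd[path]
        wd = self._next_wd
        self._next_wd += 1
        self.wd_to_path[wd] = path
        self.path_to_wd[path] = wd
        return wd

    def watch_tree(self, root):
        # Watches are per-directory, so walk the hierarchy and add one
        # watch for every directory in it (including root itself).
        for dirpath, _dirnames, _filenames in os.walk(root):
            self.add_watch(dirpath)

    def on_created(self, wd, name, is_dir):
        # Handler for a (simulated) "entry created" event in directory wd.
        path = os.path.join(self.wd_to_path[wd], name)
        if is_dir:
            # Watch the new directory, then rescan it: anything created
            # inside it before the watch was added would otherwise be missed.
            self.watch_tree(path)
        return path
```

In a real implementation, `on_created` would be driven by inotify events for newly created directories, and the rescan would also report any files found as change events.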

Another inotify oddity is that if you add the same directory twice to the same inotify instance, you get the same watch ID both times, and events are only reported once. This meant that the previous implementation, which shared one global inotify instance among all monitors, was not as general as one would hope: you couldn't run two programs that monitor overlapping portions of the file system. I considered several possible fixes, but in the end I simply changed the monitors API to accommodate this case. All monitor operations must now be wrapped in a with-monitors combinator. On Linux, it creates an inotify instance and stores it in a dynamically-scoped variable, so that subsequent calls to <monitor> use this inotify. Independent inotifies in different threads don't interact at all. On Mac OS X, BSD and Windows, with-monitors just calls the quotation without doing any special setup.

Another issue I fixed was that on Mac OS X, monitors only worked when used from the UI, because otherwise no run loop was running. A run loop now runs at all times, which allows monitors to work in the terminal listener as well.

Now that monitors work better, I was able to use them to speed up refresh-all. This word finds all changed source files in the vocabulary roots and reloads them. It does this by comparing cached CRC32 checksums with the actual checksum of each file. It used to compare modification times as well, but I took that code out because file system metadata queries moved out of the VM and into the native I/O code, which isn't available during bootstrap. As a side effect, refresh-all became much slower, because it had to read every file. Using monitors, I was able to make it faster than it has ever been. A thread that waits on a monitor is started on startup. The source tree then only has to be checksummed in its entirety the first time refresh-all is used in a session; after that, only files for which the monitor reported changes have to be scanned. So refresh-all runs instantly if nothing has changed, and otherwise only rereads the changed files.
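The bookkeeping described above can be sketched roughly as follows: a full checksum pass on the first call, then only rescanning the paths a monitor reported. This is a hypothetical Python sketch, not Factor's refresh-all. `SourceCache`, `record_loaded` and `monitor_event` are invented names, and `zlib.crc32` stands in for Factor's own CRC32 code. `record_loaded` models the checksum cached when a source file was loaded, and `monitor_event` models what the background monitor thread would report.

```python
import os
import zlib


def crc32_of(path):
    with open(path, "rb") as f:
        return zlib.crc32(f.read())


class SourceCache:
    """Hypothetical sketch of refresh-all's change detection."""

    def __init__(self, roots):
        self.roots = roots
        self.checksums = {}    # path -> CRC32 cached when last loaded
        self.scanned = False   # has the whole tree been checksummed yet?
        self.changed = set()   # paths reported by the monitor thread

    def record_loaded(self, path):
        # Cache the checksum of a file as it was when loaded.
        self.checksums[path] = crc32_of(path)

    def monitor_event(self, path):
        # A thread blocked on a monitor would call this for each change.
        self.changed.add(path)

    def _all_files(self):
        for root in self.roots:
            for dirpath, _dirs, files in os.walk(root):
                for name in files:
                    yield os.path.join(dirpath, name)

    def refresh_all(self):
        """Return the sorted paths whose contents differ from the cache."""
        if not self.scanned:
            # First call in a session: checksum the entire source tree.
            candidates = list(self._all_files())
            self.scanned = True
        else:
            # Afterwards: only look at files the monitor reported.
            candidates = list(self.changed)
        self.changed.clear()
        to_reload = []
        for path in candidates:
            if not os.path.exists(path):
                self.checksums.pop(path, None)
                continue
            crc = crc32_of(path)
            if self.checksums.get(path) != crc:
                self.checksums[path] = crc
                to_reload.append(path)
        return sorted(to_reload)
```

The trade-off is visible in the second branch: after the first full pass, a change is only noticed if the monitor reported it. That trust is what lets a call with no pending events return without reading any files.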
