Monday, October 16, 2006

Crash while running unit tests caused by unit tests

I spent an hour last night on a weird crashing bug that only showed up on Windows: if you ran test-modules from the UI a few times, Factor would eventually crash in unpredictable ways. I narrowed this down to the test for I/O buffers. These are low-level character queues allocated in malloc() space, which the Factor implementation uses when it needs to pass native I/O functions a pointer that the GC will not move.

One of the unit tests created two buffers from a pair of strings, appended one to the other, converted the result to a string, and compared it against the expected string. Unfortunately, the word that appended them was written rather carelessly: it used memcpy to copy the contents of one buffer into the other without first checking bounds and growing the destination buffer. The result? Random crashes, yet amazingly they never appeared on Linux or Mac OS X.
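The buffer code in question is Factor and isn't reproduced here, so what follows is only a rough C sketch of the same idea, with hypothetical names (`buffer`, `buffer_reserve`, `buffer_append` and friends are illustrative, not Factor's actual API or memory layout). It models a malloc()-backed character queue whose append grows the destination before calling memcpy. The careless append described above roughly corresponds to the memcpy line alone, with no reserve step.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C analogue of a malloc()-backed character queue.
   Field names are illustrative, not Factor's actual layout. */
typedef struct {
    char *data;   /* storage in malloc() space, outside the GC heap */
    size_t size;  /* allocated capacity in bytes */
    size_t fill;  /* write position: bytes [pos, fill) are queued */
    size_t pos;   /* read position */
} buffer;

static int buffer_init(buffer *b, size_t size) {
    if (size == 0) size = 1;          /* avoid malloc(0) */
    b->data = malloc(size);
    if (b->data == NULL) return -1;
    b->size = size;
    b->fill = 0;
    b->pos = 0;
    return 0;
}

/* Ensure there is room for at least `extra` more bytes after `fill`,
   doubling the capacity (or more) with realloc() when needed. */
static int buffer_reserve(buffer *b, size_t extra) {
    if (b->fill + extra <= b->size) return 0;
    size_t new_size = b->size * 2;
    if (new_size < b->fill + extra) new_size = b->fill + extra;
    char *p = realloc(b->data, new_size);
    if (p == NULL) return -1;
    b->data = p;
    b->size = new_size;
    return 0;
}

/* Append the unread contents of src to dst. Without the reserve step,
   the memcpy writes past the end of dst->data whenever dst is too small:
   undefined behavior that may or may not crash. */
static int buffer_append(buffer *dst, const buffer *src) {
    size_t n = src->fill - src->pos;
    if (buffer_reserve(dst, n) != 0) return -1;  /* the missing step */
    assert(dst->fill + n <= dst->size);
    memcpy(dst->data + dst->fill, src->data + src->pos, n);
    dst->fill += n;
    return 0;
}

static int buffer_from_string(buffer *b, const char *s) {
    size_t n = strlen(s);
    if (buffer_init(b, n) != 0) return -1;
    memcpy(b->data, s, n);
    b->fill = n;
    return 0;
}

/* Copy the unread contents into a new NUL-terminated string. */
static char *buffer_to_string(const buffer *b) {
    size_t n = b->fill - b->pos;
    char *s = malloc(n + 1);
    if (s == NULL) return NULL;
    memcpy(s, b->data + b->pos, n);
    s[n] = '\0';
    return s;
}

static void buffer_free(buffer *b) {
    free(b->data);
    b->data = NULL;
    b->size = b->fill = b->pos = 0;
}
```

Mirroring the unit test, building buffers from "hello, " and "world", appending, and converting back yields "hello, world". Deleting both the buffer_reserve call and the assert brings back the bug: five bytes get written past a 7-byte allocation, and whether that visibly crashes can depend on how the platform's malloc lays out its heap.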

Lessons learned? Well, I knew all of these already, but this incident underscores them:
  • Pointer arithmetic is dangerous. Fortunately, I don't believe in "hair shirt" programming, so Factor makes minimal use of unsafe constructs, resorting to direct memory manipulation only when calling C code that requires us to work with C data. User programs never have to step outside the memory-safe sandbox.
  • When unit tests fail, that doesn't necessarily mean the code being tested is buggy. The unit tests could be broken too! It took me a long time to track down the bug simply because I did not think to check the unit tests themselves; after all, the buffers test passed most of the time.
  • Testing on multiple platforms helps weed out bugs.


Anonymous said...

First sensible post I've seen from you yet! Well, except for the annoying PhD students.

Slava Pestov said...

Umm, ok...

Anonymous said...

Just kidding. Multiple platforms are a killer for me. Years of legacy code + new platform = headache