Thursday, November 02, 2006

Memory allocation refactoring

Factor's garbage collector moves objects around in memory. This means that if there is not enough memory to satisfy an allocation request, the garbage collector has to be able to find all pointers to live objects and update them if necessary.

Until now, memory allocation in the Factor runtime did not check to see if the heap was full:
INLINE void allot_barrier(CELL address)
{
F_CARD *ptr = ADDR_TO_CARD(address);
F_CARD c = *ptr;
CELL b = card_base(c);
CELL a = (address & ADDR_CARD_MASK);
*ptr = (card_marked(c) | ((b < a) ? b : a));
}

INLINE void *allot_zone(F_ZONE *z, CELL a)
{
CELL h = z->here;
z->here = h + align8(a);

allot_barrier(h);
return (void*)h;
}

INLINE void *allot(CELL a)
{
return allot_zone(&nursery,a);
}

Instead, every primitive which could at some stage allocate memory would first check if the heap was 75% full, and if so, call the GC. The idea was to call the GC early enough and hope that the heap did not completely fill up before the next GC check. If the heap did fill up, a guard page would be hit, but all we could do at that point was kill Factor, since calling the GC from the signal handler is out of the question -- suspended stack frames could contain C local variables pointing to Factor objects.

Indeed, the GC check was always performed at points where no Factor object could be stored in a C local variable, which made things very simple. The disadvantages: you cannot use more than 75% of the heap, every primitive has to remember to call the GC check, and if some primitive enters with 74% of the heap full and suddenly allocates a large amount of memory, Factor would die with an 'out of memory' error even when a GC would have freed enough. This approach also made it hard to implement a growable data heap -- at what point do you grow? If the GC check did not free enough memory, you could grow the heap there, but if the GC check did free enough memory and the primitive subsequently allocated a huge array, you'd run out of memory instead of growing the heap. So the manual GC check, and the restriction on performing a GC while C variables contain roots, had to go.

So I decided to redo things, and instead potentially perform GC from the lowest-level allocation functions. The allot() function now looks like this:
INLINE void maybe_gc(CELL a)
{
if(nursery.here + a > nursery.limit)
garbage_collection(NURSERY,false);
}

INLINE void *allot(CELL a)
{
maybe_gc(a);
return allot_zone(&nursery,a);
}

This means that before calling a runtime function which can potentially allocate memory, the C code has to register any local variables which can contain objects as GC roots. For example, here is the low-level function implementing the <array> primitive:
F_ARRAY *allot_array(CELL type, F_FIXNUM capacity, CELL fill)
{
int i;
REGISTER_ROOT(fill);
F_ARRAY* array = allot_array_internal(type, capacity);
UNREGISTER_ROOT(fill);
for(i = 0; i < capacity; i++)
set_array_nth(array,i,fill);
return array;
}

The REGISTER_ROOT macro adds the object to a stack, internal to the runtime, and the UNREGISTER_ROOT macro removes the topmost object and stores it back in the given variable. The GC scans this stack for roots just like it does with the data stack, etc. This stack is not accessible from the Factor side, and if an error is thrown between calls to REGISTER_ROOT and UNREGISTER_ROOT, its contents are reset.

There are only a few cases where GC roots must be registered this way, and most of them are in the bignum code, especially after I simplified a lot of runtime C code and moved a number of primitives from C code into the Factor library. However, forgetting to register a root can have disastrous consequences, namely random, hard-to-reproduce crashes (after all, 99% of the time a given allocation call does not GC). I developed two testing tools to help catch this:
  • After a GC, overwriting the former semispace with junk data so that reading from it immediately causes bogus behavior, rather than subtle corruption
  • Modifying the allocation function in the runtime to call the GC every time, or every 100 allocations

These two debugging features slow things down considerably, but have helped me find a number of problems and I'm confident I'll get the root registration 100% correct this way. The first item will be available to users as the -S command line switch; it was actually first suggested by Doug Coleman in the context of cryptography algorithms, where you don't want your private key to hang around in memory longer than needed in the event of your swap file being stolen (although this seems way too paranoid for me). Obviously the second feature will be removed before release and I don't intend to add a switch to GC on every allocation after this refactoring is done.

I've almost finished getting everything working again. Compiled alien calls and callbacks need to be updated now, since the code generated by the compiler to convert Factor objects to C values (and vice versa) must now accommodate the fact that objects may be moved around in the middle of this process. Also, in one place the runtime's allot() function is bypassed, and allocation of a boxed float is done manually in a compiled intrinsic; this ensures that it can be done in a "basic block" without clobbering register contents, improving the efficiency of compiled floating point code.

Interestingly enough, this refactoring has shaved several hundred lines of C code from the runtime. It gave me a chance to review the C code, clean it up, and remove some redundancies.

Once everything is stable and I fix some minor bugs on my to do list, I will release 0.86. I will not implement a growable data heap until 0.87, however with this refactoring it has become much easier to do.

3 comments:

Anonymous said...

Slava,
Have a look at how Icon/UnIcon handle memory management. It has been a while since I looked at it, but I do recall that they handled the GC and allocation with a single call.

Basically, the call requested an amount x which, if the memory manager couldn't give you, would automatically invoke the GC. To ensure that C code didn't hold pointers across a collection, they used a specific protocol.

I have written some very memory intensive programs and they have still been pretty quick.

This meant that they didn't have to worry about pointers getting lost and deallocating memory that was still in use. The dynamic allocation of more memory was also handled in the same code (AFAIR).

If you have a look, it may save you a bit of time.

regards

Bruce Rennie
(God's Own Country Downunder)

Slava Pestov said...

The C side of the refactoring is done now anyway. All that remains is to update the compiler. Basically, manually recording roots is not a problem for me at all, because there is so little C code.

Anonymous said...

Slava,

If any additional changes are made to the C code, how detailed is your description of the memory allocation and GC calling protocol you use?

regards

Bruce Rennie
(God's Own Country Downunder)