First of all, there was a potential memory leak situation I overlooked. Suppose that there is a compiled definition in the code heap which references a very large object in the data heap, and neither the compiled definition or the large object is referenced from anywhere else. Because the code heap was only ever collected when it filled up, it was possible that this large object would never be reclaimed, and it would incur unnecessary collection cycles and memory pressure as a result. This could even result in unbounded heap growth if this was done in a loop. For example, the following test case would crash Factor even though it should have run in constant space:
: leak-step 800000 f <array> 1quotation call drop ;
: leak-loop 1000 [ leak-step ] times ;
The fix is to always collect the code heap when collecting the oldest generation. Only collecting the code heap when it fills up is simply unsound because the code heap can reference objects in the data heap. If this was not the case then collecting them independently would be okay, but it isn't.
The next thing I did was improve how allocation of large objects is handled. Previously, if a new object was too large to fit in the nursery, the entire heap would grow, and every time the heap grew it would increase the size of all generations. This is generally not what you want, because if the nursery is too large, we lose locality, and if the accumulation space is too large we waste time copying objects back and forth that should really be in tenured space.
Now, the nursery and accumulation space have a fixed size that can be changed on startup with command line switches, but never changes while Factor is running. If an attempt is made to allocate an object larger than the nursery, it is directly allocated in tenured space. Presumably, if you're making a 2 megabyte array, you're going to do something with it, and hold on top it for at least a few collection cycles, rather than discard it immediately, so it makes sense to avoid the copying altogether and stick it in tenured space. On retarded microbenchmarks this change might result in worse performance, but on more realistic workloads it should be better; having a small nursery helps with locality and avoiding the inevitable copying of the large array from the nursery to accumulation space and then to tenured space is good too.
Future improvements to the GC will include:
- Shrinking the heap when memory usage lowers and remains low for several collections (Zed Shaw experimented with implementing this but didn't finish due to lack of time).
- Allocating chunks from the OS in small increments, say 1mb, instead of allocating one large heap. This would make heap growth more efficient (no need to copy everything over) and it would also allow full use of the entire address space (and not half). However it will require rethinking Factor's card marking write barrier, which currently assumes a contiguous heap.
- Using mark/sweep/compact for the oldest generation.
- Incremental marking and sweeping.
No comments:
Post a Comment