Garbage collection, the quirky name used for automatic storage management, might well be called memory recycling if it were invented today. Indeed, it has a venerable history, nearly as long as that of computing itself, extending back to early LISP implementations, with papers appearing from 1960 onward. GC, as it is affectionately known, also developed a reputation for being slow and requiring far more memory than explicit memory management. As if that were not enough, most GC algorithms would wait for the designated memory to fill, then stop the program, collect, and restart the program, introducing pauses into the primary computation.
Proponents of GC have persuasive software engineering arguments on their side: automatic storage reclamation simplifies programming and reduces errors, thus increasing programmer productivity. Ultimately, increasing computer speed and memory capacity made GC a reasonable choice for a much wider range of systems. Java brought it into the mainstream, where it has been quite successful. We have even reached the point where C++, originally posted with signs reading "No GC here!", now offers optional support for it.
But the holy grail for automatic storage management has been to achieve GC for real-time systems. Real-time GC is challenging for several reasons. One is that real-time systems cannot tolerate large pauses. This requires the collector either to be incremental at a fine grain or to run concurrently with the program (called the mutator in GC parlance because it nefariously changes pointers around while the GC is at work). I recall that when I was a graduate student there seemed to be a veritable industry around presenting concurrent GC algorithms in this very publication, with proofs of correctness. In fact, these proofs were offered because they were among the hardest correctness proofs researchers could conceive. Needless to say, this suggests how difficult it is to get concurrent GC algorithms right, let alone to translate them into correct implementations.
Concurrent GC alone is not enough to achieve real-time storage management. You also need provable bounds on the time to collect and the maximum memory needed by your program running under this scheme. First, the collector cannot fall behind the mutator: it must be able to collect disused memory and recycle it for mutator use at least as fast as the mutator allocates memory units. An implication is that unless you can impose an artificial bound on the mutator’s allocation rate, your collector must not only be concurrent but also fast.
While people have developed a wide range of GC algorithms, we are concerned here with ones that start from program variables (roots) and follow pointers from object to object, finding all the reachable objects. Such tracing collectors work in cycles: from the roots, trace the reachable objects, reclaim what is left, then go on to the next cycle.
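As a rough illustration of one such cycle, here is a minimal sketch in Python. It is a plain stop-the-world mark-and-sweep over a toy heap, not the concurrent hardware collector discussed in the paper; the heap representation (a dictionary mapping an object name to the names it points to) and the function names `trace` and `collect` are assumptions made up for this example.

```python
def trace(roots, heap):
    """Return the set of objects reachable from the roots.

    `heap` is a hypothetical toy heap: a dict mapping each object name
    to the list of object names it holds pointers to.
    """
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue  # already visited; this also makes pointer cycles safe
        marked.add(obj)
        stack.extend(heap[obj])  # follow every outgoing pointer
    return marked


def collect(roots, heap):
    """Run one collection cycle: trace from the roots, then reclaim the rest."""
    live = trace(roots, heap)
    garbage = sorted(obj for obj in heap if obj not in live)
    for obj in garbage:
        del heap[obj]  # "reclaim" the unreachable object
    return garbage


# "a" is the only root; "d" and "e" point at each other but are unreachable.
heap = {"a": ["b"], "b": ["c"], "c": [], "d": ["e"], "e": ["d"]}
reclaimed = collect(["a"], heap)
```

In this toy run, `d` and `e` are reclaimed even though they reference each other, because nothing reachable from the root points to them.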
Though it is a bit of a misnomer, GC developers also call reachable objects live, and their total volume at a given point in execution is the live size. In addition to a fast enough concurrent GC algorithm, for real-time GC you need not only a hard bound on the program’s maximum live size—probably needed for real-time mutator behavior anyway—but also a bound on the amount of garbage (unreachable objects) that will accumulate during a collection cycle.
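To see how these two bounds combine, consider a deliberately simplified model (an assumption for illustration, not the analysis in the paper): the live size never exceeds `max_live`, the mutator allocates at most `alloc_rate` units per unit of time, a collection cycle takes at most `cycle_time`, and every object is reclaimed within `cycles_to_reclaim` cycles of becoming unreachable. Everything in the heap at a given moment was then either live that many cycles ago or allocated since, which gives the bound computed below. All names and numbers are hypothetical.

```python
def alloc_per_cycle(alloc_rate, cycle_time):
    """Worst-case units allocated by the mutator during one collection cycle."""
    return alloc_rate * cycle_time


def heap_bound(max_live, alloc_rate, cycle_time, cycles_to_reclaim=2):
    """Upper bound on heap size under the simplified model described above.

    Assumption: an object that dies is reclaimed within `cycles_to_reclaim`
    cycles, so the heap holds at most what was live that long ago plus
    everything allocated in the meantime.
    """
    return max_live + cycles_to_reclaim * alloc_per_cycle(alloc_rate, cycle_time)


# Hypothetical numbers: 1000 live units, 50 units allocated per ms, 4 ms per cycle.
bound = heap_bound(max_live=1000, alloc_rate=50, cycle_time=4)
```

With these made-up numbers the mutator can allocate 200 units per cycle, so the bound is 1000 + 2 × 200 = 1400 units. The model also shows why collector speed matters: a shorter cycle directly shrinks the memory that must be set aside for garbage awaiting collection.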
It is thus quite a tour de force that the authors of the following paper have built a provably correct real-time collector for reconfigurable hardware (field-programmable gate arrays, or FPGAs). How can this be? It turns out the FPGA setting makes the problem simpler in some ways than is the case with software GC running on stock processors. They can reasonably impose simple uniformity on the memory and its layout; they can exploit single-clock read-modify-write steps; and perhaps most importantly they have, via dual-ported memory, completely concurrent memory access. This all leads to one of the cleanest and perhaps most understandable implementations of a concurrent GC that has ever been presented. The other prerequisites for real-time collection also follow easily. It is difficult to find the right word to express the feeling I get seeing such a sophisticated algorithmic idea reduced to such a straightforward hardware implementation. "Cool!" will have to suffice.
This work does not appear to offer a direct path to simple real-time GC support for software implementations on stock hardware. At the same time, it helps to inform that work through its contribution to knowledge about real-time GC beyond its benefits to practice. In particular, it shows how tight a bound we can achieve on the total space needed for a no-pause collector to run. I feel certain it will inspire creative approaches that will help bring garbage collection into acceptance in almost every corner of the system implementation space.