☀ Light

code header

Garbology: A Study of How Objects Die

October 22, 2017

Raoul Veroy, Samuel Guyer, and Karl Cronburg — Onward! 2017. — Back to overview — PDF slides

Overview

Empirical study of garbage collection
Focus on how programs dispose of objects
Key finding: death patterns are structured and predictable
Goal: a new approach to GC algorithm design based on object death patterns

GC is easy (in theory)

Theory: find all reachable objects using a graph search
Practice: good performance is hard
Key insight: program behavior is not random — there are patterns we can exploit
Examples: generational, garbage first, age-based, type-directed GC algorithms

Finding patterns

Prior work on object demographics: classify lifetimes into short, medium, long
Lots of work on allocation patterns and heap structure
Conspicuously missing: how do objects die? What program actions create garbage?

The hard problem: pinpointing death

There is no "free" operation — how do we know exactly when an object dies?
Tool: GC tracing — record every heap event (allocations, pointer updates, stack references)
Breakthrough: the Merlin algorithm computes precise death times for all objects

Merlin

Key insight: time-stamp objects when they might become unreachable
```
o.f = p;
o = p.next();
```
At GC time, the last timestamp is the time of death — discovered in retrospect
Problem: stack updates are very frequent
Compromise: timestamp stack refs at each allocation site

GCTrace tool

Built in JikesRVM: timestamps heap modifications online, timestamps stack at allocation sites
Good: new insights — unprecedented precision
Issue: time is measured in allocations, which is coarse; a lot of code can run between allocations

Elephant Tracks: a better clock

Key idea: tick the clock at method entry/exit — 10–20× more frequent than allocations
Only need to timestamp local refs
Bonus: time means something — each tick is a calling context; death events can be localized to a place in the program

Methodology

Run Elephant Tracks on benchmarks (caveat: the usual suspects — representativeness unknown)
Collect program traces including precise death times
Analyze the trace: where do objects die and why?
Classify in various ways

Clusters

One event often kills a cluster of objects — e.g., discarding a reference to a data structure
Finding: clusters are usually small — programs dispose of data structures piecemeal, not wholesale

Cause of death

Three broad categories. Objects become unreachable by:

Die-by-heap: a heap write
Die-by-stack: a stack reference going out of scope
End of program (not interesting)

All objects in a cluster share the same cause of death
Finding: many objects die by stack
Common pattern: remove an object from a container, process it, then drop the reference
Refinement: splitting the by-stack category reveals stack-after-heap — programs often disassemble data structures rather than disposing of them wholesale

Properties of clusters

Interesting property: cyclic garbage — a problem for reference-counting collectors
Compute SCCs on objects in each cluster
Finding: garbage cycles are very small

Lifetimes

Traditional measures (allocation time, coarse categories) are hard to predict
Finding: death sites are very stable — we can predict where an object will die even if we cannot predict when

Memory flow

Connect two ends: allocation site and death site — how many bytes flow between each pair?
Finding: a small number of allocation-to-death-site pairs account for a large fraction of memory use

Conclusion

Programs exhibit very strong patterns of memory use
Program structure strongly predicts these patterns
Open question: how can we exploit these patterns to design new kinds of garbage collectors?