Light
code header

Garbology: A Study of How Objects Die

Raoul Veroy, Samuel Guyer, and Karl Cronburg — Onward! 2017. — Back to overviewPDF slides

Overview

  • Empirical study of garbage collection
  • Focus on how programs dispose of objects
  • Key finding: death patterns are structured and predictable
  • Goal: a new approach to GC algorithm design based on object death patterns

GC is easy (in theory)

  • Theory: find all reachable objects using a graph search
  • Practice: good performance is hard
  • Key insight: program behavior is not random — there are patterns we can exploit
  • Examples: generational, garbage first, age-based, type-directed GC algorithms

Finding patterns

  • Prior work on object demographics: classify lifetimes into short, medium, long
  • Lots of work on allocation patterns and heap structure
  • Conspicuously missing: how do objects die? What program actions create garbage?

The hard problem: pinpointing death

  • There is no "free" operation — how do we know exactly when an object dies?
  • Tool: GC tracing — record every heap event (allocations, pointer updates, stack references)
  • Breakthrough: the Merlin algorithm computes precise death times for all objects

Merlin

  • Key insight: time-stamp objects when they might become unreachable
    o.f = p;
    o = p.next();
  • At GC time, the last timestamp is the time of death — discovered in retrospect
  • Problem: stack updates are very frequent
  • Compromise: timestamp stack refs at each allocation site

GCTrace tool

  • Built in JikesRVM: timestamps heap modifications online, timestamps stack at allocation sites
  • Good: new insights — unprecedented precision
  • Issue: time is measured in allocations, which is coarse; a lot of code can run between allocations

Elephant Tracks: a better clock

  • Key idea: tick the clock at method entry/exit — 10–20× more frequent than allocations
  • Only need to timestamp local refs
  • Bonus: time means something — each tick is a calling context; death events can be localized to a place in the program

Methodology

  • Run Elephant Tracks on benchmarks (caveat: the usual suspects — representativeness unknown)
  • Collect program traces including precise death times
  • Analyze the trace: where do objects die and why?
  • Classify in various ways

Clusters

  • One event often kills a cluster of objects — e.g., discarding a reference to a data structure
  • Finding: clusters are usually small — programs dispose of data structures piecemeal, not wholesale

Cause of death

Three broad categories. Objects become unreachable by:

  1. Die-by-heap: a heap write
  2. Die-by-stack: a stack reference going out of scope
  3. End of program (not interesting)
  • All objects in a cluster share the same cause of death
  • Finding: many objects die by stack
  • Common pattern: remove an object from a container, process it, then drop the reference
  • Refinement: splitting the by-stack category reveals stack-after-heap — programs often disassemble data structures rather than disposing of them wholesale

Properties of clusters

  • Interesting property: cyclic garbage — a problem for reference-counting collectors
  • Compute SCCs on objects in each cluster
  • Finding: garbage cycles are very small

Lifetimes

  • Traditional measures (allocation time, coarse categories) are hard to predict
  • Finding: death sites are very stable — we can predict where an object will die even if we cannot predict when

Memory flow

  • Connect two ends: allocation site and death site — how many bytes flow between each pair?
  • Finding: a small number of allocation-to-death-site pairs account for a large fraction of memory use

Conclusion

  • Programs exhibit very strong patterns of memory use
  • Program structure strongly predicts these patterns
  • Open question: how can we exploit these patterns to design new kinds of garbage collectors?