Light
code header

Debugging Memory Safety Errors in Memory Managers

Karl Cronburg — NEPLS 2017. — Back to overviewPDF slides

The problem

Memory managers are easy to write incorrectly:

I set my GC header bit to zero.
Other runtime system code runs.
My GC header bit is now set to one.

The distinction matters: this is a memory safety error, not a static bug. Bugs are statically possible violations; errors are runtime violations of dynamic semantics. A bug need not produce an error, and an error need not trace to an obvious bug.

Why current solutions fall short

The standard toolkit — printf debugging, code editors, static control-flow analysis, mailing lists — addresses bugs, not errors. Memory safety errors in memory managers require:

  • Knowing the current configuration of memory
  • Knowing what code is running
  • Knowing what memory that code is allowed to access

Documentation is a programming-language concern, not just a comment in a source file. Weeks of manual debugging typically follow from the false assumption that all errors are the result of statically-catchable bugs.

PL approach: every problem is a language

Frame the problem formally. Define:

  • M(VM) — the set of valid memory configurations: Is address 0x0000…0xFFFF ∈ M(VM)?
  • C(VM) — the set of valid (code, memory-type) pairs: Is (Code(ip), MemoryType(esi)) ∈ C(VM)?

Specify M(VM) and C(VM) explicitly, then check membership dynamically.

Prior work: dynamic binary instrumentation (Permchecker)

  • Specify M(VM) and C(VM) imperatively, in terms of the runtime
  • Half-portable — requires VM-specific alloc() and free() hooks
  • Shared Permchecker API across VMs

Ongoing work: Floorplan (FLP) specification language

Memory manager memory has three structural properties that Permchecker exploits:

  • Layered: Page → Space → Block → Cell
  • Bifurcated: alloc-path vs. free-path
  • Owned: metadata at each layer, free chunks at each layer

Floorplan (FLP) is a language for specifying M(VM) separately from the memory manager implementation — making the layout explicit so tools can reason about it.

FLP toolchain

  • Parser implemented in Haskell
  • Transpiler in progress: tiles FLP layers onto C structs or C++ classes
  • Target implementations: HotSpot, JikesRVM, GHC
  • Pin and Valgrind DBI backend
  • Take-away: a language for specifying memory layout capable of describing a wide variety of VM implementations (Java, Haskell, JavaScript, …)

Future work: specifying C(VM)

Integrate permission specifications with the host language or a framework:

  • Target C++ class methods (portability: lowest common denominator)
  • Target Rust (safety: stronger static guarantees)
  • Integrate with LLVM (automation)