
Floorplan: Spatial Layout in Memory Management Systems

Samuel Z. Guyer and Karl Cronburg — GPCE 2019.

Goals of this talk

  • Familiarize you with Floorplan layout operators
  • Motivate customized memory managers as a domain lacking in language support
  • Not covered: a tutorial on Floorplan syntax, the full compilation semantics, or performance results — read the paper for those

Garbage collection: the good

Modern GCs are generational because most programs have two kinds of objects:

  • Applications exhibit high allocation rates and objects die young — lots of ephemeral data
  • Long-lived objects survive multiple minor GCs — small to moderate amounts of enduring data

Why customize: three ways GC performance can fail

On non-generational and specialized workloads, GC performance can degrade in three ways:

  • Way 1: Object sizes & layout
  • Way 2: Object lifetimes & allocation patterns
  • Way 3: Locality and access patterns

Way 1: Object sizes — streaming science

Homogeneous object sizes limit window size in streaming algorithms, constraining accuracy. For example, k-means accuracy correlates with available memory and inversely with runtime. A Rust sensor message type:

type Mac  = [u8; 6];
type Data = [u8; 13];
pub struct Msg { ptr: Addr }
impl Msg {
    const SIZE: usize = Self::ID_SIZE + Self::POWER_SIZE
        + Self::MAC_SIZE + Self::EPOCH_SIZE + Self::DATA_SIZE;
    const ID_OFFSET:    usize = 0;
    const ID_SIZE:      usize = size_of::<u32>(); // 4-byte sensor id
    // ...
}

Way 1: Object sizes — JVM runtime code

JVM runtime code has ad hoc offset calculations scattered across the codebase:

int SCALAR_HEADER_SIZE = JAVA_HEADER_BYTES + OTHER_HEADER_BYTES;
int ARRAY_HEADER_SIZE  = SCALAR_HEADER_SIZE + ARRAY_LENGTH_BYTES;
Offset TIB_OFFSET      = JAVA_HEADER_OFFSET;
Offset STATUS_OFFSET   = TIB_OFFSET.plus(STATUS_BYTES);

A bug in one of these offset calculations was discovered and fixed in JikesRVM in 2018 (PR on GitHub).

Way 1: Floorplan specification

A Floorplan specification for the sensor message layout above:

Msg -> seq {
    id:    4 bytes,   // Sensor identifier
    epoch: 8 bytes,   // Unix epoch time
    Mac,              // Bluetooth MAC address
    power: 1 bytes,   // RSSI
    data:  13 bytes,  // Portion of BLE packet data
}
Mac -> 6 bytes

The specification makes the layout explicit and checkable, replacing the ad hoc arithmetic currently spread across source files.

Way 2: Object lifetimes — sliding window workloads

Temporal sliding-window data structures have non-generational allocation patterns: new entries arrive, old ones are evicted. A naïve FIFO linked list incurs O(n) cost per GC. What we want is a GC that acts like a ring buffer in the common case (O(1) allocation and eviction) but still supports arbitrary eviction order.

Way 2: Object lifetimes — block allocator

A Floorplan specification for a heap that aligns Msg objects within blocks:

Heap  -> # Block
Block -> @|2^21 bytes|@ -> # Msg
         * BYTES_IN_MSG;

This specification constrains how many Msg objects fit per block, enabling the GC to collect entire blocks (ring-buffer style) rather than individual objects.

What we do currently — and its costs

Today's approaches to specialized memory layout:

  • Apache Spark: hides individual data entries from the GC by managing byte arrays manually, at the cost of deserialization overhead
  • NumPy: memory-maps out-of-core data, but the format is documented only in prose comments
  • HDF5: a general-purpose layout framework, but adoption is slow across ecosystems

The common pattern: layout assumptions are either implicit (scattered arithmetic), verbal (comments), or framework-specific (not portable).

Way 3: Access patterns — bitmaps

A lookup-table for Msg addresses, accessed through address types generated from a Floorplan specification:

// Computes the address of the given Msg ptr's lookup table
// entry, and the mask for accessing the appropriate bit.
#[inline(always)]
fn table_addr(ptr: MsgAddr) -> (BitsAddr, BitsMask) {
    let block: BlockAddr = ptr.get_containing_BlockAddr();
    let msgs:  ObjsAddr  = block.msgs();
    let idx:   usize     = msgs.get_idx_of(ptr);
    // ...
}

Semantics not covered in this talk

  • Byte order
  • Macros
  • Architectural constants
  • Performance characteristics
  • Reduction semantics defining a Floorplan type
  • Compiler implementation details
  • Coq proofs verifying certain language-level properties

Tradeoffs of custom allocators

Writing a custom allocator introduces costs that Floorplan helps manage but does not eliminate:

  • New spatially optimized data structures must be written with custom collection in mind (Floorplan handles the types, not the algorithms)
  • Bookkeeping burden for each custom memory manager
  • Custom interfaces needed for memory debuggers (Valgrind, GDB, Purify)
  • Sunk cost: custom allocators are difficult to write and maintain
  • Risk of pre-emptively optimizing microbenchmarks that are not actually representative

Additional slides (related work)

Heap Layers

Heap Layers is a C++ framework for composing allocators from reusable layer objects, described in Berger et al. Floorplan operates at a higher level of abstraction — specifying spatial structure rather than composing allocator behaviors.

GHC block descriptors

GHC's block allocator uses a highly optimized lookup table: Bdescr(p) = (((p & (2^m - 1)) >> k) << d) | (p & ~(2^m - 1)). This implicitly encodes a spatial layout assumption that Floorplan would make explicit. See the GHC wiki.

PADS-Haskell: layouts for parsing

PADS describes file layouts declaratively, generating parsers and printers automatically. Floorplan describes heap layouts. The two approaches are complementary — PADS parses external data into heap structures whose layout Floorplan can specify.

newtype MapsFile = MapsFile ([Line Region] terminator EOF)

data RegionName =
    Heap  "[heap]"
  | Stack "[stack]"
  | VDSO  "[vdso]"
  | VVAR  "[vvar]"
  | VSyscall "[vsyscall]"