Skip to content

Memory Model and Garbage Collection

Andrew Appel edited this page Nov 17, 2023 · 6 revisions

Value Representation

CertiCoq uses the same memory representation for values of inductive types as the OCaml compiler [1].

0-ary constructors have an unboxed representation which is unique for each unboxed constructor of an inductive type. The last bit is set to 1 so that the garbage collector can distinguish them from pointers.

N-ary constructors with N>0, are represented as pointers to the second word of a heap allocated block. A block consists of a header followed by N words, representing the arguments of the constructor. The header uniquely identifies constructors of an inductive type and includes metadata about the size and the garbage collection state.

Note that the arity of a constructor does not include the parameters of the inductive type. These are erased during compilation.

The CertiCoq Heap

CertiCoq allocates values in the CertiCoq heap. The CertiCoq heap is an array of linear memory regions. Each linear region (i.e., generation) has twice the size of the previous one.

Garbage Collection

CertiCoq uses a generational garbage collector[2] to collect unreachable objects from the CertiCoq heap. The generated C code performs a garbage collection test to determine if there is enough memory for all the allocations of the program until the next garbage collection test, and invokes the garbage collector if there is not. Garbage collection tests are placed upon function entry and after the return of (non-tail) function calls in the body of a function.

Shadow Stack

The garbage collector determines the live portion of the heap by following the chain of memory references from the set of live roots, which is the set of live variables (parameters and local variables) at a particular point in the execution of the program. Since CertiCoq does not have access to the runtime stack of the C program, the generated C code maintains a shadow stack which is a linked list threaded through the call stack of the C program. Each element of this list is a shadow stack frame which is defined as local variable in the stack frame of each active function. The shadow stack scheme works in the following way. Each activation of a function will allocate in its stack a new shadow stack frame and will link it to the shadow stack frame of the caller, which is passed to the function as a parameter. Before each intermediate function call and garbage-collector call, we push the variables that are live (i.e., variables that occur free in the rest of the program) in the current shadow stack frame and pop them upon function return. At the end of the function body, before the function returns or performs a tail call, the current stack frame is discarded. When the garbage collector is called, it traverses the linked shadow stack frames in order to find the set of live roots.

[1] Memory Representation of Values, in Real World OCaml, (Chapter 20, in 1st edition 2014; Chapter 23 in 2nd edition 2022). J. Hickey, A. Madhavapeddy, and Y. Minsky.

[2] Certifying Graph-Manipulating C Programs via Localizations within Data Structures. Shengyi Wang, Qinxiang Cao, Anshuman Mohan, and Aquinas Hobor. OOPSLA: Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA 2019), Athens, Greece, October 2019.