Garbage collector design

Garbage collector design#Abstract#

The main garbage collection algorithm used by CPython is reference counting. The basic idea isthat CPython counts how many different places there are that have a reference to anobject. Such a place could be another object, or a global (or static) C variable, ora local variable in some C function. When an object’s reference count becomes zero,the object is deallocated. If it contains references to other objects, theirreference counts are decremented. Those other objects may be deallocated in turn, ifthis decrement makes their reference count become zero, and so on. The referencecount field can be examined using the sys.getrefcount function (notice that thevalue returned by this function is always 1 more as the function also has a referenceto the object when called):

>>> x = object()>>> sys.getrefcount(x)2>>> y = x>>> sys.getrefcount(x)3>>> del y>>> sys.getrefcount(x)2

The main problem with the reference counting scheme is that it does not handle referencecycles. For instance, consider this code:

>>> container = []>>> container.append(container)>>> sys.getrefcount(container)3>>> del container

In this example, container holds a reference to itself, so even when we removeour reference to it (the variable “container”) the reference count never falls to 0because it still has its own internal reference. Therefore it would never becleaned just by simple reference counting. For this reason some additional machineryis needed to clean these reference cycles between objects once they becomeunreachable. This is the cyclic garbage collector, usually called just GarbageCollector (GC), even though reference counting is also a form of garbage collection.

Starting in version 3.13, CPython contains two GC implementations:

The default build implementation relies on the global interpreterlock for thread safety.

The free-threaded build implementation pauses other executing threads whenperforming a collection for thread safety.

Both implementations use the same basic algorithms, but operate on differentdata structures. The Differences between GC implementations section summarizes thedifferences between the two GC implementations.

Memory layout and object structure#

The garbage collector requires additional fields in Python objects to supportgarbage collection. These extra fields are different in the default and thefree-threaded builds.

GC for the default build#

Normally the C structure supporting a regular Python object looks as follows:

object -----> +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ \ |ob_refcnt | | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyObject_HEAD |*ob_type| | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ / | ... |

In order to support the garbage collector, the memory layout of objects is alteredto accommodate extra information before the normal layout:

+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ \ |*_gc_next | | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyGC_Head |*_gc_prev | |object -----> +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ / |ob_refcnt | \ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyObject_HEAD |*ob_type| | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ / | ... |

In this way the object can be treated as a normal python object and when the extrainformation associated to the GC is needed the previous fields can be accessed by asimple type cast from the original object: ((PyGC_Head *)(the_object)-1).

As is explained later in the Optimization: reusing fields to save memory section,these two extra fields are normally used to keep doubly linked lists of all theobjects tracked by the garbage collector (these lists are the GC generations, more onthat in the Optimization: generations section), but they are alsoreused to fulfill other purposes when the full doubly linked list structure is notneeded as a memory optimization.

Doubly linked lists are used because they efficiently support most frequently required operations. Ingeneral, the collection of all objects tracked by GC are partitioned into disjoint sets, each in its owndoubly linked list. Between collections, objects are partitioned into “generations”, reflecting howoften they’ve survived collection attempts. During collections, the generation(s) being collectedare further partitioned into, e.g., sets of reachable and unreachable objects. Doubly linked listssupport moving an object from one partition to another, adding a new object, removing an objectentirely (objects tracked by GC are most often reclaimed by the refcounting system when GCisn’t running at all!), and merging partitions, all with a small constant number of pointer updates.With care, they also support iterating over a partition while objects are being added to - andremoved from - it, which is frequently required while GC is running.

GC for the free-threaded build#

In the free-threaded build, Python objects contain a 1-byte fieldob_gc_bits that is used to track garbage collection related state. Thefield exists in all objects, including ones that do not support cyclicgarbage collection. The field is used to identify objects that are trackedby the collector, ensure that finalizers are called only once per object,and, during garbage collection, differentiate reachable vs. unreachable objects.

object -----> +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ \ | ob_tid| | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | | pad | ob_mutex | ob_gc_bits | ob_ref_local| | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | PyObject_HEAD | ob_ref_shared| | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ | |*ob_type| | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ / | ... |

Note that not all fields are to scale. pad is two bytes, ob_mutex andob_gc_bits are each one byte, and ob_ref_local is four bytes. Theother fields, ob_tid, ob_ref_shared, and ob_type, are allpointer-sized (i.e., eight bytes on a 64-bit platform).

The garbage collector also temporarily repurposes the ob_tid (thread ID)and ob_ref_local (local reference count) fields for other purposes duringcollections.

C APIs#

Specific APIs are offered to allocate, deallocate, initialize, track, and untrackobjects with GC support. These APIs can be found in the Garbage Collector C APIdocumentation.

Apart from this object structure, the type object for objects supporting garbagecollection must include the Py_TPFLAGS_HAVE_GC in its tp_flags slot andprovide an implementation of the tp_traverse handler. Unless it can be proventhat the objects cannot form reference cycles with only objects of its type or unlessthe type is immutable, a tp_clear implementation must also be provided.

Identifying reference cycles#

The algorithm that CPython uses to detect those reference cycles isimplemented in the gc module. The garbage collector only focuseson cleaning container objects (i.e. objects that can contain a referenceto one or more objects). These can be arrays, dictionaries, lists, customclass instances, classes in extension modules, etc. One could think thatcycles are uncommon but the truth is that many internal references needed bythe interpreter create cycles everywhere. Some notable examples:

Exceptions contain traceback objects that contain a list of frames thatcontain the exception itself.

Module-level functions reference the module’s dict (which is needed to resolve globals),which in turn contains entries for the module-level functions.

Instances have references to their class which itself references its module, and the modulecontains references to everything that is inside (and maybe other modules)and this can lead back to the original instance.

When representing data structures like graphs, it is very typical for them tohave internal links to themselves.

To correctly dispose of these objects once they become unreachable, they needto be identified first. To understand how the algorithm works, let’s takethe case of a circular linked list which has one link referenced by avariable A, and one self-referencing object which is completelyunreachable:

>>> import gc>>> class Link:...def __init__(self, next_link=None):...self.next_link = next_link>>> link_3 = Link()>>> link_2 = Link(link_3)>>> link_1 = Link(link_2)>>> link_3.next_link = link_1>>> A = link_1>>> del link_1, link_2, link_3>>> link_4 = Link()>>> link_4.next_link = link_4>>> del link_4# Collect the unreachable Link object (and its .__dict__ dict).>>> gc.collect()2

The GC starts with a set of candidate objects it wants to scan. In thedefault build, these “objects to scan” might be all container objects or asmaller subset (or “generation”). In the free-threaded build, the collectoralways operates scans all container objects.

The objective is to identify all the unreachable objects. The collector doesthis by identifying reachable objects; the remaining objects must beunreachable. The first step is to identify all of the “to scan” objects thatare directly reachable from outside the set of candidate objects. Theseobjects have a refcount larger than the number of incoming references fromwithin the candidate set.

Every object that supports garbage collection will have an extra referencecount field initialized to the reference count (gc_ref in the figures)of that object when the algorithm starts. This is because the algorithm needsto modify the reference count to do the computations and in this way theinterpreter will not modify the real reference count field.

The GC then iterates over all containers in the first list and decrements by one thegc_ref field of any other object that container is referencing. Doingthis makes use of the tp_traverse slot in the container class (implementedusing the C API or inherited by a superclass) to know what objects are referenced byeach container. After all the objects have been scanned, only the objects that havereferences from outside the “objects to scan” list will have gc_ref > 0.

Notice that having gc_ref == 0 does not imply that the object is unreachable.This is because another object that is reachable from the outside (gc_ref > 0)can still have references to it. For instance, the link_2 object in our exampleended having gc_ref == 0 but is referenced still by the link_1 object thatis reachable from the outside. To obtain the set of objects that are reallyunreachable, the garbage collector re-scans the container objects using thetp_traverse slot; this time with a different traverse function that marks objects withgc_ref == 0 as “tentatively unreachable” and then moves them to thetentatively unreachable list. The following image depicts the state of the lists in amoment when the GC processed the link_3 and link_4 objects but has notprocessed link_1 and link_2 yet.

Then the GC scans the next link_1 object. Because it has gc_ref == 1,the gc does not do anything special because it knows it has to be reachable (and isalready in what will become the reachable list):

When the GC encounters an object which is reachable (gc_ref > 0), it traversesits references using the tp_traverse slot to find all the objects that arereachable from it, moving them to the end of the list of reachable objects (wherethey started originally) and setting its gc_ref field to 1. This is what happensto link_2 and link_3 below as they are reachable from link_1. From thestate in the previous image and after examining the objects referred to by link_1the GC knows that link_3 is reachable after all, so it is moved back to theoriginal list and its gc_ref field is set to 1 so that if the GC visits it again,it will know that it’s reachable. To avoid visiting an object twice, the GC marks allobjects that have already been visited once (by unsetting the PREV_MASK_COLLECTINGflag) so that if an object that has already been processed is referenced by some otherobject, the GC does not process it twice.

Notice that an object that was marked as “tentatively unreachable” and was latermoved back to the reachable list will be visited again by the garbage collectoras now all the references that that object has need to be processed as well. Thisprocess is really a breadth first search over the object graph. Once all the objectsare scanned, the GC knows that all container objects in the tentatively unreachablelist are really unreachable and can thus be garbage collected.

Pragmatically, it’s important to note that no recursion is required by any of this,and neither does it in any other way require additional memory proportional to thenumber of objects, number of pointers, or the lengths of pointer chains. Apart fromO(1) storage for internal C needs, the objects themselves contain all the storagethe GC algorithms require.

Why moving unreachable objects is better#

It sounds logical to move the unreachable objects under the premise that most objectsare usually reachable, until you think about it: the reason it pays isn’t actuallyobvious.

Suppose we create objects A, B, C in that order. They appear in the young generationin the same order. If B points to A, and C to B, and C is reachable from outside,then the adjusted refcounts after the first step of the algorithm runs will be 0, 0,and 1 respectively because the only reachable object from the outside is C.

When the next step of the algorithm finds A, A is moved to the unreachable list. Thesame for B when it’s first encountered. Then C is traversed, B is moved back tothe reachable list. B is eventually traversed, and then A is moved back to the reachablelist.

So instead of not moving at all, the reachable objects B and A are each moved twice.Why is this a win? A straightforward algorithm to move the reachable objects insteadwould move A, B, and C once each. The key is that this dance leaves the objects inorder C, B, A - it’s reversed from the original order. On all subsequent scans,none of them will move. Since most objects aren’t in cycles, this can save anunbounded number of moves across an unbounded number of later collections. The onlytime the cost can be higher is the first time the chain is scanned.

Destroying unreachable objects#

Once the GC knows the list of unreachable objects, a very delicate process startswith the objective of completely destroying these objects. Roughly, the processfollows these steps in order:

Handle and clear weak references (if any). Weak references to unreachable objectsare set to None. If the weak reference has an associated callback, the callbackis enqueued to be called once the clearing of weak references is finished. We onlyinvoke callbacks for weak references that are themselves reachable. If both the weakreference and the pointed-to object are unreachable we do not execute the callback.This is partly for historical reasons: the callback could resurrect an unreachableobject and support for weak references predates support for object resurrection.Ignoring the weak reference’s callback is fine because both the object and the weakrefare going away, so it’s legitimate to say the weak reference is going away first.

If an object has legacy finalizers (tp_del slot) move it to thegc.garbage list.

Call the finalizers (tp_finalize slot) and mark the objects as alreadyfinalized to avoid calling finalizers twice if the objects are resurrected orif other finalizers have removed the object first.

Deal with resurrected objects. If some objects have been resurrected, the GCfinds the new subset of objects that are still unreachable by running the cycledetection algorithm again and continues with them.

Call the tp_clear slot of every object so all internal links are broken andthe reference counts fall to 0, triggering the destruction of all unreachableobjects.

Optimization: generations#

In order to limit the time each garbage collection takes, the GCimplementation for the default build uses a popular optimization:generations. The main idea behind this concept is the assumption that mostobjects have a very short lifespan and can thus be collected soon after theircreation. This has proven to be very close to the reality of many Pythonprograms as many temporary objects are created and destroyed very quickly.

To take advantage of this fact, all container objects are segregated intothree spaces/generations. Every newobject starts in the first generation (generation 0). The previous algorithm isexecuted only over the objects of a particular generation and if an objectsurvives a collection of its generation it will be moved to the next one(generation 1), where it will be surveyed for collection less often. Ifthe same object survives another GC round in this new generation (generation 1)it will be moved to the last generation (generation 2) where it will besurveyed the least often.

The GC implementation for the free-threaded build does not use multiplegenerations. Every collection operates on the entire heap.

In order to decide when to run, the collector keeps track of the number of objectallocations and deallocations since the last collection. When the number ofallocations minus the number of deallocations exceeds threshold_0,collection starts. Initially only generation 0 is examined. If generation 0 hasbeen examined more than threshold_1 times since generation 1 has beenexamined, then generation 1 is examined as well. With generation 2,things are a bit more complicated; see Collecting the oldest generation formore information. These thresholds can be examined using thegc.get_threshold() function:

>>> import gc>>> gc.get_threshold()(700, 10, 10)

The content of these generations can be examined using thegc.get_objects(generation=NUM) function and collections can be triggeredspecifically in a generation by calling gc.collect(generation=NUM).

>>> import gc>>> class MyObj:... pass...# Move everything to the last generation so it's easier to inspect# the younger generations.>>> gc.collect()0# Create a reference cycle.>>> x = MyObj()>>> x.self = x# Initially the object is in the youngest generation.>>> gc.get_objects(generation=0)[..., , ...]# After a collection of the youngest generation the object# moves to the next generation.>>> gc.collect(generation=0)0>>> gc.get_objects(generation=0)[]>>> gc.get_objects(generation=1)[..., , ...]Collecting the oldest generation#

In addition to the various configurable thresholds, the GC only triggers a fullcollection of the oldest generation if the ratio long_lived_pending / long_lived_totalis above a given value (hardwired to 25%). The reason is that, while “non-full”collections (i.e., collections of the young and middle generations) will alwaysexamine roughly the same number of objects (determined by the aforementionedthresholds) the cost of a full collection is proportional to the totalnumber of long-lived objects, which is virtually unbounded. Indeed, it hasbeen remarked that doing a full collection every of objectcreations entails a dramatic performance degradation in workloads which consistof creating and storing lots of long-lived objects (e.g. building a large listof GC-tracked objects would show quadratic performance, instead of linear asexpected). Using the above ratio, instead, yields amortized linear performancein the total number of objects (the effect of which can be summarized thusly:“each full garbage collection is more and more costly as the number of objectsgrows, but we do fewer and fewer of them”).

Optimization: reusing fields to save memory#

In order to save memory, the two linked list pointers in every object with GCsupport are reused for several purposes. This is a common optimization knownas “fat pointers” or “tagged pointers”: pointers that carry additional data,“folded” into the pointer, meaning stored inline in the data representing theaddress, taking advantage of certain properties of memory addressing. This ispossible as most architectures align certain types of datato the size of the data, often a word or multiple thereof. This discrepancyleaves a few of the least significant bits of the pointer unused, which can beused for tags or to keep other information – most often as a bit field (eachbit a separate tag) – as long as code that uses the pointer masks out thesebits before accessing memory. E.g., on a 32-bit architecture (for bothaddresses and word size), a word is 32 bits = 4 bytes, so word-alignedaddresses are always a multiple of 4, hence end in 00, leaving the last 2 bitsavailable; while on a 64-bit architecture, a word is 64 bits = 8 bytes, soword-aligned addresses end in 000, leaving the last 3 bits available.

The CPython GC makes use of two fat pointers that correspond to the extra fieldsof PyGC_Head discussed in the Memory layout and object structure section:

Warning

Because the presence of extra information, “tagged” or “fat” pointers cannot bedereferenced directly and the extra information must be stripped off beforeobtaining the real memory address. Special care needs to be taken withfunctions that directly manipulate the linked lists, as these functionsnormally assume the pointers inside the lists are in a consistent state.

The _gc_prev field is normally used as the “previous” pointer to maintain thedoubly linked list but its lowest two bits are used to keep the flagsPREV_MASK_COLLECTING and _PyGC_PREV_MASK_FINALIZED. Between collections,the only flag that can be present is _PyGC_PREV_MASK_FINALIZED that indicatesif an object has been already finalized. During collections _gc_prev istemporarily used for storing a copy of the reference count (gc_ref), inaddition to two flags, and the GC linked list becomes a singly linked list until_gc_prev is restored.

The _gc_next field is used as the “next” pointer to maintain the doubly linkedlist but during collection its lowest bit is used to keep theNEXT_MASK_UNREACHABLE flag that indicates if an object is tentativelyunreachable during the cycle detection algorithm. This is a drawback to using onlydoubly linked lists to implement partitions: while most needed operations areconstant-time, there is no efficient way to determine which partition an object iscurrently in. Instead, when that’s needed, ad hoc tricks (like theNEXT_MASK_UNREACHABLE flag) are employed.

Optimization: delay tracking containers#

Certain types of containers cannot participate in a reference cycle, and so donot need to be tracked by the garbage collector. Untracking these objectsreduces the cost of garbage collection. However, determining which objects maybe untracked is not free, and the costs must be weighed against the benefitsfor garbage collection. There are two possible strategies for when to untracka container:

When the container is created.

When the container is examined by the garbage collector.

As a general rule, instances of atomic types aren’t tracked and instances ofnon-atomic types (containers, user-defined objects…) are. However, sometype-specific optimizations can be present in order to suppress the garbagecollector footprint of simple instances. Some examples of native types thatbenefit from delayed tracking:

Tuples containing only immutable objects (integers, strings etc,and recursively, tuples of immutable objects) do not need to be tracked. Theinterpreter creates a large number of tuples, many of which will not surviveuntil garbage collection. It is therefore not worthwhile to untrack eligibletuples at creation time. Instead, all tuples except the empty tuple are trackedwhen created. During garbage collection it is determined whether any survivingtuples can be untracked. A tuple can be untracked if all of its contents arealready not tracked. Tuples are examined for untracking in all garbage collectioncycles. It may take more than one cycle to untrack a tuple.

Dictionaries containing only immutable objects also do not need to be tracked.Dictionaries are untracked when created. If a tracked item is inserted into adictionary (either as a key or value), the dictionary becomes tracked. During afull garbage collection (all generations), the collector will untrack any dictionarieswhose contents are not tracked.

The garbage collector module provides the Python function is_tracked(obj), which returnsthe current tracking status of the object. Subsequent garbage collections may change thetracking status of the object.

>>> gc.is_tracked(0)False>>> gc.is_tracked("a")False>>> gc.is_tracked([])True>>> gc.is_tracked({})False>>> gc.is_tracked({"a": 1})False>>> gc.is_tracked({"a": []})TrueDifferences between GC implementations#

This section summarizes the differences between the GC implementation in thedefault build and the implementation in the free-threaded build.

The default build implementation makes extensive use of the PyGC_Head datastructure, while the free-threaded build implementation does not use thatdata structure.

The default build implementation stores all tracked objects in a doublylinked list using PyGC_Head. The free-threaded build implementationinstead relies on the embedded mimalloc memory allocator to scan the heapfor tracked objects.

The default build implementation uses PyGC_Head for the unreachableobject list. The free-threaded build implementation repurposes theob_tid field to store a unreachable objects linked list.

The default build implementation stores flags in the _gc_prev field ofPyGC_Head. The free-threaded build implementation stores these flagsin ob_gc_bits.

The default build implementation relies on the global interpreter lockfor thread safety. The free-threaded build implementation has two “stop theworld” pauses, in which all other executing threads are temporarily paused sothat the GC can safely access reference counts and object attributes.

The default build implementation is a generational collector. Thefree-threaded build is non-generational; each collection scans the entireheap.

Keeping track of object generations is simple and inexpensive in the defaultbuild. The free-threaded build relies on mimalloc for finding trackedobjects; identifying “young” objects without scanning the entire heap wouldbe more difficult.

Document History

Pablo Galindo Salgado - Original Author