Custom Allocators: `GlobalAlloc`, `#[global_allocator]`, and Swapping in jemalloc / mimalloc


In Node.js, every {}, [], and Buffer.alloc() gets its memory from V8's managed heap or from the platform malloc underneath. You never see that machinery, never choose it, and cannot override it from JavaScript. Rust exposes it as a real, swappable interface. With a single attribute you can replace the program-wide allocator with jemalloc (designed for low fragmentation and scalable multi-threaded performance), mimalloc (a compact allocator built for speed), or your own bookkeeping logic, without touching a single Vec or Box in your code.

Quick Overview

Rust routes every heap allocation (Box, Vec, String, HashMap, Rc, and the rest) through one program-wide global allocator. By default that allocator is std::alloc::System, a thin wrapper over the platform allocator (malloc/free on Unix-like systems, HeapAlloc/HeapFree on Windows). You can replace it by writing a type that implements the unsafe GlobalAlloc trait and tagging a static instance of it with the #[global_allocator] attribute. The most common reasons to do this are performance (drop in jemalloc or mimalloc) and observability (count or cap allocations).
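To see the attribute in its smallest form, here is a sketch that names the default explicitly. Registering System as the global allocator should behave exactly like declaring nothing at all; the `build_names` helper is just illustrative filler that allocates.

```rust
use std::alloc::System;

// Spelling out the default: every Box, Vec, and String in this program is
// served by the platform allocator, exactly as if this static were absent.
#[global_allocator]
static GLOBAL: System = System;

/// Hypothetical helper that performs a few heap allocations.
fn build_names(n: usize) -> Vec<String> {
    (0..n).map(|i| format!("user-{i}")).collect()
}

fn main() {
    // Prints ["user-0", "user-1", "user-2"]
    println!("{:?}", build_names(3));
}
```

Swapping `System` for another type that implements GlobalAlloc is the only change any of the later examples make to this skeleton.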

For a TypeScript/JavaScript developer, the headline is control with zero call-site churn: you change how memory is obtained, but the rest of your program — including all the std collections — keeps working unmodified. This is something the V8 heap simply does not let you do from JavaScript.

TypeScript/JavaScript Example

In JavaScript the allocator is sealed inside the engine. The closest you get is observing memory, never replacing the allocator:

```typescript
// Node.js v22: you can MEASURE heap usage, but you cannot replace malloc.
const before = process.memoryUsage().heapUsed;

// Allocate ~8 MB of small objects on the V8 heap.
const big: { id: number }[] = [];
for (let i = 0; i < 100_000; i++) {
  big.push({ id: i });
}

const after = process.memoryUsage().heapUsed;
console.log(`heap grew by ${((after - before) / 1024 / 1024).toFixed(1)} MB`);
// heap grew by ~8.5 MB (exact value varies by GC timing)

// You can NUDGE the engine with V8 flags at startup:
//   node --max-old-space-size=512 app.js
// ...but those flags tune the GC. You cannot say "use jemalloc instead of
// V8's allocator for this object", and there is no hook to intercept every allocation.
```

Key points:

process.memoryUsage() observes the V8 heap; it cannot change the allocator.
Engine flags (--max-old-space-size, --max-semi-space-size) tune the GC, not the underlying malloc.
There is no per-program “use this allocator” switch and no allocation interception hook. The garbage collector decides when memory is reclaimed; you do not free anything explicitly.

Note: Unlike JavaScript, Rust is not garbage-collected. Rust frees memory deterministically, when a value's owner is dropped (see Ownership), and the allocator is the component that hands out and reclaims the underlying bytes. Customizing the allocator changes the bookkeeping, not the ownership rules.

Rust Equivalent

Two lines of setup swap the entire program over to mimalloc — and every Vec/Box/String you already wrote now goes through it:

First add the crate (network access required):

```shell
cargo add mimalloc
```

Then declare the global allocator:

```rust
use mimalloc::MiMalloc;

// This ONE attribute redirects every heap allocation in the whole program.
#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;

fn main() {
    // Nothing else changes. These all allocate through mimalloc now.
    let data: Vec<String> = (0..5).map(|i| format!("item {i}")).collect();
    println!("{data:?}");
}
```

Real output:

```text
["item 0", "item 1", "item 2", "item 3", "item 4"]
```

And here is what the trait you are plugging into actually looks like when you write your own allocator — a wrapper around System that counts live bytes:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// A counting allocator that forwards to the System allocator and tracks
// how many bytes are currently live.
struct Counting;

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);
static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Forward to the real system allocator for the actual memory.
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
            ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        ALLOCATED.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static GLOBAL: Counting = Counting;

fn main() {
    let before = ALLOCATED.load(Ordering::Relaxed);
    let v: Vec<u64> = (0..1000).collect();
    let during = ALLOCATED.load(Ordering::Relaxed);
    println!("before allocating vec: {before} bytes live");
    println!("with a Vec<u64> of 1000 items: {during} bytes live");
    drop(v);
    println!("after drop: {} bytes live", ALLOCATED.load(Ordering::Relaxed));
    println!("total alloc() calls so far: {}", ALLOC_CALLS.load(Ordering::Relaxed));
}
```

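Real output (the exact numbers vary by platform and by what std allocates at startup, but the shape is stable: the Vec<u64> adds 8000 bytes, then frees them on drop. The “after drop” figure ends up above “before” mostly because the first println! lazily allocates stdout's line buffer, which stays live for the rest of the program):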

```text
before allocating vec: 524 bytes live
with a Vec<u64> of 1000 items: 8524 bytes live
after drop: 1612 bytes live
total alloc() calls so far: 6
```

Note: The current stable toolchain is Rust 1.96.0 on the 2024 edition; cargo new selects it automatically. GlobalAlloc, #[global_allocator], and std::alloc::System are all long-stable — none of this needs nightly.

Detailed Explanation

The `GlobalAlloc` trait

GlobalAlloc lives in std::alloc and has exactly two required methods:

```rust
// (from the standard library — shown for reference)
pub unsafe trait GlobalAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8;
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);
    // alloc_zeroed and realloc have default implementations, built on
    // alloc/dealloc, that you may override.
}
```

Walking through the pieces a TypeScript developer has never had to think about:

unsafe trait / unsafe impl. The trait is unsafe to implement because the compiler cannot verify that your alloc returns a block that is actually layout.size() bytes long and correctly aligned, nor that dealloc is given back a pointer your alloc produced. You promise those invariants by writing unsafe impl. (This is the inverse of an unsafe fn, which is unsafe to call. See Unsafe Rust.)

Layout. Every request carries a Layout: the (size, align) pair the allocation must satisfy. There is no “allocate me an object of unknown size”: by the time the allocator is called, both numbers are known, either fixed at compile time for sized types (see Memory Layout) or computed at runtime, as when a Vec sizes its buffer for n elements.

```rust
use std::alloc::Layout;

fn main() {
    // A Layout is the (size, alignment) pair the allocator must satisfy.
    let l = Layout::new::<[u64; 4]>();
    println!("[u64; 4]: size={} align={}", l.size(), l.align());

    let l2 = Layout::new::<u8>();
    println!("u8:       size={} align={}", l2.size(), l2.align());

    // Layout for a slice whose length you compute at runtime.
    let l3 = Layout::array::<u32>(10).unwrap();
    println!("[u32; 10]: size={} align={}", l3.size(), l3.align());
}
```

Real output:

```text
[u64; 4]: size=32 align=8
u8:       size=1 align=1
[u32; 10]: size=40 align=4
```
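Layout also explains where padding comes from. The sketch below builds the layout of a small #[repr(C)] struct one field at a time with Layout::extend and Layout::pad_to_align, then compares the result with what the compiler computed. The `Header` struct and `header_layout` helper are hypothetical names for illustration.

```rust
use std::alloc::Layout;

// The same struct, laid out by the compiler, for comparison.
#[allow(dead_code)]
#[repr(C)]
struct Header {
    tag: u8,
    len: u32,
}

/// Builds Header's layout field by field; returns it plus the offset of `len`.
fn header_layout() -> (Layout, usize) {
    let tag = Layout::new::<u8>();
    let len = Layout::new::<u32>();
    // `extend` appends `len` after `tag`, rounding its offset up to
    // `len`'s 4-byte alignment (so 3 padding bytes follow `tag`).
    let (combined, len_offset) = tag.extend(len).expect("layout overflow");
    // `pad_to_align` adds any trailing padding needed for arrays of Header.
    (combined.pad_to_align(), len_offset)
}

fn main() {
    let (layout, len_offset) = header_layout();
    println!("size={} align={} len_offset={}", layout.size(), layout.align(), len_offset);
    println!("matches #[repr(C)] Header: {}", layout == Layout::new::<Header>());

    // Alignment must be a non-zero power of two, so this request is
    // rejected before any allocator sees it.
    println!("align 3 accepted: {}", Layout::from_size_align(16, 3).is_ok());
}
```

This prints size=8 align=4 len_offset=4, reports that the hand-built layout matches the compiler's, and shows that an alignment of 3 is rejected.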

*mut u8. alloc returns a raw pointer to the start of the block, or null on failure. Raw pointers are how you talk to allocators (see Raw Pointers). Vec/Box build their safe abstractions on top of this.
&self, not &mut self. The global allocator is shared across all threads simultaneously, so its methods take &self. Any internal state you keep (like the byte counter above) must be thread-safe — which is why the counting example uses AtomicUsize rather than a plain usize. (Atomics are covered in Atomic Operations.)
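To make the raw-pointer contract concrete, here is a sketch of what Vec does under the hood, using the free functions std::alloc::alloc and std::alloc::dealloc (which route to whatever global allocator is registered). The `sum_via_raw_buffer` helper is hypothetical: it allocates a buffer of u32s, checks for null, initializes every element before reading it, and returns the memory with the same Layout it was allocated with.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Hypothetical helper: fill a heap buffer of `n` u32s by hand and sum it.
fn sum_via_raw_buffer(n: usize) -> u64 {
    // Zero-sized requests are undefined behavior for `alloc`, so rule them out.
    assert!(n > 0, "zero-sized allocations are not allowed here");
    let layout = Layout::array::<u32>(n).expect("size overflow");
    unsafe {
        // Goes through the registered global allocator; null means failure.
        let raw = alloc(layout);
        if raw.is_null() {
            handle_alloc_error(layout); // abort, as Vec/Box do on OOM
        }
        let ptr = raw.cast::<u32>();
        for i in 0..n {
            ptr.add(i).write(i as u32); // initialize before any read
        }
        let mut sum = 0u64;
        for i in 0..n {
            sum += u64::from(ptr.add(i).read());
        }
        // Hand back the same pointer with the same layout.
        dealloc(raw, layout);
        sum
    }
}

fn main() {
    println!("sum = {}", sum_via_raw_buffer(10)); // sum = 45
}
```

Everything Vec adds on top of this (capacity tracking, growth, drop glue) exists to keep these unsafe steps correct for you.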

What `#[global_allocator]` does

The #[global_allocator] attribute marks one static as the allocator for the entire program, including every Rust dependency it links. The compiler emits the hidden entry points that std's allocation functions call (the ones behind Box::new, Vec::push, String, and friends) and routes them to your static's alloc, dealloc, realloc, and alloc_zeroed. You make zero changes at any call site; the redirection is global and automatic. (C code linked in through FFI still calls the C library's malloc directly.)

You may declare at most one #[global_allocator] per program, and it must be a static of a type implementing GlobalAlloc.
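Because there is exactly one allocator per program, allocations from every thread land in the same static. This sketch (the `CountCalls` type and `allocs_during` helper are illustrative names) counts alloc() calls while several threads create Boxes, which is why the counter has to be an atomic rather than a field mutated through &mut self.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

struct CountCalls;

// Shared by every thread; alloc takes &self, so this must be an atomic.
static CALLS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountCalls {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        CALLS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountCalls = CountCalls;

/// Spawns `threads` workers that each create `per_thread` Boxes and
/// returns how many alloc() calls were observed meanwhile.
fn allocs_during(threads: usize, per_thread: usize) -> usize {
    let before = CALLS.load(Ordering::Relaxed);
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            thread::spawn(move || {
                for i in 0..per_thread {
                    // black_box stops the optimizer from eliding the Box.
                    drop(black_box(Box::new(t * per_thread + i)));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    CALLS.load(Ordering::Relaxed) - before
}

fn main() {
    let n = allocs_during(4, 1_000);
    println!("alloc() calls seen across 4 threads: {n}");
}
```

The count comes out at 4,000 or more, since spawning threads allocates a little bookkeeping of its own.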

Forwarding vs. replacing

The counting allocator above is a forwarding (or “shim”) allocator: it does bookkeeping and then hands the real work to System. jemalloc and mimalloc are replacement allocators: their alloc talks to a completely different memory manager that often outperforms the system malloc under multi-threaded, high-churn workloads — the exact profile of a busy web server.
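The “count or cap” idea from the overview fits the forwarding model well. Here is a sketch of a capping shim, assuming a hypothetical 4 MiB program-wide budget (`CAP`): it reserves budget with an atomic update, returns null when the budget would be exceeded (the GlobalAlloc contract's way of saying “out of memory”), and otherwise forwards to System. Vec::try_reserve_exact is used to observe the refusal, because ordinary Vec growth would abort on a null.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical policy: at most 4 MiB of live heap for the whole program.
const CAP: usize = 4 * 1024 * 1024;

static LIVE: AtomicUsize = AtomicUsize::new(0);

struct Capped;

unsafe impl GlobalAlloc for Capped {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let size = layout.size();
        // Reserve budget atomically; refuse if it would push LIVE past CAP.
        let reserved = LIVE.fetch_update(Ordering::Relaxed, Ordering::Relaxed, |live| {
            live.checked_add(size).filter(|&total| total <= CAP)
        });
        if reserved.is_err() {
            // Null tells the caller "out of memory".
            return std::ptr::null_mut();
        }
        let ptr = unsafe { System.alloc(layout) };
        if ptr.is_null() {
            LIVE.fetch_sub(size, Ordering::Relaxed); // hand the budget back
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static GLOBAL: Capped = Capped;

/// Tries to reserve `len` bytes without aborting on failure.
fn try_buffer(len: usize) -> bool {
    let mut buf: Vec<u8> = Vec::new();
    // try_reserve_exact turns a null from the allocator into an Err
    // instead of calling handle_alloc_error (which aborts).
    buf.try_reserve_exact(len).is_ok()
}

fn main() {
    println!("64 KiB fits: {}", try_buffer(64 * 1024));
    println!("8 MiB fits:  {}", try_buffer(8 * 1024 * 1024));
}
```

With the cap at 4 MiB, the 64 KiB request succeeds and the 8 MiB one fails without taking the process down. The default realloc and alloc_zeroed are built on alloc/dealloc, so they respect the cap too.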

Key Differences

| Aspect | JavaScript (Node/V8) | Rust |
| --- | --- | --- |
| Who allocates? | The V8 engine; you cannot replace it | The global allocator: `System` by default, swappable |
| How to swap | Not possible from JS | One `#[global_allocator]` static |
| Reclamation | Garbage collector, non-deterministic | Deterministic `drop` → allocator’s `dealloc` |
| Interception hook | None | Implement `GlobalAlloc` yourself |
| Per-object choice | None | One global choice on stable; per-collection allocators are nightly-only (`allocator_api`) |
| Tuning knobs | GC flags (`--max-old-space-size`) | Crate features + env vars (e.g. `MALLOC_CONF` for jemalloc) |
| Cost of swapping | N/A | Zero call-site changes; recompile only |

The deepest conceptual difference: in JavaScript the allocator and the garbage collector are one inseparable, hidden subsystem. In Rust, ownership decides when memory is freed and the allocator decides how the bytes are obtained and returned: two independent concerns. Swapping in a correct allocator never changes what your program computes or when its values are dropped; it only changes the byte-management strategy underneath.
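The split between ownership and allocation is easy to observe. In this sketch a forwarding allocator counts dealloc() calls (`CountFrees` and `frees_around_scope` are illustrative names), and the count rises exactly when the owning scope ends, not at some later collection point. std::hint::black_box keeps the optimizer from eliding the two allocations.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

struct CountFrees;

static FREES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountFrees {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        FREES.fetch_add(1, Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountFrees = CountFrees;

/// Returns how many dealloc() calls had happened (inside the scope, after it).
fn frees_around_scope() -> (usize, usize) {
    let start = FREES.load(Ordering::Relaxed);
    let inside;
    {
        let _boxed = black_box(Box::new([0u8; 256]));
        let _text = black_box(String::from("owned"));
        inside = FREES.load(Ordering::Relaxed) - start;
    } // `_text`, then `_boxed`, are dropped here: two dealloc() calls
    let after = FREES.load(Ordering::Relaxed) - start;
    (inside, after)
}

fn main() {
    let (inside, after) = frees_around_scope();
    println!("frees inside scope: {inside}, after scope: {after}");
}
```

Run as a plain single-threaded binary, it reports 0 frees inside the scope and 2 after it.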

Tip: Swapping to jemalloc or mimalloc is one of the highest-leverage, lowest-risk performance changes available to a Rust server. It is two lines of code and can yield sizable throughput gains on allocation-heavy, multi-threaded workloads, but the benefit depends on your allocation pattern, so measure before and after with the techniques in Benchmarking.
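As a first, crude before/after check (not a substitute for a proper benchmark harness), you can time an allocation-heavy workload with std::time::Instant: run it in --release with the default allocator, then again after adding the two-line #[global_allocator] swap. The `workload` function is a hypothetical stand-in; note that it is single-threaded, while the largest wins are usually reported for multi-threaded servers.

```rust
use std::time::Instant;

/// Hypothetical allocation-heavy workload: many short-lived, variably
/// sized Strings. Returns how many strings it built.
fn workload(rounds: usize) -> usize {
    let mut total = 0;
    for r in 0..rounds {
        let rows: Vec<String> = (0..10_000).map(|i| format!("row-{r}-{i}")).collect();
        total += rows.len();
    } // each round's Vec and Strings are freed here
    total
}

fn main() {
    let start = Instant::now();
    let built = workload(50);
    // Compare this timing across allocators, in --release, over several runs.
    println!("built {built} strings in {:?}", start.elapsed());
}
```

Because the program logic never changes, any difference in the printed duration comes from the allocator (plus run-to-run noise, which is why a single run proves little).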

Common Pitfalls

Pitfall 1: Forgetting `unsafe` on the `impl`

GlobalAlloc is an unsafe trait, so the implementation block must be unsafe impl, not plain impl.

```rust
use std::alloc::{GlobalAlloc, Layout, System};

struct MyAlloc;

// does not compile (error[E0200]): missing the `unsafe` keyword on the impl.
impl GlobalAlloc for MyAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

fn main() {}
```

The real compiler error:

```text
error[E0200]: the trait `GlobalAlloc` requires an `unsafe impl` declaration
 --> src/main.rs:6:1
  |
6 | impl GlobalAlloc for MyAlloc {
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  = note: the trait `GlobalAlloc` enforces invariants that the compiler can't check. Review the trait documentation and make sure this implementation upholds those invariants before adding the `unsafe` keyword
help: add `unsafe` to this trait implementation
```

The fix is exactly what the compiler says: write unsafe impl GlobalAlloc for MyAlloc.

Pitfall 2: Two `#[global_allocator]` declarations

You get exactly one. Declaring two (a classic mistake when you copy a snippet into a crate that already sets one) is a hard error.

```rust
use mimalloc::MiMalloc;
use tikv_jemallocator::Jemalloc;

#[global_allocator]
static A: MiMalloc = MiMalloc;

#[global_allocator] // does not compile: a second global allocator
static B: Jemalloc = Jemalloc;

fn main() {}
```

The real compiler error:

```text
error: cannot define multiple global allocators
 --> src/main.rs:8:1
  |
5 | static A: MiMalloc = MiMalloc;
  | ------------------------------ previous global allocator defined here
6 |
7 | #[global_allocator]
```

This also bites if a dependency already sets a global allocator — only the final binary crate should choose one. Libraries should not declare #[global_allocator]; leave that decision to the application.
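Keeping the choice in the binary also makes it easy to vary by build. The sketch below (a hypothetical policy with illustrative names) wires a counting shim in for debug builds and plain System for release builds using cfg(debug_assertions). Exactly one #[global_allocator] static exists in either configuration, so the one-per-program rule still holds.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

static LIVE: AtomicUsize = AtomicUsize::new(0);

// Forwarding shim that tracks live bytes; wired in for debug builds only.
#[cfg_attr(not(debug_assertions), allow(dead_code))]
struct Counting;

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            LIVE.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

// Exactly one of these two statics is compiled into any given build.
#[cfg(debug_assertions)]
#[global_allocator]
static GLOBAL: Counting = Counting;

#[cfg(not(debug_assertions))]
#[global_allocator]
static GLOBAL: System = System;

fn allocator_name() -> &'static str {
    if cfg!(debug_assertions) { "counting" } else { "system" }
}

fn main() {
    let v = vec![1u64; 100];
    // In release builds nothing updates LIVE, so it stays 0.
    println!("allocator: {}, live bytes: {}", allocator_name(), LIVE.load(Ordering::Relaxed));
    drop(v);
}
```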

Pitfall 3: Allocating inside your allocator → infinite recursion → stack overflow

This is the single nastiest custom-allocator trap. Your alloc/dealloc hooks run on every allocation. If they themselves allocate — for example by calling println!, format!, or building a String for a log line — that inner allocation re-enters your hook, which allocates again, forever.

A version of the budget allocator (in the Real-World Example below) that called eprintln!("...") inside alloc produces, at runtime:

```text
thread 'main' has overflowed its stack
fatal runtime error: stack overflow, aborting
```

Warning: Inside GlobalAlloc::alloc/dealloc, never do anything that allocates. Set an atomic flag or update an AtomicUsize counter, then read or log it outside the hook. (Even eprintln! of a bare static &str can trip std's lazy startup machinery, so the safe pattern is to record state in atomics and report it from normal code.)
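Here is a sketch of that safe pattern: the hook only touches atomics, recording the largest request seen and how many requests crossed a threshold, and all formatting happens later in ordinary code. The 1 MiB `LARGE` threshold and the `Watch` and `report` names are assumptions for illustration.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical threshold for a "suspiciously large" single allocation.
const LARGE: usize = 1024 * 1024;

static LARGEST: AtomicUsize = AtomicUsize::new(0);
static LARGE_COUNT: AtomicUsize = AtomicUsize::new(0);

struct Watch;

unsafe impl GlobalAlloc for Watch {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Only atomics in here: no println!, no format!, no String, no panics.
        LARGEST.fetch_max(layout.size(), Ordering::Relaxed);
        if layout.size() >= LARGE {
            LARGE_COUNT.fetch_add(1, Ordering::Relaxed);
        }
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: Watch = Watch;

// Ordinary code, outside the hook: allocating and formatting are fine here.
fn report() -> String {
    format!(
        "largest request: {} bytes, requests >= {} bytes: {}",
        LARGEST.load(Ordering::Relaxed),
        LARGE,
        LARGE_COUNT.load(Ordering::Relaxed)
    )
}

fn main() {
    // vec![0; n] goes through alloc_zeroed, whose default calls our alloc.
    let big = std::hint::black_box(vec![0u8; 2 * LARGE]);
    drop(big);
    println!("{}", report());
}
```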

Pitfall 4: Expecting a per-`Vec` allocator on stable

You may have seen Vec::new_in(my_alloc) and the Allocator trait. That per-collection allocator API (allocator_api) is still nightly-only as of Rust 1.96.0. On stable you choose the allocator once, globally, via #[global_allocator]. If you need region/arena allocation for a subset of your data on stable, reach for a crate like bumpalo (which gives you bumpalo::Bump, plus arena-backed Vec and String types behind its collections feature) rather than the nightly Allocator trait.

Pitfall 5: Forgetting the `stats` feature on `tikv-jemalloc-ctl`

The tikv-jemalloc-ctl crate gates its statistics modules behind a Cargo feature. Importing stats without enabling it fails:

```text
error[E0432]: unresolved import `tikv_jemalloc_ctl::stats`
  --> src/main.rs:2:32
   |
 2 | use tikv_jemalloc_ctl::{epoch, stats};
   |                                ^^^^^ no `stats` in the root
   |
note: found an item that was configured out
...
98 | #[cfg(feature = "stats")]
   |       ----------------- the item is gated behind the `stats` feature
```

Fix it with cargo add tikv-jemalloc-ctl --features stats.

Best Practices

Default to a battle-tested replacement allocator for servers. For multi-threaded, allocation-heavy services, dropping in jemalloc (tikv-jemallocator) or mimalloc (mimalloc) is a cheap, well-understood win. Pick based on measurement, not folklore.
Only the binary crate chooses. Never put #[global_allocator] in a library you publish — it would force the choice on every downstream user and collide with theirs.
Keep allocator hooks allocation-free and fast. They are on the hottest path in the program. Use atomics for any bookkeeping; never log, format, or do anything else inside them that might allocate, and avoid taking locks that could deadlock if the hook is re-entered.
Forward to System unless you truly manage memory yourself. Most custom allocators are shims (count, cap, trace) that delegate the real work to System. Only write the actual byte management when you have a specific strategy (arena, pool, bump).
Measure, do not guess. Use the stats hooks (jemalloc’s tikv-jemalloc-ctl) and the profiling tools in Profiling and Benchmarking to confirm a swap actually helps your workload.
Reach for bumpalo for arenas on stable. If you want fast bump allocation for a batch of short-lived values, bumpalo is the idiomatic stable choice; reserve a custom GlobalAlloc for whole-program policy.

Real-World Example

A production-flavored use case that does not need a faster allocator but does benefit from a custom one: a memory budget guardrail for staging/test builds. It forwards every allocation to System, tracks the peak and current live bytes, and flips a flag if the program ever exceeds a configured budget — a cheap way to catch a memory regression in CI before it reaches production.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// A global allocator that forwards to the System allocator but records the
/// peak live byte count and trips a flag if a budget is exceeded. Useful as a
/// debug/staging guardrail to catch runaway allocation in tests and CI.
struct BudgetAlloc {
    limit: usize,
}

static LIVE: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);
static OVER_BUDGET: AtomicBool = AtomicBool::new(false);

unsafe impl GlobalAlloc for BudgetAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            let now = LIVE.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            PEAK.fetch_max(now, Ordering::Relaxed);
            if now > self.limit {
                // CRITICAL: never allocate inside the allocator. Just set a flag;
                // do NOT call println!/format! here (they allocate -> recursion).
                OVER_BUDGET.store(true, Ordering::Relaxed);
            }
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static GLOBAL: BudgetAlloc = BudgetAlloc { limit: 4 * 1024 };

fn main() {
    let _small: Vec<u8> = vec![0; 1024]; // under budget
    let big: Vec<u8> = vec![0; 8 * 1024]; // exceeds the 4 KiB budget
    drop(big);

    // Safe to format/print HERE, outside the allocator hook.
    println!("peak live bytes: {}", PEAK.load(Ordering::Relaxed));
    println!("ever over budget: {}", OVER_BUDGET.load(Ordering::Relaxed));
}
```

Real output:

```text
peak live bytes: 9740
ever over budget: true
```
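The same counters can also scope a measurement to one region of code, which is handy for a CI regression test. This sketch (the `peak_during` helper and its 256 KiB budget are hypothetical) resets the high-water mark to the current live total, runs a closure, and reports how far the peak rose above the starting level. It assumes no other thread allocates or frees during the measurement.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

static LIVE: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

struct Tracking;

unsafe impl GlobalAlloc for Tracking {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            let now = LIVE.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            PEAK.fetch_max(now, Ordering::Relaxed);
        }
        ptr
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static GLOBAL: Tracking = Tracking;

/// Runs `f` and returns its result plus the peak bytes it held live above
/// the starting level.
fn peak_during<T>(f: impl FnOnce() -> T) -> (T, usize) {
    let base = LIVE.load(Ordering::Relaxed);
    PEAK.store(base, Ordering::Relaxed); // reset the high-water mark
    let out = f();
    (out, PEAK.load(Ordering::Relaxed).saturating_sub(base))
}

fn main() {
    let (bytes, peak) = peak_during(|| {
        let a = black_box(vec![0u8; 64 * 1024]);
        let b = black_box(vec![0u8; 64 * 1024]);
        a.len() + b.len()
    });
    println!("region used {bytes} bytes, peak extra live: {peak} bytes");
    // A CI test could enforce a budget like this one.
    assert!(peak <= 256 * 1024, "memory regression in region");
}
```

Because both 64 KiB vectors are alive at once, the reported peak should be 131072 bytes, well under the example budget.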

Observability with jemalloc’s stats

If you ship jemalloc, you get rich, free statistics through tikv-jemalloc-ctl. Add both crates:

```shell
cargo add tikv-jemallocator
cargo add tikv-jemalloc-ctl --features stats
```

```rust
use tikv_jemallocator::Jemalloc;
use tikv_jemalloc_ctl::{epoch, stats};

#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    // jemalloc caches its statistics; advancing the "epoch" refreshes them.
    let e = epoch::mib().unwrap();
    let allocated = stats::allocated::mib().unwrap();
    let resident = stats::resident::mib().unwrap();

    let _big: Vec<u8> = vec![0; 10 * 1024 * 1024]; // 10 MiB

    e.advance().unwrap(); // refresh the cached statistics
    println!("allocated: {} bytes", allocated.read().unwrap());
    println!("resident:  {} bytes", resident.read().unwrap());
}
```

Real output (numbers vary by run; allocated counts bytes handed to the program, resident counts bytes in physically resident pages that jemalloc has mapped from the OS):

```text
allocated: 10557528 bytes
resident:  15482880 bytes
```

This is the kind of per-process memory telemetry you would normally export to your metrics backend (see Metrics) — and it comes essentially for free once jemalloc is your allocator.

Exercises

Exercise 1: Track peak memory, not just current

Difficulty: Beginner

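Objective: Extend the counting allocator so it also records the peak live byte count (the high-water mark), and override alloc_zeroed so zeroed allocations are forwarded to System.alloc_zeroed (which can be cheaper than allocating and then zeroing) while still being tracked.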

Instructions:

Start from the counting allocator. Add a static PEAK: AtomicUsize. In alloc, after incrementing the live counter, update the peak with fetch_max. Add an alloc_zeroed override (forwarding to System.alloc_zeroed) that does the same bookkeeping. In main, allocate two large vectors inside a scope, let them drop, allocate a tiny one, then print both the current live bytes and the peak — the peak should be much larger than the live total.

Solution

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

struct Tracking;

static LIVE: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Tracking {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            let now = LIVE.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            PEAK.fetch_max(now, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE.fetch_sub(layout.size(), Ordering::Relaxed);
    }

    // Forward to System.alloc_zeroed so the platform can hand back memory
    // that is already zeroed, instead of the default alloc-then-zero path.
    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc_zeroed(layout) };
        if !ptr.is_null() {
            let now = LIVE.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            PEAK.fetch_max(now, Ordering::Relaxed);
        }
        ptr
    }
}

#[global_allocator]
static GLOBAL: Tracking = Tracking;

fn main() {
    {
        let _a: Vec<u64> = (0..2000).collect();
        let _b: Vec<u64> = (0..2000).collect();
    } // both dropped here
    let _c: Vec<u8> = vec![0; 10];

    println!("live now: {} bytes", LIVE.load(Ordering::Relaxed));
    println!("peak:     {} bytes", PEAK.load(Ordering::Relaxed));
}
```

Real output (peak greatly exceeds the live total because the two big vectors were alive simultaneously):

```text
live now: 534 bytes
peak:     32524 bytes
```

Exercise 2: Swap in mimalloc and confirm it changed nothing else

Difficulty: Beginner

Objective: Prove the “zero call-site churn” claim by running an allocation-heavy program first with the default allocator, then with mimalloc, with no other code changes.

Instructions:

Write a main that builds a Vec<String> of 100,000 formatted strings and prints its length. Run it as-is. Then cargo add mimalloc, add the two-line #[global_allocator] declaration at the top, and run again. The output (the length) must be identical; only the allocator underneath changed.

Solution

```rust
// After: cargo add mimalloc
use mimalloc::MiMalloc;

#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;

fn main() {
    let data: Vec<String> = (0..100_000).map(|i| format!("row-{i}")).collect();
    println!("built {} strings, last = {:?}", data.len(), data.last());
}
```

Real output:

```text
built 100000 strings, last = Some("row-99999")
```

Remove the use line and the #[global_allocator] static, and the program prints the exact same line — the only thing that differs is which allocator served the 100,000 Strings. That is the whole point: allocator choice is orthogonal to program logic.

Exercise 3: A bump allocator with a `System` fallback

Difficulty: Advanced

Objective: Implement a real (not forwarding) global allocator: a fixed-size bump allocator that hands out aligned slices from a static arena by advancing an offset, and falls back to System once the arena is exhausted. Never frees individual arena allocations.

Instructions:

Create a 64 KiB static arena inside an UnsafeCell<[u8; N]> wrapped in a #[repr(align(16))] struct, with a manual unsafe impl Sync (synchronization is provided by an AtomicUsize offset). In alloc, forward any request whose alignment exceeds the arena's 16-byte alignment straight to System, because offsets are aligned relative to the arena base and cannot guarantee anything stricter. Otherwise, round the current offset up to layout.align(), reserve layout.size() bytes with a compare_exchange_weak loop, and return base + aligned; if the request would overflow the arena, forward to System. In dealloc, free only pointers that fall outside the arena range (those came from the System fallback); arena pointers are never freed. Test it by boxing a value and building a small Vec, then print how many arena bytes were used.

Solution

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};

const ARENA_SIZE: usize = 64 * 1024;
// Must match the #[repr(align(..))] on Arena below.
const ARENA_ALIGN: usize = 16;

// Over-aligned so the arena's base satisfies common alignment requirements.
#[repr(align(16))]
struct Arena(UnsafeCell<[u8; ARENA_SIZE]>);

// SAFETY: all access is coordinated through the atomic `offset` in BumpAlloc.
unsafe impl Sync for Arena {}

struct BumpAlloc {
    arena: Arena,
    offset: AtomicUsize,
}

unsafe impl GlobalAlloc for BumpAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let align = layout.align();
        let size = layout.size();

        // Offsets are aligned relative to the arena base, which is only
        // ARENA_ALIGN-aligned, so stricter requests go straight to System.
        if align > ARENA_ALIGN {
            return unsafe { System.alloc(layout) };
        }

        let base = self.arena.0.get() as *mut u8;

        // Reserve an aligned slice with a CAS loop (lock-free, thread-safe).
        let mut old = self.offset.load(Ordering::Relaxed);
        loop {
            let aligned = (old + align - 1) & !(align - 1);
            let new = aligned + size;
            if new > ARENA_SIZE {
                // Arena full: fall back to the System allocator.
                return unsafe { System.alloc(layout) };
            }
            match self.offset.compare_exchange_weak(
                old, new, Ordering::Relaxed, Ordering::Relaxed,
            ) {
                Ok(_) => return unsafe { base.add(aligned) },
                Err(actual) => old = actual,
            }
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        let base = self.arena.0.get() as *mut u8;
        let end = unsafe { base.add(ARENA_SIZE) };
        // Only free pointers that came from the System fallback.
        if ptr < base || ptr >= end {
            unsafe { System.dealloc(ptr, layout) };
        }
        // Arena allocations are never individually freed (that's the bump trade-off).
    }
}

#[global_allocator]
static GLOBAL: BumpAlloc = BumpAlloc {
    arena: Arena(UnsafeCell::new([0; ARENA_SIZE])),
    offset: AtomicUsize::new(0),
};

fn main() {
    let a = Box::new(42u64);
    let b: Vec<u8> = vec![7; 100];
    println!("boxed = {a}, vec[0] = {}, len = {}", b[0], b.len());
    println!("arena bytes used so far: {}", GLOBAL.offset.load(Ordering::Relaxed));
}
```

Real output (the exact byte count depends on what std allocates before main):

```text
boxed = 42, vec[0] = 7, len = 100
arena bytes used so far: 1728
```

This is the core idea behind arena/bump allocation: allocation is just an atomic bump of an offset (here, a compare-and-swap loop), deallocation is free (literally a no-op), and you trade the ability to reclaim individual objects for raw speed. For a production-quality, scoped version on stable, use the bumpalo crate rather than wiring a bump allocator in globally.
