`PhantomData` and Zero-Sized Types

Sometimes you need a type to behave as if it owned, borrowed, or were parameterized over some T (for ownership, variance, lifetime, or thread-safety purposes) without actually storing a T. Rust’s answer is PhantomData<T>: a zero-sized marker that you place in a struct field to tell the compiler “pretend this type relates to T,” while costing exactly zero bytes at runtime. TypeScript has no direct equivalent: its types are erased before the code runs and carry no ownership, lifetime, or thread-safety meaning, so the closest idiom (branded types, shown below) covers only the tagging part.

Quick Overview

A zero-sized type (ZST) is a type whose values occupy 0 bytes — (), a fieldless struct, or PhantomData<T>. PhantomData<T> is the standard library’s purpose-built ZST: putting a PhantomData<T> field in your struct makes the type act as though it contains a T for the compiler’s analyses (drop-check, variance, auto-trait inference, lifetime tracking) without adding any storage. The two payoffs for a TypeScript/JavaScript developer are: (1) you can encode extra facts in the type system — “this ID belongs to a User,” “this connection is open,” “this value cannot leave its thread” — and (2) those facts are checked at compile time and then completely erased, so the abstraction is genuinely free.

Note: This file covers PhantomData and ZSTs specifically. The marker traits Send/Sync/Sized/Copy are introduced in marker traits; raw pointers and unsafe ownership live in Section 20 and Section 26. PhantomData shows up there because it is how you encode ownership over a raw pointer.

TypeScript/JavaScript Example

TypeScript developers reach for branded (nominal) types to make two structurally identical types distinct — for example, to stop a UserId being passed where an OrderId is expected. Since TypeScript is structurally typed, the trick is to intersect with a phantom property that exists only in the type system:

```typescript
// TypeScript: "branded" types simulate nominal typing.
// The `__brand` field never exists at runtime; it is purely a type-level tag.
type Brand<T, B> = T & { readonly __brand: B };

type UserId = Brand<number, "UserId">;
type OrderId = Brand<number, "OrderId">;

function makeUserId(n: number): UserId {
  return n as UserId; // a cast; nothing is stored
}
function makeOrderId(n: number): OrderId {
  return n as OrderId;
}

function fetchUser(id: UserId): void {
  console.log(`fetching user ${id}`);
}

const userId = makeUserId(42);
const orderId = makeOrderId(42);

fetchUser(userId); // ok
// fetchUser(orderId);
// ^ Argument of type 'OrderId' is not assignable to parameter of type 'UserId'.
//   Type '"OrderId"' is not assignable to type '"UserId"'.
```

Key points for a TypeScript developer:

- The __brand property is a lie: it is never written at runtime. userId is just the number 42. The brand exists only so the checker keeps the two types apart.
- This is a workaround for TypeScript’s structural typing: without the brand, UserId and OrderId are both just number and freely interchangeable.
- The brand carries no ownership, lifetime, or thread-safety meaning. TypeScript has no notion of any of those, because the type system is erased before the code ever runs.

PhantomData is Rust’s analogue of the brand — but it does far more than nominal tagging, because Rust’s type system tracks ownership, lifetimes, variance, and thread-safety, and PhantomData lets you participate in all of them.

Rust Equivalent

Here is the same “typed IDs” idea in Rust. We make Id<T> generic over a marker type, but the only data we store is a u64 — the T lives in a PhantomData<T>:

```rust
use std::marker::PhantomData;

// Same u64 representation, but the compiler keeps the two ID kinds distinct.
struct Id<T> {
    raw: u64,
    _marker: PhantomData<T>,
}

impl<T> Id<T> {
    fn new(raw: u64) -> Self {
        Id { raw, _marker: PhantomData }
    }
}

// The marker types. They are never constructed; they only exist as tags.
struct User;
struct Order;

fn fetch_user(id: Id<User>) {
    println!("fetching user {}", id.raw);
}

fn main() {
    let user_id: Id<User> = Id::new(42);
    let order_id: Id<Order> = Id::new(42);

    fetch_user(user_id);
    // fetch_user(order_id); // does not compile: expected `Id<User>`, found `Id<Order>`
    println!("order id raw = {}", order_id.raw);

    // The marker costs nothing: Id<User> is the same size as a bare u64.
    println!("size_of Id<User> = {}", std::mem::size_of::<Id<User>>());
    println!("size_of u64      = {}", std::mem::size_of::<u64>());
}
```

Running it:

```text
fetching user 42
order id raw = 42
size_of Id<User> = 8
size_of u64      = 8
```

Id<User> and Id<Order> are different types the compiler refuses to mix, yet size_of::<Id<User>>() is 8 — identical to a raw u64. The PhantomData<T> field adds zero bytes. That is the headline: the safety is real, the cost is nothing.

Why not just struct Id<T> { raw: u64 } with no marker? Because Rust forbids unused type parameters. Without PhantomData, the compiler emits error[E0392]: type parameter 'T' is never used — see Common Pitfalls.

Detailed Explanation

What `PhantomData<T>` is

PhantomData<T> is defined in the standard library, roughly as pub struct PhantomData<T: ?Sized>; — a struct with no fields. Therefore:

- Its size is 0. std::mem::size_of::<PhantomData<T>>() is always 0, no matter what T is.
- You construct it by writing the bare value PhantomData (the type is inferred from the field’s declared type, e.g. _marker: PhantomData<T>).
- It holds no T. It never runs T’s constructor or destructor on its own.
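The size and construction claims above are easy to check for yourself. The following is a minimal sketch; `marker_size` is a hypothetical helper introduced only for this illustration.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of};

// Hypothetical helper: the size of a PhantomData<T> marker for any T,
// including unsized types such as `str`.
fn marker_size<T: ?Sized>() -> usize {
    size_of::<PhantomData<T>>()
}

fn main() {
    // The marker is 0 bytes whether T is small, large, or unsized.
    println!("PhantomData<u8>        = {}", marker_size::<u8>());
    println!("PhantomData<[u64; 64]> = {}", marker_size::<[u64; 64]>());
    println!("PhantomData<str>       = {}", marker_size::<str>());
    println!("alignment              = {}", align_of::<PhantomData<String>>());

    // Construction: the bare value works once the type is known from context.
    let _inferred: PhantomData<String> = PhantomData;
    // A turbofish spells the type out when nothing else pins it down.
    let _explicit = PhantomData::<Vec<i32>>;
}
```

Every size line should report 0, and the alignment should be 1, so the marker never introduces padding either.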

What it does is feed information to four of the compiler’s static analyses. When you write _marker: PhantomData<T>, the compiler treats your struct as if it contained a T for the purposes of:

- Drop check: whether the compiler believes your type owns a T that will be dropped.
- Variance: how subtyping of T (or a lifetime 'a) relates to subtyping of your struct.
- Auto traits (Send/Sync): whether your struct can move to / be shared with another thread.
- Lifetime tracking: keeping a 'a “in use” so the borrow checker enforces it, even though you only store a pointer or offset.

The marker type itself (User, Order above) is usually a fieldless struct — itself a ZST — that you never instantiate. It is just a name the type system can distinguish.

Encoding ownership over a raw pointer

The most important correctness use of PhantomData is telling the compiler that your struct owns a T it only points at. Consider a hand-rolled Box-like container built over a raw pointer:

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

// A toy owning container built over a raw pointer. `PhantomData<T>` tells the
// compiler "this struct OWNS a T", which drives drop-check and variance.
struct MyBox<T> {
    ptr: NonNull<T>,
    _owns: PhantomData<T>,
}

impl<T> MyBox<T> {
    fn new(value: T) -> Self {
        let boxed = Box::new(value);
        let ptr = NonNull::new(Box::into_raw(boxed)).unwrap();
        MyBox { ptr, _owns: PhantomData }
    }
    fn get(&self) -> &T {
        // Safe: we created this pointer from a valid Box and still own it.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T> Drop for MyBox<T> {
    fn drop(&mut self) {
        // Reconstruct the Box so T's destructor runs and the memory is freed.
        unsafe { drop(Box::from_raw(self.ptr.as_ptr())); }
    }
}

fn main() {
    let b = MyBox::new(String::from("owned heap data"));
    println!("value = {}", b.get());
    // `b` drops here: `Box::from_raw` frees the String, no leak.
}
```

```text
value = owned heap data
```

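The crucial detail: NonNull<T> is a raw pointer, and raw pointers do not express ownership. The _owns: PhantomData<T> field states that MyBox<T> is responsible for a T, so the type behaves like the real Box<T> for the “does this own a T?” question. Historically the marker was required for soundness: without it, the drop checker could accept code whose destructor touched borrowed data that was no longer valid. Since RFC 1238, the impl<T> Drop for MyBox<T> on its own already makes the drop checker assume the destructor may access a T, so on current compilers the field mainly documents the ownership relationship. It becomes necessary again if the destructor opts out of that conservative assumption with the unstable #[may_dangle] attribute. The Rustonomicon discusses this pattern for collections and smart pointers built over *mut T / NonNull<T>.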

Encoding a lifetime without storing a reference

Sometimes you store an integer offset or a raw pointer that logically borrows from some buffer, and you want the borrow checker to keep that buffer alive. PhantomData<&'a T> does exactly that:

```rust
use std::marker::PhantomData;

// `Token` borrows from a `&'src str`, but stores only offsets, not a reference.
// `PhantomData<&'src str>` keeps the borrow alive in the type system.
struct Token<'src> {
    start: usize,
    len: usize,
    _src: PhantomData<&'src str>,
}
```

Now a Token<'src> cannot outlive the &'src str it conceptually points into, even though at runtime it is just two usizes. The full lexer that produces these tokens is in Exercises.
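To see the lifetime tie in action without the full lexer, here is a small sketch. The helpers `first_word` and `resolve` are hypothetical names made up for this illustration; only the `Token` shape comes from the text above.

```rust
use std::marker::PhantomData;

struct Token<'src> {
    start: usize,
    len: usize,
    _src: PhantomData<&'src str>,
}

// Hypothetical helper: a token covering the first space-delimited word.
// The returned token's lifetime is tied to `source` purely through the marker.
fn first_word<'src>(source: &'src str) -> Token<'src> {
    let len = source.find(' ').unwrap_or(source.len());
    Token { start: 0, len, _src: PhantomData }
}

// Hypothetical helper: turn the offsets back into a slice of the same source.
fn resolve<'src>(source: &'src str, tok: &Token<'src>) -> &'src str {
    &source[tok.start..tok.start + tok.len]
}

fn main() {
    let source = String::from("phantom data");
    let tok = first_word(&source);
    println!("{}", resolve(&source, &tok));

    // Rejected by the borrow checker if uncommented: `tok` is used after
    // `source` is moved, and the token's type says it borrows from `source`.
    // drop(source);
    // println!("{}", tok.len);
}
```

If the `_src` field were replaced by a marker with no lifetime in it, the commented-out lines would compile, and the token could silently outlive the text its offsets refer to.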

Controlling variance and thread-safety with the “right” `PhantomData`

The type you put inside PhantomData matters, because different forms convey different ownership/variance/auto-trait facts. The four canonical forms:

| You write | “Owns a `T`”? (drop-check) | Variance in `T` | `Send`/`Sync` | Typical use |
|---|---|---|---|---|
| `PhantomData<T>` | Yes | covariant | inherits from `T` | container/smart pointer that owns a `T` |
| `PhantomData<&'a T>` | No | covariant | `Send` and `Sync` when `T: Sync` | a shared borrow you only model |
| `PhantomData<*const T>` | No | covariant | neither (`!Send`, `!Sync`) | a tag that must stay on one thread |
| `PhantomData<fn() -> T>` | No | covariant | both (`Send + Sync`) | a pure type tag with no ownership |

For a “nominal tag” like Id<T> or units of measure, PhantomData<fn() -> T> is often the most conservative choice: it claims no ownership and stays Send + Sync regardless of T. This is verifiable — PhantomData<fn() -> T> is Send even when T is the !Send type Rc<i32>:

```rust
use std::marker::PhantomData;
use std::rc::Rc;

fn assert_send<T: Send>() {}

struct UsesT<T> {
    _marker: PhantomData<fn() -> T>,
}

fn main() {
    // Rc<i32> is !Send, yet PhantomData<fn() -> Rc<i32>> is still Send.
    assert_send::<UsesT<Rc<i32>>>();
    println!("PhantomData<fn() -> T> is Send even when T is !Send");
}
```

```text
PhantomData<fn() -> T> is Send even when T is !Send
```

Conversely, PhantomData<*const ()> is the standard way to make a struct !Send and !Sync — see the thread-safety pitfall below.
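The “covariant” column in the table can feel abstract, so here is a minimal sketch of what it buys you. `Tagged`, `tag`, and `sum_ids` are hypothetical names invented for this example; the point is only that a value with a longer lifetime is accepted where a shorter one is expected.

```rust
use std::marker::PhantomData;

// Hypothetical tag type that models a borrow of lifetime 'a without storing it.
struct Tagged<'a> {
    id: u32,
    _borrow: PhantomData<&'a str>,
}

// Ties the tag's lifetime to `_source` through the signature alone.
fn tag<'a>(_source: &'a str, id: u32) -> Tagged<'a> {
    Tagged { id, _borrow: PhantomData }
}

// Both arguments must share one lifetime 'x.
fn sum_ids<'x>(a: Tagged<'x>, b: Tagged<'x>) -> u32 {
    a.id + b.id
}

fn main() {
    let long: Tagged<'static> = tag("static text", 1);
    let local = String::from("local text");
    let short = tag(&local, 2);

    // Covariance lets Tagged<'static> shrink to the shorter lifetime of `short`.
    // With an invariant marker such as PhantomData<*mut &'a str>, this call
    // would be expected to fail, because 'static could no longer shrink.
    println!("sum = {}", sum_ids(long, short));
}
```

This is the same rule that lets a `&'static str` be passed where a shorter-lived `&str` is expected; the marker simply extends it to your own type.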

Zero-sized types more broadly

PhantomData is one ZST, but ZSTs are a general concept. The unit type (), a fieldless struct, a zero-length array such as [u8; 0], and an enum with a single fieldless variant all have size 0. The compiler and standard library exploit this:

```rust
use std::collections::HashMap;
use std::mem::{size_of, size_of_val};

struct Marker; // a zero-sized type (ZST)

fn main() {
    println!("size_of::<()>()      = {}", size_of::<()>());
    println!("size_of::<Marker>()  = {}", size_of::<Marker>());

    // A Vec of 1000 ZSTs allocates no heap memory for its elements.
    let zeros: Vec<()> = vec![(); 1000];
    println!("len = {}, but elements occupy 0 bytes", zeros.len());

    // HashSet<K> is literally HashMap<K, ()> under the hood; the value is a ZST.
    let mut set: HashMap<&str, ()> = HashMap::new();
    set.insert("a", ());
    set.insert("b", ());
    println!("set has {} keys; each value is {} bytes", set.len(), size_of::<()>());

    let m = Marker;
    println!("size_of_val(&m) = {}", size_of_val(&m));
}
```

```text
size_of::<()>()      = 0
size_of::<Marker>()  = 0
len = 1000, but elements occupy 0 bytes
set has 2 keys; each value is 0 bytes
size_of_val(&m) = 0
```

std::collections::HashSet<K> really is a thin wrapper over HashMap<K, ()> — the () value is a ZST, so a set costs the same as the map’s keys alone. Likewise a Vec<()> of a million elements allocates nothing for the elements; it just tracks the length.

Key Differences

| Concept | TypeScript / JavaScript | Rust |
|---|---|---|
| Purpose of phantom field | Nominal tagging only (brands) | Ownership, variance, lifetimes, thread-safety, and tagging |
| Runtime presence | Erased; the value is the underlying primitive | Erased too: `PhantomData` is genuinely 0 bytes |
| Checked when? | At type-check (then erased before runtime) | At compile time (then monomorphized + erased) |
| Distinguishing two identical shapes | `& { __brand }` intersection workaround | Generic param + `PhantomData<T>`, fully nominal |
| Lifetime / ownership meaning | None; no such concepts exist | `PhantomData<&'a T>`, `PhantomData<T>` model exactly these |
| Thread-safety meaning | None; no compile-time threading model | `PhantomData<*const T>` opts out of `Send`/`Sync` |

The conceptual leap for a TypeScript developer: a brand is only a tag, but PhantomData is a participant in Rust’s ownership and borrow analyses. Rust needs PhantomData precisely because its type system tracks things TypeScript’s does not. When you wrote as UserId in TypeScript, nothing was being protected at runtime; when you put PhantomData<T> in a Rust struct, you may be the difference between sound and unsound memory management.

Note: Unlike a TypeScript brand, PhantomData<T> can change whether your type compiles at all (drop-check, Send/Sync) — it is not a no-op annotation you can sprinkle freely. Pick the form that matches the real ownership relationship.

Common Pitfalls

Pitfall 1: Unused type parameter without `PhantomData`

The first thing every TypeScript developer tries is a generic struct that “remembers” T without storing it:

```rust
struct Id<T> {
    raw: u64,
}

fn main() {
    let _id: Id<String> = Id { raw: 1 };
}
```

This does not compile. The real error:

```text
error[E0392]: type parameter `T` is never used
 --> src/main.rs:1:11
  |
1 | struct Id<T> {
  |           ^ unused type parameter
  |
  = help: consider removing `T`, referring to it in a field, or using a marker such as `PhantomData`
  = help: if you intended `T` to be a const parameter, use `const T: /* Type */` instead
```

The compiler itself suggests the fix: add a PhantomData<T> field. Rust forbids unused generic parameters because the parameter affects variance and drop-check, and the compiler needs you to state how T relates to the struct.

Pitfall 2: Forgetting to actually construct the field

Once you add _marker: PhantomData<T>, you must initialize it in every constructor — but it is just the literal PhantomData:

```rust
use std::marker::PhantomData;

struct Wrapper<T> {
    value: i32,
    _marker: PhantomData<T>,
}

impl<T> Wrapper<T> {
    fn new(value: i32) -> Self {
        // Correct: the field is initialized with the bare `PhantomData` value.
        Wrapper { value, _marker: PhantomData }
    }
}

fn main() {
    let _w: Wrapper<String> = Wrapper::new(7);
    println!("ok");
}
```

If you omit the field you get error[E0063]: missing field '_marker' in initializer of 'Wrapper<T>'. There is no runtime work — PhantomData is the unit-like value of a zero-sized type.

Pitfall 3: Assuming a tagged type is `Send` — or that it is not

If you use PhantomData<*const T> (or *mut T) to model a pointer, you silently make the whole struct !Send and !Sync. That is usually desirable for thread-bound handles, but surprising if you only wanted a tag. Here is the intended use — a handle that must never leave its creating thread:

```rust
use std::marker::PhantomData;
use std::thread;

struct ThreadBound {
    handle: usize,
    _not_send: PhantomData<*const ()>,
}

fn main() {
    let bound = ThreadBound { handle: 1, _not_send: PhantomData };
    let join = thread::spawn(move || {
        let b = bound; // capture the whole struct
        println!("{}", b.handle);
    });
    join.join().unwrap();
}
```

This does not compile. The real error:

```text
error[E0277]: `*const ()` cannot be sent between threads safely
   --> src/main.rs:11:30
    |
 11 |       let join = thread::spawn(move || {
    |                  ------------- ^------
    |                  |             |
    |  ________________|_____________within this `{closure@src/main.rs:11:30: 11:37}`
    | |                |
    | |                required by a bound introduced by this call
 12 | |         let b = bound; // capture the whole struct
 13 | |         println!("{}", b.handle);
 14 | |     });
    | |_____^ `*const ()` cannot be sent between threads safely
    |
    = help: within `{closure@src/main.rs:11:30: 11:37}`, the trait `Send` is not implemented for `*const ()`
note: required because it appears within the type `PhantomData<*const ()>`
note: required because it appears within the type `ThreadBound`
note: required because it's used within this closure
note: required by a bound in `spawn`
```

If you wanted the struct to remain Send, use PhantomData<fn() -> T> (or PhantomData<T> when T: Send) instead of PhantomData<*const T>.

Warning: Disjoint closure captures (stable since the 2021 edition) mean a move closure that only touches bound.handle would capture just the usize and compile fine. Capturing the whole struct (as above) is what surfaces the !Send constraint. Do not rely on accidental field-level capture to dodge thread-safety — it is a footgun.

Pitfall 4: Using `PhantomData<T>` when you do not own a `T`

PhantomData<T> claims ownership of a T for drop-check. If your struct merely borrows a T (e.g. holds a &T you reconstruct manually), use PhantomData<&'a T> instead. Over-claiming ownership can make otherwise-valid programs fail to compile (the drop-checker becomes stricter than necessary). Match the marker to the real relationship.
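As a rough illustration of the borrowed case, here is a sketch of a read-only view that stores a raw pointer plus a length and uses PhantomData<&'a T> to record that it only borrows. The `View` type and its methods are hypothetical, written for this example rather than taken from any library.

```rust
use std::marker::PhantomData;

// Hypothetical borrowed view: it does not own the elements, so the marker is
// `PhantomData<&'a T>` rather than `PhantomData<T>`.
struct View<'a, T> {
    ptr: *const T,
    len: usize,
    _borrow: PhantomData<&'a T>,
}

impl<'a, T> View<'a, T> {
    fn new(slice: &'a [T]) -> Self {
        View { ptr: slice.as_ptr(), len: slice.len(), _borrow: PhantomData }
    }

    fn get(&self, index: usize) -> Option<&'a T> {
        if index < self.len {
            // SAFETY: `index` is in bounds, and the pointer came from a
            // `&'a [T]` that the marker keeps borrowed for all of 'a.
            Some(unsafe { &*self.ptr.add(index) })
        } else {
            None
        }
    }
}

fn main() {
    let data = vec![10, 20, 30];
    let view = View::new(&data);
    println!("{:?} {:?}", view.get(1), view.get(3));
    // No Drop impl is needed: the view frees nothing, because it owns nothing.
}
```

Because the marker names the lifetime 'a, the borrow checker refuses to let `data` be dropped or mutated while `view` is still in use, which is exactly the guarantee a plain `&'a [T]` would give.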

Best Practices

- Name the field with a leading underscore (_marker, _owns, _state) to signal “this is intentionally unused storage” and silence dead-code lints.
- Choose the marker form deliberately using the variance/Send/Sync table above. For a pure nominal tag with no ownership, PhantomData<fn() -> T> is the safest default; for an owning raw-pointer container, use PhantomData<T>; for a borrowed view, PhantomData<&'a T>.
- Prefer the typestate pattern (a generic state parameter held in PhantomData) over runtime boolean flags when an invalid state should be unrepresentable. The compiler then rejects misuse instead of you writing runtime checks. See the Real-World example.
- Keep marker types fieldless and never construct them: write struct Open; rather than a struct that carries data. They exist only as type-level names.
- Reach for ZSTs to express “no data, only meaning”: an empty struct implementing a trait, a unit value in a map (HashMap<K, ()>), or a strategy/dispatch tag. They compile away entirely.
- Do not over-reach. If a plain newtype (struct UserId(u64);) already gives you the distinctness you need without generics, use that; it is simpler. Reach for PhantomData<T> when you need to be generic over the tag, model a lifetime/ownership relationship, or build typestate.
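The “strategy/dispatch tag” idea from the list above can be sketched as follows. The `Greeting` trait and the `Formal`/`Casual` tags are hypothetical, chosen only to show a fieldless struct that carries behavior and no data.

```rust
use std::mem::size_of;

// Hypothetical greeting strategies: fieldless structs that carry behavior only.
struct Formal;
struct Casual;

trait Greeting {
    fn greet(name: &str) -> String;
}

impl Greeting for Formal {
    fn greet(name: &str) -> String {
        format!("Good day, {name}.")
    }
}

impl Greeting for Casual {
    fn greet(name: &str) -> String {
        format!("hey {name}")
    }
}

// The strategy is picked by type parameter; no value of G is ever created.
fn welcome<G: Greeting>(name: &str) -> String {
    G::greet(name)
}

fn main() {
    println!("{}", welcome::<Formal>("Ada"));
    println!("{}", welcome::<Casual>("Ada"));
    println!("size_of::<Formal>() = {}", size_of::<Formal>());
}
```

The choice between strategies is resolved at compile time through monomorphization, so there is no tag field to store and no branch to take at runtime.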

Real-World Example

A production-grade use of PhantomData is the typestate pattern: encode an object’s state in its type so that methods only valid in one state are unavailable in others, enforced at compile time. Here, a database/socket connection cannot be sent on before it is opened, and cannot be opened twice — and there is zero runtime cost, because the state lives entirely in a PhantomData:

```rust
use std::marker::PhantomData;

// State markers: fieldless ZSTs that are never constructed.
struct Open;
struct Closed;

struct Connection<State> {
    socket_fd: i32,
    _state: PhantomData<State>,
}

// Methods available only on a *closed* connection.
impl Connection<Closed> {
    fn new(fd: i32) -> Self {
        Connection { socket_fd: fd, _state: PhantomData }
    }
    fn open(self) -> Connection<Open> {
        println!("opening fd {}", self.socket_fd);
        Connection { socket_fd: self.socket_fd, _state: PhantomData }
    }
}

// Methods available only on an *open* connection.
impl Connection<Open> {
    fn send(&self, msg: &str) {
        println!("send on fd {}: {msg}", self.socket_fd);
    }
    fn close(self) -> Connection<Closed> {
        println!("closing fd {}", self.socket_fd);
        Connection { socket_fd: self.socket_fd, _state: PhantomData }
    }
}

fn main() {
    let conn = Connection::<Closed>::new(7);
    let conn = conn.open();    // Closed -> Open
    conn.send("hello");        // only valid because `conn` is Open
    let _conn = conn.close();  // Open -> Closed

    // conn.send("again");     // does not compile: `conn` was moved into close()
    // Connection::<Closed>::new(7).send("x"); // does not compile: no `send` on Closed

    // The state tag is free: Connection<Open> is the same size as its only real field.
    println!(
        "size_of Connection<Open> = {}",
        std::mem::size_of::<Connection<Open>>()
    );
    println!("size_of i32             = {}", std::mem::size_of::<i32>());
}
```

```text
opening fd 7
send on fd 7: hello
closing fd 7
size_of Connection<Open> = 4
size_of i32             = 4
```

A few things to notice. send exists only in impl Connection<Open>, so calling it on a Connection<Closed> is not a runtime error — it does not type-check at all. The state transitions consume self and return a new type (open(self) -> Connection<Open>), so you cannot accidentally keep using the old state. And size_of::<Connection<Open>>() is 4 — identical to the lone i32 field. The entire state machine is enforced by the compiler and then erased. Real libraries use this pattern extensively: HTTP request builders that require a URL before send(), embedded HAL crates that model GPIO pins as input/output at the type level, and parser combinators that track whether input remains.
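The request-builder variant mentioned above might look roughly like this. It is a sketch under stated assumptions: `RequestBuilder`, `NoUrl`, `HasUrl`, and the string returned by `send` are all hypothetical and do not mirror the API of any real HTTP crate; nothing touches the network.

```rust
use std::marker::PhantomData;

// Hypothetical state markers: has a URL been supplied yet?
struct NoUrl;
struct HasUrl;

struct RequestBuilder<State> {
    url: Option<String>,
    headers: Vec<(String, String)>,
    _state: PhantomData<State>,
}

impl RequestBuilder<NoUrl> {
    fn new() -> Self {
        RequestBuilder { url: None, headers: Vec::new(), _state: PhantomData }
    }
    // Supplying a URL is the only way to reach the HasUrl state.
    fn url(self, url: &str) -> RequestBuilder<HasUrl> {
        RequestBuilder {
            url: Some(url.to_string()),
            headers: self.headers,
            _state: PhantomData,
        }
    }
}

// Headers may be added in any state.
impl<State> RequestBuilder<State> {
    fn header(mut self, key: &str, value: &str) -> Self {
        self.headers.push((key.to_string(), value.to_string()));
        self
    }
}

impl RequestBuilder<HasUrl> {
    // Stand-in for a real send: it only describes the request.
    fn send(self) -> String {
        format!(
            "GET {} with {} header(s)",
            self.url.unwrap_or_default(),
            self.headers.len()
        )
    }
}

fn main() {
    let summary = RequestBuilder::<NoUrl>::new()
        .header("Accept", "text/html")
        .url("https://example.com")
        .send();
    println!("{summary}");
    // RequestBuilder::<NoUrl>::new().send(); // does not compile: no `send` on NoUrl
}
```

The shared `impl<State>` block is the part that differs from the connection example: methods that make sense in every state are written once, while state-specific methods live in their own `impl` blocks.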

A second realistic use is units of measure, where the unit is a phantom tag that prevents mixing dimensions:

```rust
use std::marker::PhantomData;
use std::ops::Add;

#[derive(Debug, Clone, Copy)]
struct Quantity<Unit> {
    value: f64,
    _unit: PhantomData<Unit>,
}

impl<Unit> Quantity<Unit> {
    const fn new(value: f64) -> Self {
        Quantity { value, _unit: PhantomData }
    }
}

// Addition is allowed only within the SAME unit.
impl<Unit> Add for Quantity<Unit> {
    type Output = Quantity<Unit>;
    fn add(self, rhs: Self) -> Self::Output {
        Quantity::new(self.value + rhs.value)
    }
}

struct Meters;
struct Seconds;

fn main() {
    let distance = Quantity::<Meters>::new(100.0) + Quantity::<Meters>::new(50.0);
    let time = Quantity::<Seconds>::new(9.58);

    // let bad = distance + time; // does not compile: mismatched units
    //   error[E0308]: mismatched types
    //   expected `Quantity<Meters>`, found `Quantity<Seconds>`

    println!("distance = {} m, time = {} s", distance.value, time.value);
    println!(
        "size_of Quantity<Meters> = {} (a bare f64 is {})",
        std::mem::size_of::<Quantity<Meters>>(),
        std::mem::size_of::<f64>()
    );
}
```

```text
distance = 150 m, time = 9.58 s
size_of Quantity<Meters> = 8 (a bare f64 is 8)
```

Adding meters to seconds is a compile error (error[E0308]: mismatched types ... expected 'Quantity<Meters>', found 'Quantity<Seconds>'), while Quantity<Meters> is byte-for-byte an f64. The uom crate builds on the same phantom-type idea to cover the full SI system.
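One caveat about the #[derive(Debug, Clone, Copy)] in the Quantity listing: derives add a bound on every type parameter, so the derived impl only makes Quantity<Unit> Copy when Unit itself is Copy. Meters and Seconds derive nothing, which means Quantity<Meters> is not actually Copy there (the example never copies a quantity, so it still compiles). A minimal sketch of the usual workaround is to write the impls by hand; the `total` helper is a hypothetical function added only to demonstrate the copy.

```rust
use std::marker::PhantomData;

struct Quantity<Unit> {
    value: f64,
    _unit: PhantomData<Unit>,
}

impl<Unit> Quantity<Unit> {
    const fn new(value: f64) -> Self {
        Quantity { value, _unit: PhantomData }
    }
}

// Hand-written impls with no `Unit: Clone` / `Unit: Copy` bound. This is
// allowed because f64 is Copy and PhantomData<Unit> is Copy for every Unit.
impl<Unit> Clone for Quantity<Unit> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<Unit> Copy for Quantity<Unit> {}

struct Meters; // deliberately neither Clone nor Copy

// Hypothetical helper that takes the quantity by value.
fn total(q: Quantity<Meters>) -> f64 {
    q.value
}

fn main() {
    let distance = Quantity::<Meters>::new(100.0);
    let first = total(distance);
    let second = total(distance); // still usable: the value was copied, not moved
    println!("{}", first + second);
}
```

With the derived impls and the same `Meters` tag, the second call to `total` would be rejected as a use after move.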

Exercises

Exercise 1: Tagged sanitized strings

Difficulty: Beginner

Objective: Use a phantom state parameter to make “unsanitized” and “sanitized” user input distinct types, so only sanitized input can be rendered.

Instructions: Define marker types Raw and Sanitized, and a UserInput<State> struct holding a String plus a PhantomData<State>. Provide UserInput::<Raw>::new(...) and a sanitize(self) -> UserInput<Sanitized> method that escapes < and >. Add a render(&self) -> &str method available only on UserInput<Sanitized>. Prove that you cannot call render on raw input.

Solution

```rust
use std::marker::PhantomData;

struct Raw;
struct Sanitized;

struct UserInput<State> {
    text: String,
    _state: PhantomData<State>,
}

impl UserInput<Raw> {
    fn new(text: impl Into<String>) -> Self {
        UserInput { text: text.into(), _state: PhantomData }
    }
    fn sanitize(self) -> UserInput<Sanitized> {
        let cleaned = self.text.replace('<', "&lt;").replace('>', "&gt;");
        UserInput { text: cleaned, _state: PhantomData }
    }
}

impl UserInput<Sanitized> {
    // `render` exists only for Sanitized input, so `raw.render()` won't compile.
    fn render(&self) -> &str {
        &self.text
    }
}

fn main() {
    let raw = UserInput::<Raw>::new("<script>alert(1)</script>");
    // raw.render(); // does not compile: no method `render` on UserInput<Raw>
    let safe = raw.sanitize();
    println!("rendered: {}", safe.render());
}
```

```text
rendered: &lt;script&gt;alert(1)&lt;/script&gt;
```

Exercise 2: A lifetime-tied lexer token

Difficulty: Intermediate

Objective: Build a Token<'src> that stores only byte offsets but is tied via PhantomData<&'src str> to the source string it came from, so it cannot outlive that source.

Instructions: Write a Lexer<'src> over a &'src str with a next_word(&mut self) -> Option<Token<'src>> method that skips spaces and returns the start offset and length of each word. The returned Token<'src> must carry a PhantomData<&'src str> so the borrow checker keeps the source alive. In main, iterate the tokens and resolve each back to a &str slice of the source.

Solution

```rust
use std::marker::PhantomData;

struct Token<'src> {
    start: usize,
    len: usize,
    _src: PhantomData<&'src str>,
}

struct Lexer<'src> {
    source: &'src str,
    pos: usize,
}

impl<'src> Lexer<'src> {
    fn new(source: &'src str) -> Self {
        Lexer { source, pos: 0 }
    }
    fn next_word(&mut self) -> Option<Token<'src>> {
        let bytes = self.source.as_bytes();
        while self.pos < bytes.len() && bytes[self.pos] == b' ' {
            self.pos += 1;
        }
        if self.pos >= bytes.len() {
            return None;
        }
        let start = self.pos;
        while self.pos < bytes.len() && bytes[self.pos] != b' ' {
            self.pos += 1;
        }
        Some(Token { start, len: self.pos - start, _src: PhantomData })
    }
}

fn main() {
    let source = String::from("phantom data is free");
    let mut lexer = Lexer::new(&source);
    while let Some(tok) = lexer.next_word() {
        println!("token: {:?}", &source[tok.start..tok.start + tok.len]);
    }
    println!("size_of Token = {}", std::mem::size_of::<Token>());
}
```

```text
token: "phantom"
token: "data"
token: "is"
token: "free"
size_of Token = 16
```

The Token is just two usizes (16 bytes on a 64-bit target); the PhantomData<&'src str> adds nothing but ties the token’s lifetime to the source.

Exercise 3: A thread-bound handle

Difficulty: Advanced

Objective: Use PhantomData<*const ()> to build a handle that the compiler refuses to move to another thread, while keeping it usable on the thread that created it.

Instructions: Define a GlHandle struct holding a u32 id and a PhantomData<*const ()> field. Add new and bind(&self) methods. Confirm it works on the current thread, then (in prose or a commented-out block) explain what happens if you thread::spawn a closure that moves the handle.

Solution

```rust
use std::marker::PhantomData;

struct GlHandle {
    id: u32,
    // `*const ()` is neither Send nor Sync, so GlHandle inherits !Send + !Sync.
    _not_send: PhantomData<*const ()>,
}

impl GlHandle {
    fn new(id: u32) -> Self {
        GlHandle { id, _not_send: PhantomData }
    }
    fn bind(&self) {
        println!("binding GL handle {}", self.id);
    }
}

fn main() {
    let h = GlHandle::new(1);
    h.bind(); // fine on the current thread

    // std::thread::spawn(move || { h.bind(); });
    // ^ does not compile: error[E0277] `*const ()` cannot be sent between
    //   threads safely; `GlHandle` is !Send because of the PhantomData marker.

    println!("size_of GlHandle = {} (just the u32)", std::mem::size_of::<GlHandle>());
}
```

```text
binding GL handle 1
size_of GlHandle = 4 (just the u32)
```

The handle behaves exactly like a non-thread-safe resource (think OpenGL contexts, FFI handles, or anything !Send), yet costs only the bytes of its real u32 field. PhantomData<*const ()> is the idiomatic way to opt a type out of Send/Sync on stable Rust, since negative impls such as impl !Send for GlHandle {} are still an unstable, nightly-only feature.
