Cache-Friendly Code: Data-Oriented Design


How you lay out data in memory often matters more than which algorithm you run over it. This topic covers data-oriented design, the Struct-of-Arrays (SoA) versus Array-of-Structs (AoS) trade-off, and why a flat, contiguous Vec beats pointer-chasing data structures on modern hardware.

Quick Overview

A modern CPU core can execute billions of arithmetic operations per second, but a single fetch from main memory costs on the order of a couple hundred cycles. To hide that gap, the CPU has small, fast caches (L1/L2/L3) and loads memory in fixed-size cache lines (64 bytes on x86-64; Apple Silicon uses 128-byte lines). Code that reads memory in straight, predictable, contiguous runs keeps those lines full of useful data and lets the hardware prefetcher stay ahead of you; code that hops between scattered heap allocations stalls waiting on memory.

For a TypeScript/JavaScript developer this is mostly invisible: the V8 engine boxes objects, manages a garbage-collected heap, and hides layout behind hidden classes. In Rust you decide the layout, so you can deliberately make data cache-friendly: store the fields you iterate over together and contiguously, and avoid chasing pointers.

Note: This is the layout side of performance. Its sibling memory-layout.md covers the size and alignment of a single struct (field ordering, #[repr], niche optimization). This file is about the layout of collections of data.

TypeScript/JavaScript Example

In JavaScript, an array of objects is an Array of Structs (AoS) — but with an extra level of indirection you cannot remove. Each object is a separately allocated, garbage-collected heap cell, and the array holds references (pointers) to them.

// A particle simulation, the way you'd naturally write it in TypeScript.
interface Particle {
  x: number;
  y: number;
  z: number;
  vx: number;
  vy: number;
  vz: number;
  hp: number;
  name: string;
}

const particles: Particle[] = [];
for (let i = 0; i < 1_000_000; i++) {
  particles.push({ x: i, y: 0, z: 0, vx: 1, vy: 0.5, vz: 0.25, hp: 100, name: "p" });
}

// A common query: average the x coordinate. We touch ONE field of eight,
// but each `p` is a separate heap object reached through a pointer.
function averageX(ps: Particle[]): number {
  let sum = 0;
  for (const p of ps) sum += p.x;
  return sum / ps.length;
}

console.log(particles[0]); // { x: 0, y: 0, z: 0, vx: 1, ... name: 'p' }
console.log(averageX(particles));

When you want true contiguity in JavaScript, you reach for typed arrays — a manual Struct-of-Arrays:

// Struct-of-Arrays via typed arrays: each "column" is a flat, contiguous buffer.
const N = 1_000_000;
const xs = new Float64Array(N);
const ys = new Float64Array(N);
const vxs = new Float64Array(N);
// ...one TypedArray per field...

for (let i = 0; i < N; i++) {
  xs[i] = i;
  vxs[i] = 1;
}

// Now `averageX` reads one tightly-packed buffer: no per-element pointer hop.
function averageX(xs: Float64Array): number {
  let sum = 0;
  for (let i = 0; i < xs.length; i++) sum += xs[i];
  return sum / xs.length;
}

A Float64Array is genuinely contiguous: new Float64Array(2).byteLength is 16 — exactly two 8-byte slots, no boxing. This is the closest JavaScript gets to manual memory layout, and it is exactly the SoA idea we will make idiomatic in Rust. The catch in JS: typed arrays only hold numbers, so anything richer (a name string) has to live in a parallel plain array, and you lose the ergonomics of a real object.

Rust Equivalent

In Rust, Vec<Particle> is already contiguous — the structs are stored inline, back-to-back, with no per-element pointer. That is a big head start over JavaScript’s array-of-references. But it is still Array of Structs: to read one field you stride over every other field too. The data-oriented alternative is Struct of Arrays, where each field is its own Vec.

// Array of Structs (AoS): the natural, OOP-flavored layout.
#[derive(Clone)]
struct Particle {
    x: f32,
    y: f32,
    z: f32,
    vx: f32,
    vy: f32,
    vz: f32,
    hp: i32,
    name: String,
}

// Struct of Arrays (SoA): one contiguous Vec per field ("column").
#[derive(Default)]
struct Particles {
    x: Vec<f32>,
    y: Vec<f32>,
    z: Vec<f32>,
    vx: Vec<f32>,
    vy: Vec<f32>,
    vz: Vec<f32>,
    hp: Vec<i32>,
    name: Vec<String>,
}

impl Particles {
    fn push(&mut self, x: f32, y: f32, vx: f32, vy: f32) {
        self.x.push(x);
        self.y.push(y);
        self.z.push(0.0);
        self.vx.push(vx);
        self.vy.push(vy);
        self.vz.push(0.0);
        self.hp.push(100);
        self.name.push(String::from("p"));
    }

    fn len(&self) -> usize {
        self.x.len()
    }

    // Reading the average x now streams one tight Vec<f32>: every byte
    // pulled into cache is a value we actually use.
    fn average_x(&self) -> f32 {
        self.x.iter().sum::<f32>() / self.len() as f32
    }
}

fn main() {
    let mut ps = Particles::default();
    for i in 0..1_000_000 {
        ps.push(i as f32, 0.0, 1.0, 0.5);
    }
    println!("count: {}", ps.len());
    println!("average x: {}", ps.average_x());
}

The two layouts hold the same data; they differ only in where the bytes sit. For a query that touches one field, SoA loads only that field’s bytes, while AoS drags the whole struct (including the name heap pointer and the unused stats) through cache.
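For comparison, the same single-field query can be written against both layouts. The sketch below is only an illustration: `ParticleAos` is a hypothetical, trimmed-down record (fewer fields than the `Particle` above), and both function names are made up for this example.

```rust
// Hypothetical trimmed-down AoS record; the Particle above has more fields.
#[allow(dead_code)]
struct ParticleAos {
    x: f32,
    vx: f32,
    hp: i32,
    name: String,
}

// AoS query: the loop steps over whole structs even though it only reads `x`.
fn average_x_aos(ps: &[ParticleAos]) -> f32 {
    if ps.is_empty() {
        return 0.0;
    }
    ps.iter().map(|p| p.x).sum::<f32>() / ps.len() as f32
}

// SoA query: the loop steps over a packed slice that holds nothing but x values.
fn average_x_soa(xs: &[f32]) -> f32 {
    if xs.is_empty() {
        return 0.0;
    }
    xs.iter().sum::<f32>() / xs.len() as f32
}

fn main() {
    let aos: Vec<ParticleAos> = (0..4)
        .map(|i| ParticleAos { x: i as f32, vx: 1.0, hp: 100, name: String::from("p") })
        .collect();
    // Pull the x column out into its own Vec, as an SoA layout would store it.
    let xs: Vec<f32> = aos.iter().map(|p| p.x).collect();

    println!("AoS average x: {}", average_x_aos(&aos));
    println!("SoA average x: {}", average_x_soa(&xs));
    println!(
        "bytes stepped per element: AoS {}, SoA {}",
        std::mem::size_of::<ParticleAos>(),
        std::mem::size_of::<f32>()
    );
}
```

Both functions return the same number; the only difference is how many bytes the loop has to step over to reach the next `x`.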

Detailed Explanation

Why a cache line is the unit that matters

The CPU never loads one f32. It loads the whole 64-byte cache line containing it. So the real question for any loop is: of the 64 bytes I just paid to fetch, how many will I actually use?

Consider summing the x field of a “fat” entity. Here is the struct from the benchmark below; std::mem::size_of reports its real size:

#[derive(Clone)]
struct Entity {
    x: f32, y: f32, z: f32,
    vx: f32, vy: f32, vz: f32,
    hp: i32, mana: i32, level: i32, xp: u64,
    name: String,
    inventory: [u32; 32], // 128 bytes of cold data
    flags: u64,
    cooldowns: [f32; 8],  // 32 bytes
}

fn main() {
    println!("size_of::<Entity>() = {} bytes", std::mem::size_of::<Entity>());
}

Real output:

size_of::<Entity>() = 240 bytes

AoS (Vec<Entity>): each element is 240 bytes. To read the 4-byte x, the CPU loads the cache line(s) holding that element, and the prefetcher streams in the neighbors — but those neighbors are mostly inventory, cooldowns, and a String pointer you never touch in this loop. You use about 4 of every ~240 bytes you bring in: under 2% of the bandwidth is doing useful work.
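SoA (Vec<f32> for x): the x values are packed 16 per 64-byte line. Every loaded byte is a value you sum, and a flat f32 stream is the shape SIMD units want. One caveat: a strict floating-point sum is a sequential reduction that the compiler will not reorder on its own, so the gain on this particular loop likely comes mainly from reduced memory traffic; element-wise updates over the same columns are what typically autovectorize.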
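The arithmetic behind those two bullets can be written down directly. This is a back-of-the-envelope sketch, not a measurement: `CACHE_LINE`, `values_per_line`, and `useful_fraction` are names invented here, the 64-byte line size is an assumption about the target, and the 240-byte array merely stands in for a fat record of that size.

```rust
use std::mem::size_of;

// Assumed cache-line size; 64 bytes is common but not universal.
const CACHE_LINE: usize = 64;

/// How many whole values of `T` fit in one cache line.
/// Returns 0 when `T` is larger than a line. Not meant for zero-sized types.
fn values_per_line<T>() -> usize {
    CACHE_LINE / size_of::<T>()
}

/// Share of a back-to-back array's bytes that a scan of one field reads,
/// given the record size and the size of the field being scanned.
fn useful_fraction(record_bytes: usize, field_bytes: usize) -> f64 {
    field_bytes as f64 / record_bytes as f64
}

fn main() {
    // Stand-in for a fat record: only its size matters here (hypothetical type).
    type FatRecord = [u8; 240];

    println!("f32 values per line:  {}", values_per_line::<f32>());
    println!("fat records per line: {}", values_per_line::<FatRecord>());
    println!(
        "useful share, AoS: {:.1}%",
        100.0 * useful_fraction(size_of::<FatRecord>(), size_of::<f32>())
    );
    println!(
        "useful share, SoA: {:.1}%",
        100.0 * useful_fraction(size_of::<f32>(), size_of::<f32>())
    );
}
```

The model ignores what the prefetcher actually decides to pull in, so treat the percentages as an upper bound on waste rather than a prediction of run time.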

The benchmark

Measured with criterion (which handles warm-up and statistics so the numbers are trustworthy) on the Entity above, summing only the x field across 1,000,000 elements:

sum_x/aos               time:   [4.1552 ms 4.2436 ms 4.3890 ms]
sum_x/soa               time:   [1.0204 ms 1.1019 ms 1.2093 ms]

That is roughly a 4x speedup for SoA on this machine — purely from layout, with identical arithmetic. The exact ratio is hardware- and load-dependent (re-runs on the same laptop landed between about 3x and 5x), so reproduce it on your own target rather than quoting a fixed figure. The direction is the reliable part: the wider the struct relative to the field you touch, the bigger the SoA win.

Warning: Do not take a single Instant::now() micro-measurement as gospel — first-touch page faults, allocator warmth, and background load can swing a naive timing by 5x or more. Always confirm a layout change with a real benchmark harness. See benchmarking.md and when-to-optimize.md.

When AoS is actually fine (honesty check)

SoA is not a free win. When a loop touches most of a struct’s fields, AoS keeps that struct’s bytes together on one cache line, so reading vx and writing x for the same particle is already local. In that situation a benchmark of a full position-integration loop (touching x, y, z, vx, vy, vz) showed AoS tying or beating SoA, because SoA then juggles several separate memory streams and bounds checks. SoA pays off specifically when:

you frequently process a subset of fields (“give me every x”), and/or
the struct is large with cold fields you rarely read, and/or
you want SIMD: a flat Vec<f32> autovectorizes; a Vec<Struct> usually does not.

This honest “it depends” is the whole point of data-oriented design: organize data around how it is accessed, not around real-world taxonomy.
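To make the "touches most fields" case concrete, here is a minimal sketch of one position-integration step in both layouts. `BodyAos`, `BodySoa`, and the two functions are hypothetical names for this illustration; this is not the benchmarked code, and it makes no claim about which version is faster on a given machine.

```rust
// Hypothetical minimal types for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
struct BodyAos {
    x: f32,
    y: f32,
    vx: f32,
    vy: f32,
}

#[derive(Default)]
struct BodySoa {
    x: Vec<f32>,
    y: Vec<f32>,
    vx: Vec<f32>,
    vy: Vec<f32>,
}

// AoS: every field the loop needs for one body sits side by side.
fn integrate_aos(bodies: &mut [BodyAos], dt: f32) {
    for b in bodies.iter_mut() {
        b.x += b.vx * dt;
        b.y += b.vy * dt;
    }
}

// SoA: the same update streams through four separate columns, two per pass.
fn integrate_soa(b: &mut BodySoa, dt: f32) {
    for (x, &vx) in b.x.iter_mut().zip(b.vx.iter()) {
        *x += vx * dt;
    }
    for (y, &vy) in b.y.iter_mut().zip(b.vy.iter()) {
        *y += vy * dt;
    }
}

fn main() {
    let mut aos = vec![BodyAos { x: 0.0, y: 10.0, vx: 2.0, vy: -1.0 }; 3];
    let mut soa = BodySoa {
        x: vec![0.0; 3],
        y: vec![10.0; 3],
        vx: vec![2.0; 3],
        vy: vec![-1.0; 3],
    };

    integrate_aos(&mut aos, 0.5);
    integrate_soa(&mut soa, 0.5);

    println!("AoS first body: {:?}", aos[0]);
    println!("SoA first body: ({}, {})", soa.x[0], soa.y[0]);
}
```

Both versions compute identical positions. When every field is hot, the AoS loop reads one stream while the SoA loops read four, which is why only a benchmark on your own data can say which layout wins here.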

Pointer-chasing is the real villain

The opposite of contiguous data is a structure where each element lives in its own heap allocation and you reach the next one by dereferencing a pointer — a linked list, a tree of Boxes, a graph of Rcs. Each hop is a potential cache miss the prefetcher cannot predict, because the address of the next node is only known after you have loaded the current one (a data dependency).

use std::time::Instant;
use std::hint::black_box;

const N: usize = 5_000_000;

struct Node {
    value: u64,
    next: Option<Box<Node>>,
}

fn main() {
    // Contiguous: a flat Vec.
    let contiguous: Vec<u64> = (0..N as u64).collect();

    // Pointer-chasing: a singly linked list of separate heap allocations.
    let mut head: Option<Box<Node>> = None;
    for v in (0..N as u64).rev() {
        head = Some(Box::new(Node { value: v, next: head }));
    }

    let _: u64 = contiguous.iter().sum(); // warm up

    let t = Instant::now();
    let mut sum1 = 0u64;
    for &v in &contiguous {
        sum1 = sum1.wrapping_add(v);
    }
    let vec_time = t.elapsed();

    let t = Instant::now();
    let mut sum2 = 0u64;
    let mut cur = head.as_deref();
    while let Some(node) = cur {
        sum2 = sum2.wrapping_add(node.value);
        cur = node.next.as_deref();
    }
    let list_time = t.elapsed();

    println!("Vec  sum: {:?}  (= {})", vec_time, black_box(sum1));
    println!("List sum: {:?}  (= {})", list_time, black_box(sum2));
    println!("Vec is {:.1}x faster", list_time.as_secs_f64() / vec_time.as_secs_f64());

    // Unlink the nodes one at a time before `head` goes out of scope: the
    // default recursive drop of a Box chain this long can overflow the stack.
    let mut rest = head.take();
    while let Some(mut node) = rest {
        rest = node.next.take();
    }
}

Representative real output (two runs, release build):

Vec  sum: 3.611167ms  (= 12499997500000)
List sum: 14.188292ms  (= 12499997500000)
Vec is 3.9x faster

Vec  sum: 3.892125ms  (= 12499997500000)
List sum: 22.985875ms  (= 12499997500000)
Vec is 5.9x faster

Same data, same sum, same O(n) algorithm — the Vec is several times faster and far more consistent, because the linked list spends most of its time stalled on cache misses. This is why Rust’s standard std::collections::LinkedList carries a documentation note steering you to Vec or VecDeque for almost everything. The cure for pointer-chasing is to put the data in a contiguous container and use indices (usize) instead of pointers when you need to refer between elements.
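As a rough sketch of the "indices instead of pointers" idea, the linked list can keep all of its nodes in one Vec and store each link as an index. `IndexList` and its methods are hypothetical names chosen for this example, not a standard-library type.

```rust
// Hypothetical index-linked list: nodes live in one Vec, links are indices.
struct Node {
    value: u64,
    next: Option<u32>, // index of the next node in `nodes`, not a pointer
}

struct IndexList {
    nodes: Vec<Node>,
    head: Option<u32>,
}

impl IndexList {
    fn new() -> Self {
        IndexList { nodes: Vec::new(), head: None }
    }

    // Push to the front: the node is appended to the Vec and becomes the head.
    fn push_front(&mut self, value: u64) {
        let idx = u32::try_from(self.nodes.len()).expect("more than u32::MAX nodes");
        self.nodes.push(Node { value, next: self.head });
        self.head = Some(idx);
    }

    // Follow the links in list order, summing values.
    fn sum(&self) -> u64 {
        let mut total = 0u64;
        let mut cur = self.head;
        while let Some(i) = cur {
            let node = &self.nodes[i as usize];
            total = total.wrapping_add(node.value);
            cur = node.next;
        }
        total
    }
}

fn main() {
    let mut list = IndexList::new();
    for v in 1..=4 {
        list.push_front(v);
    }
    println!("sum via links: {}", list.sum());
    println!("nodes stored in one Vec: {}", list.nodes.len());
}
```

Following an index is still a dependent load, so this does not make traversal free. What changes is that every node sits in one compact allocation with no per-node `Box`, the whole list is freed in one step, and when order does not matter you can iterate `nodes` directly as a flat slice.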

Key Differences

Concept	TypeScript / JavaScript	Rust
`obj[]` of records	Array of references to GC’d heap objects (double indirection)	`Vec<Struct>` stores structs inline, contiguously (single indirection)
Contiguous numeric data	`Float64Array` / `Int32Array` (numbers only)	Any `Vec<T>` stores its elements inline and contiguously; works for structs too
Choosing memory layout	Mostly out of your hands; V8 hidden classes	Fully under your control (AoS vs SoA, `Box`, `#[repr]`)
SoA ergonomics	Parallel typed arrays, manual index bookkeeping	A `struct` of `Vec`s with methods that keep the columns the same length
Pointer-chasing structures	Idiomatic (linked lists, object graphs) and GC-managed	Possible (`Box`, `Rc`) but discouraged on hot paths; prefer indices into a `Vec`
SIMD / autovectorization	JIT may vectorize typed-array loops opportunistically	Element-wise loops over a flat `Vec<f32>` usually autovectorize in `--release` builds

The deeper difference: in JavaScript the engine owns your layout and optimizes heuristically; in Rust the layout is part of your design, chosen at compile time, with zero runtime metadata. That is what makes “data-oriented design” a Rust idiom rather than a fight against the runtime.

Common Pitfalls

Pitfall 1: Borrow-checker conflicts when updating SoA columns through `self`

A natural-looking column update fails to compile, because iterating one field mutably while calling a method that borrows self again is a double borrow:

struct Particles {
    x: Vec<f32>,
    vx: Vec<f32>,
}

impl Particles {
    fn vx_at(&self, i: usize) -> f32 { self.vx[i] }

    fn integrate(&mut self) {
        // does not compile (error[E0502]): self is mutably borrowed by x.iter_mut()
        for (i, xi) in self.x.iter_mut().enumerate() {
            *xi += self.vx_at(i);
        }
    }
}

fn main() {}

The real compiler error:

error[E0502]: cannot borrow `*self` as immutable because it is also borrowed as mutable
  --> src/main.rs:12:20
   |
11 |         for (i, xi) in self.x.iter_mut().enumerate() {
   |                        -----------------------------
   |                        |
   |                        mutable borrow occurs here
   |                        mutable borrow later used here
12 |             *xi += self.vx_at(i);
   |                    ^^^^ immutable borrow occurs here

Fix: iterate the fields directly with zip, not through a helper that re-borrows self. The borrow checker can see that self.x and self.vx are disjoint fields and allows one mutable and one immutable borrow simultaneously:

struct Particles {
    x: Vec<f32>,
    vx: Vec<f32>,
}

impl Particles {
    fn integrate(&mut self) {
        // x is borrowed mutably, vx immutably: disjoint fields, no conflict.
        for (xi, &vxi) in self.x.iter_mut().zip(self.vx.iter()) {
            *xi += vxi;
        }
    }
}

fn main() {
    let mut p = Particles { x: vec![0.0, 10.0], vx: vec![1.0, 2.0] };
    p.integrate();
    println!("{:?}", p.x); // [1.0, 12.0]
}

Output: [1.0, 12.0]. (See 05-ownership/README.md for why disjoint-field borrows are allowed.)

Pitfall 2: Columns drifting out of sync

SoA’s biggest correctness hazard is silent: nothing forces x.len() == vx.len(). If you push to some columns but not others, your invariant breaks with no error. Encapsulate every insertion behind a single push/spawn method that updates all columns together, never expose the Vecs as pub, and index with the same i everywhere. (Crates like soa_derive generate this boilerplate for you.)
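One way to enforce that discipline is sketched below, with the caveat that `spawn`, `despawn`, and `check_invariant` are names invented for this example. The fields are private, so code outside the module has to go through methods that touch every column.

```rust
// Hypothetical SoA with non-`pub` fields: outside this module, the columns
// can only change through the methods below.
#[derive(Default)]
pub struct Particles {
    x: Vec<f32>,
    vx: Vec<f32>,
    hp: Vec<i32>,
}

impl Particles {
    // The only way to add a particle: all columns grow together.
    // Returns the index of the new particle.
    pub fn spawn(&mut self, x: f32, vx: f32, hp: i32) -> usize {
        self.x.push(x);
        self.vx.push(vx);
        self.hp.push(hp);
        self.check_invariant();
        self.x.len() - 1
    }

    // The only way to remove one: the same index leaves every column.
    // swap_remove is O(1) but moves the last particle into slot `i`.
    pub fn despawn(&mut self, i: usize) {
        self.x.swap_remove(i);
        self.vx.swap_remove(i);
        self.hp.swap_remove(i);
        self.check_invariant();
    }

    pub fn len(&self) -> usize {
        self.x.len()
    }

    // Debug-build check that the columns still agree on length.
    fn check_invariant(&self) {
        debug_assert_eq!(self.x.len(), self.vx.len());
        debug_assert_eq!(self.x.len(), self.hp.len());
    }
}

fn main() {
    let mut ps = Particles::default();
    ps.spawn(0.0, 1.0, 100);
    ps.spawn(5.0, -1.0, 80);
    ps.spawn(9.0, 0.5, 60);
    ps.despawn(0); // the last particle (x = 9.0) moves into slot 0
    println!("len = {}, x = {:?}, hp = {:?}", ps.len(), ps.x, ps.hp);
}
```

Because `swap_remove` reorders elements, any index held elsewhere may now refer to a different particle; a design that hands out long-lived handles would need a stable-ID scheme on top of this.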

Pitfall 3: Reaching for SoA before measuring

SoA complicates your code and only helps specific access patterns (see the “honesty check” above). Converting a struct to SoA when your hot loop reads most fields can make things slower and harder to read. Profile first (profiling.md), confirm the loop is memory-bound, then restructure. Premature data-oriented design is still premature optimization; see when-to-optimize.md.

Pitfall 4: Assuming `Vec<Box<T>>` is contiguous

Vec<Box<Widget>> stores the pointers contiguously, but each Widget is a separate allocation scattered across the heap — you reintroduced pointer-chasing. Prefer Vec<Widget> (values inline). Reach for Box inside a Vec only when you genuinely need stable addresses, trait objects (Vec<Box<dyn Trait>>), or recursive types.
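The difference is easy to observe by comparing element strides. In this sketch `Widget` and the two helper functions are hypothetical; the code only inspects addresses and sizes and does not measure performance.

```rust
use std::mem::size_of;

// Hypothetical payload type for illustration.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct Widget {
    id: u64,
    weight: f64,
}

// Byte distance between the first two elements as stored in the Vec's buffer.
// Expects at least two elements.
fn stride_inline(ws: &[Widget]) -> usize {
    let a = &ws[0] as *const Widget as usize;
    let b = &ws[1] as *const Widget as usize;
    b - a
}

// Same measurement for the slots of a Vec<Box<Widget>>: the slots are pointers.
// Expects at least two elements.
fn stride_boxed_slots(ws: &[Box<Widget>]) -> usize {
    let a = &ws[0] as *const Box<Widget> as usize;
    let b = &ws[1] as *const Box<Widget> as usize;
    b - a
}

fn main() {
    let inline: Vec<Widget> = (0..4).map(|i| Widget { id: i, weight: 1.0 }).collect();
    let boxed: Vec<Box<Widget>> = inline.iter().map(|w| Box::new(*w)).collect();

    println!("size_of::<Widget>()      = {}", size_of::<Widget>());
    println!("size_of::<Box<Widget>>() = {}", size_of::<Box<Widget>>());
    println!("inline stride            = {}", stride_inline(&inline));
    println!("boxed slot stride        = {}", stride_boxed_slots(&boxed));

    // Where each boxed Widget actually lives is up to the allocator.
    for b in &boxed {
        println!("boxed widget at {:p}", &**b);
    }
}
```

The inline stride equals the size of a `Widget`, so the values themselves are adjacent. The boxed stride is only the size of a pointer; the printed addresses show where the allocator happened to place each value, which may or may not be close together.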

Best Practices

Default to Vec<T> with T stored by value. Contiguous-by-value is the cache-friendly baseline and usually the right answer.
Use indices, not pointers, to link elements. Replace next: Option<Box<Node>> with next: Option<u32> indexing into a Vec<Node> (an “arena” or “slot map”). You keep relationships without the cache misses, and you sidestep Rc/lifetime gymnastics. See 10-smart-pointers/README.md and 22-common-patterns/README.md.

Hot/cold split fat structs. Keep frequently-touched fields inline and box the rarely-used remainder behind one pointer:

struct MonsterInline {        // everything inline
    x: f32, y: f32, hp: i32,
    name: String, lore: String, loot_table: Vec<u32>,
}

struct MonsterSplit {         // hot fields inline, cold data behind a Box
    x: f32, y: f32, hp: i32,
    cold: Box<MonsterColdData>,
}
struct MonsterColdData { name: String, lore: String, loot_table: Vec<u32> }

fn main() {
    println!("inline = {} bytes", std::mem::size_of::<MonsterInline>());
    println!("split  = {} bytes", std::mem::size_of::<MonsterSplit>());
}

Real output:

inline = 88 bytes
split  = 24 bytes

At 24 bytes instead of 88, a Vec<MonsterSplit> packs about 3.7x more entities into each cache line for the hot loop (roughly 2.7 per 64-byte line, versus fewer than one); the cold data is fetched only when you actually need it.

Vec::with_capacity when you know the count. Pre-sizing every column avoids reallocations mid-build and keeps each column in one allocation. (Capacity management is covered in depth in optimization.md.)
Iterate with iter()/zip(), not indexed [i], on the hottest loops. Iterators elide bounds checks and autovectorize cleanly; see zero-cost.md for the generated-assembly evidence.
Measure, don’t guess. Confirm any layout change with criterion and a profiler.
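The pre-sizing advice can be checked with a small experiment. In this sketch `Columns` and `buffer_moved_while_filling` are hypothetical names; it relies only on the documented behavior that a Vec created with `with_capacity(n)` does not reallocate while its length stays within that capacity.

```rust
// Hypothetical two-column SoA used only to illustrate pre-sizing.
#[allow(dead_code)]
struct Columns {
    x: Vec<f32>,
    hp: Vec<i32>,
}

impl Columns {
    // Reserve room for `n` rows in every column up front.
    fn with_capacity(n: usize) -> Self {
        Columns {
            x: Vec::with_capacity(n),
            hp: Vec::with_capacity(n),
        }
    }

    // Append one row to all columns together.
    fn push(&mut self, x: f32, hp: i32) {
        self.x.push(x);
        self.hp.push(hp);
    }
}

// Fill `n` rows and report whether the `x` column's buffer address changed.
fn buffer_moved_while_filling(n: usize) -> bool {
    let mut cols = Columns::with_capacity(n);
    let start = cols.x.as_ptr();
    for i in 0..n {
        cols.push(i as f32, 100);
    }
    start != cols.x.as_ptr()
}

fn main() {
    println!(
        "reallocated during fill: {}",
        buffer_moved_while_filling(10_000)
    );
}
```

If the count is only an estimate, the same pattern still helps: an underestimate costs a few reallocations near the end rather than a series of doublings from an empty Vec.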

Real-World Example

A data-oriented particle system in SoA layout — the shape you would find in a game engine or simulation. Each per-frame update is a set of tight, contiguous passes, and a read-only query touches a single column.

/// Data-oriented particle system, Struct-of-Arrays layout.
/// Each "column" is a contiguous Vec, so the per-step update streams
/// linearly through memory and the autovectorizer can use SIMD.
#[derive(Default)]
struct ParticleSystem {
    px: Vec<f32>,
    py: Vec<f32>,
    vx: Vec<f32>,
    vy: Vec<f32>,
}

impl ParticleSystem {
    fn with_capacity(n: usize) -> Self {
        ParticleSystem {
            px: Vec::with_capacity(n),
            py: Vec::with_capacity(n),
            vx: Vec::with_capacity(n),
            vy: Vec::with_capacity(n),
        }
    }

    fn spawn(&mut self, px: f32, py: f32, vx: f32, vy: f32) {
        self.px.push(px);
        self.py.push(py);
        self.vx.push(vx);
        self.vy.push(vy);
    }

    fn len(&self) -> usize {
        self.px.len()
    }

    /// One simulation step: apply gravity, then integrate position.
    fn step(&mut self, dt: f32, gravity: f32) {
        // Pass 1: gravity touches only vy.
        for vy in self.vy.iter_mut() {
            *vy += gravity * dt;
        }
        // Pass 2: integrate x. Disjoint columns -> the borrow checker is happy.
        for (x, &vx) in self.px.iter_mut().zip(self.vx.iter()) {
            *x += vx * dt;
        }
        // Pass 3: integrate y.
        for (y, &vy) in self.py.iter_mut().zip(self.vy.iter()) {
            *y += vy * dt;
        }
    }

    /// A read-only query that touches a single column: the SoA payoff.
    fn average_height(&self) -> f32 {
        if self.py.is_empty() {
            return 0.0;
        }
        self.py.iter().sum::<f32>() / self.py.len() as f32
    }
}

fn main() {
    let mut sim = ParticleSystem::with_capacity(1000);
    for i in 0..1000 {
        sim.spawn(i as f32, 100.0, 1.0, 0.0);
    }

    // Simulate one second at 60 FPS.
    for _ in 0..60 {
        sim.step(1.0 / 60.0, -9.81);
    }

    println!("particles: {}", sim.len());
    println!("average height after 1s: {:.3}", sim.average_height());
    println!("particle 0 position: ({:.3}, {:.3})", sim.px[0], sim.py[0]);
}

Real output:

particles: 1000
average height after 1s: 95.014
particle 0 position: (1.000, 95.013)

Each pass is a straight walk through one or two contiguous buffers — the ideal access pattern for the prefetcher and the autovectorizer. This is the core idea behind ECS (Entity-Component-System) game architectures and columnar analytics engines: store data in columns, process it in bulk.

Exercises

Exercise 1: Convert an Array of Structs to a Struct of Arrays

Difficulty: Beginner

Objective: Practice the mechanical AoS→SoA transformation and see why a single-column query becomes cheap.

Instructions: Given the Order struct below, write an Orders SoA type with one Vec per field and a from_aos(orders: Vec<Order>) -> Orders constructor. Add a revenue_cents(&self) -> u64 method that sums the total_cents column. Verify it on three orders.

#[derive(Clone)]
struct Order {
    id: u64,
    customer: String,
    total_cents: u64,
    shipped: bool,
}

// TODO: define `struct Orders { ... }`, `impl Orders { fn from_aos(...) ...; fn revenue_cents(...) ... }`

Solution

#[derive(Clone)]
struct Order {
    id: u64,
    customer: String,
    total_cents: u64,
    shipped: bool,
}

struct Orders {
    id: Vec<u64>,
    customer: Vec<String>,
    total_cents: Vec<u64>,
    shipped: Vec<bool>,
}

impl Orders {
    fn from_aos(orders: Vec<Order>) -> Self {
        let mut out = Orders {
            id: Vec::with_capacity(orders.len()),
            customer: Vec::with_capacity(orders.len()),
            total_cents: Vec::with_capacity(orders.len()),
            shipped: Vec::with_capacity(orders.len()),
        };
        for o in orders {
            out.id.push(o.id);
            out.customer.push(o.customer);
            out.total_cents.push(o.total_cents);
            out.shipped.push(o.shipped);
        }
        out
    }

    // Touches exactly one contiguous column.
    fn revenue_cents(&self) -> u64 {
        self.total_cents.iter().sum()
    }
}

fn main() {
    let aos = vec![
        Order { id: 1, customer: "Ada".into(), total_cents: 1500, shipped: true },
        Order { id: 2, customer: "Bob".into(), total_cents: 2500, shipped: false },
        Order { id: 3, customer: "Cy".into(),  total_cents: 1000, shipped: true },
    ];
    let orders = Orders::from_aos(aos);
    println!("revenue: {} cents", orders.revenue_cents()); // 5000
}

Output: revenue: 5000 cents.

Exercise 2: A single-column predicate query

Difficulty: Intermediate

Objective: See how a filtered count over one column avoids loading the rest of each record.

Instructions: Using the Orders SoA type from Exercise 1, write a free function count_shipped(orders: &Orders) -> usize that returns how many orders are shipped, reading only the shipped column. Explain in a comment why this is cache-friendlier than iterating a Vec<Order> and checking o.shipped.

Solution

struct Orders {
    id: Vec<u64>,
    customer: Vec<String>,
    total_cents: Vec<u64>,
    shipped: Vec<bool>,
}

// Reads only the `shipped` column: a contiguous Vec<bool> (1 byte each).
// An AoS version would load each whole Order (including the `customer`
// String pointer and the 8-byte id/total) just to check one bool, wasting
// most of every cache line it fetched.
fn count_shipped(orders: &Orders) -> usize {
    orders.shipped.iter().filter(|&&s| s).count()
}

fn main() {
    let orders = Orders {
        id: vec![1, 2, 3],
        customer: vec!["Ada".into(), "Bob".into(), "Cy".into()],
        total_cents: vec![1500, 2500, 1000],
        shipped: vec![true, false, true],
    };
    println!("shipped: {}", count_shipped(&orders)); // 2
}

Output: shipped: 2.

Exercise 3: A synchronized multi-column update without tripping the borrow checker

Difficulty: Advanced

Objective: Update several columns together in one pass, using zip so the borrow checker permits the simultaneous borrows.

Instructions: Given struct Bodies { x: Vec<f64>, y: Vec<f64>, mass: Vec<f64> }, write scale_positions_by_mass(&mut self) that multiplies each body’s x and y by its mass, in a single pass. Chain zip so both x and y are borrowed mutably and mass immutably at once. (Hint: self.x.iter_mut().zip(self.y.iter_mut()).zip(self.mass.iter()) yields ((&mut x, &mut y), &mass).)

Solution

struct Bodies {
    x: Vec<f64>,
    y: Vec<f64>,
    mass: Vec<f64>,
}

impl Bodies {
    fn scale_positions_by_mass(&mut self) {
        // x and y are disjoint fields, so both can be borrowed mutably while
        // mass is borrowed immutably, all in one contiguous pass.
        for ((x, y), &m) in self
            .x
            .iter_mut()
            .zip(self.y.iter_mut())
            .zip(self.mass.iter())
        {
            *x *= m;
            *y *= m;
        }
    }
}

fn main() {
    let mut bodies = Bodies {
        x: vec![1.0, 2.0],
        y: vec![3.0, 4.0],
        mass: vec![10.0, 0.5],
    };
    bodies.scale_positions_by_mass();
    println!("bodies.x = {:?}", bodies.x); // [10.0, 1.0]
    println!("bodies.y = {:?}", bodies.y); // [30.0, 2.0]
}

Output:

bodies.x = [10.0, 1.0]
bodies.y = [30.0, 2.0]
