Collection Performance: Big-O, Choosing a Collection, and Capacity

22 min read

In JavaScript you reach for Array, Object, Map, and Set mostly by feel — the engine hides the cost model, and “it’s fast enough” usually holds. Rust gives you the same four shapes of collection (Vec, HashMap, BTreeMap, HashSet) but also hands you the cost model directly: you choose the data structure, you choose when memory is allocated, and you choose between a manual loop and a lazy iterator. This page is about making those choices deliberately.

Quick Overview

Every standard collection has documented Big-O complexity, and picking the wrong one quietly turns an O(n) job into an O(n²) one. The three decisions that matter most for a TypeScript/JavaScript developer learning Rust are: (1) which collection to use — Vec for ordered lists, HashMap for keyed lookup, BTreeMap when you need sorted keys or range queries; (2) whether to pre-allocate with with_capacity/reserve to avoid repeated reallocations; and (3) iterator vs hand-written loop — which in Rust produce essentially identical machine code, so you can pick whichever reads better. Unlike a JavaScript engine, the Rust compiler will not silently switch representations behind your back, so the choice is yours and it sticks.

TypeScript/JavaScript Example

A small analytics pass over web request logs. Notice how you reach for each built-in collection almost reflexively, without thinking about its cost.

1
interface LogEntry {
2
  timestamp: number; // unix seconds
3
  ip: string;
4
  path: string;
5
  status: number;
6
}
7

8
function analyze(logs: LogEntry[]) {
9
  // Object/Map as a frequency counter — keyed lookup, order irrelevant.
10
  const hitsPerPath = new Map<string, number>();
11
  // Set for "have I seen this visitor?" — membership only.
12
  const uniqueIps = new Set<string>();
13
  // Another Map, but here we secretly *want* sorted keys for a range query.
14
  const perMinute = new Map<number, number>();
15

16
  for (const entry of logs) {
17
    hitsPerPath.set(entry.path, (hitsPerPath.get(entry.path) ?? 0) + 1);
18
    uniqueIps.add(entry.ip);
19
    const bucket = Math.floor(entry.timestamp / 60);
20
    perMinute.set(bucket, (perMinute.get(bucket) ?? 0) + 1);
21
  }
22

23
  // A "range query": hits in the first two minutes. With a plain Map you must
24
  // iterate ALL entries and filter — there is no ordered range lookup.
25
  let firstTwoMinutes = 0;
26
  for (const [bucket, count] of perMinute) {
27
    if (bucket < 2) firstTwoMinutes += count;
28
  }
29

30
  return { uniqueVisitors: uniqueIps.size, hitsPerPath, firstTwoMinutes };
31
}

Two cost questions JavaScript never makes you answer: Is perMinute actually sorted? (A Map preserves insertion order, not key order, so the range scan above is O(n) and fragile.) And how big should these structures be? (You cannot pre-size a Map or Set at all.)

Rust Equivalent

The same analysis, but each collection is chosen for its cost characteristics, and the one that needs ordered range queries uses BTreeMap.

1
use std::collections::{BTreeMap, HashMap, HashSet};
2

3
#[derive(Debug)]
4
struct LogEntry {
5
    timestamp: u64,
6
    ip: String,
7
    path: String,
8
    status: u16,
9
}
10

11
struct Analytics {
12
    // Count hits per path: keyed lookup, order irrelevant -> HashMap (avg O(1)).
13
    hits_per_path: HashMap<String, u64>,
14
    // Distinct visitors: membership only -> HashSet (avg O(1)).
15
    unique_ips: HashSet<String>,
16
    // Requests per 60s bucket: we want ORDER and RANGE queries -> BTreeMap (O(log n)).
17
    per_minute: BTreeMap<u64, u64>,
18
    errors: u64,
19
}
20

21
impl Analytics {
22
    fn new(expected_paths: usize) -> Self {
23
        Analytics {
24
            // Pre-size the map we expect to fill the most.
25
            hits_per_path: HashMap::with_capacity(expected_paths),
26
            unique_ips: HashSet::new(),
27
            per_minute: BTreeMap::new(),
28
            errors: 0,
29
        }
30
    }
31

32
    fn record(&mut self, entry: &LogEntry) {
33
        *self.hits_per_path.entry(entry.path.clone()).or_insert(0) += 1;
34
        self.unique_ips.insert(entry.ip.clone());
35
        let bucket = entry.timestamp / 60;
36
        *self.per_minute.entry(bucket).or_insert(0) += 1;
37
        if entry.status >= 500 {
38
            self.errors += 1;
39
        }
40
    }
41

42
    // Range query over the SORTED BTreeMap: total hits in [from, to) buckets.
43
    fn hits_between(&self, from_bucket: u64, to_bucket: u64) -> u64 {
44
        self.per_minute
45
            .range(from_bucket..to_bucket)
46
            .map(|(_, c)| *c)
47
            .sum()
48
    }
49
}
50

51
fn main() {
52
    let logs = vec![
53
        LogEntry { timestamp: 0,   ip: "1.1.1.1".into(), path: "/home".into(),  status: 200 },
54
        LogEntry { timestamp: 30,  ip: "2.2.2.2".into(), path: "/home".into(),  status: 200 },
55
        LogEntry { timestamp: 61,  ip: "1.1.1.1".into(), path: "/login".into(), status: 200 },
56
        LogEntry { timestamp: 75,  ip: "3.3.3.3".into(), path: "/home".into(),  status: 500 },
57
        LogEntry { timestamp: 130, ip: "1.1.1.1".into(), path: "/home".into(),  status: 200 },
58
    ];
59

60
    let mut stats = Analytics::new(8);
61
    for entry in &logs {
62
        stats.record(entry);
63
    }
64

65
    println!("unique visitors: {}", stats.unique_ips.len());
66
    println!("5xx errors: {}", stats.errors);
67
    // hits in the first two minutes — a true O(log n) range query, not a full scan.
68
    println!("hits in first 2 minutes: {}", stats.hits_between(0, 2));
69
}

Verified output:

1
unique visitors: 3
2
5xx errors: 1
3
hits in first 2 minutes: 4

Note: range(0..2) walks only the matching slice of the tree, not the whole map. That single capability is the reason to pick BTreeMap over HashMap even though HashMap has faster individual lookups.

Detailed Explanation

The cost table you should memorize

Here are the average-case complexities for the standard collections. “Amortized” means the cost averaged over many operations (a single one may occasionally be more expensive because of a reallocation).

Operation	`Vec<T>`	`HashMap<K,V>` / `HashSet<T>`	`BTreeMap<K,V>` / `BTreeSet<T>`
Insert at end / add	amortized O(1)	average O(1)	O(log n)
Lookup by key	O(n) (scan)	average O(1)	O(log n)
Random access by index	O(1)	n/a	n/a
Membership test	O(n) (scan)	average O(1)	O(log n)
Insert/remove in middle	O(n) (shift)	n/a	O(log n)
Remove from end	O(1)	average O(1)	O(log n)
Iterate all (n items)	O(n), in order	O(n), unordered	O(n), sorted order
Range / sorted query	O(n) (must scan)	not supported	O(log n + k)
Min / max	O(n) unless sorted	O(n)	O(log n) (ends of tree)

Warning: The “average O(1)” for HashMap/HashSet is average, not worst case. With adversarial keys a hash collision storm can degrade lookups, which is exactly why Rust’s default hasher (SipHash 1-3) is randomized and DoS-resistant — see HashMaps. For trusted internal keys you can swap in a faster hasher; for untrusted external input the default is the safe choice.

Each operation here can be confirmed against the standard library docs, which state the complexity of every method.

Choosing a collection: the decision flow

The standard-library documentation gives a concise rule of thumb, and it maps cleanly onto what you already do in TypeScript:

You have a list and mostly append to the end or iterate it in order → Vec<T>. This is your default, the way Array is your default in JavaScript. It is also the most cache-friendly because its elements are contiguous in memory.
You look things up by key and don’t care about order → HashMap<K,V> (or HashSet<T> for keys with no value). This is the Map/Object/Set of JavaScript, and individual lookups are the fastest of the keyed collections.
You look things up by key AND need keys kept in sorted order, or you need range queries / min / max → BTreeMap<K,V> (or BTreeSet<T>). JavaScript has no built-in equivalent; people fake it with a sorted array plus binary search. See BTreeMap and BTreeSet.

A Vec of (key, value) pairs can act like a tiny map — and for a handful of entries it is actually faster than a HashMap because there is no hashing and the data is contiguous. But its lookup is a linear O(n) scan, so it collapses as the collection grows. The next section measures exactly that.

Why “wrong collection” is the most expensive mistake

Using a Vec and contains() as a stand-in for a HashSet is the single most common performance trap. Each contains() is an O(n) scan, and doing it inside a loop makes the whole thing O(n²).

1
use std::collections::HashSet;
2

3
fn main() {
4
    let input = vec![3, 1, 2, 3, 1, 2, 4];
5

6
    // Accidentally O(n^2): contains() scans the whole Vec each time.
7
    let mut seen: Vec<i32> = Vec::new();
8
    let mut unique_slow: Vec<i32> = Vec::new();
9
    for &x in &input {
10
        if !seen.contains(&x) {     // O(n) linear scan per element
11
            seen.push(x);
12
            unique_slow.push(x);
13
        }
14
    }
15
    println!("slow unique = {unique_slow:?}");
16

17
    // O(n): HashSet membership is average O(1).
18
    let mut set: HashSet<i32> = HashSet::new();
19
    let mut unique_fast: Vec<i32> = Vec::new();
20
    for &x in &input {
21
        if set.insert(x) {          // insert returns true if newly added
22
            unique_fast.push(x);
23
        }
24
    }
25
    println!("fast unique = {unique_fast:?}");
26
}

Verified output:

1
slow unique = [3, 1, 2, 4]
2
fast unique = [3, 1, 2, 4]

Both produce the same answer, but their cost diverges sharply with size. Timed on 20,000 mostly-unique values (release build):

1
n = 20000
2
Vec::contains (O(n^2)) = 10.6615ms
3
HashSet (O(n))         = 542.333µs

Note: Exact timings depend on your machine, and these were a single run — the ratio is what matters. At n = 20,000 the HashSet version was roughly 20× faster, and the gap widens as n grows because one curve is quadratic and the other linear. This is the same trap as using Array.includes inside a loop in JavaScript instead of a Set.

Capacity: the second cost JavaScript hides

A Vec (and a HashMap/HashSet) has both a length (how many items it holds) and a capacity (how many it can hold before it must allocate a bigger buffer and copy everything over). When you push past capacity, the Vec reallocates — roughly doubling its buffer. Those reallocations are the “amortized” part of “amortized O(1)”.

If you know roughly how many items you’ll add, tell the collection up front with with_capacity:

1
fn main() {
2
    // Count reallocations: Vec::new() vs Vec::with_capacity(n).
3
    let mut grown: Vec<u64> = Vec::new();
4
    let mut reallocs = 0;
5
    let mut cap = grown.capacity();
6
    for i in 0..1_000u64 {
7
        grown.push(i);
8
        if grown.capacity() != cap {
9
            reallocs += 1;
10
            cap = grown.capacity();
11
        }
12
    }
13
    println!("Vec::new()       -> {reallocs} reallocations to reach len {}", grown.len());
14

15
    let mut sized: Vec<u64> = Vec::with_capacity(1_000);
16
    let mut reallocs2 = 0;
17
    let mut cap2 = sized.capacity();
18
    for i in 0..1_000u64 {
19
        sized.push(i);
20
        if sized.capacity() != cap2 {
21
            reallocs2 += 1;
22
            cap2 = sized.capacity();
23
        }
24
    }
25
    println!("with_capacity()  -> {reallocs2} reallocations to reach len {}", sized.len());
26
}

Verified output:

1
Vec::new()       -> 9 reallocations to reach len 1000
2
with_capacity()  -> 0 reallocations to reach len 1000

Nine reallocations (each copying the entire buffer) versus zero. HashMap and HashSet work the same way — HashMap::with_capacity(n) and reserve(n) pre-size them. The deep-dive on Vec capacity and growth lives in Vectors; this page is about when the pre-allocation is worth it.

1
use std::collections::HashMap;
2

3
fn main() {
4
    // HashMap also supports pre-allocation.
5
    let mut counts: HashMap<&str, u32> = HashMap::with_capacity(16);
6
    let words = ["a", "b", "a", "c", "a", "b"];
7
    for w in words {
8
        *counts.entry(w).or_insert(0) += 1;
9
    }
10
    let mut pairs: Vec<(&str, u32)> = counts.into_iter().collect();
11
    pairs.sort(); // HashMap order is randomized; sort for stable display
12
    println!("{pairs:?}");
13

14
    // reserve() grows capacity ahead of a known bulk insert.
15
    let mut m: HashMap<u32, u32> = HashMap::new();
16
    m.reserve(1_000);
17
    println!("capacity after reserve(1000): >= {}", m.capacity() >= 1000);
18
}

Verified output:

1
[("a", 3), ("b", 2), ("c", 1)]
2
capacity after reserve(1000): >= true

When you build a collection from an iterator with .collect(), you usually get this for free: the iterator reports a size hint and collect pre-allocates accordingly, so an explicit with_capacity is rarely needed in that case. Reclaiming over-allocated memory is the job of shrink_to_fit:

1
fn main() {
2
    let mut v: Vec<i32> = Vec::with_capacity(100);
3
    v.extend([1, 2, 3]);
4
    println!("before shrink: len={}, cap={}", v.len(), v.capacity());
5
    v.shrink_to_fit(); // hand the unused capacity back to the allocator
6
    println!("after shrink:  len={}, cap={}", v.len(), v.capacity());
7
}

Verified output:

1
before shrink: len=3, cap=100
2
after shrink:  len=3, cap=3

Iterator vs loop: a non-choice for performance

In JavaScript, arr.map().filter().reduce() allocates one or more intermediate arrays and runs several passes, so it can be meaningfully slower than a single hand-written for loop on a hot path. In Rust this trade-off essentially does not exist. Iterator adaptors are lazy and get fused and inlined by the compiler into the same machine code a manual loop would produce — this is what “zero-cost abstraction” means. The iterator chain also drops per-element bounds checks that a manual v[i] would incur.

1
fn main() {
2
    let data: Vec<u64> = (0..1_000_000).collect();
3

4
    // Manual index loop — every `data[i]` is bounds-checked.
5
    let mut sum_loop: u64 = 0;
6
    for i in 0..data.len() {
7
        sum_loop += data[i];
8
    }
9

10
    // Iterator chain — no manual indexing, no per-element bounds checks.
11
    let sum_iter: u64 = data.iter().sum();
12

13
    // Filtered + mapped sum, the iterator way (still one pass, no intermediate Vec).
14
    let even_doubled: u64 = data.iter().filter(|&&x| x % 2 == 0).map(|&x| x * 2).sum();
15

16
    println!("sum_loop      = {sum_loop}");
17
    println!("sum_iter      = {sum_iter}");
18
    println!("even_doubled  = {even_doubled}");
19
}

Verified output:

1
sum_loop      = 499999500000
2
sum_iter      = 499999500000
3
even_doubled  = 499999000000

Because the iterator version compiles to equivalent (often better) code, the rule is: write whichever is clearest, which is almost always the iterator. The crucial difference from JavaScript is that .filter().map().sum() here makes no intermediate Vec — adaptors are lazy and only pull one element through the whole chain at a time. The mechanics of laziness are covered in Iterators and Iterator Consumers. Pre-sizing applies to iterators too: collect() uses the size hint, as below.

1
fn main() {
2
    // collect into a Vec sizes the allocation from the iterator's size hint.
3
    let squares: Vec<u64> = (1..=5u64).map(|n| n * n).collect();
4
    println!("squares = {squares:?}");
5
}

Verified output:

1
squares = [1, 4, 9, 16, 25]

Key Differences

Concern	TypeScript / JavaScript	Rust
Cost model	Opaque; engine may reshape `Object`/`Array`	Explicit and documented per method
Sorted-key map	None built in (sorted array + binary search)	`BTreeMap` / `BTreeSet`, O(log n) with range queries
Pre-allocation	Not exposed for `Map`/`Set`; `Array(n)` is hole-y	`with_capacity(n)` / `reserve(n)` on all growable collections
`Map`/`Object` iteration order	Insertion order (deterministic)	`HashMap` order is randomized; `BTreeMap` is sorted
`map`/`filter` chains	Allocate intermediate arrays, multiple passes	Lazy, fused, zero intermediate allocations
Loop vs higher-order method	Loop can be measurably faster on hot paths	Iterator ≈ loop; pick for readability
Hash collisions / DoS	Engine-dependent	Default `HashMap` hasher is randomized + DoS-resistant
Memory layout	Engine-managed, often pointer-chasing	`Vec` is contiguous (cache-friendly); maps/sets hash- or tree-backed

Why Rust exposes all of this

JavaScript optimizes for “don’t make the developer think.” Rust optimizes for “the developer is in control and nothing is hidden.” That means you pick the collection, you decide when memory is allocated, and the compiler guarantees the cost model won’t change underneath you. The upside is predictable, tunable performance with no surprise deopts; the cost is that choosing badly (a Vec scan where a HashMap belongs) is on you, not the runtime.

Common Pitfalls

Pitfall 1: Using `Vec::contains` in a loop (accidental O(n²))

This is the trap shown above, restated as the mistake itself. If you find yourself calling .contains(), .iter().any(), or .iter().find() inside a loop over the same data, you almost certainly want a HashSet or HashMap. The code compiles and is correct — it is just quadratic, and it will not show up until your inputs grow. There is no compiler error here; this is a design pitfall the type checker cannot catch for you.

Fix: build a HashSet/HashMap once (O(n)) and do O(1) lookups against it, as in the unique_fast example above.

Pitfall 2: Expecting `HashMap` to iterate in insertion or sorted order

Coming from JavaScript, where Map and modern Object preserve insertion order, it is natural to assume iteration order is stable. A Rust HashMap iterates in an unspecified, randomized order that can differ between runs of the same program.

1
use std::collections::HashMap;
2

3
fn main() {
4
    let mut m = HashMap::new();
5
    m.insert("a", 1);
6
    m.insert("b", 2);
7
    m.insert("c", 3);
8
    // The order of this loop is NOT guaranteed and may change run-to-run.
9
    for (k, v) in &m {
10
        println!("{k} = {v}");
11
    }
12
}

Fix: if you need a deterministic order, either collect into a Vec and sort() it, or use a BTreeMap (always sorted by key). Relying on HashMap order is a latent bug, not a performance issue.

Pitfall 3: `Vec::dedup` without sorting first

dedup only removes consecutive duplicates — it is not a general “make unique” operation.

1
fn main() {
2
    let mut v = vec![1, 2, 1, 1, 3];
3
    v.dedup();
4
    println!("{v:?}"); // the non-adjacent 1 survives
5
}

Verified output:

1
[1, 2, 1, 3]

Fix: v.sort(); v.dedup(); to remove all duplicates while keeping it O(n log n) and order-by-value, or use a HashSet if you don’t need sorting and want O(n). Choosing between them is a performance decision: sort+dedup keeps the result sorted; the HashSet route is faster but loses order.

Pitfall 4: Pre-allocating the wrong thing (or everywhere)

with_capacity only helps when you would otherwise reallocate repeatedly while growing a single collection to a size you can estimate. Sprinkling it on every Vec::new() — including ones you build with .collect() (which already pre-sizes) or ones that stay tiny — adds noise without benefit and can even waste memory if your estimate is too high.

Fix: pre-allocate when (a) you know the approximate final size and (b) you are growing element-by-element in a hot loop. Otherwise let collect() or the default growth strategy handle it, and reach for shrink_to_fit only if you over-allocated and the slack matters.

Best Practices

1. Default to `Vec`, escalate deliberately

Start with Vec<T>. Move to HashMap/HashSet the moment you do keyed lookups or membership tests more than occasionally. Move to BTreeMap/BTreeSet only when you genuinely need sorted iteration, range queries, or min/max — they pay O(log n) for that ordering.

2. Pre-allocate on hot, size-known paths

When a loop pushes a predictable number of items, call Vec::with_capacity(n) / HashMap::with_capacity(n) first. It converts O(log n) reallocations (each an O(n) copy) into zero. For collections built by .collect(), trust the automatic size hint instead.

3. Prefer iterators; they are not slower

Write .iter().filter(...).map(...).sum() over manual index loops. It is clearer, it drops bounds checks, it allocates no intermediates, and it compiles to equivalent machine code. Reserve manual loops for cases the iterator genuinely cannot express cleanly.

4. Measure before micro-optimizing

For real performance work, use the Criterion benchmarking crate rather than ad-hoc Instant::now() timing — it handles warm-up, statistical noise, and outliers. The quick timings on this page are illustrative; Criterion is what you’d commit to a repo. See Section 21 — Performance for the full treatment.

5. Pick the right hasher for the situation

The default HashMap hasher is DoS-resistant, which is right for untrusted keys (user input, network data). For trusted internal keys where speed dominates, a faster non-cryptographic hasher (e.g. the fxhash or ahash crate) can be a meaningful win. Default first; swap only after measuring. See HashMaps.

Real-World Example

Deduplicating a large stream of IDs is a textbook “wrong collection” decision. Here both approaches are factored into functions and timed at scale so you can see the curve, not just assert it.

1
use std::collections::HashSet;
2
use std::time::Instant;
3

4
/// O(n^2): each `contains` linearly scans everything seen so far.
5
fn dedup_vec(input: &[u32]) -> Vec<u32> {
6
    let mut seen: Vec<u32> = Vec::new();
7
    let mut out = Vec::new();
8
    for &x in input {
9
        if !seen.contains(&x) {
10
            seen.push(x);
11
            out.push(x);
12
        }
13
    }
14
    out
15
}
16

17
/// O(n): HashSet membership is average O(1); `insert` returns whether it was new.
18
fn dedup_set(input: &[u32]) -> Vec<u32> {
19
    let mut seen: HashSet<u32> = HashSet::with_capacity(input.len());
20
    let mut out = Vec::with_capacity(input.len());
21
    for &x in input {
22
        if seen.insert(x) {
23
            out.push(x);
24
        }
25
    }
26
    out
27
}
28

29
fn main() {
30
    let input: Vec<u32> = (0..20_000).collect();
31

32
    let t = Instant::now();
33
    let a = dedup_vec(&input);
34
    let vec_time = t.elapsed();
35

36
    let t = Instant::now();
37
    let b = dedup_set(&input);
38
    let set_time = t.elapsed();
39

40
    assert_eq!(a.len(), b.len());
41
    println!("n = {}", input.len());
42
    println!("Vec::contains (O(n^2)) = {vec_time:?}");
43
    println!("HashSet (O(n))         = {set_time:?}");
44
}

Verified output (single run, release build — your numbers will differ, the ratio is the point):

1
n = 20000
2
Vec::contains (O(n^2)) = 10.6615ms
3
HashSet (O(n))         = 542.333µs

The HashSet version is both correct and roughly 20× faster here, and the gap grows with n. Note the deliberate with_capacity(input.len()) calls in dedup_set: we know the upper bound on output size, so we pre-allocate and avoid all reallocation. This is the whole chapter in one example: right collection, right pre-allocation, idiomatic iteration.

Tip: If you don’t need to preserve first-seen order, the entire dedup_set body collapses to one line — input.iter().copied().collect::<HashSet<_>>() — and then .into_iter().collect::<Vec<_>>() if you want a Vec back. The explicit loop is shown here to make the cost visible.

Exercises

Exercise 1: Pick the Right Counter

Difficulty: Beginner

Objective: Choose the correct collection for keyed counting and pre-size it.

Instructions: Write word_frequencies(text: &str) -> HashMap<String, u32> that counts case-insensitive word frequencies (split on whitespace). Use the entry API so you never look a key up twice, and pre-size the map. Print the pairs sorted for a stable result.

1
use std::collections::HashMap;
2

3
fn word_frequencies(text: &str) -> HashMap<String, u32> {
4
    // TODO: pre-size the map, then count each lowercased word with the entry API
5
    todo!()
6
}
7

8
fn main() {
9
    let freq = word_frequencies("the cat the dog the bird");
10
    let mut pairs: Vec<(&String, &u32)> = freq.iter().collect();
11
    pairs.sort();
12
    println!("{pairs:?}");
13
}

Solution

1
use std::collections::HashMap;
2

3
fn word_frequencies(text: &str) -> HashMap<String, u32> {
4
    let mut freq: HashMap<String, u32> = HashMap::with_capacity(text.len() / 5);
5
    for word in text.split_whitespace() {
6
        *freq.entry(word.to_lowercase()).or_insert(0) += 1;
7
    }
8
    freq
9
}
10

11
fn main() {
12
    let freq = word_frequencies("the cat the dog the bird");
13
    let mut pairs: Vec<(&String, &u32)> = freq.iter().collect();
14
    pairs.sort();
15
    println!("{pairs:?}");
16
}

Output:

1
[("bird", 1), ("cat", 1), ("dog", 1), ("the", 3)]

The entry(...).or_insert(0) pattern does a single hash lookup that both finds-or-creates the slot; *... += 1 mutates it in place. with_capacity(text.len() / 5) is a rough guess at the word count to avoid reallocations.

Exercise 2: Eliminate the Reallocations

Difficulty: Intermediate

Objective: Pre-allocate a Vec so that building it triggers zero reallocations.

Instructions: Write build_ids(n: usize) -> Vec<u64> that returns the squares 0, 1, 4, 9, ... for 0..n. Build it with a loop (not .collect()) but pre-allocate so the capacity never grows mid-loop. Print the result and its capacity to prove it equals n.

1
fn build_ids(n: usize) -> Vec<u64> {
2
    // TODO: pre-allocate, then push n squared values with no reallocation
3
    todo!()
4
}
5

6
fn main() {
7
    let ids = build_ids(5);
8
    println!("{ids:?}, cap={}", ids.capacity());
9
}

Solution

1
fn build_ids(n: usize) -> Vec<u64> {
2
    let mut ids = Vec::with_capacity(n); // exact size known -> zero reallocations
3
    for i in 0..n as u64 {
4
        ids.push(i * i);
5
    }
6
    ids
7
}
8

9
fn main() {
10
    let ids = build_ids(5);
11
    println!("{ids:?}, cap={}", ids.capacity());
12
}

Output:

1
[0, 1, 4, 9, 16], cap=5

Because the capacity (5) exactly matches the number of pushes, the buffer is allocated once and never copied. With Vec::new() this would have reallocated a few times as it grew through 4 → 8.

Exercise 3: Range Query Needs the Right Map

Difficulty: Advanced

Objective: Use a BTreeMap to answer a sorted range query that a HashMap cannot do efficiently.

Instructions: Given a list of (name, score) pairs, build a BTreeMap<u32, &str> keyed by score. Then (a) print everyone whose score falls in the B band [80, 90) using a range query, and (b) print the top scorer by reading the last entry of the sorted map — no manual scan over all entries.

1
use std::collections::BTreeMap;
2

3
fn main() {
4
    let scores = [("ana", 91), ("ben", 72), ("cy", 85), ("dee", 60), ("ed", 95)];
5
    // TODO: build a BTreeMap<u32, &str> keyed by score
6
    // TODO: print the B band [80, 90) via a range query
7
    // TODO: print the top scorer using the sorted order (no full scan)
8
}

Solution

1
use std::collections::BTreeMap;
2

3
fn main() {
4
    let scores = [("ana", 91), ("ben", 72), ("cy", 85), ("dee", 60), ("ed", 95)];
5

6
    let mut by_score: BTreeMap<u32, &str> = BTreeMap::new();
7
    for (name, score) in scores {
8
        by_score.insert(score, name);
9
    }
10

11
    // Range query: the B band, scores in [80, 90). O(log n + k), not a full scan.
12
    let b_band: Vec<&str> = by_score.range(80..90).map(|(_, &name)| name).collect();
13
    println!("B band (80..90): {b_band:?}");
14

15
    // Top scorer is simply the last (highest-key) entry of the sorted map.
16
    if let Some((score, name)) = by_score.iter().next_back() {
17
        println!("top: {name} with {score}");
18
    }
19
}

Output:

1
B band (80..90): ["cy"]
2
top: ed with 95

range(80..90) walks only the matching slice of the tree, and iter().next_back() reads the maximum key in O(log n) — both are impossible to do efficiently on a HashMap, which has no order at all. (If two students could share a score you’d key by a tuple or store a Vec of names per score; here scores are unique.)

Collection Performance: Big-O, Choosing a Collection, and Capacity

Quick Overview

TypeScript/JavaScript Example

Rust Equivalent

Detailed Explanation

The cost table you should memorize

Choosing a collection: the decision flow

Why “wrong collection” is the most expensive mistake

Capacity: the second cost JavaScript hides

Iterator vs loop: a non-choice for performance

Key Differences

Why Rust exposes all of this

Common Pitfalls

Pitfall 1: Using `Vec::contains` in a loop (accidental O(n²))

Pitfall 2: Expecting `HashMap` to iterate in insertion or sorted order

Pitfall 3: `Vec::dedup` without sorting first

Pitfall 4: Pre-allocating the wrong thing (or everywhere)

Best Practices

1. Default to `Vec`, escalate deliberately

2. Pre-allocate on hot, size-known paths

3. Prefer iterators; they are not slower

4. Measure before micro-optimizing

5. Pick the right hasher for the situation

Real-World Example

Further Reading

Official Documentation

Exercises

Exercise 1: Pick the Right Counter

Exercise 2: Eliminate the Reallocations

Exercise 3: Range Query Needs the Right Map

Collection Performance: Big-O, Choosing a Collection, and Capacity

Quick Overview

TypeScript/JavaScript Example

Rust Equivalent

Detailed Explanation

The cost table you should memorize

Choosing a collection: the decision flow

Why “wrong collection” is the most expensive mistake

Capacity: the second cost JavaScript hides

Iterator vs loop: a non-choice for performance

Key Differences

Why Rust exposes all of this

Common Pitfalls

Pitfall 1: Using Vec::contains in a loop (accidental O(n²))

Pitfall 2: Expecting HashMap to iterate in insertion or sorted order

Pitfall 3: Vec::dedup without sorting first

Pitfall 4: Pre-allocating the wrong thing (or everywhere)

Best Practices

1. Default to Vec, escalate deliberately

2. Pre-allocate on hot, size-known paths

3. Prefer iterators; they are not slower

4. Measure before micro-optimizing

5. Pick the right hasher for the situation

Real-World Example

Further Reading

Official Documentation

Related Topics in This Guide

Exercises

Exercise 1: Pick the Right Counter

Exercise 2: Eliminate the Reallocations

Exercise 3: Range Query Needs the Right Map

Pitfall 1: Using `Vec::contains` in a loop (accidental O(n²))

Pitfall 2: Expecting `HashMap` to iterate in insertion or sorted order

Pitfall 3: `Vec::dedup` without sorting first

1. Default to `Vec`, escalate deliberately