Measuring Performance Gains Honestly

23 min read

When you port a Node.js service to Rust, someone will eventually ask: “So how much faster is it?” Answering that question honestly — with the right metric, the right workload, and a clear-eyed view of where Rust actually helps — is what separates a credible migration from a hype-driven one.

Quick Overview

The headline “Rust is 10x faster” is almost always misleading, because it measures the wrong thing. For a typical web service the bottleneck is rarely raw CPU; it is the tail latency (p99/p99.9), memory footprint, and predictability under load. This page shows you how to benchmark the right thing, report latency percentiles instead of averages, measure memory honestly, and avoid the traps that make a migration look better — or worse — than it really is.

Note: “Honestly” is the operative word. A migration that genuinely cuts p99 latency from 800 ms to 40 ms is a huge win even if median latency barely moved. Selling it as “20x faster on average” is both wrong and unnecessary — the real, defensible number is impressive enough.

TypeScript/JavaScript Example

Here is the kind of measurement most teams start with: wrap the handler in console.time, hit it a few times, eyeball the average.

1
// bench-naive.ts -- the way most teams "measure" first (and get fooled)
2
import { performance } from "node:perf_hooks";
3

4
function dayOfYear(date: string): number | null {
5
  const parts = date.split("-");
6
  if (parts.length !== 3) return null;
7
  const [y, m, d] = parts.map(Number);
8
  if (!Number.isInteger(y) || m < 1 || m > 12 || d < 1) return null;
9
  const leap = (y % 4 === 0 && y % 100 !== 0) || y % 400 === 0;
10
  const days = [31, leap ? 29 : 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
11
  if (d > days[m - 1]) return null;
12
  let total = d;
13
  for (let i = 0; i < m - 1; i++) total += days[i];
14
  return total;
15
}
16

17
const iters = 1_000_000;
18
const start = performance.now();
19
for (let i = 0; i < iters; i++) dayOfYear("2026-06-02");
20
const ms = performance.now() - start;
21
console.log(`${iters} iters in ${ms.toFixed(1)} ms (~${((ms * 1e6) / iters).toFixed(1)} ns/call)`);

This has three problems that recur in every naive benchmark, in any language:

It measures the wrong unit of work. A microsecond-level pure function is almost never what limits a real service; the JSON parsing, the database round-trip, and the event loop are.
It reports a single average. One number hides the distribution. Users feel the slow requests, not the mean.
The JIT and the optimizer can cheat. V8 may hoist or eliminate a pure call whose result is unused; the loop can warm into a wildly different code path than production ever runs. You are timing the benchmark, not the workload.

To measure a service, you need percentiles of end-to-end request latency under realistic concurrency — not a tight loop over a pure function.

1
// percentiles.ts -- the metric that actually matters: the distribution, not the mean
2
function percentile(sorted: number[], pct: number): number {
3
  if (sorted.length === 0) return 0;
4
  const rank = Math.ceil((pct / 100) * sorted.length);
5
  const idx = Math.min(Math.max(rank - 1, 0), sorted.length - 1);
6
  return sorted[idx];
7
}
8

9
// Latencies (ms) collected from a load test: mostly fast, a few slow.
10
const samples = [
11
  120, 130, 118, 125, 122, 119, 121, 117, 200, 9800,
12
  124, 126, 123, 128, 131, 115, 116, 127, 129, 4200,
13
].sort((a, b) => a - b);
14

15
const mean = samples.reduce((a, b) => a + b, 0) / samples.length;
16
console.log("mean", mean.toFixed(1)); // 814.5  <- dominated by two outliers
17
console.log("p50 ", percentile(samples, 50)); // 124
18
console.log("p90 ", percentile(samples, 90)); // 200
19
console.log("p99 ", percentile(samples, 99)); // 9800 <- what 1% of users actually feel

Running it under Node v22:

The mean of 814.5 ms is a lie — no single request took that long. Half the users saw 124 ms; the unlucky 1% saw nearly 10 seconds. This is exactly why you report percentiles.

Rust Equivalent

The same percentile logic in Rust, with the same data, produces the same conclusion — the average hides the tail:

1
/// Value at a given percentile from a sorted slice, using the nearest-rank method.
2
fn percentile(sorted: &[u64], pct: f64) -> u64 {
3
    if sorted.is_empty() {
4
        return 0;
5
    }
6
    // nearest-rank: rank = ceil(p/100 * N), 1-based
7
    let rank = ((pct / 100.0) * sorted.len() as f64).ceil() as usize;
8
    let idx = rank.saturating_sub(1).min(sorted.len() - 1);
9
    sorted[idx]
10
}
11

12
fn main() {
13
    // Simulated request latencies in milliseconds (the same data as the TS example).
14
    let mut samples: Vec<u64> = vec![
15
        120, 130, 118, 125, 122, 119, 121, 117, 200, 9800,
16
        124, 126, 123, 128, 131, 115, 116, 127, 129, 4200,
17
    ];
18
    samples.sort_unstable();
19

20
    let n = samples.len();
21
    let mean = samples.iter().sum::<u64>() as f64 / n as f64;
22

23
    println!("samples   : {n}");
24
    println!("mean      : {mean:.1} ms");
25
    println!("p50       : {} ms", percentile(&samples, 50.0));
26
    println!("p90       : {} ms", percentile(&samples, 90.0));
27
    println!("p99       : {} ms", percentile(&samples, 99.0));
28
    println!("max       : {} ms", samples[n - 1]);
29
}

Real output from cargo run:

1
samples   : 20
2
mean      : 814.5 ms
3
p50       : 124 ms
4
p90       : 200 ms
5
p99       : 9800 ms
6
max       : 9800 ms

For real workloads, do not hand-roll percentiles over a giant Vec — it costs memory proportional to sample count and a full sort. Use a histogram. The hdrhistogram crate (a port of the widely used HdrHistogram) records values into fixed-precision buckets in O(1) per sample and answers any quantile cheaply:

1
[dependencies]
2
hdrhistogram = "7.5.4"

1
use hdrhistogram::Histogram;
2

3
fn main() {
4
    // Record latencies in milliseconds with 3 significant digits of precision.
5
    let mut hist = Histogram::<u64>::new(3).expect("create histogram");
6

7
    let samples: [u64; 20] = [
8
        120, 130, 118, 125, 122, 119, 121, 117, 200, 9800,
9
        124, 126, 123, 128, 131, 115, 116, 127, 129, 4200,
10
    ];
11
    for &v in &samples {
12
        hist.record(v).expect("value in range");
13
    }
14

15
    println!("count : {}", hist.len());
16
    println!("mean  : {:.1} ms", hist.mean());
17
    println!("p50   : {} ms", hist.value_at_quantile(0.50));
18
    println!("p90   : {} ms", hist.value_at_quantile(0.90));
19
    println!("p99   : {} ms", hist.value_at_quantile(0.99));
20
    println!("max   : {} ms", hist.max());
21
}

Real output:

1
count : 20
2
mean  : 814.8 ms
3
p50   : 124 ms
4
p90   : 200 ms
5
p99   : 9807 ms
6
max   : 9807 ms

Note: The p99 reads 9807 instead of 9800 and the mean is 814.8 instead of 814.5. That is not a bug — a histogram trades a tiny, bounded error (here, 3 significant digits) for constant memory and O(1) recording, so a billion samples cost the same as twenty. For latency dashboards this is exactly the trade you want.

Detailed Explanation

Why percentiles, not averages

A web service’s latency distribution is right-skewed: most requests are fast, a long tail is slow (GC pauses, a cold cache, a lock contended under load, a slow database query). The mean is dragged toward the tail by a handful of outliers, so it describes no real request. Percentiles describe the actual experience:

p50 (median): the typical request. Half are faster, half slower.
p90 / p95: what your “normal but busy” users feel.
p99 / p99.9: the tail. On a page that fans out to 100 backend calls, a p99 of 1% means roughly every page render hits at least one slow call. Tail latency compounds.

This is the single most important reporting change a migration should make, and it is independent of language. But it matters more when comparing Node.js to Rust, because the two have very different tail behavior.

Where Rust actually moves the needle

Rust does not make your business logic algorithmically faster — an O(n²) loop is O(n²) in both languages. The wins come from a few specific places:

Source of gain	Why Rust helps	Where it shows up
No GC pauses	No stop-the-world collector; memory freed deterministically at scope end	p99/p99.9 tail latency, jitter
No JIT warm-up	Ahead-of-time compiled native code from the first request	cold start, autoscaling, serverless
Compact memory layout	`Vec<i64>` is a packed array; no per-element boxing	RSS, cache locality, throughput
True parallelism	No single event loop; CPU-bound work spreads across cores	CPU-bound endpoints, batch jobs
Predictable cost	No hidden megamorphic deopts or hidden-class churn	latency variance

Notice what is not on that list: a 10x drop in median latency for an I/O-bound CRUD endpoint. If your handler spends 95% of its time waiting on Postgres, rewriting the other 5% in Rust changes p50 by almost nothing. The honest pitch is: Rust flattens the distribution. The median may improve modestly; the tail and the memory often improve dramatically.

The benchmark that fools you

Here is the naive Rust micro-benchmark equivalent to the TypeScript one — and it is just as misleading:

1
use std::hint::black_box;
2
use std::time::Instant;
3

4
/// Parse an ISO-8601 date "YYYY-MM-DD" and return the day of the year.
5
fn day_of_year(date: &str) -> Option<u32> {
6
    let mut parts = date.split('-');
7
    let year: i32 = parts.next()?.parse().ok()?;
8
    let month: u32 = parts.next()?.parse().ok()?;
9
    let day: u32 = parts.next()?.parse().ok()?;
10
    if parts.next().is_some() || !(1..=12).contains(&month) || day == 0 {
11
        return None;
12
    }
13
    let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
14
    let days = [31, if leap { 29 } else { 28 }, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
15
    if day > days[(month - 1) as usize] {
16
        return None;
17
    }
18
    let mut total = day;
19
    for m in 0..(month - 1) as usize {
20
        total += days[m];
21
    }
22
    Some(total)
23
}
24

25
fn main() {
26
    let iters = 1_000_000;
27
    let start = Instant::now();
28
    for _ in 0..iters {
29
        // black_box stops the optimizer from deleting a pure call whose
30
        // result is unused -- without it, this loop could compile to nothing.
31
        black_box(day_of_year(black_box("2026-06-02")));
32
    }
33
    let elapsed = start.elapsed();
34
    println!("{iters} iters in {elapsed:?}");
35
    println!("~{:.1} ns/call", elapsed.as_nanos() as f64 / iters as f64);
36
}

Real output from cargo run --release:

1
1000000 iters in 28.742084ms
2
~28.7 ns/call

std::hint::black_box is the one thing this version gets right: it forces the compiler to treat the value as opaque, so it cannot constant-fold the input or eliminate the unused result. (The TypeScript version has no equivalent guard at all — V8 may quietly delete the work.) But it is still a bad service benchmark: one sample, no warm-up control, and a workload (a pure function on a fixed string) that no production request actually runs. It tells you the function is fast. It tells you nothing about your service.

For a defensible per-function number you want a statistical harness — see the Real-World Example below.

Key Differences

Aspect	Node.js / TypeScript	Rust
Headline metric people quote	”X requests/sec” or mean latency	should be p99 latency + RSS
Memory model	GC heap; per-object overhead; periodic pauses	deterministic free at scope; packed layouts
Latency tail driver	GC, JIT deopt, single event loop	mostly lock contention / I/O, no GC pauses
Cold start	JIT must warm up	native code, fast from request one
Built-in micro-bench guard	none (`performance.now()` is manual)	`std::hint::black_box`, plus Criterion
Standard bench tool	`tinybench`, `mitata`, autocannon (HTTP)	Criterion (functions), wrk/oha/k6 (HTTP)
Parallel CPU work	worker threads (heavy)	threads / `rayon` (cheap, safe)

The biggest conceptual shift for a TypeScript developer is that the average stops being the metric and the tail becomes the metric — and Rust’s advantage is most visible precisely in that tail, where Node.js pays for garbage collection and JIT.

Tip: When you publish migration numbers, always report the measurement conditions alongside them: hardware, concurrency level, payload size, dataset size, build profile (--release!), and number of samples. A number without its conditions is not reproducible and not credible.

Common Pitfalls

Benchmarking a debug build

The single most common mistake. A debug build (cargo run, no flags) has no optimizations and can be 10-100x slower than release. If you compare a Node.js service against a debug-mode Rust binary, Rust may look slower. Always benchmark --release.

1
fn main() {
2
    // Run this with `cargo run` and again with `cargo run --release`.
3
    let start = std::time::Instant::now();
4
    let mut acc: u64 = 0;
5
    for i in 0..50_000_000u64 {
6
        acc = acc.wrapping_add(i);
7
    }
8
    println!("{acc} in {:?}", start.elapsed());
9
}

The debug build runs the bounds-checked, unoptimized loop; the release build often optimizes the whole thing into a closed-form computation. The gap is enormous, and it is entirely an artifact of the build profile, not the language.

Comparing apples to oranges

Make the two services do the same work under the same conditions. A mismatch quietly invalidates the comparison:

Same connection pool size, same database, same dataset, same indexes.
Same payload shapes and sizes (see api-compatibility.md).
Same concurrency in the load generator. A Rust service that uses all cores will trivially beat a single Node.js process — but the fair comparison is against a Node.js cluster across the same cores.
Warm both services before measuring (JIT warm-up, connection pools, OS page cache).

Reporting the mean

We have hammered this, but it bears repeating because it is so tempting: the mean is dominated by outliers and describes no real request. Report p50/p90/p99 and the max. If you must give one number, give p99.

Coordinated omission

A subtle but devastating measurement bug. If your load generator sends a request, waits for the response, and only then sends the next one, then a slow response delays the requests behind it — which are never recorded as slow. Your tool silently omits exactly the measurements that matter, and your p99 looks far better than reality. Use a load generator that sends at a fixed rate (an open-model tool like wrk2, oha, or k6 configured with a constant arrival rate), or correct for it, so the queue delay shows up in the numbers.

Letting the optimizer delete your benchmark

In Rust, a pure function whose result is discarded can be optimized away entirely, making it look infinitely fast. Wrap inputs and outputs in std::hint::black_box (as shown above), or use Criterion, which does this for you. In Node.js, V8 can do the same thing — assign the result somewhere observable so the JIT cannot eliminate the call.

Best Practices

Measure the service, not the function. End-to-end request latency under realistic concurrency is the number leadership and users care about. Per-function micro-benchmarks are for finding a regression, not for the migration headline.
Always report percentiles. p50, p90, p99, p99.9, and max. Pair each with the measurement conditions.
Use the right tool for each layer. Criterion for pure functions; an HTTP load generator with a fixed arrival rate (oha, k6, wrk2) for the service; hdrhistogram to aggregate latencies in your own load harness.
Always benchmark --release (and ideally with lto = true for the production profile, matching what you actually deploy).
Measure memory honestly (RSS, not just heap) and report the steady-state footprint under load, not at idle.
Gate regressions in CI. Once you have a baseline, fail the build when p99 or memory regresses past a budget. See the Real-World Example.
Be honest about where Rust does not help. If a CRUD endpoint is 95% database wait, say so. Overclaiming erodes trust in the genuinely large wins. See common-challenges.md for when not to migrate at all.

Real-World Example

Here is a credible, reproducible benchmarking setup for a ported function, using Criterion — the standard statistical benchmarking harness for Rust. It runs many iterations, controls warm-up, applies black_box, and reports a confidence interval instead of a single noisy number.

1
[package]
2
name = "probe"
3
version = "0.1.0"
4
edition = "2024"
5

6
[dev-dependencies]
7
criterion = "0.8.2"
8

9
[[bench]]
10
name = "parsing"
11
harness = false

1
// src/lib.rs -- the hot, pure function we ported from the Node service.
2
/// Parse an ISO-8601 date "YYYY-MM-DD" and return the day of the year.
3
/// Returns None on malformed input.
4
pub fn day_of_year(date: &str) -> Option<u32> {
5
    let mut parts = date.split('-');
6
    let year: i32 = parts.next()?.parse().ok()?;
7
    let month: u32 = parts.next()?.parse().ok()?;
8
    let day: u32 = parts.next()?.parse().ok()?;
9
    if parts.next().is_some() || !(1..=12).contains(&month) || day == 0 {
10
        return None;
11
    }
12
    let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
13
    let days = [31, if leap { 29 } else { 28 }, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
14
    if day > days[(month - 1) as usize] {
15
        return None;
16
    }
17
    let mut total = day;
18
    for m in 0..(month - 1) as usize {
19
        total += days[m];
20
    }
21
    Some(total)
22
}

1
use criterion::{criterion_group, criterion_main, Criterion};
2
use probe::day_of_year;
3
use std::hint::black_box;
4

5
fn bench_day_of_year(c: &mut Criterion) {
6
    c.bench_function("day_of_year 2026-06-02", |b| {
7
        b.iter(|| day_of_year(black_box("2026-06-02")));
8
    });
9
}
10

11
criterion_group!(benches, bench_day_of_year);
12
criterion_main!(benches);

Real output from cargo bench:

1
day_of_year 2026-06-02  time:   [228.07 ns 287.79 ns 349.46 ns]
2
Found 5 outliers among 100 measurements (5.00%)
3
  5 (5.00%) high mild

The three numbers are the lower bound, the estimate, and the upper bound of a 95% confidence interval — a range, not a single point, which is exactly the honesty a migration report needs. Criterion also detects outliers and, on a second run, will tell you whether performance changed since the last run.

Measuring memory honestly

Throughput is half the story; the other half is memory footprint, which is often where a Rust migration delivers its quietest, biggest win. A Vec<i64> of 100,000 user IDs in Rust is a single packed allocation of exactly 100_000 * 8 bytes. The same array in JavaScript carries boxing and per-element overhead that can be several times larger.

You can prove the Rust side with a global allocator that counts live bytes:

1
use std::alloc::{GlobalAlloc, Layout, System};
2
use std::sync::atomic::{AtomicUsize, Ordering};
3

4
/// A wrapper allocator that tracks bytes currently allocated.
5
struct Counting;
6

7
static ALLOCATED: AtomicUsize = AtomicUsize::new(0);
8

9
unsafe impl GlobalAlloc for Counting {
10
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
11
        let ptr = unsafe { System.alloc(layout) };
12
        if !ptr.is_null() {
13
            ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
14
        }
15
        ptr
16
    }
17
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
18
        unsafe { System.dealloc(ptr, layout) };
19
        ALLOCATED.fetch_sub(layout.size(), Ordering::Relaxed);
20
    }
21
}
22

23
#[global_allocator]
24
static GLOBAL: Counting = Counting;
25

26
fn live_bytes() -> usize {
27
    ALLOCATED.load(Ordering::Relaxed)
28
}
29

30
fn main() {
31
    let before = live_bytes();
32
    let ids: Vec<i64> = (0..100_000).collect();
33
    let after = live_bytes();
34
    println!("live heap before : {before} bytes");
35
    println!("live heap after  : {after} bytes");
36
    println!(
37
        "delta            : {} bytes (~{} KiB)",
38
        after - before,
39
        (after - before) / 1024
40
    );
41
    println!("len              : {}", ids.len());
42
}

Real output:

1
live heap before : 524 bytes
2
live heap after  : 800524 bytes
3
delta            : 800000 bytes (~781 KiB)
4
len              : 100000

Exactly 800,000 bytes — 100_000 × 8 — with no per-element overhead. That precision is the point: in Rust you can predict memory from the types. For the whole-process number that operations teams track, measure RSS (resident set size) under steady-state load — on Linux, /proc/self/status (VmRSS) or ps -o rss, on macOS ps -o rss. Report RSS at idle and under load, because Node.js’s footprint grows with the GC heap while a Rust service’s tends to stay flat. A migration that drops steady-state RSS from 512 MB to 40 MB per instance can cut your cloud bill more than any latency number — and it is a number you can defend with ps.

Gating regressions in CI

Once you have a baseline p99, turn it into a budget the build enforces, so a future change cannot silently regress the tail:

1
/// Fail CI if the new p99 regressed by more than `tolerance` (e.g. 0.10 = 10%).
2
fn check_regression(baseline_p99: f64, new_p99: f64, tolerance: f64) -> Result<(), String> {
3
    let allowed = baseline_p99 * (1.0 + tolerance);
4
    if new_p99 > allowed {
5
        Err(format!(
6
            "p99 regression: {new_p99:.1} ms exceeds budget {allowed:.1} ms \
7
             (baseline {baseline_p99:.1} ms + {:.0}%)",
8
            tolerance * 100.0
9
        ))
10
    } else {
11
        Ok(())
12
    }
13
}
14

15
fn main() {
16
    match check_regression(120.0, 138.0, 0.10) {
17
        Ok(()) => println!("within budget"),
18
        Err(e) => println!("FAIL: {e}"),
19
    }
20
    match check_regression(120.0, 125.0, 0.10) {
21
        Ok(()) => println!("within budget"),
22
        Err(e) => println!("FAIL: {e}"),
23
    }
24
}

Real output:

1
FAIL: p99 regression: 138.0 ms exceeds budget 132.0 ms (baseline 120.0 ms + 10%)
2
within budget

Exercises

Exercise 1: Expose the tail

Difficulty: Beginner

Objective: Internalize why the median can look healthy while the tail is on fire.

Instructions: Given a vector of request latencies in milliseconds, compute p50 and p99 using the nearest-rank method, then print the “tail amplification” ratio p99 / p50. Use this data: [10, 11, 9, 12, 10, 11, 10, 9, 13, 250, 10, 11, 10, 12, 11, 9, 10, 11, 10, 180].

Solution

1
fn percentile(sorted: &[u64], pct: f64) -> u64 {
2
    if sorted.is_empty() {
3
        return 0;
4
    }
5
    let rank = ((pct / 100.0) * sorted.len() as f64).ceil() as usize;
6
    let idx = rank.saturating_sub(1).min(sorted.len() - 1);
7
    sorted[idx]
8
}
9

10
fn main() {
11
    let mut latencies: Vec<u64> = vec![
12
        10, 11, 9, 12, 10, 11, 10, 9, 13, 250,
13
        10, 11, 10, 12, 11, 9, 10, 11, 10, 180,
14
    ];
15
    latencies.sort_unstable();
16
    let p50 = percentile(&latencies, 50.0);
17
    let p99 = percentile(&latencies, 99.0);
18
    println!("p50 = {p50} ms, p99 = {p99} ms");
19
    println!("tail amplification (p99/p50) = {:.1}x", p99 as f64 / p50 as f64);
20
}

Real output:

1
p50 = 10 ms, p99 = 250 ms
2
tail amplification (p99/p50) = 25.0x

A median of 10 ms looks great on a dashboard, but 1% of users wait 250 ms — 25x longer. This is the gap a Rust migration most often closes.

Exercise 2: Aggregate without storing everything

Difficulty: Intermediate

Objective: Use a histogram to compute percentiles in bounded memory, the way a real load harness does.

Instructions: Add the hdrhistogram crate. Record 1,000 latencies where most are around 50 us but every 100th sample is a 5,000 us spike. Print p50, p99, and the max. Verify the median is unaffected by the spikes while the tail captures them.

Solution

1
[dependencies]
2
hdrhistogram = "7.5.4"

1
use hdrhistogram::Histogram;
2

3
fn main() {
4
    let mut hist = Histogram::<u64>::new(3).expect("create histogram");
5

6
    for i in 0..1_000u64 {
7
        let latency = if i % 100 == 99 { 5_000 } else { 50 };
8
        hist.record(latency).expect("value in range");
9
    }
10

11
    println!("count : {}", hist.len());
12
    println!("p50   : {} us", hist.value_at_quantile(0.50));
13
    println!("p99   : {} us", hist.value_at_quantile(0.99));
14
    println!("max   : {} us", hist.max());
15
}

Real output:

1
count : 1000
2
p50   : 50 us
3
p99   : 50 us
4
max   : 5003 us

With spikes at only 1% frequency, even p99 sits at the fast value — you would need p99.9 (value_at_quantile(0.999)) to see them. That is the lesson: pick the percentile that matches how rare your bad events are, and always look at the max. The histogram used a fixed amount of memory regardless of the 1,000 samples (and would for a billion).

Exercise 3: A regression gate with a memory budget

Difficulty: Advanced

Objective: Build a CI check that fails on either a latency regression or a memory regression, the way a production migration should guard its gains.

Instructions: Write a function check_budget that takes a baseline and a candidate measurement (each a struct with p99_ms: f64 and rss_mb: f64) plus a tolerance fraction, and returns Result<(), Vec<String>> listing every metric that regressed past baseline * (1 + tolerance). Test it with a candidate that improves p99 but regresses RSS.

Solution

1
#[derive(Clone, Copy)]
2
struct Measurement {
3
    p99_ms: f64,
4
    rss_mb: f64,
5
}
6

7
/// Returns Ok(()) if every metric is within budget, otherwise the list of
8
/// regressions. A metric regresses if it exceeds baseline * (1 + tolerance).
9
fn check_budget(
10
    baseline: Measurement,
11
    candidate: Measurement,
12
    tolerance: f64,
13
) -> Result<(), Vec<String>> {
14
    let mut failures = Vec::new();
15

16
    let p99_budget = baseline.p99_ms * (1.0 + tolerance);
17
    if candidate.p99_ms > p99_budget {
18
        failures.push(format!(
19
            "p99 {:.1} ms exceeds budget {:.1} ms",
20
            candidate.p99_ms, p99_budget
21
        ));
22
    }
23

24
    let rss_budget = baseline.rss_mb * (1.0 + tolerance);
25
    if candidate.rss_mb > rss_budget {
26
        failures.push(format!(
27
            "rss {:.1} MB exceeds budget {:.1} MB",
28
            candidate.rss_mb, rss_budget
29
        ));
30
    }
31

32
    if failures.is_empty() {
33
        Ok(())
34
    } else {
35
        Err(failures)
36
    }
37
}
38

39
fn main() {
40
    let baseline = Measurement { p99_ms: 120.0, rss_mb: 64.0 };
41
    // Latency got better, but a leak pushed memory up 30%.
42
    let candidate = Measurement { p99_ms: 95.0, rss_mb: 84.0 };
43

44
    match check_budget(baseline, candidate, 0.10) {
45
        Ok(()) => println!("PASS: all metrics within budget"),
46
        Err(regressions) => {
47
            println!("FAIL: {} regression(s):", regressions.len());
48
            for r in regressions {
49
                println!("  - {r}");
50
            }
51
        }
52
    }
53
}

Real output:

1
FAIL: 1 regression(s):
2
  - rss 84.0 MB exceeds budget 70.4 MB

The lesson: a migration can win on latency and lose on memory at the same time. A budget that watches both stops you from shipping a regression you were not looking for. Wire this into your CI step after the load test, reading the baseline from a committed file.

Measuring Performance Gains Honestly

Quick Overview

TypeScript/JavaScript Example

Rust Equivalent

Detailed Explanation

Why percentiles, not averages

Where Rust actually moves the needle

The benchmark that fools you

Key Differences

Common Pitfalls

Benchmarking a debug build

Comparing apples to oranges

Reporting the mean

Coordinated omission

Letting the optimizer delete your benchmark

Best Practices

Real-World Example

Measuring memory honestly

Gating regressions in CI

Further Reading

Exercises

Exercise 1: Expose the tail

Exercise 2: Aggregate without storing everything

Exercise 3: A regression gate with a memory budget

Next: Common Migration Challenges

Measuring Performance Gains Honestly

Quick Overview

TypeScript/JavaScript Example

Rust Equivalent

Detailed Explanation

Why percentiles, not averages

Where Rust actually moves the needle

The benchmark that fools you

Key Differences

Common Pitfalls

Benchmarking a debug build

Comparing apples to oranges

Reporting the mean

Coordinated omission

Letting the optimizer delete your benchmark

Best Practices

Real-World Example

Measuring memory honestly

Gating regressions in CI

Further Reading

Related sections in this guide

Exercises

Exercise 1: Expose the tail

Exercise 2: Aggregate without storing everything

Exercise 3: A regression gate with a memory budget

Next: Common Migration Challenges