Rate Limiting

A public endpoint that anyone can call as fast as they like is a denial-of-service waiting to happen. Rate limiting caps how many requests a given client may make in a window of time, protecting your service (and the databases and third-party APIs behind it) from both malicious floods and accidental hammering. In a Rust web service the idiomatic approach is a Tower middleware layer — the same composable building block your other middleware uses — so the limiter slots in next to logging, tracing, and timeouts without touching your handlers.

Quick Overview

Rate limiting answers one question: has this client used up its allowance? The dominant algorithm is a token bucket — each client gets a bucket of N tokens, every request spends one, and tokens drip back at a fixed rate; an empty bucket means a 429 Too Many Requests. In Node you reach for express-rate-limit; in Rust you add the tower-governor crate, which wraps the high-performance governor limiter in a tower::Layer you attach with .layer(...). The big wins over the typical Node setup are that the limiter is in-process, lock-light, and allocation-frugal (no Redis round-trip needed for the common single-instance case), and that mistakes like forgetting per-IP context surface as obvious behavior rather than silent global limits.
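To make the bucket mechanics concrete, here is a minimal, standard-library-only sketch of a token bucket. It is an illustration under simplifying assumptions, not how governor or tower-governor is implemented: the TokenBucket type, its field names, and the explicit now_secs clock parameter are hypothetical and chosen so the behavior is deterministic.

```rust
/// Illustrative token bucket; all names here are hypothetical.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill_secs: f64,
}

impl TokenBucket {
    /// Starts full: `capacity` tokens, refilled at `refill_per_sec`.
    fn new(capacity: u32, refill_per_sec: f64) -> Self {
        Self {
            capacity: f64::from(capacity),
            tokens: f64::from(capacity),
            refill_per_sec,
            last_refill_secs: 0.0,
        }
    }

    /// Returns true if a request arriving at `now_secs` may proceed.
    fn try_acquire(&mut self, now_secs: f64) -> bool {
        // Drip tokens back for the time that has passed, capped at capacity.
        let elapsed = (now_secs - self.last_refill_secs).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill_secs = now_secs;

        if self.tokens >= 1.0 {
            self.tokens -= 1.0; // spend one token for this request
            true
        } else {
            false // empty bucket: the caller would answer 429
        }
    }
}

fn main() {
    // 5 tokens, one token back every 2 seconds (0.5 tokens per second).
    let mut bucket = TokenBucket::new(5, 0.5);
    let burst: Vec<bool> = (0..6).map(|_| bucket.try_acquire(0.0)).collect();
    println!("{burst:?}"); // five pass, the sixth is denied
    println!("{}", bucket.try_acquire(2.0)); // one token has dripped back
}
```

Passing the clock in as a parameter is a design choice for the sketch only; a real limiter reads a monotonic clock itself.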

Note: This page covers application-level rate limiting inside your Rust service: per-IP limits, a global cap, and per-route policies via a Tower layer. For caching responses to reduce load (a complementary technique), see caching.md; for the broader hardening picture, see production-checklist.md. Authentication-adjacent throttling (login brute-force protection) connects to ../27-security/README.md.

TypeScript/JavaScript Example

In an Express service the standard tool is express-rate-limit. You configure a window and a limit, register it as middleware, and it tracks counts per client IP in an in-memory store by default.

import express from "express";
import { rateLimit } from "express-rate-limit";

const app = express();

// 5 requests per minute per IP; replies 429 with a JSON body when exceeded.
const limiter = rateLimit({
  windowMs: 60_000, // 1 minute fixed window
  limit: 5,
  standardHeaders: "draft-7", // emit RateLimit-* headers
  legacyHeaders: false,
  message: { error: "rate_limited" },
});

app.use(limiter);
app.get("/", (_req, res) => res.send("hello"));

app.listen(3001);

When seven requests are fired in quick succession from the same IP, the sixth and seventh are rejected:

req 1: 200
req 2: 200
req 3: 200
req 4: 200
req 5: 200
req 6: 429 retry-after=60
req 7: 429 retry-after=60
blocked body: {"error":"rate_limited"}

This works, but it carries two quiet caveats a senior engineer learns the hard way. First, the default in-memory store is per-process: run two Node instances behind a load balancer and each enforces its own limit, so the effective limit doubles — you need a shared RedisStore to fix it. Second, express-rate-limit reads the client IP from the connection unless you set app.set("trust proxy", ...); behind a reverse proxy, everyone shares the proxy’s IP and a single client can starve the rest. Both traps exist in Rust too, and tower-governor makes the proxy case explicit via the key extractor you choose.

Rust Equivalent

In Rust, rate limiting is a Tower layer. The tower-governor crate provides GovernorLayer, configured by a GovernorConfigBuilder, and attaches to an axum (or any Tower-based) router with .layer(...). Start a fresh project (cargo new selects the latest stable toolchain — currently Rust 1.96.0 on the 2024 edition) and add the dependencies:

[dependencies]
axum = "0.8"
tokio = { version = "1", features = ["full"] }
tower_governor = "0.8"

use std::net::SocketAddr;
use std::sync::Arc;
use std::time::Duration;

use axum::{routing::get, Router};
use tokio::net::TcpListener;
use tower_governor::{governor::GovernorConfigBuilder, GovernorLayer};

async fn hello() -> &'static str {
    "hello"
}

#[tokio::main]
async fn main() {
    // Allow a burst of 5 requests per client IP, replenishing one token every 2s.
    let governor_conf = Arc::new(
        GovernorConfigBuilder::default()
            .per_second(2)
            .burst_size(5)
            .finish()
            .unwrap(),
    );

    // Periodically evict idle IP buckets so memory does not grow unbounded.
    let limiter = governor_conf.limiter().clone();
    tokio::spawn(async move {
        let mut tick = tokio::time::interval(Duration::from_secs(60));
        loop {
            tick.tick().await;
            limiter.retain_recent();
        }
    });

    let app = Router::new()
        .route("/", get(hello))
        .layer(GovernorLayer::new(governor_conf));

    let listener = TcpListener::bind("0.0.0.0:3000").await.unwrap();
    // `with_connect_info` puts the peer SocketAddr into each request so the
    // default per-IP key extractor can read it.
    axum::serve(
        listener,
        app.into_make_service_with_connect_info::<SocketAddr>(),
    )
    .await
    .unwrap();
}

When eight requests are fired rapidly from one machine, the first five pass and the rest are rejected with 429:

req 1 -> 200
req 2 -> 200
req 3 -> 200
req 4 -> 200
req 5 -> 200
req 6 -> 429
req 7 -> 429
req 8 -> 429

The body and headers of a blocked request, captured with curl -i:

HTTP/1.1 429 Too Many Requests
x-ratelimit-after: 1
retry-after: 1
content-length: 30
date: Tue, 02 Jun 2026 06:48:14 GMT

Too Many Requests! Wait for 1s

tower-governor sends retry-after (and its own x-ratelimit-after) out of the box, telling clients exactly how long to back off — the same contract the Express version provides, but enforced by a layer rather than handler-adjacent middleware.

Detailed Explanation

The token bucket underneath: GCRA

tower-governor is a thin Tower adapter over the governor crate, which implements the Generic Cell Rate Algorithm (GCRA) — a precise, allocation-free variant of the token bucket. Rather than counting requests in fixed windows (the approach express-rate-limit uses by default, which allows a 2x burst at a window boundary), GCRA tracks a single timestamp per key and computes whether enough time has elapsed to permit the next request. It is smooth, has no boundary spikes, and updates with a couple of atomic operations.
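The single-timestamp idea can be sketched in a few lines of standard-library Rust. This is a simplified illustration of GCRA under stated assumptions, not the governor crate's actual code: the Gcra type, its millisecond fields, and the explicit now_ms parameter are hypothetical, and there are no atomics here because the sketch is single-threaded.

```rust
/// Illustrative GCRA state for one key; names are hypothetical.
struct Gcra {
    period_ms: u64,    // time needed to replenish one cell
    tolerance_ms: u64, // how far ahead of schedule a burst may run
    tat_ms: u64,       // "theoretical arrival time" of the next request
}

impl Gcra {
    fn new(period_ms: u64, burst: u64) -> Self {
        Self {
            period_ms,
            // A burst of n lets a client run (n - 1) periods ahead of schedule.
            tolerance_ms: period_ms * burst.saturating_sub(1),
            tat_ms: 0,
        }
    }

    /// Ok(()) if the request conforms, Err(wait_ms) with the time to back off.
    fn check(&mut self, now_ms: u64) -> Result<(), u64> {
        let earliest = self.tat_ms.saturating_sub(self.tolerance_ms);
        if now_ms < earliest {
            return Err(earliest - now_ms); // rejected; state is left untouched
        }
        // Accepted: push the schedule forward by one period.
        self.tat_ms = self.tat_ms.max(now_ms) + self.period_ms;
        Ok(())
    }
}

fn main() {
    // One cell every 2 s with a burst of 5.
    let mut limiter = Gcra::new(2_000, 5);
    for i in 1..=6 {
        println!("t=0s request {i}: {:?}", limiter.check(0));
    }
    println!("t=2s request 7: {:?}", limiter.check(2_000));
}
```

Note that the only state per key is one integer, and a rejection yields the wait time for free, since it is just the gap between now and the earliest conforming instant.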

You can see the core limiter on its own, without any HTTP, by depending on governor directly:

[dependencies]
governor = "0.10"

use std::num::NonZeroU32;

use governor::{Quota, RateLimiter};

fn main() {
    // A quota of 3 requests per second: a burst of 3, replenished
    // gradually so the full burst is available again after one second.
    let quota = Quota::per_second(NonZeroU32::new(3).unwrap());
    let limiter = RateLimiter::direct(quota);

    // The first 3 checks pass (the burst), the 4th is denied.
    for i in 1..=4 {
        match limiter.check() {
            Ok(()) => println!("request {i}: allowed"),
            Err(_) => println!("request {i}: rate limited"),
        }
    }
}

This prints, deterministically (no waiting between checks):

request 1: allowed
request 2: allowed
request 3: allowed
request 4: rate limited

RateLimiter::direct is a single, unkeyed bucket; RateLimiter::keyed (what tower-governor uses internally) maintains one bucket per key in a concurrent hash map. The check() returns a Result — Ok to proceed, Err carrying when the next request will be allowed — which is exactly the information tower-governor turns into a retry-after header.

`GovernorConfigBuilder`: period and burst

Two numbers define a quota:

burst_size(n) — the bucket capacity, i.e. how many requests may arrive back-to-back before throttling kicks in.
per_second(s) / per_millisecond(ms) / period(Duration) — how often one token is replenished.

So .per_second(2).burst_size(5) means “up to 5 at once, then one more every 2 seconds.” This pair maps onto the Express windowMs/limit mental model but expresses sustained rate and burst tolerance independently, which fixed windows cannot.
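As a worked example of how the two numbers combine, the helper below computes an upper bound on what one client can get through in a given span. It is a back-of-the-envelope sketch, not part of any crate: max_requests is a hypothetical name, and it assumes the client starts with a full bucket and times its requests perfectly.

```rust
/// Upper bound on requests one client can get through within `window_secs`
/// (inclusive), assuming a full bucket of `burst` tokens at the start and one
/// token returning every `period_secs`. Returns None for a zero period.
fn max_requests(burst: u64, period_secs: u64, window_secs: u64) -> Option<u64> {
    Some(burst + window_secs.checked_div(period_secs)?)
}

fn main() {
    // .per_second(2).burst_size(5): 5 up front, then 30 more over a minute.
    println!("{:?}", max_requests(5, 2, 60)); // Some(35)
    // Same sustained rate, but almost no burst tolerance.
    println!("{:?}", max_requests(1, 2, 60)); // Some(31)
    // A zero period is not a usable quota.
    println!("{:?}", max_requests(5, 0, 60)); // None
}
```

Changing the burst moves only the constant term, while changing the period moves only the per-second slope, which is the independence a single windowMs/limit pair cannot express.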

.finish() returns Option<GovernorConfig> — it is None if you pass a zero burst or zero period (an unsatisfiable quota), which is why the examples .unwrap() a known-good config. Wrap the result in Arc once and share it: constructing the same config twice creates two independent limiters, a subtle bug the crate’s own docs warn about.

Key extractors: who counts as “a client”?

The GovernorConfigBuilder carries a key extractor that decides what to bucket on. Three are built in:

PeerIpKeyExtractor (the default) — buckets by the TCP peer address. Correct only when clients connect to you directly.
SmartIpKeyExtractor — reads X-Forwarded-For, then X-Real-IP, then the Forwarded header, falling back to the peer IP. This is what you want behind a load balancer or CDN.
GlobalKeyExtractor — one bucket for all traffic, for a hard cap on total throughput.

The default extractor needs the peer SocketAddr, which axum only injects when you serve with into_make_service_with_connect_info::<SocketAddr>(). Forget that and every request fails to extract a key (see Pitfalls). Switching to a proxy-aware extractor with full rate-limit headers looks like this:

use std::sync::Arc;

use tower_governor::governor::GovernorConfigBuilder;
use tower_governor::key_extractor::SmartIpKeyExtractor;

fn main() {
    let per_ip = Arc::new(
        GovernorConfigBuilder::default()
            .per_second(2)
            .burst_size(5)
            .key_extractor(SmartIpKeyExtractor)
            .use_headers() // emit x-ratelimit-limit / x-ratelimit-remaining
            .finish()
            .unwrap(),
    );

    // `per_ip` is now ready to hand to `GovernorLayer::new(per_ip)`.
    println!("configured: {}", Arc::strong_count(&per_ip));
}

With .use_headers() enabled, a successful request now advertises the client’s remaining allowance:

HTTP/1.1 200 OK
content-type: text/plain; charset=utf-8
x-ratelimit-limit: 5
x-ratelimit-remaining: 4
content-length: 5

hello

and a blocked one reports zero remaining alongside the retry hint:

HTTP/1.1 429 Too Many Requests
x-ratelimit-after: 1
retry-after: 1
x-ratelimit-limit: 5
x-ratelimit-remaining: 0

Too Many Requests! Wait for 1s

Where the layer sits

GovernorLayer is an ordinary Tower layer, so it composes with everything else through .layer(...) (or a ServiceBuilder). With chained .layer(...) calls on an axum Router, each call wraps everything added before it, so the last layer you add is the outermost one and the first to see a request; a ServiceBuilder applies its layers in the opposite order, top to bottom. Put rate limiting outside expensive work (auth, DB queries) so rejected requests cost almost nothing, but inside request-ID/tracing layers so even a 429 is logged with context. Because the bucket map lives in process memory, the retain_recent background task shown in the main example is important: without it, every distinct IP that ever connects leaves a bucket behind forever.
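The wrapping order for chained .layer(...) calls can be mimicked with plain closures. The sketch below uses no Tower types at all: Handler, layer, and request_order are hypothetical stand-ins that only record the order in which each wrapper sees a request.

```rust
/// Hypothetical stand-in for a service: it appends its name to a log.
type Handler = Box<dyn Fn(&mut Vec<String>)>;

/// Wrap `inner` in a named layer that records when it sees the request.
fn layer(name: &'static str, inner: Handler) -> Handler {
    Box::new(move |log: &mut Vec<String>| {
        log.push(name.to_string());
        inner(log);
    })
}

/// Builds handler -> rate_limit -> trace and returns the order of execution.
fn request_order() -> Vec<String> {
    let handler: Handler = Box::new(|log: &mut Vec<String>| log.push("handler".to_string()));
    // Added first, so it ends up innermost (closest to the handler).
    let svc = layer("rate_limit", handler);
    // Added last, so it wraps everything before it and sees the request first.
    let svc = layer("trace", svc);

    let mut log = Vec::new();
    svc(&mut log);
    log
}

fn main() {
    println!("{:?}", request_order()); // ["trace", "rate_limit", "handler"]
}
```

With this ordering a rejected request would still pass through the tracing wrapper, so the 429 is observable, while the handler is never reached.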

Key Differences

| Concern | TypeScript / Express (`express-rate-limit`) | Rust (`tower-governor`) |
| --- | --- | --- |
| Integration point | `app.use(limiter)` middleware | `.layer(GovernorLayer::new(cfg))` Tower layer |
| Algorithm | fixed window (default) | GCRA token bucket (smooth, no boundary burst) |
| Quota model | `windowMs` + `limit` | `burst_size` + replenish `period` (independent) |
| Per-client key | client IP (needs `trust proxy`) | choice of `PeerIp` / `SmartIp` / `Global` / custom extractor |
| Proxy awareness | opt-in `trust proxy` setting | explicit `SmartIpKeyExtractor` |
| Multi-instance | per-process unless `RedisStore` | per-process unless you add a shared store |
| Rejection response | `429` + `RateLimit-*` headers | `429` + `retry-after` / `x-ratelimit-*` headers |
| Memory growth | store-dependent | manual `retain_recent()` to evict idle buckets |
| Performance | per-request object + map ops | lock-light atomics, no per-request allocation |

The deepest conceptual difference is the algorithm. A fixed window resets its counter at clock boundaries, so a client can fire limit requests at 00:59 and another limit at 01:00 — a 2x burst across the seam. GCRA has no seam: it enforces a steady rate with a configurable burst, which is both fairer and harder to game.
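The seam is easy to reproduce with a generic fixed-window counter. The sketch below is not express-rate-limit's implementation; FixedWindow and its fields are hypothetical, and time is passed in explicitly so the result is deterministic.

```rust
/// Illustrative fixed-window counter: the count resets whenever the
/// window index (now / window_secs) changes. `window_secs` must be non-zero.
struct FixedWindow {
    window_secs: u64,
    limit: u32,
    current_window: u64,
    count: u32,
}

impl FixedWindow {
    fn new(window_secs: u64, limit: u32) -> Self {
        Self { window_secs, limit, current_window: 0, count: 0 }
    }

    fn allow(&mut self, now_secs: u64) -> bool {
        let window = now_secs / self.window_secs;
        if window != self.current_window {
            // A new window started: forget everything that came before.
            self.current_window = window;
            self.count = 0;
        }
        if self.count < self.limit {
            self.count += 1;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut fw = FixedWindow::new(60, 5);
    let mut allowed = 0;
    // Five requests at t = 59 s, then five more one second later at t = 60 s.
    for t in [59, 59, 59, 59, 59, 60, 60, 60, 60, 60] {
        if fw.allow(t) {
            allowed += 1;
        }
    }
    println!("allowed within about one second: {allowed}"); // 10, twice the limit
}
```

A limiter with a burst of 5 and one token per 12 seconds would admit only the first five of those ten requests, because nothing resets at the 60-second mark.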

Note: Like the Express middleware, which keeps a count per process by default and silently lets your effective limit scale with your replica count, tower-governor’s limiter lives in process memory and makes the same trade-off: it is not a distributed limiter. The fix is identical in spirit (a shared store), but Rust makes the key you bucket on an explicit type-level choice rather than a config string, so “we forgot to trust the proxy” becomes “we chose PeerIpKeyExtractor,” which is visible in the code.

Common Pitfalls

Pitfall 1: Forgetting `with_connect_info`, so every request 500s

The default PeerIpKeyExtractor needs the peer SocketAddr in the request extensions. axum only puts it there when you serve with into_make_service_with_connect_info::<SocketAddr>(). Use plain into_make_service() and the code still compiles — but at runtime every request fails key extraction:

// compiles, but breaks at runtime: no connect info means no key
axum::serve(listener, app.into_make_service()).await.unwrap();

The limiter cannot find an IP and returns a GovernorError::UnableToExtractKey, which surfaces as a 500:

HTTP/1.1 500 Internal Server Error
content-length: 22
date: Tue, 02 Jun 2026 06:50:51 GMT

Unable To Extract Key!

Because this is a runtime failure rather than a compile error, it is easy to ship. Always pair PeerIpKeyExtractor/SmartIpKeyExtractor with into_make_service_with_connect_info::<SocketAddr>(), and test a real request before trusting the limiter.

Pitfall 2: Trusting `X-Forwarded-For` when you are not behind a trusted proxy

SmartIpKeyExtractor reads X-Forwarded-For, which the client fully controls. If your service is exposed directly (no proxy that overwrites the header), an attacker simply sends a different X-Forwarded-For per request and gets an unlimited number of fresh buckets — defeating the limit entirely. Only use SmartIpKeyExtractor when a trusted proxy/load balancer sets that header and strips any client-supplied value. When clients connect to you directly, use the default PeerIpKeyExtractor. This is the exact same hazard as Express’s trust proxy, just made explicit by the extractor name.
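The attack can be simulated without any HTTP. The sketch below is a toy model, not tower-governor's extractor logic: client_key and allowed_with_spoofing are hypothetical helpers, the per-key budget is a plain counter with no replenishment, and taking the first entry of a comma-separated X-Forwarded-For value is an assumption made for the illustration.

```rust
use std::collections::HashMap;

/// Hypothetical key choice: trust the forwarded header, or use the TCP peer.
fn client_key(forwarded_for: Option<&str>, peer_ip: &str, trust_forwarded: bool) -> String {
    match forwarded_for {
        // The header may hold a comma-separated list; this sketch takes the first entry.
        Some(xff) if trust_forwarded => xff
            .split(',')
            .next()
            .unwrap_or(peer_ip)
            .trim()
            .to_string(),
        _ => peer_ip.to_string(),
    }
}

/// One attacker sends `n` requests, each with a different forged header.
/// Returns how many pass a per-key budget of `limit`.
fn allowed_with_spoofing(n: u32, limit: u32, trust_forwarded: bool) -> u32 {
    let mut used: HashMap<String, u32> = HashMap::new();
    let mut allowed = 0;
    for i in 0..n {
        let forged = format!("10.0.{}.{}", i / 256, i % 256);
        let key = client_key(Some(&forged), "198.51.100.7", trust_forwarded);
        let count = used.entry(key).or_insert(0);
        if *count < limit {
            *count += 1;
            allowed += 1;
        }
    }
    allowed
}

fn main() {
    // Header trusted with no proxy in front: every forged value is a fresh bucket.
    println!("trusting header: {}", allowed_with_spoofing(100, 5, true)); // 100
    // Keyed on the TCP peer: the attacker is one client again.
    println!("peer address:    {}", allowed_with_spoofing(100, 5, false)); // 5
}
```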

Pitfall 3: Building the config twice

Each call to .finish() builds a new, independent limiter with its own bucket map. If you write GovernorLayer::new(GovernorConfigBuilder::default()....finish().unwrap()) inside a per-route closure or a loop, every route gets a separate limiter and the limits do not combine the way you expect. Build one Arc<GovernorConfig> and clone the Arc (cheap, just a refcount bump) wherever you need the layer.
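The difference between cloning one Arc and building twice can be shown with a toy stand-in for the limiter. Budget below is hypothetical and far simpler than a real GovernorConfig (a fixed allowance behind a Mutex, with no replenishment), but the sharing semantics of Arc are the same.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a limiter: a fixed budget of remaining requests.
struct Budget {
    remaining: Mutex<u32>,
}

impl Budget {
    fn new(limit: u32) -> Arc<Self> {
        Arc::new(Self { remaining: Mutex::new(limit) })
    }

    fn try_take(&self) -> bool {
        let mut left = self.remaining.lock().unwrap();
        if *left > 0 {
            *left -= 1;
            true
        } else {
            false
        }
    }
}

/// Sends `n` requests alternating between two routes and counts the successes.
fn allowed(route_a: &Budget, route_b: &Budget, n: u32) -> u32 {
    (0..n)
        .filter(|i| if i % 2 == 0 { route_a.try_take() } else { route_b.try_take() })
        .count() as u32
}

fn main() {
    // Built once and cloned: both routes draw from the same budget of 5.
    let shared = Budget::new(5);
    let shared_clone = Arc::clone(&shared);
    println!("shared:   {}", allowed(&shared, &shared_clone, 20)); // 5

    // Built twice: each route has its own budget, so 10 requests pass.
    let (a, b) = (Budget::new(5), Budget::new(5));
    println!("separate: {}", allowed(&a, &b, 20)); // 10
}
```

The same arithmetic applies to replicas: two processes with their own in-memory state behave like the "separate" case.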

Pitfall 4: `finish()` returns `None`, and `.unwrap()` panics at startup

A zero burst_size or zero period is an impossible quota, so .finish() returns None. Calling .unwrap() on it panics — which is acceptable at startup (a misconfiguration should stop the process from booting, just like the config validation in environment.md), but make sure those values come from validated config, not directly from unchecked user input.

Pitfall 5: Unbounded memory from never evicting buckets

Every distinct key creates a bucket that lives until you remove it. On a public endpoint, that means one entry per IP that has ever connected. Spawn the retain_recent() cleanup task shown in the main example (or call it periodically) so idle buckets are reclaimed; otherwise a long-running service slowly leaks memory under a wide client base.
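Conceptually, the cleanup is a periodic sweep over the key map. The sketch below shows that idea with a plain HashMap; it is not how governor implements retain_recent (the crate decides what counts as idle from its own internal state), and the Buckets type and the fixed max_idle_secs timeout are assumptions of this illustration.

```rust
use std::collections::HashMap;

/// Hypothetical per-key state: only the last time each key was seen.
#[derive(Default)]
struct Buckets {
    last_seen_secs: HashMap<String, u64>,
}

impl Buckets {
    /// Record a request from `key` at `now_secs`.
    fn touch(&mut self, key: &str, now_secs: u64) {
        self.last_seen_secs.insert(key.to_string(), now_secs);
    }

    /// Drop every key that has been idle for longer than `max_idle_secs`.
    fn evict_idle(&mut self, now_secs: u64, max_idle_secs: u64) {
        self.last_seen_secs
            .retain(|_, seen| now_secs.saturating_sub(*seen) <= max_idle_secs);
    }

    fn len(&self) -> usize {
        self.last_seen_secs.len()
    }
}

fn main() {
    let mut buckets = Buckets::default();
    buckets.touch("203.0.113.9", 0); // seen once, then goes quiet
    buckets.touch("198.51.100.7", 100); // still active
    println!("before sweep: {}", buckets.len()); // 2

    buckets.evict_idle(120, 60);
    println!("after sweep:  {}", buckets.len()); // 1
}
```

Without the sweep the map only ever grows, which is the leak this pitfall describes.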

Best Practices

Pick the key extractor that matches your deployment. Direct exposure → PeerIpKeyExtractor. Behind a trusted proxy/CDN → SmartIpKeyExtractor. A coarse total-throughput cap → GlobalKeyExtractor. Per-account or per-API-key fairness → a custom KeyExtractor.
Layer global and per-IP limits together. A GlobalKeyExtractor cap protects a shared downstream (a database, a paid third-party API) from total overload, while a per-IP limit keeps any single client fair. Apply both as stacked layers.
Set burst_size and the replenish rate from real traffic shapes, not round numbers. Allow enough burst for legitimate clients (a page that fires several XHRs on load) while keeping the sustained rate tight.
Emit retry-after (and consider .use_headers()). Well-behaved clients honor it and back off, smoothing load instead of retrying in a tight loop.
Build the config once, share it via Arc. Never reconstruct it per request or per route.
Run retain_recent() on a timer to bound memory.
Rate limit early in the layer stack so rejected requests don’t touch auth or the database, but keep tracing/request-ID layers outermost so 429s are still observable.
For multiple replicas, move to a shared limiter (e.g. a Redis-backed token bucket) when the per-process approximation is no longer acceptable — see caching.md for the Redis client patterns this builds on.

Tip: Rate limiting and load shedding are different tools. A limiter rejects too many requests from a client; a tower::limit::ConcurrencyLimitLayer or a timeout rejects too much work in flight on the server. Production services usually want both — a per-client rate limit and a server-wide concurrency cap — composed as separate Tower layers.

Real-World Example

A production API typically wants three things at once: a per-IP limit so no single caller dominates, full rate-limit headers so clients can self-throttle, and a 429 body in the same JSON shape as the rest of the API (not the crate’s default plain-text message). tower-governor supports a custom error handler on the layer for exactly this. This self-contained server uses SmartIpKeyExtractor (assume a trusted proxy), evicts idle buckets, and returns JSON errors.

[dependencies]
axum = "0.8"
tokio = { version = "1", features = ["full"] }
tower_governor = "0.8"

use std::net::SocketAddr;
use std::sync::Arc;
use std::time::Duration;

use axum::body::Body;
use axum::http::{header, Response, StatusCode};
use axum::{routing::get, Router};
use tokio::net::TcpListener;
use tower_governor::governor::GovernorConfigBuilder;
use tower_governor::key_extractor::SmartIpKeyExtractor;
use tower_governor::{GovernorError, GovernorLayer};

async fn hello() -> &'static str {
    "hello"
}

// Turn governor's errors into a JSON body matching the rest of our API.
fn json_error(err: GovernorError) -> Response<Body> {
    let (status, body, retry_after) = match err {
        GovernorError::TooManyRequests { wait_time, .. } => (
            StatusCode::TOO_MANY_REQUESTS,
            format!(r#"{{"error":"rate_limited","retry_after_seconds":{wait_time}}}"#),
            Some(wait_time),
        ),
        GovernorError::UnableToExtractKey => (
            StatusCode::INTERNAL_SERVER_ERROR,
            r#"{"error":"internal"}"#.to_string(),
            None,
        ),
        GovernorError::Other { code, msg, .. } => (
            code,
            format!(r#"{{"error":"{}"}}"#, msg.unwrap_or_default()),
            None,
        ),
    };

    let mut builder = Response::builder()
        .status(status)
        .header(header::CONTENT_TYPE, "application/json");
    if let Some(secs) = retry_after {
        builder = builder.header(header::RETRY_AFTER, secs.to_string());
    }
    builder.body(Body::from(body)).unwrap()
}

#[tokio::main]
async fn main() {
    let conf = Arc::new(
        GovernorConfigBuilder::default()
            .per_second(2)
            .burst_size(5)
            .key_extractor(SmartIpKeyExtractor) // trust the proxy's forwarded IP
            .finish()
            .unwrap(),
    );

    // Reclaim idle per-IP buckets every minute.
    let limiter = conf.limiter().clone();
    tokio::spawn(async move {
        let mut tick = tokio::time::interval(Duration::from_secs(60));
        loop {
            tick.tick().await;
            limiter.retain_recent();
        }
    });

    let app = Router::new()
        .route("/", get(hello))
        .layer(GovernorLayer::new(conf).error_handler(json_error));

    let listener = TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(
        listener,
        app.into_make_service_with_connect_info::<SocketAddr>(),
    )
    .await
    .unwrap();
}

After exhausting the burst for one forwarded IP, a blocked request returns a JSON error with a retry-after header — captured with curl -i -H 'X-Forwarded-For: 203.0.113.9':

HTTP/1.1 429 Too Many Requests
content-type: application/json
retry-after: 1
content-length: 48
date: Tue, 02 Jun 2026 06:50:16 GMT

{"error":"rate_limited","retry_after_seconds":1}

Different forwarded IPs get independent buckets, so one noisy client never starves the rest — the property the whole exercise exists to guarantee.

Tip: To rate-limit only some routes — say, throttle /login hard for brute-force protection while leaving /health untouched — attach the layer to a sub-router or an individual route rather than the whole app. A Router::new().route("/login", get(login).layer(GovernorLayer::new(login_rl))) merged with an unthrottled .route("/health", get(health)) lets /health answer every request while /login enforces its quota. Keep health and readiness probes (see health-checks.md) off the limiter so an outage’s probe traffic is never itself rate limited.

Exercises

Exercise 1: A global throughput cap

Difficulty: Beginner

Objective: Configure a single, app-wide rate limit using GlobalKeyExtractor so the whole service never exceeds a fixed request rate, regardless of who is calling.

Instructions: Using axum = "0.8", tokio, and tower_governor = "0.8", build a router with one GET / route returning "ok". Attach a GovernorLayer configured with GlobalKeyExtractor, a burst of 3, and one token replenished per second. Serve it (a global limiter does not need per-IP connect info, but serving with connect info is harmless). Verify that the 4th rapid request from any client returns 429 while the first three return 200.

Solution

[dependencies]
axum = "0.8"
tokio = { version = "1", features = ["full"] }
tower_governor = "0.8"

use std::net::SocketAddr;
use std::sync::Arc;

use axum::{routing::get, Router};
use tokio::net::TcpListener;
use tower_governor::governor::GovernorConfigBuilder;
use tower_governor::key_extractor::GlobalKeyExtractor;
use tower_governor::GovernorLayer;

async fn ok() -> &'static str {
    "ok"
}

#[tokio::main]
async fn main() {
    let conf = Arc::new(
        GovernorConfigBuilder::default()
            .per_second(1)
            .burst_size(3)
            .key_extractor(GlobalKeyExtractor)
            .finish()
            .unwrap(),
    );

    let app = Router::new()
        .route("/", get(ok))
        .layer(GovernorLayer::new(conf));

    let listener = TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(
        listener,
        app.into_make_service_with_connect_info::<SocketAddr>(),
    )
    .await
    .unwrap();
}

With a burst of 3, the first three rapid requests return 200 and the fourth returns 429, no matter which IP they come from — because GlobalKeyExtractor uses one shared bucket (type Key = ()) for all traffic.

Exercise 2: Per-route policies

Difficulty: Intermediate

Objective: Apply different limits to different routes — a strict cap on a sensitive endpoint and a looser one for read traffic — while leaving a health endpoint unthrottled.

Instructions: Build a router with three routes: GET /login (strict: burst 5, one token/minute), GET /search (loose: burst 30, one token/second), and GET /health (no limit). Attach a separate GovernorLayer to each of the first two routes (build one Arc<GovernorConfig> per policy), and add /health with no layer. Serve with per-IP connect info. Verify that /login returns 429 after its 5th rapid request while /health answers every request.

Solution

[dependencies]
axum = "0.8"
tokio = { version = "1", features = ["full"] }
tower_governor = "0.8"

use std::net::SocketAddr;
use std::sync::Arc;

use axum::{routing::get, Router};
use tokio::net::TcpListener;
use tower_governor::{governor::GovernorConfigBuilder, GovernorLayer};

async fn login() -> &'static str {
    "login"
}
async fn search() -> &'static str {
    "results"
}
async fn health() -> &'static str {
    "ok"
}

#[tokio::main]
async fn main() {
    // Strict: brute-force protection on the login endpoint.
    let login_rl = Arc::new(
        GovernorConfigBuilder::default()
            .per_second(60) // replenish one token every 60 s
            .burst_size(5)
            .finish()
            .unwrap(),
    );
    // Loose: read-heavy search traffic.
    let search_rl = Arc::new(
        GovernorConfigBuilder::default()
            .per_second(1) // replenish one token every second
            .burst_size(30)
            .finish()
            .unwrap(),
    );

    let limited = Router::new()
        .route("/login", get(login).layer(GovernorLayer::new(login_rl)))
        .route("/search", get(search).layer(GovernorLayer::new(search_rl)));

    let app = Router::new().merge(limited).route("/health", get(health));

    let listener = TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(
        listener,
        app.into_make_service_with_connect_info::<SocketAddr>(),
    )
    .await
    .unwrap();
}

Firing seven rapid requests at /login produces 200, 200, 200, 200, 200, 429, 429, while ten rapid requests at /health all return 200 — the limiter only wraps the routes it is attached to.

Exercise 3: A custom per-API-key extractor

Difficulty: Advanced

Objective: Implement a custom KeyExtractor that buckets by the x-api-key header, falling back to the peer IP for anonymous callers — so each API key gets its own fair allowance.

Instructions: Implement KeyExtractor for a unit struct ApiKeyExtractor with type Key = String. In extract, return format!("key:{value}") when an x-api-key header is present; otherwise read the peer IP from axum::extract::ConnectInfo<SocketAddr> in the request extensions and return format!("ip:{ip}"), or GovernorError::UnableToExtractKey if neither is available. Wire it into a GovernorConfigBuilder (burst 3, one token/2s) on a GET / route and verify that two different API keys get independent buckets.

Solution

[dependencies]
axum = "0.8"
tokio = { version = "1", features = ["full"] }
tower_governor = "0.8"

use std::net::SocketAddr;
use std::sync::Arc;

use axum::http::Request;
use axum::{routing::get, Router};
use tokio::net::TcpListener;
use tower_governor::governor::GovernorConfigBuilder;
use tower_governor::key_extractor::KeyExtractor;
use tower_governor::{GovernorError, GovernorLayer};

async fn hello() -> &'static str {
    "hello"
}

#[derive(Clone)]
struct ApiKeyExtractor;

impl KeyExtractor for ApiKeyExtractor {
    type Key = String;

    fn extract<T>(&self, req: &Request<T>) -> Result<Self::Key, GovernorError> {
        // Prefer the API key when present...
        if let Some(key) = req
            .headers()
            .get("x-api-key")
            .and_then(|v| v.to_str().ok())
        {
            return Ok(format!("key:{key}"));
        }
        // ...otherwise fall back to the peer IP.
        req.extensions()
            .get::<axum::extract::ConnectInfo<SocketAddr>>()
            .map(|ci| format!("ip:{}", ci.0.ip()))
            .ok_or(GovernorError::UnableToExtractKey)
    }
}

#[tokio::main]
async fn main() {
    let conf = Arc::new(
        GovernorConfigBuilder::default()
            .per_second(2)
            .burst_size(3)
            .key_extractor(ApiKeyExtractor)
            .finish()
            .unwrap(),
    );

    let app = Router::new()
        .route("/", get(hello))
        .layer(GovernorLayer::new(conf));

    let listener = TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(
        listener,
        app.into_make_service_with_connect_info::<SocketAddr>(),
    )
    .await
    .unwrap();
}

Sending four rapid requests with x-api-key: AAA yields 200, 200, 200, 429, while a request with x-api-key: BBB still returns 200 — each key has its own bucket because the extracted Key strings differ. The peer-IP fallback means anonymous callers are still limited, just grouped by source address instead of key.

Note: The KeyExtractor trait also defines name and key_name methods, but those are gated behind the crate’s tracing feature; with the default features the two members shown here are all you need to implement.
