Health and Readiness Endpoints

A health check is the contract between your service and whatever is operating it — Kubernetes, a load balancer, an autoscaler, or a paging system. Get it wrong and a perfectly healthy service gets restarted in a loop, or a broken one keeps receiving traffic. This chapter shows how to expose liveness and readiness endpoints in Rust with axum, and how to check downstream dependencies safely.

Quick Overview

Production orchestrators poll two distinct kinds of probe, and conflating them is the single most common health-check bug:

Liveness answers “is this process broken beyond recovery?” — if it fails, the orchestrator restarts the container. It must be cheap and must never depend on the database or other services.
Readiness answers “should this instance receive traffic right now?” — if it fails, the orchestrator stops routing requests to this instance but leaves it running. This is where you check dependencies (database, cache, downstream APIs).

In Node you usually bolt these onto an Express router. In Rust the shape is the same, but the type system makes the response codes explicit, native async lets you check dependencies with hard timeouts, and tokio::join! lets you probe several dependencies concurrently.

TypeScript/JavaScript Example

A typical Express service exposes both probes. Note how readiness pings the database while liveness deliberately does not:

// health.ts (Express on Node v22)
import express, { Request, Response } from "express";
import { Pool } from "pg";
import { createClient } from "redis";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });
const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

// `ready` flips to true once startup work (migrations, warm pools) is done.
let ready = false;

export const health = express.Router();

// Liveness: cheap, no dependencies. If this fails, restart me.
health.get("/health/live", (_req: Request, res: Response) => {
  res.status(200).json({ status: "ok" });
});

// Readiness: check dependencies. If this fails, stop sending me traffic.
health.get("/health/ready", async (_req: Request, res: Response) => {
  if (!ready) {
    return res.status(503).json({ status: "starting", checks: [] });
  }

  const checks: { name: string; healthy: boolean; detail?: string }[] = [];

  try {
    await Promise.race([
      pool.query("SELECT 1"),
      timeout(2000), // a hung DB must not hang the probe
    ]);
    checks.push({ name: "database", healthy: true });
  } catch (err) {
    checks.push({ name: "database", healthy: false, detail: String(err) });
  }

  try {
    await Promise.race([redis.ping(), timeout(2000)]);
    checks.push({ name: "cache", healthy: true });
  } catch (err) {
    checks.push({ name: "cache", healthy: false, detail: String(err) });
  }

  const allOk = checks.every((c) => c.healthy);
  res
    .status(allOk ? 200 : 503)
    .json({ status: allOk ? "ready" : "degraded", checks });
});

function timeout(ms: number): Promise<never> {
  return new Promise((_, reject) =>
    setTimeout(() => reject(new Error("timed out")), ms),
  );
}

Key points:

Liveness is a constant 200; readiness returns 503 when a dependency is down so the load balancer drains the instance.
Promise.race against a timeout protects the probe from a hung dependency. Without it, a stuck pool.query would hang the endpoint until the orchestrator's own probe timeout fires, and the failure would not say which dependency stalled.
A ready flag gates traffic until startup finishes.

Rust Equivalent

The same service in axum. The dependency clients (Db, Cache) stand in for sqlx::PgPool and a Redis client; the structure is identical to what you would write with the real crates from Section 17: Database.

use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Duration;

use axum::{
    Json, Router,
    extract::State,
    http::StatusCode,
    response::IntoResponse,
    routing::get,
};
use serde::Serialize;
use serde_json::json;

// Stand-ins for real clients (`sqlx::PgPool`, a Redis client, ...).
#[derive(Clone)]
struct Db;
impl Db {
    async fn ping(&self) -> Result<(), String> {
        tokio::time::sleep(Duration::from_millis(3)).await;
        Ok(()) // imagine: sqlx::query("SELECT 1").execute(pool).await
    }
}

#[derive(Clone)]
struct Cache;
impl Cache {
    async fn ping(&self) -> Result<(), String> {
        tokio::time::sleep(Duration::from_millis(2)).await;
        Ok(()) // imagine a Redis PING
    }
}

#[derive(Clone)]
struct AppState {
    db: Db,
    cache: Cache,
    // Flipped to `true` once startup work finishes (see Detailed Explanation).
    ready: Arc<AtomicBool>,
}

#[derive(Serialize)]
struct CheckResult {
    name: &'static str,
    healthy: bool,
    #[serde(skip_serializing_if = "Option::is_none")]
    detail: Option<String>,
}

fn to_result(name: &'static str, r: Result<(), String>) -> CheckResult {
    match r {
        Ok(()) => CheckResult { name, healthy: true, detail: None },
        Err(e) => CheckResult { name, healthy: false, detail: Some(e) },
    }
}

// Liveness: cheap, no dependencies. If this fails the orchestrator restarts us.
async fn liveness() -> impl IntoResponse {
    (StatusCode::OK, Json(json!({ "status": "ok" })))
}

// Readiness: checks every dependency CONCURRENTLY, each bounded by a deadline.
async fn readiness(State(state): State<AppState>) -> impl IntoResponse {
    if !state.ready.load(Ordering::Relaxed) {
        return (
            StatusCode::SERVICE_UNAVAILABLE,
            Json(json!({ "status": "starting", "checks": [] })),
        );
    }

    let deadline = Duration::from_secs(2);
    let db_fut = tokio::time::timeout(deadline, state.db.ping());
    let cache_fut = tokio::time::timeout(deadline, state.cache.ping());

    // Run both at once: total latency is max(db, cache), not the sum.
    let (db_res, cache_res) = tokio::join!(db_fut, cache_fut);

    // A timeout (the outer Err) and a failed ping (the inner Err) both mean
    // "unhealthy"; collapse them into one Result<(), String>.
    let flatten = |r: Result<Result<(), String>, tokio::time::error::Elapsed>| match r {
        Ok(inner) => inner,
        Err(_) => Err("timed out".to_string()),
    };

    let checks = vec![
        to_result("database", flatten(db_res)),
        to_result("cache", flatten(cache_res)),
    ];
    let all_ok = checks.iter().all(|c| c.healthy);

    let code = if all_ok { StatusCode::OK } else { StatusCode::SERVICE_UNAVAILABLE };
    let body = json!({
        "status": if all_ok { "ready" } else { "degraded" },
        "checks": checks,
    });
    (code, Json(body))
}

fn app(state: AppState) -> Router {
    Router::new()
        .route("/health/live", get(liveness))
        .route("/health/ready", get(readiness))
        .with_state(state)
}

#[tokio::main]
async fn main() {
    let state = AppState {
        db: Db,
        cache: Cache,
        ready: Arc::new(AtomicBool::new(true)),
    };
    let listener = tokio::net::TcpListener::bind("127.0.0.1:8772").await.unwrap();
    axum::serve(listener, app(state)).await.unwrap();
}

The dependencies for this example:

[dependencies]
axum = "0.8"
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"

Note: The current stable toolchain is Rust 1.96.0 on the 2024 edition. cargo new selects it automatically, and cargo add axum tokio serde serde_json resolves the versions above.

Hitting both endpoints against the running server returns compact JSON (use -w '\n[HTTP %{http_code}]\n' to also print the status code):

$ curl -s http://127.0.0.1:8772/health/live
{"status":"ok"}

$ curl -s -w '\n[HTTP %{http_code}]\n' http://127.0.0.1:8772/health/ready
{"checks":[{"healthy":true,"name":"database"},{"healthy":true,"name":"cache"}],"status":"ready"}
[HTTP 200]

Pipe to jq if you want it pretty-printed: curl -s http://127.0.0.1:8772/health/ready | jq.

When a dependency is down, readiness reports the specific failure and returns 503 so the load balancer drains the instance. For example, if the cache ping returned Err("connection refused") the response body would be (again, compact):

$ curl -s -w '\n[HTTP %{http_code}]\n' http://127.0.0.1:8772/health/ready
{"checks":[{"healthy":true,"name":"database"},{"detail":"connection refused","healthy":false,"name":"cache"}],"status":"degraded"}
[HTTP 503]

Detailed Explanation

Why two endpoints, not one

A single /health endpoint cannot serve both purposes, and using one for both causes outages:

Probe	Question it answers	On failure	May touch dependencies?
Liveness	Is the process wedged / deadlocked?	Restart the container	No — never
Readiness	Should this instance get traffic now?	Remove from the LB pool	Yes — that is the point

The trap: if your liveness probe pings the database, then a brief database outage makes liveness fail, the orchestrator restarts every instance, and now you have a database outage and a thundering herd of cold-starting processes hammering the database as it recovers. Liveness must depend only on the process itself.
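The table and the trap can be restated as a small decision function. This is only an illustrative model, not any orchestrator's API: the names Probe, Action, on_probe_result, and restarts_during_db_outage are hypothetical, and the fleet simulation assumes the database is down for every instance at once.

```rust
/// The two probe kinds an orchestrator polls (hypothetical model type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Probe {
    Liveness,
    Readiness,
}

/// What the orchestrator does in response to one probe result.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    KeepServing,
    RestartContainer,
    RemoveFromPool,
}

/// Encodes the table above: a failed liveness probe restarts the container,
/// a failed readiness probe only removes the instance from the pool.
fn on_probe_result(probe: Probe, passed: bool) -> Action {
    match (probe, passed) {
        (_, true) => Action::KeepServing,
        (Probe::Liveness, false) => Action::RestartContainer,
        (Probe::Readiness, false) => Action::RemoveFromPool,
    }
}

/// Counts how many instances get restarted while the database is down.
/// If liveness touches the database, it fails on every instance at once.
fn restarts_during_db_outage(instances: usize, liveness_touches_db: bool) -> usize {
    let liveness_passes = !liveness_touches_db; // the database is down
    (0..instances)
        .filter(|_| on_probe_result(Probe::Liveness, liveness_passes) == Action::RestartContainer)
        .count()
}
```

With five instances, a dependency-free liveness probe restarts none of them during the outage, while a liveness probe that pings the database restarts all five.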

`impl IntoResponse` and tuple responses

Both handlers return impl IntoResponse. axum implements IntoResponse for many shapes, including:

StatusCode → an empty body with that status.
Json<T> → a 200 with a JSON body.
(StatusCode, Json<T>) → that status with a JSON body.

The readiness handler returns (StatusCode, Json<Value>) from both branches. That uniformity matters: every return path and the tail expression must produce the same type, because impl IntoResponse resolves to one concrete type. (Mixing a bare StatusCode with a tuple is a compile error — see Common Pitfalls.)

Concurrent checks with `tokio::join!`

The Node version awaits the database, then awaits Redis, so the probe’s latency is the sum. tokio::join! polls both futures on the same task concurrently, so latency is the max:

let (db_res, cache_res) = tokio::join!(db_fut, cache_fut);

Unlike Promise.all, tokio::join! does not short-circuit on the first failure — it waits for every future and gives you all results, which is exactly what a health report wants: you want to know every unhealthy dependency, not just the first one.
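The "run everything, wait for everything, keep every result" shape can also be tried without any crates. The sketch below is an analogy rather than the chapter's implementation: it uses std::thread::scope in place of tokio::join!, and slow_check, check_all_concurrently, and the sleep durations are made up for illustration.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a dependency ping: sleeps, then succeeds or fails.
fn slow_check(name: &'static str, delay: Duration, ok: bool) -> (&'static str, Result<(), String>) {
    thread::sleep(delay);
    let result = if ok { Ok(()) } else { Err(format!("{name} unavailable")) };
    (name, result)
}

/// Runs both checks at the same time and returns every result plus the
/// wall-clock time taken. A failing check does not stop the other one.
fn check_all_concurrently() -> (Vec<(&'static str, Result<(), String>)>, Duration) {
    let started = Instant::now();
    let results = thread::scope(|s| {
        let db = s.spawn(|| slow_check("database", Duration::from_millis(40), false));
        let cache = s.spawn(|| slow_check("cache", Duration::from_millis(20), true));
        // Joining both handles waits for all checks, like `tokio::join!`.
        vec![db.join().unwrap(), cache.join().unwrap()]
    });
    (results, started.elapsed())
}
```

Both entries come back even though the database check fails, and the elapsed time tracks the slower check (about 40 ms here) rather than the 60 ms sum. In the async version no extra threads are involved; tokio::join! interleaves the futures on one task.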

Bounding every check with a timeout

tokio::time::timeout(deadline, fut) wraps a future and returns Err(Elapsed) if it does not finish in time. This is the Rust equivalent of Promise.race([work, timeout(2000)]), but when the deadline fires it drops the inner future, which cancels it at the await point where it was suspended (Rust futures are lazy and droppable), rather than leaving an orphaned operation running. A health probe with no timeout is a latent outage: a single hung connection turns into a hung endpoint, the orchestrator's probe times out, and an otherwise fine instance is pulled from rotation (or restarted, if liveness shares the check).
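To experiment with the deadline idea outside an async runtime, here is a minimal std-only sketch. It is an assumption-laden stand-in, not tokio's mechanism: check_with_deadline is a hypothetical helper that runs the check on a worker thread and waits with mpsc::Receiver::recv_timeout.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `check` on a worker thread and gives up after `deadline`.
/// A timeout and a failed check both come back as `Err`, as in the chapter.
fn check_with_deadline<F>(deadline: Duration, check: F) -> Result<(), String>
where
    F: FnOnce() -> Result<(), String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the caller already gave up, the send fails; that is fine.
        let _ = tx.send(check());
    });
    match rx.recv_timeout(deadline) {
        Ok(result) => result,
        Err(mpsc::RecvTimeoutError::Timeout) => Err("timed out".to_string()),
        // The sender was dropped without sending: the check panicked.
        Err(mpsc::RecvTimeoutError::Disconnected) => Err("check panicked".to_string()),
    }
}
```

Note the difference from tokio::time::timeout: the worker thread here keeps running after the deadline, much like the losing promise in Promise.race. Dropping a future is what lets the async version actually stop the work.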

The startup gate

ready: Arc<AtomicBool> mirrors the let ready = false flag in the Node example. Until startup work (warming pools, running migrations, priming caches) completes, readiness returns 503 "starting" so traffic is held back. An AtomicBool is the right tool here: it is shared across handler tasks (Arc), needs no lock for a single boolean, and Ordering::Relaxed is sufficient as long as handlers read only the flag itself and do not rely on it to publish other data written during startup. A realistic startup sequence flips it from a spawned task:

let ready_flag = state.ready.clone();
tokio::spawn(async move {
    run_migrations().await;          // imagine real startup work
    warm_connection_pools().await;
    ready_flag.store(true, Ordering::Relaxed);
});

This same flag is what your graceful shutdown handler flips back to false when a SIGTERM arrives, so the load balancer drains the instance before you stop accepting connections.
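The full life cycle of the flag (starting, ready, draining) fits in a few lines. The wrapper below is a hypothetical convenience type, not something axum or tokio provides; ReadyGate, mark_ready, begin_shutdown, and readiness_status are illustrative names, and the status is returned as a bare u16 to keep the sketch free of dependencies.

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};

/// Cloneable handle around the shared readiness flag.
#[derive(Clone)]
struct ReadyGate(Arc<AtomicBool>);

impl ReadyGate {
    /// Starts closed: the service is not ready until startup work finishes.
    fn new() -> Self {
        Self(Arc::new(AtomicBool::new(false)))
    }

    /// Called by the startup task once migrations and warm-up are done.
    fn mark_ready(&self) {
        self.0.store(true, Ordering::Relaxed);
    }

    /// Called by the shutdown handler on SIGTERM so the instance drains.
    fn begin_shutdown(&self) {
        self.0.store(false, Ordering::Relaxed);
    }

    /// The HTTP status the readiness gate would report right now.
    fn readiness_status(&self) -> u16 {
        if self.0.load(Ordering::Relaxed) { 200 } else { 503 }
    }
}
```

Because every clone shares the same Arc, the startup task, the shutdown handler, and the request handlers all observe the same flag.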

Key Differences

Concern	TypeScript / Express (Node v22)	Rust / axum
Status code	`res.status(503)` — runtime string/number	`StatusCode::SERVICE_UNAVAILABLE` — a checked constant
Response shape	Any object; mismatches surface at runtime	Every branch must return the same `IntoResponse` type
Timeout	`Promise.race`; loser keeps running	`tokio::time::timeout`; the inner future is cancelled
Concurrent checks	Sequential `await`s = sum of latencies (the closest concurrent analogue, `Promise.all`, short-circuits on first reject)	`tokio::join!` runs both concurrently and waits for all — a full report
Startup flag	`let ready` captured in a closure	`Arc<AtomicBool>` shared across tasks
Missing `await`	Probe silently “passes” on a pending Promise	Won’t compile — `Future` has no `.is_ok()`

The throughline: in Node a sloppy health check is a silent liability. A forgotten await makes the probe pass unconditionally, and a wrong status code is just a typo. In Rust the compiler rejects the forgotten await and forces every response branch into a consistent, typed shape. The runtime cost of a check also tends to be lower, and there is no garbage collector whose pauses can skew probe latency.

Common Pitfalls

Pitfall 1: Forgetting `.await` on a dependency check

In JavaScript, calling an async function without await yields a pending Promise, which is truthy — so a health check like if (db.ping()) ... “passes” forever. Rust catches this at compile time. This program:

// does not compile (error[E0599])
use std::time::Duration;

async fn db_ping() -> Result<(), String> {
    tokio::time::sleep(Duration::from_millis(1)).await;
    Ok(())
}

#[tokio::main]
async fn main() {
    // Forgot `.await`: `db_ping()` is a Future, not a Result.
    let healthy = db_ping().is_ok();
    println!("{healthy}");
}

produces the real error:

error[E0599]: no method named `is_ok` found for opaque type `impl Future<Output = Result<(), String>>` in the current scope
  --> src/main.rs:11:29
   |
11 |     let healthy = db_ping().is_ok();
   |                             ^^^^^ method not found in `impl Future<Output = Result<(), String>>`
   |
help: consider `await`ing on the `Future` and calling the method on its `Output`
   |
11 |     let healthy = db_ping().await.is_ok();
   |                             ++++++

The fix is exactly what the compiler suggests: db_ping().await.is_ok().

Pitfall 2: Inconsistent response types across branches

Returning a bare StatusCode from one branch and a (StatusCode, Json<_>) tuple from another does not compile, because impl IntoResponse must resolve to a single concrete type:

// does not compile (error[E0308])
use axum::{Json, http::StatusCode, response::IntoResponse};
use serde_json::json;

async fn readiness(ready: bool) -> impl IntoResponse {
    if !ready {
        // This branch returns a bare StatusCode...
        return StatusCode::SERVICE_UNAVAILABLE;
    }
    // ...but this one returns a (StatusCode, Json) tuple: different types.
    (StatusCode::OK, Json(json!({ "status": "ready" })))
}

fn main() {
    let _ = readiness(true);
}

The real message points right at the mismatch:

error[E0308]: mismatched types
  --> src/main.rs:10:5
   |
 4 | async fn readiness(ready: bool) -> impl IntoResponse {
   |                                    ----------------- expected `StatusCode` because of return type
...
10 |     (StatusCode::OK, Json(json!({ "status": "ready" })))
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `StatusCode`, found `(StatusCode, Json<Value>)`
   |
   = note: expected struct `StatusCode`
               found tuple `(StatusCode, Json<Value>)`

The fix: make the early return a tuple too, as in return (StatusCode::SERVICE_UNAVAILABLE, Json(...));. impl IntoResponse still stands for one hidden concrete type, so every branch must agree on it. When branches truly differ, the escape hatch is to call .into_response() on each one and return axum's Response explicitly; Response is a single concrete type, so the mismatch disappears.
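The rule is a property of impl Trait in general, not of axum, so it can be reproduced with the standard library alone. In this analogy (an assumption for illustration, not axum code) std::fmt::Display plays the role of IntoResponse and Box<dyn Display> plays the role of Response; the function names uniform and erased are made up.

```rust
use std::fmt::Display;

/// Compiles: the early return and the tail expression are both `String`,
/// so `impl Display` resolves to one concrete type.
fn uniform(ready: bool) -> impl Display {
    if !ready {
        return String::from("503 starting");
    }
    String::from("200 ready")
}

/// Branches with different concrete types (`u16` and `&str`) need the type
/// erased behind a trait object, much like converting to a single `Response`.
fn erased(ready: bool) -> Box<dyn Display> {
    if !ready {
        return Box::new(503u16);
    }
    Box::new("ready")
}
```

Changing the early return in uniform to a bare 503u16 reproduces the same E0308 mismatch shown above, because the two branches would no longer share a type.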

Pitfall 3: Liveness that touches a dependency

The most damaging pitfall compiles fine and passes review — it is a design mistake. If /health/live runs SELECT 1, a transient database blip makes liveness fail, and the orchestrator restarts every pod simultaneously, turning a recoverable dependency outage into a full outage with a cold-start stampede. Keep liveness dependency-free; put dependency checks only in readiness.

Pitfall 4: No timeout on a check

state.db.ping().await without a timeout wrapper means a single hung connection hangs the probe. The orchestrator's probe timeout then fires: a readiness probe marks the instance unready without telling you which dependency stalled, and a liveness probe that was wrongly wired to the same check gets the process killed. Always wrap dependency calls in tokio::time::timeout with a deadline shorter than the orchestrator's probe timeout.
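One way to keep the two timeouts consistent is to derive the per-check deadline from the probe timeout instead of hard-coding both. The helper below is a hypothetical sketch; check_deadline and the idea of a fixed safety margin are assumptions, and the right margin depends on your network and handler overhead.

```rust
use std::time::Duration;

/// Returns the deadline to give each dependency check, leaving `safety_margin`
/// for the handler to build and send its response before the orchestrator's
/// `probe_timeout` expires. Returns `None` when no usable budget remains.
fn check_deadline(probe_timeout: Duration, safety_margin: Duration) -> Option<Duration> {
    probe_timeout
        .checked_sub(safety_margin)
        .filter(|budget| !budget.is_zero())
}
```

For a 3 s probe timeout and a 1 s margin this yields a 2 s check deadline; a None result is a signal that the probe configuration is too tight for any dependency check to fit.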

Best Practices

Separate the routes. Expose /health/live and /health/ready (or /healthz and /readyz if you follow Kubernetes convention). Never reuse one path for both.
Keep liveness trivial. Return 200 unconditionally, or at most check an in-process invariant (e.g., a critical background task has not panicked). No I/O.
Probe dependencies concurrently and with timeouts. Use tokio::join! plus tokio::time::timeout so one slow dependency cannot dominate or hang the probe.
Report per-dependency detail. A 503 body listing which dependency failed turns a page into a diagnosis. Skip the detail field on healthy checks (skip_serializing_if).
Use a cheap query. SELECT 1 for SQL, PING for Redis. Do not run an expensive query in a probe that the load balancer hits every few seconds.
Cache readiness for a short TTL when probe frequency is high, so a burst of probes does not become a burst of database round-trips (see Exercise 3).
Wire readiness into shutdown. Flip the ready flag to false the moment you receive SIGTERM, then sleep briefly before closing the listener, so the load balancer notices and drains you. See graceful shutdown.
Do not authenticate the liveness probe. The orchestrator that calls it usually cannot present credentials; keep these endpoints unauthenticated but bound to the internal interface, or behind the orchestrator’s network policy.
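For the "in-process invariant" mentioned under keeping liveness trivial, one possible shape is a heartbeat: a critical background task records when it last made progress, and liveness only compares that timestamp with the clock. This is a sketch under assumptions; Heartbeat, beat, and is_alive are hypothetical names, and the acceptable silence window is something you would tune.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Tracks when a background task last reported progress. No I/O involved.
struct Heartbeat {
    started: Instant,
    /// Milliseconds since `started` at the time of the last beat.
    last_beat_ms: AtomicU64,
}

impl Heartbeat {
    fn new() -> Self {
        Self { started: Instant::now(), last_beat_ms: AtomicU64::new(0) }
    }

    /// Called by the background task each time it completes a unit of work.
    fn beat(&self) {
        let now_ms = self.started.elapsed().as_millis() as u64;
        self.last_beat_ms.store(now_ms, Ordering::Relaxed);
    }

    /// Liveness passes only if the task has beaten within `max_silence`.
    fn is_alive(&self, max_silence: Duration) -> bool {
        let now_ms = self.started.elapsed().as_millis() as u64;
        let last_ms = self.last_beat_ms.load(Ordering::Relaxed);
        Duration::from_millis(now_ms.saturating_sub(last_ms)) <= max_silence
    }
}
```

A liveness handler would share the Heartbeat through an Arc and return 200 or 503 based on is_alive. The check stays cheap and dependency-free, because it reads one atomic and the monotonic clock.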

Real-World Example

A production-shaped service: it starts up asynchronously (so readiness returns 503 until ready), then serves liveness and readiness. This is the complete, runnable program behind the output shown earlier — copy it into src/main.rs of a project with the dependencies listed above.

use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Duration;

use axum::{
    Json, Router,
    extract::State,
    http::StatusCode,
    response::IntoResponse,
    routing::get,
};
use serde::Serialize;
use serde_json::json;

#[derive(Clone)]
struct Db;
impl Db {
    async fn ping(&self) -> Result<(), String> {
        tokio::time::sleep(Duration::from_millis(3)).await;
        Ok(())
    }
}

#[derive(Clone)]
struct Cache;
impl Cache {
    async fn ping(&self) -> Result<(), String> {
        tokio::time::sleep(Duration::from_millis(2)).await;
        Ok(())
    }
}

#[derive(Clone)]
struct AppState {
    db: Db,
    cache: Cache,
    ready: Arc<AtomicBool>,
}

#[derive(Serialize)]
struct CheckResult {
    name: &'static str,
    healthy: bool,
    #[serde(skip_serializing_if = "Option::is_none")]
    detail: Option<String>,
}

fn to_result(name: &'static str, r: Result<(), String>) -> CheckResult {
    match r {
        Ok(()) => CheckResult { name, healthy: true, detail: None },
        Err(e) => CheckResult { name, healthy: false, detail: Some(e) },
    }
}

async fn liveness() -> impl IntoResponse {
    (StatusCode::OK, Json(json!({ "status": "ok" })))
}

async fn readiness(State(state): State<AppState>) -> impl IntoResponse {
    if !state.ready.load(Ordering::Relaxed) {
        return (
            StatusCode::SERVICE_UNAVAILABLE,
            Json(json!({ "status": "starting", "checks": [] })),
        );
    }

    let deadline = Duration::from_secs(2);
    let (db_res, cache_res) = tokio::join!(
        tokio::time::timeout(deadline, state.db.ping()),
        tokio::time::timeout(deadline, state.cache.ping()),
    );

    let flatten = |r: Result<Result<(), String>, tokio::time::error::Elapsed>| match r {
        Ok(inner) => inner,
        Err(_) => Err("timed out".to_string()),
    };

    let checks = vec![
        to_result("database", flatten(db_res)),
        to_result("cache", flatten(cache_res)),
    ];
    let all_ok = checks.iter().all(|c| c.healthy);

    let code = if all_ok { StatusCode::OK } else { StatusCode::SERVICE_UNAVAILABLE };
    (
        code,
        Json(json!({
            "status": if all_ok { "ready" } else { "degraded" },
            "checks": checks,
        })),
    )
}

fn app(state: AppState) -> Router {
    Router::new()
        .route("/health/live", get(liveness))
        .route("/health/ready", get(readiness))
        .with_state(state)
}

#[tokio::main]
async fn main() {
    let state = AppState {
        db: Db,
        cache: Cache,
        ready: Arc::new(AtomicBool::new(false)),
    };

    // Simulate async startup work: readiness stays 503 until this finishes.
    let ready_flag = state.ready.clone();
    tokio::spawn(async move {
        tokio::time::sleep(Duration::from_millis(50)).await; // migrations, warm pools
        ready_flag.store(true, Ordering::Relaxed);
        println!("startup complete: now ready");
    });

    let listener = tokio::net::TcpListener::bind("127.0.0.1:8772").await.unwrap();
    println!("listening on {}", listener.local_addr().unwrap());
    axum::serve(listener, app(state)).await.unwrap();
}

This is the program whose verified output appears in the Rust Equivalent section: /health/live returns 200 {"status":"ok"}, and /health/ready returns 200 with a "ready" status once the 50 ms startup task has flipped the flag (and 503 "starting" before that). In a real service, replace Db/Cache with your sqlx::PgPool and Redis client and .ping() with SELECT 1 / PING.

Exercises

Exercise 1: Add a startup probe

Difficulty: Beginner

Objective: Distinguish “still starting up” from “running but a dependency is down” so an orchestrator with a long startup grace period treats them differently.

Instructions: Add a third endpoint, /health/startup, that returns 200 only once state.ready is true, and 503 otherwise. (Kubernetes uses a startup probe to give slow-booting apps extra time before the liveness probe takes over.) Reuse the AppState from the chapter.

Solution

use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};

use axum::{
    Json, Router,
    extract::State,
    http::StatusCode,
    response::IntoResponse,
    routing::get,
};
use serde_json::json;

#[derive(Clone)]
struct AppState {
    ready: Arc<AtomicBool>,
}

async fn startup(State(state): State<AppState>) -> impl IntoResponse {
    if state.ready.load(Ordering::Relaxed) {
        (StatusCode::OK, Json(json!({ "status": "started" })))
    } else {
        (
            StatusCode::SERVICE_UNAVAILABLE,
            Json(json!({ "status": "starting" })),
        )
    }
}

fn app(state: AppState) -> Router {
    Router::new()
        .route("/health/startup", get(startup))
        .with_state(state)
}

#[tokio::main]
async fn main() {
    let state = AppState { ready: Arc::new(AtomicBool::new(false)) };

    let ready_flag = state.ready.clone();
    tokio::spawn(async move {
        tokio::time::sleep(std::time::Duration::from_millis(50)).await;
        ready_flag.store(true, Ordering::Relaxed);
    });

    let listener = tokio::net::TcpListener::bind("127.0.0.1:0").await.unwrap();
    axum::serve(listener, app(state)).await.unwrap();
}

Both arms return (StatusCode, Json<Value>), so impl IntoResponse resolves to one type — the same discipline as the readiness handler.

Exercise 2: A trait-based check registry

Difficulty: Intermediate

Objective: Replace hand-written per-dependency code with a reusable HealthCheck abstraction so adding a dependency is one line.

Instructions: Define a HealthCheck trait with a name(&self) -> &'static str and an async check(&self) -> Result<(), String>. Implement it for a DbCheck type. Write a generic probe function that wraps any HealthCheck in a timeout and returns (bool, Option<String>). (Native async fn in traits works directly for generic dispatch; you only need boxing or a helper crate for a heterogeneous Vec<Box<dyn HealthCheck>>.)

Solution

use std::time::Duration;

// Native `async fn` in traits is stable. It works directly for STATIC dispatch
// (generics). For a heterogeneous `Vec<Box<dyn HealthCheck>>` you would box the
// returned futures yourself or pull in the `async-trait` crate.
trait HealthCheck {
    fn name(&self) -> &'static str;
    async fn check(&self) -> Result<(), String>;
}

struct DbCheck {
    healthy: bool,
}

impl HealthCheck for DbCheck {
    fn name(&self) -> &'static str {
        "database"
    }
    async fn check(&self) -> Result<(), String> {
        tokio::time::sleep(Duration::from_millis(5)).await; // imagine SELECT 1
        if self.healthy {
            Ok(())
        } else {
            Err("connection refused".into())
        }
    }
}

async fn probe<C: HealthCheck>(c: &C, deadline: Duration) -> (bool, Option<String>) {
    match tokio::time::timeout(deadline, c.check()).await {
        Ok(Ok(())) => (true, None),
        Ok(Err(e)) => (false, Some(e)),
        Err(_) => (false, Some("timed out".into())),
    }
}

#[tokio::main]
async fn main() {
    let db = DbCheck { healthy: true };
    let (ok, detail) = probe(&db, Duration::from_secs(2)).await;
    println!("{} healthy={ok} detail={detail:?}", db.name());
}

Running it prints database healthy=true detail=None. Flip healthy to false and you get database healthy=false detail=Some("connection refused").
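For the heterogeneous case, boxing the returned future by hand makes the trait usable behind dyn. The sketch below is one possible shape, not the exercise's required solution: DynHealthCheck, BoxFuture, AlwaysOk, AlwaysDown, and run_all are hypothetical names, the checks run one after another for brevity, and the tiny block_on executor exists only so the example runs without tokio (in the service, the tokio runtime would drive these futures).

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Dyn-compatible variant: `check` returns a boxed future instead of being `async fn`.
trait DynHealthCheck: Send + Sync {
    fn name(&self) -> &'static str;
    fn check(&self) -> BoxFuture<'_, Result<(), String>>;
}

struct AlwaysOk;
impl DynHealthCheck for AlwaysOk {
    fn name(&self) -> &'static str { "database" }
    fn check(&self) -> BoxFuture<'_, Result<(), String>> {
        Box::pin(async { Ok::<(), String>(()) })
    }
}

struct AlwaysDown;
impl DynHealthCheck for AlwaysDown {
    fn name(&self) -> &'static str { "cache" }
    fn check(&self) -> BoxFuture<'_, Result<(), String>> {
        Box::pin(async { Err::<(), String>("connection refused".to_string()) })
    }
}

/// Runs every registered check in order and keeps each result.
async fn run_all(checks: &[Box<dyn DynHealthCheck>]) -> Vec<(&'static str, Result<(), String>)> {
    let mut results = Vec::new();
    for check in checks {
        results.push((check.name(), check.check().await));
    }
    results
}

/// Demo-only executor: polls the future on the current thread, parking between polls.
struct ThreadWaker(std::thread::Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) { self.0.unpark(); }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = std::pin::pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(std::thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => std::thread::park(),
        }
    }
}
```

Registering a new dependency is then a single Box::new(...) pushed into the Vec. The cost is one heap allocation per check call, which is negligible at probe frequency.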

Exercise 3: Cache the readiness result

Difficulty: Advanced

Objective: Stop a high-frequency probe (or a noisy load balancer) from turning into a flood of database round-trips, while keeping readiness reasonably fresh.

Instructions: Build a CachedReadiness type that stores the last result with an Instant timestamp behind a tokio::sync::Mutex. Its is_ready method takes a closure producing the fresh check; if the cached value is younger than a TTL, return it without running the closure, otherwise run the closure and update the cache. Prove that three rapid calls within the TTL only probe the dependency once.

Solution

use std::sync::Arc;
use std::time::{Duration, Instant};
use tokio::sync::Mutex;

#[derive(Clone)]
struct CachedReadiness {
    ttl: Duration,
    inner: Arc<Mutex<Option<(Instant, bool)>>>,
}

impl CachedReadiness {
    fn new(ttl: Duration) -> Self {
        Self { ttl, inner: Arc::new(Mutex::new(None)) }
    }

    /// Returns a cached value if fresh; otherwise runs `check` and caches it.
    async fn is_ready<F, Fut>(&self, check: F) -> bool
    where
        F: FnOnce() -> Fut,
        Fut: std::future::Future<Output = bool>,
    {
        let mut guard = self.inner.lock().await;
        if let Some((at, value)) = *guard {
            if at.elapsed() < self.ttl {
                return value;
            }
        }
        let fresh = check().await;
        *guard = Some((Instant::now(), fresh));
        fresh
    }
}

#[tokio::main]
async fn main() {
    let cache = CachedReadiness::new(Duration::from_secs(5));
    let mut calls = 0u32;

    for _ in 0..3 {
        let ready = cache
            .is_ready(|| {
                calls += 1; // counts real dependency probes
                async { true }
            })
            .await;
        println!("ready={ready}");
    }
    println!("dependency was actually probed {calls} time(s)");
}

Output:

ready=true
ready=true
ready=true
dependency was actually probed 1 time(s)

Holding the Mutex across the check().await also collapses a concurrent burst into a single probe (later callers wait for the in-flight one and then see the fresh cached value). If you would rather not serialize callers during the refresh, swap in an RwLock or a single-flight primitive — but for a check that takes a few milliseconds, the simple mutex is usually the right trade-off.
