Metrics and Monitoring

Metrics are the numeric heartbeat of a production service: request rates, error counts, latency distributions, and resource saturation. In the Node ecosystem you reach for prom-client; in Rust the equivalent is the metrics facade plus a Prometheus exporter. This page shows how to instrument an Axum service, expose a /metrics endpoint, and choose which numbers to track using the RED and USE methods.

Quick Overview

A metric is a cheap, aggregatable number you update on hot paths and scrape periodically into a time-series database (usually Prometheus). The metrics crate gives you a global recorder with three instrument types — counters (monotonically increasing), gauges (go up and down), and histograms (latency/size distributions) — exactly mirroring prom-client’s Counter, Gauge, and Histogram. For a TypeScript developer, the mental model is identical; the differences are that Rust’s macros add labels with near-zero overhead and the exporter renders the same Prometheus text format your dashboards already understand.

TypeScript/JavaScript Example

A typical Express service instrumented with prom-client (the de-facto Node Prometheus library):

// npm install express prom-client
import express, { Request, Response, NextFunction } from "express";
import {
  collectDefaultMetrics,
  Counter,
  Gauge,
  Histogram,
  register,
} from "prom-client";

// Process-level metrics: heap, event-loop lag, CPU, GC.
collectDefaultMetrics();

const httpRequestsTotal = new Counter({
  name: "http_requests_total",
  help: "Total HTTP requests handled",
  labelNames: ["method", "path", "status"] as const,
});

const httpRequestsInFlight = new Gauge({
  name: "http_requests_in_flight",
  help: "Requests currently being processed",
});

const httpRequestDuration = new Histogram({
  name: "http_request_duration_seconds",
  help: "HTTP request latency in seconds",
  labelNames: ["method", "path"] as const,
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
});

const app = express();

// One middleware records the RED signals for every route.
app.use((req: Request, res: Response, next: NextFunction) => {
  const end = httpRequestDuration.startTimer();
  httpRequestsInFlight.inc();

  res.on("finish", () => {
    // req.route?.path is the *matched template* ("/users/:id"), not the raw URL.
    const path = req.route?.path ?? "unknown";
    httpRequestsInFlight.dec();
    httpRequestsTotal.inc({ method: req.method, path, status: String(res.statusCode) });
    end({ method: req.method, path });
  });

  next();
});

app.get("/users", (_req, res) => res.json([]));

// Prometheus scrapes this endpoint every 15s or so.
app.get("/metrics", async (_req, res) => {
  res.set("Content-Type", register.contentType);
  res.send(await register.metrics());
});

app.listen(3000);

Key points:

- prom-client keeps a global register that every metric attaches to.
- You construct metric objects up front and call .inc() / .set() / .observe() on hot paths.
- /metrics returns plain text in the Prometheus exposition format.
- collectDefaultMetrics() adds process/event-loop stats for free.

Rust Equivalent

The idiomatic Rust stack is the metrics facade (a lightweight global API, analogous to how log/tracing are facades) plus the metrics-exporter-prometheus recorder. The current stable toolchain is Rust 1.96.0 on the 2024 edition; cargo new selects it automatically.

cargo add metrics metrics-exporter-prometheus
cargo add axum
cargo add tokio --features full

use std::time::Instant;

use axum::{
    extract::{MatchedPath, Request},
    middleware::{self, Next},
    response::IntoResponse,
    routing::get,
    Router,
};
use metrics::{
    counter, describe_counter, describe_gauge, describe_histogram, gauge, histogram, Unit,
};
use metrics_exporter_prometheus::{Matcher, PrometheusBuilder, PrometheusHandle};

const DURATION_METRIC: &str = "http_request_duration_seconds";

// Install the global recorder and describe each metric once at startup.
fn setup_metrics_recorder() -> PrometheusHandle {
    // Latency buckets, in seconds, for RED-style dashboards.
    const BUCKETS: &[f64] = &[
        0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0,
    ];

    let handle = PrometheusBuilder::new()
        // Render this histogram as native Prometheus buckets, not a summary.
        .set_buckets_for_metric(Matcher::Full(DURATION_METRIC.to_string()), BUCKETS)
        .expect("invalid bucket configuration")
        .install_recorder()
        .expect("failed to install Prometheus recorder");

    // Help text + types, emitted as `# HELP` / `# TYPE` lines.
    describe_counter!("http_requests_total", "Total HTTP requests handled");
    describe_gauge!("http_requests_in_flight", "Requests currently in flight");
    describe_histogram!(DURATION_METRIC, Unit::Seconds, "HTTP request latency");
    handle
}

// One middleware records the RED signals for every route.
async fn track_metrics(req: Request, next: Next) -> impl IntoResponse {
    let start = Instant::now();

    // Use the matched route template ("/users/{id}"), never the raw path,
    // to keep label cardinality bounded.
    let path = req
        .extensions()
        .get::<MatchedPath>()
        .map(|p| p.as_str().to_owned())
        .unwrap_or_else(|| "unknown".to_owned());
    let method = req.method().clone();

    let in_flight = gauge!("http_requests_in_flight");
    in_flight.increment(1.0);

    let response = next.run(req).await;

    in_flight.decrement(1.0);
    let latency = start.elapsed().as_secs_f64();
    let status = response.status().as_u16().to_string();

    let labels = [
        ("method", method.to_string()),
        ("path", path),
        ("status", status),
    ];
    counter!("http_requests_total", &labels).increment(1);
    histogram!(DURATION_METRIC, &labels[..2]).record(latency);

    response
}

async fn list_users() -> &'static str {
    "[]"
}

#[tokio::main]
async fn main() {
    let recorder = setup_metrics_recorder();

    let app = Router::new()
        .route("/users", get(list_users))
        // Prometheus scrapes this; `render()` produces the exposition text.
        .route("/metrics", get(move || std::future::ready(recorder.render())))
        .layer(middleware::from_fn(track_metrics));

    let listener = tokio::net::TcpListener::bind("127.0.0.1:3000")
        .await
        .unwrap();
    axum::serve(listener, app).await.unwrap();
}

After hitting /users twice and a missing route once, curl localhost:3000/metrics returns real Prometheus text:

# HELP http_requests_total Total HTTP requests handled
# TYPE http_requests_total counter
http_requests_total{method="GET",path="unknown",status="404"} 1
http_requests_total{method="GET",path="/users",status="200"} 2

# HELP http_requests_in_flight Requests currently in flight
# TYPE http_requests_in_flight gauge
http_requests_in_flight 1

# HELP http_request_duration_seconds HTTP request latency
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{method="GET",path="unknown",le="0.005"} 1
http_request_duration_seconds_bucket{method="GET",path="unknown",le="0.01"} 1
http_request_duration_seconds_bucket{method="GET",path="unknown",le="0.025"} 1
http_request_duration_seconds_bucket{method="GET",path="unknown",le="0.05"} 1
http_request_duration_seconds_bucket{method="GET",path="unknown",le="0.1"} 1
http_request_duration_seconds_bucket{method="GET",path="unknown",le="0.25"} 1
http_request_duration_seconds_bucket{method="GET",path="unknown",le="0.5"} 1
http_request_duration_seconds_bucket{method="GET",path="unknown",le="1"} 1
http_request_duration_seconds_bucket{method="GET",path="unknown",le="2.5"} 1
http_request_duration_seconds_bucket{method="GET",path="unknown",le="5"} 1
http_request_duration_seconds_bucket{method="GET",path="unknown",le="10"} 1
http_request_duration_seconds_bucket{method="GET",path="unknown",le="+Inf"} 1
http_request_duration_seconds_sum{method="GET",path="unknown"} 0.000043833
http_request_duration_seconds_count{method="GET",path="unknown"} 1
http_request_duration_seconds_bucket{method="GET",path="/users",le="0.005"} 2
http_request_duration_seconds_bucket{method="GET",path="/users",le="0.01"} 2
http_request_duration_seconds_bucket{method="GET",path="/users",le="0.025"} 2
http_request_duration_seconds_bucket{method="GET",path="/users",le="0.05"} 2
http_request_duration_seconds_bucket{method="GET",path="/users",le="0.1"} 2
http_request_duration_seconds_bucket{method="GET",path="/users",le="0.25"} 2
http_request_duration_seconds_bucket{method="GET",path="/users",le="0.5"} 2
http_request_duration_seconds_bucket{method="GET",path="/users",le="1"} 2
http_request_duration_seconds_bucket{method="GET",path="/users",le="2.5"} 2
http_request_duration_seconds_bucket{method="GET",path="/users",le="5"} 2
http_request_duration_seconds_bucket{method="GET",path="/users",le="10"} 2
http_request_duration_seconds_bucket{method="GET",path="/users",le="+Inf"} 2
http_request_duration_seconds_sum{method="GET",path="/users"} 0.00010016699999999999
http_request_duration_seconds_count{method="GET",path="/users"} 2

Note: The 404 request matched no route, so its latency lands in the path="unknown" histogram series — emitted before the path="/users" series. The _sum values are real but timing-dependent; yours will differ run to run. The gauge reads 1, not 0, because the /metrics request itself is still in flight while render() runs — a small but real detail of measuring yourself.
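
One related detail of the in-flight gauge: the middleware decrements it only after next.run(req).await returns, so a request future that is dropped before completion (for instance when a client disconnects) would leave the gauge one too high. A common remedy is a guard that decrements in Drop. The sketch below shows the idea with a plain AtomicI64 standing in for the gauge so that it needs no external crates; InFlightGuard is a hypothetical name, not part of axum or the metrics crate.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical RAII guard: increments on creation, decrements on drop.
/// The AtomicI64 stands in for the real in-flight gauge.
pub struct InFlightGuard<'a> {
    gauge: &'a AtomicI64,
}

impl<'a> InFlightGuard<'a> {
    pub fn new(gauge: &'a AtomicI64) -> Self {
        gauge.fetch_add(1, Ordering::Relaxed);
        Self { gauge }
    }
}

impl Drop for InFlightGuard<'_> {
    fn drop(&mut self) {
        // Runs on normal return, early return, panic unwinding,
        // and when the enclosing future is dropped.
        self.gauge.fetch_sub(1, Ordering::Relaxed);
    }
}
```

In the middleware, the same shape would wrap the gauge handle: create the guard before awaiting the inner service and let it fall out of scope afterwards.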

Detailed Explanation

The facade pattern

metrics is a facade, just like the log and tracing crates. Your code calls counter!, gauge!, and histogram! against a global recorder, but the facade does not decide how those numbers are stored or exported. You install exactly one recorder at startup — here metrics-exporter-prometheus — and the macros route to it. Swap to StatsD, OTLP, or a test recorder by changing only the install line, never the call sites. In prom-client terms, the macros are the metric objects and the PrometheusHandle is the register.
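
To make the facade idea concrete, here is a minimal stdlib-only sketch of the pattern. It is not the metrics crate's actual implementation: the Recorder trait, InMemoryRecorder, install, and increment_counter are simplified, hypothetical stand-ins chosen for illustration.

```rust
use std::collections::BTreeMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical stand-in for a facade's recorder trait.
pub trait Recorder: Send + Sync {
    fn increment_counter(&self, name: &str, delta: u64);
}

/// One possible backend: keeps counters in memory (useful in tests).
#[derive(Default)]
pub struct InMemoryRecorder {
    counters: Mutex<BTreeMap<String, u64>>,
}

impl InMemoryRecorder {
    pub fn get(&self, name: &str) -> u64 {
        *self.counters.lock().unwrap().get(name).unwrap_or(&0)
    }
}

impl Recorder for InMemoryRecorder {
    fn increment_counter(&self, name: &str, delta: u64) {
        *self
            .counters
            .lock()
            .unwrap()
            .entry(name.to_owned())
            .or_insert(0) += delta;
    }
}

/// The single global slot the call sites route through.
static RECORDER: OnceLock<&'static dyn Recorder> = OnceLock::new();

/// Install the global recorder; returns false if one is already installed.
pub fn install(recorder: &'static dyn Recorder) -> bool {
    RECORDER.set(recorder).is_ok()
}

/// Call-site API: a silent no-op until a recorder has been installed.
pub fn increment_counter(name: &str, delta: u64) {
    if let Some(recorder) = RECORDER.get() {
        recorder.increment_counter(name, delta);
    }
}
```

Call sites only ever touch increment_counter, so swapping the backend means changing the one install call. The sketch also shows why recording before installation loses data silently: the global slot is simply empty.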

Describing metrics

describe_counter!, describe_gauge!, and describe_histogram! attach help text and an optional Unit. They are the moral equivalent of the help: field you pass to a prom-client constructor, and they produce the # HELP / # TYPE comment lines in the exposition output. Unlike prom-client, describing is optional and decoupled from first use — you can record a metric before it is described, and the exporter still emits it (without help text). Calling the describe macros once in setup_metrics_recorder keeps documentation in one place.

Recording values

Each macro returns a lightweight handle:

- counter!("name").increment(1): add to a monotonic counter (u64 delta).
- gauge!("name").set(x) / .increment(x) / .decrement(x): set or adjust an f64.
- histogram!("name").record(x): observe an f64 sample into the configured buckets.

Labels are key/value pairs passed inline ("method" => "GET") or as a slice of (&str, String) tuples, as in the middleware. Note the slice syntax &labels[..2] — the histogram only needs method and path, so we reuse the first two labels and drop status.

Counters vs histograms in the wire format

A counter renders as a single _total series per label set. A histogram is special: Prometheus represents it as cumulative buckets (_bucket{le="..."}), plus a _sum and a _count. Each le (“less than or equal”) bucket counts every observation at or below that boundary, which is why the counts never decrease from one bucket to the next and the +Inf bucket always equals _count. (In the sample output the le="10" bucket also matches +Inf, but only because no request took longer than 10 seconds.) From these buckets, PromQL’s histogram_quantile() estimates p50/p95/p99 latency across all your instances, something a per-instance summary cannot do.
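
The cumulative layout is easy to get wrong by intuition, so here is a small stdlib-only sketch of how observations map onto le buckets. The function name cumulative_buckets is hypothetical; a real exporter does this bookkeeping internally.

```rust
/// Count observations into cumulative `le` buckets.
/// Returns one count per finite bound, followed by the `+Inf` count.
pub fn cumulative_buckets(bounds: &[f64], observations: &[f64]) -> Vec<u64> {
    let mut counts = vec![0u64; bounds.len() + 1];
    for &value in observations {
        // An observation increments every bucket whose bound it fits under.
        for (i, &bound) in bounds.iter().enumerate() {
            if value <= bound {
                counts[i] += 1;
            }
        }
        // The +Inf bucket counts every observation, so it equals _count.
        counts[bounds.len()] += 1;
    }
    counts
}
```

With bounds [0.05, 0.1, 0.2, 0.5, 1.0] and observations [0.04, 0.18, 0.25, 0.6] this yields [1, 1, 2, 3, 4, 4]. An observation above the largest bound lands only in +Inf, which is the case where the last finite bucket and +Inf differ.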

Buckets vs summaries

By default metrics-exporter-prometheus renders a histogram as a Prometheus summary (client-side quantiles). For aggregatable latency you almost always want histograms with explicit buckets, which is exactly what set_buckets_for_metric configures. The Matcher lets you target metrics by Full name, Prefix, or Suffix, so you can apply one bucket scheme to every *_duration_seconds metric at once.

Bounding label cardinality

The single most important line in the middleware is the MatchedPath extraction. Axum’s MatchedPath is the route template (/users/{id}), not the concrete URL (/users/42). Every distinct label combination becomes its own time series; using raw paths or user IDs as labels creates unbounded series and will eventually take down Prometheus. This is the Rust equivalent of using req.route.path instead of req.url in the Node example.
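
A quick back-of-the-envelope calculation shows why this matters. The sketch below, with hypothetical helper names, multiplies the number of distinct values per label to get the worst-case number of label sets, then accounts for the extra series a histogram emits per label set (one per finite bucket, plus +Inf, _sum, and _count).

```rust
/// Worst-case number of label sets: the product of distinct values per label.
pub fn worst_case_label_sets(distinct_values_per_label: &[u64]) -> u64 {
    distinct_values_per_label
        .iter()
        .fold(1u64, |acc, &n| acc.saturating_mul(n))
}

/// Series emitted by a bucketed histogram: each label set produces one
/// series per finite bucket, plus the +Inf bucket, _sum, and _count.
pub fn histogram_series(label_sets: u64, finite_buckets: u64) -> u64 {
    label_sets.saturating_mul(finite_buckets.saturating_add(3))
}
```

Five methods, twenty route templates, and six status codes give at most 600 label sets. Replace the twenty templates with a million raw URLs and the same counter can reach 30,000,000. The sample output above follows the histogram formula: two paths with eleven finite buckets produce 28 series.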

Key Differences

Aspect	TypeScript (`prom-client`)	Rust (`metrics` + exporter)
Global registry	`register` singleton	The installed recorder (`PrometheusHandle`)
Define a metric	`new Counter({...})` object	`counter!("name")` macro, described separately
Label declaration	`labelNames` array, checked at runtime	Inline `k => v` pairs, no fixed schema
Recording overhead	JS object allocation + map lookup	Static key interning, atomic update
Default process metrics	`collectDefaultMetrics()` (heap, event loop)	Opt-in via `metrics-process` (CPU, RSS, FDs)
Histogram default	Real buckets	Summary unless you set buckets
Pluggable backend	Prometheus-only	Facade: Prometheus, StatsD, OTLP, test recorders
Async runtime needed	No	Only for the HTTP scrape endpoint

Pull, not push

Like prom-client, the metrics + Prometheus model is pull-based: your process holds the current values in memory, and Prometheus scrapes /metrics on its own schedule (typically every 15–60 seconds). Counters never reset on scrape; Prometheus computes rates from successive scrapes. This is the opposite of fire-and-forget StatsD/DogStatsD, where the app pushes individual events. If you must push (short-lived jobs, serverless), metrics-exporter-prometheus also supports a push gateway via .with_push_gateway(...).
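
As a rough illustration of how a rate falls out of two successive scrapes, consider the simplified sketch below. It is not Prometheus's actual rate() implementation, which works over a window of many samples and extrapolates to the window edges; per_second_rate is a hypothetical helper that only shows the core idea, including how a counter reset after a restart is treated.

```rust
/// Per-second increase of one counter between two scrapes.
/// Returns None when no time has passed between the scrapes.
pub fn per_second_rate(prev: f64, curr: f64, elapsed_secs: f64) -> Option<f64> {
    if elapsed_secs <= 0.0 {
        return None;
    }
    // A drop in value means the process restarted and the counter began
    // again from zero, so everything counted since then is the increase.
    let increase = if curr >= prev { curr - prev } else { curr };
    Some(increase / elapsed_secs)
}
```

A counter that moves from 100 to 250 over a 15-second scrape interval works out to 10 requests per second. Because the scraper derives rates from differences, the application never has to reset or flush anything.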

RED and USE: which metrics to collect

Instrumenting is easy; choosing what to measure is the skill. Two complementary frameworks:

RED (for request-driven services — your API):
- Rate — requests per second (rate(http_requests_total[5m])).
- Errors — failed requests per second (filter status=~"5..").
- Duration — latency distribution (histogram_quantile(0.99, ...)).
USE (for resources — pools, queues, CPU, memory):
- Utilization — fraction of a resource busy (a gauge, e.g. connection-pool usage).
- Saturation — work that is queued/waiting (a gauge or counter).
- Errors — error events for that resource.

The middleware above gives you all three RED signals from one place. USE signals are gauges you update where the resource lives — a DB pool, a Tokio task queue, a Redis client. The three instrument types map cleanly: counters give you Rate and Errors, histograms give you Duration, and gauges give you Utilization and Saturation.

Common Pitfalls

Pitfall 1: Forgetting to install a recorder

If you call counter!(...) without first installing a recorder, the facade silently routes to a no-op recorder — your metrics simply never appear, with no error. There is no panic and no warning. Always call install_recorder() (or install()) once at the very start of main, before serving traffic, and verify a metric shows up at /metrics in a smoke test.

Pitfall 2: High-cardinality labels

Putting unbounded values into labels is the classic Prometheus footgun:

use metrics::counter;

fn handle_request(user_id: u64, raw_path: &str) {
    // Anti-pattern: user_id and raw_path explode cardinality.
    // Each unique user creates a brand-new time series that lives forever.
    counter!("http_requests_total",
        "user_id" => user_id.to_string(),
        "path" => raw_path.to_string(), // e.g. "/users/42", "/users/43", ...
    ).increment(1);
}

This compiles and runs — that is what makes it dangerous. Keep labels low-cardinality: HTTP method, matched route template, status code, region. Put per-user detail in logs or traces (see distributed-tracing.md), never in metric labels.
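
When no matched route template is available (for example in a worker that only sees raw URLs), one fallback is to normalize the path before using it as a label. The sketch below is a heuristic under stated assumptions: normalize_path is a hypothetical helper, it only collapses purely numeric segments, and it would miss UUIDs, slugs, and other unbounded values. A framework-provided template such as MatchedPath remains the better source.

```rust
/// Replace purely numeric path segments with a fixed placeholder so that
/// "/users/42" and "/users/43" share one label value.
pub fn normalize_path(raw: &str) -> String {
    raw.split('/')
        .map(|segment| {
            let numeric =
                !segment.is_empty() && segment.bytes().all(|b| b.is_ascii_digit());
            if numeric { "{id}" } else { segment }
        })
        .collect::<Vec<&str>>()
        .join("/")
}
```

With this in place, "/users/42/orders/7" becomes "/users/{id}/orders/{id}", while "/users" passes through unchanged.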

Pitfall 3: Expecting histograms to render as buckets by default

A common surprise: you record a histogram, open /metrics, and see _sum/_count with quantile="..." lines (a summary) instead of _bucket{le="..."} lines. histogram_quantile() in PromQL needs buckets. Always configure buckets with set_buckets_for_metric (or a global set_buckets) for latency metrics, as shown in the main example.
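
To see why buckets are required, it helps to look at roughly what a quantile estimate over buckets does. The sketch below follows the general approach of linear interpolation inside the bucket that contains the target rank. It is a simplified illustration, not PromQL's exact implementation, and estimate_quantile is a hypothetical name.

```rust
/// Estimate quantile `q` (0.0 to 1.0) from cumulative histogram buckets.
///
/// `buckets` holds `(upper_bound, cumulative_count)` pairs sorted by bound;
/// the last entry must be the `+Inf` bucket (`f64::INFINITY`).
pub fn estimate_quantile(q: f64, buckets: &[(f64, u64)]) -> Option<f64> {
    if buckets.len() < 2 || !(0.0..=1.0).contains(&q) {
        return None;
    }
    let (last_bound, total) = *buckets.last()?;
    if !last_bound.is_infinite() || total == 0 {
        return None;
    }

    // Position of the requested observation among all recorded samples.
    let rank = q * total as f64;
    let mut lower_bound = 0.0;
    let mut lower_count = 0u64;

    for &(upper_bound, count) in buckets {
        if count as f64 >= rank {
            // In the +Inf bucket there is no upper edge to interpolate
            // towards, so report the highest finite bound instead.
            if upper_bound.is_infinite() {
                return Some(lower_bound);
            }
            let in_bucket = count.saturating_sub(lower_count) as f64;
            if in_bucket == 0.0 {
                return Some(lower_bound);
            }
            // Assume observations are spread evenly inside the bucket.
            let fraction = (rank - lower_count as f64) / in_bucket;
            return Some(lower_bound + (upper_bound - lower_bound) * fraction);
        }
        lower_bound = upper_bound;
        lower_count = count;
    }
    None
}
```

The estimate can never be more precise than the bucket layout allows, which is why bucket boundaries should sit near the latencies you care about. A summary's quantile="..." lines carry no bucket counts at all, so there is nothing for this kind of calculation to work with.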

Pitfall 4: Type mismatches in the macros

The macro arguments are typed, and the compiler enforces it. Counters take an unsigned integer delta (u64); gauges and histograms take an f64. Reaching for the wrong one — say, incrementing a counter by a fraction — does not compile:

fn main() {
    metrics::counter!("jobs_total").increment(1.5); // does not compile
}

The real cargo check error (Rust 1.96.0) is:

error[E0308]: mismatched types
   --> src/main.rs:2:47
    |
  2 |     metrics::counter!("jobs_total").increment(1.5); // does not compile
    |                                     --------- ^^^ expected `u64`, found floating-point number
    |                                     |
    |                                     arguments to this method are incorrect
    |
note: method defined here
   --> .../metrics-0.24.6/src/handles.rs:102:12
    |
102 |     pub fn increment(&self, value: u64) {
    |            ^^^^^^^^^

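Use counter!(...).increment(1) for whole events, and reach for a gauge! or histogram! (both f64) when you genuinely need fractional values. Note that a literal like gauge!("x").set(5) compiles fine. An integer literal is never inferred as f64 in Rust; it is accepted because the gauge methods take any value implementing the crate's IntoF64 conversion trait, and the literal falls back to i32, which converts losslessly. The trap mainly bites when you pass an already-typed integer such as a u64 or usize, which needs an explicit as f64 cast, as in the worker-pool examples on this page.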

Pitfall 5: Securing the scrape endpoint

/metrics leaks operational detail (route names, error rates, internal counters). Do not expose it publicly. Bind it to an internal interface, gate it behind your service mesh / firewall, or require an auth header. See ../27-security/README.md and the production-checklist.md.

Best Practices

- Name by convention. Use snake_case, a unit suffix, and _total for counters: http_requests_total, http_request_duration_seconds, db_pool_connections_in_use. Prometheus tooling assumes these conventions.
- Describe once, record everywhere. Call the describe_*! macros in one startup function so help text and units stay consistent.
- Centralize RED in middleware. One Tower/Axum layer instruments every route, as in the main example. Do not sprinkle counter! calls into each handler.
- Pick buckets that match your SLO. If your latency SLO is 200ms, ensure a bucket boundary sits near 0.2 so the dashboard can show your SLO compliance precisely.
- Add process metrics. cargo add metrics-process and register a collector to get CPU, resident memory, and file-descriptor counts, the analogue of collectDefaultMetrics().
- Keep cardinality bounded. Audit every label: is its set of possible values small and stable? If not, it does not belong in a metric.
- Separate metrics from traces and logs. Metrics answer “how many / how fast” in aggregate; traces answer “what happened to this request.” They complement, not replace, each other.
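
The naming convention is simple enough to check mechanically, for instance in a unit test over the metric names a service declares. The sketch below uses hypothetical helper names and covers only the two rules stated above (snake_case, and a _total suffix on counters). It does not validate unit suffixes or the full Prometheus name grammar.

```rust
/// True if `name` starts with a lowercase ASCII letter and otherwise
/// contains only lowercase ASCII letters, digits, and underscores.
pub fn is_snake_case(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(first) if first.is_ascii_lowercase() => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_')
}

/// True if a counter name follows the convention: snake_case plus `_total`.
pub fn is_conventional_counter(name: &str) -> bool {
    is_snake_case(name) && name.ends_with("_total")
}
```

Running is_conventional_counter over every counter name at test time catches names like httpRequestsTotal or http_requests before they reach a dashboard.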

Real-World Example

A worker that processes jobs from a queue, instrumented with both RED (per-job rate/errors/duration) and USE (pool utilization) signals, and exposing metrics on a dedicated port via the exporter’s own HTTP listener — handy for background workers that have no web framework:

// cargo add metrics metrics-exporter-prometheus
// cargo add tokio --features full
use std::net::SocketAddr;
use std::time::Instant;

use metrics::{counter, describe_counter, describe_gauge, describe_histogram, gauge, histogram, Unit};
use metrics_exporter_prometheus::{Matcher, PrometheusBuilder};

const JOB_DURATION: &str = "job_processing_duration_seconds";

struct Job {
    id: u64,
    kind: &'static str,
    will_fail: bool,
}

async fn process(job: &Job) -> Result<(), String> {
    if job.will_fail {
        Err(format!("job {} failed", job.id))
    } else {
        Ok(())
    }
}

// RED for jobs + USE for the worker pool, all in one place.
async fn run_job(job: &Job, pool_in_use: &mut usize, pool_capacity: usize) {
    *pool_in_use += 1;
    gauge!("worker_pool_utilization_ratio")
        .set(*pool_in_use as f64 / pool_capacity as f64);

    let start = Instant::now();
    let result = process(job).await;
    let elapsed = start.elapsed().as_secs_f64();

    let outcome = if result.is_ok() { "success" } else { "error" };
    counter!("jobs_processed_total", "kind" => job.kind, "outcome" => outcome).increment(1);
    histogram!(JOB_DURATION, "kind" => job.kind).record(elapsed);

    *pool_in_use -= 1;
    gauge!("worker_pool_utilization_ratio")
        .set(*pool_in_use as f64 / pool_capacity as f64);
}

#[tokio::main]
async fn main() {
    // The exporter runs its own HTTP server on :9000/metrics; no web framework needed.
    let addr: SocketAddr = "0.0.0.0:9000".parse().unwrap();
    PrometheusBuilder::new()
        .with_http_listener(addr)
        .set_buckets_for_metric(
            Matcher::Full(JOB_DURATION.to_string()),
            &[0.01, 0.05, 0.1, 0.5, 1.0, 5.0],
        )
        .expect("invalid bucket configuration")
        .install()
        .expect("failed to install Prometheus exporter");

    describe_counter!("jobs_processed_total", "Jobs processed by kind and outcome");
    describe_gauge!("worker_pool_utilization_ratio", "Fraction of workers busy (USE)");
    describe_histogram!(JOB_DURATION, Unit::Seconds, "Per-job processing time (RED)");

    let pool_capacity = 4;
    let mut pool_in_use = 0;

    let jobs = [
        Job { id: 1, kind: "email", will_fail: false },
        Job { id: 2, kind: "email", will_fail: true },
        Job { id: 3, kind: "report", will_fail: false },
    ];

    for job in &jobs {
        run_job(job, &mut pool_in_use, pool_capacity).await;
    }

    println!("metrics live at http://{addr}/metrics");
    // In a real worker you would loop forever pulling jobs; here we keep the
    // process alive briefly so the endpoint can be scraped.
    tokio::time::sleep(std::time::Duration::from_secs(1)).await;
}

This pattern composes with the rest of the section: pair it with graceful-shutdown.md so in-flight jobs drain cleanly, health-checks.md for liveness, and background-jobs.md for the queue itself. The corresponding Grafana dashboard would chart rate(jobs_processed_total{outcome="error"}[5m]) for the error signal and histogram_quantile(0.99, sum(rate(job_processing_duration_seconds_bucket[5m])) by (le, kind)) for p99 duration.

Exercises

Exercise 1: Instrument database query outcomes

Difficulty: Beginner

Objective: Add a counter that records database queries split by success and error, the Errors signal of RED for your data layer.

Instructions: Write a run_query(ok: bool) function that increments a db_queries_total counter with an outcome label of either "success" or "error". Install a Prometheus recorder, describe the metric, simulate two successes and one error, and print the rendered output. Confirm the output contains two series with the correct counts.

Solution

// cargo add metrics metrics-exporter-prometheus
use metrics::{counter, describe_counter};
use metrics_exporter_prometheus::PrometheusBuilder;

fn run_query(ok: bool) {
    let outcome = if ok { "success" } else { "error" };
    counter!("db_queries_total", "outcome" => outcome).increment(1);
}

fn main() {
    let handle = PrometheusBuilder::new()
        .install_recorder()
        .expect("failed to install recorder");
    describe_counter!("db_queries_total", "Database queries by outcome");

    run_query(true);
    run_query(true);
    run_query(false);

    print!("{}", handle.render());
}

Running this prints the real output:

# HELP db_queries_total Database queries by outcome
# TYPE db_queries_total counter
db_queries_total{outcome="success"} 2
db_queries_total{outcome="error"} 1

Note: The two series carry the correct counts (success 2, error 1), but the order of series within a metric is not guaranteed — it reflects internal hash-map iteration, so a later run may print error before success. Never depend on series ordering.
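
If a test needs to compare rendered output, one way to stay independent of series order is to compare sorted sample lines instead of the raw text. The helper below is a hypothetical sketch that works on any exposition string, such as the value returned by render(); it ignores blank lines and # HELP / # TYPE comments.

```rust
/// Extract the sample lines of an exposition body (skipping blank lines
/// and `#` comment lines) and return them sorted for stable comparison.
pub fn sorted_samples(exposition: &str) -> Vec<&str> {
    let mut lines: Vec<&str> = exposition
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .collect();
    lines.sort_unstable();
    lines
}
```

Two renders that differ only in series order then compare equal, so an assertion on sorted_samples(...) does not flake between runs.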

Exercise 2: A utilization gauge for a worker pool

Difficulty: Intermediate

Objective: Expose the Utilization signal of the USE method as a gauge.

Instructions: Model a fixed-size worker pool with a capacity and an in_use count. Each time a worker is acquired, set a worker_pool_utilization_ratio gauge to in_use / capacity. Acquire two workers from a pool of capacity 4 and verify the gauge reads 0.5 in the exposition output.

Solution

// cargo add metrics metrics-exporter-prometheus
use metrics::{describe_gauge, gauge};
use metrics_exporter_prometheus::PrometheusBuilder;

struct Pool {
    capacity: usize,
    in_use: usize,
}

impl Pool {
    fn acquire(&mut self) {
        self.in_use += 1;
        let utilization = self.in_use as f64 / self.capacity as f64;
        gauge!("worker_pool_utilization_ratio").set(utilization);
    }
}

fn main() {
    let handle = PrometheusBuilder::new()
        .install_recorder()
        .expect("failed to install recorder");
    describe_gauge!("worker_pool_utilization_ratio", "Fraction of workers busy");

    let mut pool = Pool { capacity: 4, in_use: 0 };
    pool.acquire();
    pool.acquire();

    print!("{}", handle.render());
}

Real output:

# HELP worker_pool_utilization_ratio Fraction of workers busy
# TYPE worker_pool_utilization_ratio gauge
worker_pool_utilization_ratio 0.5

Exercise 3: Latency histogram with SLO-aligned buckets

Difficulty: Advanced

Objective: Configure a histogram whose buckets are tuned to a 200ms SLO and verify the bucket layout.

Instructions: Build a recorder that renders request_latency_seconds as a histogram (not a summary) with buckets that include a boundary at 0.2 (your SLO). Record a few latencies straddling 200ms, render the output, and confirm you see _bucket{le="0.2"} plus _sum and _count lines. Explain in a comment why a bucket boundary at the SLO matters.

Solution

// cargo add metrics metrics-exporter-prometheus
use metrics::{describe_histogram, histogram, Unit};
use metrics_exporter_prometheus::{Matcher, PrometheusBuilder};

const METRIC: &str = "request_latency_seconds";

fn main() {
    // A boundary exactly at the 200ms SLO lets a dashboard compute
    // "fraction of requests under SLO" precisely from `le="0.2"`,
    // instead of interpolating between coarser buckets.
    let handle = PrometheusBuilder::new()
        .set_buckets_for_metric(
            Matcher::Full(METRIC.to_string()),
            &[0.05, 0.1, 0.2, 0.5, 1.0],
        )
        .expect("invalid buckets")
        .install_recorder()
        .expect("failed to install recorder");

    describe_histogram!(METRIC, Unit::Seconds, "Request latency");

    for latency in [0.04, 0.18, 0.25, 0.6] {
        histogram!(METRIC).record(latency);
    }

    print!("{}", handle.render());
}

Real output:

# HELP request_latency_seconds Request latency
# TYPE request_latency_seconds histogram
request_latency_seconds_bucket{le="0.05"} 1
request_latency_seconds_bucket{le="0.1"} 1
request_latency_seconds_bucket{le="0.2"} 2
request_latency_seconds_bucket{le="0.5"} 3
request_latency_seconds_bucket{le="1"} 4
request_latency_seconds_bucket{le="+Inf"} 4
request_latency_seconds_sum 1.0699999999999998
request_latency_seconds_count 4

Two of the four requests (0.04 and 0.18) are at or under the 200ms SLO, which le="0.2" reports as 2 — exactly the number you can divide by _count to get SLO compliance.
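
That division can be written down as a tiny helper. The sketch below is only the arithmetic, with the hypothetical name slo_compliance; on a dashboard the same ratio would normally be computed in PromQL from the rates of the le="0.2" bucket and _count.

```rust
/// Fraction of requests at or under the SLO boundary.
/// `at_or_under_slo` is the cumulative count of the bucket at the SLO
/// (here `le="0.2"`); `total` is the histogram's `_count`.
pub fn slo_compliance(at_or_under_slo: u64, total: u64) -> Option<f64> {
    if total == 0 {
        // No traffic yet: compliance is undefined rather than 0% or 100%.
        return None;
    }
    Some(at_or_under_slo as f64 / total as f64)
}
```

For the output above, slo_compliance(2, 4) evaluates to Some(0.5), that is, 50% of requests met the 200ms target.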
