Parsing: nom and pest

When a string has real structure — nesting, recursion, balanced delimiters, or a grammar you actually want to enforce — a regex stops being the right tool. Rust’s ecosystem answers this with two mature, complementary libraries: nom, a parser-combinator library where you build parsers from small Rust functions, and pest, where you write a formal grammar in a .pest file and a derive macro generates the parser.

Quick Overview

In Node, parsing anything beyond a flat field split usually means either a pile of regular expressions or pulling in a hand-written tokenizer/AST library. Rust pushes you toward real parsers that are fast, allocation-light, and produce precise error positions. nom lets you compose parsers as ordinary functions (great when the input is byte-oriented or you want full control); pest lets you declare a PEG grammar and get a parser plus readable error messages for free. The key mental shift for a TypeScript/JavaScript developer: reach for a parser — not a longer regex — the moment your input can nest.

Note: This page is about structured text parsing. For deserializing known data formats (JSON, YAML, TOML) you almost always want serde instead — see popular-crates.md. For flat pattern matching, the regex crate is covered in regex.md.

TypeScript/JavaScript Example

A common Node task: parse a semantic version string like 1.2.3 or 10.0.0-rc.1 into a structured object. The “just use a regex” reflex looks fine until you actually need precise errors and an optional pre-release tag:

// semver.ts: the typical Node approach, one big regex
interface SemVer {
  major: number;
  minor: number;
  patch: number;
  pre?: string;
}

const SEMVER = /^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.]+))?$/;

function parseSemVer(input: string): SemVer {
  const m = SEMVER.exec(input);
  if (!m) {
    throw new Error(`invalid semver: ${input}`);
  }
  return {
    major: Number(m[1]),
    minor: Number(m[2]),
    patch: Number(m[3]),
    pre: m[4], // undefined when the group didn't match
  };
}

console.log(parseSemVer("1.2.3"));
// { major: 1, minor: 2, patch: 3, pre: undefined }
console.log(parseSemVer("10.0.0-rc.1"));
// { major: 10, minor: 0, patch: 0, pre: 'rc.1' }

This works, but notice what the regex doesn’t give you: it can tell you the whole string failed, but not where or why. And the moment requirements grow — build metadata (+build.5), nested optional groups, comparison operators (>=1.2.0) — the regex becomes a write-only liability. That growth pressure is exactly when a real parser earns its place.

Rust Equivalent

Here is the same semver parser built with nom as a set of composable functions. Each small parser does one job; the top-level semver parser threads them together. Add the crate first:

[dependencies]
nom = "8"

// cargo add nom
use nom::{
    bytes::complete::take_while1,
    character::complete::{char, digit1},
    combinator::{map_res, opt},
    sequence::{preceded, terminated},
    IResult, Parser,
};

#[derive(Debug, PartialEq)]
struct SemVer {
    major: u32,
    minor: u32,
    patch: u32,
    pre: Option<String>,
}

// Parse one run of ASCII digits into a u32.
fn u32_num(input: &str) -> IResult<&str, u32> {
    map_res(digit1, str::parse::<u32>).parse(input)
}

// Parse a "-rc.1" style pre-release tag (the leading '-' is consumed).
// The caller wraps this in `opt` to make the tag optional.
fn pre_release(input: &str) -> IResult<&str, String> {
    let (rest, s) = preceded(
        char('-'),
        // ASCII-only, matching the [0-9A-Za-z.] class in the TypeScript regex.
        take_while1(|c: char| c.is_ascii_alphanumeric() || c == '.'),
    )
    .parse(input)?;
    Ok((rest, s.to_string()))
}

fn semver(input: &str) -> IResult<&str, SemVer> {
    let (input, major) = terminated(u32_num, char('.')).parse(input)?;
    let (input, minor) = terminated(u32_num, char('.')).parse(input)?;
    let (input, patch) = u32_num.parse(input)?;
    let (input, pre) = opt(pre_release).parse(input)?;
    Ok((input, SemVer { major, minor, patch, pre }))
}

fn main() {
    println!("{:?}", semver("1.2.3"));
    println!("{:?}", semver("10.0.0-rc.1"));
    println!("{:?}", semver("1.2")); // fails: missing ".patch"
}

Real output from cargo run:

Ok(("", SemVer { major: 1, minor: 2, patch: 3, pre: None }))
Ok(("", SemVer { major: 10, minor: 0, patch: 0, pre: Some("rc.1") }))
Err(Error(Error { input: "", code: Char }))

The first element of every Ok tuple is the remaining unparsed input ("" means the whole string was consumed). The third line shows nom’s superpower over a regex: instead of a flat “no match,” it reports the exact parser (Char, the . it expected after 1.2) and the position (the input remaining at that point).
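Because the error carries the input that remained at the point of failure, a byte offset (and from it a line and column) can be recovered with plain arithmetic. The following is a minimal std-only sketch; the helper names offset_of and line_col are hypothetical and not part of nom, and the sketch assumes the remaining slice is a suffix of the original input.

```rust
/// Byte offset of `remaining` within `original`.
/// Assumption: `remaining` is a suffix of `original`, which is how a
/// nom-style parser hands back leftover input.
fn offset_of(original: &str, remaining: &str) -> usize {
    original.len() - remaining.len()
}

/// 1-based (line, column) for a byte offset, counting columns in chars.
/// Assumption: `offset` falls on a char boundary of `original`.
fn line_col(original: &str, offset: usize) -> (usize, usize) {
    let before = &original[..offset];
    let line = before.matches('\n').count() + 1;
    // Text after the last newline (or the whole prefix if there is none).
    let col = before.rsplit('\n').next().map_or(0, |s| s.chars().count()) + 1;
    (line, col)
}

fn main() {
    let original = "1.2";
    let remaining = ""; // the `input` field of the error for semver("1.2")
    let offset = offset_of(original, remaining);
    println!("failed at byte {offset}, line/col {:?}", line_col(original, offset));
    // prints: failed at byte 3, line/col (1, 4)
}
```

For the failing "1.2" input this places the error just past the final character, which is where the missing '.' was expected.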

Detailed Explanation

The core nom type is IResult<I, O> — short for Result<(I, O), nom::Err<E>>. A parser is any function (or value) that takes input I and returns IResult<I, O>: on success, the leftover input plus the parsed output O; on failure, an error that carries where it stopped. This is the parser-combinator model: small parsers are values you combine into bigger ones.

digit1 matches one-or-more ASCII digits and returns the matched &str. It is a primitive parser from nom::character::complete. The complete module assumes the whole input is available (the normal case); the sibling streaming module is for incremental network parsing and returns Incomplete instead of failing at end-of-input.
map_res(parser, f) runs parser, then applies a fallible function f to the result. Here str::parse::<u32> turns the matched digits into a u32; if parsing overflowed, nom converts that into a parse error automatically.
terminated(a, b) runs a, then b, and keeps only a’s output — perfect for “a number followed by a dot, but I only care about the number.” Its siblings are preceded (keep the second), pair/separated_pair (keep both), and delimited (keep the middle).
opt(parser) makes a parser optional, returning Option<O> — exactly mirroring the optional pre field. This is the structured equivalent of the regex’s (?:-(...))? group, but it composes with everything else.
.parse(input) is the call that actually runs a parser. In nom 8 every parser implements the Parser trait, and .parse(...) is the trait method — which is why use nom::Parser; appears in the imports. (Earlier nom versions called parsers like plain functions; nom 8 unified everything behind the trait.)

The semver function itself reads top-to-bottom like the grammar it encodes: major-dot, minor-dot, patch, optional pre-release. Each ? short-circuits on failure and propagates the precise error — the same ? you already use for Result everywhere else in Rust (see section 08).

Contrast with the TypeScript regex: there, the entire structure lived in one opaque pattern string. In nom, the structure lives in ordinary, individually-testable, individually-named Rust functions. You can unit-test u32_num in isolation, reuse it in three different parsers, and the compiler type-checks that each piece produces the type the next piece expects.
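To make the combinator model concrete, here is a toy version written with only the standard library. It is a sketch of the idea, not nom's implementation: the names PResult, digits, literal, and version_pair are hypothetical, and the error type is a bare (remaining input, reason) tuple chosen for brevity.

```rust
/// On success: (remaining input, parsed value).
/// On failure: (input at the point of failure, short reason).
type PResult<'a, T> = Result<(&'a str, T), (&'a str, &'static str)>;

/// Match one or more ASCII digits and convert them to a u32.
fn digits(input: &str) -> PResult<'_, u32> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return Err((input, "expected digit"));
    }
    let (num, rest) = input.split_at(end);
    num.parse::<u32>()
        .map(|n| (rest, n))
        .map_err(|_| (input, "number out of range"))
}

/// Match exactly one expected character.
fn literal(expected: char, input: &str) -> PResult<'_, char> {
    match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, expected)),
        None => Err((input, "expected literal")),
    }
}

/// Compose the small parsers: number, dot, number.
/// Each `?` stops at the first failure and passes its position upward.
fn version_pair(input: &str) -> PResult<'_, (u32, u32)> {
    let (rest, major) = digits(input)?;
    let (rest, _) = literal('.', rest)?;
    let (rest, minor) = digits(rest)?;
    Ok((rest, (major, minor)))
}

fn main() {
    println!("{:?}", version_pair("1.2-rc")); // Ok(("-rc", (1, 2)))
    println!("{:?}", version_pair("1."));     // Err(("", "expected digit"))
}
```

Each function can be tested on its own, and the leftover input threads from one step to the next exactly as it does in the nom version above. What nom adds on top of this shape is a large library of ready-made pieces and generic input and error types.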

pest: a grammar instead of functions

nom puts the grammar in your Rust code. pest takes the opposite approach: you write the grammar in a separate .pest file using PEG notation, and #[derive(Parser)] generates the parser at compile time. This is closer to tools like ANTLR or PEG.js that a Node developer might have used. Add both crates:

[dependencies]
pest = "2"
pest_derive = "2"

Put the grammar in src/ini.pest:

WHITESPACE = _{ " " | "\t" }

section_name = @{ (ASCII_ALPHANUMERIC | "_" | ".")+ }
section      = { "[" ~ section_name ~ "]" }

key   = @{ (ASCII_ALPHANUMERIC | "_")+ }
value = @{ (!NEWLINE ~ ANY)* }
pair  = { key ~ "=" ~ value }

line = _{ section | pair }
file = { SOI ~ (line? ~ NEWLINE)* ~ line? ~ EOI }

Then walk the parse tree in Rust to build a config map:

// cargo add pest pest_derive
use std::collections::BTreeMap;
use pest::Parser;
use pest_derive::Parser;

#[derive(Parser)]
#[grammar = "ini.pest"]
struct IniParser;

fn parse_ini(src: &str) -> BTreeMap<String, BTreeMap<String, String>> {
    let mut config: BTreeMap<String, BTreeMap<String, String>> = BTreeMap::new();
    let mut current = String::from("default");

    let file = IniParser::parse(Rule::file, src).unwrap().next().unwrap();
    for item in file.into_inner() {
        match item.as_rule() {
            Rule::section => {
                current = item.into_inner().next().unwrap().as_str().to_string();
            }
            Rule::pair => {
                let mut inner = item.into_inner();
                let key = inner.next().unwrap().as_str().to_string();
                let value = inner.next().unwrap().as_str().trim().to_string();
                config.entry(current.clone()).or_default().insert(key, value);
            }
            _ => {}
        }
    }
    config
}

fn main() {
    let src = "\
[server]
host = 0.0.0.0
port = 8080

[database]
url = postgres://localhost/app
";
    let config = parse_ini(src);
    for (section, kvs) in &config {
        println!("[{section}]");
        for (k, v) in kvs {
            println!("  {k} = {v}");
        }
    }
}

Real output:

[database]
  url = postgres://localhost/app
[server]
  host = 0.0.0.0
  port = 8080

A few grammar notes that map onto regex intuition:

~ is sequence (“then”), | is ordered choice (“try left, else right”), */+/? are the familiar repetition operators, and ! is negative lookahead.
@{ ... } marks an atomic rule: no implicit whitespace inside, and it produces a single token rather than child nodes — use it for terminals like identifiers and numbers.
_{ ... } marks a silent rule that matches but produces no node in the tree (here WHITESPACE and the line wrapper). WHITESPACE is special: pest inserts it automatically between tokens of non-atomic rules.
SOI/EOI are start-of-input and end-of-input anchors; including EOI forces the parser to consume the whole input, the structured analog of a regex’s ^...$.
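One consequence of ordered choice is worth seeing concretely, because it differs from regex intuition: the first alternative that matches wins, and later alternatives are never tried. The std-only sketch below models that rule with a hypothetical ordered_choice helper (it is not pest's API, just an imitation of the semantics for string literals).

```rust
/// Try each alternative in order and commit to the first one that is a
/// prefix of `input`. Returns (matched text, remaining input).
fn ordered_choice<'a>(input: &'a str, alternatives: &[&str]) -> Option<(&'a str, &'a str)> {
    alternatives.iter().find_map(|alt| {
        input
            .strip_prefix(*alt)
            .map(|rest| (&input[..alt.len()], rest))
    })
}

fn main() {
    // Shorter literal first: "int" can never win, because "in" already matched.
    println!("{:?}", ordered_choice("int x", &["in", "int"])); // Some(("in", "t x"))
    // Longer literal first: the intended keyword is matched.
    println!("{:?}", ordered_choice("int x", &["int", "in"])); // Some(("int", " x"))
}
```

The practical rule for a PEG grammar is to list the longer or more specific alternative first whenever one alternative is a prefix of another.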

Key Differences

Aspect	regex (`regex` crate)	nom	pest
Where the grammar lives	one pattern string	Rust functions	separate `.pest` file
Handles nesting/recursion	No (it is a regular language)	Yes	Yes
Error reporting	match / no match	parser + position	rich, line/column, “expected X”
Output	matched substrings/captures	typed Rust values directly	a tree of `Pair`s you walk
Learning curve	low	medium (combinator style)	medium (learn PEG syntax)
Best for	flat fields, validation	byte/binary, performance, fine control	readable grammars, languages, configs
Compile-time grammar check	no	type-checked Rust	pest validates the grammar at build

Tip: A useful rule of thumb: if you can describe the input as “find these fields in a line,” use regex. If you find yourself counting brackets, tracking depth, or writing (?:...) inside (?:...), switch to a parser. If the grammar is something other people will read and extend, prefer pest’s separate file; if you want maximum speed and to emit typed values straight from parsing, prefer nom.

The deepest difference from the TypeScript world is that JavaScript’s RegExp engine backtracks and supports backreferences and lookaround, which lets people fake a limited amount of structure with clever patterns, at the cost of catastrophic backtracking on adversarial input. Rust’s regex crate deliberately has no backtracking, no backreferences, and no lookaround (guaranteeing linear time; see regex.md), so it cannot be abused into a half-parser. That constraint is a feature: it pushes you to a real parser exactly when you should be using one.

When to reach for a real parser over regex

Regex is a fine tool — until the input is genuinely a language. The clearest signal is nesting. Consider extracting balanced parentheses from f(g(x), h(y)). In Node:

const s = "f(g(x), h(y))";
const naive = /\(([^()]*)\)/; // can only match a non-nested group
console.log(JSON.stringify(s.match(naive)));
// ["(x)","x"]   <-- it grabbed the innermost group, not "g(x), h(y)"

The regex matched (x) and stopped; it has no concept of depth, so it cannot return the outer group’s contents. Validating arbitrary-depth balance is provably impossible with a true regular expression — it requires a stack, i.e. a parser.
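For comparison, here is roughly the smallest thing that does handle depth: a hand-written scan with a depth counter, which is a stack reduced to a single number because there is only one kind of bracket. This is a std-only sketch with a hypothetical function name, meant to show why state is required, not to replace a real parser.

```rust
/// Return the contents of the first balanced parenthesized group,
/// or None if there is no group or the parentheses never balance.
fn outer_group(input: &str) -> Option<&str> {
    let start = input.find('(')?;
    let mut depth = 0usize;
    for (offset, c) in input[start..].char_indices() {
        match c {
            '(' => depth += 1,
            ')' => {
                // The scan starts on '(', so depth is at least 1 here.
                depth -= 1;
                if depth == 0 {
                    // Skip the opening '(' and stop before its matching ')'.
                    return Some(&input[start + 1..start + offset]);
                }
            }
            _ => {}
        }
    }
    None // ran out of input while still nested
}

fn main() {
    println!("{:?}", outer_group("f(g(x), h(y))")); // Some("g(x), h(y)")
    println!("{:?}", outer_group("f(g(x)"));        // None
}
```

With several bracket kinds the counter has to become a real stack, and with a recursive grammar it becomes the call stack of a parser, which is the point at which nom or pest pays for itself.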

The Rust regex crate makes this boundary explicit by rejecting the features people use to fake structure. Backreferences, for instance, simply do not compile:

// cargo add regex
use regex::Regex;

fn main() {
    // Backreferences are not supported (the crate guarantees linear-time matching).
    match Regex::new(r"(\w+)\s+\1") {
        Ok(_) => println!("compiled"),
        Err(e) => println!("ERROR: {e}"),
    }
}

Real output:

ERROR: regex parse error:
    (\w+)\s+\1
            ^^
error: backreferences are not supported

So in Rust the decision is sharp. Reach for a parser when any of these are true:

The input can nest (expressions, JSON-like data, balanced brackets, S-expressions).
You need a precise error position and message, not just match/no-match.
The grammar is recursive or has operator precedence (arithmetic, query languages).
You want to emit typed values as you parse, not post-process captured strings.
The pattern is becoming an unreadable wall of (?:...) groups.

Stay with regex when the input is flat — log fields, a date inside prose, a quick validation — and you mainly need to find or check substrings.

Common Pitfalls

A parser succeeding does not mean it consumed everything

This trips up nearly every newcomer. A nom parser can succeed while leaving trailing garbage in the leftover input. If you only check is_ok(), you will accept malformed input.

// cargo add nom
use nom::{
    character::complete::digit1,
    combinator::{all_consuming, map_res},
    IResult, Parser,
};

fn number(input: &str) -> IResult<&str, u32> {
    map_res(digit1, str::parse::<u32>).parse(input)
}

fn number_strict(input: &str) -> IResult<&str, u32> {
    all_consuming(map_res(digit1, str::parse::<u32>)).parse(input)
}

fn main() {
    println!("loose:  {:?}", number("42abc"));        // succeeds, leftover "abc"!
    println!("strict: {:?}", number_strict("42abc")); // fails as it should
    println!("ok:     {:?}", number_strict("42"));
}

Real output:

loose:  Ok(("abc", 42))
strict: Err(Error(Error { input: "abc", code: Eof }))
ok:     Ok(("", 42))

The fix is to wrap your top-level parser in all_consuming, which fails unless the entire input was used. Always do this at the entry point of a complete-input parser.

Forgetting `use nom::Parser` (the `.parse` method)

In nom 8, parsers run via the Parser trait’s .parse() method. If you forget use nom::Parser;, you get an error like no method named parse found. Bring the trait into scope — the imports in the examples above all include it.

Calling a parser binding twice without `mut`

Parser::parse takes &mut self, so a parser bound to a let variable must be declared mut before .parse() can be called on it; once it is mut, the same binding can be run as many times as you like. Prefer running combinators inline (as the examples do) so each call constructs a fresh parser, or mark the binding mut. If you forget, the compiler reports the standard cannot borrow as mutable error (error[E0596]).

pest: leaving out `EOI` lets trailing junk slip through

A pest grammar only consumes what its rules describe. Without an explicit EOI, IniParser::parse(Rule::file, ...) can match a prefix and silently ignore the rest. Anchor the top rule with SOI ~ ... ~ EOI (as in the grammar above) to require the full input.

pest: `.unwrap()` on the parse tree assumes structure that may not be there

item.into_inner().next().unwrap() assumes a child exists. When you change the grammar, these positional unwrap()s can panic on perfectly valid input. Match on as_rule() and handle the None case, or use named extraction helpers, rather than blindly indexing the tree. pest’s parse errors, by contrast, are excellent — a missing = in the INI input yields:

 --> 2:1
  |
2 | host 0.0.0.0
  | ^---
  |
  = expected EOI, section, or pair

That line/column report is something a regex will never give you, and a big reason to reach for a parser.

Best Practices

Build bottom-up and test each piece. Write and unit-test the smallest parsers (u32_num, pre_release) first, then compose. Small parsers are trivially testable in isolation — a major advantage over one giant regex.
Wrap the entry point in all_consuming (nom) or anchor with SOI/EOI (pest). Make “did not consume everything” a hard error, not a silent success.
Return typed values, not strings, from the parser. Use map/map_res (nom) or build your enum/struct while walking the tree (pest). Parsing and validation should produce the domain type directly.
Use pest’s PrattParser for operator precedence. Do not hand-roll precedence climbing; pest ships a Pratt parser (shown below). With nom, precedence is usually composed by hand from alt/many0; before writing your own, check the documentation for your nom version for a precedence helper, or pull in a dedicated helper crate.
Pick the library to fit the job, not dogma. nom shines on binary/byte protocols and hot paths; pest shines when the grammar should be human-readable and shared. Both are production-grade — nom parses Cloudflare’s traffic; pest backs many language tools.
Don’t reach for a parser when regex suffices. A 200-line grammar to extract one field from a log line is over-engineering. Match the tool to the structure.

Real-World Example

Parsing a structured access-log line into a typed record is a textbook nom job: there is light structure (quoted request, numeric fields) but no deep nesting, and you want a precise error and typed output. This is production-flavored — the kind of code you would put behind a log-ingestion pipeline.

// cargo add nom
use nom::{
    branch::alt,
    bytes::complete::{tag, take_until, take_while1},
    character::complete::{char, digit1, space1},
    combinator::{all_consuming, map, map_res},
    sequence::{delimited, separated_pair, terminated},
    IResult, Parser,
};

#[derive(Debug, PartialEq)]
enum Method {
    Get,
    Post,
    Put,
    Delete,
}

#[derive(Debug, PartialEq)]
struct LogLine<'a> {
    ip: &'a str,
    method: Method,
    path: &'a str,
    status: u16,
    bytes: u64,
}

fn ip(input: &str) -> IResult<&str, &str> {
    take_while1(|c: char| c.is_ascii_digit() || c == '.').parse(input)
}

fn method(input: &str) -> IResult<&str, Method> {
    alt((
        map(tag("GET"), |_| Method::Get),
        map(tag("POST"), |_| Method::Post),
        map(tag("PUT"), |_| Method::Put),
        map(tag("DELETE"), |_| Method::Delete),
    ))
    .parse(input)
}

fn u16_num(input: &str) -> IResult<&str, u16> {
    map_res(digit1, str::parse::<u16>).parse(input)
}

fn u64_num(input: &str) -> IResult<&str, u64> {
    map_res(digit1, str::parse::<u64>).parse(input)
}

// 127.0.0.1 - "GET /api/users" 200 1024
fn log_line(input: &str) -> IResult<&str, LogLine<'_>> {
    let (input, ip) = terminated(ip, tag(" - ")).parse(input)?;
    let (input, (method, path)) = delimited(
        char('"'),
        separated_pair(method, space1, take_until("\"")),
        char('"'),
    )
    .parse(input)?;
    let (input, _) = space1.parse(input)?;
    let (input, (status, bytes)) = separated_pair(u16_num, space1, u64_num).parse(input)?;
    Ok((input, LogLine { ip, method, path, status, bytes }))
}

fn parse_log(line: &str) -> Result<LogLine<'_>, String> {
    all_consuming(log_line)
        .parse(line)
        .map(|(_, parsed)| parsed)
        .map_err(|e| format!("invalid log line: {e}"))
}

fn main() {
    let line = r#"127.0.0.1 - "GET /api/users" 200 1024"#;
    match parse_log(line) {
        Ok(parsed) => println!("{parsed:#?}"),
        Err(e) => eprintln!("{e}"),
    }

    println!("{:?}", parse_log("garbage"));
}

Real output:

LogLine {
    ip: "127.0.0.1",
    method: Get,
    path: "/api/users",
    status: 200,
    bytes: 1024,
}
Err("invalid log line: Parsing Error: Error { input: \"garbage\", code: TakeWhile1 }")

Note the lifetime 'a on LogLine: the ip and path fields borrow slices directly out of the input string — no allocation, no copying. That zero-copy parsing is a defining nom strength and a big reason it is fast enough for line-rate log processing. The Method enum, meanwhile, is produced during parsing via map, so downstream code gets a real type to match on rather than a raw string.
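The zero-copy claim can be checked directly by comparing pointers. The sketch below uses only the standard library and a deliberately tiny splitter instead of nom; Request, split_request, and offset_within are hypothetical names introduced for illustration. The same borrowing relationship holds for the ip and path fields of LogLine.

```rust
#[derive(Debug, PartialEq)]
struct Request<'a> {
    method: &'a str,
    path: &'a str,
}

/// Split "GET /api/users" into two borrowed slices. Nothing is copied.
fn split_request(line: &str) -> Option<Request<'_>> {
    let (method, path) = line.split_once(' ')?;
    Some(Request { method, path })
}

/// Byte offset of `part` inside `whole`, or None if `part` does not
/// point into the buffer that backs `whole`.
fn offset_within(whole: &str, part: &str) -> Option<usize> {
    let start = whole.as_ptr() as usize;
    let p = part.as_ptr() as usize;
    (p >= start && p + part.len() <= start + whole.len()).then(|| p - start)
}

fn main() {
    let line = String::from("GET /api/users");
    let req = split_request(&line).expect("expected a space-separated request");
    println!("{req:?}");
    // The borrowed field points into `line` itself, 4 bytes from the start.
    println!("borrowed path offset: {:?}", offset_within(&line, req.path));
    // An owned copy lives in its own allocation, outside `line`.
    let owned = req.path.to_string();
    println!("owned path offset:    {:?}", offset_within(&line, &owned));
}
```

The lifetime parameter is what lets the compiler enforce this arrangement: a Request (or LogLine) cannot outlive the string it borrows from.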

For the pest equivalent of “structure with precedence,” here is a calculator that evaluates arithmetic with correct operator precedence and parentheses using pest’s built-in PrattParser:

WHITESPACE = _{ " " | "\t" }

integer     = @{ ASCII_DIGIT+ }
unary_minus = { "-" }
primary     = _{ integer | "(" ~ expr ~ ")" }
atom        = _{ unary_minus? ~ primary }

bin_op   = _{ add | subtract | multiply | divide }
    add      = { "+" }
    subtract = { "-" }
    multiply = { "*" }
    divide   = { "/" }

expr = { atom ~ (bin_op ~ atom)* }

// cargo add pest pest_derive
use pest::iterators::Pairs;
use pest::pratt_parser::{Assoc, Op, PrattParser};
use pest::Parser;
use pest_derive::Parser;

#[derive(Parser)]
#[grammar = "calc.pest"]
struct Calculator;

fn pratt() -> PrattParser<Rule> {
    PrattParser::new()
        .op(Op::infix(Rule::add, Assoc::Left) | Op::infix(Rule::subtract, Assoc::Left))
        .op(Op::infix(Rule::multiply, Assoc::Left) | Op::infix(Rule::divide, Assoc::Left))
        .op(Op::prefix(Rule::unary_minus))
}

fn eval(pairs: Pairs<Rule>, pratt: &PrattParser<Rule>) -> f64 {
    pratt
        .map_primary(|primary| match primary.as_rule() {
            Rule::integer => primary.as_str().parse::<f64>().unwrap(),
            Rule::expr => eval(primary.into_inner(), pratt), // parenthesized sub-expression
            rule => unreachable!("unexpected primary: {rule:?}"),
        })
        .map_prefix(|op, rhs| match op.as_rule() {
            Rule::unary_minus => -rhs,
            _ => unreachable!(),
        })
        .map_infix(|lhs, op, rhs| match op.as_rule() {
            Rule::add => lhs + rhs,
            Rule::subtract => lhs - rhs,
            Rule::multiply => lhs * rhs,
            Rule::divide => lhs / rhs,
            _ => unreachable!(),
        })
        .parse(pairs)
}

fn main() {
    let pratt = pratt();
    for src in ["1 + 2 * 3", "(1 + 2) * 3", "10 / 2 - 3", "-5 + 8"] {
        let expr = Calculator::parse(Rule::expr, src).unwrap().next().unwrap();
        println!("{src} = {}", eval(expr.into_inner(), &pratt));
    }
}

Real output:

1 + 2 * 3 = 7
(1 + 2) * 3 = 9
10 / 2 - 3 = 2
-5 + 8 = 3

The grammar declares what an expression is; the PrattParser configuration declares precedence (multiplication binds tighter than addition) and associativity. 1 + 2 * 3 = 7 (not 9) and (1 + 2) * 3 = 9 prove both are handled correctly — something no regex can do, because precedence-aware evaluation fundamentally requires a recursive parse tree.
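To see what the PrattParser is doing on your behalf, here is a hand-written precedence-climbing evaluator for the same four operators, using only the standard library. It is an illustrative sketch with hypothetical function names, not pest's algorithm verbatim; in real code the Best Practices advice to use the library's Pratt parser still applies.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Binding power of a binary operator; a higher number binds tighter.
fn precedence(op: char) -> Option<u8> {
    match op {
        '+' | '-' => Some(1),
        '*' | '/' => Some(2),
        _ => None,
    }
}

fn skip_spaces(chars: &mut Peekable<Chars<'_>>) {
    while chars.next_if(|c| c.is_whitespace()).is_some() {}
}

/// atom := '-' atom | integer | '(' expr ')'
fn atom(chars: &mut Peekable<Chars<'_>>) -> Option<f64> {
    skip_spaces(chars);
    match chars.peek().copied()? {
        '-' => {
            chars.next();
            atom(chars).map(|v| -v)
        }
        '(' => {
            chars.next();
            let value = expr(chars, 1)?; // recursion handles nesting
            skip_spaces(chars);
            chars.next_if_eq(&')').map(|_| value)
        }
        c if c.is_ascii_digit() => {
            let mut n = 0.0;
            while let Some(d) = chars.next_if(|c| c.is_ascii_digit()) {
                n = n * 10.0 + f64::from(d.to_digit(10)?);
            }
            Some(n)
        }
        _ => None,
    }
}

/// Parse and evaluate operators whose precedence is at least `min_prec`.
fn expr(chars: &mut Peekable<Chars<'_>>, min_prec: u8) -> Option<f64> {
    let mut lhs = atom(chars)?;
    loop {
        skip_spaces(chars);
        let Some(op) = chars.peek().copied() else { return Some(lhs) };
        let Some(prec) = precedence(op) else { return Some(lhs) };
        if prec < min_prec {
            return Some(lhs);
        }
        chars.next();
        // Left associativity: the right side may only absorb tighter operators.
        let rhs = expr(chars, prec + 1)?;
        lhs = match op {
            '+' => lhs + rhs,
            '-' => lhs - rhs,
            '*' => lhs * rhs,
            _ => lhs / rhs, // only '/' remains after the precedence check
        };
    }
}

/// Evaluate a whole string, rejecting trailing input (the EOI idea).
fn evaluate(src: &str) -> Option<f64> {
    let mut chars = src.chars().peekable();
    let value = expr(&mut chars, 1)?;
    skip_spaces(&mut chars);
    chars.next().is_none().then_some(value)
}

fn main() {
    for src in ["1 + 2 * 3", "(1 + 2) * 3", "10 / 2 - 3", "-5 + 8"] {
        println!("{src} = {:?}", evaluate(src));
    }
}
```

The recursive call with prec + 1 plays the role of the precedence table and Assoc::Left in the pest version, and the recursion inside atom is what lets parentheses nest to any depth.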

Exercises

Exercise 1: Parse environment-variable lines

Difficulty: Beginner

Objective: Use nom combinators to parse a single KEY = value line into a (String, String) pair, tolerating optional whitespace around the = and trimming the value.

Instructions: Write a function env_line(input: &str) -> IResult<&str, (String, String)> that parses lines like " DB_HOST = localhost " into ("DB_HOST", "localhost") and "PORT=8080" into ("PORT", "8080"). The key is alphanumeric plus underscores; the value runs to end-of-line. Trim surrounding whitespace from the value. Print the result for both inputs.

Tip: space0 matches zero-or-more spaces; take_while1/take_while match runs of characters by predicate; delimited(space0, char('='), space0) eats the = and any spaces around it.

Solution

// cargo add nom
use nom::{
    bytes::complete::{take_while, take_while1},
    character::complete::{char, space0},
    sequence::delimited,
    IResult, Parser,
};

fn env_line(input: &str) -> IResult<&str, (String, String)> {
    let (input, _) = space0.parse(input)?;
    let (input, k) = take_while1(|c: char| c.is_alphanumeric() || c == '_').parse(input)?;
    let (input, _) = delimited(space0, char('='), space0).parse(input)?;
    let (input, v) = take_while(|c: char| c != '\n').parse(input)?;
    Ok((input, (k.to_string(), v.trim().to_string())))
}

fn main() {
    println!("{:?}", env_line("  DB_HOST = localhost  "));
    println!("{:?}", env_line("PORT=8080"));
}

Real output:

Ok(("", ("DB_HOST", "localhost")))
Ok(("", ("PORT", "8080")))

Exercise 2: Parse a human duration into seconds

Difficulty: Intermediate

Objective: Build a nom parser for durations like 1h30m15s that returns the total number of seconds, using many1 to repeat a unit parser and all_consuming to reject trailing junk.

Instructions: Write duration_secs(input: &str) -> IResult<&str, u64> where each part is a number followed by a unit suffix h (3600s), m (60s), or s (1s). Sum all parts. "1h30m15s" should yield 5415, "90m" should yield 5400, and an invalid input like "5x" should produce an error. Wrap the repetition in all_consuming so partial matches fail.

Tip: Parse one unit_part (a number plus a unit), then use many1(unit_part) to collect a Vec<u64> and .into_iter().sum() to total them.

Solution

// cargo add nom
use nom::{
    branch::alt,
    bytes::complete::tag,
    character::complete::digit1,
    combinator::{all_consuming, map, map_res},
    multi::many1,
    IResult, Parser,
};

fn unit_part(input: &str) -> IResult<&str, u64> {
    let (input, n) = map_res(digit1, str::parse::<u64>).parse(input)?;
    let (input, mult) = alt((
        map(tag("h"), |_| 3600u64),
        map(tag("m"), |_| 60u64),
        map(tag("s"), |_| 1u64),
    ))
    .parse(input)?;
    Ok((input, n * mult))
}

fn duration_secs(input: &str) -> IResult<&str, u64> {
    map(all_consuming(many1(unit_part)), |parts| parts.into_iter().sum()).parse(input)
}

fn main() {
    println!("{:?}", duration_secs("1h30m15s").map(|(_, s)| s));
    println!("{:?}", duration_secs("90m").map(|(_, s)| s));
    println!("is_err for \"5x\": {:?}", duration_secs("5x").is_err());
}

Real output:

Ok(5415)
Ok(5400)
is_err for "5x": true

Exercise 3: Parse CSV records with pest

Difficulty: Advanced

Objective: Write a small pest grammar for a comma-separated file and walk the parse tree to produce Vec<Vec<String>>, one inner vector per record.

Instructions: Create a grammar with field, record, and file rules anchored by SOI/EOI, where a field is any run of characters that are not a comma or newline, records are comma-separated fields, and the file is newline-separated records. Parse the input "name,age,city\nAlice,30,NYC\nBob,25,LA" and print each record’s fields as a Vec.

Tip: A field can be written as { (!("," | NEWLINE) ~ ANY)* }. Iterate file.into_inner(), keep the Rule::record pairs, and map each record’s inner pairs to as_str().to_string().

Solution

Grammar (src/csv.pest):

field    = { (!("," | NEWLINE) ~ ANY)* }
record   = { field ~ ("," ~ field)* }
file     = { SOI ~ record ~ (NEWLINE ~ record)* ~ EOI }

Parser (src/main.rs):

// cargo add pest pest_derive
use pest::Parser;
use pest_derive::Parser;

#[derive(Parser)]
#[grammar = "csv.pest"]
struct CsvParser;

fn parse_csv(input: &str) -> Vec<Vec<String>> {
    let file = CsvParser::parse(Rule::file, input)
        .expect("parse failed")
        .next()
        .unwrap();

    file.into_inner()
        .filter(|p| p.as_rule() == Rule::record)
        .map(|record| {
            record
                .into_inner()
                .map(|f| f.as_str().to_string())
                .collect()
        })
        .collect()
}

fn main() {
    let input = "name,age,city\nAlice,30,NYC\nBob,25,LA";
    for record in parse_csv(input) {
        println!("{record:?}");
    }
}

Real output:

["name", "age", "city"]
["Alice", "30", "NYC"]
["Bob", "25", "LA"]

Note: This minimal grammar does not handle quoted fields containing commas (e.g. "Smith, John"). Real CSV is surprisingly subtle; for production use, reach for the dedicated csv crate, which integrates with serde for typed records. Hand-rolling a parser is a great learning exercise but rarely worth it when a battle-tested crate exists.
