Regular Expressions with the `regex` Crate


In JavaScript, regular expressions are a built-in language feature — /\d+/ is valid syntax and RegExp is always available. Rust has no regex literal and no regex in the standard library; instead, the community-standard regex crate provides a fast, predictable engine that you add as a dependency.

Quick Overview

The regex crate gives you Unicode-aware pattern matching that runs in linear time — it deliberately omits backtracking features (lookahead, backreferences) to guarantee it can never blow up on adversarial input. For a TypeScript/JavaScript developer the two big mental shifts are: regexes are compiled values you build once and reuse (not throwaway literals), and the engine cannot suffer catastrophic backtracking (ReDoS) because it is built on finite automata.

TypeScript/JavaScript Example

// JavaScript/TypeScript - regex is a language built-in

// A literal compiles when the script is parsed.
const DATE_RE = /(\d{4})-(\d{2})-(\d{2})/;

function parseDate(input: string): { year: string; month: string; day: string } | null {
  const m = DATE_RE.exec(input);
  if (m === null) return null;
  // Numbered groups via array indices, or named groups via .groups
  return { year: m[1], month: m[2], day: m[3] };
}

console.log(parseDate("2026-06-02")); // { year: '2026', month: '06', day: '02' }

// Find all matches with the /g flag
const emails = "ping a@b.com or c@d.org".match(/\w+@\w+\.\w+/g);
console.log(emails); // [ 'a@b.com', 'c@d.org' ]

// Replace with a backreference in the template
const masked = "a@b.com".replace(/(\w+)@(\w+\.\w+)/, "***@$2");
console.log(masked); // ***@b.com

// DANGER: a literal is cheap to *write*, but `new RegExp(str)` inside a
// hot loop recompiles every iteration, and some patterns can ReDoS.
const evil = /(a+)+$/; // catastrophic backtracking on "aaaa...X"

Key points:

Regex is built into the language; no import, no dependency.
A /.../ literal is compiled once; new RegExp(userInput) compiles each time it runs.
The engine is backtracking-based, so lookahead and backreferences work — but pathological patterns can hang the event loop (ReDoS).

Rust Equivalent

First add the crate to a project created with cargo new (the examples assume the current stable toolchain, Rust 1.96.0, on the 2024 edition):

cargo add regex

Or add it to Cargo.toml by hand:

[dependencies]
regex = "1.12"

use regex::Regex;
use std::sync::LazyLock;

// Compile ONCE for the whole program. `LazyLock` builds the Regex the first
// time it is touched, then hands out the same value forever after.
static DATE_RE: LazyLock<Regex> =
    LazyLock::new(|| Regex::new(r"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})").unwrap());

fn parse_date(input: &str) -> Option<(String, String, String)> {
    let caps = DATE_RE.captures(input)?;
    Some((
        caps["year"].to_string(),
        caps["month"].to_string(),
        caps["day"].to_string(),
    ))
}

fn main() {
    println!("{:?}", parse_date("2026-06-02"));
    // Some(("2026", "06", "02"))

    // Find all matches (the equivalent of the /g flag).
    let email_re = Regex::new(r"\w+@\w+\.\w+").unwrap();
    let emails: Vec<&str> = email_re
        .find_iter("ping a@b.com or c@d.org")
        .map(|m| m.as_str())
        .collect();
    println!("{emails:?}"); // ["a@b.com", "c@d.org"]

    // Replace with a captured group via $2 (or ${2} when ambiguous).
    let mask_re = Regex::new(r"(\w+)@(\w+\.\w+)").unwrap();
    let masked = mask_re.replace("a@b.com", "***@$2");
    println!("{masked}"); // ***@b.com
}

Running this prints exactly:

Some(("2026", "06", "02"))
["a@b.com", "c@d.org"]
***@b.com

Note: Patterns use raw string literals (r"...") so that backslashes are passed to the regex engine verbatim. Without the r, "\d" would be a Rust string-escape error. See 02-basics/01_types.md for string literal forms.

Detailed Explanation

No literal syntax. Rust has no /.../ token. A regex is an ordinary value of type Regex, constructed by parsing a pattern string with Regex::new. Parsing can fail (the pattern might be malformed), so Regex::new returns Result<Regex, regex::Error> — that is why the examples call .unwrap() on a known-good literal pattern.

Compile once, match many. Building a Regex is relatively expensive: the crate parses the pattern and constructs the finite automata it will execute. Matching against that compiled value is cheap. The idiom is therefore to construct the Regex exactly once and reuse it. The static ... LazyLock<Regex> block does precisely that: the closure runs on first access and the resulting Regex lives for the rest of the program. LazyLock has been in the standard library since Rust 1.80, so you no longer need the once_cell or lazy_static crates for this. (See 23-ecosystem/10_useful-crates.md for where once_cell still earns its keep.)

Captures. re.captures(text) returns Option<Captures>: None when nothing matched, mirroring JavaScript’s RegExp.exec returning null. Where JavaScript hands back undefined for a group that did not match, Rust offers two ways to read a group, and they differ in how they treat a missing one:

&caps[1] or &caps["year"] → &str, and panics if that group does not exist or did not participate in the match.
caps.get(1) or caps.name("year") → Option<Match>, the safe form for optional groups. A Match carries .as_str(), .start(), and .end() (byte offsets).

use regex::Regex;

fn main() {
    let re = Regex::new(r"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})").unwrap();
    let caps = re.captures("2026-06-02").unwrap();
    println!("year={} month={} day={}", &caps["year"], &caps["month"], &caps["day"]);
    println!("group 1 = {}", caps.get(1).unwrap().as_str());
}

This prints:

year=2026 month=06 day=02
group 1 = 2026

Iterating matches. Instead of a /g flag, you pick the method:

| You want | Method | Yields |
| --- | --- | --- |
| Does it match anywhere? | `is_match` | `bool` |
| First match | `find` | `Option<Match>` |
| All matches | `find_iter` | iterator of `Match` |
| First match with groups | `captures` | `Option<Captures>` |
| All matches with groups | `captures_iter` | iterator of `Captures` |
| Substitute | `replace` / `replace_all` | `Cow<str>` |
| Split on the pattern | `split` | iterator of `&str` |

use regex::Regex;

fn main() {
    let re = Regex::new(r"(?<key>\w+)=(?<val>\d+)").unwrap();
    for caps in re.captures_iter("a=1 b=22 c=333") {
        println!("{} -> {}", &caps["key"], &caps["val"]);
    }
}

Output:

a -> 1
b -> 22
c -> 333

The no-backtracking guarantee. The crate is built on finite automata, not a recursive backtracking matcher. The upside is a hard worst-case bound: matching is O(m × n), where m is proportional to the size of the pattern and n to the length of the input, with no exponential cliff. The downside is that features which require backtracking are simply not in the syntax: lookahead/lookbehind ((?=...), (?<=...)) and backreferences (\1, \k<name>). This is the single biggest behavioral difference from JavaScript’s RegExp. The classic ReDoS pattern below is harmless here, because for a fixed pattern the running time grows only linearly with the input:

use regex::Regex;

fn main() {
    let re = Regex::new(r"(a+)+$").unwrap();
    let evil = "a".repeat(40) + "X"; // the input that hangs a backtracking engine
    println!("catastrophic? {}", re.is_match(&evil));
}

Output (returns immediately, no hang):

catastrophic? false

The same (a+)+$ against "aaaa…X" can freeze a JavaScript engine for seconds or minutes. Rust’s engine answers instantly because it never backtracks.

Key Differences

| Aspect | JavaScript/TypeScript `RegExp` | Rust `regex` crate |
| --- | --- | --- |
| Availability | Built into the language | External crate (`cargo add regex`) |
| Literal syntax | `/pattern/flags` | None — `Regex::new(r"pattern")` |
| Construction cost | Hidden; recompiles with `new RegExp` | Explicit; compile once, reuse |
| Failure on bad pattern | Throws `SyntaxError` at runtime | `Result` from `Regex::new` |
| No-match result | `null` (exec) / `false` (test) | `None` / `false` |
| Lookahead / lookbehind | Supported | Not supported |
| Backreferences | Supported | Not supported |
| Worst-case time | Exponential (ReDoS possible) | Linear — guaranteed no ReDoS |
| Unicode | Per-flag (`u`, `v`) | Unicode-aware by default |
| Global iteration | `/g` flag + `matchAll` | `find_iter` / `captures_iter` |
| Case-insensitive | `/i` flag | `(?i)` inline flag |

The trade-off is deliberate: by dropping the two backtracking-only features, the crate buys a guarantee you cannot get from RegExp — that an attacker who controls the input string can never make your matcher hang. When you genuinely need lookahead, backreferences, or a recursive grammar, reach for a real parser instead (see 23-ecosystem/09_parsing.md).

Common Pitfalls

Pitfall 1: Compiling the regex inside a hot function

The most common performance mistake is constructing the Regex on every call, paying the compilation cost repeatedly:

use regex::Regex;

fn is_valid_slug(s: &str) -> bool {
    // Recompiles the pattern on EVERY call: slow in a loop.
    let re = Regex::new(r"^[a-z0-9]+(?:-[a-z0-9]+)*$").unwrap();
    re.is_match(s)
}

This compiles and runs correctly, but in a loop over thousands of inputs it can be orders of magnitude slower than compiling once. The crate’s own documentation calls this out explicitly. The fix is a static LazyLock<Regex>:

use regex::Regex;
use std::sync::LazyLock;

fn is_valid_slug(s: &str) -> bool {
    static RE: LazyLock<Regex> =
        LazyLock::new(|| Regex::new(r"^[a-z0-9]+(?:-[a-z0-9]+)*$").unwrap());
    RE.is_match(s)
}

This is the same instinct a JavaScript developer already has (hoist a new RegExp(...) out of a loop), expressed here as a function-local static that is initialized on first use and shared by every later call.

Pitfall 2: Expecting lookahead or backreferences to work

A JavaScript pattern that uses (?=...) or \1 still lets your Rust program compile, because the pattern is just a string to rustc. The failure comes at runtime: Regex::new parses the pattern, rejects it, and returns an Err. With a lookahead:

use regex::Regex;

fn main() {
    let result = Regex::new(r"foo(?=bar)"); // lookahead
    match result {
        Ok(_) => println!("compiled"),
        Err(e) => println!("ERROR:\n{e}"),
    }
}

The real error printed is:

ERROR:
regex parse error:
    foo(?=bar)
       ^^^
error: look-around, including look-ahead and look-behind, is not supported

A backreference fails the same way:

use regex::Regex;

fn main() {
    if let Err(e) = Regex::new(r"(\w+)\s\1") {
        println!("{e}");
    }
}

Real output:

regex parse error:
    (\w+)\s\1
           ^^
error: backreferences are not supported

Warning: Because the failure is a runtime Err, calling .unwrap() on such a pattern compiles fine and then panics at the first execution. Validate patterns you build from non-literal strings instead of unwrapping.

Pitfall 3: Panicking on an optional capture group

&caps[1] panics if group 1 did not participate in the match. When a group sits behind ? or | and might be absent, use the Option-returning form:

use regex::Regex;

fn main() {
    let re = Regex::new(r"(\d+)(?:px)?").unwrap();
    let caps = re.captures("16").unwrap();

    // Fine here, but `&caps[2]` would panic: there is no group 2 at all.
    // The safe pattern for optional groups:
    match caps.get(1) {
        Some(m) => println!("number = {}", m.as_str()),
        None => println!("no number"),
    }
}

Pitfall 4: Forgetting raw strings

Writing Regex::new("\d+") is not a regex bug — it is a Rust string bug. \d is not a valid Rust escape, so the file will not compile (unknown character escape: d). Always use a raw string literal r"\d+" for patterns. For a pattern that itself contains a quote, use r#"..."#.

Best Practices

Compile once. Store each pattern in a static LazyLock<Regex> (or build it once at startup and pass it around). Never compile inside a hot path.
Always use raw strings. r"..." keeps backslashes literal; r#"..."# when the pattern contains ".
Prefer named captures. (?<name>...) and &caps["name"] survive refactors that reorder groups; numbered indices do not.
Use non-capturing groups (?:...) when you only need grouping, not extraction — it is clearer and slightly cheaper.
Reach for the right method. Use is_match when you only need a yes/no answer; it can stop at the first match and skip building Captures.
Validate untrusted patterns. If a pattern comes from user input or config, handle the Result from Regex::new rather than .unwrap(). Optionally cap how large the compiled pattern may grow with RegexBuilder::size_limit.
Don’t reach for regex when you need a grammar. Nested or recursive structure (JSON, source code, balanced brackets) is a parser’s job — see 23-ecosystem/09_parsing.md.
Use the bytes API for non-UTF-8 data. regex::bytes::Regex matches over &[u8] when the input is not guaranteed valid UTF-8.

Real-World Example

Parsing an NGINX/Apache-style access log line is a job you might do with String.prototype.match in Node. Here it uses a single compiled pattern with named captures, written in verbose mode ((?x)) so that the pattern can carry its own comments:

use regex::Regex;
use std::sync::LazyLock;

#[derive(Debug)]
struct LogLine {
    ip: String,
    method: String,
    path: String,
    status: u16,
}

// `(?x)` enables verbose mode: insignificant whitespace and `#` comments are
// ignored, so a complex pattern stays readable.
static LOG_RE: LazyLock<Regex> = LazyLock::new(|| {
    Regex::new(
        r#"(?x)
        ^(?<ip>\d{1,3}(?:\.\d{1,3}){3})   # client IP
        \s-\s-\s
        \[[^\]]+\]\s                       # timestamp (ignored)
        "(?<method>[A-Z]+)\s
        (?<path>\S+)\s
        HTTP/[\d.]+"\s
        (?<status>\d{3})                   # status code
        "#,
    )
    .unwrap()
});

fn parse_line(line: &str) -> Option<LogLine> {
    let caps = LOG_RE.captures(line)?;
    Some(LogLine {
        ip: caps["ip"].to_string(),
        method: caps["method"].to_string(),
        path: caps["path"].to_string(),
        status: caps["status"].parse().ok()?, // group is \d{3}, parse cannot overflow u16
    })
}

fn main() {
    let line = r#"192.168.1.10 - - [02/Jun/2026:09:15:32 +0000] "GET /api/users HTTP/1.1" 200"#;
    match parse_line(line) {
        Some(entry) => println!("{entry:?}"),
        None => println!("no match"),
    }
}

Real output:

LogLine { ip: "192.168.1.10", method: "GET", path: "/api/users", status: 200 }

Two things to notice. The pattern is compiled exactly once for the whole process, so parsing a million log lines pays the build cost a single time. And the named groups feed straight into typed struct fields — status is parsed into a u16, turning a stringly-typed regex match into structured data the rest of your program can rely on (see 06-data-structures/00_structs.md).

Exercises

Exercise 1: Hex color validator

Difficulty: Beginner

Objective: Practice compiling a pattern once and using is_match.

Instructions: Write fn is_hex_color(s: &str) -> bool that returns true for strings like #fff or #ffaa00 (a # followed by exactly 3 or 6 hexadecimal digits) and false otherwise. Compile the Regex exactly once with LazyLock. Verify that #ggg is rejected.

Solution

use regex::Regex;
use std::sync::LazyLock;

static HEX_RE: LazyLock<Regex> =
    LazyLock::new(|| Regex::new(r"^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$").unwrap());

fn is_hex_color(s: &str) -> bool {
    HEX_RE.is_match(s)
}

fn main() {
    println!("{}", is_hex_color("#fff"));     // true
    println!("{}", is_hex_color("#ffaa00"));  // true
    println!("{}", is_hex_color("#ggg"));     // false
}

Output:

true
true
false

The ^...$ anchors prevent partial matches, and (?:...) groups the two alternatives without creating a capture group.

Exercise 2: Extract unique hashtags

Difficulty: Intermediate

Objective: Use captures_iter and a numbered capture group, and deduplicate while preserving order.

Instructions: Write fn hashtags(text: &str) -> Vec<String> that finds every #word token, lowercases it, and returns the tags in first-seen order with duplicates removed. For input "Loving #Rust and #rust and #WASM" the result should be ["rust", "wasm"].

Solution

use regex::Regex;
use std::collections::HashSet;
use std::sync::LazyLock;

fn hashtags(text: &str) -> Vec<String> {
    static RE: LazyLock<Regex> = LazyLock::new(|| Regex::new(r"#(\w+)").unwrap());
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for caps in RE.captures_iter(text) {
        let tag = caps[1].to_lowercase();
        if seen.insert(tag.clone()) {
            out.push(tag);
        }
    }
    out
}

fn main() {
    println!("{:?}", hashtags("Loving #Rust and #rust and #WASM"));
}

Output:

["rust", "wasm"]

HashSet::insert returns false when the value was already present, which is exactly the “skip duplicates” test. The Vec keeps insertion order.

Exercise 3: Redact phone numbers with a replacement closure

Difficulty: Advanced

Objective: Use replace_all with a closure that inspects each match, rather than a static template string.

Instructions: Write fn redact(text: &str) -> String that finds US-style phone numbers of the form NNN-NNN-NNNN (anchored on word boundaries) and rewrites each to keep only the area code, replacing the rest with asterisks — e.g. 415-555-0199 becomes 415-***-****. Pass a closure to replace_all so you can build the replacement from the captured area code.

Solution

use regex::{Captures, Regex};
use std::sync::LazyLock;

fn redact(text: &str) -> String {
    static RE: LazyLock<Regex> =
        LazyLock::new(|| Regex::new(r"\b(\d{3})-(\d{3})-(\d{4})\b").unwrap());
    RE.replace_all(text, |caps: &Captures| {
        format!("{}-***-****", &caps[1])
    })
    .into_owned()
}

fn main() {
    println!("{}", redact("Call 415-555-0199 or 212-555-0188"));
}

Output:

Call 415-***-**** or 212-***-****

replace_all accepts anything implementing the Replacer trait, including a closure that takes &Captures and returns a String (more generally, any FnMut(&Captures) -> T where T: AsRef<str>). It returns a Cow<str> (borrowed when nothing matched, owned otherwise); .into_owned() yields a plain String.
