String Manipulation

Splitting, trimming, replacing, searching, parsing, and building text are everyday tasks. In TypeScript nearly all of these live on String.prototype (parsing and joining are the exceptions); in Rust they are split across &str methods, the String type, and the iterator system. This page is your translation table.

Quick Overview

Most of what you do with JavaScript’s string methods (.split(), .trim(), .replace(), .toUpperCase(), parseInt, building up text) has a direct Rust counterpart on &str and String. The two big surprises for a TypeScript developer: many “transforming” methods return iterators (lazy, zero-allocation) instead of arrays, and you cannot index a string by integer (s[0] does not compile) because Rust strings are UTF-8 byte sequences, not arrays of characters.

Note: This page focuses on operations. For the foundational distinction between String (owned, growable) and &str (a borrowed view), and why slicing happens on byte boundaries, read strings.md first.

TypeScript/JavaScript Example

// A small log-line processor: the kind of code you write all the time.
function parseLogLine(line: string): { level: string; message: string } {
  const trimmed = line.trim();
  const level = trimmed.split(/\s+/)[1] ?? "UNKNOWN";
  const message = trimmed.replace("ERROR", "ERR");
  return { level, message };
}

// Splitting, joining, searching, casing
const csv = "name,age,city";
const fields = csv.split(",");                 // ["name", "age", "city"]
const rejoined = fields.join(" | ");           // "name | age | city"

const greeting = "Hello, World";
greeting.toUpperCase();                          // "HELLO, WORLD"
greeting.includes("World");                      // true
greeting.startsWith("Hello");                    // true
greeting.indexOf("o");                           // 4

// Parsing numbers
const n = parseInt("42", 10);                    // 42
const f = parseFloat("3.14");                    // 3.14
const bad = Number("not a number");              // NaN (no error!)

// Building strings
let buf = "";
for (const w of ["a", "b", "c"]) buf += w;       // "abc"
const bio = `${"Ada"} is ${36}`;                 // template literal

// Iterating characters
for (const ch of "héllo") console.log(ch);       // h é l l o (code points)
"hello".length;                                  // 5
"héllo".length;                                  // 5 (UTF-16 code units!)

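Apart from the global parsing functions (parseInt, parseFloat, Number) and Array’s join, every method above lives on the single string type, and indexing (csv[0]) returns a one-character string (strictly, one UTF-16 code unit). JavaScript hides the underlying UTF-16 representation almost entirely, until you hit emoji or surrogate pairs.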

Rust Equivalent

fn main() {
    // The same log-line processor.
    let line = "  2026-05-31 ERROR  Disk full on /dev/sda1  ";

    let trimmed = line.trim();                                  // &str view, no allocation
    let level = trimmed.split_whitespace().nth(1).unwrap_or("UNKNOWN");
    let message = trimmed.replace("ERROR", "ERR");              // allocates a new String
    println!("level={level} message={message}");

    // Splitting, joining, searching, casing
    let csv = "name,age,city";
    let fields: Vec<&str> = csv.split(',').collect();           // collect the iterator
    let rejoined = fields.join(" | ");
    println!("{rejoined}");

    let greeting = "Hello, World";
    println!("{}", greeting.to_uppercase());                   // "HELLO, WORLD" (new String)
    println!("{}", greeting.contains("World"));                // true
    println!("{}", greeting.starts_with("Hello"));             // true
    println!("{:?}", greeting.find('o'));                      // Some(4)  (byte index)

    // Parsing numbers: returns Result, never silently NaN
    let n: i32 = "42".parse().unwrap();                        // 42
    let f: f64 = "3.14".parse().unwrap();                      // 3.14
    let bad: Result<i32, _> = "not a number".parse();          // Err(...)
    println!("{n} {f} {}", bad.is_err());

    // Building strings
    let mut buf = String::new();
    for w in ["a", "b", "c"] { buf.push_str(w); }              // "abc"
    let bio = format!("{} is {}", "Ada", 36);
    println!("{buf} / {bio}");

    // Iterating characters (Unicode scalar values, not bytes)
    for ch in "héllo".chars() { print!("{ch} "); }
    println!();
    println!("{}", "héllo".len());                             // 6 (BYTES, not chars!)
    println!("{}", "héllo".chars().count());                   // 5 (characters)
}

level=ERROR message=2026-05-31 ERR  Disk full on /dev/sda1
name | age | city
HELLO, WORLD
true
true
Some(4)
42 3.14 true
abc / Ada is 36
h é l l o
6
5

Detailed Explanation

`trim` returns a borrowed slice, `replace` allocates

line.trim() returns a &str that points into the original line — no new memory is allocated, it just narrows the start/end. By contrast trimmed.replace("ERROR", "ERR") must build a brand-new String, because the result has different contents than the input. This split is a recurring theme: methods that only narrow or scan return borrowed &str; methods that produce different text return an owned String. TypeScript hides this distinction because every string is heap-allocated and immutable.
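The borrowed-versus-owned split can be observed directly by comparing addresses. The sketch below is only an illustration: the helper `is_subslice` is a hypothetical name introduced here, not something from the standard library, and it simply checks whether one `&str` lies inside the memory of another.

```rust
/// Hypothetical helper: true if `inner` points into the memory of `outer`.
fn is_subslice(outer: &str, inner: &str) -> bool {
    let outer_start = outer.as_ptr() as usize;
    let inner_start = inner.as_ptr() as usize;
    inner_start >= outer_start && inner_start + inner.len() <= outer_start + outer.len()
}

fn main() {
    let line = "  ERROR disk full  ";

    // `trim` only narrows the view: same buffer, shorter range.
    let trimmed: &str = line.trim();
    println!("{}", is_subslice(line, trimmed)); // true

    // `replace` produces different text, so it returns a freshly allocated String.
    let replaced: String = trimmed.replace("ERROR", "ERR");
    println!("{}", is_subslice(line, &replaced)); // false
    println!("{replaced}"); // ERR disk full
}
```

In everyday code you would not inspect pointers like this; the return types (`&str` versus `String`) already tell you which methods allocate.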

Splitting produces a lazy iterator, not an array

csv.split(',') does not give you a Vec. It returns a Split iterator that yields each piece on demand. That is why we wrote let fields: Vec<&str> = csv.split(',').collect(); — .collect() runs the iterator and gathers the results. If you only need to walk the pieces once, skip the Vec entirely and iterate directly:

fn main() {
    for field in "name,age,city".split(',') {
        println!("- {field}");
    }
}

This laziness mirrors how Rust treats all sequence transformations; see iterators.md.

`find` returns a byte index wrapped in `Option`

JavaScript’s indexOf returns -1 when not found. Rust’s find returns Option<usize> — Some(index) or None — so the “not found” case is part of the type and the compiler forces you to handle it. The number it returns is a byte offset, which matters for non-ASCII text (more below).
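A small sketch of handling the Option and using the byte offset that find returns. The helper `value_after_equals` is a hypothetical name chosen for this example.

```rust
/// Hypothetical helper: the text after the first `=` in `pair`, if there is one.
fn value_after_equals(pair: &str) -> Option<&str> {
    // `find` yields the byte offset of the match. `=` is one byte long,
    // so `idx + 1` is guaranteed to be a character boundary.
    let idx = pair.find('=')?;
    Some(&pair[idx + 1..])
}

fn main() {
    println!("{:?}", value_after_equals("level=ERROR"));  // Some("ERROR")
    println!("{:?}", value_after_equals("no separator")); // None

    // The offset counts bytes: 'é' occupies bytes 1 and 2, so the first 'l' is at 3.
    println!("{:?}", "héllo".find('l')); // Some(3)
}
```

The `?` operator returns None early when the separator is missing, which plays the role of the `if (i === -1)` check you would write in JavaScript.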

`parse` returns `Result`, never a silent `NaN`

Number("not a number") in JavaScript yields NaN, a value that silently poisons later arithmetic. Rust’s .parse() returns Result<T, E>, and you must deal with the error before you can use the value. The target type drives the parsing: annotate the binding (let n: i32 = "42".parse()...) or use the turbofish form ("42".parse::<i32>()). Error handling for Result is covered in ../08-error-handling/README.md.
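As a rough illustration of handling the Result rather than unwrapping it, here is a sketch built around a hypothetical helper, `parse_port`, invented for this example. It also shows two failure modes that JavaScript would not report as errors: out-of-range values and non-numeric text.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: parse a TCP port, handing any error back to the caller.
fn parse_port(input: &str) -> Result<u16, ParseIntError> {
    // `parse` does not trim whitespace for you, so do it explicitly.
    input.trim().parse::<u16>()
}

fn main() {
    for raw in ["8080", " 443 ", "70000", "http"] {
        match parse_port(raw) {
            Ok(port) => println!("{raw:?} -> port {port}"),
            Err(e) => println!("{raw:?} -> error: {e}"),
        }
    }
}
```

"70000" fails because it does not fit in a u16, and "http" fails because it contains no digits; both arrive as Err values that the match has to account for.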

`chars()` vs `bytes()` vs `len()`

A Rust string is UTF-8 bytes. "héllo".len() is 6 because é takes two bytes. To count characters (Unicode scalar values) use .chars().count(), which is 5. JavaScript’s .length counts UTF-16 code units, so it gives 5 here but 2 for a single emoji such as "\u{1F600}" (a surrogate pair), where Rust reports 4 bytes and 1 char. The three counts (bytes, scalar values, UTF-16 code units) do not agree in general, so be explicit about which one you want.
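To see the three counts side by side, the following sketch uses a hypothetical helper named `lengths`. The third number comes from `encode_utf16()`, which is what JavaScript’s .length would report for the same text.

```rust
/// Hypothetical helper: (UTF-8 bytes, Unicode scalar values, UTF-16 code units).
fn lengths(s: &str) -> (usize, usize, usize) {
    (s.len(), s.chars().count(), s.encode_utf16().count())
}

fn main() {
    println!("{:?}", lengths("hello"));     // (5, 5, 5)
    println!("{:?}", lengths("héllo"));     // (6, 5, 5)
    println!("{:?}", lengths("\u{1F600}")); // (4, 1, 2)
}
```

Note that `.chars().count()` walks the whole string, so it is O(n), while `.len()` is a stored value and O(1).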

Key Differences

Task	TypeScript/JavaScript	Rust
Split	`s.split(",")` → array	`s.split(',')` → lazy iterator (`.collect()` for `Vec`)
Split on whitespace	`s.split(/\s+/)` (regex)	`s.split_whitespace()` (built-in, no regex)
Split into lines	`s.split("\n")`	`s.lines()` (handles `\n` and `\r\n`)
Trim	`s.trim()`	`s.trim()` → borrowed `&str`
Replace all	`s.replaceAll("a","b")`	`s.replace("a", "b")` (all by default)
Replace first	`s.replace("a","b")`	`s.replacen("a", "b", 1)`
Uppercase	`s.toUpperCase()`	`s.to_uppercase()` → new `String`
Contains	`s.includes(x)`	`s.contains(x)`
Index of	`s.indexOf(x)` → `-1` if absent	`s.find(x)` → `Option<usize>` (byte index)
Parse int	`parseInt(s)` → `NaN` on fail	`s.parse::<i32>()` → `Result`
Char access	`s[0]` → 1-char string	`s.chars().nth(0)` → `Option<char>`
Length	`s.length` (UTF-16 units)	`s.len()` (bytes) / `.chars().count()` (chars)
Concatenate	`a + b`, `${a}${b}`	`format!("{a}{b}")`, `a.push_str(b)`, `a + &b`
Join	`arr.join(", ")`	`arr.join(", ")`
Repeat	`s.repeat(3)`	`s.repeat(3)`

Tip: When a JavaScript method returns an array (split, match), the Rust analog almost always returns an iterator. Add .collect::<Vec<_>>() only when you actually need a collection.
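A brief sketch of working with those iterators without collecting. The helper `count_error_lines` and the sample log text are made up for this example.

```rust
/// Hypothetical helper: count lines that start with "error", with no intermediate Vec.
fn count_error_lines(log: &str) -> usize {
    // `lines()` is lazy and treats both "\n" and "\r\n" as line endings.
    log.lines().filter(|line| line.starts_with("error")).count()
}

fn main() {
    let log = "ok\nerror: disk\r\nok\nerror: net";
    println!("{}", count_error_lines(log)); // 2

    // `matches` is the iterator counterpart of a global JS `match`;
    // collect it only because we want to print the whole list.
    let oks: Vec<&str> = log.matches("ok").collect();
    println!("{oks:?}"); // ["ok", "ok"]
}
```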

Why no integer indexing?

s[0] is rejected at compile time in Rust. UTF-8 means byte 0 might be only half of a character, so returning “the first character” by byte index would be a footgun. Rust makes you choose your intent: .chars().next() for the first character, .bytes().next() for the first byte, or &s[0..n] for a byte-range slice (which panics if the range falls inside a character).
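The three intents look like this in practice. The helpers `first_char` and `first_byte` are hypothetical wrappers added only to make the return types visible.

```rust
/// Hypothetical helper: the first Unicode scalar value, if any.
fn first_char(s: &str) -> Option<char> {
    s.chars().next()
}

/// Hypothetical helper: the first byte of the UTF-8 encoding, if any.
fn first_byte(s: &str) -> Option<u8> {
    s.bytes().next()
}

fn main() {
    let s = "éa";
    println!("{:?}", first_char(s)); // Some('é')
    println!("{:?}", first_byte(s)); // Some(195), the first of é's two bytes
    println!("{:?}", &s[0..2]);      // "é": bytes 0..2 cover exactly one character
    println!("{:?}", first_char("")); // None: an empty string has no first character
}
```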

Common Pitfalls

Pitfall 1: Trying to index a string by integer

fn main() {
    let s = String::from("hello");
    let first = s[0]; // does not compile (error E0277: `str` cannot be indexed by `{integer}`)
    println!("{first}");
}

Real compiler error:

error[E0277]: the type `str` cannot be indexed by `{integer}`
 --> src/main.rs:3:19
  |
3 |     let first = s[0];
  |                   ^ string indices are ranges of `usize`
  |
  = help: the trait `SliceIndex<str>` is not implemented for `{integer}`
  = note: you can use `.chars().nth()` or `.bytes().nth()`

Fix: Use s.chars().next() for the first character (returns Option<char>), or &s[0..1] for a byte slice when you know the boundary is safe.

Pitfall 2: `parse` without a target type

fn main() {
    let n = "42".parse().unwrap(); // does not compile (error E0284: type annotations needed)
    println!("{n}");
}

Real compiler error:

error[E0284]: type annotations needed
 --> src/main.rs:2:9
  |
2 |     let n = "42".parse().unwrap();
  |         ^        ----- type must be known at this point
  |
help: consider giving `n` an explicit type
  |
2 |     let n: /* Type */ = "42".parse().unwrap();
  |          ++++++++++++

parse is generic over any type that implements FromStr, so the compiler cannot guess. Fix: annotate the binding (let n: i32 = ...) or use the turbofish ("42".parse::<i32>()).
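A compact sketch of the ways the target type can be supplied. The third form relies on a hypothetical helper, `parse_age`, whose return type does the annotating.

```rust
/// Hypothetical helper: the declared return type tells `parse` what to produce.
fn parse_age(s: &str) -> Option<u8> {
    s.parse().ok()
}

fn main() {
    let a: i32 = "42".parse().unwrap();   // annotation on the binding
    let b = "42".parse::<i32>().unwrap(); // turbofish on the call
    let c = parse_age("42");              // inferred from the function signature
    println!("{a} {b} {c:?}");            // 42 42 Some(42)
}
```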

Pitfall 3: Slicing across a character boundary

Byte-range slicing compiles fine but panics at runtime if the boundary cuts a multi-byte character in half:

fn main() {
    let s = "héllo";       // é is 2 bytes, so bytes are: h(0) é(1..3) l(3) l(4) o(5)
    let sub = &s[0..2];    // panics at runtime: byte 2 is inside 'é'
    println!("{sub}");
}

Real runtime output:

thread 'main' panicked at src/main.rs:3:17:
byte index 2 is not a char boundary; it is inside 'é' (bytes 1..3) of `héllo`

Fix: iterate with .chars() / .char_indices(), or use .get(0..2) which returns Option<&str> (None instead of panicking).
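Both fixes can be sketched as follows. The helper `take_chars` is a hypothetical name; it is one possible way to take the first n characters without ever cutting one in half.

```rust
/// Hypothetical helper: the first `n` characters of `s`, never splitting one.
fn take_chars(s: &str, n: usize) -> &str {
    // `char_indices` pairs each char with its starting byte offset, so the
    // offset of the (n+1)-th char is always a valid place to cut.
    match s.char_indices().nth(n) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // `s` has n or fewer chars: keep all of it
    }
}

fn main() {
    let s = "héllo";
    println!("{:?}", s.get(0..2));    // None: byte 2 is inside 'é'
    println!("{:?}", s.get(0..3));    // Some("hé")
    println!("{}", take_chars(s, 2)); // hé
}
```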

Pitfall 4: Using a `String` after `+` consumes it

The + operator takes the left operand by value (it moves it):

fn main() {
    let a = String::from("foo");
    let b = String::from("bar");
    let c = a + &b;          // `a` is moved into the result
    println!("{a} {c}");     // does not compile (error E0382: borrow of moved value: `a`)
}

Real compiler error (abridged):

error[E0382]: borrow of moved value: `a`
 --> src/main.rs:5:16
  |
4 |     let c = a + &b;
  |             - value moved here
5 |     println!("{a} {c}");
  |                ^ value borrowed here after move
help: consider cloning the value if the performance cost is acceptable

Fix: prefer format!("{a}{b}") (borrows both, clearer intent), or clone if you genuinely need a afterward. Note the asymmetry: with a + &b, the left side must be an owned String and the right side must be a &str.
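A sketch of the two fixes. The helper `join_pair` is hypothetical and exists only to show that a borrowing signature is enough when format! does the work.

```rust
/// Hypothetical helper: concatenate two strings while only borrowing them.
fn join_pair(a: &str, b: &str) -> String {
    format!("{a}{b}")
}

fn main() {
    let a = String::from("foo");
    let b = String::from("bar");

    // `format!` borrows its arguments, so `a` and `b` stay usable afterwards.
    let c = join_pair(&a, &b);
    println!("{a} {b} {c}"); // foo bar foobar

    // If `+` is preferred, clone the left side to keep the original alive.
    let d = a.clone() + &b;
    println!("{a} {d}"); // foo foobar
}
```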

Pitfall 5: Expecting `split` to return a `Vec`

let parts = "a,b".split(','); gives you an iterator, not a Vec. Calling .len() on it does not compile, because Split cannot know how many pieces remain without scanning the string (it does not implement ExactSizeIterator). Fix: add .collect::<Vec<_>>() when you need indexing or a length, or call .count() to consume it just for the number of items.
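The options look like this in code. The helper `second_field` is a hypothetical addition showing a third route: when you want a single piece, `.nth()` avoids the Vec entirely.

```rust
/// Hypothetical helper: the second comma-separated field, without collecting.
fn second_field(input: &str) -> Option<&str> {
    input.split(',').nth(1)
}

fn main() {
    let input = "a,b,c";

    // Only the number of pieces is needed: consume the iterator with `count`.
    println!("{}", input.split(',').count()); // 3

    // Indexing or `len` is needed: collect into a Vec first.
    let parts: Vec<&str> = input.split(',').collect();
    println!("{} {}", parts.len(), parts[1]); // 3 b

    println!("{:?}", second_field(input)); // Some("b")
}
```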

Best Practices

Prefer &str parameters. A function that reads text should take &str, not &String — it accepts both String and string literals via deref coercion:

fn shout(s: &str) -> String { s.to_uppercase() }

fn main() {
    let owned = String::from("hi");
    println!("{}", shout(&owned)); // &String coerces to &str
    println!("{}", shout("there")); // literal works too
}

Use format! for readable concatenation, push_str/push in hot loops, and String::with_capacity(n) when you know the rough output size to avoid reallocations (see collection-performance.md).
Reach for the right splitter: split_whitespace() instead of a regex for whitespace; lines() instead of split('\n') so you get \r\n handling for free; split_once(d) when you expect exactly one delimiter (returns Option<(&str, &str)>).
Build strings with iterators + collect() for transformations: text.chars().filter(..).collect::<String>() is idiomatic and usually performs on par with a manual push loop.
Be deliberate about bytes vs chars. Use .bytes() / .len() for protocol/encoding work, .chars() / .chars().count() for user-facing text.
Use strip_prefix / strip_suffix (returning Option<&str>) instead of manual slicing to safely peel known prefixes like "https://".
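Several of these practices can be sketched in a few lines. All four helper names below (`parse_pair`, `host_and_path`, `digits_only`, `repeat_word`) are hypothetical and exist only for illustration; the URL is a placeholder.

```rust
/// Hypothetical helper: split "key=value" at the first `=`.
fn parse_pair(s: &str) -> Option<(&str, &str)> {
    s.split_once('=')
}

/// Hypothetical helper: drop a leading "https://" if it is present.
fn host_and_path(url: &str) -> &str {
    url.strip_prefix("https://").unwrap_or(url)
}

/// Hypothetical helper: keep only ASCII digits, built with filter + collect.
fn digits_only(s: &str) -> String {
    s.chars().filter(|c| c.is_ascii_digit()).collect()
}

/// Hypothetical helper: repeat `word` `n` times, reserving the space up front.
fn repeat_word(word: &str, n: usize) -> String {
    let mut out = String::with_capacity(word.len() * n);
    for _ in 0..n {
        out.push_str(word);
    }
    out
}

fn main() {
    println!("{:?}", parse_pair("mode=fast=safe"));         // Some(("mode", "fast=safe"))
    println!("{}", host_and_path("https://example.com/a")); // example.com/a
    println!("{}", digits_only("+1 (555) 010-9999"));       // 15550109999
    println!("{}", repeat_word("ab", 3));                   // ababab
}
```

For the last case the standard `str::repeat` already does the job; the manual loop is shown only to illustrate `String::with_capacity` together with `push_str`.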

Real-World Example

A URL-slug generator — the kind of helper a web backend uses to turn an article title into a clean path segment. It exercises trimming, casing, character classification, filtering, splitting, and joining.

/// Turn an arbitrary title into a lowercase, hyphen-separated slug.
/// Non-alphanumeric runs collapse to a single hyphen; edges are trimmed.
fn slugify(title: &str) -> String {
    title
        .trim()
        .to_lowercase()
        .chars()
        .map(|c| if c.is_alphanumeric() { c } else { '-' })
        .collect::<String>()       // intermediate: "hello--world---rust---ts"
        .split('-')                // split on the hyphens we inserted
        .filter(|piece| !piece.is_empty()) // drop the empty runs
        .collect::<Vec<_>>()
        .join("-")                 // re-join with single hyphens
}

fn main() {
    println!("{}", slugify("  Hello, World!  Rust & TS  "));
    println!("{}", slugify("Café del Mar"));
}

hello-world-rust-ts
café-del-mar

Notice how .is_alphanumeric() is Unicode-aware: é survives the filter and ends up in the slug, much as it would in a TypeScript implementation built on Unicode property escapes such as /[\p{L}\p{N}]/u (and unlike a naive [a-z0-9] regex, which would strip it).

Exercises

Exercise 1: Normalize whitespace

Difficulty: Easy
Objective: Practice splitting and joining.
Instructions: Write fn normalize_whitespace(input: &str) -> String that collapses every run of whitespace into a single space and trims the ends. "  the   quick \n brown  fox " should become "the quick brown fox".

fn normalize_whitespace(input: &str) -> String {
    // TODO: split on whitespace and re-join with single spaces
    todo!()
}

fn main() {
    assert_eq!(normalize_whitespace("  the   quick \n brown  fox "), "the quick brown fox");
    println!("ok");
}

Solution

fn normalize_whitespace(input: &str) -> String {
    input.split_whitespace().collect::<Vec<_>>().join(" ")
}

fn main() {
    assert_eq!(normalize_whitespace("  the   quick \n brown  fox "), "the quick brown fox");
    println!("ok");
}

split_whitespace() already drops empty pieces and handles all Unicode whitespace, so a single join(" ") finishes the job. Verified output: ok.

Exercise 2: Case-insensitive word frequency

Difficulty: Medium
Objective: Combine character filtering, casing, splitting, and a HashMap.
Instructions: Write fn word_count(text: &str) -> HashMap<String, usize> that counts how often each word appears, ignoring case and stripping punctuation. "The cat sat. The CAT ran!" should count cat → 2, the → 2, sat → 1, ran → 1.

use std::collections::HashMap;

fn word_count(text: &str) -> HashMap<String, usize> {
    // TODO: for each whitespace-separated word, strip non-alphanumeric chars,
    //       lowercase it, and bump its count
    todo!()
}

fn main() {
    let counts = word_count("The cat sat. The CAT ran!");
    assert_eq!(counts.get("cat"), Some(&2));
    assert_eq!(counts.get("the"), Some(&2));
    println!("ok");
}

Solution

use std::collections::HashMap;

fn word_count(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        let key: String = word
            .chars()
            .filter(|c| c.is_alphanumeric())
            .collect::<String>()
            .to_lowercase();
        if !key.is_empty() {
            *counts.entry(key).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    let counts = word_count("The cat sat. The CAT ran!");
    assert_eq!(counts.get("cat"), Some(&2));
    assert_eq!(counts.get("the"), Some(&2));
    println!("ok");
}

The entry(key).or_insert(0) pattern is the idiomatic “increment-or-create” — see hashmaps.md for the full entry API. Verified output: ok.

Exercise 3: Mask a credit-card number

Difficulty: Medium/Hard
Objective: Practice character classification, building strings, and safe slicing of the last N characters.
Instructions: Write fn mask_card(card: &str) -> String that keeps only the digits, then replaces every digit except the last four with *. "4111 1111 1111 1234" should become "************1234". If there are four or fewer digits, return them unmasked.

fn mask_card(card: &str) -> String {
    // TODO: extract digits, then mask all but the last 4
    todo!()
}

fn main() {
    assert_eq!(mask_card("4111 1111 1111 1234"), "************1234");
    assert_eq!(mask_card("12-34"), "1234");
    println!("ok");
}

Solution

fn mask_card(card: &str) -> String {
    let digits: String = card.chars().filter(|c| c.is_ascii_digit()).collect();
    let n = digits.len();
    if n <= 4 {
        return digits;
    }
    let masked = "*".repeat(n - 4);
    let last4 = &digits[n - 4..]; // safe: digits are ASCII (1 byte each)
    format!("{masked}{last4}")
}

fn main() {
    assert_eq!(mask_card("4111 1111 1111 1234"), "************1234");
    assert_eq!(mask_card("12-34"), "1234");
    println!("ok");
}

Because we filtered to is_ascii_digit(), every remaining character is exactly one byte, so the byte slice &digits[n - 4..] can never split a character — slicing here is safe. Verified output: ok.
