Inline Assembly with `asm!`

Rust lets you drop a few raw machine instructions directly into a function with the asm! macro, on a stable compiler, for the mainstream architectures it supports (x86, x86-64, ARM, AArch64, RISC-V, and a few others). This page is about the rare cases where that is the right tool, how the register-constraint syntax keeps the optimizer informed, and the safety contract you are signing when you write an unsafe { asm!(...) } block.

Quick Overview

Inline assembly is the lowest level Rust offers: you write literal CPU instructions as strings, and Rust’s asm! macro wires your Rust variables to specific registers, lets the compiler keep optimizing around the block, and refuses to compile if your operand list is malformed. It became stable in Rust 1.59 and is available without nightly on x86, x86-64, ARM, AArch64, RISC-V, and several other targets.

For a TypeScript/JavaScript developer there is no equivalent: the V8 engine never lets your code see a register, and the nearest “go fast, trust me” escape hatches are hand-written WebAssembly or raw DataView/Buffer reads — both of which the runtime still sandboxes. Rust’s asm! has no sandbox. A wrong register constraint here is undefined behavior, not a thrown exception, which is why the entire feature lives behind unsafe.

Note: You almost never need this. Reach for asm! only after intrinsics (std::arch), core::hint, and a careful look at the generated code have failed you. This page exists so that when you do need it, you write it correctly. The broader story of stepping outside the safe-Rust guarantees is in ../20-unsafe-ffi/01_unsafe-rust.md and ../20-unsafe-ffi/09_when-to-use.md.

TypeScript/JavaScript Example

JavaScript runs on a managed virtual machine. You cannot name a CPU register, you cannot emit an instruction, and you cannot read a flag. The two closest things a senior developer reaches for are (1) hand-written WebAssembly bytes when they want predictable, near-metal numeric code, and (2) a DataView over an ArrayBuffer when they want to reinterpret raw bytes. Both are bounded and checked by the runtime.

// metal.mts — the closest JavaScript gets to "drop to the machine"
// Run with: node metal.mts   (Node v22)

// There is NO inline-assembly facility in JS/TS. The lowest you can go is to
// ship hand-assembled WebAssembly bytes and let the engine JIT them.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32,i32)->i32
  0x03, 0x02, 0x01, 0x00, // function section
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  // body: local.get 0; local.get 1; i32.add; end
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,
]);

const { instance } = await WebAssembly.instantiate(wasmBytes);
console.log("wasm add(37, 5) =", instance.exports.add(37, 5));

// The everyday "trust me, these bytes are a little-endian u32" move:
const buf = new Uint8Array([0xde, 0xad, 0xbe, 0xef]);
const view = new DataView(buf.buffer);
console.log("u32le =", view.getUint32(0, true));

Running it under Node v22 prints:

wasm add(37, 5) = 42
u32le = 4022250974

Notice what the runtime guarantees: the WebAssembly module is validated before it runs, getUint32 bounds-checks the offset, and the worst case is a thrown error. You are never one typo away from corrupting memory. Rust’s asm! removes that net entirely — that is the whole point, and the whole danger.

Rust Equivalent

The simplest useful asm! block: take a value in a register, run one instruction, hand a value back. Here it is on AArch64 (Apple Silicon, ARM servers), the architecture this page was compiled and run on:

use std::arch::asm;

/// Add 5 to `x` using a single AArch64 `add` instruction.
fn add_five(x: u64) -> u64 {
    let result: u64;
    // SAFETY: this block reads `x`, writes a fresh register into `result`, and
    // has no memory effects, so it cannot violate any of Rust's invariants.
    unsafe {
        asm!(
            "add {result}, {x}, #5",
            x = in(reg) x,
            result = out(reg) result,
        );
    }
    result
}

fn main() {
    println!("add_five(37) = {}", add_five(37));
}

The exact same idea on x86-64 uses different mnemonics. {0} is a positional operand, and inout(reg) reuses one register for both input and output:

use std::arch::asm;

/// Add 5 to `x` using a single x86-64 `add` instruction.
fn add_five(x: u64) -> u64 {
    let mut result = x;
    // SAFETY: reads/writes one register, no memory effects.
    unsafe {
        asm!("add {0}, 5", inout(reg) result);
    }
    result
}

fn main() {
    println!("add_five(37) = {}", add_five(37));
}

Both versions, run on their respective targets, print:

add_five(37) = 42

Tip: Inline assembly is not portable. The instruction text is target-specific. Gate each version behind #[cfg(target_arch = "...")] (shown later) or you will get a build that only compiles on one machine. The current stable toolchain is Rust 1.96.0 on the 2024 edition; cargo new selects it automatically, and asm! needs no feature flag there.

Detailed Explanation

asm! is a macro, not a function, because it has to inspect the template string at compile time and match every {name} / {0} placeholder against an operand. Let’s read the AArch64 example line by line.

The template string

"add {result}, {x}, #5",

This is literal AArch64 assembly with placeholders in braces. {x} and {result} are not register names; they are names you bind below. #5 is an immediate (literal) operand in ARM syntax. Rust joins multiple template strings with newlines into one program, so you can write one instruction per string for readability:

asm!(
    "lsl {tmp}, {x}, #1",   // these three strings...
    "lsl {result}, {x}, #3",
    "sub {result}, {result}, {tmp}", // ...form one assembly program
    // ...operands here...
);
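To make the multi-string template concrete, here is a hedged, complete sketch of the same three-instruction idea. The function name times_six, the x86-64 translation, and the plain-Rust fallback are illustrative assumptions added here, not part of the original example.

```rust
#[cfg(any(target_arch = "x86_64", target_arch = "aarch64"))]
use std::arch::asm;

/// Computes `x * 6` as `(x << 3) - (x << 1)`, wrapping on overflow.
/// `times_six` is a hypothetical name chosen for this sketch.
#[cfg(target_arch = "aarch64")]
fn times_six(x: u64) -> u64 {
    let result: u64;
    // SAFETY: register-only arithmetic; the scratch register is declared.
    unsafe {
        asm!(
            "lsl {tmp}, {x}, #1",            // tmp = x * 2
            "lsl {result}, {x}, #3",         // result = x * 8
            "sub {result}, {result}, {tmp}", // result = x * 8 - x * 2
            x = in(reg) x,
            tmp = out(reg) _, // scratch register: written, never read back
            result = out(reg) result,
            options(pure, nomem, nostack),
        );
    }
    result
}

/// Same computation on x86-64 (Intel syntax, Rust's default there).
#[cfg(target_arch = "x86_64")]
fn times_six(x: u64) -> u64 {
    let result: u64;
    // SAFETY: register-only arithmetic; flags are clobbered, which is the
    // default assumption because `preserves_flags` is not set.
    unsafe {
        asm!(
            "mov {tmp}, {x}",
            "shl {tmp}, 1",        // tmp = x * 2
            "mov {result}, {x}",
            "shl {result}, 3",     // result = x * 8
            "sub {result}, {tmp}", // result = x * 8 - x * 2
            x = in(reg) x,
            tmp = out(reg) _,
            result = out(reg) result,
            options(pure, nomem, nostack),
        );
    }
    result
}

/// Fallback so the sketch still builds on other targets.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
fn times_six(x: u64) -> u64 {
    x.wrapping_mul(6)
}
```

Calling times_six(7) should give 42 on either architecture. Plain out operands never share a register with x, which is why x is still intact when the second instruction reads it.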

Operand specifiers

After the template strings come the operands. Each one tells the compiler which Rust value fills a placeholder and how the register is used:

Specifier	Meaning
`in(reg) x`	Compiler picks a register, puts `x` in it, treats it as read-only.
`out(reg) y`	Compiler picks a register, you write into it, the value lands in `y`. The register's initial contents are undefined.
`inout(reg) z`	One register: `z` goes in, the new value comes back out into `z`.
`inout(reg) a => b`	One register: input `a`, output written to a different variable `b`.
`out(reg) _`	A scratch register you clobber but don’t read back (the `_` means “discard”).
`in("eax") v`	An explicit register (`eax`) — required when an instruction hard-codes a register.
`const N`	A compile-time constant baked straight into the instruction stream.
`sym some_fn`	The symbol (address) of a Rust `fn` or `static`.

reg means “any general-purpose register the allocator likes” — this is the key to not fighting the optimizer. You let the compiler choose; it slots your asm! into its register allocation like any other code.
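As a small illustration of the inout(reg) a => b row, the sketch below byte-swaps a value with an instruction that naturally works in place. The function name byte_swap and the choice of bswap/rev are assumptions made for this example; in real code u64::swap_bytes does the same job without any assembly.

```rust
#[cfg(any(target_arch = "x86_64", target_arch = "aarch64"))]
use std::arch::asm;

/// Reverses the byte order of `x`. Hypothetical helper for illustration only.
#[cfg(target_arch = "x86_64")]
fn byte_swap(x: u64) -> u64 {
    let swapped: u64;
    // SAFETY: `bswap` rewrites one register in place and touches no memory.
    unsafe {
        asm!(
            "bswap {v}",
            // One register: `x` goes in, the result comes out in `swapped`.
            v = inout(reg) x => swapped,
            options(pure, nomem, nostack),
        );
    }
    swapped
}

#[cfg(target_arch = "aarch64")]
fn byte_swap(x: u64) -> u64 {
    let swapped: u64;
    // SAFETY: `rev` reads and writes the same register and touches no memory.
    unsafe {
        asm!(
            "rev {v}, {v}",
            v = inout(reg) x => swapped,
            options(pure, nomem, nostack),
        );
    }
    swapped
}

/// Fallback for every other target.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
fn byte_swap(x: u64) -> u64 {
    x.swap_bytes()
}
```

The a => b form keeps x immutable on the Rust side while still letting the instruction reuse its register.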

Why `unsafe`?

The compiler cannot read your assembly. It does not know whether "add {result}, {x}, #5" actually matches the constraints you declared, whether you trashed a register you promised not to, or whether you read past a buffer. From the borrow checker’s perspective, asm! is a black box. So the whole construct is unsafe: you are asserting the instructions honor every promise the operand list makes. This is the same contract discussed in ../20-unsafe-ffi/01_unsafe-rust.md: like the five “unsafe superpowers” covered there, asm! is only permitted inside an unsafe block.

Options

The trailing options(...) list tells the optimizer what your block does not do, which unlocks more aggressive scheduling:

asm!(
    "add {out}, {x}, #5",
    x = in(reg) x,
    out = out(reg) out,
    options(pure, nomem, nostack),
);

nomem — the block reads/writes no memory.
nostack — it does not push/pop the stack.
pure — same inputs always give the same outputs (lets the compiler dedup/hoist it). pure requires nomem or readonly.
preserves_flags — it does not modify the condition flags.
noreturn — control never returns (then the block has no outputs).

These are promises, not requests. If you say nomem and then write memory, that is undefined behavior. When in doubt, omit them — the default (no options) is the conservative, always-correct choice.
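To show why the choice of option matters, here is a hedged sketch of a block that does read memory. Declaring nomem on it would be a false promise; readonly is the accurate one. The helper name load_u64 is hypothetical, and ordinary Rust would simply dereference the reference.

```rust
#[cfg(any(target_arch = "x86_64", target_arch = "aarch64"))]
use std::arch::asm;

/// Loads the value behind `p` with one load instruction.
/// Illustrative only: `*p` is the right way to do this in real code.
#[cfg(target_arch = "x86_64")]
fn load_u64(p: &u64) -> u64 {
    let ptr: *const u64 = p;
    let out: u64;
    // SAFETY: `ptr` comes from a live reference, so it is valid and aligned.
    // The block only reads memory, hence `readonly` rather than `nomem`.
    unsafe {
        asm!(
            "mov {out}, qword ptr [{p}]",
            p = in(reg) ptr,
            out = out(reg) out,
            options(pure, readonly, nostack),
        );
    }
    out
}

#[cfg(target_arch = "aarch64")]
fn load_u64(p: &u64) -> u64 {
    let ptr: *const u64 = p;
    let out: u64;
    // SAFETY: same reasoning as the x86-64 arm.
    unsafe {
        asm!(
            "ldr {out}, [{p}]",
            p = in(reg) ptr,
            out = out(reg) out,
            options(pure, readonly, nostack),
        );
    }
    out
}

/// Fallback for every other target.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
fn load_u64(p: &u64) -> u64 {
    *p
}
```

If the block also wrote through the pointer, even readonly would be wrong and the options list would have to shrink to nostack alone.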

Key Differences

Aspect	TypeScript / JavaScript	Rust `asm!`
Access to CPU registers	None — fully abstracted by the engine	Direct, named or compiler-allocated
Closest “low-level” tool	Hand-written WebAssembly, `DataView`, typed arrays	Intrinsics (`std::arch`), then `asm!` as a last resort
Failure mode	Thrown exception, sandboxed	Undefined behavior — memory corruption, no exception
Portability	WebAssembly bytes run anywhere	Instruction text is per-architecture; must `#[cfg]`-gate
Optimizer interaction	JIT owns everything	You declare constraints; LLVM schedules around the block
Safety gate	Implicit, always on	Explicit `unsafe` block, mandatory

The single most important difference: JavaScript’s low-level escape hatches are still inside the VM’s safety net, and Rust’s asm! is not. When a TypeScript developer writes value as Foo, the worst outcome is a TypeError later. When you mis-declare a register clobber in asm!, the worst outcome is silent data corruption that may surface anywhere, anytime.

Note: A common misconception is that asm! is “faster than Rust.” It is not, by default. The optimizer produces excellent code for ordinary Rust, and an opaque asm! block can actually prevent optimizations (inlining across it, constant-folding through it). Inline assembly is for instructions the compiler cannot otherwise emit — privileged instructions, special registers, exotic SIMD — not for hand-tuning arithmetic.

Common Pitfalls

Forgetting the `unsafe` block

asm! is always unsafe. This is the first wall every newcomer hits:

use std::arch::asm;

fn main() {
    let mut x: u64 = 10;
    asm!("add {0}, {0}, #5", inout(reg) x); // does not compile (error[E0133])
    println!("{x}");
}

The real compiler output:

error[E0133]: use of inline assembly is unsafe and requires unsafe block
 --> src/main.rs:4:5
  |
4 |     asm!("add {0}, {0}, #5", inout(reg) x);
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ use of inline assembly
  |
  = note: inline assembly is entirely unchecked and can cause undefined behavior

The fix is to wrap it: unsafe { asm!(...) }.

A placeholder with no matching operand

If your template references {1} but you only supplied one operand, the macro catches it at compile time:

use std::arch::asm;

fn main() {
    let x: u64;
    unsafe {
        asm!("mov {0}, {1}", out(reg) x); // does not compile
    }
    println!("{x}");
}

Real output:

error: invalid reference to argument at index 1
 --> src/main.rs:5:24
  |
5 |         asm!("mov {0}, {1}", out(reg) x);
  |                        ^^^ from here
  |
  = note: there is 1 argument

Clobbering a register without declaring it

The most dangerous mistake compiles cleanly and then corrupts your program. If your assembly writes to a register that you did not list as an output or clobber, the compiler assumes that register is untouched — it may have been holding a live value. The classic case is calling another function: a bl/call instruction clobbers all the caller-saved registers per the ABI. You must tell the compiler with clobber_abi("C"):

use std::arch::asm;

extern "C" fn the_answer() -> u64 { 42 }

fn sym_demo() -> u64 {
    let result: u64;
    // SAFETY: `clobber_abi("C")` declares that the call trashes the C ABI's
    // caller-saved registers, so the compiler will not assume they survive.
    unsafe {
        asm!(
            "bl {f}",
            f = sym the_answer,
            lateout("x0") result, // AArch64 returns in x0
            clobber_abi("C"),
        );
    }
    result
}

fn main() {
    println!("sym_demo() = {}", sym_demo());
}

Run on AArch64, this prints sym_demo() = 42. Omit the clobber_abi("C") and the program may appear to work in a small test and then break once the surrounding function gets more complex and the optimizer keeps a value in a now-clobbered register — a textbook heisenbug.
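For readers on x86-64, a hedged sketch of the same call pattern follows. It assumes a non-Windows x86-64 target, where the "C" ABI returns integers in rax and needs no caller-allocated shadow space; the function name call_the_answer and the fallback arm are additions for illustration.

```rust
#[cfg(all(target_arch = "x86_64", not(windows)))]
use std::arch::asm;

extern "C" fn the_answer() -> u64 {
    42
}

/// Calls `the_answer` from inline assembly (hypothetical helper name).
#[cfg(all(target_arch = "x86_64", not(windows)))]
fn call_the_answer() -> u64 {
    let result: u64;
    // SAFETY: `clobber_abi("C")` marks every caller-saved register as
    // trashed by the call. `nostack` is deliberately NOT set, so the
    // compiler keeps the stack aligned for a call and avoids the red zone.
    unsafe {
        asm!(
            "call {f}",
            f = sym the_answer,
            out("rax") result, // the C ABI returns integers in rax here
            clobber_abi("C"),
        );
    }
    result
}

/// Fallback: an ordinary call on every other target.
#[cfg(not(all(target_arch = "x86_64", not(windows))))]
fn call_the_answer() -> u64 {
    the_answer()
}
```

The important detail is what is absent: adding nostack to a block that executes call would be a false promise, because call pushes a return address.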

Reusing `out` when you meant `lateout`

A plain out operand is never assigned the same register as an in operand, so your assembly may write the output before it has finished reading its inputs. When your assembly reads all its inputs before writing any output, use lateout, which lets the allocator reuse an input register for the output and produces tighter code. Using plain out everywhere is always correct but can waste a register.
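The difference is easiest to see side by side. In this hedged sketch (function names sum_late and sum_early are made up for the example), the one-instruction version can use lateout, while the two-instruction version writes its output before reading b and therefore needs plain out.

```rust
#[cfg(any(target_arch = "x86_64", target_arch = "aarch64"))]
use std::arch::asm;

/// One instruction reads both inputs before it writes the output, so
/// `lateout` is sound: `out` may land in the same register as `a` or `b`.
fn sum_late(a: u64, b: u64) -> u64 {
    #[cfg(target_arch = "x86_64")]
    {
        let out: u64;
        // SAFETY: register-only arithmetic, inputs consumed before the write.
        unsafe {
            asm!("lea {out}, [{a} + {b}]",
                 a = in(reg) a, b = in(reg) b, out = lateout(reg) out,
                 options(pure, nomem, nostack));
        }
        out
    }
    #[cfg(target_arch = "aarch64")]
    {
        let out: u64;
        // SAFETY: register-only arithmetic, inputs consumed before the write.
        unsafe {
            asm!("add {out}, {a}, {b}",
                 a = in(reg) a, b = in(reg) b, out = lateout(reg) out,
                 options(pure, nomem, nostack));
        }
        out
    }
    #[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
    {
        a.wrapping_add(b)
    }
}

/// Two instructions: `out` is written first and `b` is read afterwards.
/// With `lateout`, `out` could share `b`'s register and destroy it, so this
/// version must use plain `out`.
fn sum_early(a: u64, b: u64) -> u64 {
    #[cfg(target_arch = "x86_64")]
    {
        let out: u64;
        // SAFETY: register-only arithmetic; `out` never aliases an input.
        unsafe {
            asm!("mov {out}, {a}",
                 "add {out}, {b}",
                 a = in(reg) a, b = in(reg) b, out = out(reg) out,
                 options(pure, nomem, nostack));
        }
        out
    }
    #[cfg(target_arch = "aarch64")]
    {
        let out: u64;
        // SAFETY: register-only arithmetic; `out` never aliases an input.
        unsafe {
            asm!("mov {out}, {a}",
                 "add {out}, {out}, {b}",
                 a = in(reg) a, b = in(reg) b, out = out(reg) out,
                 options(pure, nomem, nostack));
        }
        out
    }
    #[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
    {
        a.wrapping_add(b)
    }
}
```

Both functions return the same sum; the only difference is how much freedom the register allocator gets.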

Assuming AT&T vs Intel syntax

On x86/x86-64, Rust defaults to Intel syntax (add dst, src). If you paste AT&T-syntax assembly (add src, dst, %-prefixed registers) it will not assemble. Add options(att_syntax) if you truly need AT&T. ARM and AArch64 have one syntax, so this trap is x86-only.
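A hedged side-by-side of the two syntaxes, using the add-five instruction from earlier. The function names are hypothetical, and the non-x86-64 arm exists only so the sketch builds everywhere.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Intel syntax (Rust's default on x86): destination first, bare immediates.
#[cfg(target_arch = "x86_64")]
fn add_five_intel(x: u64) -> u64 {
    let mut v = x;
    // SAFETY: one register is read and written; no memory or stack effects.
    unsafe {
        asm!("add {0}, 5", inout(reg) v, options(pure, nomem, nostack));
    }
    v
}

/// AT&T syntax: source first, `$` before immediates. The `%` register
/// prefix is supplied by the macro when it substitutes `{0}`.
#[cfg(target_arch = "x86_64")]
fn add_five_att(x: u64) -> u64 {
    let mut v = x;
    // SAFETY: same effects as the Intel-syntax version.
    unsafe {
        asm!("addq $5, {0}", inout(reg) v,
             options(att_syntax, pure, nomem, nostack));
    }
    v
}

/// Fallbacks so the sketch still builds on other targets.
#[cfg(not(target_arch = "x86_64"))]
fn add_five_intel(x: u64) -> u64 {
    x.wrapping_add(5)
}

#[cfg(not(target_arch = "x86_64"))]
fn add_five_att(x: u64) -> u64 {
    x.wrapping_add(5)
}
```

Both functions assemble to the same machine instruction; only the spelling of the template differs.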

Best Practices

Exhaust the alternatives first. Try a std::arch intrinsic, a core::hint helper, or just trusting the optimizer. asm! is the last 1%.
Always write a // SAFETY: comment above the block stating which invariants you have personally verified — the same discipline used throughout ../20-unsafe-ffi/08_safety-abstractions.md.
Prefer {name} = operands over positional {0} for anything longer than one instruction; named operands survive edits and re-orderings.
Let the allocator choose registers (reg) unless an instruction hard-requires a specific one (cpuid → EAX/EBX/ECX/EDX, shift counts → CL, etc.).
Declare every effect. Outputs, clobbered scratch registers (out(reg) _), clobber_abi for calls, and accurate options. The compiler trusts you completely; reward that trust.
Gate per architecture with #[cfg(target_arch = "...")] and provide a fallback #[cfg(not(...))] arm so the crate still builds elsewhere.
Wrap asm! in a safe function with a clear contract, so callers never touch unsafe themselves.
Verify the generated code with cargo asm, objdump, or the Compiler Explorer before trusting it in production.
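Several of these practices fit in one small, hedged sketch: a safe wrapper, a SAFETY comment per block, named operands, per-architecture gating, and a fallback arm. It reuses the add_five idea from earlier on this page; the three-arm layout is one possible arrangement, not the only one.

```rust
#[cfg(any(target_arch = "x86_64", target_arch = "aarch64"))]
use std::arch::asm;

/// Adds 5 to `x`, wrapping on overflow. Safe to call on any target.
fn add_five(x: u64) -> u64 {
    #[cfg(target_arch = "x86_64")]
    {
        let mut value = x;
        // SAFETY: reads and writes one register; no memory or stack effects.
        unsafe {
            asm!(
                "add {value}, 5",
                value = inout(reg) value,
                options(pure, nomem, nostack),
            );
        }
        value
    }
    #[cfg(target_arch = "aarch64")]
    {
        let result: u64;
        // SAFETY: reads one register, writes another; no memory or stack effects.
        unsafe {
            asm!(
                "add {result}, {x}, #5",
                x = in(reg) x,
                result = out(reg) result,
                options(pure, nomem, nostack),
            );
        }
        result
    }
    // Fallback arm: plain Rust keeps the crate building everywhere else.
    #[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
    {
        x.wrapping_add(5)
    }
}
```

Callers see only fn add_five(u64) -> u64; the unsafe blocks and the architecture split stay private to the helper.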

Tip: For writing an entire function body in assembly (e.g. a custom calling convention, an interrupt handler, a context switch), use naked_asm! inside a #[unsafe(naked)] function instead of asm!. Naked functions became stable in Rust 1.88 and give you a function with no compiler-generated prologue/epilogue — useful for OS and embedded work (see ../26-systems-programming/README.md).

Real-World Example

A genuinely justified use of asm!: reading the CPU’s hardware cycle/tick counter with the absolute minimum overhead, for fine-grained microbenchmarking. There is no single stable, portable intrinsic that lowers to exactly one instruction here, and the counter lives in a special register, so a one-instruction asm! is the right call. We provide both x86-64 (rdtsc) and AArch64 (cntvct_el0) versions behind cfg, wrapped in one safe function.

use std::arch::asm;

/// Read a monotonically increasing hardware cycle/tick counter with the lowest
/// possible overhead. There is no portable stable intrinsic that maps to a
/// single instruction here, so a one-instruction `asm!` is justified.
///
/// On x86-64 this is the time-stamp counter (`rdtsc`); on AArch64 it is the
/// virtual count register (`cntvct_el0`). Both are reads with no memory
/// effects, so `nomem` + `nostack` let the optimizer schedule around them.
#[inline]
fn read_cycle_counter() -> u64 {
    #[cfg(target_arch = "x86_64")]
    {
        let lo: u32;
        let hi: u32;
        // SAFETY: `rdtsc` writes EDX:EAX (high:low), reads no memory, uses no stack.
        unsafe {
            asm!(
                "rdtsc",
                out("eax") lo,
                out("edx") hi,
                options(nomem, nostack),
            );
        }
        ((hi as u64) << 32) | (lo as u64)
    }
    #[cfg(target_arch = "aarch64")]
    {
        let ticks: u64;
        // SAFETY: reads a system register into one register, no memory/stack.
        unsafe {
            asm!(
                "mrs {ticks}, cntvct_el0",
                ticks = out(reg) ticks,
                options(nomem, nostack),
            );
        }
        ticks
    }
}

fn fibonacci(n: u32) -> u64 {
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        (a, b) = (b, a + b);
    }
    a
}

fn main() {
    let start = read_cycle_counter();
    let result = fibonacci(90);
    let end = read_cycle_counter();
    println!("fib(90) = {result}");
    println!("elapsed ticks: {}", end.wrapping_sub(start));
}

Compiled and run on AArch64, one sample run printed:

fib(90) = 2880067194370816120
elapsed ticks: 14

The tick count varies run to run (it is a real hardware counter) and the units differ between architectures: rdtsc counts reference cycles, while cntvct_el0 counts a fixed-frequency timer whose rate is platform-defined (it is reported by cntfrq_el0). That is exactly why this belongs in a clearly-documented, architecture-gated helper rather than scattered through your code. For production timing prefer std::time::Instant; reach for the raw counter only when you need the finest-grained, lowest-overhead reading the hardware offers.

When `asm!` is genuinely the answer

Special/privileged registers and instructions: cpuid, rdtsc, mrs/msr, svc/syscall, cli/sti, wfi. Ordinary Rust has no spelling for these; where std::arch does offer an intrinsic (as it does for cpuid and rdtsc on x86), prefer that.
Custom calling conventions / naked functions: context switches, interrupt entry points, bootloaders.
An exotic instruction your target has but std::arch does not expose as a stable intrinsic.
Bare-metal embedded where you must poke a specific peripheral instruction.

If your reason is “I think I can beat the optimizer at integer math,” it is almost certainly not the answer.

Exercises

Set up a probe project to check your answers: cargo new asm_exercises && cd asm_exercises. Inline assembly is target-specific, so each solution below provides an arm for x86-64 and one for AArch64. Build with cargo run on your native machine.

Exercise 1

Difficulty: Beginner

Objective: Practice the basic in/out operand syntax with a shift-and-add trick.

Instructions: Write a function times_nine(x: u64) -> u64 that computes x * 9 without using a multiply instruction. Hint: x * 9 == x + (x << 3). Implement it for your native architecture using a single instruction with reg operands and options(pure, nomem, nostack).

Solution

use std::arch::asm;

// AArch64: a single `add` with a shifted operand does it in one instruction.
#[cfg(target_arch = "aarch64")]
fn times_nine(x: u64) -> u64 {
    let out: u64;
    // SAFETY: pure arithmetic on registers, no memory or stack effects.
    unsafe {
        asm!(
            "add {out}, {x}, {x}, lsl #3", // x + (x << 3) = x*9
            x = in(reg) x,
            out = out(reg) out,
            options(pure, nomem, nostack),
        );
    }
    out
}

// x86-64: `lea` computes address arithmetic, perfect for x + x*8.
#[cfg(target_arch = "x86_64")]
fn times_nine(x: u64) -> u64 {
    let out: u64;
    // SAFETY: pure arithmetic on registers, no memory or stack effects.
    unsafe {
        asm!(
            "lea {out}, [{x} + {x}*8]", // x + x*8 = x*9
            x = in(reg) x,
            out = out(reg) out,
            options(pure, nomem, nostack),
        );
    }
    out
}

fn main() {
    println!("times_nine(6) = {}", times_nine(6));
}

Output (both architectures):

times_nine(6) = 54

Exercise 2

Difficulty: Intermediate

Objective: Use a conditional/compare instruction and learn inout.

Instructions: Write a branchless max_u64(a: u64, b: u64) -> u64 that returns the larger of the two values using a compare plus a conditional-select (AArch64 csel) or conditional-move (x86-64 cmov). Avoid any if/branch in your assembly.

Solution

use std::arch::asm;

#[cfg(target_arch = "aarch64")]
fn max_u64(a: u64, b: u64) -> u64 {
    let out: u64;
    // SAFETY: compare + conditional select on registers; no memory/stack.
    unsafe {
        asm!(
            "cmp {a}, {b}",
            "csel {out}, {a}, {b}, hs", // out = (a >= b unsigned) ? a : b
            a = in(reg) a,
            b = in(reg) b,
            out = out(reg) out,
            options(pure, nomem, nostack),
        );
    }
    out
}

#[cfg(target_arch = "x86_64")]
fn max_u64(a: u64, b: u64) -> u64 {
    let mut out = a;
    // SAFETY: compare + conditional move on registers; no memory/stack.
    unsafe {
        asm!(
            "cmp {out}, {b}",
            "cmovb {out}, {b}", // if out < b (unsigned), out = b
            out = inout(reg) out,
            b = in(reg) b,
            options(pure, nomem, nostack),
        );
    }
    out
}

fn main() {
    println!("max_u64(17, 42) = {}", max_u64(17, 42));
    println!("max_u64(99, 42) = {}", max_u64(99, 42));
}

Output (both architectures):

max_u64(17, 42) = 42
max_u64(99, 42) = 99

Exercise 3

Difficulty: Advanced

Objective: Drive a fixed-register instruction and wrap it in a safe, portable API.

Instructions: On x86-64, write a safe function cpu_vendor() -> Option<String> that executes cpuid with leaf 0 and assembles the 12-byte vendor string (the bytes come back as EBX, then EDX, then ECX). cpuid hard-codes its registers, so you must use explicit-register operands — and because LLVM reserves rbx, you must save and restore it yourself. On every non-x86-64 target, return None so the crate still builds. Wrap the unsafe block so callers never see it.

Solution

use std::arch::asm;

/// Returns the 12-byte CPU vendor string, or `None` on non-x86-64 targets.
fn cpu_vendor() -> Option<String> {
    #[cfg(target_arch = "x86_64")]
    {
        let (ebx, ecx, edx): (u32, u32, u32);
        // SAFETY: `cpuid` with leaf 0 only writes the four output registers and
        // touches no memory; we preserve rbx by saving/restoring it ourselves.
        unsafe {
            asm!(
                "mov {ebx_tmp:r}, rbx", // stash LLVM-reserved rbx
                "cpuid",
                "xchg {ebx_tmp:r}, rbx", // pull EBX out, restore rbx
                inout("eax") 0u32 => _,  // leaf 0 in EAX; EAX result discarded
                ebx_tmp = out(reg) ebx,
                out("ecx") ecx,
                out("edx") edx,
                options(nostack, preserves_flags),
            );
        }
        let mut v = Vec::with_capacity(12);
        v.extend_from_slice(&ebx.to_le_bytes());
        v.extend_from_slice(&edx.to_le_bytes());
        v.extend_from_slice(&ecx.to_le_bytes());
        return String::from_utf8(v).ok();
    }
    #[cfg(not(target_arch = "x86_64"))]
    {
        None
    }
}

fn main() {
    match cpu_vendor() {
        Some(v) => println!("vendor: {v}"),
        None => println!("vendor: <not available on this target>"),
    }
}

On an Intel x86-64 host this prints something like vendor: GenuineIntel; on AMD, vendor: AuthenticAMD. Built for aarch64, the None arm runs and prints:

vendor: <not available on this target>

The {ebx_tmp:r} syntax names the operand ebx_tmp and selects its 64-bit (r) register-class form. In real code you would prefer the safe std::arch::x86_64::__cpuid intrinsic, which handles the rbx dance for you — this exercise reimplements it to learn the mechanics.
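For comparison, a hedged sketch of the intrinsic-based version is shown here. The function name cpu_vendor_via_intrinsic is hypothetical. Whether __cpuid must be called inside an unsafe block depends on the toolchain version, so the sketch wraps the call and silences the lint that fires where the wrapper is redundant.

```rust
/// Returns the 12-byte CPU vendor string using the `std::arch` intrinsic,
/// or `None` on non-x86-64 targets.
fn cpu_vendor_via_intrinsic() -> Option<String> {
    #[cfg(target_arch = "x86_64")]
    {
        // Assumption: older toolchains declare `__cpuid` as an unsafe fn,
        // newer ones may not; the `allow` keeps both cases warning-free.
        // SAFETY: every x86-64 CPU supports `cpuid`, and leaf 0 is always valid.
        #[allow(unused_unsafe)]
        let regs = unsafe { std::arch::x86_64::__cpuid(0) };

        // The vendor string is spread over EBX, EDX, ECX, in that order.
        let mut bytes = Vec::with_capacity(12);
        bytes.extend_from_slice(&regs.ebx.to_le_bytes());
        bytes.extend_from_slice(&regs.edx.to_le_bytes());
        bytes.extend_from_slice(&regs.ecx.to_le_bytes());
        String::from_utf8(bytes).ok()
    }
    #[cfg(not(target_arch = "x86_64"))]
    {
        None
    }
}
```

There is no register bookkeeping here at all: the intrinsic returns a CpuidResult struct with eax, ebx, ecx, and edx fields, and the rbx save/restore happens inside the standard library.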
