# 7. End-to-End Pipeline Walkthrough
Let’s trace the complete journey from source code to NPU execution during a single cargo run.
## 7.1 Compilation Phase
```mermaid
graph TD
A["Rust Kernel Source<br/>kernels/src/lib.rs"] -->|"rustc + rustc_codegen_mlir"| B["Rust MIR<br/>Type-checked, monomorphized"]
B -->|"builder_methods.rs:<br/>MIR ops → MLIR ops"| C["MLIR Modules<br/>LLVM · Arith · CF dialects<br/>hacc.entry attribute"]
C -->|"compile_ascend.rs:<br/>merge all modules"| D["Merged MLIR<br/>kernel code + ascend_std deps"]
D -->|"mlir_to_cpp<br/>(default)"| E["Generated C++<br/>AscendC class with TBuf,<br/>DataCopy, ReduceMax, Exp, ..."]
D -->|"mlir_to_pto<br/>(ACLRS_CODEGEN_PATH=pto)"| P["PTO Assembly<br/>pto.tload, pto.tadd, pto.tmatmul,<br/>pto.trowmax, pto.texp, ..."]
P -->|"ptoas --enable-insert-sync"| E
E --> F["ascend_compile crate<br/>Target abstraction · Validation<br/>Bisheng invocation · C ABI + CLI"]
F -->|"310P: --cce-aicore-arch=dav-m200"| G["NPU Binary · kernel.acl.o<br/>Ascend 310P machine code"]
F -->|"910B: --cce-aicore-arch=dav-c220"| H["NPU Binary · kernel.acl.o<br/>Ascend 910B machine code<br/>(413 tests verified)"]
```
### 7.1.1 The ascend_compile Compilation Hub
The ascend_compile crate (crates/ascend_compile/) is a standalone compilation library that decouples kernel compilation from the rustc_codegen_mlir backend. Any C++ kernel generator — whether from ascend-rs’s own MLIR-to-C++ pipeline, TileLang, Triton, PyPTO (CANN’s tile-level operator DSL), or future frontends — can use it to compile AscendC kernels:
```mermaid
graph TD
A1["ascend-rs<br/>Rust→MLIR→C++"] --> E["AscendC C++ kernel source"]
A2["TileLang<br/>Python DSL→AscendC (planned)"] -.-> E
A3["Triton<br/>GPU kernel compiler (planned)"] -.-> E
A4["PyTorch<br/>torch.compile (planned)"] -.-> E
A5["PyPTO<br/>CANN tile-level DSL (planned)"] -.-> E
E --> F["ascend_compile<br/><br/>Rust API · C ABI · CLI · Python<br/><br/>3 validation passes<br/>Dual flag paths · 310P + 910B<br/>Object or shared library output"]
F --> G["NPU Binary · .o / .so"]
```
This architecture enables the broader Ascend ecosystem to benefit from ascend-rs’s validated compilation pipeline without depending on Rust or rustc. The dashed edges indicate planned integrations not yet implemented.
### 7.1.2 Alternative Codegen Path: PTOAS (Programmable Tile Operation Assembly)
In addition to the default mlir_to_cpp path, ascend-rs supports an experimental PTO (Programmable Tile Operations) codegen path that targets the pto-isa virtual ISA — the same tile-level instruction set used internally by CANN’s FlashAttention implementation on Ascend 910B.
**Activation.** Set ACLRS_CODEGEN_PATH=pto to route kernel compilation through the PTO path instead of direct C++ generation:
```sh
export ACLRS_CODEGEN_PATH=pto          # Enable PTO path (default: cpp)
export ACLRS_PTOAS_PATH=/path/to/ptoas # Optional: explicit ptoas binary location
```
**Pipeline.** The PTO path adds an intermediate representation layer between MLIR and the final C++ that bisheng compiles:
```mermaid
graph LR
A["Merged MLIR<br/>(LLVM dialect)"] -->|"mlir_to_pto"| B["PTO Assembly<br/>(pto dialect MLIR)"]
B -->|"ptoas<br/>--enable-insert-sync"| C["AscendC C++"]
C -->|"bisheng"| D[".acl.o"]
```
The key advantage of this intermediate step is that ptoas automatically inserts synchronization barriers (set_flag/wait_flag) between pipeline stages. In the direct C++ path, the codegen must explicitly emit pipe_barrier(PIPE_ALL) between DMA and compute operations — getting this wrong causes silent data corruption or NPU hangs. The PTO path delegates barrier insertion to the ptoas assembler, which has exact knowledge of the hardware pipeline topology.
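The hazard being avoided can be sketched on the host with standard threads. This is an analogy only: `staged_sum` is an illustrative name, and the `Barrier`/`Mutex` stand in for the NPU's set_flag/wait_flag pipe synchronization.

```rust
// Sketch using std threads (NOT NPU pipes): the set_flag/wait_flag handshake
// that ptoas inserts automatically between the DMA and compute stages.
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

fn staged_sum() -> u32 {
    let buf = Arc::new(Mutex::new(vec![0u32; 4]));
    let ready = Arc::new(Barrier::new(2)); // set_flag / wait_flag analogue

    let dma = {
        let (buf, ready) = (Arc::clone(&buf), Arc::clone(&ready));
        thread::spawn(move || {
            *buf.lock().unwrap() = vec![1, 2, 3, 4]; // "DMA" fills the tile
            ready.wait();                            // set_flag: copy landed
        })
    };

    ready.wait(); // wait_flag: compute must not start before the copy is done
    let sum = buf.lock().unwrap().iter().sum();
    dma.join().unwrap();
    sum
}

fn main() {
    // Without the rendezvous, the compute stage could read zeros (silent
    // corruption); with it, the result is deterministic.
    let sum = staged_sum();
    assert_eq!(sum, 10);
    println!("sum = {sum}");
}
```

Dropping the two `ready.wait()` calls reintroduces exactly the race that a missing pipe_barrier causes on hardware: the read may observe the buffer before the copy lands.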
**Tile intrinsics API.** The ascend_std::tile module provides safe Rust wrappers for PTO tile operations:
```rust
use ascend_std::tile::*;

pub unsafe fn tile_softmax(input: *const f32, output: *mut f32) {
    // Load a 32×32 tile from global memory
    let x: Tile<32, 32, f32> = tile_load_f32(input);
    // Numerically stable softmax decomposition (5 PTO ops):
    //   1. Row-wise max:   pto.trowmax
    //   2. Subtract max:   pto.trowexpandsub
    //   3. Exponential:    pto.texp
    //   4. Row-wise sum:   pto.trowsum
    //   5. Divide by sum:  pto.trowexpanddiv
    let y: Tile<32, 32, f32> = tile_softmax_f32(x);
    // Store the result to global memory
    tile_store_f32(output, y);
}
```
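The five-step decomposition can be cross-checked on the host with a plain-Rust reference implementation (a sketch; `softmax_rows` is an illustrative helper, not part of ascend-rs):

```rust
// Host-side reference for the 5-step stable softmax (plain Rust, no NPU).
fn softmax_rows(x: &[f32], cols: usize) -> Vec<f32> {
    x.chunks(cols)
        .flat_map(|row| {
            // 1. row-wise max (trowmax)
            let m = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
            // 2.+3. subtract the max, then exponentiate (trowexpandsub, texp)
            let e: Vec<f32> = row.iter().map(|v| (v - m).exp()).collect();
            // 4. row-wise sum (trowsum)
            let s: f32 = e.iter().sum();
            // 5. divide by the sum (trowexpanddiv)
            e.into_iter().map(move |v| v / s)
        })
        .collect()
}

fn main() {
    let y = softmax_rows(&[1.0, 2.0, 3.0, 1000.0, 1000.0, 1000.0], 3);
    // Each row sums to 1, and the large-magnitude row does not overflow,
    // which is the point of subtracting the row max first.
    assert!((y[0..3].iter().sum::<f32>() - 1.0).abs() < 1e-6);
    assert!((y[3..6].iter().sum::<f32>() - 1.0).abs() < 1e-6);
    println!("row sums ok");
}
```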
The Tile<ROWS, COLS, T> type is a move-only handle (no Copy) that ensures single-ownership semantics — preventing double-DMA and enforcing compile-time safety. Const generic parameters carry shape information through the type system, catching dimension mismatches at compile time rather than at NPU runtime.
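A toy version of this discipline, assuming nothing from ascend_std (the `Tile` and `matmul` below are illustrative stand-ins, not the real API), shows exactly what the compiler enforces:

```rust
// Sketch of the ownership discipline: const generics carry the shape, and
// the deliberately missing Copy impl makes every handle move-only.
struct Tile<const ROWS: usize, const COLS: usize> {
    _buf: Vec<f32>, // stand-in for an on-chip buffer handle
}

impl<const ROWS: usize, const COLS: usize> Tile<ROWS, COLS> {
    fn zeros() -> Self {
        Tile { _buf: vec![0.0; ROWS * COLS] }
    }
    fn shape(&self) -> (usize, usize) {
        (ROWS, COLS)
    }
}

// (M×K) @ (K×N) → (M×N): the compiler checks the shared K dimension and
// consumes both operands, so neither tile can be DMA'd twice afterwards.
fn matmul<const M: usize, const K: usize, const N: usize>(
    _a: Tile<M, K>,
    _b: Tile<K, N>,
) -> Tile<M, N> {
    Tile::zeros()
}

fn main() {
    let a = Tile::<32, 16>::zeros();
    let b = Tile::<16, 8>::zeros();
    let c = matmul(a, b);
    assert_eq!(c.shape(), (32, 8));
    // `matmul(a, b)` again: compile error, both handles were moved.
    // `matmul(Tile::<32, 16>::zeros(), Tile::<32, 16>::zeros())`:
    // compile error, 16 != 32 on the shared dimension.
    println!("{:?}", c.shape());
}
```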
**Matmul via cube unit.** Tile matmul maps to the hardware’s cube engine through a multi-level memory hierarchy pipeline:
```rust
use ascend_std::tile::*;

// Illustrative wrapper so the raw pointers are in scope.
// (M×K) @ (K×N) → (M×N), routed through L1 → L0A/L0B → Cube → L0C
pub unsafe fn tile_matmul_example(
    a_ptr: *const f32,
    b_ptr: *const f32,
    c_ptr: *mut f32,
) {
    let a: Tile<32, 32, f32> = tile_load_f32(a_ptr);
    let b: Tile<32, 32, f32> = tile_load_f32(b_ptr);
    let c: Tile<32, 32, f32> = tile_matmul_f32(a, b); // pto.tmatmul
    tile_store_f32(c_ptr, c);
}
```
The mlir_to_pto translator generates the full cube-unit pipeline: GM→CBUF staging tiles (pto.tload), CBUF→L0A/L0B movement (pto.tmov), matrix multiply on L0C (pto.tmatmul), and writeback — all with correct buffer layout attributes (blayout, slayout, fractal) for each memory level.
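For numerically validating pto.tmatmul output on the host, a straightforward reference multiply is useful (a sketch; `matmul_ref` is an illustrative helper, not part of ascend-rs):

```rust
// Host-side reference matmul in row-major layout: (M×K) @ (K×N) → (M×N).
fn matmul_ref(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut c = vec![0.0f32; m * n];
    for i in 0..m {
        for kk in 0..k {
            let av = a[i * k + kk];
            for j in 0..n {
                c[i * n + j] += av * b[kk * n + j]; // accumulate like L0C
            }
        }
    }
    c
}

fn main() {
    // 2×3 @ 3×2
    let a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let b = [7.0, 8.0, 9.0, 10.0, 11.0, 12.0];
    let c = matmul_ref(&a, &b, 2, 3, 2);
    assert_eq!(c, vec![58.0, 64.0, 139.0, 154.0]);
    println!("{:?}", c);
}
```

Comparing a kernel's `.acl.o` output tile against such a reference (within a floating-point tolerance) is the usual way to confirm that the layout attributes were emitted correctly.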
**PTO virtual ISA.** The translator emits the following PTO-dialect operations:
| Category | Operations | Description |
|---|---|---|
| Memory | pto.tload, pto.tstore | GM↔local tile DMA transfers |
| Element-wise | pto.tadd, pto.tmul, pto.texp | Vectorized arithmetic and transcendentals |
| Reduction | pto.trowmax, pto.trowsum, pto.trowexpandsub, pto.trowexpanddiv | Row-wise reductions with broadcast |
| Cube | pto.tmatmul, pto.tmov | Matrix multiply and inter-level data movement |
| Memory mgmt | pto.alloc_tile, pto.make_tensor_view, pto.partition_view | Buffer allocation and GM partitioning |
Each PTO tile buffer carries explicit layout metadata specifying its memory level (vec, mat, left, right, acc), data layout (row_major/col_major), and fractal size — enabling ptoas to generate correct data movement instructions for the hardware’s fractal memory architecture.
## 7.2 Runtime Phase
```mermaid
graph TD
subgraph Host["Host CPU"]
H1["Acl::new()"] --> H2["Device::new"]
H2 --> H3["AclContext"]
H3 --> H4["AclStream"]
H4 --> H5["DeviceBuffer::from_slice()"]
H5 --> H6["kernel.launch()"]
H6 --> H7["stream.sync()"]
H7 --> H8["z_device.to_host()"]
H8 --> H9["Verify results"]
H9 --> H10["RAII Drop · auto-clean"]
end
subgraph Device["NPU Device"]
D1["AI Core 0<br/>block_idx=0<br/>Process x 0..8"]
D2["AI Core 1<br/>block_idx=1<br/>Process x 8..16"]
D3["Device Memory<br/>x: Input A · y: Input B<br/>z: Output = A * B"]
end
H4 -.->|"stream binds"| D3
H5 -.->|"Host → Device copy"| D3
H6 -.->|"Kernel execution"| D1
H6 -.->|"Kernel execution"| D2
H7 -.->|"Completion signal"| Device
H8 -.->|"Device → Host transfer"| D3
H10 -.->|"Resources freed"| Device
```
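The two-core split in the diagram can be mimicked on the host with plain threads (a sketch; `launch_mul` is an illustrative name, not an ascend-rs API): each "core" derives its slice of the work from its block index, exactly as the kernel derives it from block_idx.

```rust
// Host-side sketch of the block_idx work split: block_dim "cores" each
// process a contiguous chunk of z = A * B.
fn launch_mul(x: &[f32], y: &[f32], block_dim: usize) -> Vec<f32> {
    let len = x.len() / block_dim;
    let mut z = vec![0.0f32; x.len()];
    std::thread::scope(|s| {
        for (block_idx, chunk) in z.chunks_mut(len).enumerate() {
            s.spawn(move || {
                let base = block_idx * len; // this "core"'s starting offset
                for i in 0..len {
                    chunk[i] = x[base + i] * y[base + i]; // z = A * B
                }
            });
        }
    });
    z
}

fn main() {
    let x: Vec<f32> = (0..16).map(|i| i as f32).collect();
    let y = vec![2.0f32; 16];
    // block_dim = 2: core 0 handles x[0..8], core 1 handles x[8..16]
    let z = launch_mul(&x, &y, 2);
    assert_eq!(z[15], 30.0); // 15.0 * 2.0
    println!("z[15] = {}", z[15]);
}
```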
## 7.3 Memory Safety Guarantees
Throughout this process, ascend-rs provides the following compile-time safety guarantees:
| Safety Issue | C++ Approach | ascend-rs Approach |
|---|---|---|
| Device memory leak | Manual aclrtFree | Drop on DeviceBuffer<T> |
| Wrong deallocation order | Programmer convention | Lifetime system prevents at compile time |
| Use-after-free stream | No check | Compile error |
| Send unsafe type to device | No check | DeviceSend trait bound |
| Forgetting to synchronize | Silent data corruption | Type system can be extended to enforce synchronization |