1. Background: The State of NPU Programming

Why Memory Safety Matters

In heterogeneous computing, GPU and NPU programming has long relied on the C/C++ ecosystem. Frameworks such as CUDA, OpenCL, and SYCL are powerful, but they inherit C/C++’s memory-safety problems: dangling pointers, buffer overflows, data races, and resource leaks. These problems are harder to track down in heterogeneous systems. Host and device memory live in separate address spaces, so a pointer that is valid on one side is not necessarily valid on the other, and every buffer's lifetime must be coordinated across both.

A typical NPU programming mistake might look like this:

// C++ with the AscendCL (ACL) runtime API: forgetting to free device memory → memory leak
void* devPtr = nullptr;
aclrtMalloc(&devPtr, size, ACL_MEM_MALLOC_HUGE_FIRST);
// ... use devPtr for computation ...
// If an exception is thrown or the function returns early here, aclrtFree is never reached
aclrtFree(devPtr);

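Rust’s ownership system and the RAII (Resource Acquisition Is Initialization) pattern address these problems directly. Ownership and borrowing rule out dangling pointers and data races in safe code at compile time. RAII ties each resource to a value whose Drop implementation releases it automatically on every exit path, including early returns and unwinding panics. This is the core motivation behind the ascend-rs project.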
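To make the contrast with the C++ snippet concrete, here is a minimal RAII sketch. It is not the ascend-rs implementation. The `DeviceBuffer<T>` shown is a hypothetical stand-in that keeps its data in a host `Vec` and counts "live allocations" in a static counter where real code would call aclrtMalloc and aclrtFree. The point it shows is that the release in `Drop` runs on the early-error path too.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for the device allocator's bookkeeping: counts buffers that
// have been "allocated" but not yet "freed".
static LIVE_ALLOCS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical RAII wrapper. A real one would hold the raw pointer returned
/// by aclrtMalloc; here a host Vec stands in for device memory.
pub struct DeviceBuffer<T> {
    storage: Vec<T>,
}

impl<T: Default + Clone> DeviceBuffer<T> {
    /// Acquire on construction (real code: aclrtMalloc, returning Err on failure).
    pub fn new(len: usize) -> Self {
        LIVE_ALLOCS.fetch_add(1, Ordering::SeqCst);
        DeviceBuffer { storage: vec![T::default(); len] }
    }
}

impl<T> DeviceBuffer<T> {
    pub fn len(&self) -> usize {
        self.storage.len()
    }
}

impl<T> Drop for DeviceBuffer<T> {
    /// Release on destruction (real code: aclrtFree). Runs on every exit path.
    fn drop(&mut self) {
        LIVE_ALLOCS.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Number of buffers currently allocated and not yet released.
pub fn live_allocs() -> usize {
    LIVE_ALLOCS.load(Ordering::SeqCst)
}

/// Mirrors the C++ snippet: an early error return still frees the buffer.
pub fn compute(fail: bool) -> Result<usize, String> {
    let buf: DeviceBuffer<f32> = DeviceBuffer::new(1024);
    if fail {
        return Err("kernel launch failed".to_string()); // `buf` is dropped here
    }
    Ok(buf.len()) // `buf` is dropped after its length is read
}
```

After either `compute(false)` or `compute(true)`, `live_allocs()` is back to zero. The leak in the C++ version cannot occur here, because no code path skips the destructor.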

The Open-Source Landscape

Several open-source projects have explored memory-safe heterogeneous computing:

Project	Target	Approach	Status
rust-cuda	NVIDIA GPU	Rust → PTX compilation, safe CUDA bindings	Inactive
rust-gpu	GPU (Vulkan)	Rust → SPIR-V compilation	Active
krnl	GPU (Vulkan)	Safe GPU compute kernels	Active
cudarc	NVIDIA GPU	Safe CUDA runtime bindings	Active
ascend-rs	Huawei Ascend NPU	Rust → MLIR → NPU, safe ACL bindings	In development

Of the projects above, ascend-rs is the only one targeting the Ascend NPU, and it aims for memory-safe Rust programming on both the host and device sides. This fills an important gap in the Ascend ecosystem.

ascend-rs Architecture

ascend-rs uses a three-layer architecture:

graph TD
    A["Application Layer<br/>User's Rust Program"] --> B["Host API Layer<br/>ascend_rs + ascend_sys<br/>Safe RAII wrappers"]
    A --> C["Device Runtime Layer<br/>ascend_std + rustc_codegen_mlir<br/>#![no_core] runtime | MLIR codegen backend"]
    B --> D["CANN SDK · Native C/C++ Libraries<br/>ACL Runtime · AscendCL · bisheng · bishengir · HIVM"]
    C --> D

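The Host API layer (ascend_rs and ascend_sys) uses bindgen to auto-generate FFI bindings to the CANN libraries. It then builds safe Rust wrappers on top of them: Acl, Device, AclContext, AclStream, DeviceBuffer<T>, and others. These wrappers use Rust’s lifetime system to enforce correct resource ordering, so a resource cannot outlive the resource it depends on.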
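The following is a minimal sketch, under stated assumptions, of how lifetimes can encode the ordering described above. It is not ascend-rs's actual API. `Device`, `AclContext`, and `AclStream` here are hypothetical mock types that only borrow their parent. Each `Drop` appends to a thread-local log where a real wrapper would call the matching ACL teardown function. Because each child borrows its parent, the compiler guarantees that children are torn down before their parents.

```rust
use std::cell::RefCell;

// Records teardown order so the sketch can show it; a real wrapper would
// call the corresponding ACL destroy/reset function in each Drop instead.
thread_local! {
    static TEARDOWN: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

fn record(event: &'static str) {
    TEARDOWN.with(|log| log.borrow_mut().push(event));
}

/// Returns the teardown events recorded so far on this thread.
pub fn teardown_log() -> Vec<&'static str> {
    TEARDOWN.with(|log| log.borrow().clone())
}

pub struct Device {
    pub id: i32,
}

/// A context borrows the device it was created on, so it cannot outlive it.
pub struct AclContext<'d> {
    device: &'d Device,
}

/// A stream borrows its context, so it cannot outlive the context.
pub struct AclStream<'c, 'd> {
    ctx: &'c AclContext<'d>,
}

impl Device {
    pub fn new(id: i32) -> Self {
        Device { id }
    }
}

impl<'d> AclContext<'d> {
    pub fn new(device: &'d Device) -> Self {
        AclContext { device }
    }
    pub fn device_id(&self) -> i32 {
        self.device.id
    }
}

impl<'c, 'd> AclStream<'c, 'd> {
    pub fn new(ctx: &'c AclContext<'d>) -> Self {
        AclStream { ctx }
    }
    pub fn device_id(&self) -> i32 {
        self.ctx.device_id()
    }
}

impl Drop for Device {
    fn drop(&mut self) {
        record("device");
    }
}

impl<'d> Drop for AclContext<'d> {
    fn drop(&mut self) {
        record("context");
    }
}

impl<'c, 'd> Drop for AclStream<'c, 'd> {
    fn drop(&mut self) {
        record("stream");
    }
}

pub fn run() -> i32 {
    let device = Device::new(0);
    let ctx = AclContext::new(&device);
    let stream = AclStream::new(&ctx);
    let id = stream.device_id();
    // Locals drop in reverse declaration order: stream, then ctx, then device.
    id
}
```

Calling `drop(ctx)` while `stream` is still in use would be rejected by the borrow checker (error E0505, move out of a borrowed value). The ordering is therefore checked at compile time rather than by convention.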

The Device Runtime layer is the more novel part. It contains a custom rustc codegen backend (rustc_codegen_mlir) that compiles Rust code to MLIR. An mlir_to_cpp translation pass then converts the MLIR into C++ source that calls the AscendC API. bisheng (the CANN C++ compiler) compiles that source into NPU-executable binaries for both Ascend 910B and 310P targets. This MLIR-to-C++ path is what gives kernels the full AscendC feature set: DMA operations, vector intrinsics, pipe barriers, and the TPipe infrastructure. The translator recognizes ascend_* function calls in the MLIR and emits the corresponding AscendC vector operations.
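The recognize-and-emit step at the end of that pipeline can be pictured with a small sketch. This is a hypothetical illustration, not the mlir_to_cpp source. The intrinsic names such as `ascend_add` and the `lower_call` function are assumptions made for the example, and the real pass operates on MLIR call operations rather than on strings. The AscendC names on the output side (`Add`, `Sub`, `Mul`, `Div`, `Exp`, `Relu`) are existing AscendC vector APIs.

```rust
/// Hypothetical sketch of one step of an MLIR-to-C++ translator: lowering a
/// call op. Callees named `ascend_<op>` map to AscendC vector APIs; any other
/// callee is emitted as an ordinary C++ call.
pub fn lower_call(callee: &str, args: &[&str]) -> Result<String, String> {
    let arg_list = args.join(", ");
    match callee.strip_prefix("ascend_") {
        Some(op) => {
            // The `ascend_*` names are illustrative; the right-hand names are
            // AscendC vector APIs (binary ops take dst, src0, src1, count).
            let api = match op {
                "add" => "Add",
                "sub" => "Sub",
                "mul" => "Mul",
                "div" => "Div",
                "exp" => "Exp",
                "relu" => "Relu",
                _ => return Err(format!("unsupported intrinsic `{callee}`")),
            };
            Ok(format!("AscendC::{api}({arg_list});"))
        }
        None => Ok(format!("{callee}({arg_list});")),
    }
}
```

For example, `lower_call("ascend_add", &["zLocal", "xLocal", "yLocal", "256"])` yields `AscendC::Add(zLocal, xLocal, yLocal, 256);`. An unrecognized `ascend_*` name is reported as an error rather than silently emitted as a plain call. That choice is one reasonable option for a translator, not necessarily what ascend-rs does.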


ascend-rs: Memory-Safe NPU Kernel Programming in Rust