English | 中文版
1. Background: The State of NPU Programming
Why Memory Safety Matters
In heterogeneous computing, GPU/NPU programming has long relied on C/C++ ecosystems. Frameworks like CUDA, OpenCL, and SYCL are powerful but inherit all of C/C++’s memory safety problems: dangling pointers, buffer overflows, data races, and resource leaks. These issues are especially tricky in heterogeneous environments, where interactions between device and host memory add another layer of complexity.
A typical NPU programming mistake might look like this:
// C++ AscendC: Forgetting to free device memory → memory leak
void* devPtr;
aclrtMalloc(&devPtr, size, ACL_MEM_MALLOC_HUGE_FIRST);
// ... use devPtr for computation ...
// If an exception occurs here, aclrtFree is never called
aclrtFree(devPtr);
Rust’s ownership system and RAII (Resource Acquisition Is Initialization) pattern eliminate such problems at compile time. This is the core motivation behind the ascend-rs project.
The Open-Source Landscape
Several open-source projects have explored memory-safe heterogeneous computing:
| Project | Target | Approach | Status |
|---|---|---|---|
| rust-cuda | NVIDIA GPU | Rust → PTX compilation, safe CUDA bindings | Inactive |
| rust-gpu | GPU (Vulkan) | Rust → SPIR-V compilation | Active |
| krnl | GPU (Vulkan) | Safe GPU compute kernels | Active |
| cudarc | NVIDIA GPU | Safe CUDA runtime bindings | Active |
| ascend-rs | Huawei Ascend NPU | Rust → MLIR → NPU, safe ACL bindings | In development |
As you can see, ascend-rs is the only project in the Ascend NPU ecosystem attempting memory-safe Rust programming on both the host and device sides. This fills an important gap in the Ascend ecosystem.
ascend-rs Architecture
ascend-rs uses a three-layer architecture:
graph TD
A["Application Layer<br/>User's Rust Program"] --> B["Host API Layer<br/>ascend_rs + ascend_sys<br/>Safe RAII wrappers"]
A --> C["Device Runtime Layer<br/>ascend_std + rustc_codegen_mlir<br/>#![no_core] runtime | MLIR codegen backend"]
B --> D["CANN SDK · Native C/C++ Libraries<br/>ACL Runtime · AscendCL · bisheng · bishengir · HIVM"]
C --> D
The Host API layer uses bindgen to auto-generate FFI bindings, then builds safe Rust wrappers on top: Acl, Device, AclContext, AclStream, DeviceBuffer<T>, etc., using Rust’s lifetime system to enforce correct resource ordering.
The Device Runtime layer is more innovative: it contains a custom rustc codegen backend that compiles Rust code to MLIR. From there, a mlir_to_cpp translation pass converts the MLIR into C++ source with AscendC API calls, which is then compiled by bisheng (the CANN C++ compiler) into NPU-executable binaries for both Ascend 910B and 310P targets. This MLIR-to-C++ path is what enables the full AscendC feature set — DMA operations, vector intrinsics, pipe barriers, and TPipe infrastructure. The translator recognizes ascend_* function calls in MLIR and emits the corresponding AscendC vector operations.