11. Extending the Oracle to Ingested linalg Kernels
Summary: Chapter 10 ran the safety oracle on .acl.pto files, the PTO-MLIR produced by our own mlir_to_pto backend. This chapter extends the same oracle to kernels that arrive from outside the ascend-rs pipeline: upstream linalg dialect MLIR emitted by third-party frontends like torch-mlir.

Two paths are described:
- Path A is a ~300-line projector that synthesises a stage-2 Plan directly from ascend_tile MLIR and runs a subset of the Chapter 10 passes on it.
- Path C pipes every ingested kernel through the real mlir_to_pto → ptoas --print-after-all → parse_stage2 chain. It runs the full six-pass oracle on the post-PlanMemoryPass plan.

Both are selected through the ingress driver's ACLRS_LINALG_SAFETY environment variable. Both run host-only on adablue using an x86 ptoas build, so no NPU is required. The two paths are complementary. Path A is fast and catches whole-tile issues. Path C is higher fidelity, especially on blocked matmuls where Path A conservatively over-approximates capacity.
11.1 Path A: A Projector for Ingested ascend_tile
Chapter 10 ran the oracle on .acl.pto files — PTO-MLIR produced by our own mlir_to_pto backend. A fair follow-up question is whether the same oracle has anything to say about kernels that arrive from outside the ascend-rs pipeline: specifically, upstream linalg dialect MLIR emitted by third-party frontends like torch-mlir. The linalg bridge (Chapter 7) ingests those kernels, lowers them to our ascend_tile form, and hands them to mlir_to_cpp to produce AscendC. Ingested kernels were, until now, the one code path in the repo with zero Rust-side safety analysis — a visible gap in the “Rust safety card” story.
This section plugs that gap. The same check_* passes from section 10.4, re-aimed at the ingress path, catch three bug classes in the four adversarial fixtures shipped under benchmarks/linalg/kernels_adversarial/. The path is deliberately minimal — a ~300-line projector that synthesises a stage-2 Plan directly from ascend_tile MLIR — and is wired to the ingress driver by a single environment variable.
11.1.1 Why Ingress Looks Different from .acl.pto
The stage-2 oracle in section 10.2 starts from ptoas --print-after-all. At that point every tile already has a concrete placement (space, offset, rows, cols, dtype, blayout, slayout). Ingested linalg has none of that. The frontend emits llvm.func @kernel(...) attributes {hacc.entry}, whose body is a sequence of llvm.call @ascend_tile_<op>_<dt>(%args...) intrinsics: pure op-and-operand soup with no placement.
We have two honest options:
- Path A (this section): synthesise a naive stage-2 plan by giving every SSA its own UB slot with sequential offsets, then run the oracle against that plan. Cheap to build, bounded in value — can catch whole-tile issues but never sees real buffer reuse.
- Path C (section 11.2): pipe every ingested kernel through the real mlir_to_pto → ptoas --print-after-all → parse_stage2 chain and reuse section 10.2 verbatim. Higher fidelity, especially on blocked matmuls where Path A conservatively over-approximates capacity.
Path A is the fast default; Path C runs the full oracle on the post-PlanMemoryPass plan and catches one bug class Path A structurally cannot see. Everything below describes Path A as implemented at commit 381340fc; section 11.2 covers Path C.
11.1.2 An SSA Property That Changes Which Checks Apply
Before describing the projector, one observation about the input format is load-bearing: linalg from torch-mlir is in SSA form, and SSA form automatically renames duplicates. A source-level pattern like y = x + x lowers to a single tile passed twice to linalg.generic; a back-to-back WAW (%t = f(%a); %t = g(%b)) is impossible to express because the second binding gets a fresh name. Two of the oracle’s six checks therefore do not apply to ingested linalg:
- check_aliasing looks for distinct SSA names sharing overlapping offsets. SSA form prevents that case by construction.
- The original check_linear_use WAW rule looks for a write followed by another write to the same slot. SSA form renames the second write, so it cannot trigger.
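A minimal sketch makes the renaming concrete. It assumes a toy straight-line IR; the Assign type and to_ssa helper below are hypothetical and are not torch-mlir's printer. Each definition gets a fresh versioned name, and each use refers to the latest version. A second write to t therefore lands on a new name instead of overwriting the first.

```rust
use std::collections::HashMap;

/// One source-level statement `dst = op(srcs...)` in a toy straight-line IR
/// (hypothetical, for illustration only).
pub struct Assign<'a> {
    pub dst: &'a str,
    pub op: &'a str,
    pub srcs: Vec<&'a str>,
}

/// Rename a straight-line program into SSA form. Every definition gets a
/// fresh `%name.N`; every use refers to the most recent definition. Names
/// never defined in the program (kernel arguments) stay as plain `%name`.
pub fn to_ssa(prog: &[Assign<'_>]) -> Vec<String> {
    let mut version: HashMap<&str, usize> = HashMap::new();
    let mut out = Vec::with_capacity(prog.len());
    for a in prog {
        // Resolve uses before bumping the destination's version, so that
        // `t = f(t)` reads the previous `t`.
        let uses: Vec<String> = a
            .srcs
            .iter()
            .map(|s| match version.get(s) {
                Some(n) => format!("%{}.{}", s, n),
                None => format!("%{}", s),
            })
            .collect();
        // A repeated destination gets the next version instead of reusing
        // its old name, which is why a WAW cannot be expressed in SSA.
        let n = *version.entry(a.dst).and_modify(|n| *n += 1).or_insert(0);
        out.push(format!("%{}.{} = {}({})", a.dst, n, a.op, uses.join(", ")));
    }
    out
}
```

Applied to t = f(a); t = g(b); y = add(t, t), the sketch produces %t.0 = f(%a); %t.1 = g(%b); %y.0 = add(%t.1, %t.1). Nothing reads %t.0, and the doubled operand is a single SSA value used twice.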
What can survive into a projected plan is the pattern write-never-read: an op produces an SSA value that no later op consumes. This is the canonical shape into which both source-level aliasing and source-level WAW collapse after SSA. To catch it, we added one new check:
```rust
// crates/pto_to_rust/src/safety.rs
use std::collections::BTreeSet;

/// Warn on every tile that some op writes but no op ever reads.
pub fn check_dead_writes(f: &PlanFunc, rep: &mut SafetyReport) {
    // Collect every SSA slot read and written anywhere in the function.
    let mut read_slots: BTreeSet<&Ssa> = BTreeSet::new();
    let mut written_slots: BTreeSet<&Ssa> = BTreeSet::new();
    for op in &f.ops {
        for s in op.reads() { read_slots.insert(s); }
        for s in op.writes() { written_slots.insert(s); }
    }
    for w in &written_slots {
        if !read_slots.contains(w) {
            // Name the producing op in the diagnostic when it can be found.
            let producer = f.ops.iter()
                .position(|op| op.writes().iter().any(|s| s == w));
            let where_clause = producer
                .map(|i| format!(" (produced by op #{})", i))
                .unwrap_or_default();
            rep.violations.push(SafetyViolation::warn(
                &f.name, SafetyKind::DeadTile,
                format!("tile `{}` is written but never read{} \
                         — the producing op is dead code",
                        w.0, where_clause),
            ));
        }
    }
}
```
check_dead_writes is wired into check_all (so it also improves coverage on hand-written PTO — the original 50-case corpus still passes) and into the new check_ingress subset.
11.1.3 The Projector
pto_to_rust::project(&ascend_tile_src) -> ProjectResult { plan, warnings } (~300 LoC, crates/pto_to_rust/src/ascend_tile_ingress.rs) walks the ascend_tile MLIR text and emits a PlanFunc per llvm.func @name ... attributes {hacc.entry}. The rules are intentionally small:
| Input form | Projected slot/op |
|---|---|
| %c = llvm.mlir.constant(N : i32) | remembered shape constant |
| llvm.call @ascend_tile_load_<dt>(%buf, %r, %c) -> %t | allocate slot %t in UB at next sequential offset; TLoad op |
| llvm.call @ascend_tile_store_<dt>(%buf, %t, %r, %c) | TStore { tile: %t } |
| llvm.call @ascend_tile_<unop>_<dt>(%a) -> %t for exp/log/sqrt/rsqrt/tanh/abs/neg/sigmoid/silu/relu/softmax/rms_norm | allocate %t; TUnary { src: %a, dst: %t } |
| llvm.call @ascend_tile_<binop>_<dt>(%a, %b) -> %t for add/sub/mul/div/max/min | allocate %t; TBinary { a, b, dst } |
| llvm.call @ascend_tile_matmul_<dt>(%a, %b) -> %t | allocate %t; TMatmul (all in UB; see below) |
| any other llvm.call @ascend_tile_* | TUnary placeholder + warning |
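The callee-name dispatch in the table can be sketched as a small classifier. This is a hypothetical reconstruction from the table alone, not the code in ascend_tile_ingress.rs. The OpKind enum, the classify function, and the assumption that the dtype is always the last underscore-separated segment are all ours.

```rust
/// Op kinds from the projection table (hypothetical mirror of the projector's
/// dispatch; the real enum may differ).
#[derive(Debug, PartialEq, Eq)]
pub enum OpKind {
    Load,
    Store,
    Unary,
    Binary,
    Matmul,
    /// Unrecognised `ascend_tile_*` op: a TUnary placeholder plus a warning.
    Placeholder,
}

const UNARY: &[&str] = &[
    "exp", "log", "sqrt", "rsqrt", "tanh", "abs", "neg",
    "sigmoid", "silu", "relu", "softmax", "rms_norm",
];
const BINARY: &[&str] = &["add", "sub", "mul", "div", "max", "min"];

/// Split `ascend_tile_<op>_<dt>` into (kind, op name, dtype). Returns None for
/// callees outside the `ascend_tile_` namespace, and (in this sketch) for
/// callees with no `_<dt>` suffix.
pub fn classify(callee: &str) -> Option<(OpKind, &str, &str)> {
    let rest = callee.strip_prefix('@').unwrap_or(callee);
    let rest = rest.strip_prefix("ascend_tile_")?;
    // The dtype is the last segment; op names such as `rms_norm` may
    // themselves contain underscores, so split from the right.
    let (op, dt) = rest.rsplit_once('_')?;
    let kind = match op {
        "load" => OpKind::Load,
        "store" => OpKind::Store,
        "matmul" => OpKind::Matmul,
        o if UNARY.contains(&o) => OpKind::Unary,
        o if BINARY.contains(&o) => OpKind::Binary,
        _ => OpKind::Placeholder,
    };
    Some((kind, op, dt))
}
```

Splitting from the right is the one design choice that matters here. A left split would break rms_norm into rms and norm_f32.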
Two design choices are worth flagging:
- Every SSA gets its own slot. The projector does not model buffer reuse; mlir_to_cpp’s real allocator does that later. Capacity is therefore a conservative over-approximation: a kernel that the real allocator slims to 64 KiB might project to 512 KiB. The trade-off is deliberate. For adversarial fixtures the over-approximation is exactly the signal we want. For production-fit kernels it will false-positive on capacity, a documented limit that Path C (section 11.2) addresses.
- Matmul goes in UB, not L0. There is no Left/Right/Acc annotation to recover from ascend_tile form. The projector puts every operand in UB and tags the op TMatmul. However, the check_ingress subset does not run check_op_constraint or check_matmul_bounds, because doing so would report every matmul as mis-placed. Those checks are Path C territory.
check_ingress runs five passes: aliasing, capacity, dead_tiles, dead_writes, and linear_use. It omits check_op_constraint and check_matmul_bounds for the reason above. Aliasing and capacity are still run. Aliasing is vacuous on SSA-projected plans and so is effectively a no-op, while capacity catches egregious whole-tile cases.
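The "every SSA gets its own slot" rule and the capacity pass can be sketched together. This is a simplified model, not the projector's code. Slot, project_sequential, and check_capacity are hypothetical; element sizes are passed in explicitly, and no alignment padding is modelled.

For the capacity fixture there are two 1×131072 f32 tiles, the loaded input and the exp result. That gives 2 × 524288 = 1048576 B, consistent with the high-water figure in the section 11.1.5 log. The UB capacity is 196608 B (192 KiB).

```rust
/// One projected UB slot; Path A gives every SSA value its own slot.
#[derive(Debug, Clone, PartialEq)]
pub struct Slot {
    pub ssa: String,
    pub offset: usize,
    pub bytes: usize,
}

/// Lay tiles out back to back in UB with no reuse and no alignment padding.
/// Each entry is (ssa, rows, cols, element size in bytes). Returns the slots
/// and the resulting high-water mark in bytes.
pub fn project_sequential(tiles: &[(&str, usize, usize, usize)]) -> (Vec<Slot>, usize) {
    let mut next = 0usize;
    let mut slots = Vec::with_capacity(tiles.len());
    for &(ssa, rows, cols, elem_bytes) in tiles {
        let bytes = rows * cols * elem_bytes;
        slots.push(Slot { ssa: ssa.to_string(), offset: next, bytes });
        next += bytes; // the next slot starts where this one ends
    }
    (slots, next)
}

/// Capacity pass: Some(message) when the high-water mark exceeds `cap`.
pub fn check_capacity(high_water: usize, cap: usize) -> Option<String> {
    (high_water > cap)
        .then(|| format!("vec high-water {} B exceeds capacity {} B", high_water, cap))
}
```

Because offsets only ever grow, the high-water mark is simply the sum of all tile sizes. That is exactly why Path A over-approximates whenever the real allocator would reuse a slot.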
11.1.4 Wiring into the Ingress Driver
The linalg_to_ascendc binary (the tool that consumes linalg MLIR and produces an AscendC .cce) gained one opt-in block:
```rust
// crates/mlir_to_cpp_tests/src/bin/linalg_to_ascendc.rs
if let Ok(mode) = std::env::var("ACLRS_LINALG_SAFETY") {
    let projected = pto_to_rust::project(&ascend_tile);
    for w in &projected.warnings {
        eprintln!("linalg-safety [projector]: {}", w);
    }
    let spec = pto_to_rust::default_a5_910b2_cann85();
    let report = pto_to_rust::check_ingress(&projected.plan, &spec);
    let mut err_count = 0usize;
    for v in &report.violations {
        let sev = match v.severity {
            pto_to_rust::Severity::Error => { err_count += 1; "error" }
            pto_to_rust::Severity::Warning => "warning",
        };
        eprintln!("linalg-safety [{}] {}: {} (in `{}`)",
                  sev, v.kind.label(), v.message, v.func);
    }
    if mode == "error" && err_count > 0 {
        eprintln!("linalg-safety: {} error(s), aborting \
                   (ACLRS_LINALG_SAFETY=error)", err_count);
        std::process::exit(3);
    }
}
```
ACLRS_LINALG_SAFETY=1 runs in advisory mode: warnings print and emission proceeds. ACLRS_LINALG_SAFETY=error promotes any Severity::Error to exit code 3. This matches the convention that ACLRS_PTO_SAFETY=error already uses for the .acl.pto path.
A sibling helper, linalg_safety_dump, prints the projected Plan (slots + ops) alongside the full report — useful when an ingress fixture behaves unexpectedly and you want to see what the projector actually built.
11.1.5 Four Adversarial Fixtures
benchmarks/linalg/kernels_adversarial/ ships four .mlir inputs, each written to trigger one bug class. They are intentionally small (one function, ≤3 ops) so the projected plan is transparent.
| Fixture | Source-level pattern | Expected finding |
|---|---|---|
| aliasing_same_tensor_twice.mlir | linalg.generic { %arg0, %arg0 } → add | clean (SSA dedupes the second operand) |
| capacity_overflow_1x131072.mlir | exp on a 1×131072 f32 tile (512 KiB) | capacity error (UB cap 192 KiB) |
| dead_tile_unused_intermediate.mlir | %t = exp(%a) produced then discarded; return %a + %b | dead-tile warning on %t |
| waw_double_write.mlir | two linalg.generic ops with the same outs | dead-tile warning on the first op’s SSA (renamed by SSA) |
Running the driver on each with ACLRS_LINALG_SAFETY=1 gives the verbatim output below (captured on adablue, commit 381340fc, release build of linalg_to_ascendc):
```console
$ for f in aliasing_same_tensor_twice capacity_overflow_1x131072 \
           dead_tile_unused_intermediate waw_double_write; do
    echo "=== $f ==="
    ACLRS_LINALG_SAFETY=1 crates/mlir_to_cpp_tests/target/release/linalg_to_ascendc \
      benchmarks/linalg/kernels_adversarial/$f.mlir /tmp/out.cce 2>&1 \
      | grep -E '^linalg-safety' || echo '(clean — no findings)'
  done
=== aliasing_same_tensor_twice ===
(clean — no findings)
=== capacity_overflow_1x131072 ===
linalg-safety [error] capacity: vec high-water 1048576 B exceeds capacity 196608 B (on Ascend910B2 (CANN 8.5)) (in `adv_capacity_overflow`)
=== dead_tile_unused_intermediate ===
linalg-safety [warning] dead-tile: tile `%t2` is written but never read (produced by op #2) — the producing op is dead code (in `adv_dead_tile`)
=== waw_double_write ===
linalg-safety [warning] dead-tile: tile `%t1` is written but never read (produced by op #1) — the producing op is dead code (in `adv_waw`)
```
The clean line on the aliasing fixture is the honest part of the story: SSA renames %arg0, %arg0 to a single operand before the projector ever sees it, and the oracle says so by staying silent. Error-mode promotes the capacity finding to exit 3:
```console
$ ACLRS_LINALG_SAFETY=error crates/mlir_to_cpp_tests/target/release/linalg_to_ascendc \
    benchmarks/linalg/kernels_adversarial/capacity_overflow_1x131072.mlir /tmp/out.cce
linalg-safety [error] capacity: ...
linalg-safety: 1 error(s), aborting (ACLRS_LINALG_SAFETY=error)
$ echo $?
3
```
11.1.6 Reproducer
Two test suites cover the wiring end-to-end; both are green on adablue-probe:
```console
$ cargo test -p pto_to_rust --test adversarial_ingress --release
test adv_aliasing_same_tensor_twice_clean ... ok
test adv_capacity_overflow_flagged ... ok
test adv_dead_intermediate_and_dead_write_flagged ... ok
test adv_waw_double_write_flagged ... ok
test ingress_aliasing_projects_cleanly ... ok
test ingress_capacity_1x131072_flagged ... ok
test ingress_dead_intermediate_caught_by_dead_write ... ok
test ingress_waw_caught_as_dead_write ... ok
8 passed; 0 failed
```
The first four exercise hand-crafted PlanFunc values (the oracle proper); the last four exercise the projector itself — starting from the .mlir text and asserting that project() + check_ingress() produces the expected Violation set. Adding a new adversarial pattern is therefore an .mlir plus one test: no new oracle code.
11.1.7 What This Path Does Not Catch
It is worth naming the limits explicitly so the claim lands as “Rust safety on ingested kernels, within these bounds” rather than “Rust catches all the things”:
- Cross-op buffer reuse bugs. The projector gives every SSA its own slot, so real allocator-level collisions in mlir_to_cpp::analyze_kernel pass through unchecked. Closing this gap is the Path A follow-up: feed reuse decisions back into the projector so the capacity figure and aliasing surface match the shipping footprint.
- Matmul placement + blocked shapes. There is no Left/Right/Acc in the projected plan, so check_op_constraint and check_matmul_bounds are deliberately skipped. Worse, consider blocked matmuls, where mlir_to_pto tiles a large N into many per-op chunks. For these, Path A’s capacity check reports the pre-blocking footprint, which is a false positive. Matmul fidelity is Path C (section 11.2); the matmul_row_overflow fixture in section 11.2.3 is the empirical demonstration.
- Numerics. The oracle is structural; a fixture that produces wrong output but allocates correctly will pass.
Despite the limits, the four demo fixtures establish the new baseline: ingested linalg is no longer an un-analysed input. The same six-pass oracle from section 10.4 now sees both sides of the ascend-rs ingress boundary, and the ACLRS_LINALG_SAFETY=error setting gives downstream build systems the same advisory-or-hard knob that ACLRS_PTO_SAFETY=error already provides for self-emitted kernels.
11.2 Path C: The Full Oracle on Post-PlanMem Plans
Section 11.1 was honest about its ceiling: Path A can only see what a single text walk of ascend_tile tells it, and it cannot see buffer reuse, cannot see matmul placement, and cannot see mlir_to_pto’s own shape decisions (tile blocking, Kb selection, fractal packing). The interesting question is whether those gaps need a whole new analysis or whether the existing six-pass oracle from section 10.4 can be re-used verbatim on a plan that already has all that information in it. Path C says yes — just lower the ingested linalg through the real compilation pipeline and run the oracle on the post-PlanMemoryPass MLIR that ptoas --print-after-all emits. No new passes, no new plan format, one new driver.
11.2.1 Host-only on adablue
Path C was previously blocked by the assumption that ptoas lives only on 910c (aarch64, NPU hardware). That assumption is wrong. ptoas also ships as an x86 build at ~/ptoas-x86/bin/ptoas on adablue, and that binary produces correct --print-after-all output for static analysis without any NPU present. Path C is therefore cleanly separable from NPU execution. Static safety analysis is a pure host concern; numerical validation is where you need 910c. This matches how cargo check and cargo test split on a cross-compilation project.
11.2.2 The Five Hops
```text
linalg.mlir            ── hop 1 ── linalg_to_ascend_tile
     │
ascend_tile MLIR       ── hop 2 ── mlir_to_pto
     │
.acl.pto (PTO-MLIR)    ── hop 3 ── ptoas --print-after-all (x86)
     │
stage-2 MLIR in stderr ── hop 4 ── pto_to_rust::parse_stage2
     │
post-PlanMem `Plan`    ── hop 5 ── check_all (all 6 passes)
     │
SafetyReport
```
Hops 1 and 2 are the existing ingress path. Hop 3 invokes the unmodified x86 ptoas as a subprocess and captures --print-after-all output on stderr. Hops 4 and 5 are the section 10.2 flow, unchanged — same parse_stage2, same check_all, same DeviceSpec. Path C contributes only the plumbing between them.
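Hop 3 is the only new moving part, so here is a minimal sketch of it under stated assumptions. Only the --print-after-all flag and the capture-on-stderr behaviour come from the text. Passing the .acl.pto path as a positional argument is an assumption about ptoas's command line. Finding, classify_ptoas_run, and run_hop3 are hypothetical names, not the driver's.

The sketch also shows the policy that section 11.2.4 describes for run_path_c: a non-zero ptoas exit becomes a structured error finding rather than a crash.

```rust
use std::path::Path;
use std::process::Command;

/// A structured finding (hypothetical stand-in for the crate's SafetyViolation).
#[derive(Debug, PartialEq)]
pub struct Finding {
    pub severity: &'static str,
    pub message: String,
}

/// Classify a finished ptoas run. Exit code 0 yields the stage-2 dump that
/// --print-after-all wrote to stderr; anything else becomes an error finding
/// carrying ptoas's last non-empty diagnostic line.
pub fn classify_ptoas_run(exit_code: Option<i32>, stderr: &str) -> Result<String, Finding> {
    if exit_code == Some(0) {
        return Ok(stderr.to_string());
    }
    let last = stderr
        .lines()
        .rev()
        .find(|l| !l.trim().is_empty())
        .unwrap_or("")
        .trim();
    let message = match exit_code {
        Some(rc) => format!("ptoas exited with rc={}: {}", rc, last),
        // On Unix, status.code() is None when the process was killed by a signal.
        None => format!("ptoas terminated without an exit code: {}", last),
    };
    Err(Finding { severity: "error", message })
}

/// Hop 3: run ptoas on a .acl.pto file and capture its stderr.
pub fn run_hop3(ptoas: &Path, pto_file: &Path) -> Result<String, Finding> {
    let out = Command::new(ptoas)
        .arg("--print-after-all")
        .arg(pto_file) // assumed positional input argument
        .output()
        .map_err(|e| Finding {
            severity: "error",
            message: format!("failed to spawn {}: {}", ptoas.display(), e),
        })?;
    classify_ptoas_run(out.status.code(), &String::from_utf8_lossy(&out.stderr))
}
```

Keeping classification separate from spawning lets the policy be tested without a ptoas binary on the machine.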
A standalone probe binary (linalg_path_c_probe, one .rs file) drives the full chain with PASS/FAIL per hop; it exists mainly as a diagnostic tool for adding new fixtures. Production use goes through the driver (section 11.2.4).
11.2.3 Where Path C Beats Path A
The advertised win of Path C is “tighter capacity, catches matmul bounds”, and we should be honest about what is actually empirically demonstrable on the current fixtures. Running Path C against every fixture in benchmarks/linalg/ (commit b6db7cae) produces the following findings table:
| Fixture | hop 3 rc | hop 5 findings |
|---|---|---|
| upstream/{add,exp,matmul,softmax} | 0 | clean |
| adv/aliasing_same_tensor_twice | 0 | clean (SSA dedup; matches Path A) |
| adv/capacity_overflow_1x131072 | 1 | ptoas: vec overflow, requires 8388608 bits while 1572864 bits avaliable |
| adv/dead_tile_unused_intermediate | 0 | dead-tile on %5 (post-PlanMem SSA) |
| adv/waw_double_write | 0 | dead-tile on %3 (post-PlanMem SSA) |
| adv/matmul_row_overflow (16×16 × 16×65536) | 0 | clean (Path A reports an ~8 MiB capacity error; Path C is correct) |
The last row is the empirical value-add. Path A’s projector sums raw linalg tensor footprints: the output tile 16×65536×4 = 4 MiB alone exceeds the 192 KiB UB cap on 910B2, so check_capacity fires as an error. But mlir_to_pto blocks that N=65536 into many per-op chunks of N=32 before ever emitting pto.tmatmul. The post-PlanMemoryPass plan never has a tile that big, and Path C reports clean — the correct answer. This is the empirical instance of the “conservative over-approximation” caveat that section 11.1.7 warned about; Path C is the remedy.
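The arithmetic behind this row can be checked with a back-of-the-envelope model. It is only a model, with these assumptions:
- It counts f32 bytes for A (M×K), B (K×N), and C (M×N).
- It assumes one N-chunk is resident at a time.
- It ignores which memory space (UB or L0) each operand actually lands in, which is what PlanMemoryPass really decides.

The function names are hypothetical.

```rust
/// Bytes for a rows x cols tile of f32 (4 bytes per element).
fn f32_tile(rows: usize, cols: usize) -> usize {
    rows * cols * 4
}

/// Path A's view of C = A(M x K) * B(K x N): all three whole tensors at once.
pub fn unblocked_bytes(m: usize, k: usize, n: usize) -> usize {
    f32_tile(m, k) + f32_tile(k, n) + f32_tile(m, n)
}

/// One chunk after blocking N into `nb`-column pieces: A stays whole,
/// B and C shrink to K x nb and M x nb. (Illustrative model only.)
pub fn chunk_bytes(m: usize, k: usize, nb: usize) -> usize {
    f32_tile(m, k) + f32_tile(k, nb) + f32_tile(m, nb)
}

/// Number of chunks, rounding up when `nb` does not divide `n`.
pub fn chunk_count(n: usize, nb: usize) -> usize {
    (n + nb - 1) / nb
}
```

With M = K = 16 and N = 65536, the unblocked figure is 1024 + 4194304 + 4194304 = 8389632 B. That is the exact number Path A prints in section 11.2.5. One N = 32 chunk needs 1024 + 2048 + 2048 = 5120 B, and there are 65536 / 32 = 2048 such chunks.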
Two other honest findings from running the probe end to end:
- ptoas has its own sanity bounds. On large shapes (dims > 4095), ptoas’s built-in verifier rejects pto.tmatmul well before our check_matmul_bounds (ROW < 2^16) would trigger. That pass is therefore mostly dormant on ingested linalg. The rejection still surfaces: Path C treats ptoas rc ≠ 0 as an Error finding, so the violation reaches the user either way. It just comes from a different layer than section 10.4’s check.
- SSA names differ between Path A and Path C. Path A reports %t2; Path C reports %5. Both are correct: they are the same tile in different dialects (ascend_tile vs post-PlanMemoryPass MLIR). Path C’s names match the emitted C++ byte-for-byte.
11.2.4 Driver Wiring
The ingress driver gained a Path C mode alongside the existing Path A one:
```rust
// crates/mlir_to_cpp_tests/src/bin/linalg_to_ascendc.rs
if let Ok(mode) = std::env::var("ACLRS_LINALG_SAFETY") {
    let abort_env = std::env::var("ACLRS_LINALG_SAFETY_ABORT")
        .ok().as_deref() == Some("1");
    let abort_on_error = abort_env || mode == "error";
    let err_count = if mode == "path-c" {
        run_path_c(&ascend_tile)   // hops 2..5
    } else {
        run_path_a(&ascend_tile)   // project + check_ingress
    };
    if abort_on_error && err_count > 0 {
        eprintln!("linalg-safety: {} error(s), aborting", err_count);
        std::process::exit(3);
    }
}
```
The knobs are:
| Env var | Effect |
|---|---|
| ACLRS_LINALG_SAFETY=1 or ACLRS_LINALG_SAFETY=path-a | Path A (projector + check_ingress), advisory |
| ACLRS_LINALG_SAFETY=path-c | Path C (full pipeline through ptoas), advisory |
| ACLRS_LINALG_SAFETY=error | Path A + abort on error findings (back-compat with §11.1) |
| ACLRS_LINALG_SAFETY_ABORT=1 | Abort on error findings, combinable with either path |
| ACLRS_PTOAS_BIN=<path> | Override the default $HOME/ptoas-x86/bin/ptoas |
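The knob table reduces to a small pure function. This sketch is hypothetical (resolve_knobs and SafetyPath are not names from the driver), but it follows the driver code shown above. The raw values are taken as arguments so the table can be checked without touching the process environment.

```rust
/// Which oracle path the ingress driver runs.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum SafetyPath {
    A,
    C,
}

/// Resolve ACLRS_LINALG_SAFETY / ACLRS_LINALG_SAFETY_ABORT into
/// (path, abort_on_error), or None when ACLRS_LINALG_SAFETY is unset.
pub fn resolve_knobs(safety: Option<&str>, abort: Option<&str>) -> Option<(SafetyPath, bool)> {
    let mode = safety?;
    // Only the exact value "path-c" selects Path C; "1", "path-a", "error",
    // and any other value fall through to Path A, as in the driver.
    let path = if mode == "path-c" { SafetyPath::C } else { SafetyPath::A };
    // "error" keeps its section 11.1 meaning; ABORT=1 works with either path.
    let abort_on_error = mode == "error" || abort == Some("1");
    Some((path, abort_on_error))
}
```

In the driver this would be called as resolve_knobs(std::env::var("ACLRS_LINALG_SAFETY").ok().as_deref(), std::env::var("ACLRS_LINALG_SAFETY_ABORT").ok().as_deref()).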
run_path_c surfaces a non-zero ptoas exit code as a Severity::Error finding rather than a hard crash. A broken kernel that ptoas itself rejects is a safety finding — it just happens to be one that a different layer catches. Treating it as a structured error keeps the reporting surface uniform.
11.2.5 Demo: Path A vs Path C on matmul_row_overflow
```console
$ BIN=crates/mlir_to_cpp_tests/target/release/linalg_to_ascendc
$ ACLRS_LINALG_SAFETY=path-a $BIN \
    benchmarks/linalg/kernels_adversarial/matmul_row_overflow.mlir /tmp/a.cce \
    2>&1 | grep linalg-safety
linalg-safety [path-a] [error] capacity: vec high-water 8389632 B exceeds capacity 196608 B (on Ascend910B2 (CANN 8.5)) (in `adv_matmul_row_overflow`)
$ ACLRS_LINALG_SAFETY=path-c $BIN \
    benchmarks/linalg/kernels_adversarial/matmul_row_overflow.mlir /tmp/c.cce \
    2>&1 | grep linalg-safety
(no output: Path C reports clean)
```
Path A raises a false positive with an ~8 MiB (8389632 B) capacity claim. Path C correctly sees the post-blocking plan and stays silent. The kernel and the oracle passes are the same; only the layer of MLIR used as input differs, and that is the whole point of having Path C.
11.2.6 Reproducer
Three integration tests exercise the driver end-to-end; they spawn the release binary with each mode and assert on exit code + stderr:
```console
$ cargo test --manifest-path crates/mlir_to_cpp_tests/Cargo.toml \
    --test path_c_driver --release
test path_c_clean_upstream_add ... ok
test path_c_clean_where_path_a_overapproximates ... ok
test path_c_surfaces_ptoas_capacity_overflow ... ok
3 passed; 0 failed
```
The tests auto-discover ptoas at $ACLRS_PTOAS_BIN or $HOME/ptoas-x86/bin/ptoas and skip with a message if neither exists, so CI on machines without an x86 ptoas build stays green.
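The discovery order can be sketched as a small function. It is hypothetical, not the test suite's code. It checks $ACLRS_PTOAS_BIN first, then $HOME/ptoas-x86/bin/ptoas. Falling back to the $HOME location when the override path does not exist is our assumption. The existence check is injected so the lookup order can be tested without real files.

```rust
use std::path::PathBuf;

/// Return the first existing ptoas candidate: the ACLRS_PTOAS_BIN override,
/// then $HOME/ptoas-x86/bin/ptoas. Falling through a missing override is an
/// assumption of this sketch.
pub fn find_ptoas<F>(override_bin: Option<&str>, home: Option<&str>, exists: F) -> Option<PathBuf>
where
    F: Fn(&PathBuf) -> bool,
{
    let candidates: Vec<PathBuf> = override_bin
        .map(PathBuf::from)
        .into_iter()
        .chain(home.map(|h| PathBuf::from(h).join("ptoas-x86/bin/ptoas")))
        .collect();
    candidates.into_iter().find(|p| exists(p))
}
```

In a test, the call would be find_ptoas(std::env::var("ACLRS_PTOAS_BIN").ok().as_deref(), std::env::var("HOME").ok().as_deref(), |p| p.is_file()). A None result is the cue to print a skip message and return early, which Rust's test harness counts as a pass.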
11.2.7 Non-goals
Path C is not a claim that the ingress oracle has closed every gap:
- check_op_constraint and check_matmul_bounds remain largely dormant on ingress. mlir_to_pto pre-filters most violating shapes at hop 2. ptoas filters the rest at hop 3 with a tighter dims ≤ 4095 bound than the oracle’s ROW < 2^16. Those two checks stay useful for hand-written .acl.pto (the original section 10.2 target), but on the ingress path they are rarely the first line of defence.
- Path C still trusts ptoas’s own pipeline. If ptoas silently accepts a plan with a placement bug that neither it nor our passes catch, Path C will report clean. The oracle-catches-ptoas-blind-spots claim from section 10.3 still applies only to the slots the oracle knows how to read.
- Numerics remain out of scope, as with Path A.
What Path C does close is the specific gap named in §11.1.7: blocked matmuls where Path A conservatively fails safe. Any future ingested matmul-heavy kernel (LLM MLPs, attention projections, batched GEMMs) now has a clean structural signal instead of a capacity false positive, and it does so by reusing the exact six passes from section 10.4 on a plan that ptoas already constructed. No new oracle code; the value comes from running the old oracle at a more informative point in the lowering.