Appendix H: Safety Differential Analysis
This appendix analyzes 998 CANN 8.5 kernel pairs (AscendC C++ vs. ascend-rs Rust).
For each kernel, we identify which memory safety vulnerabilities exist in the C++ version and how the Rust transpilation prevents them.
H.1 Safety Class Summary
| # | Safety Class | C++ Risk | Rust Prevention | Kernels Affected |
|---|---|---|---|---|
| 1 | Type Confusion | GM_ADDR type erasure | Typed pointer signature (*const T) | 983/998 (98%) |
| 2 | Buffer Overflow | GetValue(i)/SetValue(i,v) with i >= count | Opaque buffer ID + explicit count parameter | 9/998 (0.9%) |
| 3 | Use-After-Free | FreeTensor() leaves stale handle | No FreeTensor operation in ascend-rs API | 3/998 (0.3%) |
| 4 | Missing Synchronization | DMA→compute without pipe_barrier() | kernel_ops composites include barriers internally | 793/998 (79%) |
| 5 | Double Free | FreeTensor() called twice on same handle | No FreeTensor operation in ascend-rs API | 3/998 (0.3%) |
| 6 | Integer Overflow | u32 arithmetic: blockIdx * perBlockLen | wrapping_mul makes overflow semantics explicit | 785/998 (79%) |
H.2 Category Breakdown
| Category | Total | C1: Type | C2: Bounds | C3: UAF | C4: Sync | C5: DblFree | C6: Overflow |
|---|---|---|---|---|---|---|---|
| ops_index | 114 | 114 | 3 | 0 | 76 | 0 | 76 |
| ops_legacy | 200 | 200 | 0 | 0 | 136 | 0 | 128 |
| ops_math | 120 | 120 | 0 | 0 | 84 | 0 | 84 |
| ops_nn | 150 | 150 | 6 | 3 | 129 | 3 | 129 |
| ops_optimizer | 82 | 82 | 0 | 0 | 62 | 0 | 62 |
| ops_reduce | 80 | 80 | 0 | 0 | 80 | 0 | 80 |
| ops_resize | 52 | 52 | 0 | 0 | 52 | 0 | 52 |
| ops_transformer | 200 | 185 | 0 | 0 | 174 | 0 | 174 |
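
The per-category counts can be cross-checked against the H.1 totals with a little arithmetic. The sketch below copies the rows of the table above into an array and sums each column; the names `ROWS`, `column_totals` and `percent` are illustrative and not part of ascend-rs.

```rust
/// Rows of the H.2 table: [Total, C1, C2, C3, C4, C5, C6] per category.
const ROWS: [(&str, [u32; 7]); 8] = [
    ("ops_index", [114, 114, 3, 0, 76, 0, 76]),
    ("ops_legacy", [200, 200, 0, 0, 136, 0, 128]),
    ("ops_math", [120, 120, 0, 0, 84, 0, 84]),
    ("ops_nn", [150, 150, 6, 3, 129, 3, 129]),
    ("ops_optimizer", [82, 82, 0, 0, 62, 0, 62]),
    ("ops_reduce", [80, 80, 0, 0, 80, 0, 80]),
    ("ops_resize", [52, 52, 0, 0, 52, 0, 52]),
    ("ops_transformer", [200, 185, 0, 0, 174, 0, 174]),
];

/// Sum every column across all categories.
pub fn column_totals() -> [u32; 7] {
    let mut totals = [0u32; 7];
    for (_, row) in ROWS.iter() {
        for (t, v) in totals.iter_mut().zip(row.iter()) {
            *t += *v;
        }
    }
    totals
}

/// Share of `count` in `total`, as a percentage.
pub fn percent(count: u32, total: u32) -> f64 {
    100.0 * f64::from(count) / f64::from(total)
}
```

Summing the columns gives 998, 983, 9, 3, 793, 3 and 785, which match the "Kernels Affected" numerators in H.1.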
H.3 Counter-Example Inputs
For each safety class, this section gives a counter-example input that triggers the vulnerability in C++ but is caught or prevented in Rust.
Class 1: Type Confusion
Trigger: Pass f16 data to f32 kernel
C++ behavior: Silent data corruption (interprets f16 bits as f32)
Rust behavior: Compile-time type error (*const u16 ≠ *const f32)
Example kernel: foreach_exp_f32
Evidence: Uses GM_ADDR (type-erased uint8_t*)
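
A minimal sketch of the difference, under stated assumptions: `sum_f32` and `reinterpret_f16_pair_as_f32` are hypothetical names, not ascend-rs functions. The first stands in for a kernel with a typed pointer signature; the second only simulates on the host what a type-erased kernel sees when two f16 values are read as one f32 (little-endian layout assumed).

```rust
/// Hypothetical typed entry point: only `*const f32` is accepted.
///
/// # Safety
/// `input` must point to `n` readable, initialized f32 values.
pub unsafe fn sum_f32(input: *const f32, n: usize) -> f32 {
    let mut acc = 0.0f32;
    for i in 0..n {
        acc += unsafe { *input.add(i) };
    }
    acc
}

/// Simulates the type-erased path: two adjacent 16-bit patterns are
/// glued together and interpreted as a single f32.
pub fn reinterpret_f16_pair_as_f32(lo: u16, hi: u16) -> f32 {
    f32::from_bits((u32::from(hi) << 16) | u32::from(lo))
}

// The following does not compile, which is the point:
//
//     let half: [u16; 2] = [0x3C00, 0x3C00]; // two f16 encodings of 1.0
//     unsafe { sum_f32(half.as_ptr(), 2) };
//     // error[E0308]: mismatched types (expected `*const f32`, found `*const u16`)
```

Two f16 encodings of 1.0 (0x3C00) read as a single f32 come out near 0.0078 rather than 1.0, which is the kind of silent corruption the C++ path allows.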
Class 2: Buffer Overflow
Trigger: count = buffer_size + 1
C++ behavior: Out-of-bounds SRAM read/write (undefined behavior)
Rust behavior: Buffer ID abstraction prevents raw indexing
Example kernel: foreach_dropout_f32
Evidence: Uses GetValue (unchecked index) + array indexing
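
The idea of an opaque buffer ID with an explicit count can be sketched as follows. `BufId` and `Scratch` are hypothetical types written for this illustration, and the capacity check inside `fill` and `sum` is an assumption of the sketch, not documented ascend-rs behavior.

```rust
/// Opaque buffer identifier. The inner index is private, so code outside
/// this module cannot turn it into a pointer or a raw array index.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct BufId(usize);

/// Host-side stand-in for on-chip scratch memory (illustration only).
pub struct Scratch {
    bufs: Vec<Vec<f32>>,
}

impl Scratch {
    pub fn new() -> Self {
        Scratch { bufs: Vec::new() }
    }

    /// Reserve `len` elements and hand back only the ID.
    pub fn alloc(&mut self, len: usize) -> BufId {
        self.bufs.push(vec![0.0; len]);
        BufId(self.bufs.len() - 1)
    }

    /// Fill the first `count` elements.
    /// ASSUMPTION: `count` is validated against the buffer capacity.
    pub fn fill(&mut self, buf: BufId, value: f32, count: usize) -> Result<(), String> {
        let data = &mut self.bufs[buf.0];
        if count > data.len() {
            return Err(format!("count {} exceeds capacity {}", count, data.len()));
        }
        data[..count].fill(value);
        Ok(())
    }

    /// Sum the first `count` elements, with the same capacity check.
    pub fn sum(&self, buf: BufId, count: usize) -> Result<f32, String> {
        let data = &self.bufs[buf.0];
        if count > data.len() {
            return Err(format!("count {} exceeds capacity {}", count, data.len()));
        }
        Ok(data[..count].iter().sum())
    }
}
```

With an 8-element buffer, `fill(buf, 1.5, 8)` succeeds while the trigger input `count = 9` returns an error instead of touching memory past the buffer.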
Class 3: Use-After-Free
Trigger: Free buffer then read through stale handle
C++ behavior: Reads deallocated SRAM (garbage data)
Rust behavior: No free API exists — buffer lifetime managed by runtime
Example kernel: foreach_dropout_f32
Evidence: Calls FreeTensor(); the handle stays usable afterwards
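
The stale-handle hazard can be simulated in safe Rust. In this sketch, `ManualPool` models a pool with an explicit free that recycles slots, and `ScopedPool` models an API that has no free operation. Both types are hypothetical and are not how AscendC or ascend-rs implement buffers.

```rust
/// Copyable handle, so a stale copy can outlive a free.
#[derive(Clone, Copy)]
pub struct Handle(usize);

/// Model of a manually managed pool: `free` recycles the slot at once.
pub struct ManualPool {
    slots: Vec<f32>,
    free_list: Vec<usize>,
}

impl ManualPool {
    pub fn new() -> Self {
        ManualPool { slots: Vec::new(), free_list: Vec::new() }
    }

    pub fn alloc(&mut self, value: f32) -> Handle {
        match self.free_list.pop() {
            Some(i) => {
                self.slots[i] = value; // reuse a freed slot
                Handle(i)
            }
            None => {
                self.slots.push(value);
                Handle(self.slots.len() - 1)
            }
        }
    }

    /// Nothing invalidates copies of `h` that the caller still holds.
    pub fn free(&mut self, h: Handle) {
        self.free_list.push(h.0);
    }

    pub fn read(&self, h: Handle) -> f32 {
        self.slots[h.0]
    }
}

/// Model of an API without a free operation: slots live until the
/// whole pool is dropped, so no handle can become stale.
pub struct ScopedPool {
    slots: Vec<f32>,
}

impl ScopedPool {
    pub fn new() -> Self {
        ScopedPool { slots: Vec::new() }
    }

    pub fn alloc(&mut self, value: f32) -> Handle {
        self.slots.push(value);
        Handle(self.slots.len() - 1)
    }

    pub fn read(&self, h: Handle) -> f32 {
        self.slots[h.0]
    }
}
```

In the manual model, freeing a handle and then allocating again makes the old handle read the new tensor's data. In the scoped model the same sequence cannot be written, because there is no `free` to call.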
Class 4: Missing Synchronization
Trigger: Remove barrier between load and compute
C++ behavior: Reads stale/partial DMA data (non-deterministic)
Rust behavior: ascend_pipe_barrier() always emitted between stages
Example kernel: foreach_add_list_f32
Evidence: Has 2 barriers — missing one causes data races
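
One way to picture "barriers included internally" is a composite operation that always emits the same stage sequence. The sketch below models that with a trace of stage markers; `Op`, `composite_add` and `stages_separated` are hypothetical names for this illustration and do not reflect the real kernel_ops implementation.

```rust
/// Stage markers recorded in issue order.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Op {
    Load,
    Barrier,
    Compute,
    Store,
}

/// Composite in the spirit of the text: the barriers are part of the
/// operation itself, so a caller cannot leave one out.
pub fn composite_add() -> Vec<Op> {
    vec![Op::Load, Op::Barrier, Op::Compute, Op::Barrier, Op::Store]
}

/// Returns true when no two different stages are adjacent without a
/// barrier between them.
pub fn stages_separated(trace: &[Op]) -> bool {
    trace
        .windows(2)
        .all(|w| w[0] == Op::Barrier || w[1] == Op::Barrier || w[0] == w[1])
}
```

The composite trace passes the check, while a hand-written sequence such as Load, Compute, Barrier, Store fails it because the compute stage directly follows the load.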
Class 5: Double Free
Trigger: Call FreeTensor twice on same LocalTensor
C++ behavior: Corrupts queue free list (undefined behavior)
Rust behavior: No free API exists — impossible to double-free
Example kernel: foreach_dropout_f32
Evidence: FreeTensor called 54 times
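
The free-list corruption can be reproduced with a toy allocator. `FreeList` below is a hypothetical model with an unchecked `free`, written only to show the failure mode; it is not the AscendC queue implementation.

```rust
/// Toy slot allocator with an unchecked free list.
pub struct FreeList {
    next_fresh: usize,
    free: Vec<usize>,
}

impl FreeList {
    pub fn new() -> Self {
        FreeList { next_fresh: 0, free: Vec::new() }
    }

    /// Reuse a freed slot if one exists, otherwise hand out a fresh one.
    pub fn alloc(&mut self) -> usize {
        match self.free.pop() {
            Some(slot) => slot,
            None => {
                let slot = self.next_fresh;
                self.next_fresh += 1;
                slot
            }
        }
    }

    /// Unchecked: freeing the same slot twice queues it twice.
    pub fn free(&mut self, slot: usize) {
        self.free.push(slot);
    }
}

/// Frees one slot twice, then allocates twice.
pub fn double_free_aliases() -> (usize, usize) {
    let mut q = FreeList::new();
    let t = q.alloc();
    q.free(t);
    q.free(t); // the double free
    (q.alloc(), q.alloc())
}
```

After the double free, the next two allocations return the same slot, so two logically distinct tensors share storage. An API with no free operation has no call site where this sequence could start.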
Class 6: Integer Overflow
Trigger: blockIdx=1048576, perBlockLen=4096 → product is 2^32, which wraps to 0 in u32
C++ behavior: Silent wrap to 0, wrong memory offset
Rust behavior: wrapping_mul(4096) → 0 (the wrap is explicit at the call site; a plain * would panic in debug builds)
Example kernel: foreach_dropout_f32
Evidence: Uses block index for offset calculation
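
The overflow semantics can be checked directly with standard-library integer methods. The function names below are illustrative; `wrapping_mul` and `checked_mul` are the real `u32` methods, and the widened variant is one possible alternative rather than something the source prescribes.

```rust
/// Offset with explicit wrap-around, as in the text.
pub fn block_offset_wrapping(block_idx: u32, per_block_len: u32) -> u32 {
    block_idx.wrapping_mul(per_block_len)
}

/// Offset that reports overflow instead of wrapping.
pub fn block_offset_checked(block_idx: u32, per_block_len: u32) -> Option<u32> {
    block_idx.checked_mul(per_block_len)
}

/// Offset computed in 64 bits; the product of two u32 values always fits.
pub fn block_offset_wide(block_idx: u32, per_block_len: u32) -> u64 {
    u64::from(block_idx) * u64::from(per_block_len)
}
```

For the trigger input, `block_offset_wrapping(1_048_576, 4096)` is 0, `block_offset_checked` returns `None`, and `block_offset_wide` gives 4294967296 (2^32).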
H.4 Per-Kernel Safety Report (All 998 Kernels)
foreach_exp_f32 (ops_legacy, f32, ✓ real source): C1
foreach_exp_f16 (ops_legacy, f16, ✓ real source): C1
foreach_exp_bf16 (ops_legacy, bf16, ✓ real source): C1
foreach_abs_f32 (ops_legacy, f32, ✓ real source): C1
foreach_abs_f16 (ops_legacy, f16, ✓ real source): C1
foreach_abs_bf16 (ops_legacy, bf16, ✓ real source): C1
foreach_neg_f32 (ops_legacy, f32, ✓ real source): C1
foreach_neg_f16 (ops_legacy, f16, ✓ real source): C1
foreach_neg_bf16 (ops_legacy, bf16, ✓ real source): C1
foreach_sqrt_f32 (ops_legacy, f32, ✓ real source): C1
foreach_sqrt_f16 (ops_legacy, f16, ✓ real source): C1
foreach_sqrt_bf16 (ops_legacy, bf16, ✓ real source): C1
foreach_rsqrt_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_rsqrt_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_rsqrt_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_reciprocal_f32 (ops_legacy, f32, ✓ real source): C1
foreach_reciprocal_f16 (ops_legacy, f16, ✓ real source): C1
foreach_reciprocal_bf16 (ops_legacy, bf16, ✓ real source): C1
foreach_ln_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_ln_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_ln_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_log2_f32 (ops_legacy, f32, ✓ real source): C1
foreach_log2_f16 (ops_legacy, f16, ✓ real source): C1
foreach_log2_bf16 (ops_legacy, bf16, ✓ real source): C1
foreach_log10_f32 (ops_legacy, f32, ✓ real source): C1
foreach_log10_f16 (ops_legacy, f16, ✓ real source): C1
foreach_log10_bf16 (ops_legacy, bf16, ✓ real source): C1
foreach_ceil_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_ceil_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_ceil_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_floor_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_floor_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_floor_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_round_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_round_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_round_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_trunc_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_trunc_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_trunc_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_sign_f32 (ops_legacy, f32, ✓ real source): C1
foreach_sign_f16 (ops_legacy, f16, ✓ real source): C1
foreach_sign_bf16 (ops_legacy, bf16, ✓ real source): C1
foreach_not_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_not_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_not_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_bitwise_not_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_bitwise_not_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_bitwise_not_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_logical_not_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_logical_not_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_logical_not_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_clamp_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_clamp_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_clamp_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_add_list_f32 (ops_legacy, f32, ✓ real source): C1, C4
foreach_add_list_f16 (ops_legacy, f16, ✓ real source): C1, C4
foreach_add_list_bf16 (ops_legacy, bf16, ✓ real source): C1, C4
foreach_sub_list_f32 (ops_legacy, f32, ✓ real source): C1, C4
foreach_sub_list_f16 (ops_legacy, f16, ✓ real source): C1, C4
foreach_sub_list_bf16 (ops_legacy, bf16, ✓ real source): C1, C4
foreach_mul_list_f32 (ops_legacy, f32, ✓ real source): C1
foreach_mul_list_f16 (ops_legacy, f16, ✓ real source): C1
foreach_mul_list_bf16 (ops_legacy, bf16, ✓ real source): C1
foreach_div_list_f32 (ops_legacy, f32, ✓ real source): C1
foreach_div_list_f16 (ops_legacy, f16, ✓ real source): C1
foreach_div_list_bf16 (ops_legacy, bf16, ✓ real source): C1
foreach_max_list_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_max_list_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_max_list_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_min_list_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_min_list_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_min_list_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_pow_list_f32 (ops_legacy, f32, ✓ real source): C1
foreach_pow_list_f16 (ops_legacy, f16, ✓ real source): C1
foreach_pow_list_bf16 (ops_legacy, bf16, ✓ real source): C1
foreach_fmod_list_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_fmod_list_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_fmod_list_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_bitwise_and_list_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_bitwise_and_list_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_bitwise_and_list_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_bitwise_or_list_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_bitwise_or_list_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_bitwise_or_list_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_bitwise_xor_list_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_bitwise_xor_list_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_bitwise_xor_list_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_logical_and_list_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_logical_and_list_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_logical_and_list_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_logical_or_list_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_logical_or_list_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_logical_or_list_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_add_scalar_f32 (ops_legacy, f32, ✓ real source): C1
foreach_add_scalar_f16 (ops_legacy, f16, ✓ real source): C1
foreach_add_scalar_bf16 (ops_legacy, bf16, ✓ real source): C1
foreach_sub_scalar_f32 (ops_legacy, f32, ✓ real source): C1
foreach_sub_scalar_f16 (ops_legacy, f16, ✓ real source): C1
foreach_sub_scalar_bf16 (ops_legacy, bf16, ✓ real source): C1
foreach_mul_scalar_f32 (ops_legacy, f32, ✓ real source): C1
foreach_mul_scalar_f16 (ops_legacy, f16, ✓ real source): C1
foreach_mul_scalar_bf16 (ops_legacy, bf16, ✓ real source): C1
foreach_div_scalar_f32 (ops_legacy, f32, ✓ real source): C1
foreach_div_scalar_f16 (ops_legacy, f16, ✓ real source): C1
foreach_div_scalar_bf16 (ops_legacy, bf16, ✓ real source): C1
foreach_max_scalar_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_max_scalar_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_max_scalar_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_min_scalar_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_min_scalar_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_min_scalar_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_pow_scalar_f32 (ops_legacy, f32, ✓ real source): C1
foreach_pow_scalar_f16 (ops_legacy, f16, ✓ real source): C1
foreach_pow_scalar_bf16 (ops_legacy, bf16, ✓ real source): C1
foreach_clamp_scalar_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_clamp_scalar_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_clamp_scalar_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_add_list_alpha_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_add_list_alpha_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_add_list_alpha_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_sub_list_alpha_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_sub_list_alpha_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_sub_list_alpha_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_addcmul_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_addcdiv_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_copy_f32 (ops_legacy, f32, ✓ real source): C1
foreach_zero_inplace_f32 (ops_legacy, f32, ✓ real source): C1
foreach_lerp_f32 (ops_legacy, f32, stub): C1, C4, C6
foreach_addcmul_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_addcdiv_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_copy_f16 (ops_legacy, f16, ✓ real source): C1
foreach_zero_inplace_f16 (ops_legacy, f16, ✓ real source): C1
foreach_lerp_f16 (ops_legacy, f16, stub): C1, C4, C6
foreach_addcmul_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_addcdiv_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_copy_bf16 (ops_legacy, bf16, ✓ real source): C1
foreach_zero_inplace_bf16 (ops_legacy, bf16, ✓ real source): C1
foreach_lerp_bf16 (ops_legacy, bf16, stub): C1, C4, C6
zeros_like_f32 (ops_legacy, f32, stub): C1, C4, C6
ones_like_f32 (ops_legacy, f32, stub): C1, C4, C6
zeros_like_f16 (ops_legacy, f16, stub): C1, C4, C6
ones_like_f16 (ops_legacy, f16, stub): C1, C4, C6
zeros_like_bf16 (ops_legacy, bf16, stub): C1, C4, C6
ones_like_bf16 (ops_legacy, bf16, stub): C1, C4, C6
zeros_like_int32 (ops_legacy, i32, stub): C1, C4, C6
ones_like_int32 (ops_legacy, i32, stub): C1, C4, C6
elementwise_abs_f32 (ops_legacy, f32, stub): C1, C4, C6
elementwise_abs_f16 (ops_legacy, f16, stub): C1, C4, C6
elementwise_abs_bf16 (ops_legacy, bf16, stub): C1, C4, C6
elementwise_relu_f32 (ops_legacy, f32, stub): C1, C4, C6
elementwise_relu_f16 (ops_legacy, f16, stub): C1, C4, C6
elementwise_relu_bf16 (ops_legacy, bf16, stub): C1, C4, C6
elementwise_gelu_f32 (ops_legacy, f32, stub): C1, C4, C6
elementwise_gelu_f16 (ops_legacy, f16, stub): C1, C4, C6
elementwise_gelu_bf16 (ops_legacy, bf16, stub): C1, C4, C6
elementwise_silu_f32 (ops_legacy, f32, stub): C1, C4, C6
elementwise_silu_f16 (ops_legacy, f16, stub): C1, C4, C6
elementwise_silu_bf16 (ops_legacy, bf16, stub): C1, C4, C6
elementwise_neg_f32 (ops_legacy, f32, stub): C1, C4, C6
elementwise_neg_f16 (ops_legacy, f16, stub): C1, C4, C6
elementwise_neg_bf16 (ops_legacy, bf16, stub): C1, C4, C6
elementwise_sign_f32 (ops_legacy, f32, stub): C1, C4, C6
elementwise_sign_f16 (ops_legacy, f16, stub): C1, C4, C6
elementwise_sign_bf16 (ops_legacy, bf16, stub): C1, C4, C6
elementwise_ceil_f32 (ops_legacy, f32, stub): C1, C4, C6
elementwise_ceil_f16 (ops_legacy, f16, stub): C1, C4, C6
elementwise_ceil_bf16 (ops_legacy, bf16, stub): C1, C4, C6
elementwise_floor_f32 (ops_legacy, f32, stub): C1, C4, C6
elementwise_floor_f16 (ops_legacy, f16, stub): C1, C4, C6
elementwise_floor_bf16 (ops_legacy, bf16, stub): C1, C4, C6
elementwise16b_abs_f32 (ops_legacy, f32, stub): C1, C4, C6
elementwise16b_abs_f16 (ops_legacy, f16, stub): C1, C4, C6
elementwise16b_abs_bf16 (ops_legacy, bf16, stub): C1, C4, C6
elementwise16b_relu_f32 (ops_legacy, f32, stub): C1, C4, C6
elementwise16b_relu_f16 (ops_legacy, f16, stub): C1, C4, C6
elementwise16b_relu_bf16 (ops_legacy, bf16, stub): C1, C4, C6
elementwise16b_neg_f32 (ops_legacy, f32, stub): C1, C4, C6
elementwise16b_neg_f16 (ops_legacy, f16, stub): C1, C4, C6
elementwise16b_neg_bf16 (ops_legacy, bf16, stub): C1, C4, C6
elementwise16b_sign_f32 (ops_legacy, f32, stub): C1, C4, C6
elementwise16b_sign_f16 (ops_legacy, f16, stub): C1, C4, C6
elementwise16b_sign_bf16 (ops_legacy, bf16, stub): C1, C4, C6
foreach_abs_int32 (ops_legacy, i32, ✓ real source): C1
foreach_neg_int32 (ops_legacy, i32, ✓ real source): C1
foreach_sign_int32 (ops_legacy, i32, ✓ real source): C1
foreach_bitwise_not_int32 (ops_legacy, i32, stub): C1, C4, C6
foreach_logical_not_int32 (ops_legacy, i32, stub): C1, C4, C6
foreach_clamp_int32 (ops_legacy, i32, stub): C1, C4, C6
foreach_add_list_int32 (ops_legacy, i32, ✓ real source): C1, C4
foreach_sub_list_int32 (ops_legacy, i32, ✓ real source): C1, C4
foreach_mul_list_int32 (ops_legacy, i32, ✓ real source): C1
foreach_max_list_int32 (ops_legacy, i32, stub): C1, C4, C6
foreach_abs_int8 (ops_legacy, i8, ✓ real source): C1
foreach_neg_int8 (ops_legacy, i8, ✓ real source): C1
foreach_bitwise_not_int8 (ops_legacy, i8, stub): C1, C4, C6
foreach_clamp_int8 (ops_legacy, i8, stub): C1, C4, C6
foreach_add_scalar_int32 (ops_legacy, i32, ✓ real source): C1
foreach_sub_scalar_int32 (ops_legacy, i32, ✓ real source): C1
foreach_mul_scalar_int32 (ops_legacy, i32, ✓ real source): C1
foreach_div_scalar_int32 (ops_legacy, i32, ✓ real source): C1
foreach_sin_f32 (ops_math, f32, ✓ real source): C1
foreach_sin_f16 (ops_math, f16, ✓ real source): C1
foreach_sin_bf16 (ops_math, bf16, ✓ real source): C1
foreach_cos_f32 (ops_math, f32, ✓ real source): C1
foreach_cos_f16 (ops_math, f16, ✓ real source): C1
foreach_cos_bf16 (ops_math, bf16, ✓ real source): C1
foreach_tan_f32 (ops_math, f32, ✓ real source): C1
foreach_tan_f16 (ops_math, f16, ✓ real source): C1
foreach_tan_bf16 (ops_math, bf16, ✓ real source): C1
foreach_asin_f32 (ops_math, f32, ✓ real source): C1
foreach_asin_f16 (ops_math, f16, ✓ real source): C1
foreach_asin_bf16 (ops_math, bf16, ✓ real source): C1
foreach_acos_f32 (ops_math, f32, ✓ real source): C1
foreach_acos_f16 (ops_math, f16, ✓ real source): C1
foreach_acos_bf16 (ops_math, bf16, ✓ real source): C1
foreach_atan_f32 (ops_math, f32, ✓ real source): C1
foreach_atan_f16 (ops_math, f16, ✓ real source): C1
foreach_atan_bf16 (ops_math, bf16, ✓ real source): C1
foreach_atan2_f32 (ops_math, f32, stub): C1, C4, C6
foreach_atan2_f16 (ops_math, f16, stub): C1, C4, C6
foreach_atan2_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_sinh_f32 (ops_math, f32, ✓ real source): C1
foreach_sinh_f16 (ops_math, f16, ✓ real source): C1
foreach_sinh_bf16 (ops_math, bf16, ✓ real source): C1
foreach_cosh_f32 (ops_math, f32, ✓ real source): C1
foreach_cosh_f16 (ops_math, f16, ✓ real source): C1
foreach_cosh_bf16 (ops_math, bf16, ✓ real source): C1
foreach_tanh_math_f32 (ops_math, f32, stub): C1, C4, C6
foreach_tanh_math_f16 (ops_math, f16, stub): C1, C4, C6
foreach_tanh_math_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_asinh_f32 (ops_math, f32, stub): C1, C4, C6
foreach_asinh_f16 (ops_math, f16, stub): C1, C4, C6
foreach_asinh_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_acosh_f32 (ops_math, f32, stub): C1, C4, C6
foreach_acosh_f16 (ops_math, f16, stub): C1, C4, C6
foreach_acosh_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_atanh_f32 (ops_math, f32, stub): C1, C4, C6
foreach_atanh_f16 (ops_math, f16, stub): C1, C4, C6
foreach_atanh_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_erf_f32 (ops_math, f32, ✓ real source): C1
foreach_erf_f16 (ops_math, f16, ✓ real source): C1
foreach_erf_bf16 (ops_math, bf16, ✓ real source): C1
foreach_erfc_f32 (ops_math, f32, ✓ real source): C1
foreach_erfc_f16 (ops_math, f16, ✓ real source): C1
foreach_erfc_bf16 (ops_math, bf16, ✓ real source): C1
foreach_erfinv_f32 (ops_math, f32, stub): C1, C4, C6
foreach_erfinv_f16 (ops_math, f16, stub): C1, C4, C6
foreach_erfinv_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_expm1_f32 (ops_math, f32, ✓ real source): C1
foreach_expm1_f16 (ops_math, f16, ✓ real source): C1
foreach_expm1_bf16 (ops_math, bf16, ✓ real source): C1
foreach_log1p_f32 (ops_math, f32, ✓ real source): C1
foreach_log1p_f16 (ops_math, f16, ✓ real source): C1
foreach_log1p_bf16 (ops_math, bf16, ✓ real source): C1
foreach_softplus_f32 (ops_math, f32, stub): C1, C4, C6
foreach_softplus_f16 (ops_math, f16, stub): C1, C4, C6
foreach_softplus_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_digamma_f32 (ops_math, f32, stub): C1, C4, C6
foreach_digamma_f16 (ops_math, f16, stub): C1, C4, C6
foreach_digamma_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_lgamma_f32 (ops_math, f32, stub): C1, C4, C6
foreach_lgamma_f16 (ops_math, f16, stub): C1, C4, C6
foreach_lgamma_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_i0_f32 (ops_math, f32, stub): C1, C4, C6
foreach_i0_f16 (ops_math, f16, stub): C1, C4, C6
foreach_i0_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_i1_f32 (ops_math, f32, stub): C1, C4, C6
foreach_i1_f16 (ops_math, f16, stub): C1, C4, C6
foreach_i1_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_hypot_f32 (ops_math, f32, stub): C1, C4, C6
foreach_hypot_f16 (ops_math, f16, stub): C1, C4, C6
foreach_hypot_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_fma_f32 (ops_math, f32, stub): C1, C4, C6
foreach_fma_f16 (ops_math, f16, stub): C1, C4, C6
foreach_fma_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_remainder_f32 (ops_math, f32, stub): C1, C4, C6
foreach_remainder_f16 (ops_math, f16, stub): C1, C4, C6
foreach_remainder_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_copysign_f32 (ops_math, f32, stub): C1, C4, C6
foreach_copysign_f16 (ops_math, f16, stub): C1, C4, C6
foreach_copysign_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_nextafter_f32 (ops_math, f32, stub): C1, C4, C6
foreach_nextafter_f16 (ops_math, f16, stub): C1, C4, C6
foreach_nextafter_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_ldexp_f32 (ops_math, f32, stub): C1, C4, C6
foreach_ldexp_f16 (ops_math, f16, stub): C1, C4, C6
foreach_ldexp_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_frexp_f32 (ops_math, f32, stub): C1, C4, C6
foreach_frexp_f16 (ops_math, f16, stub): C1, C4, C6
foreach_frexp_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_logaddexp_f32 (ops_math, f32, stub): C1, C4, C6
foreach_logaddexp_f16 (ops_math, f16, stub): C1, C4, C6
foreach_logaddexp_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_logaddexp2_f32 (ops_math, f32, stub): C1, C4, C6
foreach_logaddexp2_f16 (ops_math, f16, stub): C1, C4, C6
foreach_logaddexp2_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_sincos_f32_910b (ops_math, f32, stub): C1, C4, C6
foreach_sincos_f16_910b (ops_math, f32, stub): C1, C4, C6
foreach_sincos_bf16_910b (ops_math, f32, stub): C1, C4, C6
foreach_sincospi_f32_910b (ops_math, f32, stub): C1, C4, C6
foreach_sincospi_f16_910b (ops_math, f32, stub): C1, C4, C6
foreach_sincospi_bf16_910b (ops_math, f32, stub): C1, C4, C6
foreach_j0_f32 (ops_math, f32, stub): C1, C4, C6
foreach_j0_f16 (ops_math, f16, stub): C1, C4, C6
foreach_j0_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_j1_f32 (ops_math, f32, stub): C1, C4, C6
foreach_j1_f16 (ops_math, f16, stub): C1, C4, C6
foreach_j1_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_y0_f32 (ops_math, f32, stub): C1, C4, C6
foreach_y0_f16 (ops_math, f16, stub): C1, C4, C6
foreach_y0_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_y1_f32 (ops_math, f32, stub): C1, C4, C6
foreach_y1_f16 (ops_math, f16, stub): C1, C4, C6
foreach_y1_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_polygamma_f32 (ops_math, f32, stub): C1, C4, C6
foreach_polygamma_f16 (ops_math, f16, stub): C1, C4, C6
foreach_polygamma_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_zeta_f32 (ops_math, f32, stub): C1, C4, C6
foreach_zeta_f16 (ops_math, f16, stub): C1, C4, C6
foreach_zeta_bf16 (ops_math, bf16, stub): C1, C4, C6
foreach_relu_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_relu_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_relu_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_relu6_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_relu6_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_relu6_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_leaky_relu_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_leaky_relu_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_leaky_relu_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_prelu_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_prelu_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_prelu_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_elu_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_elu_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_elu_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_selu_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_selu_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_selu_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_gelu_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_gelu_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_gelu_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_fast_gelu_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_fast_gelu_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_fast_gelu_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_sigmoid_f32 (ops_nn, f32, ✓ real source): C1
foreach_sigmoid_f16 (ops_nn, f16, ✓ real source): C1
foreach_sigmoid_bf16 (ops_nn, bf16, ✓ real source): C1
foreach_hardsigmoid_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_hardsigmoid_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_hardsigmoid_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_hardswish_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_hardswish_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_hardswish_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_hardtanh_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_hardtanh_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_hardtanh_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_silu_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_silu_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_silu_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_mish_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_mish_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_mish_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_softplus_nn_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_softplus_nn_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_softplus_nn_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_softsign_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_softsign_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_softsign_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_tanh_nn_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_tanh_nn_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_tanh_nn_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_celu_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_celu_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_celu_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_glu_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_glu_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_glu_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_rrelu_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_rrelu_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_rrelu_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_batch_norm_f32 (ops_nn, f32, ✓ real source): C1
foreach_batch_norm_f16 (ops_nn, f16, ✓ real source): C1
foreach_batch_norm_bf16 (ops_nn, bf16, ✓ real source): C1
foreach_instance_norm_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_instance_norm_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_instance_norm_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_layer_norm_f32 (ops_nn, f32, ✓ real source): C1
foreach_layer_norm_f16 (ops_nn, f16, ✓ real source): C1
foreach_layer_norm_bf16 (ops_nn, bf16, ✓ real source): C1
foreach_group_norm_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_group_norm_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_group_norm_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_rms_norm_f32 (ops_nn, f32, ✓ real source): C1
foreach_rms_norm_f16 (ops_nn, f16, ✓ real source): C1
foreach_rms_norm_bf16 (ops_nn, bf16, ✓ real source): C1
foreach_softmax_f32 (ops_nn, f32, ✓ real source): C1
foreach_softmax_f16 (ops_nn, f16, ✓ real source): C1
foreach_softmax_bf16 (ops_nn, bf16, ✓ real source): C1
foreach_log_softmax_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_log_softmax_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_log_softmax_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_dropout_f32 (ops_nn, f32, ✓ real source): C1, C2, C3, C4, C5, C6
foreach_dropout_f16 (ops_nn, f16, ✓ real source): C1, C2, C3, C4, C5, C6
foreach_dropout_bf16 (ops_nn, bf16, ✓ real source): C1, C2, C3, C4, C5, C6
foreach_embedding_f32 (ops_nn, f32, ✓ real source): C1, C2
foreach_embedding_f16 (ops_nn, f16, ✓ real source): C1, C2
foreach_embedding_bf16 (ops_nn, bf16, ✓ real source): C1, C2
foreach_swish_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_swish_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_swish_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_logsigmoid_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_logsigmoid_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_logsigmoid_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_tanhshrink_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_tanhshrink_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_tanhshrink_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_softshrink_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_softshrink_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_softshrink_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_hardshrink_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_hardshrink_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_hardshrink_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_threshold_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_threshold_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_threshold_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_cross_entropy_loss_f32 (ops_nn, f32, ✓ real source): C1
foreach_cross_entropy_loss_f16 (ops_nn, f16, ✓ real source): C1
foreach_cross_entropy_loss_bf16 (ops_nn, bf16, ✓ real source): C1
foreach_mse_loss_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_mse_loss_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_mse_loss_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_l1_loss_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_l1_loss_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_l1_loss_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_smooth_l1_loss_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_smooth_l1_loss_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_smooth_l1_loss_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_nll_loss_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_nll_loss_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_nll_loss_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_avg_pool_2d_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_avg_pool_2d_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_avg_pool_2d_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_max_pool_2d_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_max_pool_2d_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_max_pool_2d_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_avg_pool_1d_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_avg_pool_1d_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_avg_pool_1d_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_max_pool_1d_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_max_pool_1d_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_max_pool_1d_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_lp_pool_2d_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_lp_pool_2d_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_lp_pool_2d_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_bce_loss_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_bce_loss_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_bce_loss_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_bce_with_logits_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_bce_with_logits_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_bce_with_logits_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_hinge_loss_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_hinge_loss_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_hinge_loss_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_kl_div_loss_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_kl_div_loss_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_kl_div_loss_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_cosine_embedding_loss_f32 (ops_nn, f32, stub): C1, C4, C6
foreach_cosine_embedding_loss_f16 (ops_nn, f16, stub): C1, C4, C6
foreach_cosine_embedding_loss_bf16 (ops_nn, bf16, stub): C1, C4, C6
foreach_attention_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_attention_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_attention_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_attention_f16_910b (ops_transformer, f32, stub): C1, C4, C6
foreach_attention_f16_310p (ops_transformer, f32, stub): C1, C4, C6
foreach_scaled_dot_product_attention_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_scaled_dot_product_attention_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_scaled_dot_product_attention_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_scaled_dot_product_attention_f16_910b (ops_transformer, f32, stub): C1, C4, C6
foreach_scaled_dot_product_attention_f16_310p (ops_transformer, f32, stub): C1, C4, C6
foreach_multi_head_attention_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_multi_head_attention_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_multi_head_attention_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_multi_head_attention_f16_910b (ops_transformer, f32, stub): C1, C4, C6
foreach_multi_head_attention_f16_310p (ops_transformer, f32, stub): C1, C4, C6
foreach_flash_attention_v1_f32 (ops_transformer, f32, ✓ real source): none
foreach_flash_attention_v1_f16 (ops_transformer, f16, ✓ real source): none
foreach_flash_attention_v1_bf16 (ops_transformer, bf16, ✓ real source): none
foreach_flash_attention_v1_f16_910b (ops_transformer, f32, ✓ real source): none
foreach_flash_attention_v1_f16_310p (ops_transformer, f32, ✓ real source): none
foreach_flash_attention_v2_f32 (ops_transformer, f32, ✓ real source): none
foreach_flash_attention_v2_f16 (ops_transformer, f16, ✓ real source): none
foreach_flash_attention_v2_bf16 (ops_transformer, bf16, ✓ real source): none
foreach_flash_attention_v2_f16_910b (ops_transformer, f32, ✓ real source): none
foreach_flash_attention_v2_f16_310p (ops_transformer, f32, ✓ real source): none
foreach_flash_attention_v3_f32 (ops_transformer, f32, ✓ real source): none
foreach_flash_attention_v3_f16 (ops_transformer, f16, ✓ real source): none
foreach_flash_attention_v3_bf16 (ops_transformer, bf16, ✓ real source): none
foreach_flash_attention_v3_f16_910b (ops_transformer, f32, ✓ real source): none
foreach_flash_attention_v3_f16_310p (ops_transformer, f32, ✓ real source): none
foreach_paged_attention_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_paged_attention_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_paged_attention_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_paged_attention_f16_910b (ops_transformer, f32, stub): C1, C4, C6
foreach_paged_attention_f16_310p (ops_transformer, f32, stub): C1, C4, C6
foreach_rotary_embedding_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_rotary_embedding_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_rotary_embedding_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_rotary_embedding_f16_910b (ops_transformer, f32, stub): C1, C4, C6
foreach_rotary_embedding_f16_310p (ops_transformer, f32, stub): C1, C4, C6
foreach_rope_apply_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_rope_apply_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_rope_apply_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_rope_apply_f16_910b (ops_transformer, f32, stub): C1, C4, C6
foreach_rope_apply_f16_310p (ops_transformer, f32, stub): C1, C4, C6
foreach_alibi_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_alibi_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_alibi_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_alibi_f16_910b (ops_transformer, f32, stub): C1, C4, C6
foreach_alibi_f16_310p (ops_transformer, f32, stub): C1, C4, C6
foreach_kv_cache_update_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_kv_cache_update_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_kv_cache_update_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_kv_cache_update_f16_910b (ops_transformer, f32, stub): C1, C4, C6
foreach_kv_cache_update_f16_310p (ops_transformer, f32, stub): C1, C4, C6
foreach_beam_search_score_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_beam_search_score_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_beam_search_score_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_beam_search_score_f16_910b (ops_transformer, f32, stub): C1, C4, C6
foreach_beam_search_score_f16_310p (ops_transformer, f32, stub): C1, C4, C6
foreach_matmul_f32 (ops_transformer, f32, ✓ real source): C1
foreach_matmul_f16 (ops_transformer, f16, ✓ real source): C1
foreach_matmul_bf16 (ops_transformer, bf16, ✓ real source): C1
foreach_matmul_f32_910b (ops_transformer, f32, ✓ real source): C1
foreach_matmul_f32_310p (ops_transformer, f32, ✓ real source): C1
foreach_matmul_f16_910b (ops_transformer, f32, ✓ real source): C1
foreach_matmul_f16_310p (ops_transformer, f32, ✓ real source): C1
foreach_batch_matmul_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_batch_matmul_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_batch_matmul_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_batch_matmul_f32_910b (ops_transformer, f32, stub): C1, C4, C6
foreach_batch_matmul_f32_310p (ops_transformer, f32, stub): C1, C4, C6
foreach_batch_matmul_f16_910b (ops_transformer, f32, stub): C1, C4, C6
foreach_batch_matmul_f16_310p (ops_transformer, f32, stub): C1, C4, C6
foreach_linear_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_linear_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_linear_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_linear_f32_910b (ops_transformer, f32, stub): C1, C4, C6
foreach_linear_f32_310p (ops_transformer, f32, stub): C1, C4, C6
foreach_linear_f16_910b (ops_transformer, f32, stub): C1, C4, C6
foreach_linear_f16_310p (ops_transformer, f32, stub): C1, C4, C6
foreach_gemm_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_gemm_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_gemm_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_gemm_f32_910b (ops_transformer, f32, stub): C1, C4, C6
foreach_gemm_f32_310p (ops_transformer, f32, stub): C1, C4, C6
foreach_gemm_f16_910b (ops_transformer, f32, stub): C1, C4, C6
foreach_gemm_f16_310p (ops_transformer, f32, stub): C1, C4, C6
foreach_gemv_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_gemv_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_gemv_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_gemv_f32_910b (ops_transformer, f32, stub): C1, C4, C6
foreach_gemv_f32_310p (ops_transformer, f32, stub): C1, C4, C6
foreach_gemv_f16_910b (ops_transformer, f32, stub): C1, C4, C6
foreach_gemv_f16_310p (ops_transformer, f32, stub): C1, C4, C6
foreach_position_encoding_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_position_encoding_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_position_encoding_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_causal_mask_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_causal_mask_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_causal_mask_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_cross_attention_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_cross_attention_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_cross_attention_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_grouped_query_attention_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_grouped_query_attention_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_grouped_query_attention_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_sliding_window_attention_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_sliding_window_attention_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_sliding_window_attention_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_linear_attention_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_linear_attention_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_linear_attention_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_sparse_attention_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_sparse_attention_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_sparse_attention_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_local_attention_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_local_attention_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_local_attention_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_ring_attention_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_ring_attention_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_ring_attention_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_prefix_attention_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_prefix_attention_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_prefix_attention_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_kv_cache_quantize_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_kv_cache_quantize_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_kv_cache_quantize_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_attention_score_mod_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_attention_score_mod_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_attention_score_mod_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_rope_neox_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_rope_neox_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_rope_neox_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_rope_glm_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_rope_glm_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_rope_glm_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_matmul_quant_int8_f16 (ops_transformer, f16, ✓ real source): C1
foreach_matmul_quant_int8_bf16 (ops_transformer, bf16, ✓ real source): C1
foreach_attention_quant_int8_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_attention_quant_int8_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_linear_quant_int8_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_linear_quant_int8_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_matmul_quant_int4_f16 (ops_transformer, f16, ✓ real source): C1
foreach_matmul_quant_int4_bf16 (ops_transformer, bf16, ✓ real source): C1
foreach_attention_quant_int4_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_attention_quant_int4_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_linear_quant_int4_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_linear_quant_int4_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_multi_query_attention_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_multi_query_attention_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_multi_query_attention_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_flash_decoding_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_flash_decoding_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_flash_decoding_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_speculative_decoding_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_speculative_decoding_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_speculative_decoding_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_token_mixing_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_token_mixing_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_token_mixing_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_channel_mixing_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_channel_mixing_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_channel_mixing_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_moe_gate_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_moe_gate_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_moe_gate_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_moe_dispatch_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_moe_dispatch_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_moe_dispatch_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_moe_combine_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_moe_combine_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_moe_combine_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_swiglu_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_swiglu_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_swiglu_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_geglu_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_geglu_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_geglu_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_reglu_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_reglu_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_reglu_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_rmsnorm_linear_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_rmsnorm_linear_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_rmsnorm_linear_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_prenorm_attention_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_prenorm_attention_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_prenorm_attention_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_postnorm_attention_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_postnorm_attention_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_postnorm_attention_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_parallel_attention_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_parallel_attention_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_parallel_attention_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_sandwich_norm_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_sandwich_norm_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_sandwich_norm_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_qk_norm_f32 (ops_transformer, f32, stub): C1, C4, C6
foreach_qk_norm_f16 (ops_transformer, f16, stub): C1, C4, C6
foreach_qk_norm_bf16 (ops_transformer, bf16, stub): C1, C4, C6
foreach_adam_f32 (ops_optimizer, f32, ✓ real source): C1
foreach_adam_f16 (ops_optimizer, f16, ✓ real source): C1
foreach_adam_bf16 (ops_optimizer, bf16, ✓ real source): C1
foreach_adam_f32_wd (ops_optimizer, f32, ✓ real source): C1
foreach_adamw_f32 (ops_optimizer, f32, ✓ real source): C1
foreach_adamw_f16 (ops_optimizer, f16, ✓ real source): C1
foreach_adamw_bf16 (ops_optimizer, bf16, ✓ real source): C1
foreach_adamw_f32_wd (ops_optimizer, f32, ✓ real source): C1
foreach_sgd_f32 (ops_optimizer, f32, stub): C1, C4, C6
foreach_sgd_f16 (ops_optimizer, f16, stub): C1, C4, C6
foreach_sgd_bf16 (ops_optimizer, bf16, stub): C1, C4, C6
foreach_sgd_f32_wd (ops_optimizer, f32, stub): C1, C4, C6
foreach_sgd_momentum_f32 (ops_optimizer, f32, stub): C1, C4, C6
foreach_sgd_momentum_f16 (ops_optimizer, f16, stub): C1, C4, C6
foreach_sgd_momentum_bf16 (ops_optimizer, bf16, stub): C1, C4, C6
foreach_sgd_momentum_f32_wd (ops_optimizer, f32, stub): C1, C4, C6
foreach_adagrad_f32 (ops_optimizer, f32, stub): C1, C4, C6
foreach_adagrad_f16 (ops_optimizer, f16, stub): C1, C4, C6
foreach_adagrad_bf16 (ops_optimizer, bf16, stub): C1, C4, C6
foreach_adagrad_f32_wd (ops_optimizer, f32, stub): C1, C4, C6
foreach_adadelta_f32 (ops_optimizer, f32, stub): C1, C4, C6
foreach_adadelta_f16 (ops_optimizer, f16, stub): C1, C4, C6
foreach_adadelta_bf16 (ops_optimizer, bf16, stub): C1, C4, C6
foreach_adadelta_f32_wd (ops_optimizer, f32, stub): C1, C4, C6
foreach_rmsprop_f32 (ops_optimizer, f32, stub): C1, C4, C6
foreach_rmsprop_f16 (ops_optimizer, f16, stub): C1, C4, C6
foreach_rmsprop_bf16 (ops_optimizer, bf16, stub): C1, C4, C6
foreach_rmsprop_f32_wd (ops_optimizer, f32, stub): C1, C4, C6
foreach_lamb_f32 (ops_optimizer, f32, stub): C1, C4, C6
foreach_lamb_f16 (ops_optimizer, f16, stub): C1, C4, C6
foreach_lamb_bf16 (ops_optimizer, bf16, stub): C1, C4, C6
foreach_lamb_f32_wd (ops_optimizer, f32, stub): C1, C4, C6
foreach_lars_f32 (ops_optimizer, f32, stub): C1, C4, C6
foreach_lars_f16 (ops_optimizer, f16, stub): C1, C4, C6
foreach_lars_bf16 (ops_optimizer, bf16, stub): C1, C4, C6
foreach_lars_f32_wd (ops_optimizer, f32, stub): C1, C4, C6
foreach_ftrl_f32 (ops_optimizer, f32, stub): C1, C4, C6
foreach_ftrl_f16 (ops_optimizer, f16, stub): C1, C4, C6
foreach_ftrl_bf16 (ops_optimizer, bf16, stub): C1, C4, C6
foreach_ftrl_f32_wd (ops_optimizer, f32, stub): C1, C4, C6
foreach_adam_amsgrad_f32 (ops_optimizer, f32, ✓ real source): C1
foreach_adam_amsgrad_f16 (ops_optimizer, f16, ✓ real source): C1
foreach_adam_amsgrad_bf16 (ops_optimizer, bf16, ✓ real source): C1
foreach_adamw_amsgrad_f32 (ops_optimizer, f32, ✓ real source): C1
foreach_adamw_amsgrad_f16 (ops_optimizer, f16, ✓ real source): C1
foreach_adamw_amsgrad_bf16 (ops_optimizer, bf16, ✓ real source): C1
foreach_adam_fused_f32 (ops_optimizer, f32, ✓ real source): C1
foreach_adam_fused_f16 (ops_optimizer, f16, ✓ real source): C1
foreach_adam_fused_bf16 (ops_optimizer, bf16, ✓ real source): C1
foreach_adamw_fused_f32 (ops_optimizer, f32, ✓ real source): C1
foreach_adamw_fused_f16 (ops_optimizer, f16, ✓ real source): C1
foreach_adamw_fused_bf16 (ops_optimizer, bf16, ✓ real source): C1
foreach_sgd_nesterov_f32 (ops_optimizer, f32, stub): C1, C4, C6
foreach_sgd_nesterov_f16 (ops_optimizer, f16, stub): C1, C4, C6
foreach_sgd_nesterov_bf16 (ops_optimizer, bf16, stub): C1, C4, C6
foreach_lion_f32 (ops_optimizer, f32, stub): C1, C4, C6
foreach_lion_f16 (ops_optimizer, f16, stub): C1, C4, C6
foreach_lion_bf16 (ops_optimizer, bf16, stub): C1, C4, C6
foreach_adafactor_f32 (ops_optimizer, f32, stub): C1, C4, C6
foreach_adafactor_f16 (ops_optimizer, f16, stub): C1, C4, C6
foreach_adafactor_bf16 (ops_optimizer, bf16, stub): C1, C4, C6
foreach_sophia_f32 (ops_optimizer, f32, stub): C1, C4, C6
foreach_sophia_f16 (ops_optimizer, f16, stub): C1, C4, C6
foreach_sophia_bf16 (ops_optimizer, bf16, stub): C1, C4, C6
foreach_came_f32 (ops_optimizer, f32, stub): C1, C4, C6
foreach_came_f16 (ops_optimizer, f16, stub): C1, C4, C6
foreach_came_bf16 (ops_optimizer, bf16, stub): C1, C4, C6
foreach_novograd_f32 (ops_optimizer, f32, stub): C1, C4, C6
foreach_novograd_f16 (ops_optimizer, f16, stub): C1, C4, C6
foreach_novograd_bf16 (ops_optimizer, bf16, stub): C1, C4, C6
foreach_prodigy_f32 (ops_optimizer, f32, stub): C1, C4, C6
foreach_prodigy_f16 (ops_optimizer, f16, stub): C1, C4, C6
foreach_prodigy_bf16 (ops_optimizer, bf16, stub): C1, C4, C6
foreach_shampoo_f32 (ops_optimizer, f32, stub): C1, C4, C6
foreach_shampoo_f16 (ops_optimizer, f16, stub): C1, C4, C6
foreach_shampoo_bf16 (ops_optimizer, bf16, stub): C1, C4, C6
foreach_adalomo_f32 (ops_optimizer, f32, stub): C1, C4, C6
foreach_adalomo_f16 (ops_optimizer, f16, stub): C1, C4, C6
foreach_adalomo_bf16 (ops_optimizer, bf16, stub): C1, C4, C6
foreach_galore_f32 (ops_optimizer, f32, stub): C1, C4, C6
foreach_galore_f16 (ops_optimizer, f16, stub): C1, C4, C6
foreach_galore_bf16 (ops_optimizer, bf16, stub): C1, C4, C6
foreach_reduce_sum_f32 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_sum_f16 (ops_reduce, f16, stub): C1, C4, C6
foreach_reduce_sum_int32 (ops_reduce, i32, stub): C1, C4, C6
foreach_reduce_max_f32 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_max_f16 (ops_reduce, f16, stub): C1, C4, C6
foreach_reduce_max_int32 (ops_reduce, i32, stub): C1, C4, C6
foreach_reduce_min_f32 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_min_f16 (ops_reduce, f16, stub): C1, C4, C6
foreach_reduce_min_int32 (ops_reduce, i32, stub): C1, C4, C6
foreach_reduce_mean_f32 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_mean_f16 (ops_reduce, f16, stub): C1, C4, C6
foreach_reduce_mean_int32 (ops_reduce, i32, stub): C1, C4, C6
foreach_reduce_prod_f32 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_prod_f16 (ops_reduce, f16, stub): C1, C4, C6
foreach_reduce_prod_int32 (ops_reduce, i32, stub): C1, C4, C6
foreach_reduce_any_f32 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_any_f16 (ops_reduce, f16, stub): C1, C4, C6
foreach_reduce_any_int32 (ops_reduce, i32, stub): C1, C4, C6
foreach_reduce_all_f32 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_all_f16 (ops_reduce, f16, stub): C1, C4, C6
foreach_reduce_all_int32 (ops_reduce, i32, stub): C1, C4, C6
foreach_argmax_f32 (ops_reduce, f32, stub): C1, C4, C6
foreach_argmax_f16 (ops_reduce, f16, stub): C1, C4, C6
foreach_argmax_int32 (ops_reduce, i32, stub): C1, C4, C6
foreach_argmin_f32 (ops_reduce, f32, stub): C1, C4, C6
foreach_argmin_f16 (ops_reduce, f16, stub): C1, C4, C6
foreach_argmin_int32 (ops_reduce, i32, stub): C1, C4, C6
foreach_cumsum_f32 (ops_reduce, f32, stub): C1, C4, C6
foreach_cumsum_f16 (ops_reduce, f16, stub): C1, C4, C6
foreach_cumsum_int32 (ops_reduce, i32, stub): C1, C4, C6
foreach_cumprod_f32 (ops_reduce, f32, stub): C1, C4, C6
foreach_cumprod_f16 (ops_reduce, f16, stub): C1, C4, C6
foreach_cumprod_int32 (ops_reduce, i32, stub): C1, C4, C6
foreach_reduce_sum_bf16 (ops_reduce, bf16, stub): C1, C4, C6
foreach_reduce_sum_f32_axis0 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_sum_f16_axis0 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_sum_f32_axis1 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_sum_f16_axis1 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_max_bf16 (ops_reduce, bf16, stub): C1, C4, C6
foreach_reduce_max_f32_axis0 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_max_f16_axis0 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_max_f32_axis1 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_max_f16_axis1 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_min_bf16 (ops_reduce, bf16, stub): C1, C4, C6
foreach_reduce_min_f32_axis0 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_min_f16_axis0 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_min_f32_axis1 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_min_f16_axis1 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_mean_bf16 (ops_reduce, bf16, stub): C1, C4, C6
foreach_reduce_mean_f32_axis0 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_mean_f16_axis0 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_mean_f32_axis1 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_mean_f16_axis1 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_prod_bf16 (ops_reduce, bf16, stub): C1, C4, C6
foreach_reduce_prod_f32_axis0 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_prod_f16_axis0 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_prod_f32_axis1 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_prod_f16_axis1 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_l1_norm_f32 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_l1_norm_f16 (ops_reduce, f16, stub): C1, C4, C6
foreach_reduce_l2_norm_f32 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_l2_norm_f16 (ops_reduce, f16, stub): C1, C4, C6
foreach_reduce_logsumexp_f32 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_logsumexp_f16 (ops_reduce, f16, stub): C1, C4, C6
foreach_reduce_nansum_f32 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_nansum_f16 (ops_reduce, f16, stub): C1, C4, C6
foreach_reduce_nanmean_f32 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_nanmean_f16 (ops_reduce, f16, stub): C1, C4, C6
foreach_reduce_count_nonzero_f32 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_count_nonzero_f16 (ops_reduce, f16, stub): C1, C4, C6
foreach_reduce_median_f32 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_median_f16 (ops_reduce, f16, stub): C1, C4, C6
foreach_reduce_var_f32 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_var_f16 (ops_reduce, f16, stub): C1, C4, C6
foreach_reduce_std_f32 (ops_reduce, f32, stub): C1, C4, C6
foreach_reduce_std_f16 (ops_reduce, f16, stub): C1, C4, C6
foreach_reduce_l1_norm_bf16 (ops_reduce, bf16, stub): C1, C4, C6
foreach_reduce_l2_norm_bf16 (ops_reduce, bf16, stub): C1, C4, C6
foreach_reduce_logsumexp_bf16 (ops_reduce, bf16, stub): C1, C4, C6
foreach_reduce_nansum_bf16 (ops_reduce, bf16, stub): C1, C4, C6
foreach_upsample_nearest_2d_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_upsample_nearest_2d_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_upsample_nearest_3d_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_upsample_nearest_3d_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_upsample_bilinear_2d_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_upsample_bilinear_2d_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_upsample_bilinear_3d_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_upsample_bilinear_3d_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_upsample_bicubic_2d_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_upsample_bicubic_2d_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_upsample_trilinear_3d_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_upsample_trilinear_3d_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_interpolate_nearest_2d_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_interpolate_nearest_2d_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_interpolate_nearest_3d_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_interpolate_nearest_3d_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_interpolate_bilinear_2d_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_interpolate_bilinear_2d_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_interpolate_bilinear_3d_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_interpolate_bilinear_3d_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_interpolate_bicubic_2d_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_interpolate_bicubic_2d_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_resize_nearest_2d_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_resize_nearest_2d_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_resize_bilinear_2d_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_resize_bilinear_2d_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_adaptive_avg_pool_2d_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_adaptive_avg_pool_2d_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_adaptive_avg_pool_3d_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_adaptive_avg_pool_3d_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_adaptive_max_pool_2d_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_adaptive_max_pool_2d_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_adaptive_max_pool_3d_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_adaptive_max_pool_3d_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_upsample_bilinear_2d_align_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_upsample_bilinear_2d_align_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_upsample_bicubic_2d_align_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_upsample_bicubic_2d_align_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_interpolate_bilinear_2d_align_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_interpolate_bilinear_2d_align_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_resize_bilinear_2d_align_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_resize_bilinear_2d_align_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_grid_sample_bilinear_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_grid_sample_bilinear_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_grid_sample_nearest_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_grid_sample_nearest_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_grid_sample_bicubic_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_grid_sample_bicubic_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_pixel_shuffle_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_pixel_unshuffle_f32 (ops_resize, f32, stub): C1, C4, C6
foreach_pixel_shuffle_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_pixel_unshuffle_f16 (ops_resize, f16, stub): C1, C4, C6
foreach_gather_f32 (ops_index, f32, ✓ real source): C1
foreach_gather_f16 (ops_index, f16, ✓ real source): C1
foreach_gather_int32 (ops_index, i32, ✓ real source): C1
foreach_scatter_f32 (ops_index, f32, ✓ real source): C1
foreach_scatter_f16 (ops_index, f16, ✓ real source): C1
foreach_scatter_int32 (ops_index, i32, ✓ real source): C1
foreach_scatter_add_f32 (ops_index, f32, ✓ real source): C1
foreach_scatter_add_f16 (ops_index, f16, ✓ real source): C1
foreach_scatter_add_int32 (ops_index, i32, ✓ real source): C1
foreach_scatter_mul_f32 (ops_index, f32, ✓ real source): C1
foreach_scatter_mul_f16 (ops_index, f16, ✓ real source): C1
foreach_scatter_mul_int32 (ops_index, i32, ✓ real source): C1
foreach_index_add_f32 (ops_index, f32, stub): C1, C4, C6
foreach_index_add_f16 (ops_index, f16, stub): C1, C4, C6
foreach_index_add_int32 (ops_index, i32, stub): C1, C4, C6
foreach_index_copy_f32 (ops_index, f32, stub): C1, C4, C6
foreach_index_copy_f16 (ops_index, f16, stub): C1, C4, C6
foreach_index_copy_int32 (ops_index, i32, stub): C1, C4, C6
foreach_index_fill_f32 (ops_index, f32, stub): C1, C4, C6
foreach_index_fill_f16 (ops_index, f16, stub): C1, C4, C6
foreach_index_fill_int32 (ops_index, i32, stub): C1, C4, C6
foreach_index_select_f32 (ops_index, f32, stub): C1, C4, C6
foreach_index_select_f16 (ops_index, f16, stub): C1, C4, C6
foreach_index_select_int32 (ops_index, i32, stub): C1, C4, C6
foreach_index_put_f32 (ops_index, f32, stub): C1, C4, C6
foreach_index_put_f16 (ops_index, f16, stub): C1, C4, C6
foreach_index_put_int32 (ops_index, i32, stub): C1, C4, C6
foreach_masked_fill_f32 (ops_index, f32, stub): C1, C4, C6
foreach_masked_fill_f16 (ops_index, f16, stub): C1, C4, C6
foreach_masked_fill_int32 (ops_index, i32, stub): C1, C4, C6
foreach_masked_select_f32 (ops_index, f32, stub): C1, C4, C6
foreach_masked_select_f16 (ops_index, f16, stub): C1, C4, C6
foreach_masked_select_int32 (ops_index, i32, stub): C1, C4, C6
foreach_masked_scatter_f32 (ops_index, f32, stub): C1, C4, C6
foreach_masked_scatter_f16 (ops_index, f16, stub): C1, C4, C6
foreach_masked_scatter_int32 (ops_index, i32, stub): C1, C4, C6
foreach_where_f32 (ops_index, f32, stub): C1, C4, C6
foreach_where_f16 (ops_index, f16, stub): C1, C4, C6
foreach_where_int32 (ops_index, i32, stub): C1, C4, C6
foreach_nonzero_f32 (ops_index, f32, stub): C1, C4, C6
foreach_nonzero_f16 (ops_index, f16, stub): C1, C4, C6
foreach_nonzero_int32 (ops_index, i32, stub): C1, C4, C6
foreach_sort_f32 (ops_index, f32, stub): C1, C4, C6
foreach_sort_f16 (ops_index, f16, stub): C1, C4, C6
foreach_sort_int32 (ops_index, i32, stub): C1, C4, C6
foreach_argsort_f32 (ops_index, f32, stub): C1, C4, C6
foreach_argsort_f16 (ops_index, f16, stub): C1, C4, C6
foreach_argsort_int32 (ops_index, i32, stub): C1, C4, C6
foreach_topk_f32 (ops_index, f32, ✓ real source): C1
foreach_topk_f16 (ops_index, f16, ✓ real source): C1
foreach_topk_int32 (ops_index, i32, ✓ real source): C1
foreach_unique_f32 (ops_index, f32, stub): C1, C4, C6
foreach_unique_f16 (ops_index, f16, stub): C1, C4, C6
foreach_unique_int32 (ops_index, i32, stub): C1, C4, C6
foreach_searchsorted_f32 (ops_index, f32, stub): C1, C4, C6
foreach_searchsorted_f16 (ops_index, f16, stub): C1, C4, C6
foreach_searchsorted_int32 (ops_index, i32, stub): C1, C4, C6
foreach_bucketize_f32 (ops_index, f32, stub): C1, C4, C6
foreach_bucketize_f16 (ops_index, f16, stub): C1, C4, C6
foreach_bucketize_int32 (ops_index, i32, stub): C1, C4, C6
foreach_one_hot_f32 (ops_index, f32, stub): C1, C4, C6
foreach_one_hot_f16 (ops_index, f16, stub): C1, C4, C6
foreach_one_hot_int32 (ops_index, i32, stub): C1, C4, C6
foreach_embedding_bag_f32 (ops_index, f32, ✓ real source): C1, C2
foreach_embedding_bag_f16 (ops_index, f16, ✓ real source): C1, C2
foreach_embedding_bag_int32 (ops_index, i32, ✓ real source): C1, C2
foreach_cummax_f32 (ops_index, f32, stub): C1, C4, C6
foreach_cummax_f16 (ops_index, f16, stub): C1, C4, C6
foreach_cummax_int32 (ops_index, i32, stub): C1, C4, C6
foreach_cummin_f32 (ops_index, f32, stub): C1, C4, C6
foreach_cummin_f16 (ops_index, f16, stub): C1, C4, C6
foreach_cummin_int32 (ops_index, i32, stub): C1, C4, C6
foreach_scatter_nd_f32 (ops_index, f32, ✓ real source): C1
foreach_scatter_nd_f16 (ops_index, f16, ✓ real source): C1
foreach_scatter_nd_int32 (ops_index, i32, ✓ real source): C1
foreach_gather_nd_f32 (ops_index, f32, ✓ real source): C1
foreach_gather_nd_f16 (ops_index, f16, ✓ real source): C1
foreach_gather_nd_int32 (ops_index, i32, ✓ real source): C1
foreach_index_put_accumulate_f32 (ops_index, f32, stub): C1, C4, C6
foreach_index_put_accumulate_f16 (ops_index, f16, stub): C1, C4, C6
foreach_index_put_accumulate_int32 (ops_index, i32, stub): C1, C4, C6
foreach_take_along_axis_f32 (ops_index, f32, stub): C1, C4, C6
foreach_take_along_axis_f16 (ops_index, f16, stub): C1, C4, C6
foreach_take_along_axis_int32 (ops_index, i32, stub): C1, C4, C6
foreach_put_along_axis_f32 (ops_index, f32, stub): C1, C4, C6
foreach_put_along_axis_f16 (ops_index, f16, stub): C1, C4, C6
foreach_put_along_axis_int32 (ops_index, i32, stub): C1, C4, C6
foreach_bincount_f32 (ops_index, f32, stub): C1, C4, C6
foreach_bincount_f16 (ops_index, f16, stub): C1, C4, C6
foreach_bincount_int32 (ops_index, i32, stub): C1, C4, C6
foreach_scatter_max_f32 (ops_index, f32, ✓ real source): C1
foreach_scatter_max_f16 (ops_index, f16, ✓ real source): C1
foreach_scatter_max_int32 (ops_index, i32, ✓ real source): C1
foreach_scatter_min_f32 (ops_index, f32, ✓ real source): C1
foreach_scatter_min_f16 (ops_index, f16, ✓ real source): C1
foreach_scatter_min_int32 (ops_index, i32, ✓ real source): C1
foreach_gather_bf16 (ops_index, bf16, ✓ real source): C1
foreach_scatter_bf16 (ops_index, bf16, ✓ real source): C1
foreach_index_select_bf16 (ops_index, bf16, stub): C1, C4, C6
foreach_where_bf16 (ops_index, bf16, stub): C1, C4, C6
foreach_sort_bf16 (ops_index, bf16, stub): C1, C4, C6
foreach_topk_bf16 (ops_index, bf16, ✓ real source): C1
foreach_masked_fill_bf16 (ops_index, bf16, stub): C1, C4, C6
foreach_masked_select_bf16 (ops_index, bf16, stub): C1, C4, C6
foreach_sort_int64 (ops_index, f32, stub): C1, C4, C6
foreach_argsort_int64 (ops_index, f32, stub): C1, C4, C6
foreach_topk_int64 (ops_index, f32, ✓ real source): C1
foreach_unique_int64 (ops_index, f32, stub): C1, C4, C6
foreach_gather_int8 (ops_index, i8, ✓ real source): C1
foreach_scatter_int8 (ops_index, i8, ✓ real source): C1
foreach_scatter_add_bf16 (ops_index, bf16, ✓ real source): C1
foreach_scatter_mul_bf16 (ops_index, bf16, ✓ real source): C1
foreach_index_add_bf16 (ops_index, bf16, stub): C1, C4, C6
foreach_index_copy_bf16 (ops_index, bf16, stub): C1, C4, C6
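
The per-kernel entries above follow a fixed shape: `kernel (category, dtype, origin): classes`. To cross-check the listing against the H.2 category breakdown, the entries can be parsed and tallied mechanically. The following is a minimal sketch of that idea, assuming the line format shown above. `Entry`, `parse_entry`, and `tally` are hypothetical names introduced for illustration and are not part of ascend-rs or of the tooling that produced this appendix.

```rust
use std::collections::BTreeMap;

/// One parsed listing entry (hypothetical helper type, not an ascend-rs API).
#[derive(Debug, PartialEq)]
pub struct Entry {
    pub kernel: String,
    pub category: String,
    pub dtype: String,
    /// True when the origin field reads "real source" rather than "stub".
    pub real_source: bool,
    /// Safety class tags such as "C1" or "C4"; empty when none are listed.
    pub classes: Vec<String>,
}

/// A class tag is the letter 'C' followed by one or more ASCII digits.
fn is_class_tag(token: &str) -> bool {
    token.len() > 1
        && token.starts_with('C')
        && token[1..].chars().all(|ch| ch.is_ascii_digit())
}

/// Parse a line such as
/// `foreach_gemm_f32 (ops_transformer, f32, stub): C1, C4, C6`.
/// Returns `None` for lines that do not match this shape.
pub fn parse_entry(line: &str) -> Option<Entry> {
    let (kernel, rest) = line.trim().split_once(" (")?;
    let (meta, classes) = rest.split_once("):")?;
    let mut fields = meta.split(',').map(str::trim);
    let category = fields.next()?.to_string();
    let dtype = fields.next()?.to_string();
    let origin = fields.next()?;
    // Anything that is not a class tag (an empty list, a "none" marker) is skipped.
    let classes = classes
        .split(',')
        .map(str::trim)
        .filter(|token| is_class_tag(token))
        .map(String::from)
        .collect();
    Some(Entry {
        kernel: kernel.to_string(),
        category,
        dtype,
        real_source: origin.contains("real source"),
        classes,
    })
}

/// Count, per (category, class) pair, how many entries carry that class.
pub fn tally<'a>(
    lines: impl IntoIterator<Item = &'a str>,
) -> BTreeMap<(String, String), usize> {
    let mut counts = BTreeMap::new();
    for entry in lines.into_iter().filter_map(parse_entry) {
        for class in &entry.classes {
            *counts
                .entry((entry.category.clone(), class.clone()))
                .or_insert(0) += 1;
        }
    }
    counts
}
```

Feeding the listing's lines to `tally` (for example via `text.lines()`) yields one count per category and class, which can then be compared row by row with the H.2 table. Entries without any class tag contribute to no count.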