Appendix B: CVE Code Analysis — Vulnerable C++ vs Safe Rust Mitigations
This appendix presents the actual (or reconstructed) vulnerable C/C++ code from the CVEs documented in Appendix A, paired with ascend-rs-style Rust code that structurally prevents each vulnerability class.
B.1 Use-After-Free via Reference Count Drop (CVE-2023-51042, AMDGPU)
The Linux AMDGPU driver dereferences a fence pointer after dropping its reference count.
Vulnerable C code (from drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c, before fix 2e54154):
// Inside amdgpu_cs_wait_all_fences()
r = dma_fence_wait_timeout(fence, true, timeout);
dma_fence_put(fence); // Reference dropped — fence may be freed

if (r < 0)
    return r;
if (r == 0)
    break;

if (fence->error)     // USE-AFTER-FREE: fence already freed
    return fence->error;
ascend-rs mitigation — Rust’s ownership ensures the value is consumed, not dangled:
// ascend_rs host API pattern: Arc<Fence> enforces lifetime
fn wait_all_fences(fences: &[Arc<Fence>], timeout: Duration) -> Result<()> {
    for fence in fences {
        fence.wait_timeout(timeout)?;
        // fence.error is checked WHILE we still hold the Arc reference
        if let Some(err) = fence.error() {
            return Err(err);
        }
        // The Arc reference stays alive to the end of the loop iteration —
        // the compiler rejects any code that uses fence after it is dropped
    }
    Ok(())
}
Why Rust prevents this: Arc<Fence> is reference-counted. The compiler ensures you cannot access fence.error() after the Arc is dropped — the borrow checker rejects any reference to a moved/dropped value at compile time. There is no way to write the C pattern (use after put) in safe Rust.
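The reference-counting mechanics can be sketched in plain Rust. This is a minimal standalone example, not the actual ascend_rs API; `Fence` and `check_fence` here are hypothetical stand-ins:

```rust
use std::sync::Arc;

// Hypothetical stand-in for a driver fence object.
struct Fence {
    error: Option<i32>,
}

// The error is read while the caller still holds a strong Arc reference,
// so the allocation cannot be freed underneath this function.
fn check_fence(fence: &Arc<Fence>) -> Result<(), i32> {
    match fence.error {
        Some(e) => Err(e),
        None => Ok(()),
    }
}

fn main() {
    let fence = Arc::new(Fence { error: Some(-5) });
    let other = Arc::clone(&fence); // a second strong reference
    drop(other); // analogous to dma_fence_put: one reference released
    // Our own reference still pins the allocation:
    assert_eq!(Arc::strong_count(&fence), 1);
    assert_eq!(check_fence(&fence), Err(-5));
}
```

Dropping `other` mirrors the `dma_fence_put()` call in the C code; the difference is that the remaining `Arc` makes the later read provably safe.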
B.2 Out-of-Bounds Write via Unchecked User Index (CVE-2024-0090, NVIDIA)
The NVIDIA GPU driver accepts a user-supplied index via ioctl without bounds checking.
Vulnerable C code (reconstructed from CVE description):
// NVIDIA GPU driver ioctl handler
struct gpu_resource_table {
    uint32_t entries[MAX_GPU_RESOURCES];
    uint32_t count;
};

static int nvidia_ioctl_set_resource(struct gpu_resource_table *table,
                                     struct user_resource_request *req)
{
    // BUG: No bounds check on user-supplied index
    table->entries[req->index] = req->value; // OUT-OF-BOUNDS WRITE
    return 0;
}
ascend-rs mitigation — Rust slices enforce bounds at the type level:
// ascend_rs host API: DeviceBuffer<T> wraps a bounded slice
struct GpuResourceTable {
    entries: Vec<u32>, // Vec tracks its own length
}

impl GpuResourceTable {
    fn set_resource(&mut self, index: usize, value: u32) -> Result<()> {
        // Option 1: indexing panics on out-of-bounds (debug + release):
        //     self.entries[index] = value;
        // Option 2: .get_mut() returns None for out-of-bounds (graceful):
        *self.entries.get_mut(index)
            .ok_or(Error::IndexOutOfBounds)? = value;
        Ok(())
    }
}
Why Rust prevents this: Vec<u32> tracks its length. Indexing with [] performs a bounds check and panics (safe termination, not memory corruption). Using .get_mut() returns None for out-of-bounds access. There is no way to silently write past the buffer in safe Rust.
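The bounds-checked write can be exercised as a standalone sketch, stripped of the ascend_rs wrapper types (the error string is just a placeholder):

```rust
// Bounds-checked write: an attacker-controlled index past the end of the
// table is rejected with an error instead of corrupting memory.
fn set_resource(entries: &mut [u32], index: usize, value: u32) -> Result<(), &'static str> {
    *entries.get_mut(index).ok_or("index out of bounds")? = value;
    Ok(())
}

fn main() {
    let mut table = vec![0u32; 4];
    assert!(set_resource(&mut table, 3, 7).is_ok());
    assert_eq!(table[3], 7);
    // A hostile index like the one in the CVE scenario is caught, not written:
    assert!(set_resource(&mut table, 0xdead_beef, 7).is_err());
    assert_eq!(table.len(), 4); // table untouched
}
```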
B.3 Integer Overflow Leading to Heap Buffer Overflow (CVE-2024-53873, NVIDIA CUDA Toolkit)
The CUDA cuobjdump tool reads a 2-byte signed value from a crafted .cubin file, sign-extends it, and uses the corrupted size in memcpy.
Vulnerable C code (from Talos disassembly analysis):
// Parsing .nv_debug_source section in cubin ELF files
int16_t name_len_raw = *(int16_t*)(section_data); // e.g., 0xFFFF = -1
int32_t name_len = (int32_t)name_len_raw; // sign-extends to -1
int32_t alloc_size = name_len + 1; // -1 + 1 = 0
memcpy(dest_buf, src, (size_t)alloc_size); // HEAP BUFFER OVERFLOW
ascend-rs mitigation — Rust’s checked arithmetic catches overflow:
// ascend_rs: parsing NPU binary metadata with safe arithmetic
fn parse_debug_section(section: &[u8], dest: &mut [u8]) -> Result<()> {
    let name_len_raw = i16::from_le_bytes(
        section.get(0..2).ok_or(Error::TruncatedInput)?.try_into()?,
    );
    // checked_add returns None on overflow instead of wrapping;
    // usize::try_from rejects a sign-extended negative length
    let alloc_size: usize = i32::from(name_len_raw)
        .checked_add(1)
        .and_then(|n| usize::try_from(n).ok())
        .ok_or(Error::IntegerOverflow)?;
    // Slice bounds checking prevents buffer overflow
    let offset = 2; // payload starts after the 2-byte length field
    let src = section.get(offset..offset + alloc_size)
        .ok_or(Error::BufferOverflow)?;
    dest.get_mut(..alloc_size)
        .ok_or(Error::BufferOverflow)?
        .copy_from_slice(src);
    Ok(())
}
Why Rust prevents this: checked_add() returns None on overflow. usize::try_from() rejects negative values. Slice indexing with .get() returns None for out-of-bounds ranges. The entire chain is safe — no silent wrapping, no unchecked memcpy.
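The length computation can be tested in isolation. One refinement over the sketch above: converting to `usize` before adding 1 rejects the sign-extended -1 outright, rather than letting it collapse to a zero-byte copy:

```rust
// Safe reconstruction of the length computation: the 2-byte little-endian
// field sign-extends exactly as in the C code, but negative lengths are
// rejected by usize::try_from and the +1 cannot wrap.
fn checked_alloc_size(raw: [u8; 2]) -> Option<usize> {
    let name_len = i32::from(i16::from_le_bytes(raw)); // 0xFFFF -> -1
    usize::try_from(name_len).ok()?.checked_add(1)
}

fn main() {
    assert_eq!(checked_alloc_size([0x03, 0x00]), Some(4)); // len 3 -> alloc 4
    assert_eq!(checked_alloc_size([0xFF, 0xFF]), None);    // the crafted -1 case
}
```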
B.4 Out-of-Bounds Read on Empty Container (PyTorch Issue #37153)
PyTorch’s CUDA reduce kernel indexes into iter.shape(), which returns an empty array for scalar tensors.
Vulnerable C++ code (from aten/src/ATen/native/cuda/Reduce.cuh):
// iter.shape() returns empty IntArrayRef for scalar input
// iter.ndim() returns 0
int64_t dim0;
if (reduction_on_fastest_striding_dimension) {
    dim0 = iter.shape()[0]; // OUT-OF-BOUNDS: shape() is empty
    // dim0 = garbage value (e.g., 94599111233572)
}
ascend-rs mitigation — Rust’s Option type makes emptiness explicit:
// ascend_rs kernel: safe tensor shape access
fn configure_reduce_kernel(shape: &[usize]) -> Result<KernelConfig> {
    // .first() returns Option<&T> — None for empty slices:
    //     let dim0 = shape.first().copied()
    //         .ok_or(Error::ScalarTensorNotSupported)?;
    // Or use exhaustive pattern matching over the slice:
    let (dim0, dim1) = match shape {
        [d0, d1, ..] => (*d0, *d1),
        [d0] => (*d0, 1),
        [] => return Err(Error::EmptyShape),
    };
    Ok(KernelConfig { dim0, dim1 })
}
Why Rust prevents this: shape.first() returns Option<&usize>, forcing the caller to handle the empty case. The match on slice patterns is exhaustive — the compiler requires the [] (empty) arm. shape[0] on an empty slice panics with a clear message instead of reading garbage.
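A standalone sketch of the slice-pattern approach, with a placeholder error string in place of the ascend_rs error type:

```rust
// Exhaustive match over a shape slice: the compiler forces the empty
// (scalar) case to be handled, so no garbage dimension can leak out.
fn dims(shape: &[usize]) -> Result<(usize, usize), &'static str> {
    match shape {
        [d0, d1, ..] => Ok((*d0, *d1)),
        [d0] => Ok((*d0, 1)),
        [] => Err("scalar tensor: empty shape"),
    }
}

fn main() {
    let empty: &[usize] = &[];
    assert!(empty.first().is_none()); // Option makes emptiness explicit
    assert_eq!(dims(&[4, 8, 2]), Ok((4, 8))); // leading two dims
    assert_eq!(dims(&[5]), Ok((5, 1)));       // 1-D: second dim defaults to 1
    assert!(dims(&[]).is_err());              // the PyTorch scalar-tensor case
}
```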
B.5 Integer Truncation Bypassing Bounds Checks (CVE-2019-16778, TensorFlow)
TensorFlow’s UnsortedSegmentSum kernel implicitly truncates int64 tensor sizes to int32.
Vulnerable C++ code (from tensorflow/core/kernels/segment_reduction_ops.h):
template <typename T, typename Index> // Index = int32
struct UnsortedSegmentFunctor {
    void operator()(OpKernelContext* ctx,
                    const Index num_segments, // TRUNCATED: int64 → int32
                    const Index data_size,    // TRUNCATED: int64 → int32
                    const T* data, /* ... */)
    {
        if (data_size == 0) return; // Bypassed: truncated value ≠ 0
        // data_size = 1 (truncated from 4294967297)
        // Actual tensor has 4 billion elements — massive OOB access
    }
};
ascend-rs mitigation — Rust’s type system rejects implicit narrowing:
// ascend_rs: explicit conversions prevent silent truncation
fn unsorted_segment_sum(
    data: &DeviceBuffer<f32>,
    segment_ids: &DeviceBuffer<i32>,
    num_segments: usize, // Always full-width
) -> Result<DeviceBuffer<f32>> {
    let data_size: usize = data.len(); // usize, never truncated
    // If an i32 index is needed for the kernel, the conversion is explicit:
    let data_size_i32: i32 = i32::try_from(data_size)
        .map_err(|_| Error::TensorTooLarge {
            size: data_size,
            max: i32::MAX as usize,
        })?;
    // Rust rejects: let x: i32 = some_i64;        // ERROR: mismatched types
    // Rust flags:   let x: i32 = some_i64 as i32; // clippy::cast_possible_truncation
    // ... kernel launch using data_size_i32 elided ...
    Ok(output)
}
Why Rust prevents this: Rust has no implicit integer narrowing. let x: i32 = some_i64; is a compile error. The as cast exists but clippy::cast_possible_truncation warns on it. TryFrom/try_into() returns Err when the value doesn’t fit, making truncation impossible without explicit acknowledgment.
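The contrast between `as` and `TryFrom` can be checked directly on the CVE's value (2³² + 1); `narrow_index` is just an illustrative helper:

```rust
// Explicit narrowing: TryFrom fails loudly where `as` silently truncates.
fn narrow_index(data_size: i64) -> Result<i32, &'static str> {
    i32::try_from(data_size).map_err(|_| "tensor too large for i32 index")
}

fn main() {
    // What the C++ template did implicitly: 4294967297 truncates to 1,
    // which is exactly how the data_size checks were bypassed.
    assert_eq!(4_294_967_297i64 as i32, 1);
    // The checked conversion refuses instead:
    assert!(narrow_index(4_294_967_297).is_err());
    assert_eq!(narrow_index(1024), Ok(1024));
}
```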
B.6 Use-After-Free via Raw Pointer After Lock Release (CVE-2023-4211, ARM Mali)
The ARM Mali GPU driver copies a raw pointer from shared state, releases the lock, sleeps, then dereferences the now-dangling pointer.
Vulnerable C code (from mali_kbase_mem_linux.c, confirmed by Project Zero):
static void kbasep_os_process_page_usage_drain(struct kbase_context *kctx)
{
    struct mm_struct *mm;

    spin_lock(&kctx->mm_update_lock);
    mm = rcu_dereference_protected(kctx->process_mm, /*...*/);
    rcu_assign_pointer(kctx->process_mm, NULL);
    spin_unlock(&kctx->mm_update_lock);       // Lock released

    synchronize_rcu();                        // SLEEPS — mm may be freed by another thread
    add_mm_counter(mm, MM_FILEPAGES, -pages); // USE-AFTER-FREE
}
ascend-rs mitigation — Rust’s Arc + Mutex prevents dangling references:
// ascend_rs host API: device context with safe shared state
struct DeviceContext {
    process_mm: Mutex<Option<Arc<MmStruct>>>,
}

impl DeviceContext {
    fn drain_page_usage(&self, pages: i64) {
        // Take ownership of the Arc from the Mutex
        let mm = {
            let mut guard = self.process_mm.lock().unwrap();
            guard.take() // Sets the slot to None, returns Option<Arc<MmStruct>>
        };
        // Lock is released here (guard dropped)
        // If mm exists, we hold a strong reference — it CANNOT be freed
        if let Some(mm) = mm {
            synchronize_rcu();
            // mm is still alive — the Arc guarantees it
            mm.add_counter(MmCounter::FilePages, -pages);
        }
        // mm dropped here — Arc ref count decremented;
        // the MmStruct is freed only when the LAST Arc reference drops
    }
}
Why Rust prevents this: Arc<MmStruct> is a reference-counted smart pointer. Taking it from the Option gives us ownership of a strong reference. Even after the lock is released and other threads run, our Arc keeps the MmStruct alive. There is no way to obtain a dangling raw pointer from an Arc in safe Rust — the underlying memory is freed only when the last Arc is dropped.
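The take-from-Mutex pattern reduces to a small self-contained example. `MmState` is a hypothetical stand-in for the shared per-process state, and the RCU synchronization is omitted:

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for the shared per-process state.
struct MmState {
    file_pages: Mutex<i64>,
}

// Take the Arc out of the shared slot. The lock guard is a temporary that
// drops at the end of the first statement, but the returned strong
// reference keeps the allocation alive whatever other threads do next.
fn drain(slot: &Mutex<Option<Arc<MmState>>>, pages: i64) -> Option<i64> {
    let taken = slot.lock().unwrap().take(); // slot is now None
    taken.map(|mm| {
        // Other threads may run here; our Arc still pins `mm`.
        let mut count = mm.file_pages.lock().unwrap();
        *count -= pages;
        *count
    })
}

fn main() {
    let slot = Mutex::new(Some(Arc::new(MmState {
        file_pages: Mutex::new(10),
    })));
    assert_eq!(drain(&slot, 3), Some(7)); // counter decremented safely
    assert_eq!(drain(&slot, 3), None);    // slot already drained
}
```

Unlike the C version, there is no window in which the state is reachable only through a raw pointer: it is either inside the Mutex slot or owned via the taken `Arc`.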