CVE-2026-33298
Published: 24 March 2026
Summary
CVE-2026-33298 is a high-severity heap-based buffer overflow (CWE-122) vulnerability in llama.cpp (ggml). Its CVSS v3.1 base score is 7.8 (High).
Operationally, exploitation aligns with the MITRE ATT&CK technique Exploitation for Client Execution (T1203). The vulnerability ranks in the 4.4th percentile for exploit likelihood (below the median), it is not currently listed in the CISA KEV catalog, and a public proof-of-concept is referenced.
This vulnerability is AI-related and is categorised as NLP and Transformers.
The strongest mitigations our analysis identified are NIST 800-53 SI-10 (Information Input Validation) and SI-2 (Flaw Remediation).
Threat & Defense Details
Mitigating Controls (NIST 800-53 r5)
Updating to llama.cpp b7824 or later directly remediates the integer overflow in `ggml_nbytes`, preventing heap buffer overflows from crafted GGUF files.
Validating tensor dimensions and sizes in GGUF files before processing, including checking that size calculations cannot overflow, prevents integer overflows from bypassing memory validation.
Memory protection mechanisms such as ASLR and DEP do not remove the flaw, but they make it harder to turn the resulting heap buffer overflow into remote code execution.
MITRE ATT&CK Enterprise Techniques
Why these techniques?
The integer overflow in GGUF tensor parsing, and the heap overflow it causes, is triggered directly by a crafted file supplied to a local client application (llama.cpp). Exploiting it enables client-side code execution (T1203) once a user opens or loads the malicious file (T1204.002).
NVD Description
llama.cpp is an inference of several LLM models in C/C++. Prior to b7824, an integer overflow vulnerability in the `ggml_nbytes` function allows an attacker to bypass memory validation by crafting a GGUF file with specific tensor dimensions. This causes `ggml_nbytes` to return a significantly smaller size than required (e.g., 4MB instead of Exabytes), leading to a heap-based buffer overflow when the application subsequently processes the tensor. This vulnerability allows potential Remote Code Execution (RCE) via memory corruption. b7824 contains a fix.
Deeper analysis
CVE-2026-33298, published on 2026-03-24, combines an integer overflow (CWE-190) with a heap-based buffer overflow (CWE-122) in the `ggml_nbytes` function of llama.cpp, a C/C++ inference engine for large language models (LLMs). Versions prior to b7824 are affected. By crafting a GGUF file with specific tensor dimensions, an attacker can make `ggml_nbytes` return a far smaller size than required (for example, 4MB instead of exabytes), which bypasses memory validation and leads to memory corruption when the tensor is processed. The flaw carries a CVSS v3.1 base score of 7.8 (AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H).
The attack requires local access and user interaction, with no privileges needed. An attacker can supply a malicious GGUF file to a target user running a vulnerable llama.cpp application, tricking them into loading it for LLM inference. This triggers the integer overflow during size calculation, resulting in a heap buffer overflow and potential remote code execution (RCE) through memory corruption.
Mitigation is addressed in the official GitHub security advisory (GHSA-96jg-mvhq-q7q7) and release tag b7824, which contains a fix for the `ggml_nbytes` function. Security practitioners should update to llama.cpp b7824 or later to prevent exploitation.
The vulnerability is relevant to AI/ML deployments that rely on llama.cpp for lightweight, local LLM inference, and it highlights the risk in the file-parsing components of such frameworks. There is no public evidence of real-world exploitation.
Details
- CWE(s)
Affected Products
AI Security Analysis
- AI Category: NLP and Transformers
- Risk Domain: N/A
- OWASP Top 10 for LLMs 2025: None mapped
- Classification Reason: Matched keywords: llama.cpp, llm