CVE-2026-33298

High · Public PoC

Published: 24 March 2026

Modified

30 April 2026

KEV Added

—

Patch

—

CVSS Score v3.1: 7.8 · CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

EPSS Score: 0.0002 · 5.7th percentile

Risk Priority: 16 · weighted 60% EPSS · 20% KEV · 20% CVSS

Summary

CVE-2026-33298 is a high-severity heap-based buffer overflow (CWE-122) in ggml llama.cpp, triggered by an integer overflow (CWE-190). Its CVSS v3.1 base score is 7.8 (High).

Operationally, exploitation aligns with the MITRE ATT&CK technique Exploitation for Client Execution (T1203). The CVE ranks at the 5.7th percentile for exploit likelihood (EPSS), well below the median. It is not currently listed in the CISA KEV catalog, but a public proof-of-concept is referenced.

This vulnerability is AI-related: it is categorised as NLP and Transformers, in the Supply Chain and Deployment risk domain.

The strongest mitigations our analysis identified are NIST 800-53 SI-10 (Information Input Validation) and SI-2 (Flaw Remediation).

Deeper analysis

CVE-2026-33298, published on 2026-03-24, is an integer overflow (CWE-190) that leads to a heap-based buffer overflow (CWE-122) in the `ggml_nbytes` function of llama.cpp, a C/C++ inference engine for large language models (LLMs). Versions prior to b7824 are affected. By crafting a GGUF file with specific tensor dimensions, an attacker can make `ggml_nbytes` return a far smaller size than the tensor actually requires (for example, 4 MB instead of exabytes). The undersized value passes memory validation, and memory is corrupted when the tensor is subsequently processed. The vulnerability carries a CVSS v3.1 base score of 7.8 (AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H).
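To make the bug class concrete, here is a minimal C sketch of an unchecked size calculation. It is not the actual `ggml_nbytes` source: the function name `naive_tensor_bytes`, the constant `SKETCH_MAX_DIMS`, and the simplifying assumption that a tensor occupies "element count times element size" bytes are all illustrative. It only shows how unsigned 64-bit multiplication can wrap around to a small, plausible-looking value.

```c
#include <stdint.h>

#define SKETCH_MAX_DIMS 4  /* hypothetical: number of dimensions per tensor */

/* Naive size computation (illustrative only, NOT the llama.cpp code).
 * Unsigned arithmetic in C wraps modulo 2^64, so the product of large
 * attacker-chosen dimensions can come out far smaller than the true size. */
uint64_t naive_tensor_bytes(const int64_t ne[SKETCH_MAX_DIMS], uint64_t elem_size) {
    uint64_t total = elem_size;
    for (int i = 0; i < SKETCH_MAX_DIMS; i++) {
        total *= (uint64_t) ne[i];  /* no overflow check: this is the flaw */
    }
    return total;
}
```

With the illustrative dimensions `{1048576, 4398046511105, 1, 1}` and a 4-byte element, the true size is 2^64 + 2^22 bytes (about 16 EiB), but the function returns 4194304 (4 MiB) because the 2^64 term is lost in the wrap-around. These values were chosen to mirror the "4 MB instead of exabytes" description; they are not taken from the public proof-of-concept.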

The CVSS vector describes a local attack (AV:L) that needs user interaction (UI:R) but no privileges (PR:N). An attacker supplies a malicious GGUF file to a user running a vulnerable llama.cpp application and tricks them into loading it for LLM inference. Loading the file triggers the integer overflow during size calculation, which results in a heap buffer overflow and, according to the CVE description, potential remote code execution (RCE) through memory corruption.
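The 7.8 base score can be reproduced from the vector using the base-score equations in the CVSS v3.1 specification. The sketch below covers only the scope-unchanged case (S:U) that applies here. The numeric metric weights come from the specification (AV:L = 0.55, AC:L = 0.77, PR:N = 0.85, UI:R = 0.62, C/I/A:H = 0.56); the function names are illustrative.

```c
/* CVSS v3.1 "Roundup": smallest value with one decimal place that is >= x.
 * Done on integers, as the specification recommends, to avoid
 * floating-point artefacts. Assumes x is non-negative. */
double cvss31_roundup(double x) {
    long long scaled = (long long)(x * 100000.0 + 0.5);  /* round to nearest */
    if (scaled % 10000 == 0) {
        return (double) scaled / 100000.0;
    }
    return (double)(scaled / 10000 + 1) / 10.0;
}

/* Base score for Scope: Unchanged. Arguments are the numeric metric weights. */
double cvss31_base_unchanged(double av, double ac, double pr, double ui,
                             double c, double i, double a) {
    double iss = 1.0 - (1.0 - c) * (1.0 - i) * (1.0 - a);
    double impact = 6.42 * iss;
    double exploitability = 8.22 * av * ac * pr * ui;
    if (impact <= 0.0) {
        return 0.0;
    }
    double sum = impact + exploitability;
    return cvss31_roundup(sum < 10.0 ? sum : 10.0);
}
```

For this CVE, `cvss31_base_unchanged(0.55, 0.77, 0.85, 0.62, 0.56, 0.56, 0.56)` gives an impact of about 5.87 and an exploitability of about 1.83, which sum to about 7.71 and round up to 7.8.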

The fix is documented in the official GitHub security advisory (GHSA-96jg-mvhq-q7q7) and shipped in release tag b7824, which corrects the `ggml_nbytes` function. Security practitioners should update to llama.cpp b7824 or later to prevent exploitation.
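For inventory or CI checks, a deployment can compare the release tag it was built from against the fixed build. The following is a minimal sketch that assumes tags have the form `b<digits>`, as in b7824; the function name `sketch_is_patched_tag` and the way the tag string is obtained are assumptions, not part of llama.cpp.

```c
#include <stdbool.h>
#include <stddef.h>

#define FIXED_BUILD 7824UL  /* per the advisory text: b7824 contains the fix */

/* Returns true if `tag` looks like "b<digits>" and the build number is at or
 * after the fixed build. Unparseable tags are treated as not patched. */
bool sketch_is_patched_tag(const char *tag) {
    if (tag == NULL || tag[0] != 'b' || tag[1] == '\0') {
        return false;
    }
    unsigned long build = 0;
    for (const char *p = tag + 1; *p != '\0'; p++) {
        if (*p < '0' || *p > '9') {
            return false;              /* non-digit character in the tag */
        }
        if (build > 100000000UL) {
            return false;              /* implausibly large; reject */
        }
        build = build * 10 + (unsigned long)(*p - '0');
    }
    return build >= FIXED_BUILD;
}
```

Treating anything unparseable as unpatched is a deliberately conservative choice: a build of unknown origin then gets flagged for review instead of being silently accepted.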

This vulnerability matters for AI/ML deployments that rely on llama.cpp for lightweight, local LLM inference, and it highlights the risk in the file-processing components of such frameworks. There is no public evidence of real-world exploitation.

EU & UK References

🇪🇺 ENISA EUVD: EUVD-2026-14668

Vulnerability details

llama.cpp is an inference of several LLM models in C/C++. Prior to b7824, an integer overflow vulnerability in the `ggml_nbytes` function allows an attacker to bypass memory validation by crafting a GGUF file with specific tensor dimensions. This causes `ggml_nbytes` to return a significantly smaller size than required (e.g., 4MB instead of Exabytes), leading to a heap-based buffer overflow when the application subsequently processes the tensor. This vulnerability allows potential Remote Code Execution (RCE) via memory corruption. b7824 contains a fix.

CWE(s): CWE-122 CWE-190

AI Security Analysis · AI

AI Category: NLP and Transformers
Risk Domain: Supply Chain and Deployment
OWASP Top 10 for LLMs 2025: None mapped
Classification Reason: Matched keywords: llama.cpp, llm

Related Threats

MITRE ATT&CK Enterprise Techniques · AI

T1203 Exploitation for Client Execution · Tactic: Execution

Adversaries may exploit software vulnerabilities in client applications to execute code.

T1204.002 Malicious File · Tactic: Execution

An adversary may rely upon a user opening a malicious file in order to gain execution.

Why these techniques?

The integer/heap overflow in GGUF tensor parsing is directly triggered by a crafted malicious file supplied to a local client application (llama.cpp), enabling client-side code execution via exploitation (T1203) after user interaction with the file (T1204.002).

Confidence: HIGH · MITRE ATT&CK Enterprise v18.1

CVEs Like This One

CVE-2026-34159 · Same product: ggml llama.cpp

CVE-2026-21869 · Same product: ggml llama.cpp

CVE-2026-34545 · Shared CWE-122, CWE-190

CVE-2026-27940 · Shared CWE-122, CWE-190

CVE-2025-21172 · Shared CWE-122, CWE-190

CVE-2026-41445 · Shared CWE-122, CWE-190

CVE-2025-21395 · Shared CWE-122

CVE-2025-35984 · Shared CWE-122

CVE-2026-34629 · Shared CWE-122

CVE-2026-6306 · Shared CWE-122

Affected Assets

ggml

llama.cpp

< b7824

Mitigating Controls

Mitigating Controls (NIST 800-53 r5) · AI

SI-2 Flaw Remediation · good match · prevent

Updating to llama.cpp b7824 or later directly remediates the integer overflow in ggml_nbytes, preventing heap buffer overflows from crafted GGUF files.

SI-10 Information Input Validation · good match · prevent

Validating tensor dimensions and sizes in GGUF files prior to processing prevents integer overflows that bypass memory validation.

SI-16 Memory Protection · partial match · prevent

Memory protection mechanisms such as ASLR and DEP make it harder to turn the resulting heap buffer overflow into code execution, but they do not remove the underlying flaw.
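As an illustration of the SI-10 control, a loader can reject tensors whose byte size overflows or does not fit in the file before any buffer is touched. The sketch below is not the upstream fix and does not use the real GGUF or ggml structures: `struct sketch_tensor_info`, its fields, and both function names are hypothetical. It also assumes one element occupies `elem_size` bytes, which ignores the block layout of quantized types.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_DIMS 4  /* hypothetical: number of dimensions per tensor */

/* Hypothetical tensor descriptor as read from an untrusted file. */
struct sketch_tensor_info {
    int64_t  ne[SKETCH_MAX_DIMS];  /* element count per dimension */
    uint64_t elem_size;            /* bytes per element */
    uint64_t data_offset;          /* where the tensor data starts in the file */
};

/* Computes the byte size with an overflow check before every multiplication.
 * Returns false for non-positive dimensions or if the product would wrap. */
bool sketch_tensor_bytes(const struct sketch_tensor_info *t, uint64_t *out) {
    uint64_t total = t->elem_size;
    for (int i = 0; i < SKETCH_MAX_DIMS; i++) {
        if (t->ne[i] <= 0) {
            return false;
        }
        uint64_t dim = (uint64_t) t->ne[i];
        if (total > UINT64_MAX / dim) {
            return false;  /* total * dim would exceed 64 bits */
        }
        total *= dim;
    }
    *out = total;
    return true;
}

/* Accepts the tensor only if its data lies entirely inside the file. */
bool sketch_tensor_fits(const struct sketch_tensor_info *t, uint64_t file_size) {
    uint64_t nbytes = 0;
    if (!sketch_tensor_bytes(t, &nbytes)) {
        return false;
    }
    if (t->data_offset > file_size) {
        return false;
    }
    return nbytes <= file_size - t->data_offset;  /* subtraction cannot wrap here */
}
```

Checking `total > UINT64_MAX / dim` before multiplying detects the overflow without relying on compiler-specific builtins, and comparing against `file_size - data_offset` avoids a second overflow in `data_offset + nbytes`.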

References

https://github.com/ggml-org/llama.cpp/releases/tag/b7824
Release Notes · security-advisories@github.com
https://github.com/ggml-org/llama.cpp/security/advisories/GHSA-96jg-mvhq-q7q7
Exploit, Vendor Advisory · security-advisories@github.com