CVE-2025-49847
Published: 17 June 2025
Summary
CVE-2025-49847 is a high-severity Improper Restriction of Operations within the Bounds of a Memory Buffer (CWE-119) vulnerability in Ggml Llama.Cpp. Its CVSS base score is 8.8 (High).
Operationally, exploitation aligns with the MITRE ATT&CK technique Exploitation for Client Execution (T1203); ranked in the top 29.7% of CVEs by exploit likelihood; it is not currently listed in the CISA KEV catalog.
This vulnerability is AI-related — categorised as NLP and Transformers; in the Supply Chain and Deployment risk domain.
EU & UK References
- 🇪🇺 ENISA EUVD: EUVD-2025-18632
Vulnerability details
llama.cpp is an inference of several LLM models in C/C++. Prior to version b5662, an attacker‐supplied GGUF model vocabulary can trigger a buffer overflow in llama.cpp’s vocabulary‐loading code. Specifically, the helper _try_copy in llama.cpp/src/vocab.cpp: llama_vocab::impl::token_to_piece() casts a very large size_t…
more
token length into an int32_t, causing the length check (if (length < (int32_t)size)) to be bypassed. As a result, memcpy is still called with that oversized size, letting a malicious model overwrite memory beyond the intended buffer. This can lead to arbitrary memory corruption and potential code execution. This issue has been patched in version b5662.
- CWE(s)
AI Security AnalysisAI
- AI Category
- NLP and Transformers
- Risk Domain
- Supply Chain and Deployment
- OWASP Top 10 for LLMs 2025
- Classification Reason
- Matched keywords: llama.cpp, llm
Related Threats
MITRE ATT&CK Enterprise TechniquesAI
Why these techniques?
Buffer overflow in llama.cpp vocabulary loading from attacker-supplied GGUF model enables arbitrary memory corruption and code execution, facilitating Exploitation for Client Execution.
Affected Assets
Mitigating Controls
Likely Mitigating Controls AI
Per-CVE control mapping for this CVE has not run yet; the list below is derived from the weakness types (CWEs) cited in the NVD entry.
Ongoing control assessments and code testing (static/dynamic analysis, fuzzing) surface memory buffer restriction failures, which are then remediated before release.
Managed runtimes used by platform-independent applications (e.g., JVM, CLR) enforce memory safety, preventing most buffer overflows that require direct memory manipulation.
Memory protections (e.g., W^X, ASLR) make exploitation of buffer-boundary violations far harder to turn into code execution.
Detects exploitation attempts that produce memory corruption, crashes, or anomalous behavior.