CVE-2025-46560
Published: 30 April 2025
Summary
CVE-2025-46560 is a medium-severity Inefficient Regular Expression Complexity (CWE-1333) vulnerability in vLLM, with a CVSS base score of 6.5.
Operationally, it ranks in the top 29.7% of CVEs by exploit likelihood, it is not currently listed in the CISA KEV catalog, and a public proof-of-concept is referenced.
This vulnerability is AI-related: it is categorised as NLP and Transformers, in the Adversarial Attacks risk domain.
Deeper analysis
vLLM is a high-throughput inference and serving engine for large language models. Versions 0.8.0 through 0.8.4 contain a performance vulnerability in the multimodal tokenizer's input preprocessing logic. The affected code expands placeholder tokens such as <|audio_|> and <|image_|> into runs of repeated tokens whose lengths are precomputed. Because the result is built with inefficient list concatenation, the running time is quadratic (O(n²)), so an attacker can supply specially crafted inputs that trigger excessive CPU and memory consumption.
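The quadratic behaviour described above can be illustrated with a minimal sketch. This is not the code from phi4mm.py: the function names, the `repeat` mapping, and the placeholder ID are hypothetical, chosen only to contrast a concatenation pattern that copies the whole list on every step with one that appends in place.

```python
from typing import Dict, List

# Hypothetical token ID standing in for a placeholder such as <|image_|>.
IMAGE_PLACEHOLDER_ID = -1


def expand_quadratic(tokens: List[int], repeat: Dict[int, int]) -> List[int]:
    """Expand placeholders by rebuilding the output list on every step."""
    out: List[int] = []
    for tok in tokens:
        # `out + [...]` allocates a new list and copies everything built so
        # far, so total work grows quadratically with the number of tokens.
        out = out + [tok] * repeat.get(tok, 1)
    return out


def expand_linear(tokens: List[int], repeat: Dict[int, int]) -> List[int]:
    """Same result, but each step only appends the new tokens in place."""
    out: List[int] = []
    for tok in tokens:
        out.extend([tok] * repeat.get(tok, 1))
    return out


prompt_ids = [101, IMAGE_PLACEHOLDER_ID, 102]
lengths = {IMAGE_PLACEHOLDER_ID: 3}  # precomputed repeat count per placeholder
assert expand_quadratic(prompt_ids, lengths) == expand_linear(prompt_ids, lengths)
```

Both functions return identical output; only the cost differs. With a prompt containing many tokens, the first version performs work proportional to the sum of all intermediate list lengths, which is what makes long crafted inputs expensive.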
An authenticated remote attacker with low privileges can submit malicious multimodal prompts to the inference engine. Because the vulnerability affects only availability, successful exploitation results in resource exhaustion that can degrade or deny service to other users without disclosing data or altering model behavior.
The issue is resolved in vLLM 0.8.5. The project security advisory GHSA-vc6m-hm49-g9qg and the corresponding code change in phi4mm.py document the patch that replaces the quadratic concatenation pattern with an efficient implementation.
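As a rough aid for triage, the affected range stated above (0.8.0 up to but not including 0.8.5) can be checked against a version string. The sketch below is an assumption-laden helper, not part of vLLM or the advisory: the names `parse_release` and `is_affected` are hypothetical, and suffixes such as `.post1` or `rc1` are ignored, so pre-releases of 0.8.5 are treated as fixed.

```python
import re
from typing import Tuple

AFFECTED_FROM = (0, 8, 0)  # first affected release, per the text above
FIXED_IN = (0, 8, 5)       # first patched release, per the text above


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release, e.g. '0.8.4.post1' -> (0, 8, 4)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions such as '0.8' to three components.
    return parts + (0,) * (3 - len(parts))


def is_affected(version: str) -> bool:
    """True if the release falls in the half-open range [0.8.0, 0.8.5)."""
    release = parse_release(version)[:3]
    return AFFECTED_FROM <= release < FIXED_IN


print(is_affected("0.8.4"))  # True
print(is_affected("0.8.5"))  # False
```

To check an installed package, the version string can be obtained with the standard library call `importlib.metadata.version("vllm")` and passed to `is_affected`.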
The vulnerability is specific to multimodal LLM workloads and carries a CVSS score of 6.5 with a high availability impact. EPSS remains low, with a recorded peak of 0.0152.
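The 6.5 base score can be reproduced from the CVSS v3.1 base equations under an assumed vector. The vector AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H is not stated on this page; it is inferred from the description (remote, low privileges, availability impact only), and the attack complexity and user interaction values in particular are assumptions. The sketch below handles the scope-unchanged case only.

```python
import math

# CVSS v3.1 metric weights for the ASSUMED vector
# AV:N / AC:L / PR:L / UI:N / S:U / C:N / I:N / A:H
AV_NETWORK = 0.85
AC_LOW = 0.77
PR_LOW_SCOPE_UNCHANGED = 0.62
UI_NONE = 0.85
CONF_NONE, INTEG_NONE, AVAIL_HIGH = 0.0, 0.0, 0.56


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """CVSS v3.1 base score for vectors with Scope: Unchanged."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)      # impact sub-score
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    AV_NETWORK, AC_LOW, PR_LOW_SCOPE_UNCHANGED, UI_NONE,
    CONF_NONE, INTEG_NONE, AVAIL_HIGH,
)
print(score)  # 6.5
```

Under these assumptions the impact term is about 3.60 and the exploitability term about 2.84; their sum, roughly 6.43, rounds up to 6.5.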
EU & UK References
- 🇪🇺 ENISA EUVD: EUVD-2025-12671
Vulnerability details
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.8.0 and prior to 0.8.5 are affected by a critical performance vulnerability in the input preprocessing logic of the multimodal tokenizer. The code dynamically replaces placeholder tokens (e.g., <|audio_|>, <|image_|>) with repeated tokens based on precomputed lengths. Due to inefficient list concatenation operations, the algorithm exhibits quadratic time complexity (O(n²)), allowing malicious actors to trigger resource exhaustion via specially crafted inputs. This issue has been patched in version 0.8.5.
- CWE(s)
- CWE-1333 (Inefficient Regular Expression Complexity)
AI Security Analysis
- AI Category
- NLP and Transformers
- Risk Domain
- Adversarial Attacks
- OWASP Top 10 for LLMs 2025
- None mapped
- Classification Reason
- Matched keywords: llms, vllm
Mitigating Controls
No mitigating controls mapped yet. The per-CVE control annotator has not reached this CVE.