CVE-2025-62164
Published: 21 November 2025
Summary
CVE-2025-62164 is a high-severity Improper Input Validation (CWE-20) vulnerability in vLLM. Its CVSS base score is 8.8 (High).
Operationally, exploitation aligns with the MITRE ATT&CK technique Exploit Public-Facing Application (T1190). The vulnerability ranks at the 40.6th percentile by exploit likelihood (below the median) and is not currently listed in the CISA KEV catalog.
This vulnerability is AI-related: it is categorised as APIs and Models, in the LLM/Generative AI Risks domain.
The strongest mitigations our analysis identified are NIST 800-53 SI-10 (Information Input Validation) and SI-16 (Memory Protection).
Threat & Defense Details
Mitigating Controls (NIST 800-53 r5) (AI)
SI-10 requires validation of user-supplied inputs like prompt embeddings to prevent malformed serialized tensors from bypassing bounds checks during deserialization.
SI-16 enforces memory protection mechanisms that directly mitigate out-of-bounds memory writes triggered by malicious sparse tensors in the to_dense() call.
SI-2 ensures timely patching of the vLLM flaw, as demonstrated by the fix in version 0.11.1 that adds validation for malformed sparse tensors.
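For deployments that deserialize tensors themselves, the SI-10 style of input validation described above can be sketched as follows. This is a minimal illustration, not the vLLM patch: the function name `load_prompt_embeds` and the assumption that embeddings arrive as a base64 string are hypothetical. The PyTorch calls used are `torch.load(..., weights_only=True)` and the `torch.sparse.check_sparse_tensor_invariants()` context manager, which re-enables the sparse tensor integrity checks for the enclosed code.

```python
import base64
import binascii
import io


def load_prompt_embeds(b64_payload: str):
    """Decode and load a serialized tensor with sparse invariant checks on.

    Hypothetical helper: the base64 transport encoding is an assumption.
    """
    try:
        raw = base64.b64decode(b64_payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("prompt embeddings are not valid base64") from exc

    # Imported lazily so the cheap base64 check runs first;
    # assumes PyTorch is installed.
    import torch

    # Re-enable sparse tensor integrity checks for the load and the
    # densification, so malformed indices raise instead of being trusted.
    with torch.sparse.check_sparse_tensor_invariants():
        tensor = torch.load(
            io.BytesIO(raw), weights_only=True, map_location="cpu"
        )
        if not isinstance(tensor, torch.Tensor):
            raise ValueError("payload did not deserialize to a tensor")
        # For an already strided tensor, to_dense() returns it unchanged.
        return tensor.to_dense()
```

Running the load and the `to_dense()` call inside the same context keeps both steps under the integrity checks. Upgrading to the patched vLLM release remains the primary fix.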
MITRE ATT&CK Enterprise Techniques (AI)
Why these techniques?
The memory corruption vulnerability sits in vLLM's public-facing Completions API. Malicious prompt embeddings therefore allow exploitation of a public-facing application (T1190) and of a remote service (T1210), potentially leading to RCE, and can also cause an endpoint denial of service through application exploitation (T1499.004).
NVD Description
vLLM is an inference and serving engine for large language models (LLMs). In versions 0.10.2 up to but not including 0.11.1, a memory corruption vulnerability that could lead to a crash (denial of service) and potentially remote code execution (RCE) exists in the Completions API endpoint.
When processing user-supplied prompt embeddings, the endpoint loads serialized tensors using torch.load() without sufficient validation. Due to a change introduced in PyTorch 2.8.0, sparse tensor integrity checks are disabled by default. As a result, maliciously crafted tensors can bypass internal bounds checks and trigger an out-of-bounds memory write during the call to to_dense(). This memory corruption can crash vLLM and potentially lead to code execution on the server hosting vLLM. This issue has been patched in version 0.11.1.
Deeper analysis (AI)
CVE-2025-62164 is a memory corruption vulnerability affecting vLLM, an inference and serving engine for large language models, in versions 0.10.2 through 0.11.0. The issue resides in the Completions API endpoint, which processes user-supplied prompt embeddings by loading serialized tensors via torch.load() without adequate validation. A change in PyTorch 2.8.0 disables sparse tensor integrity checks by default, allowing maliciously crafted tensors to bypass internal bounds checks and trigger an out-of-bounds memory write during the to_dense() call.
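To make the failure mode concrete, here is a minimal pure-Python sketch of the kind of bounds check that a sparse (COO) tensor needs before it is densified. It is an illustration only and not PyTorch's implementation; the names `check_coo_indices` and `to_dense_checked` are hypothetical. In native code, writing to an offset computed from unchecked indices is what becomes an out-of-bounds write.

```python
from math import prod
from typing import List, Sequence


def check_coo_indices(indices: Sequence[Sequence[int]],
                      shape: Sequence[int]) -> None:
    """Raise ValueError if any COO index falls outside `shape`."""
    if len(indices) != len(shape):
        raise ValueError("one index row is required per dimension")
    nnz = len(indices[0]) if indices else 0
    for dim, (row, size) in enumerate(zip(indices, shape)):
        if len(row) != nnz:
            raise ValueError("index rows must have equal length")
        for i in row:
            if not 0 <= i < size:
                raise ValueError(
                    f"index {i} out of range for dim {dim} of size {size}"
                )


def to_dense_checked(indices: Sequence[Sequence[int]],
                     values: Sequence[float],
                     shape: Sequence[int]) -> List[float]:
    """Densify into a flat row-major buffer, validating indices first."""
    check_coo_indices(indices, shape)
    if indices and len(values) != len(indices[0]):
        raise ValueError("values and indices disagree on element count")
    # Row-major strides: the last dimension has stride 1.
    strides = [prod(shape[d + 1:]) for d in range(len(shape))]
    dense = [0.0] * prod(shape)
    for k, v in enumerate(values):
        offset = sum(indices[d][k] * strides[d] for d in range(len(shape)))
        dense[offset] += v  # safe only because the indices were checked
    return dense
```

For example, `to_dense_checked([[0, 1], [1, 0]], [5.0, 7.0], (2, 2))` fills the two off-diagonal cells, while an index such as 9 in a dimension of size 2 is rejected before any write happens.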
Attackers with low privileges (PR:L) can exploit this vulnerability over the network (AV:N) with low complexity (AC:L) and no user interaction (UI:N), as indicated by its CVSS v3.1 base score of 8.8. By submitting specially crafted prompt embeddings to the Completions API endpoint, an attacker can cause a denial-of-service crash or potentially achieve remote code execution on the hosting server, with high impacts on confidentiality, integrity, and availability (C:H/I:H/A:H). The vulnerability maps to CWEs including CWE-20 (Improper Input Validation), CWE-123 (Write-what-where Condition), CWE-502 (Deserialization of Untrusted Data), and CWE-787 (Out-of-bounds Write).
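The 8.8 score can be reproduced from the metrics above with the CVSS v3.1 base-score equations. The sketch below assumes Scope is Unchanged (S:U), which the text does not state explicitly but which is consistent with the published score; the function names are illustrative, and only the unchanged-scope branch is implemented.

```python
import math

# CVSS v3.1 metric weights (Privileges Required values are for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(x: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 spec."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a Scope: Unchanged vector string."""
    metrics = dict(part.split(":") for part in vector.split("/"))
    if metrics.get("S", "U") != "U":
        raise NotImplementedError("only Scope: Unchanged is handled here")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # 8.8
```

Here the impact sub-score is about 5.87 and the exploitability sub-score about 2.84; their sum, 8.708, rounds up to 8.8.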
The vLLM project has patched this issue in version 0.11.1. Mitigation details are available in the project's security advisory (GHSA-mrw7-hf4f-83pf), the fixing pull request (#27204), and the commit (58fab50d82838d5014f4a14d991fdb9352c9c84b) that adds validation to prevent the exploitation of malformed sparse tensors.
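A quick way to triage a deployment is to compare its vLLM version against the affected range stated above (0.10.2 up to, but not including, 0.11.1). The following is a minimal sketch; `is_affected` and `parse_release` are hypothetical helper names, and only plain dotted release numbers are accepted, so pre-release or local build suffixes are rejected rather than guessed at.

```python
import re

FIRST_AFFECTED = (0, 10, 2)  # first affected release per the advisory text
FIRST_PATCHED = (0, 11, 1)   # release that contains the fix


def parse_release(version: str) -> tuple:
    """Parse a plain dotted version such as '0.11.0' into a tuple of ints."""
    if not re.fullmatch(r"\d+(\.\d+)*", version):
        raise ValueError(f"unsupported version string: {version!r}")
    parts = tuple(int(p) for p in version.split("."))
    # Pad short versions such as '0.11' to three components.
    return parts + (0,) * (3 - len(parts))


def is_affected(version: str) -> bool:
    """Return True if `version` lies in [0.10.2, 0.11.1)."""
    return FIRST_AFFECTED <= parse_release(version) < FIRST_PATCHED
```

In an environment where vLLM is installed, the version string can be obtained with `importlib.metadata.version("vllm")` and passed to `is_affected`.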
This vulnerability is particularly relevant to AI/ML deployments, as vLLM is designed for serving LLMs, potentially exposing production inference servers to risks from untrusted inputs. No public reports of real-world exploitation are noted in the available information.
Details
- CWE(s): CWE-20, CWE-123, CWE-502, CWE-787
Affected Products
AI Security Analysis (AI)
- AI Category: APIs and Models
- Risk Domain: LLM/Generative AI Risks
- OWASP Top 10 for LLMs 2025: None mapped
- Classification Reason: vLLM is an inference and serving engine for LLMs, with the vulnerability specifically in the Completions API endpoint that processes user-supplied prompt embeddings using torch.load() without validation, fitting APIs for model inference and serving.