CVE-2025-29770

Medium · DDoS

Published: 19 March 2025

Modified: 31 July 2025

KEV Added: not listed

Patch: none listed

CVSS Score (v3.1): 6.5, vector CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
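As a worked check, the 6.5 base score can be reproduced from the vector string. The sketch below follows the base-score equations and metric weights published in the CVSS v3.1 specification; the function names (base_score, roundup) are illustrative and not part of any library, and temporal and environmental metrics are ignored.

```python
import math

# Metric weights from the CVSS v3.1 specification (base metrics only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place, using the integer method from the spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the CVSS v3.1 base score for a vector string."""
    parts = vector.split("/")
    # Skip the leading "CVSS:3.1" element; the rest are "metric:value" pairs.
    metrics = dict(part.split(":") for part in parts[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]

    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # prints 6.5
```

For this CVE the only impact is on availability (A:H), so the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is about 2.84; their sum, about 6.43, rounds up to 6.5.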

EPSS Score: 0.0066 (71.5th percentile)

Risk Priority: 13 (weights: 60% EPSS · 20% KEV · 20% CVSS)
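The page does not say how the three inputs are normalised before weighting. The sketch below is one hypothetical reading, not the site's documented formula: it assumes the EPSS probability is scaled to 0-100, KEV membership counts as 100 or 0, and the CVSS base score is multiplied by 10. Under those assumptions the arithmetic reproduces the displayed value of 13.

```python
def risk_priority(epss_probability, in_kev, cvss_base):
    """Weighted blend of EPSS, KEV and CVSS on a 0-100 scale.

    The normalisation of each input is an assumption made for illustration.
    """
    epss_component = epss_probability * 100    # assumed: probability as a percentage
    kev_component = 100.0 if in_kev else 0.0   # assumed: binary KEV membership
    cvss_component = cvss_base * 10            # assumed: 0-10 score scaled to 0-100
    return 0.6 * epss_component + 0.2 * kev_component + 0.2 * cvss_component


score = risk_priority(epss_probability=0.0066, in_kev=False, cvss_base=6.5)
print(round(score))  # prints 13
```

With these inputs the CVSS term contributes 13.0 and the EPSS term about 0.4, so almost all of the priority comes from the CVSS score.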

Summary

CVE-2025-29770 is a medium-severity Allocation of Resources Without Limits or Throttling (CWE-770) vulnerability in vLLM. Its CVSS base score is 6.5 (Medium).

Operationally, exploitation aligns with the MITRE ATT&CK technique OS Exhaustion Flood (T1499.001). The CVE ranks in the top 28.5% of CVEs by exploit likelihood (EPSS) and is not currently listed in the CISA KEV catalog.

This vulnerability is AI-related: it is categorised as NLP and Transformers, within the LLM/Generative AI Risks domain.

EU & UK References

🇪🇺 ENISA EUVD: EUVD-2025-6726

Vulnerability details

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. The outlines library is one of the backends used by vLLM to support structured output (a.k.a. guided decoding). Outlines provides an optional cache for its compiled grammars on the local filesystem. This cache has been on by default in vLLM. Outlines is also available by default through the OpenAI compatible API server. The affected code in vLLM is vllm/model_executor/guided_decoding/outlines_logits_processors.py, which unconditionally uses the cache from outlines. A malicious user can send a stream of very short decoding requests with unique schemas, resulting in an addition to the cache for each request. This can result in a Denial of Service if the filesystem runs out of space. Note that even if vLLM was configured to use a different backend by default, it is still possible to choose outlines on a per-request basis using the guided_decoding_backend key of the extra_body field of the request. This issue applies only to the V0 engine and is fixed in 0.8.0.
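The growth pattern described above can be illustrated with a toy model. The sketch below is a generic illustration of CWE-770, not vLLM or outlines code and not the change made in the referenced pull request: compile_grammar and BoundedGrammarCache are hypothetical names, and the cache lives in memory rather than on disk. It contrasts a cache that gains one entry per unique schema with one that evicts the least recently used entry once a limit is reached.

```python
import hashlib
import json
from collections import OrderedDict


def schema_key(schema):
    """Stable cache key: SHA-256 of the canonical JSON form of the schema."""
    canonical = json.dumps(schema, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compile_grammar(schema):
    """Hypothetical stand-in for an expensive grammar compilation step."""
    return "compiled:" + schema_key(schema)[:8]


class BoundedGrammarCache:
    """LRU cache that evicts the least recently used entry beyond max_entries."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get_or_compile(self, schema):
        key = schema_key(schema)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        compiled = compile_grammar(schema)
        self._entries[key] = compiled
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # drop the oldest entry
        return compiled

    def __len__(self):
        return len(self._entries)


# Simulate 1000 requests, each carrying a schema never seen before.
unbounded = {}
bounded = BoundedGrammarCache(max_entries=100)
for i in range(1000):
    schema = {"type": "object", "properties": {f"field_{i}": {"type": "string"}}}
    unbounded[schema_key(schema)] = compile_grammar(schema)
    bounded.get_or_compile(schema)

print(len(unbounded), len(bounded))  # prints: 1000 100
```

The unbounded dictionary grows linearly with the number of distinct schemas, which is the behaviour an attacker exploits. The bounded variant caps storage at the cost of recompiling evicted grammars.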

CWE(s): CWE-770

AI Security Analysis

AI Category: NLP and Transformers
Risk Domain: LLM/Generative AI Risks
OWASP Top 10 for LLMs 2025: None mapped
Classification Reason: Matched keywords: llms, openai, vllm

Related Threats

MITRE ATT&CK Enterprise Techniques

T1499.001 OS Exhaustion Flood (Tactic: Impact)

Adversaries may launch a denial of service (DoS) attack targeting an endpoint's operating system (OS).

T1499.003 Application Exhaustion Flood (Tactic: Impact)

Adversaries may target resource intensive features of applications to cause a denial of service (DoS), denying availability to those applications.

T1499.004 Application or System Exploitation (Tactic: Impact)

Adversaries may exploit software vulnerabilities that can cause an application or system to crash and deny availability to users.

Why these techniques?

The vulnerability allows a remote DoS through repeated requests with unique schemas that fill the filesystem cache. This maps to OS Exhaustion Flood (T1499.001), Application Exhaustion Flood (T1499.003), and Application or System Exploitation (T1499.004).

Affected Assets

vllm

< 0.8.0
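Because the vulnerability details state that the issue is fixed in 0.8.0, a deployment can be triaged by comparing its installed vLLM version against that release. The sketch below is a minimal helper under stated assumptions: is_affected and parse_release are hypothetical names, only the numeric release segment is compared (pre-release and local suffixes are ignored), and the check does not account for the fact that only the V0 engine is affected.

```python
import re

# First fixed release, per the vulnerability details.
FIXED_VERSION = (0, 8, 0)


def parse_release(version):
    """Extract (major, minor, patch) from a version string such as '0.7.3'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(group) if group else 0 for group in match.groups())


def is_affected(version):
    """Return True if the release predates the fixed version."""
    return parse_release(version) < FIXED_VERSION


for candidate in ("0.6.2", "0.7.3", "0.8.0", "0.10.1"):
    print(candidate, is_affected(candidate))
```

Comparing integer tuples avoids the string-comparison trap where "0.10.1" would sort before "0.8.0".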

Mitigating Controls

Likely Mitigating Controls

Per-CVE control mapping for this CVE has not run yet; the list below is derived from the weakness types (CWEs) cited in the NVD entry.

AC-10 Concurrent Session Control

addresses: CWE-770

This control implements explicit throttling on session allocation, addressing the weakness of allocating resources without limits.

CP-4 Contingency Plan Testing

addresses: CWE-770

Plan testing exercises resource allocation limits and throttling during simulated failures, directly addressing weaknesses that allow unbounded resource use.

CP-5 Contingency Plan Update

addresses: CWE-770

Contingency plan updates ensure recovery strategies address unbounded resource allocation, making it harder for attackers to exploit lack of throttling to cause prolonged outages.

CP-7 Alternate Processing Site

addresses: CWE-770

Provides continuity when unbounded resource allocation at the primary site leads to exhaustion and downtime.

CP-8 Telecommunications Services

addresses: CWE-770

Alternate services allow operations to continue when primary allocation of resources lacks limits or throttling.

PL-6 Security-related Activity Planning

addresses: CWE-770

Explicit planning of security-related actions requires defining limits, windows, and resource allocations, making allocation without throttling far less likely.

PM-6 Measures of Performance

addresses: CWE-770

Measures of performance include tracking allocation behavior and throttling effectiveness, reducing the window for resource exhaustion attacks.

SC-10 Network Disconnect

addresses: CWE-770

Imposes an inactivity-based limit on network resource allocation, throttling the number of concurrently held connections.
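The controls above are derived from CWE-770 in general. As a more specific illustration of throttling for this kind of issue, the sketch below limits how many previously unseen schemas a single client may submit within a time window. It is an assumption-laden example rather than a vLLM feature: DistinctSchemaLimiter, client_id and schema_key are hypothetical, and the limiter would have to sit in a gateway or middleware in front of the API server.

```python
import time
from collections import defaultdict


class DistinctSchemaLimiter:
    """Allow at most max_new_schemas distinct schemas per client per window."""

    def __init__(self, max_new_schemas, window_seconds, clock=time.monotonic):
        self.max_new_schemas = max_new_schemas
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._seen = defaultdict(dict)  # client_id -> {schema_key: first_seen}

    def allow(self, client_id, schema_key):
        now = self.clock()
        seen = self._seen[client_id]
        # Forget schemas first seen longer ago than the window.
        expired = [k for k, t in seen.items() if now - t >= self.window_seconds]
        for key in expired:
            del seen[key]
        if schema_key in seen:
            return True  # repeated schema: no new cache entry would be created
        if len(seen) >= self.max_new_schemas:
            return False  # too many new schemas in this window
        seen[schema_key] = now
        return True


limiter = DistinctSchemaLimiter(max_new_schemas=2, window_seconds=60.0)
print(limiter.allow("client-a", "schema-1"))  # True
print(limiter.allow("client-a", "schema-2"))  # True
print(limiter.allow("client-a", "schema-3"))  # False
```

Keying the limit on distinct schemas rather than on raw request count leaves legitimate clients that reuse a handful of schemas unaffected.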

References

https://github.com/vllm-project/vllm/blob/53be4a863486d02bd96a59c674bbec23eec508f6/vllm/model_executor/guided_decoding/outlines_logits_processors.py
Product · security-advisories@github.com
https://github.com/vllm-project/vllm/pull/14837
Issue Tracking, Patch · security-advisories@github.com
https://github.com/vllm-project/vllm/security/advisories/GHSA-mgrm-fgjv-mhv8
Patch, Vendor Advisory · security-advisories@github.com