CVE-2026-53923 - Vulnerability Details

- vLLM GGUF Kernels: int64_t to int truncation of tensor dimensions causes GPU buffer overflow

Description

vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.
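The effect of the narrowing conversion described above can be illustrated with a small sketch. It is not the vLLM kernel code: the function names are hypothetical, and it assumes a single element count is narrowed from a 64-bit to a 32-bit signed integer before reaching the kernel. The exact variables narrowed in gguf_kernel.cu are not shown in this record.

```python
import ctypes


def truncate_to_int32(n: int) -> int:
    """Mimic a C-style narrowing conversion from int64_t to a 32-bit int."""
    # ctypes integer types do no overflow checking; the value wraps.
    return ctypes.c_int32(n).value


def unfilled_elements(total: int) -> int:
    """Elements left unwritten when only the truncated count is processed.

    `total` stands for the element count the output buffer is allocated
    with (hypothetical name, for illustration only).
    """
    processed = truncate_to_int32(total)
    # A negative or oversized count cannot fill more than the buffer holds.
    processed = min(max(processed, 0), total)
    return total - processed


if __name__ == "__main__":
    total = 2**32 + 1024
    print("allocated:", total)
    print("processed:", truncate_to_int32(total))
    print("left uninitialized:", unfilled_elements(total))
```

Under these assumptions, any element count that fits in 32 bits is processed in full, while larger counts leave a tail of the allocation untouched, which matches the partial processing the description reports.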

Published: 2026-06-22

Score: 5.3 Medium

EPSS: < 1% Very Low

KEV: No

Default status is the baseline for the product; each version can override it (e.g. patched versions are marked unaffected).

Vendor	Product	Default status
vllm-project	vllm	affected

Version	Status	Constraints
`>= 0.5.5, < 0.23.1rc0`	affected	—

Vendor	Product	Confidence
Vllm-project	Vllm	100%

Version	Status	Scheme	Platform
`[0.5.5,0.23.1rc0)`	affected	generic	—
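To check an installed version against the affected range `[0.5.5, 0.23.1rc0)`, a comparison along the following lines could be used. This is a minimal sketch with hypothetical helper names; it assumes version strings of the form `X.Y.Z` or `X.Y.ZrcN` only and rejects anything else (for example `.post` or `.dev` suffixes) rather than guessing how to order it.

```python
import re


def parse_version(v: str):
    """Parse 'X.Y.Z' or 'X.Y.ZrcN' into a sortable tuple."""
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?", v)
    if m is None:
        raise ValueError(f"unsupported version string: {v!r}")
    major, minor, patch, rc = m.groups()
    # (0, N) for rcN and (1, 0) for a final release, so rcN < final.
    pre = (0, int(rc)) if rc is not None else (1, 0)
    return (int(major), int(minor), int(patch), pre)


def is_affected(v: str) -> bool:
    """True if v lies in the half-open range [0.5.5, 0.23.1rc0)."""
    low = parse_version("0.5.5")
    high = parse_version("0.23.1rc0")
    return low <= parse_version(v) < high


if __name__ == "__main__":
    for candidate in ("0.5.4", "0.5.5", "0.23.0", "0.23.1rc0", "0.23.1"):
        print(candidate, is_affected(candidate))
```

The upper bound is exclusive, so the release candidate 0.23.1rc0 itself is treated as unaffected, in line with the range notation above.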

Remediation

Upgrade to vLLM 0.23.1rc0 or later, which fixes this vulnerability. No workaround is listed for affected versions.

Advisories

Source	ID	Title
Github GHSA	GHSA-5jv2-g5wq-cmr4	vLLM: GGUF dequantize kernel int truncation exposes uninitialized GPU memory in multi-tenant serving

CVSS v4.0

Metric	Value
Attack Vector	Network
Attack Complexity	Low
Privileges Required	None
Attack Requirements	None
User Interaction	Passive
Vulnerable System Confidentiality Impact	Low
Vulnerable System Integrity Impact	Low
Vulnerable System Availability Impact	None
Subsequent System Confidentiality Impact	None
Subsequent System Integrity Impact	None
Subsequent System Availability Impact	None

CVSS v3.1

Metric	Value
Attack Vector	Network
Attack Complexity	Low
Privileges Required	Low
Scope	Unchanged
Confidentiality Impact	Low
Integrity Impact	None
Availability Impact	None
User Interaction	None

No CVSS v3.0

No CVSS v2

This CVE is not in the KEV list.

The EPSS score is 0.00281.

SSVC

Option	Value
Exploitation	none
Automatable	no
Technical Impact	partial

References

Link	Providers
https://github.com/vllm-project/vllm/commit/f219788f91952827132fa4fdf916427cd20d225e
https://github.com/vllm-project/vllm/pull/44971
https://github.com/vllm-project/vllm/security/advisories/GHSA-5jv2-g5wq-cmr4
https://nvd.nist.gov/vuln/detail/CVE-2026-53923
https://www.cve.org/CVERecord?id=CVE-2026-53923

History

Mon, 29 Jun 2026 12:15:00 +0000

Type	Values Removed	Values Added
Weaknesses		CWE-824
References		https://nvd.nist.gov/vuln/detail/CVE-2026-53923 https://www.cve.org/CVERecord?id=CVE-2026-53923
Metrics	threat_severity `None`	cvssV3_1 `{'score': 4.3, 'vector': 'CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:L/I:N/A:N'}` threat_severity `Low`

Tue, 23 Jun 2026 15:30:00 +0000

Type	Values Removed	Values Added
Metrics		ssvc `{'options': {'Automatable': 'no', 'Exploitation': 'none', 'Technical Impact': 'partial'}, 'version': '2.0.3'}`

Tue, 23 Jun 2026 01:30:00 +0000

Type	Values Removed	Values Added
First Time appeared		Vllm-project Vllm-project vllm
Vendors & Products		Vllm-project Vllm-project vllm

Mon, 22 Jun 2026 22:45:00 +0000

Type	Values Removed	Values Added
Description		vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.
Title		vLLM GGUF Kernels: int64_t to int truncation of tensor dimensions causes GPU buffer overflow
Weaknesses		CWE-200 CWE-681
References		https://github.com/vllm-project/vllm/commit/f219788f91952827132fa4fdf916427cd20d225e https://github.com/vllm-project/vllm/pull/44971 https://github.com/vllm-project/vllm/security/advisories/GHSA-5jv2-g5wq-cmr4
Metrics		cvssV4_0 `{'score': 5.3, 'vector': 'CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N'}`
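The CVSS vector strings recorded in the history entries above can be split into individual metrics for comparison against the metric listings. The following is a minimal sketch with a hypothetical function name; it only splits the string and does not validate metric names or compute a score.

```python
def parse_cvss_vector(vector: str):
    """Split a CVSS vector string into (version, {metric: value})."""
    prefix, *parts = vector.split("/")
    if not prefix.startswith("CVSS:"):
        raise ValueError(f"missing CVSS prefix in {vector!r}")
    version = prefix.split(":", 1)[1]
    # Each remaining part has the form 'ABBREVIATION:VALUE'.
    metrics = dict(part.split(":", 1) for part in parts)
    return version, metrics


if __name__ == "__main__":
    v4 = "CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N"
    v3 = "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:L/I:N/A:N"
    for vector in (v4, v3):
        version, metrics = parse_cvss_vector(vector)
        print(version, metrics)
```

Parsed this way, the two vectors differ in Privileges Required (None in v4.0, Low in v3.1) and User Interaction (Passive in v4.0, None in v3.1), which is consistent with the different scores of 5.3 and 4.3.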

MITRE

Status: PUBLISHED

Assigner: GitHub_M

Published: 2026-06-22T21:55:42.001Z

Updated: 2026-06-23T15:05:21.711Z

Reserved: 2026-06-11T15:46:12.316Z

Link: CVE-2026-53923

Vulnrichment

Updated: 2026-06-23T15:04:19.969Z

NVD

No data.

Redhat

Severity: Low

Public Date: 2026-06-22T21:55:42Z

Links: CVE-2026-53923 - Bugzilla

OpenCVE Enrichment

Updated: 2026-06-29T14:30:18Z
