Description
vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.
Published: 2026-06-22
Score: 5.3 Medium
EPSS: < 1% Very Low
KEV: No
Impact: n/a
Action: n/a
AI Analysis

Analysis and contextual insights are available on OpenCVE Cloud.

Remediation

No vendor fix or workaround currently provided.

Additional remediation guidance may be available on OpenCVE Cloud.

Tracking

Sign in to view the affected projects.

Advisories
Source ID Title
Github GHSA Github GHSA GHSA-5jv2-g5wq-cmr4 vLLM: GGUF dequantize kernel int truncation exposes uninitialized GPU memory in multi-tenant serving
History

Mon, 29 Jun 2026 12:15:00 +0000

Type Values Removed Values Added
Weaknesses CWE-824
References
Metrics threat_severity

None

cvssV3_1

{'score': 4.3, 'vector': 'CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:L/I:N/A:N'}

threat_severity

Low


Tue, 23 Jun 2026 15:30:00 +0000

Type Values Removed Values Added
Metrics ssvc

{'options': {'Automatable': 'no', 'Exploitation': 'none', 'Technical Impact': 'partial'}, 'version': '2.0.3'}


Tue, 23 Jun 2026 01:30:00 +0000

Type Values Removed Values Added
First Time appeared Vllm-project
Vllm-project vllm
Vendors & Products Vllm-project
Vllm-project vllm

Mon, 22 Jun 2026 22:45:00 +0000

Type Values Removed Values Added
Description vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.
Title vLLM GGUF Kernels: int64_t to int truncation of tensor dimensions causes GPU buffer overflow
Weaknesses CWE-200
CWE-681
References
Metrics cvssV4_0

{'score': 5.3, 'vector': 'CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:P/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N'}


Subscriptions

Vllm-project Vllm
cve-icon MITRE

Status: PUBLISHED

Assigner: GitHub_M

Published:

Updated: 2026-06-23T15:05:21.711Z

Reserved: 2026-06-11T15:46:12.316Z

Link: CVE-2026-53923

cve-icon Vulnrichment

Updated: 2026-06-23T15:04:19.969Z

cve-icon NVD

No data.

cve-icon Redhat

Severity : Low

Publid Date: 2026-06-22T21:55:42Z

Links: CVE-2026-53923 - Bugzilla

cve-icon OpenCVE Enrichment

Updated: 2026-06-29T14:30:18Z

Weaknesses