TOCTOU AI Vulnerability Affecting Containers Using NVIDIA GPUs

A critical NVIDIA vulnerability affects containers using NVIDIA GPUs, including over 35% of cloud environments.

Vulnerability Overview:

CVE-2024-0132 is a critical time-of-check to time-of-use (TOCTOU) flaw in the NVIDIA Container Toolkit that affects both cloud and on-premises AI workloads using GPUs.

It allows an attacker to escape a container and gain full access to the host system, which puts data and infrastructure at severe risk.
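
To make the TOCTOU class named in the title concrete, here is a minimal, generic sketch in Python. It is not the Container Toolkit's code and does not reproduce the actual bug; the function names read_racy and read_safer are hypothetical and exist only for illustration. The first function checks a path and then uses it by name, which leaves a window between the two steps. The second opens the file once and inspects the handle it actually received.

```python
import os
import stat


def read_racy(path: str) -> bytes:
    """Check-then-use by name: the pattern that makes TOCTOU possible."""
    # Time of check: inspect the path by name.
    if os.path.islink(path):
        raise PermissionError("refusing to follow a symlink")
    # Time of use: the name is resolved a second time. Anything able to
    # replace `path` between the two calls controls what gets opened.
    with open(path, "rb") as handle:
        return handle.read()


def read_safer(path: str) -> bytes:
    """Open once, then validate the object that was actually opened."""
    # O_NOFOLLOW (POSIX only) makes the kernel reject a symlink in the
    # final path component as part of the same open call.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    # fdopen takes ownership of fd and closes it when the block exits.
    with os.fdopen(fd, "rb") as handle:
        # Check the open handle, not the name, so there is no second lookup.
        if not stat.S_ISREG(os.fstat(handle.fileno()).st_mode):
            raise PermissionError("not a regular file")
        return handle.read()
```

Note that O_NOFOLLOW only guards the last path component; symlinks in parent directories need separate handling, which this sketch leaves out.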

Widespread Impact:

The vulnerability affects AI applications using NVIDIA GPUs and the Container Toolkit, which is widely adopted in AI and cloud environments.

Over 35% of cloud environments may be exposed because they run the affected NVIDIA Container Toolkit.

Attack Scenarios:

An attacker could exploit this flaw by getting a malicious container image to run on a vulnerable host. The attacker can then escape the container and reach sensitive data, secrets, or control systems.

The risk is highest in multi-tenant environments (e.g., Kubernetes clusters) and at AI service providers where tenants share GPU resources.

Affected Components:

The flaw affects NVIDIA Container Toolkit versions up to and including 1.16.1 and NVIDIA GPU Operator versions up to and including 24.6.1.

The vulnerability does not affect environments using the Container Device Interface (CDI).
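
The version boundaries above can be turned into a simple inventory check. The following is a minimal sketch in Python, assuming the installed version string is already known (for example, from a package inventory). The helper names parse_version and is_affected are hypothetical, and the parsing deliberately ignores pre-release suffixes, so treat the result as a first-pass filter and not a definitive verdict.

```python
from typing import Tuple

# First fixed releases named in this article.
TOOLKIT_FIXED = (1, 16, 2)        # NVIDIA Container Toolkit
GPU_OPERATOR_FIXED = (24, 6, 2)   # NVIDIA GPU Operator


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.16.1' or 'v24.6.1' into a tuple of ints for comparison.

    Simplification: anything after '-' or '+' (e.g. '-rc.1') is dropped.
    """
    core = text.strip().lstrip("vV").split("-", 1)[0].split("+", 1)[0]
    return tuple(int(part) for part in core.split("."))


def is_affected(version: str, fixed: Tuple[int, ...], uses_cdi: bool = False) -> bool:
    """Return True if `version` is older than the first fixed release.

    Hosts that use the Container Device Interface (CDI) are reported as
    not affected, following the statement in the text above.
    """
    if uses_cdi:
        return False
    return parse_version(version) < fixed


if __name__ == "__main__":
    print(is_affected("1.16.1", TOOLKIT_FIXED))        # affected toolkit release
    print(is_affected("24.6.2", GPU_OPERATOR_FIXED))   # fixed operator release
```

Comparing integer tuples avoids the usual string-comparison trap, where "1.9.0" would sort after "1.16.2".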

Mitigation:

Upgrade urgently to version 1.16.2 of the Container Toolkit and version 24.6.2 of the GPU Operator.

Organizations should prioritise patching hosts running untrusted or third-party container images to reduce the risk of attacks.

Potential Exploit Flow:

An attacker creates a malicious container image. When the image runs on a vulnerable host, it gains access to the host file system, including Unix sockets that can be used to take over the host system.
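
As a defensive illustration of why reachable Unix sockets matter, the sketch below lists which well-known container runtime sockets are visible from wherever it runs. It is written in Python. The two default paths are the usual Docker and containerd locations, chosen here as an assumption and not taken from the original research. The function name reachable_sockets is hypothetical.

```python
import os
import stat
from typing import Iterable, List

# Common default runtime socket locations; an illustrative,
# non-exhaustive list chosen for this sketch.
CANDIDATE_SOCKETS = (
    "/var/run/docker.sock",
    "/run/containerd/containerd.sock",
)


def reachable_sockets(paths: Iterable[str] = CANDIDATE_SOCKETS) -> List[str]:
    """Return the subset of `paths` that exist and are Unix sockets."""
    found = []
    for path in paths:
        try:
            mode = os.stat(path).st_mode
        except OSError:
            # Missing path or no permission: treat as not reachable.
            continue
        if stat.S_ISSOCK(mode):
            found.append(path)
    return found


if __name__ == "__main__":
    for sock in reachable_sockets():
        print(f"runtime socket visible: {sock}")
```

Run inside a container that is supposed to be isolated, an empty result is the expected outcome. Any hit means the workload can talk to the runtime that controls the host's containers.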

Research Motivation:

Wiz Research investigated shared GPU environments and found a wide attack surface in NVIDIA's toolkit, which led to the discovery of this vulnerability.

Key Takeaway:

This vulnerability underscores the importance of securing AI infrastructure, as traditional infrastructure weaknesses remain a more immediate threat than futuristic AI-based attacks.

Security teams must closely collaborate with AI engineers to ensure strong isolation barriers and control over AI models.

Disclosure Timeline:

Wiz Research reported the vulnerability to NVIDIA on September 1, 2024, and NVIDIA released a patched version on September 26, 2024.

The CVE-2024-0132 vulnerability in NVIDIA’s Container Toolkit presents a critical threat to AI workloads, particularly in multi-tenant environments. Prompt patching and improved security practices are essential to mitigate the risk.

References:

Wiz Research: Wiz Research Finds Critical NVIDIA AI Vulnerability

NVIDIA Security Bulletin: NVIDIA Container Toolkit - September 2024

NVD: CVE-2024-0132