The Most Powerful Universal GPU

Experience breakthrough multi-workload performance with the NVIDIA L40S GPU. Combining powerful AI compute with best-in-class graphics and media acceleration, the L40S GPU is built to power the next generation of data center workloads—from generative AI and large language model (LLM) inference and training to 3D graphics, rendering, and video.

NVIDIA, Global Data Center System Manufacturers to Supercharge Generative AI and Industrial Digitalization

NVIDIA OVX™ servers featuring new NVIDIA GPUs that accelerate training and inference, as well as graphics-intensive workloads, are coming soon from Dell, Hewlett Packard Enterprise, Lenovo, Supermicro, and others.

Highlights

Universal Performance

Tensor Performance: 1,466 TFLOPS¹

RT Core Performance: 212 TFLOPS

Single-Precision Performance: 92 TFLOPS

Features

Powered by the NVIDIA Ada Lovelace Architecture

Fourth-Generation Tensor Cores

Hardware support for structural sparsity and the optimized TF32 format provides out-of-the-box performance gains for faster AI and data science model training. Accelerate AI-enhanced graphics capabilities with DLSS to upscale resolution with better performance in select applications.

Third-Generation RT Cores

Enhanced throughput and concurrent ray-tracing and shading capabilities improve ray-tracing performance, accelerating renders for product design and architecture, engineering, and construction workflows. See lifelike designs in action with hardware-accelerated motion blur and stunning real-time animations.

CUDA Cores

Accelerated single-precision floating point (FP32) throughput and improved power efficiency significantly boost performance for workflows like 3D model development and computer-aided engineering (CAE) simulation. Use enhanced 16-bit math capabilities (BF16) for mixed-precision workloads.
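As a rough illustration of what BF16 trades away in mixed-precision work, the sketch below converts a value from FP32 to BF16 and back in plain Python. It relies on BF16 keeping the sign bit, the 8 exponent bits, and the top 7 mantissa bits of an FP32 value. The function names are hypothetical, NaN handling is omitted, and nothing here is meant to describe how the GPU performs the conversion in hardware.

```python
import struct

def float_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern for x (round to nearest even, NaN not handled)."""
    # Reinterpret the value as its 32-bit FP32 bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add a bias so that dropping the low 16 bits rounds to nearest, ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_float(b: int) -> float:
    """Expand a 16-bit BF16 pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

def bf16_roundtrip(x: float) -> float:
    """Value of x after being stored in BF16."""
    return bf16_bits_to_float(float_to_bf16_bits(x))

# Illustrative values only: the exponent range survives, the precision shrinks.
for value in (1.0, 3.14159, 1.0e30):
    print(value, "->", bf16_roundtrip(value))
```

Because the exponent field is the same width as in FP32, very large and very small magnitudes survive the conversion; only about two to three decimal digits of precision remain.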

Transformer Engine

Transformer Engine dramatically accelerates AI performance and improves memory utilization for both training and inference. Harnessing the power of the Ada Lovelace fourth-generation Tensor Cores, Transformer Engine intelligently scans the layers of transformer architecture neural networks and automatically recasts between FP8 and FP16 precisions to deliver faster AI performance and accelerate training and inference.
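To give a feel for why recasting to FP8 needs careful range management, here is a minimal simulation in plain Python. It rounds numbers to the FP8 E4M3 grid (3 mantissa bits, largest finite value 448) with and without a per-tensor scaling factor. Per-tensor scaling is a common FP8 technique and is used here as an assumption; the function names and sample values are hypothetical, and this is not a description of Transformer Engine's internals.

```python
import math

E4M3_MAX = 448.0            # largest finite value in the FP8 E4M3 format
E4M3_MIN_QUANTUM_EXP = -9   # subnormal values are spaced 2**-9 apart

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    _, e = math.frexp(abs(x))  # abs(x) = m * 2**e with 0.5 <= m < 1
    # 1 implicit + 3 stored mantissa bits give a spacing of 2**(e - 4).
    quantum = 2.0 ** max(e - 4, E4M3_MIN_QUANTUM_EXP)
    q = round(abs(x) / quantum) * quantum
    return math.copysign(min(q, E4M3_MAX), x)

def fp8_roundtrip(values, use_scaling: bool):
    """Quantize values to E4M3 and back, optionally scaling to fill the FP8 range."""
    amax = max(abs(v) for v in values)
    scale = E4M3_MAX / amax if (use_scaling and amax > 0) else 1.0
    return [quantize_e4m3(v * scale) / scale for v in values]

activations = [0.001, 0.002, 0.5]  # made-up sample values
print("unscaled:", fp8_roundtrip(activations, use_scaling=False))
print("scaled:  ", fp8_roundtrip(activations, use_scaling=True))
```

In this toy case the smallest value lands in the coarse subnormal region when stored directly, while scaling the tensor so its largest magnitude maps to 448 keeps the relative error within what 3 mantissa bits allow.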

Efficiency and Security

The L40S GPU is optimized for 24/7 enterprise data center operations and is designed, built, tested, and supported by NVIDIA to ensure maximum performance, durability, and uptime. The L40S GPU meets the latest data center standards, is Network Equipment-Building System (NEBS) Level 3 ready, and features secure boot with root of trust technology, providing an additional layer of security for data centers.

DLSS 3

The L40S GPU enables ultra-fast rendering and smoother frame rates with NVIDIA DLSS 3. This breakthrough frame-generation technology leverages deep learning and the latest hardware innovations within the Ada Lovelace architecture and the L40S GPU, including fourth-generation Tensor Cores and an Optical Flow Accelerator, to boost rendering performance, deliver higher frames per second (FPS), and significantly improve latency.

Workloads

Multi-Workload Acceleration

Performance

Breakthrough Performance

Key Applications

Generative AI and Large Language Models (LLMs)

The NVIDIA L40S GPU is a powerful solution for NVIDIA Omniverse and 3D content creation, offering exceptional performance and versatility in data center environments. Built on the Ada Lovelace architecture, it features third-generation RT Cores for enhanced real-time ray tracing and fourth-generation Tensor Cores that support AI-driven features, significantly improving the quality and speed of 3D workflows. As the engine of NVIDIA Omniverse in the data center, the L40S delivers stunning real-time ray tracing and AI-accelerated capabilities, making it ideal for extended reality (XR) and virtual production tasks. With 48 GB of GDDR6 memory, it can handle complex 3D models, high-resolution textures, and large-scale simulations with ease, enabling creative professionals to work on intricate designs and render photorealistic scenes more efficiently. The L40S's support for Universal Scene Description (OpenUSD)-based 3D workflows within the Omniverse ecosystem enhances collaboration and streamlines production pipelines. Its performance in Omniverse applications is described as “stunning,” positioning it as a top-tier solution for organizations looking to leverage cutting-edge technologies in virtual world creation, 3D visualization, and immersive content production.

NVIDIA Omniverse and 3D Content Creation

The NVIDIA L40S graphics processor is a powerful GPU designed for NVIDIA Omniverse and 3D content creation, offering exceptional performance and versatility in data center environments. Built on the Ada Lovelace architecture, it features third-generation RT Cores for enhanced real-time ray tracing and fourth-generation Tensor Cores that support AI-based features, significantly improving the quality and speed of 3D workflows.
As a key engine for NVIDIA Omniverse in the data center, the L40S delivers stunning real-time ray tracing and AI acceleration capabilities, making it ideal for tasks in extended reality (XR) and virtual production. With 48 GB of GDDR6 memory, it easily handles complex 3D models, high-resolution textures, and large-scale simulations, allowing creative professionals to work on intricate projects and efficiently render photorealistic scenes.
Support for 3D workflows based on Universal Scene Description (OpenUSD) within the Omniverse ecosystem facilitates collaboration and streamlines production processes. Its performance in Omniverse applications is described as “stunning,” positioning it as a top-tier solution for organizations looking to leverage cutting-edge technologies for creating virtual environments, 3D visualizations, and immersive content production.

AI Training and Inference

The NVIDIA L40S graphics processor is a powerful solution for AI training and inference workloads, offering exceptional performance and versatility in data center environments. Built on the Ada Lovelace architecture, it features 18,176 CUDA cores and 568 fourth-generation Tensor Cores, providing up to 5 times better single-precision floating-point (FP32) performance compared to the A100. Its Transformer Engine intelligently manages precision between FP8 and FP16, significantly enhancing AI performance for both training and inference of transformer-based models.
With 48 GB of GDDR6 memory, the L40S can effectively handle complex AI tasks and large language models. For AI training, eight L40S GPUs in a primary server allow for a 0.8x performance increase compared to an 8-GPU A100 system for MLPerf models. In inference tasks, the L40S demonstrates impressive capabilities, often matching or exceeding the performance of the A100 across various MLPerf benchmarks.
This makes the L40S particularly well-suited for deploying and running sophisticated AI models in production environments, providing organizations with an efficient and powerful solution for their AI workloads.
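A quick back-of-the-envelope check can show which model sizes plausibly fit in 48 GB. The sketch below counts only weight storage at a given precision and reserves a fraction of memory for activations, KV cache, and runtime overhead. The 20% headroom, the example parameter counts, and the function names are illustrative assumptions, not NVIDIA guidance.

```python
GPU_MEMORY_GB = 48  # L40S memory capacity from the specifications

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}

def weight_footprint_gb(n_params: float, precision: str) -> float:
    """Memory for model weights alone, in decimal gigabytes."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9

def fits_on_one_gpu(n_params: float, precision: str,
                    headroom_fraction: float = 0.2) -> bool:
    """True if the weights fit after reserving headroom (an assumed 20% by default)."""
    budget_gb = GPU_MEMORY_GB * (1.0 - headroom_fraction)
    return weight_footprint_gb(n_params, precision) <= budget_gb

# Hypothetical model sizes, chosen only to illustrate the arithmetic.
for n_params in (7e9, 13e9, 70e9):
    for precision in ("fp16", "fp8"):
        size = weight_footprint_gb(n_params, precision)
        verdict = "fits" if fits_on_one_gpu(n_params, precision) else "does not fit"
        print(f"{n_params / 1e9:.0f}B params at {precision}: {size:.0f} GB, {verdict}")
```

Real deployments also depend on batch size, sequence length, and framework overhead, so this estimate is only a first filter.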

Graphics and Visualization

The NVIDIA L40S graphics processor offers exceptional capabilities for graphical and visualization workloads, making it a powerful solution for professional applications in fields such as Computer-Aided Design (CAD), virtual production, and scientific visualization. Built on the Ada Lovelace architecture, it features third-generation RT Cores that significantly enhance real-time ray tracing performance, delivering stunning visual fidelity and photorealistic rendering.
With 48 GB of GDDR6 memory, the L40S easily handles complex 3D models, high-resolution textures, and large datasets, allowing professionals to work on intricate projects and visualizations without performance bottlenecks. The fourth-generation Tensor Cores support AI-enhanced graphical features, such as DLSS (Deep Learning Super Sampling), which can improve performance and image quality in supported applications.
When paired with NVIDIA RTX Virtual Workstation (vWS) software, the L40S can power high-performance virtual workstations from the data center, providing flexible access to demanding graphical applications from any device. This makes the NVIDIA L40S an excellent choice for organizations looking to enhance their visualization capabilities, improve workflow efficiency, and deliver high-quality visual content across various industries.

Video Encoding and Streaming

The NVIDIA L40S graphics processor offers exceptional capabilities in video encoding and streaming, making it a powerful solution for live streaming, video production, and transcoding. Built on the Ada Lovelace architecture, the L40S features three video encoding and decoding engines, significantly enhancing its ability to simultaneously handle multiple high-quality video streams. A key advancement is the inclusion of AV1 encoding and decoding support, which provides groundbreaking performance and lower total cost of ownership for content creators and streaming platforms. This feature allows for higher video quality at lower bitrates, benefiting both content providers and end-users.
The L40S can manage over 1,000 simultaneous AV1 video streams at 720p30 resolution for mobile applications, making it an ideal solution for streaming services and content delivery networks. The powerful hardware acceleration combined with 48 GB of GDDR6 memory enables efficient processing of complex video workloads, including real-time transcoding and high-resolution content creation. For organizations involved in live streaming, video-on-demand services, or virtual production, the NVIDIA L40S provides the performance and versatility necessary to meet the demands of modern video content creation and distribution.

Specifications

NVIDIA L40S GPU

GPU Architecture: NVIDIA Ada Lovelace architecture
GPU Memory: 48 GB GDDR6 with ECC
Memory Bandwidth: 864 GB/s
Interconnect Interface: PCIe Gen4 x16, 64 GB/s bidirectional
NVIDIA Ada Lovelace Architecture-Based CUDA® Cores: 18,176
NVIDIA Third-Generation RT Cores: 142
NVIDIA Fourth-Generation Tensor Cores: 568
RT Core Performance TFLOPS: 212
FP32 TFLOPS: 91.6
TF32 Tensor Core TFLOPS: 183 | 366*
BFLOAT16 Tensor Core TFLOPS: 362.05 | 733*
FP16 Tensor Core TFLOPS: 362.05 | 733*
FP8 Tensor Core TFLOPS: 733 | 1,466*
Peak INT8 Tensor TOPS: 733 | 1,466*
Peak INT4 Tensor TOPS: 733 | 1,466*
Form Factor: 4.4″ (H) x 10.5″ (L), dual slot
Display Ports: 4x DisplayPort 1.4a
Max Power Consumption: 350 W
Power Connector: 16-pin
Thermal: Passive
Virtual GPU (vGPU) Software Support: Yes
vGPU Profiles Supported: See virtual GPU licensing guide
NVENC | NVDEC: 3x | 3x (includes AV1 encode and decode)
Secure Boot With Root of Trust: Yes
NEBS Ready: Level 3
Multi-Instance GPU (MIG) Support: No
NVIDIA® NVLink® Support: No

*With Sparsity
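Several of the figures above can be cross-checked with simple arithmetic. The sketch below derives the clock speed implied by the FP32 rating, the raw PCIe Gen4 x16 bandwidth, and the ratio between the sparse and dense Tensor Core numbers. It assumes each CUDA core completes one fused multiply-add (2 floating-point operations) per clock and that the PCIe figure is the raw rate before encoding overhead; the function names are hypothetical.

```python
CUDA_CORES = 18_176
FP32_TFLOPS = 91.6
FLOPS_PER_CORE_PER_CLOCK = 2  # assumption: one fused multiply-add per clock

def implied_clock_ghz(tflops: float, cores: int,
                      flops_per_clock: int = FLOPS_PER_CORE_PER_CLOCK) -> float:
    """Clock speed (GHz) needed to reach the rated TFLOPS under the FMA assumption."""
    return tflops * 1e12 / (cores * flops_per_clock) / 1e9

def pcie_gen4_gb_per_s(lanes: int, bidirectional: bool = True) -> float:
    """Raw PCIe Gen4 bandwidth in GB/s: 16 GT/s per lane, 8 bits per byte."""
    per_direction = 16.0 * lanes / 8
    return per_direction * (2 if bidirectional else 1)

# (dense, with sparsity) pairs copied from the specifications.
TENSOR_RATES = {
    "TF32": (183, 366),
    "BF16/FP16": (362.05, 733),
    "FP8": (733, 1466),
}

def sparsity_speedups(rates: dict) -> dict:
    """Ratio of the sparse figure to the dense figure for each format."""
    return {name: sparse / dense for name, (dense, sparse) in rates.items()}

print(f"Implied FP32 clock: {implied_clock_ghz(FP32_TFLOPS, CUDA_CORES):.2f} GHz")
print(f"PCIe Gen4 x16 raw bandwidth: {pcie_gen4_gb_per_s(16):.0f} GB/s bidirectional")
for name, ratio in sparsity_speedups(TENSOR_RATES).items():
    print(f"{name}: sparsity figure is {ratio:.2f}x the dense figure")
```

Under these assumptions the sparse figures come out at roughly twice the dense ones, and the PCIe result matches the 64 GB/s listed for the interconnect.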

Dimensions of the NVIDIA L40S GPU
