NVIDIA L40S
Unparalleled AI and graphics performance for the data center.
Experience breakthrough multi-workload performance with the NVIDIA L40S GPU. Combining powerful AI compute with best-in-class graphics and media acceleration, the L40S GPU is built to power the next generation of data center workloads—from generative AI and large language model (LLM) inference and training to 3D graphics, rendering, and video.
NVIDIA OVX™ Servers featuring new NVIDIA GPUs to accelerate training and inference, as well as graphics-intensive workloads, are coming soon from Dell, Hewlett Packard Enterprise, Lenovo, Supermicro, and others.
Hardware support for structural sparsity and the optimized TF32 format provides out-of-the-box performance gains for faster AI and data science model training. Accelerate AI-enhanced graphics capabilities with DLSS to upscale resolution with better performance in select applications.
Enhanced throughput and concurrent ray-tracing and shading capabilities improve ray-tracing performance, accelerating renders for product design and architecture, engineering, and construction workflows. See lifelike designs in action with hardware-accelerated motion blur and stunning real-time animations.
Accelerated single-precision floating point (FP32) throughput and improved power efficiency significantly boost performance for workflows like 3D model development and computer-aided engineering (CAE) simulation. Use enhanced 16-bit math capabilities (BF16) for mixed-precision workloads.
Transformer Engine dramatically accelerates AI performance and improves memory utilization for both training and inference. Harnessing the power of the Ada Lovelace fourth-generation Tensor Cores, Transformer Engine intelligently scans the layers of transformer architecture neural networks and automatically recasts between FP8 and FP16 precision.
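As a rough illustration of why recasting to FP8 needs care, the sketch below scales a tensor so that its largest magnitude lands at the top of the FP8 E4M3 range, then rounds each value to 3 mantissa bits. This is not Transformer Engine's actual algorithm or API: the function names, the per-tensor scaling recipe, and the simplified rounding (normal range only, no subnormal or NaN handling) are assumptions made for illustration.

```python
import math

E4M3_MAX = 448.0  # largest finite magnitude in the FP8 E4M3 format


def round_to_e4m3(x: float) -> float:
    """Clamp x to +/-448 and round it to 1 implicit + 3 explicit mantissa bits.

    Simplification (assumption): subnormals and the minimum exponent are ignored.
    """
    if x == 0.0:
        return 0.0
    x = max(-E4M3_MAX, min(E4M3_MAX, x))
    mantissa, exponent = math.frexp(x)      # x = mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    mantissa = round(mantissa * 16) / 16    # keep 4 significant bits
    return math.ldexp(mantissa, exponent)


def fake_fp8_round_trip(values):
    """Scale values into the E4M3 range, round, and scale back (hypothetical helper)."""
    amax = max(abs(v) for v in values)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    dequantized = [round_to_e4m3(v * scale) / scale for v in values]
    return dequantized, scale


if __name__ == "__main__":
    original = [0.001, 0.02, -0.5, 0.3]
    restored, scale = fake_fp8_round_trip(original)
    for before, after in zip(original, restored):
        print(f"{before:+.6f} -> {after:+.6f} (scale {scale:g})")
```

With only 4 significant bits, each value that stays in the normal range is off by at most 1/16 of its magnitude after the round trip, which is why layers that tolerate that error can run in FP8 while more sensitive ones stay in FP16.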
The NVIDIA L40S graphics processor is a powerful GPU designed for NVIDIA Omniverse and 3D content creation, offering exceptional performance and versatility in data center environments. Built on the Ada Lovelace architecture, it features third-generation RT Cores for enhanced real-time ray tracing and fourth-generation Tensor Cores that support AI-based features, significantly improving the quality and speed of 3D workflows.
As a key engine for NVIDIA Omniverse in the data center, the L40S delivers stunning real-time ray tracing and AI acceleration capabilities, making it ideal for tasks in extended reality (XR) and virtual production. With 48 GB of GDDR6 memory, it easily handles complex 3D models, high-resolution textures, and large-scale simulations, allowing creative professionals to work on intricate projects and efficiently render photorealistic scenes.
Support for 3D workflows based on Universal Scene Description (OpenUSD) within the Omniverse ecosystem facilitates collaboration and streamlines production processes. The L40S's performance in Omniverse applications is described as “stunning,” positioning it as a top-tier solution for organizations looking to leverage cutting-edge technologies for virtual environment creation, 3D visualization, and immersive content production.
The NVIDIA L40S graphics processor is a powerful solution for AI training and inference workloads, offering exceptional performance and versatility in data center environments. Built on the Ada Lovelace architecture, it features 18,176 CUDA cores and 568 fourth-generation Tensor Cores, providing up to 5 times better single-precision floating-point (FP32) performance compared to the A100. Its Transformer Engine intelligently manages precision between FP8 and FP16, significantly enhancing AI performance for both training and inference of transformer-based models.
With 48 GB of GDDR6 memory, the L40S can effectively handle complex AI tasks and large language models. For AI training, eight L40S GPUs in a primary server allow for a 0.8x performance increase compared to an 8-GPU A100 system for MLPerf models. In inference tasks, the L40S demonstrates impressive capabilities, often matching or exceeding the performance of the A100 across various MLPerf benchmarks.
This makes the L40S particularly well-suited for deploying and running sophisticated AI models in production environments, providing organizations with an efficient and powerful solution for their AI workloads.
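To make the 48 GB figure concrete, the following back-of-the-envelope sketch estimates how much memory a model's weights alone would occupy at different precisions. The model sizes, the 20% headroom reserved for activations and caches, and the function names are hypothetical, and the estimate treats 1 GB as 10^9 bytes; real deployments need profiling.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}

GPU_MEMORY_GB = 48  # L40S memory capacity from the specification table


def weight_footprint_gb(num_params: float, precision: str) -> float:
    """Memory taken by the weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits_on_one_gpu(num_params: float, precision: str,
                    headroom_fraction: float = 0.2) -> bool:
    """True if the weights fit after reserving headroom (assumed 20% by default)."""
    usable_gb = GPU_MEMORY_GB * (1 - headroom_fraction)
    return weight_footprint_gb(num_params, precision) <= usable_gb


if __name__ == "__main__":
    for params in (7e9, 13e9, 30e9, 70e9):  # hypothetical model sizes
        for precision in ("fp16", "fp8"):
            size = weight_footprint_gb(params, precision)
            verdict = "fits" if fits_on_one_gpu(params, precision) else "needs more than one GPU"
            print(f"{params / 1e9:.0f}B params @ {precision}: {size:.0f} GB -> {verdict}")
```

Under these assumptions, halving the bytes per parameter (FP16 to FP8) roughly doubles the model size that fits on a single card, which is one reason FP8 support matters for inference.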
The NVIDIA L40S graphics processor offers exceptional capabilities for graphical and visualization workloads, making it a powerful solution for professional applications in fields such as Computer-Aided Design (CAD), virtual production, and scientific visualization. Built on the Ada Lovelace architecture, it features third-generation RT Cores that significantly enhance real-time ray tracing performance, delivering stunning visual fidelity and photorealistic rendering.
With 48 GB of GDDR6 memory, the L40S easily handles complex 3D models, high-resolution textures, and large datasets, allowing professionals to work on intricate projects and visualizations without performance bottlenecks. The fourth-generation Tensor Cores support AI-enhanced graphical features, such as DLSS (Deep Learning Super Sampling), which can improve performance and image quality in supported applications.
When paired with NVIDIA RTX Virtual Workstation (vWS) software, the L40S can power high-performance virtual workstations from the data center, providing flexible access to demanding graphical applications from any device. This makes the NVIDIA L40S an excellent choice for organizations looking to enhance their visualization capabilities, improve workflow efficiency, and deliver high-quality visual content across various industries.
The NVIDIA L40S graphics processor offers exceptional capabilities in video encoding and streaming, making it a powerful solution for live streaming, video production, and transcoding. Built on the Ada Lovelace architecture, the L40S features three video encoding and decoding engines, significantly enhancing its ability to simultaneously handle multiple high-quality video streams. A key advancement is the inclusion of AV1 encoding and decoding support, which provides groundbreaking performance and lower total cost of ownership for content creators and streaming platforms. This feature allows for higher video quality at lower bitrates, benefiting both content providers and end-users.
The L40S can manage over 1,000 simultaneous AV1 video streams at 720p30 resolution for mobile applications, making it an ideal solution for streaming services and content delivery networks. The powerful hardware acceleration combined with 48 GB of GDDR6 memory enables efficient processing of complex video workloads, including real-time transcoding and high-resolution content creation. For organizations involved in live streaming, video-on-demand services, or virtual production, the NVIDIA L40S provides the performance and versatility necessary to meet the demands of modern video content creation and distribution.
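For capacity planning, the 1,000-stream figure can be converted into an aggregate pixel rate and used as a first guess for other resolutions. The sketch below assumes that stream capacity scales inversely with pixels per second, which is a simplification (codec settings, bitrate, and decode versus encode mix all matter); the function names are hypothetical.

```python
import math

# Baseline taken from the text: 1,000 streams at 720p30.
BASELINE_STREAMS = 1_000
BASELINE_PIXELS_PER_SECOND = 1280 * 720 * 30  # per stream


def pixel_budget() -> int:
    """Aggregate pixels per second implied by the baseline figure."""
    return BASELINE_STREAMS * BASELINE_PIXELS_PER_SECOND


def estimated_streams(width: int, height: int, fps: int) -> int:
    """Estimated streams per GPU, assuming capacity scales with pixel rate."""
    return pixel_budget() // (width * height * fps)


def gpus_needed(total_streams: int, width: int, height: int, fps: int) -> int:
    """GPUs required for a target stream count under the same assumption."""
    return math.ceil(total_streams / estimated_streams(width, height, fps))


if __name__ == "__main__":
    print(f"Pixel budget: {pixel_budget():,} pixels/s")
    print("1080p30 streams per GPU (estimate):", estimated_streams(1920, 1080, 30))
    print("GPUs for 5,000 720p30 streams:", gpus_needed(5_000, 1280, 720, 30))
```

Treat the results as an upper-level planning estimate, not a measured throughput.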
GPU Architecture | NVIDIA Ada Lovelace architecture |
GPU Memory | 48GB GDDR6 with ECC |
Memory Bandwidth | 864GB/s |
Interconnect Interface | PCIe Gen4 x16: 64GB/s bidirectional |
NVIDIA Ada Lovelace Architecture-Based CUDA® Cores | 18,176 |
NVIDIA Third-Generation RT Cores | 142 |
NVIDIA Fourth-Generation Tensor Cores | 568 |
RT Core Performance TFLOPS | 212 |
FP32 TFLOPS | 91.6 |
TF32 Tensor Core TFLOPS | 183 / 366* |
BFLOAT16 Tensor Core TFLOPS | 362.05 / 733* |
FP16 Tensor Core TFLOPS | 362.05 / 733* |
FP8 Tensor Core TFLOPS | 733 / 1,466* |
Peak INT8 Tensor TOPS | 733 / 1,466* |
Peak INT4 Tensor TOPS | 733 / 1,466* |
Form Factor | 4.4″ (H) x 10.5″ (L), dual slot |
Display Ports | 4x DisplayPort 1.4a |
Max Power Consumption | 350W |
Power Connector | 16-pin |
Thermal | Passive |
Virtual GPU (vGPU) Software Support | Yes |
vGPU Profiles Supported | See virtual GPU licensing guide |
NVENC / NVDEC | 3x / 3x (includes AV1 encode and decode) |
Secure Boot With Root of Trust | Yes |
NEBS Ready | Level 3 |
Multi-Instance GPU (MIG) Support | No |
NVIDIA® NVLink® Support | No |
*With Sparsity
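Several figures in the specification table can be cross-checked against each other with simple arithmetic. The sketch below derives the clock speed implied by the FP32 rating (counting a fused multiply-add as 2 floating-point operations per core per clock), the core ratios, and the PCIe Gen4 x16 bandwidth from the standard 16 GT/s per-lane rate with 128b/130b encoding. The function names are hypothetical, and the implied clock is a derived value, not one stated in the table.

```python
# Values copied from the specification table.
CUDA_CORES = 18_176
RT_CORES = 142
TENSOR_CORES = 568
FP32_TFLOPS = 91.6


def implied_clock_ghz(tflops: float, cores: int, flops_per_core_per_clock: int = 2) -> float:
    """Clock speed needed to reach the rated TFLOPS (one FMA = 2 FLOPs)."""
    return tflops * 1e12 / (cores * flops_per_core_per_clock) / 1e9


def pcie_gb_per_s(gt_per_s: float = 16.0, lanes: int = 16,
                  encoding_efficiency: float = 128 / 130) -> float:
    """One-direction PCIe bandwidth in GB/s (defaults: Gen4 x16)."""
    return gt_per_s * lanes * encoding_efficiency / 8


if __name__ == "__main__":
    print(f"Implied FP32 clock: {implied_clock_ghz(FP32_TFLOPS, CUDA_CORES):.2f} GHz")
    print(f"CUDA cores per RT Core: {CUDA_CORES / RT_CORES:.0f}")
    print(f"Tensor Cores per RT Core: {TENSOR_CORES / RT_CORES:.0f}")
    # The table's 64GB/s is the raw bidirectional rate; encoding overhead trims it slightly.
    print(f"PCIe raw bidirectional: {2 * pcie_gb_per_s(encoding_efficiency=1.0):.0f} GB/s")
    print(f"PCIe after encoding, per direction: {pcie_gb_per_s():.1f} GB/s")
```

The numbers are internally consistent: the FP32 rating implies a clock of about 2.52 GHz, and the core counts divide evenly into 128 CUDA cores and 4 Tensor Cores per RT Core.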