Versatile Entry-Level Inference

The NVIDIA A2 Tensor Core GPU provides entry-level inference with low power, a small footprint, and high performance for NVIDIA AI at the edge. With a low-profile PCIe Gen4 card and a configurable thermal design power (TDP) of 40-60W, the A2 brings versatile inference acceleration to any server for deployment at scale.

Up to 20X More Inference Performance

AI inference is deployed to enhance consumer lives with smart, real-time experiences and to gain insights from trillions of end-point sensors and cameras. Compared to CPU-only servers, edge and entry-level servers with NVIDIA A2 Tensor Core GPUs offer up to 20X more inference performance, instantly upgrading any server to handle modern AI.

Comparisons of one NVIDIA A2 Tensor Core GPU versus a dual-socket Xeon Gold 6330N CPU

System Configuration: CPU: HPE DL380 Gen10 Plus, 2S Xeon Gold 6330N @2.2GHz, 512GB DDR4
NLP: BERT-Large (Sequence length: 384, SQuAD: v1.1) | TensorRT 8.2, Precision: INT8, BS:1 (GPU) | OpenVINO 2021.4, Precision: INT8, BS:1 (CPU)
Text-to-Speech: Tacotron2 + WaveGlow end-to-end pipeline (input length: 128) | PyTorch 1.9, Precision: FP16, BS:1 (GPU) | PyTorch 1.9, Precision: FP32, BS:1 (CPU)
Computer Vision: EfficientDet-D0 (COCO, 512×512) | TensorRT 8.2, Precision: INT8, BS:8 (GPU) | OpenVINO 2021.4, Precision: INT8, BS:8 (CPU)

Higher IVA Performance for the Intelligent Edge

Servers equipped with NVIDIA A2 GPUs offer up to 1.3X more performance in intelligent edge use cases, including smart cities, manufacturing, and retail. NVIDIA A2 GPUs running IVA workloads deliver more efficient deployments with up to 1.6X better price-performance and 10 percent better energy efficiency than previous GPU generations.

IVA Performance (Normalized)

System Configuration: [Supermicro SYS-1029GQ-TRT, 2S Xeon Gold 6240 @2.6GHz, 512GB DDR4, 1x NVIDIA A2 OR 1x NVIDIA T4]
Measured performance with DeepStream 5.1. Networks: ShuffleNet-v2 (224×224), MobileNet-v2 (224×224).
Pipeline represents end-to-end performance with video capture and decode, pre-processing, batching, inference, and post-processing.

Optimized for Any Server

NVIDIA A2 is optimized for inference workloads and deployments in entry-level servers constrained by space and thermal requirements, such as 5G edge and industrial environments. A2 delivers a low-profile form factor operating in a low-power envelope, from a TDP of 60W down to 40W, making it ideal for any server.
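
As a rough illustration of working within the 40-60W envelope described above, the sketch below clamps a requested power cap to that range and converts it to milliwatts. It assumes the configurable TDP is exposed as a standard NVML power limit, which this page does not state. The helper names (`power_cap_milliwatts`, `apply_power_cap`) are hypothetical. The second function uses the `pynvml` bindings and needs an NVIDIA driver and, typically, administrator privileges.

```python
# Configurable TDP range taken from the text above (40-60 W).
A2_TDP_MIN_W = 40
A2_TDP_MAX_W = 60


def power_cap_milliwatts(requested_watts: float,
                         min_w: float = A2_TDP_MIN_W,
                         max_w: float = A2_TDP_MAX_W) -> int:
    """Clamp a requested cap to [min_w, max_w] and convert it to milliwatts,
    the unit NVML uses for power limits."""
    clamped = max(min_w, min(max_w, requested_watts))
    return int(round(clamped * 1000))


def apply_power_cap(gpu_index: int, requested_watts: float) -> int:
    """Hypothetical helper: apply the cap through NVML.

    Assumption: the card accepts a power limit via NVML. Requires pynvml,
    an installed NVIDIA driver, and usually root privileges.
    """
    import pynvml  # imported lazily so the helper above works without a GPU

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        # Prefer the limits the driver reports (in milliwatts) over the
        # constants hard-coded above.
        min_mw, max_mw = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
        target_mw = power_cap_milliwatts(requested_watts, min_mw / 1000, max_mw / 1000)
        pynvml.nvmlDeviceSetPowerManagementLimit(handle, target_mw)
        return target_mw
    finally:
        pynvml.nvmlShutdown()
```

Calling `power_cap_milliwatts(50)` gives a value inside the range, while requests below 40W or above 60W are pulled back to the nearest bound before any driver call is made.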

Lower Power and Configurable TDP

Leading AI Inference Performance Across Cloud, Data Center, and Edge

AI inference continues to drive breakthrough innovation across industries, including consumer internet, healthcare and life sciences, financial services, retail, manufacturing, and supercomputing. The A2's small form factor and low power complement the NVIDIA A100 and A30 Tensor Core GPUs to form a complete AI inference portfolio across cloud, data center, and edge. With A2 and the rest of the NVIDIA AI inference portfolio, AI applications deploy with fewer servers and less power, resulting in faster insights at substantially lower cost.

Ready for Enterprise Utilization

NVIDIA AI Enterprise
NVIDIA AI Enterprise, an end-to-end cloud-native suite of AI and data analytics software, is certified to run on A2 in hypervisor-based virtual infrastructure with VMware vSphere. This enables management and scaling of AI and inference workloads in a hybrid cloud environment.

Mainstream NVIDIA-Certified Systems

NVIDIA-Certified Systems™ with NVIDIA A2 bring together compute acceleration and high-speed, secure NVIDIA networking in enterprise data center servers, built and sold by NVIDIA’s OEM partners. This program lets customers identify, acquire, and deploy systems for traditional and diverse modern AI applications from the NVIDIA NGC™ catalog on a single high-performance, cost-effective, and scalable infrastructure.

Powered by the NVIDIA Ampere Architecture

The NVIDIA Ampere architecture is designed for the age of elastic computing, delivering the performance and acceleration needed to power modern enterprise applications. Explore the heart of the world’s highest-performing, elastic data centers.

Key Applications

Smart Cities

The NVIDIA A2 GPU plays a key role in smart city applications, providing the robust computing power needed for real-time data processing and analysis. This compact yet powerful graphics processor is tailored for edge computing, enabling efficient deployment in various urban environments. Its low power consumption and high performance make it ideal for tasks such as traffic management, where it can process vast amounts of video data from surveillance cameras to optimize traffic flow and reduce congestion. Additionally, the A2 can enhance public safety through real-time facial recognition and anomaly detection, ensuring a swift response to potential threats. Its use in environmental monitoring systems also allows for analysis of air quality and noise levels, contributing to healthier living conditions in cities. Overall, the NVIDIA A2 accelerates the implementation of smart infrastructure, driving the evolution of smarter and more responsive cities.

Retail

The NVIDIA A2 GPU plays a key role in revolutionizing retail applications by enhancing AI-driven functionalities such as computer vision, customer analysis, and inventory management. With advanced AI capabilities, the A2 can provide real-time video analytics for in-store surveillance, enabling more effective loss prevention and analysis of customer behavior. Retailers can leverage this technology to personalize shopping experiences through dynamic digital signage and promotions tailored to demographic data and purchasing patterns. Additionally, the robust computing power of the A2 facilitates the automation of inventory tracking and management, ensuring optimal stock levels while reducing the likelihood of stockouts or overstock situations. By integrating NVIDIA A2 GPUs, retailers can achieve significant improvements in operational efficiency, customer satisfaction, and overall business performance.

Manufacturing

The NVIDIA A2 is increasingly used in manufacturing applications, where its AI and machine learning capabilities improve efficiency and precision. In automated quality control systems, the A2 enables real-time analysis of product images, identifying defects and helping maintain consistent quality standards. In predictive maintenance, the A2 processes large volumes of sensor data to anticipate equipment failures before they occur, reducing downtime and maintenance costs. In robotic systems, the GPU's ability to run deep learning workloads allows robots to execute intricate manufacturing processes with precision and speed. By integrating the NVIDIA A2 GPU into manufacturing processes, companies can improve productivity, quality assurance, and operational efficiency.

Edge Inference

The NVIDIA A2 GPU is well suited to edge inference applications because it balances energy efficiency and performance. Designed for AI workloads, the A2 is built on the NVIDIA Ampere architecture, whose Tensor Cores accelerate deep learning and AI tasks. Its compact form factor and low power consumption suit edge environments where space and energy are constrained. The A2 enables real-time processing of large amounts of data close to the source, reducing latency and bandwidth usage compared to cloud-based inference. This fits applications such as autonomous vehicles, smart cities, and the industrial Internet of Things, where quick decision-making and real-time analytics are essential. Additionally, the A2 supports a wide range of AI frameworks and tools, simplifying the integration and deployment of AI models at the edge.

Technical Specifications

Peak FP32	4.5 TF
TF32 Tensor Core	9 TF | 18 TF¹
BFLOAT16 Tensor Core	18 TF | 36 TF¹
Peak FP16 Tensor Core	18 TF | 36 TF¹
Peak INT8 Tensor Core	36 TOPS | 72 TOPS¹
Peak INT4 Tensor Core	72 TOPS | 144 TOPS¹
RT Cores	10
Media engines	1 video encoder, 2 video decoders (includes AV1 decode)
GPU memory	16GB GDDR6
GPU memory bandwidth	200GB/s
Interconnect	PCIe Gen4 x8
Form factor	1-slot, low-profile PCIe
Max thermal design power (TDP)	40–60W (configurable)
Virtual GPU (vGPU) software support²	NVIDIA Virtual PC (vPC), NVIDIA Virtual Applications (vApps), NVIDIA RTX Virtual Workstation (vWS), NVIDIA AI Enterprise, NVIDIA Virtual Compute Server (vCS)
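
The figures in the table can be combined into two back-of-the-envelope estimates. The sketch below is an illustration only, not part of NVIDIA's published material. It computes a roofline-style "ridge point" (operations per byte of memory traffic needed before peak compute, rather than the 200GB/s memory bandwidth, becomes the limit) and an upper bound on how many model weights fit in 16GB. It uses the first (lower) value in each row, assumes 1GB = 10^9 bytes, and ignores activations and workspace memory. The function names are hypothetical.

```python
# Figures copied from the specification table (first value in each row).
PEAK_FP32_TFLOPS = 4.5
PEAK_FP16_TENSOR_TFLOPS = 18.0
PEAK_INT8_TENSOR_TOPS = 36.0
MEMORY_BANDWIDTH_GB_S = 200.0
MEMORY_GB = 16.0


def ridge_point(peak_tera_ops: float, bandwidth_gb_s: float) -> float:
    """Operations per byte of memory traffic needed to reach peak compute."""
    return (peak_tera_ops * 1e12) / (bandwidth_gb_s * 1e9)


def max_parameters_billion(memory_gb: float, bytes_per_param: float) -> float:
    """Upper bound on weights that fit in memory, in billions of parameters.

    Assumes 1 GB = 1e9 bytes and ignores activations and workspace.
    """
    return memory_gb / bytes_per_param


if __name__ == "__main__":
    for name, peak in [("FP32", PEAK_FP32_TFLOPS),
                       ("FP16 Tensor Core", PEAK_FP16_TENSOR_TFLOPS),
                       ("INT8 Tensor Core", PEAK_INT8_TENSOR_TOPS)]:
        print(f"{name}: {ridge_point(peak, MEMORY_BANDWIDTH_GB_S):.1f} ops/byte")
    print(f"FP16 weights: up to {max_parameters_billion(MEMORY_GB, 2):.0f}B parameters")
    print(f"INT8 weights: up to {max_parameters_billion(MEMORY_GB, 1):.0f}B parameters")
```

Under these assumptions, a kernel performing fewer operations per byte than the ridge point is limited by memory bandwidth rather than by the Tensor Core peak, which is one reason lower-precision formats do not always yield a proportional speedup.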