AI Inference and Mainstream Compute for Every Enterprise

Bring accelerated performance to every enterprise workload with NVIDIA A30 Tensor Core GPUs. With NVIDIA Ampere architecture Tensor Cores and Multi-Instance GPU (MIG), it delivers speedups securely across diverse workloads, including AI inference at scale and high-performance computing (HPC) applications. By combining high memory bandwidth and low power consumption in a PCIe form factor—optimal for mainstream servers—A30 enables an elastic data center and delivers maximum value for enterprises.

View NVIDIA A30 datasheet

The Data Center Solution for Modern IT

The NVIDIA Ampere architecture is part of the unified NVIDIA EGX™ platform, incorporating building blocks across hardware, networking, software, libraries, and optimized AI models and applications from the NVIDIA NGC™ catalog. Representing the most powerful end-to-end AI and HPC platform for data centers, it allows researchers to rapidly deliver real-world results and deploy solutions into production at scale.

Deep Learning Training

AI Training—Up to 3X higher throughput than V100 and 6X higher than T4

BERT Large Pre-Training (Normalized)

Training AI models for next-level challenges such as conversational AI requires massive compute power and scalability.

NVIDIA A30 Tensor Cores with Tensor Float 32 (TF32) provide up to 10X higher performance over the NVIDIA T4 with zero code changes and an additional 2X boost with automatic mixed precision and FP16, delivering a combined 20X throughput increase. When combined with NVIDIA® NVLink®, PCIe Gen4, NVIDIA networking, and the NVIDIA Magnum IO™ SDK, it’s possible to scale to thousands of GPUs.
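TF32 works with zero code changes because it keeps FP32's 8-bit exponent and range and only shortens the mantissa to 10 bits, with rounding happening inside the Tensor Core. A rough pure-Python sketch of that quantization (simple truncation for illustration; the hardware's actual rounding mode may differ):

```python
import struct

def tf32_round(x: float) -> float:
    """Approximate TF32 quantization: keep FP32's 8-bit exponent,
    drop the low 13 of the 23 mantissa bits (truncation sketch)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~0x1FFF  # zero the 13 discarded mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(tf32_round(1.0))         # powers of two pass through exactly
print(tf32_round(3.14159265))  # loses a little mantissa precision
```

Because the exponent is untouched, no FP32 value overflows or underflows under TF32 — only the last few digits of precision move, which is why frameworks can enable it transparently.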

Tensor Cores and MIG enable A30 to be used for workloads dynamically throughout the day. It can be used for production inference at peak demand, and part of the GPU can be repurposed to rapidly re-train those very same models during off-peak hours.

NVIDIA set multiple performance records in MLPerf, the industry-wide benchmark for AI training.

Learn more about the NVIDIA Ampere architecture for training

Deep Learning Inference

A30 leverages groundbreaking features to optimize inference workloads. It accelerates a full range of precisions, from FP64 to TF32 and INT4. Supporting up to four MIG instances per GPU, A30 lets multiple networks operate simultaneously in secure hardware partitions with guaranteed quality of service (QoS). And structural sparsity support delivers up to 2X more performance on top of A30’s other inference performance gains.
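The structural sparsity the sparse Tensor Cores exploit is the 2:4 pattern: at most two nonzero weights in every contiguous group of four, letting the hardware skip half the multiplies. A toy pure-Python sketch of magnitude pruning to that pattern (illustrative only; production workflows use NVIDIA's sparsity tooling and then fine-tune to recover accuracy):

```python
def is_2to4_sparse(row):
    """True if every contiguous group of 4 weights has <= 2 nonzeros."""
    return all(sum(1 for w in row[i:i + 4] if w != 0) <= 2
               for i in range(0, len(row), 4))

def prune_2to4(row):
    """Naive magnitude pruning: keep the two largest-magnitude weights
    in each group of 4, zero the rest."""
    out = list(row)
    for i in range(0, len(out), 4):
        group = sorted(range(i, min(i + 4, len(out))),
                       key=lambda j: abs(out[j]))
        for j in group[:-2]:  # drop all but the two largest
            out[j] = 0.0
    return out

print(prune_2to4([0.9, -0.1, 0.4, 0.05, 1.2, 0.3, -0.7, 0.2]))
```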

NVIDIA’s market-leading AI performance was demonstrated in MLPerf Inference. Combined with NVIDIA Triton™ Inference Server, which easily deploys AI at scale, A30 brings this groundbreaking performance to every enterprise.

AI Inference—Up to 3X higher throughput than V100 at real-time conversational AI

BERT Large Inference (Normalized)
Throughput for <10ms Latency

AI Inference—Over 3X higher throughput than T4 at real-time image classification

RN50 v1.5 Inference (Normalized)
Throughput for <7ms Latency

High-Performance Computing

HPC—Up to 1.1X higher throughput than V100 and 8X higher than T4

LAMMPS (Normalized)

To unlock next-generation discoveries, scientists use simulations to better understand the world around us.

NVIDIA A30 features FP64 NVIDIA Ampere architecture Tensor Cores that deliver the biggest leap in HPC performance since the introduction of GPUs. Combined with 24 gigabytes (GB) of GPU memory with a bandwidth of 933 gigabytes per second (GB/s), researchers can rapidly solve double-precision calculations. HPC applications can also leverage TF32 to achieve higher throughput for single-precision, dense matrix-multiply operations.
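A quick back-of-envelope roofline check shows how those two headline numbers relate. Dividing the FP64 Tensor Core rate by the memory bandwidth gives the arithmetic intensity (FLOPs per byte) a kernel needs before it is compute-bound rather than bandwidth-bound:

```python
# A30 headline numbers from the spec table.
FP64_TC_FLOPS = 10.3e12  # FP64 Tensor Core rate, FLOP/s
BANDWIDTH = 933e9        # HBM2 memory bandwidth, bytes/s
MEMORY_BYTES = 24e9      # 24GB of GPU memory

# Arithmetic intensity at which compute, not memory, becomes the limit.
balance = FP64_TC_FLOPS / BANDWIDTH
print(f"balance point: {balance:.1f} FLOPs per byte")

# Time to stream the entire 24GB once at full bandwidth.
print(f"full-memory sweep: {MEMORY_BYTES / BANDWIDTH * 1e3:.1f} ms")
```

Kernels below roughly 11 FLOPs per byte (most stencil and sparse solvers) are limited by the 933GB/s of bandwidth; dense matrix work above it can approach the Tensor Core peak.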

The combination of FP64 Tensor Cores and MIG empowers research institutions to securely partition the GPU to allow multiple researchers access to compute resources with guaranteed QoS and maximum GPU utilization. Enterprises deploying AI can use A30’s inference capabilities during peak demand periods and then repurpose the same compute servers for HPC and AI training workloads during off-peak periods.

Review Latest GPU Performance on HPC Applications

Key applications

Deep Learning Inference

The NVIDIA A30 is purpose-built for deep learning, meeting the demanding computational needs of AI deployments in data centers. Built on the Ampere architecture, the A30 pairs a large complement of CUDA Cores with Tensor Cores, enabling fast, accurate processing of neural networks.
The GPU excels at large-scale inference workloads across industries, from natural language processing and image recognition to recommender systems and autonomous vehicles. Support for mixed-precision computing balances computational accuracy with efficiency, delivering fast inference results without compromising model quality.
Integration with NVIDIA’s TensorRT inference optimization toolkit further streamlines deployment and maximizes throughput, allowing businesses to scale their AI applications effectively. Overall, the NVIDIA A30 is a robust choice for enterprises looking to accelerate their deep learning inference capabilities.

Data analysis

The NVIDIA A30 brings high-performance data analysis within reach of mainstream servers, backed by the efficient Ampere architecture. Tailored for data-hungry applications, its CUDA Cores and Tensor Cores accelerate complex analytical tasks such as large-scale data processing, machine learning, and predictive analytics.
High memory bandwidth and support for NVIDIA NVLink technology provide fast access to large datasets, allowing organizations to extract insights and make informed decisions quickly. Mixed-precision computing optimizes computational performance without sacrificing accuracy, which is critical for handling diverse workloads effectively.
Integrated with NVIDIA software toolkits such as RAPIDS for GPU-accelerated data analysis and the CUDA-X libraries, the A30 simplifies deploying and scaling data analysis solutions in hybrid cloud and on-premises environments.

High-Performance Computing (HPC)

The NVIDIA A30 represents a significant advance for high-performance computing (HPC), delivering strong computational power and efficiency across a wide range of tasks. Built on the Ampere architecture, its CUDA Cores and Tensor Cores handle complex scientific simulations, numerical analysis, and data-intensive computations with speed and accuracy.
High memory bandwidth and NVLink support ease communication between GPUs and other system components, improving overall performance and scalability. These capabilities make the A30 well suited to accelerating applications in physics, chemistry, weather forecasting, and molecular dynamics, where rapid data processing and simulation are critical.
Integration with NVIDIA’s parallel computing platform and the CUDA libraries simplifies the development of optimized software, enabling researchers and engineers to tackle larger, more complex problems. In short, the A30 is a dependable engine for HPC environments, driving innovation and scientific discovery.

AI Training

The NVIDIA A30 is equally at home training AI models. With TF32 Tensor Cores, training runs up to 10X faster than on the T4 with zero code changes, and automatic mixed precision with FP16 adds a further 2X. NVLink, PCIe Gen4, and NVIDIA networking let training scale from a single GPU to thousands.
MIG support means training capacity never sits idle: the same GPU that serves inference during peak demand can be repartitioned to retrain those models during off-peak hours, keeping utilization high around the clock.
Moreover, the A30 integrates with NVIDIA’s comprehensive software ecosystem, including CUDA, cuDNN, and TensorRT, ensuring compatibility and ease of deployment within existing IT infrastructures. These features position the A30 as a strong foundation for enterprises building and refreshing AI models.

High-Performance Data Analytics

Data scientists need to be able to analyze, visualize, and turn massive datasets into insights. But scale-out solutions are often bogged down by datasets scattered across multiple servers.

Accelerated servers with A30 provide the needed compute power—along with large HBM2 memory, 933GB/sec of memory bandwidth, and scalability with NVLink—to tackle these workloads. Combined with NVIDIA InfiniBand, NVIDIA Magnum IO and the RAPIDS™ suite of open-source libraries, including the RAPIDS Accelerator for Apache Spark, the NVIDIA data center platform accelerates these huge workloads at unprecedented levels of performance and efficiency.

Learn more about data analytics

Enterprise-Ready Utilization

A30 with MIG maximizes the utilization of GPU-accelerated infrastructure.

With MIG, an A30 GPU can be partitioned into as many as four independent instances, giving multiple users access to GPU acceleration.

MIG works with Kubernetes, containers, and hypervisor-based server virtualization. MIG lets infrastructure managers offer a right-sized GPU with guaranteed QoS for every job, extending the reach of accelerated computing resources to every user.
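As a rough mental model of those right-sized partitions, A30 MIG memory comes in 6GB slices (a quarter of the 24GB card) with at most four instances, matching the layouts in the spec table. A toy capacity check under that assumption (the real placement rules enforced by nvidia-smi and the MIG manager are stricter):

```python
SLICE_GB = 6    # assumed A30 memory slice: 24GB / 4 instances
MAX_SLICES = 4  # at most four MIG instances per A30

def fits_on_a30(requested_gb):
    """Toy check: can these per-job memory requests be packed onto one
    A30 as MIG instances? Rounds each request up to whole 6GB slices."""
    slices = [-(-gb // SLICE_GB) for gb in requested_gb]  # ceil division
    return len(requested_gb) <= MAX_SLICES and sum(slices) <= MAX_SLICES

print(fits_on_a30([6, 6, 6, 6]))  # four 6GB instances fill the card
print(fits_on_a30([12, 12, 6]))   # would need five slices
```

The names and the packing rule here are illustrative; an infrastructure manager would query the actual supported instance profiles from the driver rather than hard-code them.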

Learn more about MIG

NVIDIA AI Enterprise

NVIDIA AI Enterprise, an end-to-end cloud-native suite of AI and data analytics software, is certified to run on A30 in hypervisor-based virtual infrastructure with VMware vSphere. This enables management and scaling of AI workloads in a hybrid cloud environment.

Learn more about NVIDIA AI Enterprise

Mainstream NVIDIA-Certified Systems

NVIDIA-Certified Systems™ with NVIDIA A30 bring together compute acceleration and high-speed, secure NVIDIA networking into enterprise data center servers, built and sold by NVIDIA’s OEM partners. This program enables customers to identify, acquire, and deploy systems for traditional and diverse modern AI applications from the NVIDIA NGC catalog on a single high-performance, cost-effective, and scalable infrastructure.

Learn more about NVIDIA-Certified Systems

A30 Tensor Core GPU Specifications

FP64: 5.2 teraFLOPS
FP64 Tensor Core: 10.3 teraFLOPS
FP32: 10.3 teraFLOPS
TF32 Tensor Core: 82 teraFLOPS | 165 teraFLOPS*
BFLOAT16 Tensor Core: 165 teraFLOPS | 330 teraFLOPS*
FP16 Tensor Core: 165 teraFLOPS | 330 teraFLOPS*
INT8 Tensor Core: 330 TOPS | 661 TOPS*
INT4 Tensor Core: 661 TOPS | 1,321 TOPS*
Media engines: 1 optical flow accelerator (OFA), 1 JPEG decoder (NVJPEG), 4 video decoders (NVDEC)
GPU memory: 24GB HBM2
GPU memory bandwidth: 933GB/s
Interconnect: PCIe Gen4: 64GB/s; third-generation NVLink: 200GB/s**
Form factor: Dual-slot, full-height, full-length (FHFL)
Max thermal design power (TDP): 165W
Multi-Instance GPU (MIG): 4 GPU instances @ 6GB each; 2 GPU instances @ 12GB each; 1 GPU instance @ 24GB
Virtual GPU (vGPU) software support: NVIDIA AI Enterprise, NVIDIA Virtual Compute Server

* With sparsity
** NVLink Bridge for up to two GPUs
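Note that the starred figures are simply the dense Tensor Core rates doubled, since 2:4 structured sparsity skips half the multiplies; the small deviations come from rounding the marketing numbers. A quick sanity check:

```python
# Dense vs. with-sparsity Tensor Core rates from the table above
# (teraFLOPS, except INT8/INT4 which are in TOPS).
dense  = {"TF32": 82,  "BF16": 165, "FP16": 165, "INT8": 330, "INT4": 661}
sparse = {"TF32": 165, "BF16": 330, "FP16": 330, "INT8": 661, "INT4": 1321}

for k in dense:
    print(f"{k}: {sparse[k] / dense[k]:.3f}x")  # each within rounding of 2x
```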

Dimensions of the NVIDIA A30 GPU

See the Latest MLPerf Benchmark Data

Inside the NVIDIA Ampere Architecture

Explore the cutting-edge technologies of the NVIDIA Ampere architecture.

Learn More