

Powered by NVIDIA A100 Tensor Core GPUs, NVLink, and NVSwitch


The Most Powerful Accelerated Server Platform for AI and High-Performance Computing

Massive datasets in machine learning, exploding model sizes in deep learning, and complex simulations in high-performance computing (HPC) require multiple GPUs with extremely fast interconnections. NVIDIA HGX A100 combines NVIDIA A100 Tensor Core GPUs with new NVIDIA® NVLink® and NVSwitch™ high-speed interconnects to create the world’s most powerful servers. A fully tested, easy-to-deploy baseboard, HGX A100 integrates into partner servers to provide guaranteed performance.

Unmatched Accelerated Computing

Leveraging the power of third-generation Tensor Cores, HGX A100 delivers up to a 20X speedup to AI out of the box with Tensor Float 32 (TF32) and a 2.5X speedup to HPC with FP64. NVIDIA HGX A100 4-GPU delivers nearly 80 teraFLOPS of FP64 for the most demanding HPC workloads. NVIDIA HGX A100 8-GPU provides 5 petaFLOPS of FP16 deep learning compute, while the 16-GPU HGX A100 delivers a staggering 10 petaFLOPS, creating the world’s most powerful accelerated scale-up server platform for AI and HPC.


NVIDIA HGX A100 with 8x A100 GPUs


NVIDIA HGX A100 with 4x A100 GPUs

Up to 6X Higher Out-of-the-Box Performance with TF32 for AI Training (BERT Training)


Deep Learning Performance

Deep learning models are exploding in size and complexity. As a result, AI models require a system with large amounts of memory, massive computing power, and high-speed interconnects to deliver efficient scalability. With NVIDIA NVSwitch providing high-speed, all-to-all GPU communication, HGX A100 delivers the power to handle the most advanced AI models. A single NVIDIA HGX A100 8-GPU delivers up to 6X more AI training performance and 7X more AI inference performance on the advanced AI model BERT compared to prior-generation NVIDIA Volta™-based HGX systems.

Machine Learning Performance

Machine learning models require loading, transforming, and processing extremely large datasets to glean insights. With over half a terabyte of unified memory and all-to-all GPU communications with NVSwitch, HGX A100 has the power to load and perform calculations on enormous datasets to derive actionable insights quickly.
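As a rough illustrative sketch (not an NVIDIA tool), the memory arithmetic behind this claim can be checked in Python. The 40 GB-per-GPU figure is derived from the spec table below (160 GB across four GPUs); the function names are hypothetical:

```python
# Sketch: does a dataset fit in the HGX A100 unified GPU memory pool?
# "Over half a terabyte" refers to the 16-GPU configuration (640 GB).
GPU_MEMORY_GB = 40  # derived: 160 GB (4-GPU spec) / 4 GPUs

def pool_capacity_gb(num_gpus: int) -> int:
    """Total unified memory across the baseboard(s)."""
    return num_gpus * GPU_MEMORY_GB

def fits_in_pool(dataset_gb: float, num_gpus: int) -> bool:
    """True if the dataset fits entirely in GPU memory."""
    return dataset_gb <= pool_capacity_gb(num_gpus)

print(pool_capacity_gb(16))   # 640
print(fits_in_pool(500, 16))  # True: a ~0.5 TB dataset fits in 640 GB
print(fits_in_pool(500, 8))   # False: exceeds the 8-GPU 320 GB pool
```

In practice a working set also needs headroom for intermediate results, so this is an upper bound, not a sizing guide.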


9X More HPC Performance in 4 Years (Throughput for Top HPC Apps)


HPC Performance

HPC applications require computing power that can perform an enormous amount of calculations per second. Increasing the compute density of each server node dramatically reduces the number of servers required, resulting in huge savings in cost, power, and space consumed in the data center. For HPC simulations, high-dimension matrix multiplication requires a processor to fetch data from many neighbors for computation, making GPUs connected by NVLink ideal. A single NVIDIA HGX A100 4-GPU server replaces over 100 CPU-based servers running the same scientific applications.

The Most Powerful End-to-End AI and HPC Data Center Platform

The complete NVIDIA data center solution incorporates building blocks across hardware, networking, software, libraries, and optimized AI models and applications from NGC™. Representing the most powerful end-to-end AI and HPC platform for data centers, it allows researchers to deliver real-world results and deploy solutions into production at scale.

HGX A100 Specifications

HGX A100 is available in single baseboards with four or eight A100 GPUs. The four-GPU configuration is fully interconnected with NVIDIA NVLink, and the eight-GPU configuration is interconnected with NVSwitch. Two NVIDIA HGX A100 8-GPU baseboards can also be combined using an NVSwitch interconnect to create a powerful 16-GPU single node.
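The aggregate-bandwidth figures in the spec table below follow directly from this topology: each A100 exposes 600 GB/s of third-generation NVLink bandwidth, and NVSwitch lets any GPU reach any other at that full rate. A minimal Python sketch of the arithmetic (function names are illustrative, not an NVIDIA API):

```python
# Sketch: aggregate bandwidth = per-GPU NVLink bandwidth x GPU count.
PER_GPU_NVLINK_GB_S = 600  # third-generation NVLink, per A100

def aggregate_bandwidth_tb_s(num_gpus: int) -> float:
    """Total aggregate bandwidth for a baseboard, in TB/s."""
    return num_gpus * PER_GPU_NVLINK_GB_S / 1000

def gpu_pairs(num_gpus: int) -> int:
    """All-to-all GPU pairs reachable at full rate through NVSwitch."""
    return num_gpus * (num_gpus - 1) // 2

for n in (4, 8, 16):
    print(n, aggregate_bandwidth_tb_s(n), gpu_pairs(n))
# 4 GPUs -> 2.4 TB/s, 8 -> 4.8 TB/s, 16 -> 9.6 TB/s (matches the table)
```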

                                 4-GPU            8-GPU            16-GPU
GPUs                             4x NVIDIA A100   8x NVIDIA A100   16x NVIDIA A100
FP64 Compute                     78 TF            156 TF           312 TF
TF32 Compute*                    1.25 PF          2.5 PF           5 PF
FP16 Compute*                    2.5 PF           5 PF             10 PF
INT8 Compute*                    5 POPS           10 POPS          20 POPS
Memory                           160 GB           320 GB           640 GB
NVIDIA NVLink                    3rd generation   3rd generation   3rd generation
NVIDIA NVSwitch                  N/A              2nd generation   2nd generation
NVSwitch GPU-to-GPU Bandwidth    N/A              600 GB/s         600 GB/s
Total Aggregate Bandwidth        2.4 TB/s         4.8 TB/s         9.6 TB/s
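The compute rows of this table scale linearly with GPU count. As a quick sanity check (an illustrative sketch, not an NVIDIA tool), per-GPU peaks derived from the 4-GPU column reproduce the other columns:

```python
# Sketch: spec-table compute entries = per-GPU peak x GPU count.
# Per-GPU values derived by dividing the 4-GPU column by four
# (starred entries keep their asterisk meaning from the table).
PER_GPU_TFLOPS = {"FP64": 19.5, "TF32": 312.5, "FP16": 625.0, "INT8": 1250.0}

def board_peak(precision: str, num_gpus: int) -> float:
    """Peak throughput for a baseboard, in teraFLOPS (teraOPS for INT8)."""
    return PER_GPU_TFLOPS[precision] * num_gpus

print(board_peak("FP64", 4))   # 78.0 TF (4-GPU column)
print(board_peak("FP16", 8))   # 5000.0 TF = 5 PF (8-GPU column)
print(board_peak("INT8", 16))  # 20000.0 TOPS = 20 POPS (16-GPU column)
```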

HGX-1 and HGX-2 Reference Architectures

Powered by NVIDIA GPUs and NVLink

NVIDIA HGX-1 and HGX-2 are reference architectures that standardize the design of data centers accelerating AI and HPC. Built with NVIDIA SXM2 V100 boards plus NVIDIA NVLink and NVSwitch interconnect technologies, HGX reference architectures have a modular design that works seamlessly in hyperscale and hybrid data centers to deliver up to 2 petaFLOPS of compute power for a quick, simple path to AI and HPC.



                                 HGX-1                HGX-2
GPUs                             8x NVIDIA V100       16x NVIDIA V100
AI Compute                       1 petaFLOPS (FP16)   2 petaFLOPS (FP16)
NVLink                           2nd generation       2nd generation
Memory                           256 GB               512 GB
NVSwitch                         N/A                  Yes
NVSwitch GPU-to-GPU Bandwidth    N/A                  300 GB/s
Total Aggregate Bandwidth        2.4 TB/s             4.8 TB/s
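Comparing the two spec tables gives the generation-over-generation deltas. A hedged Python sketch (per-GPU figures derived by dividing each 8-GPU column by eight; the A100 FP16 value is the starred table entry, and the variable names are illustrative):

```python
# Sketch: V100-to-A100 deltas implied by the two spec tables.
V100 = {"fp16_tflops": 125.0, "nvlink_gb_s": 300}  # HGX-1/HGX-2, per GPU
A100 = {"fp16_tflops": 625.0, "nvlink_gb_s": 600}  # HGX A100, per GPU

fp16_speedup = A100["fp16_tflops"] / V100["fp16_tflops"]
link_speedup = A100["nvlink_gb_s"] / V100["nvlink_gb_s"]
print(fp16_speedup, link_speedup)  # 5.0 2.0
```

That is, roughly 5X the per-GPU FP16 throughput (starred/sparse figure) and 2X the per-GPU NVLink bandwidth between the two generations, as reflected in the tables above.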

Inside the NVIDIA Ampere Architecture

Read this technical deep dive to learn what's new with the NVIDIA Ampere architecture and its implementation in the NVIDIA A100 GPU.