NVIDIA A100 Specifications and Benchmark

From its impressive technical specifications to its impact across various industries, the A100 has plenty to explore. Join us on this journey through the A100's features, and understand why it's a game-changer for high-performance computing.

10/27/2023 · 2 min read



The A100's triumph over the Titan V shouldn't be surprising given the A100's impressive technical attributes. The A100 silicon measures an extraordinary 826 square millimeters and packs an immense 54.2 billion transistors, made possible by TSMC's advanced 7nm FinFET manufacturing process. The full die features 128 streaming multiprocessors (SMs), totaling 8,192 CUDA cores. Although shipping A100 parts do not enable the full die, their specifications remain remarkably impressive.
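As a rough sanity check on those die figures, dividing the transistor count by the die area gives the density TSMC's 7nm process achieved here. A minimal sketch:

```python
# Back-of-the-envelope density check using the A100 figures quoted above.
die_area_mm2 = 826          # die size in square millimeters
transistors = 54.2e9        # total transistor count

# Millions of transistors per square millimeter.
density_m_per_mm2 = transistors / die_area_mm2 / 1e6
print(f"~{density_m_per_mm2:.1f} M transistors per mm^2")
```

That works out to roughly 65 million transistors per square millimeter.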

The A100 boasts an impressive array of hardware, featuring 6,912 CUDA cores and 432 Tensor cores. Its memory configuration is equally remarkable: a generous 40GB of HBM2E memory on a vast 5,120-bit memory interface delivers an astonishing bandwidth of up to 1,555 GB/s. By comparison, the Titan V, with its 5,120 CUDA cores and 12GB of HBM2 memory, appears quite modest next to the A100.
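The quoted bandwidth follows directly from the interface width and the per-pin data rate. A minimal sketch of that arithmetic, assuming an HBM2E per-pin rate of roughly 2.43 Gbps (an approximation inferred from the published bandwidth, not an official figure):

```python
# Peak memory bandwidth = bus width (bits) x per-pin data rate / 8 bits per byte.
bus_width_bits = 5120       # A100's HBM2E memory interface width
pin_rate_gbps = 2.43        # assumed per-pin data rate in Gbps (approximation)

bandwidth_gb_s = bus_width_bits * pin_rate_gbps / 8
print(f"~{bandwidth_gb_s:.0f} GB/s")
```

The result lands on the ~1,555 GB/s the spec sheet advertises, which is why such an extremely wide bus matters even at modest per-pin rates.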

It's worth noting that OctaneBench assesses graphics cards through OctaneRender, with a particular requirement for Nvidia CUDA. Consequently, you won't encounter any Radeon GPUs in the rankings. However, you will find a diverse selection of GeForce, Quadro, and Tesla devices prominently featured on the list.

Nvidia A100 system specifications:

NVIDIA A100 for NVLink, designed for optimal multi-GPU networking on NVIDIA HGX™ A100 boards with 4 or 8 SXM modules.

NVIDIA A100 for PCIe, suitable for traditional PCIe slots, offering versatility for various servers.

Both versions deliver impressive performance:

Peak FP64 performance: 9.7 TF (19.5 TF for Tensor Cores).

Peak FP32 performance: 19.5 TF.

Peak FP16 and BFLOAT16 performance: 312 TF for Tensor Cores.

Peak Tensor Float 32 performance: 156 TF.

Peak INT8 performance: 624 TOPS on Tensor Cores.

Peak INT4 performance: 1,248 TOPS on Tensor Cores. (With structured sparsity, the A100 doubles these Tensor Core throughput figures.)
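The Tensor Core peaks above form a simple doubling ladder, and structured sparsity doubles each of them again. A minimal sketch of that relationship (the format labels are informal, not official identifiers):

```python
# Dense Tensor Core peaks from the spec list above
# (TFLOPS for floating-point formats, TOPS for integer formats).
dense_peaks = {
    "TF32": 156,
    "FP16/BF16": 312,
    "INT8": 624,
    "INT4": 1248,
}

# Structured sparsity doubles Tensor Core throughput for these formats.
sparse_peaks = {fmt: 2 * peak for fmt, peak in dense_peaks.items()}

for fmt in dense_peaks:
    print(f"{fmt}: {dense_peaks[fmt]} dense -> {sparse_peaks[fmt]} with sparsity")
```

Note that each step down in precision doubles throughput, so the right format choice for a workload can be worth as much as a hardware upgrade.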

Memory and GPU specifications differ between the two versions:

For NVLink version:

GPU memory options: 40 or 80 GB

Memory bandwidth: 1,555 or 2,039 GB/s

Up to 7 MIGs with 5 GB each (for A100 with 40 GB memory) or 10 GB each (for A100 with 80 GB memory)

Maximum power: 400 W

For PCIe version:

GPU memory: 40 GB

Memory bandwidth: 1,555 GB/s

Up to 7 MIGs with 5 GB each

Maximum power: 250 W
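The spec differences between the two versions, including the MIG slice sizes, can be summarized in a short sketch. This assumes MIG divides the GPU's memory into eight slices, of which up to seven back usable instances, which matches the 5 GB and 10 GB figures above:

```python
# Spec-sheet numbers for the A100 variants, as listed above.
configs = {
    "NVLink 40 GB": {"memory_gb": 40, "bandwidth_gb_s": 1555, "max_power_w": 400},
    "NVLink 80 GB": {"memory_gb": 80, "bandwidth_gb_s": 2039, "max_power_w": 400},
    "PCIe 40 GB":   {"memory_gb": 40, "bandwidth_gb_s": 1555, "max_power_w": 250},
}

MIG_MEMORY_SLICES = 8   # assumption: memory is split into eight slices
MAX_MIG_INSTANCES = 7   # up to seven usable GPU instances

for name, c in configs.items():
    slice_gb = c["memory_gb"] // MIG_MEMORY_SLICES
    print(f"{name}: up to {MAX_MIG_INSTANCES} MIGs x {slice_gb} GB, "
          f"{c['bandwidth_gb_s']} GB/s, {c['max_power_w']} W")
```

This makes the trade-off visible at a glance: the PCIe card gives up bandwidth headroom and 150 W of power budget in exchange for fitting standard servers.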

Nvidia A100 use cases:

The NVIDIA A100 GPU is a versatile powerhouse, predominantly used in high-performance computing (HPC) and artificial intelligence (AI). In HPC, it accelerates scientific simulations, weather modeling, and data-intensive research. For AI, it's the driving force behind deep learning, machine learning, and AI inference. Data analytics, supercomputing, genomics research, and cloud services also benefit from its immense processing capabilities. This GPU is a cornerstone of advanced computing, enhancing research, decision-making, and technological innovation across various industries.