The NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration at every scale to power the world's highest-performing elastic data centers for AI, data analytics, and HPC. Powered by the NVIDIA Ampere Architecture, A100 is the engine of the NVIDIA data center platform. A100 provides up to 20x higher performance over the prior generation and can be partitioned into seven GPU instances to dynamically adjust to shifting demands.
Deep learning training
NVIDIA A100 Tensor Cores with Tensor Float 32 (TF32) provide up to 20x higher performance over NVIDIA Volta with zero code changes, and an additional 2x boost with automatic mixed precision and FP16. A training workload like BERT can be solved at scale in under a minute by 2,048 A100 GPUs, a world record for time to solution.
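As a sketch of how those two gains are reached in practice, the snippet below enables TF32 and automatic mixed precision in a PyTorch training loop. PyTorch, the toy model, the synthetic data, and the hyperparameters are illustrative assumptions, not part of the A100 platform itself; TF32 is the default Tensor Core math mode on Ampere, which is why it needs zero code changes, while FP16 mixed precision is opted into explicitly.

```python
import torch

# TF32 is on by default for Ampere GPUs; shown here only to make the
# setting explicit. Requires a CUDA-capable GPU.
torch.backends.cuda.matmul.allow_tf32 = True   # TF32 matrix multiplies
torch.backends.cudnn.allow_tf32 = True         # TF32 convolutions

# Placeholder model and data, purely for illustration.
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = torch.cuda.amp.GradScaler()           # rescales gradients for FP16

inputs = torch.randn(64, 1024, device="cuda")
targets = torch.randn(64, 1024, device="cuda")

for _ in range(10):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():            # runs eligible ops in FP16
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
    scaler.scale(loss).backward()              # scale loss to avoid FP16 underflow
    scaler.step(optimizer)                     # unscale, then apply the update
    scaler.update()
```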
Deep learning inference
A100 introduces groundbreaking features to optimize inference workloads. It accelerates a full range of precisions, from FP32 to INT4. Multi-Instance GPU (MIG) technology lets multiple networks operate simultaneously on a single A100 for optimal utilization of compute resources. And structural sparsity support delivers up to 2x more performance on top of A100's other inference gains.
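The structural sparsity that A100's sparse Tensor Cores accelerate is a 2:4 pattern: at most two nonzero values in every group of four weights. The NumPy helper below is a minimal, hypothetical sketch of producing that pattern by magnitude pruning; the production workflow (pruning followed by retraining with NVIDIA's tools) is more involved.

```python
import numpy as np

def prune_2_4(weights: np.ndarray) -> np.ndarray:
    """Zero the two smallest-magnitude values in each group of four,
    yielding the 2:4 pattern A100 sparse Tensor Cores can exploit."""
    w = weights.reshape(-1, 4).copy()
    # Indices of the two smallest |w| entries in each group of four.
    idx = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, idx, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.random.randn(8, 8).astype(np.float32)
sparse_w = prune_2_4(w)
# Every group of four now holds at most two nonzero weights.
assert (sparse_w.reshape(-1, 4) != 0).sum(axis=1).max() <= 2
```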
High-performance data analytics
Data scientists need to be able to analyze, visualize, and turn massive datasets into insights. But scale-out solutions are often bogged down by datasets scattered across multiple servers. Accelerated servers with A100 provide the needed compute power, along with massive memory, over 2 TB/sec of memory bandwidth, and scalability with NVIDIA NVLink and NVSwitch, to tackle these workloads. Combined with InfiniBand, NVIDIA Magnum IO, and the RAPIDS suite of open-source libraries, including the RAPIDS Accelerator for Apache Spark for GPU-accelerated data analytics, the NVIDIA data center platform accelerates these huge workloads at unprecedented levels of performance and efficiency.
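As a minimal sketch of how that stack is wired together, the PySpark session below turns on the RAPIDS Accelerator for Apache Spark. It assumes the RAPIDS Accelerator jar is already on the classpath and that the cluster's GPU resource discovery is configured; the query is a placeholder.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("gpu-analytics-sketch")
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")  # RAPIDS Accelerator entry point
    .config("spark.rapids.sql.enabled", "true")             # run supported SQL ops on the GPU
    .config("spark.executor.resource.gpu.amount", "1")      # one GPU per executor (cluster-specific)
    .getOrCreate()
)

# Ordinary Spark SQL; operators the plugin supports execute on the GPU
# transparently, with automatic CPU fallback for the rest.
df = spark.range(0, 100_000_000).selectExpr("id % 1000 AS key", "id AS value")
df.groupBy("key").sum("value").show(5)
```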
Enterprise-ready utilization
A100 with MIG maximizes the utilization of GPU-accelerated infrastructure. With MIG, an A100 GPU can be partitioned into as many as seven independent instances, giving multiple users access to GPU acceleration. MIG works with Kubernetes, containers, and hypervisor-based server virtualization. MIG lets infrastructure managers offer a right-sized GPU with guaranteed quality of service (QoS) for every job, extending the reach of accelerated computing resources to every user.
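A minimal sketch of that partitioning follows, wrapping the nvidia-smi MIG commands in Python for illustration. The commands require administrator privileges, and the profile ID used here (19, corresponding to the 1g.5gb slice on the 40 GB A100) is an assumption that should be verified against the output of nvidia-smi mig -lgip on the target GPU.

```python
import subprocess

def run(cmd: str) -> None:
    """Run an nvidia-smi command and echo its output (requires root)."""
    result = subprocess.run(cmd.split(), capture_output=True, text=True)
    print(result.stdout or result.stderr)

# Enable MIG mode on GPU 0 (takes effect after a GPU reset).
run("nvidia-smi -i 0 -mig 1")

# List the GPU instance profiles this A100 supports (1g.5gb ... 7g.40gb).
run("nvidia-smi mig -lgip")

# Create seven 1g.5gb GPU instances, each with a default compute instance.
run("nvidia-smi mig -cgi 19,19,19,19,19,19,19 -C")

# Confirm the seven independent partitions.
run("nvidia-smi mig -lgi")
```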