GPU Benchmark Comparator for ML Workloads

Compare GPUs for machine learning training and inference. Find the best GPU for your AI projects.

GPU Comparison

Select two GPUs to see a detailed side-by-side comparison of ML-relevant specs.

"Can It Run?" ML Model Calculator

Select an AI model to see which GPUs have enough VRAM to run it at different quantization levels.

Price/Performance Rankings

Sort GPUs by performance-per-dollar across different ML-relevant metrics.

All GPUs

Browse the complete GPU database. Filter by VRAM, price, and type.

GPU Specs That Matter for Machine Learning

VRAM (Video Memory)

VRAM is often the most critical spec for ML workloads. It determines the maximum model size you can load and the batch size you can use during training. Large language models like Llama 2 70B require roughly 140 GB of VRAM at FP16 precision (70 billion parameters at two bytes each) -- far more than any single consumer GPU offers. Quantization techniques (INT8, INT4) cut VRAM requirements significantly, enabling smaller GPUs to run larger models with a small accuracy trade-off.
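The weights-only portion of that requirement follows a simple rule: parameters times bytes per parameter. A minimal sketch of that rule of thumb (the function name is illustrative, and KV-cache and activation overhead are deliberately ignored):

```python
# Approximate bytes per parameter at common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_vram_gb(params_billions: float, precision: str) -> float:
    """Estimate VRAM (in GB) needed just to hold the model weights.

    Ignores KV cache and activation memory, which add real overhead
    on top of this figure.
    """
    return params_billions * BYTES_PER_PARAM[precision]

print(weight_vram_gb(70, "fp16"))  # 140.0 -- Llama 2 70B at half precision
print(weight_vram_gb(70, "int4"))  # 35.0  -- small enough for two 24 GB cards
```

In practice you should budget extra headroom beyond the weights, especially for long-context inference where the KV cache grows with sequence length.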

FP16 and FP32 Performance (TFLOPS)

FP16 (half-precision) performance is the most relevant metric for modern ML training and inference. Most deep learning frameworks use mixed-precision training by default, leveraging FP16 Tensor Cores on NVIDIA GPUs for massive speed gains. FP32 (single-precision) still matters for certain scientific computing workloads and training stability. Datacenter GPUs like the H100 offer dramatically higher FP16 throughput compared to consumer cards.
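To see why TFLOPS matters in practice, you can estimate a training step time from the widely used ~6 FLOPs-per-parameter-per-token approximation for transformer training. The model size, batch size, sustained throughput, and utilization figures below are illustrative assumptions, not measurements of any specific GPU:

```python
def train_step_seconds(params: float, tokens: int,
                       peak_tflops: float, utilization: float = 0.4) -> float:
    """Rough transformer training step time.

    Uses the common ~6 * params * tokens FLOPs estimate for one
    forward+backward pass; `utilization` is an assumed fraction of
    peak throughput actually sustained in practice.
    """
    flops = 6 * params * tokens
    return flops / (peak_tflops * 1e12 * utilization)

# Hypothetical: 7B-parameter model, 4096-token batch, 400 TFLOPS peak FP16
print(round(train_step_seconds(7e9, 4096, 400), 3))  # 1.075 (seconds)
```

Because step time scales inversely with delivered TFLOPS, a card with several times the FP16 tensor throughput translates directly into proportionally shorter training runs, assuming the workload is compute-bound.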

Memory Bandwidth

Memory bandwidth determines how fast data can be read from and written to VRAM. For inference workloads -- especially autoregressive text generation with LLMs -- memory bandwidth is often the bottleneck, not compute. A GPU with high bandwidth will generate tokens faster. The NVIDIA H100 (SXM) leads with 3,350 GB/s, while consumer cards like the RTX 4090 offer around 1,008 GB/s.
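A useful back-of-the-envelope model: generating each token requires reading every weight from VRAM once, so bandwidth divided by model size gives an upper bound on decode speed. A sketch using the bandwidth figures above (real throughput lands below this bound due to KV-cache reads and imperfect overlap):

```python
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Bandwidth-bound upper limit on autoregressive decode speed:
    every generated token reads all weights from VRAM once."""
    return bandwidth_gb_s / model_size_gb

model_gb = 7 * 2  # a 7B-parameter model at FP16: ~14 GB of weights
print(round(max_tokens_per_sec(1008, model_gb)))  # RTX 4090: ~72 tokens/s
print(round(max_tokens_per_sec(3350, model_gb)))  # H100 SXM: ~239 tokens/s
```

This is also why quantization speeds up inference: an INT4 model is a quarter the size of its FP16 counterpart, so the same bandwidth moves the weights roughly four times as fast.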

TDP (Thermal Design Power)

TDP indicates the maximum power the GPU will draw under load. For home setups, lower TDP means less heat and quieter operation. For data centers, TDP directly impacts electricity costs and cooling requirements. Apple Silicon chips stand out here with very low TDP relative to performance, though they trade raw compute power for energy efficiency.

Consumer vs. Datacenter GPUs

Consumer GPUs (like the RTX 4090) offer excellent price-to-performance for ML hobbyists and small-scale experiments. Datacenter GPUs (like the A100 and H100) provide far more VRAM, higher memory bandwidth, support for multi-GPU interconnects (NVLink), and ECC memory for reliability. Apple Silicon provides a unique option with unified memory architecture, allowing very large models to fit in memory at the cost of lower raw throughput.

Choosing the Right GPU for Your Workload