| GPU | Typical workloads | Notes |
|---|---|---|
| H100 (80 GB) | Large models (14B+), tensor-parallel RL | RDMA-enabled for multi-node training |
| A100 (40/80 GB) | Mid-sized models, SFT | 40 GB for models up to 7B; 80 GB for 14B+ |
| A10G | Small models (≤4B), experiments | Sufficient for initial testing and SFT |
| L4 | Evaluation, preprocessing | Lower-cost option for non-training workloads |
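
The sizing guidance above can be encoded as a simple selection heuristic. This is an illustrative sketch, not an official sizing rule: the `choose_gpu` helper and its thresholds are assumptions derived from the table (A10G up to 4B parameters, A100 40 GB up to 7B, A100 80 GB below 14B, H100 80 GB for 14B+), and real choices also depend on batch size, sequence length, and parallelism strategy.

```python
def choose_gpu(model_params_b: float, training: bool = True) -> str:
    """Pick a GPU tier for a model of `model_params_b` billion parameters.

    Thresholds follow the sizing table above; `training=False` routes
    non-training workloads (evaluation, preprocessing) to the cheaper L4.
    """
    if not training:
        return "L4"           # lower-cost option for eval/preprocessing
    if model_params_b <= 4:
        return "A10G"         # small models and initial experiments
    if model_params_b <= 7:
        return "A100 (40GB)"  # mid-sized models, SFT
    if model_params_b < 14:
        return "A100 (80GB)"  # larger mid-sized models
    return "H100 (80GB)"      # 14B+, tensor-parallel / multi-node RL


if __name__ == "__main__":
    for size in (3, 7, 13, 30):
        print(f"{size}B -> {choose_gpu(size)}")
```

For example, a 7B SFT run maps to an A100 (40 GB), while a 30B tensor-parallel RL run maps to H100s; the `training` flag keeps evaluation jobs on the cheaper L4 tier.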