## Overview

Topology defines how GPUs are split between vLLM inference, the trainer, and the reference model. Set it via the job config (e.g., the `topology` and `vllm` sections).
## Common layouts

- Single-node split: dedicate separate GPUs to vLLM, the trainer, and the reference model.
- Tensor parallelism: align `vllm.tensor_parallel_size` with `topology.tensor_parallel`.
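The layouts above can be sketched as a job config fragment. This is illustrative only — the `topology` and `vllm` section names come from this page, but the exact nesting and key names (`type`, `gpu_type`, `tensor_parallel`) are assumptions about the schema:

```yaml
# Illustrative sketch — verify key names against your actual job config schema.
topology:
  type: single_node_split   # assumed key; named after the layout above
  gpu_type: H100:8          # <FAMILY>:<COUNT>, as described below
  tensor_parallel: 2        # must match vllm.tensor_parallel_size

vllm:
  tensor_parallel_size: 2   # aligned with topology.tensor_parallel
```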
## RDMA

Enable RDMA where available for improved all-reduce and inter-GPU throughput.

## Supported topologies
- `H100:8` (single_node_split and multi_node)
  - Typical single-node split: 5 GPUs for vLLM, 2 for the trainer, 1 for the reference model
- `A100-80GB:4` (single_node_split)
  - Choose TP to match the number of vLLM GPUs
- `H100:2` (single_node_split)
  - Typical split: 1–2 GPUs for vLLM, the remainder for the trainer
  - Will be deprecated
- `A10G:2` (single_node_split)
  - Suitable for smaller models and smoke tests
  - Will be deprecated in favor of a larger small topology
- multi_node with RDMA support is available but not yet in production. Please reach out to the Synth team if you would like to run training on 16+ H100s.
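For single-node splits like the ones above, the per-role GPU counts must sum to the topology's total GPU count (e.g., 5 + 2 + 1 = 8 for `H100:8`). A minimal sanity check, using a hypothetical helper (`check_split` is not part of any published API):

```python
# Hypothetical sanity check: for a single_node_split topology, the GPUs
# assigned to vLLM, the trainer, and the reference model must account for
# every GPU in the node — no more, no fewer.
def check_split(total_gpus: int, vllm: int, trainer: int, reference: int) -> None:
    allocated = vllm + trainer + reference
    if allocated != total_gpus:
        raise ValueError(
            f"split allocates {allocated} GPUs but topology has {total_gpus}"
        )

# Typical H100:8 split from the list above: passes silently.
check_split(8, vllm=5, trainer=2, reference=1)
```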
- Ensure `gpu_type` matches the form `<FAMILY>:<COUNT>`, such as `H100:8`.
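A small parser makes the `<FAMILY>:<COUNT>` rule concrete. The helper name and the exact character set allowed in FAMILY are assumptions, inferred from the examples on this page (`H100`, `A100-80GB`, `A10G`):

```python
import re

# Assumed pattern for <FAMILY>:<COUNT>, based on the families listed above.
GPU_TYPE_RE = re.compile(r"^(?P<family>[A-Za-z0-9][A-Za-z0-9-]*):(?P<count>[1-9]\d*)$")

def parse_gpu_type(gpu_type: str) -> tuple[str, int]:
    """Split a gpu_type string into (family, count), raising on malformed input."""
    m = GPU_TYPE_RE.match(gpu_type)
    if m is None:
        raise ValueError(
            f"gpu_type must look like <FAMILY>:<COUNT>, got {gpu_type!r}"
        )
    return m.group("family"), int(m.group("count"))

print(parse_gpu_type("H100:8"))       # ('H100', 8)
print(parse_gpu_type("A100-80GB:4"))  # ('A100-80GB', 4)
```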