Overview

Topology defines how GPUs are split between the vLLM inference server, the trainer, and the reference model. It is set in the job config (e.g., the topology and vllm sections).
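A minimal config sketch is below. Only gpu_type, the topology type names, topology.tensor_parallel, and vllm.tensor_parallel_size appear elsewhere in this doc; the GPU-count keys are hypothetical placeholders and the real schema may differ.

```yaml
# Sketch only; field names for GPU counts are assumptions.
gpu_type: "H100:8"
topology:
  type: single_node_split
  tensor_parallel: 1
vllm:
  tensor_parallel_size: 1
```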

Common layouts

  • Single-node split: dedicate separate GPUs to vLLM, the trainer, and the reference model.
  • Tensor parallelism: align vllm.tensor_parallel_size with topology.tensor_parallel.
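For example, if 2 GPUs are dedicated to vLLM, the two tensor-parallel settings should agree (a sketch; only the two option names are taken from this doc):

```yaml
topology:
  tensor_parallel: 2
vllm:
  tensor_parallel_size: 2  # must match topology.tensor_parallel
```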

RDMA

Enable RDMA where available to improve all-reduce performance and inter-GPU throughput.

Supported topologies

  • H100:8 (single_node_split and multi_node)
    • Typical single-node split: 5 GPUs for vLLM, 2 for the trainer, 1 for the reference model
  • A100-80GB:4 (single_node_split)
    • Set tensor parallelism to match the number of GPUs dedicated to vLLM
  • H100:2 (single_node_split)
    • Typical split: 1–2 GPUs for vLLM, the rest for the trainer
    • Will be deprecated
  • A10G:2 (single_node_split)
    • Suitable for smaller models and smoke tests
    • Will be deprecated in favor of a larger small topology
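As a concrete instance of the H100:8 single-node split above (the GPU-count key names are hypothetical; only gpu_type and single_node_split come from this doc):

```yaml
gpu_type: "H100:8"
topology:
  type: single_node_split
  # 5 + 2 + 1 = 8 GPUs total (assumed key names)
  vllm_gpus: 5
  trainer_gpus: 2
  reference_gpus: 1
```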

Notes:
  • multi_node with RDMA support is available but not yet in production. Please reach out to the Synth team if you would like to run training on 16+ H100s.
  • Ensure gpu_type matches the form <FAMILY>:<COUNT>, such as H100:8.