## Overview

Topology defines how GPUs are split between vLLM inference, the trainer, and the reference model. Set it via the job config (e.g., the `topology` and `vllm` sections).
## Common layouts

- Single-node split: dedicate separate GPUs to vLLM, the trainer, and the reference model.
- Tensor parallelism: align `vllm.tensor_parallel_size` with `topology.tensor_parallel`.
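The layouts above can be sketched as a job config fragment. This is illustrative only — the `topology` and `vllm` section names come from this page, but the exact nesting and key names (`type`, `gpu_type`, `tensor_parallel`) are assumptions about the schema:

```yaml
# Illustrative sketch — verify key names against your actual job config schema.
topology:
  type: single_node_split   # assumed key; named after the layout above
  gpu_type: H100:8          # <FAMILY>:<COUNT>, as described below
  tensor_parallel: 2        # must match vllm.tensor_parallel_size

vllm:
  tensor_parallel_size: 2   # aligned with topology.tensor_parallel
```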
## RDMA

Enable RDMA where available for improved all-reduce and inter-GPU throughput.

## Supported topologies
- `H100:8` (single_node_split and multi_node)
  - Typical single-node split: 5 GPUs for vLLM, 2 for the trainer, 1 for the reference model
- `A100-80GB:4` (single_node_split)
  - Choose TP to match the number of vLLM GPUs
- `H100:2` (single_node_split)
  - Typical split: 1–2 GPUs for vLLM, the remainder for the trainer
  - Will be deprecated
- `A10G:2` (single_node_split)
  - Suitable for smaller models and smoke tests
  - Will be deprecated in favor of a larger small topology
- multi_node with RDMA support is available but not yet in production. Please reach out to the Synth team if you would like to run training on 16+ H100s.
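For single-node splits like the ones above, the per-role GPU counts must sum to the topology's total GPU count (e.g., 5 + 2 + 1 = 8 for `H100:8`). A minimal sanity check, using a hypothetical helper (`check_split` is not part of any published API):

```python
# Hypothetical sanity check: for a single_node_split topology, the GPUs
# assigned to vLLM, the trainer, and the reference model must account for
# every GPU in the node — no more, no fewer.
def check_split(total_gpus: int, vllm: int, trainer: int, reference: int) -> None:
    allocated = vllm + trainer + reference
    if allocated != total_gpus:
        raise ValueError(
            f"split allocates {allocated} GPUs but topology has {total_gpus}"
        )

# Typical H100:8 split from the list above: passes silently.
check_split(8, vllm=5, trainer=2, reference=1)
```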
- Ensure `gpu_type` matches the form `<FAMILY>:<COUNT>`, such as `H100:8`.
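A small parser makes the `<FAMILY>:<COUNT>` rule concrete. The helper name and the exact character set allowed in FAMILY are assumptions, inferred from the examples on this page (`H100`, `A100-80GB`, `A10G`):

```python
import re

# Assumed pattern for <FAMILY>:<COUNT>, based on the families listed above.
GPU_TYPE_RE = re.compile(r"^(?P<family>[A-Za-z0-9][A-Za-z0-9-]*):(?P<count>[1-9]\d*)$")

def parse_gpu_type(gpu_type: str) -> tuple[str, int]:
    """Split a gpu_type string into (family, count), raising on malformed input."""
    m = GPU_TYPE_RE.match(gpu_type)
    if m is None:
        raise ValueError(
            f"gpu_type must look like <FAMILY>:<COUNT>, got {gpu_type!r}"
        )
    return m.group("family"), int(m.group("count"))

print(parse_gpu_type("H100:8"))       # ('H100', 8)
print(parse_gpu_type("A100-80GB:4"))  # ('A100-80GB', 4)
```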