Overview
Choose a causal LM that fits your GPU budget and latency targets. Examples use Qwen and Llama families.

Settings
model: HF repo id or internal identifier
dtype: e.g., bfloat16
max_tokens, max_model_len, sampling_top_p, sampling_temperature
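
As an illustration, the settings listed above could be grouped into one validated object. This is a minimal sketch, not the project's actual configuration schema: the class name `ModelSettings`, the default values, and the example repo id are hypothetical. It also assumes that max_tokens caps generated tokens while max_model_len is the total context length, which the text above does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSettings:
    """Hypothetical container for the settings named in this section."""

    model: str                          # HF repo id or internal identifier
    dtype: str = "bfloat16"             # example value from the text
    max_tokens: int = 512               # assumed default: generation cap
    max_model_len: int = 4096           # assumed default: total context length
    sampling_top_p: float = 0.95        # assumed default
    sampling_temperature: float = 0.7   # assumed default

    def __post_init__(self) -> None:
        # Basic sanity checks; the bounds below are assumptions, not documented limits.
        if not self.model:
            raise ValueError("model must be a non-empty repo id or identifier")
        if self.max_tokens <= 0 or self.max_model_len <= 0:
            raise ValueError("max_tokens and max_model_len must be positive")
        if self.max_tokens > self.max_model_len:
            raise ValueError("max_tokens cannot exceed max_model_len")
        if not 0.0 < self.sampling_top_p <= 1.0:
            raise ValueError("sampling_top_p must be in (0, 1]")
        if self.sampling_temperature < 0.0:
            raise ValueError("sampling_temperature must be non-negative")


# Example usage with an illustrative repo id from the Qwen family.
settings = ModelSettings(model="Qwen/Qwen2.5-7B-Instruct", max_tokens=256)
```

Validating at construction time surfaces a mismatched pair such as a max_tokens larger than max_model_len before any GPU memory is allocated.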