| Model | Sizes | Notes |
| --- | --- | --- |
| Qwen 3 | 0.6B, 4B, 14B | Default for demos; supports tool-calling |
| Llama 3 | 8B, 70B | 70B requires multiple GPUs or H100 |
| Mistral | 7B, Mixtral 8x7B | Mixtral uses sparse mixture-of-experts |
| Groq LPU (hosted) | gpt-4.1-nano, qwen-32b | Hosted inference via API |
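For the hosted row, a minimal sketch of what a call might look like, assuming an OpenAI-compatible chat-completions endpoint; the base URL, the `GROQ_API_KEY` environment variable, and the `qwen-32b` model id are taken from common convention and the table, not confirmed by this document:

```python
# Sketch of querying a hosted, OpenAI-compatible chat endpoint.
# URL, env var, and model id are assumptions for illustration.
import json
import os
import urllib.request


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def call_hosted(model: str, prompt: str,
                base_url: str = "https://api.groq.com/openai/v1") -> str:
    """POST the payload and return the first choice's text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Build (but don't send) a request for the table's hosted model.
    payload = build_chat_request("qwen-32b", "Say hello.")
    print(json.dumps(payload, indent=2))
```

Because the endpoint is OpenAI-compatible, swapping in a different hosted provider should only require changing `base_url` and the API-key variable.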
I