Model | Sizes | Notes |
---|---|---|
Qwen 3 | 0.6B, 4B, 14B | Default for demos; supports tool-calling |
Llama 3 | 8B, 70B | 70B needs multiple GPUs, or a single H100 with quantization |
Mistral | 7B, Mixtral 8x7B | Mixtral uses sparse mixture-of-experts |
Groq LPU Hosted | gpt-4.1-nano, qwen-32b | Remote inference via Groq API (entries are model IDs, not sizes); no local GPU needed |
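When choosing among the local models above, the binding constraint is usually GPU memory. A rough rule of thumb is ~2 bytes per parameter for fp16/bf16 weights, ignoring KV-cache and activation overhead. The sketch below is a hypothetical helper (not part of any library) that applies that rule to the table's local models; the names and the bytes-per-parameter factor are illustrative assumptions:

```python
# (model name, parameter count in billions) for the local options above
LOCAL_MODELS = [
    ("Qwen 3 0.6B", 0.6),
    ("Qwen 3 4B", 4),
    ("Qwen 3 14B", 14),
    ("Llama 3 8B", 8),
    ("Llama 3 70B", 70),
    ("Mistral 7B", 7),
]

def largest_fitting_model(vram_gb: float, bytes_per_param: float = 2.0):
    """Return the largest model whose raw weights fit in vram_gb.

    Assumes fp16/bf16 weights (~2 bytes per parameter) and ignores
    KV-cache and activation overhead, so treat the answer as optimistic.
    """
    fitting = [
        (name, params)
        for name, params in LOCAL_MODELS
        if params * bytes_per_param <= vram_gb
    ]
    if not fitting:
        return None
    return max(fitting, key=lambda m: m[1])[0]

print(largest_fitting_model(24))  # a 24 GB card, e.g. an RTX 4090
```

Lowering `bytes_per_param` to ~0.5 approximates 4-bit quantization, which is how a 70B model can fit on a single 80 GB H100.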