Hosted inference - Synth AI

Accurate for Stack 0.1.0 and later unless a section cites a newer release. See Stack Changelog.

Stack’s auxiliary agents (and your own clients) can call NVIDIA Nemotron 3 Ultra through Synth’s OpenAI-compatible inference endpoint. It is a drop-in /responses API authenticated with a Synth API key.

1. Sign in and get a key

Create a Synth account and an API key at usesynth.ai/keys, then export the key so the commands below can read it:

export SYNTH_API_KEY=sk_...

2. List models

curl https://api.usesynth.ai/api/v1/synth/models \
  -H "authorization: Bearer $SYNTH_API_KEY"

You should see nemotron-3-ultra in the catalog.
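For a client written in Python, the same catalog check might look like the sketch below. It uses only the standard library. The response shape is not documented on this page, so `model_ids` is a hypothetical helper that assumes either a plain list or an OpenAI-style `{"data": [...]}` object whose entries carry an `id` field; adjust it to whatever the endpoint actually returns.

```python
import json
import os
import urllib.request

MODELS_URL = "https://api.usesynth.ai/api/v1/synth/models"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build the GET request for the model catalog (nothing is sent yet)."""
    return urllib.request.Request(
        MODELS_URL,
        headers={"authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(catalog) -> list[str]:
    """Extract model ids from a decoded catalog.

    ASSUMPTION: the catalog is a list of entries or an OpenAI-style
    {"data": [...]} object, and each entry is a string or has an "id" field.
    """
    entries = catalog.get("data", []) if isinstance(catalog, dict) else catalog
    ids = []
    for entry in entries:
        if isinstance(entry, str):
            ids.append(entry)
        elif isinstance(entry, dict) and "id" in entry:
            ids.append(entry["id"])
    return ids


def catalog_has_model(name: str = "nemotron-3-ultra") -> bool:
    """Fetch the catalog with SYNTH_API_KEY and report whether `name` is listed."""
    request = build_models_request(os.environ["SYNTH_API_KEY"])
    with urllib.request.urlopen(request, timeout=30) as response:
        catalog = json.load(response)
    return name in model_ids(catalog)
```

With `SYNTH_API_KEY` exported, calling `catalog_has_model()` performs the request. The builder and the parser are kept separate so they can be checked without network access.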

3. Run inference

The aux endpoint requires an X-Stack-Actor-Role header (monitor, gardener, or aux). Use aux for general calls:

curl https://api.usesynth.ai/api/v1/stack-aux/openai/v1/responses \
  -H "authorization: Bearer $SYNTH_API_KEY" \
  -H "x-stack-actor-role: aux" \
  -H "content-type: application/json" \
  -d '{"model":"nemotron-3-ultra","input":"Reply with one word: pong."}'

Without the x-stack-actor-role header, the request returns 403. Primary coding roles (worker, primary, codex, cursor) are intentionally rejected: this endpoint is for auxiliary agents and aux inference.
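The role rule above can also be checked on the client before a request is sent. The following is a minimal Python sketch using only the standard library. The URL, headers, and JSON body mirror the curl command. The function names and the local `ValueError` are illustrative choices, not part of any Synth SDK.

```python
import json
import urllib.request

RESPONSES_URL = "https://api.usesynth.ai/api/v1/stack-aux/openai/v1/responses"
AUX_ROLES = {"monitor", "gardener", "aux"}  # roles this page lists as accepted


def build_inference_request(
    api_key: str, prompt: str, role: str = "aux"
) -> urllib.request.Request:
    """Build the POST request; fail locally on a role the endpoint would reject."""
    if role not in AUX_ROLES:
        raise ValueError(
            f"role {role!r} is not an auxiliary role; use one of {sorted(AUX_ROLES)}"
        )
    body = json.dumps({"model": "nemotron-3-ultra", "input": prompt}).encode("utf-8")
    return urllib.request.Request(
        RESPONSES_URL,
        data=body,
        headers={
            "authorization": f"Bearer {api_key}",
            "x-stack-actor-role": role,
            "content-type": "application/json",
        },
        method="POST",
    )


def run_inference(api_key: str, prompt: str, role: str = "aux"):
    """Send the request and return the decoded JSON response as-is.

    urllib raises urllib.error.HTTPError for non-2xx statuses such as 403.
    """
    request = build_inference_request(api_key, prompt, role)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling `run_inference(os.environ["SYNTH_API_KEY"], "Reply with one word: pong.")` after `import os` would issue the same call as the curl example. The sketch returns the decoded body unchanged because the response schema is not described here.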

4. Usage and cost

Every request is metered per token and recorded against your org. Inference runs on a free aux tier with a per-org daily cap; usage and remaining quota are available at:

curl https://api.usesynth.ai/api/v1/stack-aux/usage \
  -H "authorization: Bearer $SYNTH_API_KEY" \
  -H "x-stack-actor-role: aux"
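A client that polls its quota could wrap the same call as sketched below, again with only the Python standard library. The fields of the usage response are not documented on this page, so the sketch returns the decoded JSON untouched. The `RuntimeError` wrapper is an illustrative choice for surfacing the HTTP status and body when a request fails.

```python
import json
import urllib.error
import urllib.request

USAGE_URL = "https://api.usesynth.ai/api/v1/stack-aux/usage"


def build_usage_request(api_key: str, role: str = "aux") -> urllib.request.Request:
    """Build the GET request for usage and remaining quota (nothing is sent yet)."""
    return urllib.request.Request(
        USAGE_URL,
        headers={
            "authorization": f"Bearer {api_key}",
            "x-stack-actor-role": role,
        },
        method="GET",
    )


def fetch_usage(api_key: str, role: str = "aux"):
    """Return the decoded usage payload, or raise with the HTTP status and body."""
    request = build_usage_request(api_key, role)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        detail = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(
            f"usage request failed with HTTP {err.code}: {detail}"
        ) from err
```

Printing `json.dumps(fetch_usage(api_key), indent=2)` is a quick way to inspect whichever fields the endpoint returns.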

In Stack

When you sign in, Stack’s monitor and gardener aux agents use this endpoint automatically — you do not need to set headers by hand. The signed-out local loop (Quickstart) keeps working without any of this.
