Before You Begin
- Config: TOML describing data paths, hyperparameters, validation strategy, and model defaults.
- Data: One or two JSONL files where each line is a training example (optionally a validation set). Store absolute paths or let the CLI help locate them.
- Secrets: Only `SYNTH_API_KEY` is required, so a lightweight `.env` is enough.
- Optional helpers:
  - `--dataset /path/to/train.jsonl` to override whatever the config references.
  - `--examples 2000` to copy the first N rows into a temporary file (useful for smoke testing before a full run; a standalone sketch of the same truncation follows this list).
  - Multiple `.env` files can be preloaded with `--env-file` if you switch between orgs.
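If you want to prepare a smoke-test file by hand instead of relying on `--examples`, the truncation is easy to reproduce. This is a conceptual sketch, not the CLI's internal code; the `train.jsonl` / `train.smoke.jsonl` paths and the 2000-row cutoff are placeholders.

```python
import itertools
from pathlib import Path


def truncate_jsonl(src: Path, dst: Path, limit: int) -> int:
    """Copy the first `limit` lines of a JSONL file into a smaller smoke-test file."""
    with src.open("r", encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        count = 0
        for line in itertools.islice(fin, limit):
            fout.write(line)
            count += 1
    return count


# Hypothetical paths; point these at your own dataset.
rows = truncate_jsonl(Path("train.jsonl"), Path("train.smoke.jsonl"), 2000)
print(f"wrote {rows} rows")
```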
Run the CLI
- Dataset discovery: If `build_sft_payload` can’t resolve a dataset, the CLI scans the config directory, `datasets/`, `ft_data/`, and repo-level folders. You can also enter a manual path when prompted.
- Payload build: Overrides (`--dataset`, `--allow-experimental`, `--examples`) are applied before validation.
- Validation: `validate_sft_jsonl` checks every row for required fields. Fix formatting issues locally before re-running (a rough local pre-check is sketched after this list).
- Optional truncation: When `--examples` is set, the CLI writes a temporary JSONL containing only the requested number of rows and points the payload at that file.
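The exact schema `validate_sft_jsonl` enforces depends on your backend, so treat the following only as a rough local pre-check. It assumes a chat-style layout where each row carries a `messages` key; swap `required_key` for whatever your setup actually requires.

```python
import json
from pathlib import Path


def precheck_jsonl(path: Path, required_key: str = "messages") -> list[str]:
    """Return human-readable problems found in a JSONL file.

    Assumes a chat-style schema with a `messages` key per row; adjust
    `required_key` to match the schema your backend expects.
    """
    problems = []
    with path.open("r", encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            stripped = line.strip()
            if not stripped:
                problems.append(f"line {lineno}: empty line")
                continue
            try:
                row = json.loads(stripped)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc})")
                continue
            if required_key not in row:
                problems.append(f"line {lineno}: missing '{required_key}'")
    return problems


for problem in precheck_jsonl(Path("train.jsonl")):
    print(problem)
```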
Upload & Job Creation
- Upload training data: Files are sent to `POST {backend}/files`. The CLI prints the returned file IDs.
- Wait for readiness: `_wait_for_training_file` polls the file endpoint (up to ~10s) so the job doesn’t start before the backend finishes processing.
- Create job: The CLI prints a payload preview and calls `POST {backend}/learning/jobs` followed by `POST .../{job_id}/start` (the whole sequence is sketched below).
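For orientation, the HTTP sequence roughly looks like the sketch below. The CLI does all of this for you; here the `BACKEND_URL` variable, the multipart field name, the `"id"` / `"status"` response fields, and the `"training_file"` payload key are assumptions for illustration, not the documented API contract.

```python
import os
import time

import requests

backend = os.environ["BACKEND_URL"]  # hypothetical variable; use your backend's base URL
headers = {"Authorization": f"Bearer {os.environ['SYNTH_API_KEY']}"}

# 1. Upload the training file (the multipart field name "file" is an assumption).
with open("train.jsonl", "rb") as fh:
    upload = requests.post(f"{backend}/files", headers=headers, files={"file": fh})
upload.raise_for_status()
file_id = upload.json()["id"]  # assumption: the response exposes the file ID as "id"

# 2. Wait until the backend reports the file as processed (mirrors _wait_for_training_file).
for _ in range(10):
    info = requests.get(f"{backend}/files/{file_id}", headers=headers).json()
    if info.get("status") == "ready":  # assumption: a "status" field signals readiness
        break
    time.sleep(1)

# 3. Create the job, then start it. The real payload is built from your TOML config;
#    "training_file" here is just a placeholder key.
job = requests.post(
    f"{backend}/learning/jobs", headers=headers, json={"training_file": file_id}
)
job.raise_for_status()
job_id = job.json()["id"]
requests.post(f"{backend}/learning/jobs/{job_id}/start", headers=headers).raise_for_status()
```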
Monitoring
- Leave `--poll` on to stream `sft.progress`, validation summaries, and overall status.
- Use `--stream-format chart` for a live `train.loss` graph; CLI mode hides orchestration noise (`sft.stage`, `hatchet.*`, etc.).
- Adjust `--poll-timeout` and `--poll-interval` if you expect very long or very short runs (a minimal polling loop is sketched below).
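If you ever need to monitor a job outside the CLI, the filtering idea is simple: poll the job resource and drop the orchestration event types. This is a minimal sketch under assumptions, not the CLI's implementation; the `BACKEND_URL` variable, the embedded `events` list, and the `status` values are placeholders.

```python
import os
import time

import requests

backend = os.environ["BACKEND_URL"]  # hypothetical variable; use your backend's base URL
headers = {"Authorization": f"Bearer {os.environ['SYNTH_API_KEY']}"}
NOISE_PREFIXES = ("sft.stage", "hatchet.")  # orchestration events the CLI hides


def poll_job(job_id: str, interval: float = 10.0, timeout: float = 4 * 3600) -> None:
    """Print non-noise events until the job reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = requests.get(f"{backend}/learning/jobs/{job_id}", headers=headers).json()
        # Assumption: the job resource embeds a list of typed events.
        for event in job.get("events", []):
            if not event.get("type", "").startswith(NOISE_PREFIXES):
                print(event.get("type"), event.get("data"))
        if job.get("status") in {"succeeded", "failed", "cancelled"}:
            break
        time.sleep(interval)
```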
Data Hygiene Checklist
- Double-check that your JSONL is non-empty and UTF-8 encoded (a quick check is sketched after this list).
- Keep sensitive data out of `.env` files; only credentials and dataset paths belong there.
- Remove the temporary truncated file that `--examples` creates if you want to keep your workspace clean (the CLI attempts to delete it automatically).
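A quick local check for the first item is easy to script. This is a convenience sketch, not part of the CLI, and `train.jsonl` is a placeholder path.

```python
from pathlib import Path


def quick_hygiene_check(path: Path) -> None:
    """Fail loudly if the file is empty or not valid UTF-8."""
    assert path.stat().st_size > 0, f"{path} is empty"
    path.read_text(encoding="utf-8")  # raises UnicodeDecodeError on bad encoding
    print(f"{path} is non-empty and decodes as UTF-8")


quick_hygiene_check(Path("train.jsonl"))
```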
Troubleshooting Tips
- Upload failures usually stem from path issues or zero-byte files—verify the file exists and is readable.
- If validation keeps failing, run
uvx python scripts/check_dataset.py path/to/file.jsonl(if available) or open a few rows manually to confirm schema compliance. - Backend 4xx errors on job creation often indicate missing required config fields; review the payload preview printed just before the request and compare it with your config.
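If the check script isn't available, eyeballing a handful of rows is often enough to spot schema problems. A minimal way to do that, with `train.jsonl` as a placeholder path:

```python
import itertools
import json
from pathlib import Path

# Pretty-print the first three rows (truncated) to eyeball the schema.
with Path("train.jsonl").open(encoding="utf-8") as fh:
    for line in itertools.islice(fh, 3):
        print(json.dumps(json.loads(line), indent=2)[:500])
```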