Full Finetuning (FFT)
Run standard supervised finetuning using the SFT workflow with `training.use_qlora = false` (default) and typical FFT hyperparameters.
- Invoke via: `uvx synth-ai train --type sft --config <path>`
- Uses the same client/payload path as LoRA SFT; it differs only in the training mode/toggles and in typical hyperparameters/parallelism settings
Quickstart
Minimal TOML (FFT)
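A minimal FFT config sketch. Field names follow this page; the model name, dataset path, and GPU type are illustrative assumptions to replace with your own values.

```toml
# Minimal FFT config sketch; values are illustrative.
[job]
model = "Qwen/Qwen2.5-7B-Instruct"   # assumed base model identifier
data = "data/train.jsonl"            # training JSONL (or pass --dataset on the CLI)

[compute]
gpu_type = "H100"                    # required by the backend
gpu_count = 1

[training]
use_qlora = false                    # default; false means full finetuning

[hyperparameters]
n_epochs = 1
learning_rate = 5e-6                 # illustrative, not a recommendation
sequence_length = 4096
```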
What the client validates and sends
- Validates dataset path existence and JSONL records
- Uploads files to `/api/learning/files`, then creates/starts the job under `/api/learning/jobs`
- Payload mapping is identical to LoRA SFT: hyperparameters + `metadata.effective_config` (compute, data.topology, training)
Multi‑GPU guidance (FFT)
- Use `[compute]` for cluster shape
- Prefer `[hyperparameters.parallelism]` for DeepSpeed stage, FSDP, precision, and TP/PP sizes; these keys are forwarded verbatim
- `[data.topology]` is optional and informational; the backend/trainer validates actual resource consistency
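A sketch of the multi-GPU sections described above. The parallelism values and the topology key below are illustrative assumptions; only the section and key names listed on this page are documented.

```toml
# Multi-GPU FFT sketch; parallelism keys are forwarded verbatim to the trainer.
[compute]
gpu_type = "H100"
gpu_count = 8
nodes = 1

[hyperparameters.parallelism]
use_deepspeed = true
deepspeed_stage = 3
fsdp = false
bf16 = true
fp16 = false
tensor_parallel_size = 1
pipeline_parallel_size = 1

# Optional and informational only; the key name here is assumed.
[data.topology]
gpus_per_node = 8
```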
Common issues
- HTTP 400 `missing_gpu_type`: add `[compute].gpu_type`
- Dataset not found: specify an absolute path or use `--dataset` (paths are resolved from the current working directory)
Helpful CLI flags
- `--examples N` to subset the data for a quick smoke test
- `--dry-run` to preview the payload before submitting
All sections and parameters (FFT)
- `[job]` (client reads)
  - `model` (string, required): base model identifier
  - `data` or `data_path` (string): training JSONL (required unless `--dataset` is provided)
- `[compute]` (forwarded into `metadata.effective_config.compute`)
  - `gpu_type` (string): required by the backend
  - `gpu_count` (int)
  - `nodes` (int, optional)
- `[data]` / `[data.topology]`
  - `topology` (table): forwarded into `metadata.effective_config.data.topology`
  - `validation_path` (string, optional): if set and the file exists, it is uploaded to enable validation
- `[training]`
  - `mode` (string, optional): copied to metadata for visibility
  - `use_qlora` (bool, default false)
  - `[training.validation]` keys promoted into hyperparameters (see the sketch after this list):
    - `enabled` (bool, default true): surfaced into `metadata.effective_config.training.validation.enabled`
    - `evaluation_strategy` (string, default "steps")
    - `eval_steps` (int, default 0)
    - `save_best_model_at_end` (bool, default true)
    - `metric_for_best_model` (string, default "val.loss")
    - `greater_is_better` (bool, default false)
- `[hyperparameters]`
  - `n_epochs` (int, default 1)
  - Optional: `batch_size`, `global_batch`, `per_device_batch`, `gradient_accumulation_steps`, `sequence_length`, `learning_rate`, `warmup_ratio`, `train_kind`
  - `[hyperparameters.parallelism]`, forwarded verbatim: `use_deepspeed`, `deepspeed_stage`, `fsdp`, `bf16`, `fp16`, `tensor_parallel_size`, `pipeline_parallel_size` (see the multi-GPU sketch above)
- `[algorithm]` (ignored by the client): sometimes used in examples for documentation only
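To make the `[training]`, `[training.validation]`, and `[hyperparameters]` keys above concrete, here is a sketch with the documented defaults written out; the `mode` value and the batch/learning-rate numbers are assumptions, not recommendations.

```toml
[data]
validation_path = "data/val.jsonl"   # optional; uploaded if the file exists

[training]
mode = "full_finetune"               # optional, copied to metadata for visibility; value assumed
use_qlora = false                    # false = full finetuning

[training.validation]
enabled = true
evaluation_strategy = "steps"
eval_steps = 0
save_best_model_at_end = true
metric_for_best_model = "val.loss"
greater_is_better = false

[hyperparameters]
n_epochs = 1
per_device_batch = 1                 # illustrative
gradient_accumulation_steps = 8      # illustrative
sequence_length = 4096
learning_rate = 5e-6
warmup_ratio = 0.03
```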
Validation and errors
- Dataset path must exist; otherwise the CLI prompts/aborts
- Dataset JSONL is checked for `messages` structure
- Backend requires `compute.gpu_type`; a missing value yields HTTP 400 at job creation
Outgoing payload mapping
- `model` from `[job].model`
- `training_type = "sft_offline"`
- `hyperparameters` from `[hyperparameters]` plus selected `[training.validation]` keys
- `metadata.effective_config.compute` from `[compute]`
- `metadata.effective_config.data.topology` from `[data.topology]`
- `metadata.effective_config.training.{mode,use_qlora}` from `[training]`
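For illustration, the resulting payload roughly takes the shape below. It is rendered as TOML for readability even though the client submits JSON, and all values are assumed examples.

```toml
model = "Qwen/Qwen2.5-7B-Instruct"           # from [job].model
training_type = "sft_offline"

[hyperparameters]                            # from [hyperparameters] + promoted [training.validation] keys
n_epochs = 1
evaluation_strategy = "steps"

[metadata.effective_config.compute]          # from [compute]
gpu_type = "H100"
gpu_count = 8

[metadata.effective_config.data.topology]    # from [data.topology], forwarded verbatim

[metadata.effective_config.training]         # from [training]
mode = "full_finetune"
use_qlora = false

[metadata.effective_config.training.validation]
enabled = true
```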