Supported models
Any Qwen3-VL checkpoint (2B–235B) works with the RL stack. The registry descriptor (`backend/app/routes/simple_training/model_families/qwen3_vl.py`) adds:
- `supports_vision = true`
- `max_images_per_message = 1`
- LoRA projector targets (`mm_projector`, attention/MLP layers)
These settings are picked up by the GSPO app helpers (`backend/app/routes/clustered_training/core/algorithms/gspo/app_helpers.py`) when you pick a VL model.
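A minimal sketch of the fields the descriptor exposes; the actual class and field names in `qwen3_vl.py` may differ, so treat everything beyond the flags listed above as an assumption.

```python
# Illustrative only: field names beyond supports_vision, max_images_per_message,
# and the mm_projector target are assumptions about the real descriptor.
from dataclasses import dataclass, field


@dataclass
class Qwen3VLDescriptor:
    supports_vision: bool = True          # enables image segments in rollout messages
    max_images_per_message: int = 1       # extra images are dropped by the trainer
    lora_target_modules: list[str] = field(
        default_factory=lambda: [
            "mm_projector",                           # vision-to-LLM projector
            "q_proj", "k_proj", "v_proj", "o_proj",   # attention layers
            "gate_proj", "up_proj", "down_proj",      # MLP layers
        ]
    )
```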
Task app requirements
Use the Crafter policy (`examples/task_apps/crafter/task_app/synth_envs_hosted/envs/crafter/policy.py`) as a template:
- Detect VL models via `model_name` and set `use_vision = True`.
- Include the observation image as a data URL (or HTTPS URL) inside the user message (see the sketch after this list).
- Support `image_only_mode` to send image segments without accompanying text when desired.
The inference server consumes the `image_url` segment (`backend/app/routes/clustered_training/core/algorithms/gspo/inference/server.py`), so ensure the URL is present and fetchable.
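A minimal sketch of building such a user message with OpenAI-style content segments; the helper name and the exact `image_only_mode` handling are assumptions modelled on the behaviour described above.

```python
import base64


def build_user_message(observation_png: bytes, prompt: str, image_only_mode: bool = False) -> dict:
    """Assemble a vision user message with the observation embedded as a data URL (sketch)."""
    data_url = "data:image/png;base64," + base64.b64encode(observation_png).decode("ascii")
    content = [{"type": "image_url", "image_url": {"url": data_url}}]
    if not image_only_mode:
        # Text segment is optional when image_only_mode is enabled.
        content.append({"type": "text", "text": prompt})
    return {"role": "user", "content": content}
```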
Config checklist
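As a rough orientation, the settings discussed in this guide map onto config blocks along these lines; the section and key names below are assumptions, so check the sample config (`examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml`) for the authoritative layout.

```toml
# Illustrative sketch; key and section names are assumptions drawn from this guide.
[model]
base = "Qwen/Qwen3-VL-4B-Instruct"   # hypothetical checkpoint id
supports_vision = true
max_images_per_message = 1

[model.lora]
target_modules = ["all-linear"]      # covers mm_projector and every other linear layer

[rollout]
max_concurrent_rollouts = 4

[rollout.policy_config]
use_vision = true
image_only_mode = false
```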
Thinking variants
If you choose a `-Thinking` SKU, populate the rollout `policy_config` with the intended thinking mode:
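A minimal sketch, assuming a `thinking_mode` style key; the exact keys your policy reads from `policy_config` may differ.

```toml
[rollout.policy_config]
model_name = "Qwen/Qwen3-VL-8B-Thinking"   # hypothetical -Thinking checkpoint id
thinking_mode = "think"                     # assumed key: request <think> traces from the policy
```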
The same thinking mode is also respected during evaluation (`backend/app/routes/clustered_training/core/algorithms/gspo/evaluation/evaluator.py`).
Example workflow
- Deploy the Crafter task app (`modal deploy examples/task_apps/crafter/task_app/main.py`) with vision enabled.
- Update `examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml` with your task URL and API key secrets.
- Launch RL against the updated config (see the command sketch after this list).
- Monitor rollouts – the trainer logs dropped images if you exceed `max_images_per_message`, and vLLM reports multimodal prompt usage.
- Evaluate / deploy – reuse the same `[model]` + `[rollout]` blocks in your eval configs and Modal deployment manifests so the processor files ship with the model.
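The deployment and tracing commands above look roughly as follows; the RL launch line is only a placeholder, since the exact trainer CLI is not covered in this section.

```bash
# Deploy the vision-enabled Crafter task app on Modal.
modal deploy examples/task_apps/crafter/task_app/main.py

# Optional: keep image payloads in evaluation logs (see Tips below).
export TASKAPP_TRACING_ENABLED=1

# Launch RL with your trainer CLI against the updated config (placeholder):
# <trainer-cli> --config examples/qwen_vl/configs/crafter_rl_vision_qwen3vl4b.toml
```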
Tips
- Concurrency: Vision prompts are larger. Start with `max_concurrent_rollouts = 4` and scale cautiously.
- Topology: Use `single_node_split` and dedicate at least one GPU to vLLM and one to training; sharded models (235B) require additional GPUs.
- Data capture: Enable tracing (`TASKAPP_TRACING_ENABLED=1`) to keep image payloads in your evaluation logs.
- LoRA projector weights: When using LoRA, ensure `target_modules` includes the projector (the sample config uses `"all-linear"` to cover every linear module).