Capacity Planning for retrain¶
This page is about planning retrain runs, not generic infra sizing. Use it when deciding batch/group/token settings and campaign worker counts. Capacity guidance is here to keep experiments easy as they scale.
Where capacity shows up in retrain¶
In retrain, each step does:
- sample completions
- score rewards
- compute advantages/transforms
- run adapter training
Capacity pressure is mostly driven by:
[training].batch_size[training].group_size[training].max_tokens[campaign].parallel,max_workers,stagger_seconds[backpressure]controller settings (optional auto-tuning)
Profiles used in retrain docs¶
- Standard profile (default):
batch_size=8,group_size=16,max_tokens=10240 - Smoke-test profile (quickstart):
batch_size=2,group_size=8,max_tokens=1024(non-default)
Treat max_tokens as an upper bound, not expected generation length. Capacity should be calibrated from real runs.
Core formulas¶
Concurrency per step¶
This is the number of completions sampled per training step.
Total campaign steps¶
Estimated wall time¶
Where:
median_step_timecomes from recentstep_time_slogseffective_parallelism = min(max_workers, total_runs)whenparallel = trueeffective_parallelism = 1whenparallel = false
retrain command flow for capacity sizing¶
# 1) Create or inspect config
retrain init --template campaign
retrain explain campaign.toml
# 2) Run a small pilot (edit max_steps in TOML first)
retrain campaign.toml
# 3) Check run matrix and outputs
retrain status logs
Worker planning for retrain campaigns¶
Use these campaign keys:
Guidelines:
- Start with
max_workers = 2to confirm backend stability. - Increase workers only after pilot runs show stable latency and no API saturation.
- Use
stagger_seconds(5-15s) to smooth startup bursts on shared services. - If all workers slow down together, lower
max_workersbefore changing model/training knobs.
Tinker API throttling¶
For the Tinker backend, parallel campaigns automatically throttle concurrent API calls
via a file-based counting semaphore. This means you can safely set max_workers higher
than max_concurrent — workers will queue for API access instead of overwhelming the endpoint.
[campaign]
parallel = true
max_workers = 12 # 12 subprocesses run in parallel
[backend]
backend = "tinker"
max_concurrent = 4 # but only 4 hit the Tinker API at once
The default max_concurrent = 4 prevents the 502/504 errors that previously occurred
when many workers called Tinker simultaneously. Tune it based on your endpoint's capacity.
Calibration loop (recommended for retrain)¶
- Run a pilot: 20-50 steps, 1-2 seeds, target condition(s).
- Read
step_time_sand use the median asmedian_step_time. - Check
max_token_hit_rate: - If high, completions are being truncated; increase
max_tokensonly if quality requires it. - If low, reduce
max_tokensto lower latency and memory pressure. - Check backpressure signals (
bp_action,bp_regime,bp_p_star,bp_throughput): - Frequent
throttleor persistentretrogrademeans concurrency is too high. - Stable
holdnearoptimalis a good operating zone. - Recompute wall time with updated
median_step_timeand chosenmax_workers.
The two files you will read most often are:
logs/<run>/metrics.jsonllogs/<campaign>/manifest.json
Capacity decision checklist¶
- Is this a smoke test or a full training run?
- Is
max_tokensset intentionally for the objective? - Is campaign parallelism sized to backend limits?
- Do pilot metrics confirm expected throughput?
- Is estimated wall time acceptable before scaling seeds/conditions?