# Training Guide
This tutorial covers PPO training, self-play, and league training on battalion-level environments.
## Single-Agent PPO
```python
from envs import BattalionEnv
from training.train import train

# Train a single PPO policy on the default battalion scenario.
env = BattalionEnv()
train(env=env, total_timesteps=500_000)
```
Training hyperparameters are driven by the YAML files in `configs/`.
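As an illustration, a training config might look like the following sketch; the keys and values here are assumptions, not the repository's actual schema:

```yaml
# Hypothetical PPO config sketch; keys are illustrative only.
env:
  name: BattalionEnv
training:
  total_timesteps: 500000
  learning_rate: 0.0003
  gamma: 0.99
```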
## Self-Play
In self-play, the current policy trains against a pool of its own frozen past checkpoints. See `configs/self_play.yaml` for the canonical self-play configuration.
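A minimal sketch of the frozen-checkpoint pool idea, assuming policies can be treated as opaque snapshot objects; the class and method names here are illustrative, not the repository's API:

```python
import random

class OpponentPool:
    """Keep a bounded pool of frozen policy snapshots and sample opponents."""

    def __init__(self, max_size=10, seed=0):
        self.max_size = max_size
        self.pool = []
        self.rng = random.Random(seed)

    def add_snapshot(self, policy):
        # Freeze the current policy into the pool; evict the oldest if full.
        self.pool.append(policy)
        if len(self.pool) > self.max_size:
            self.pool.pop(0)

    def sample(self):
        # Draw a uniform-random past checkpoint to play against.
        return self.rng.choice(self.pool)
```

During training, one would periodically call `add_snapshot` with a frozen copy of the learner and `sample` at episode start to pick its opponent.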
## League Training
For a step-by-step worked example covering agent pool bootstrapping, matchmaker configuration, exploiter loops, Nash sampling, and the diversity report, see the League Training Tutorial.
For the full API reference, see `docs/league_training_guide.md`.
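As one small illustration of the Nash-sampling ingredient, opponents can be drawn in proportion to a meta-strategy weight vector; the function below is a sketch under that assumption, and the weights are taken as given inputs rather than computed here:

```python
import random

def sample_opponent(agents, nash_weights, rng=None):
    """Sample an agent id in proportion to its Nash mixture weight."""
    rng = rng or random.Random()
    total = sum(nash_weights)
    probs = [w / total for w in nash_weights]  # normalize to a distribution
    return rng.choices(agents, weights=probs, k=1)[0]
```

In an actual league, the weight vector would come from solving (or approximating) the meta-game over recent match results.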
## Experiment Tracking
All training runs must be logged to W&B (Weights & Biases).
Post the W&B run URL in your PR or [EXP] tracking issue.
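A rough sketch of the expected logging pattern; the project name and metric keys below are placeholders, not the repository's conventions, and the `wandb` calls are gated behind a flag so the sketch runs without credentials:

```python
def log_run(metrics_per_step, use_wandb=False, project="battalion-training"):
    """Log a sequence of per-step metric dicts; return what was logged."""
    run = None
    if use_wandb:
        import wandb  # assumes wandb is installed and the user is logged in
        run = wandb.init(project=project)
    logged = []
    for step, metrics in enumerate(metrics_per_step):
        if run is not None:
            run.log(metrics, step=step)  # per-step metric logging
        logged.append({"step": step, **metrics})
    if run is not None:
        run.finish()  # finalize so the run URL is printed
    return logged
```

In a real run, one would pass `use_wandb=True` and copy the printed run URL into the PR or tracking issue.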