Training Guide

This tutorial covers PPO training, self-play, and league training on battalion-level environments.

Single-Agent PPO

from envs import BattalionEnv
from training.train import train

env = BattalionEnv()
train(env=env, total_timesteps=500_000)
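As background for what `train` optimizes, PPO's core is the clipped surrogate objective. The following is a minimal standalone sketch of that loss in NumPy, not the project's actual implementation:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate objective from PPO (negated, so it can be minimized).

    ratio     -- pi_new(a|s) / pi_old(a|s) for each sampled action
    advantage -- advantage estimates for the same samples
    """
    unclipped = ratio * advantage
    # Clipping the ratio removes the incentive to move the policy
    # far outside the [1 - eps, 1 + eps] trust region.
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    return -np.mean(np.minimum(unclipped, clipped))

ratios = np.array([1.0, 1.5, 0.5])
advs = np.array([1.0, 1.0, -1.0])
loss = ppo_clip_loss(ratios, advs)
```

Note that the second sample's ratio (1.5) is clipped to 1.2 before it contributes to the objective, which is exactly the mechanism that keeps PPO updates conservative.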

Configuration is driven by YAML files in configs/:

python -m training.train --config configs/experiment_1.yaml
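The exact schema is defined by the files in configs/; as a purely illustrative sketch, a config file of this shape could drive the run above (all keys here are assumptions, not the repo's actual schema):

```yaml
# Illustrative only -- check configs/experiment_1.yaml for the real keys.
env: BattalionEnv
algo: ppo
total_timesteps: 500000
learning_rate: 0.0003
```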

Self-Play

Train a policy against a pool of frozen past checkpoints:

from training.self_play import OpponentPool, SelfPlayCallback
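To illustrate the idea behind the pool — periodically freeze a snapshot of the current policy and sample past snapshots as opponents — here is a minimal standalone sketch. The class, method names, and eviction policy below are assumptions for illustration; the real API is `OpponentPool` and `SelfPlayCallback` in training/self_play.py:

```python
import random

class FrozenPool:
    """Minimal sketch of a pool of frozen past checkpoints (illustrative only)."""

    def __init__(self, max_size=10, seed=0):
        self.checkpoints = []
        self.max_size = max_size
        self.rng = random.Random(seed)

    def add(self, checkpoint):
        """Freeze a snapshot of the current policy."""
        self.checkpoints.append(checkpoint)
        if len(self.checkpoints) > self.max_size:
            self.checkpoints.pop(0)  # evict the oldest snapshot

    def sample(self):
        """Pick a past opponent uniformly at random."""
        return self.rng.choice(self.checkpoints)

pool = FrozenPool(max_size=3)
for step in (1000, 2000, 3000, 4000):
    pool.add(f"ckpt_{step}")
opponent = pool.sample()  # one of the three newest snapshots
```

Sampling uniformly from recent snapshots is the simplest scheme; weighting harder opponents more heavily is one of the refinements league training adds.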

See configs/self_play.yaml for the canonical self-play configuration.

League Training

For a step-by-step worked example covering agent pool bootstrapping, matchmaker configuration, exploiter loops, Nash sampling, and the diversity report, see the League Training Tutorial.
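To give a flavor of what the matchmaker does, here is a sketch of prioritized fictitious self-play (PFSP) weighting, a scheme league setups commonly use alongside Nash sampling: opponents the current agent struggles against are sampled more often. The weighting function and exponent below are illustrative assumptions, not taken from this repo's code:

```python
import numpy as np

def pfsp_weights(win_rates, p=2.0):
    """PFSP-style sampling distribution over opponents.

    win_rates[i] is our win rate against opponent i; a low win rate
    (a hard opponent) maps to a high sampling probability.
    """
    w = (1.0 - np.asarray(win_rates, dtype=float)) ** p
    return w / w.sum()

# We beat opponent 0 90% of the time, but opponent 2 only 10% of the time.
probs = pfsp_weights([0.9, 0.5, 0.1])
```

With these win rates, opponent 2 dominates the sampling distribution, so training time concentrates on the matchup the agent is currently losing.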

For the full API reference see docs/league_training_guide.md.

Experiment Tracking

All training runs must be logged to W&B:

import wandb
wandb.init(project="wargames_training", config=your_config_dict)

Post the W&B run URL in your PR or [EXP] tracking issue.