Skip to content

Wargames Training — Project Roadmap

Living document. Updated by agents and maintainers. Last updated: auto-updated by weekly report agent.


Version Overview

Version Theme Status Target
v1 Foundation — 1v1 battalion ✅ Complete M4
v2 Multi-Agent — MARL 2v2+ ✅ Complete M6
v3 Hierarchy — Brigade/Division HRL ✅ Complete M8
v4 League — AlphaStar-style training ✅ Complete M10
v5 Real-World Interface & Analysis ✅ Complete M12
v6 Physics-Accurate Simulation 🔲 Planned M14
v7 Operational Scale (Corps / Army) 🔲 Planned M16
v8 Transformer Policy & Architecture 🔲 Planned M18
v9 Human-in-the-Loop & Decision Support 🔲 Planned M20
v10 Multi-Domain & Joint Operations 🔲 Planned M22
v11 Real-World Data & Transfer 🔲 Planned M24
v12 Foundation Model & Open Platform 🔲 Planned M26

v1: Foundation (Complete ✅)

Goal: A single battalion agent that reliably defeats scripted opponents in 1v1 continuous 2D battles, generalizes across randomized parameters, and has a working self-play loop.

Epics

  • [x] E1.1 — Project Bootstrap & Tooling
  • [x] E1.2 — Core Simulation Engine
  • [x] E1.3 — Gymnasium Environment (1v1)
  • [x] E1.4 — Baseline Training Loop (PPO + scripted opponent)
  • [x] E1.5 — Terrain & Environmental Randomization
  • [x] E1.6 — Reward Shaping & Curriculum Design
  • [x] E1.7 — Self-Play Implementation
  • [x] E1.8 — Evaluation Framework & Elo Tracking
  • [x] E1.9 — Visualization & Replay System
  • [x] E1.10 — v1 Documentation & Release

Milestones

  • M0: Project Bootstrap ✅
  • M1: 1v1 Competence ✅
  • M2: Terrain & Generalization ✅
  • M3: Self-Play Baseline ✅
  • M4: v1 Complete ✅

v2: Multi-Agent (Complete ✅)

Goal: Multiple battalions per side coordinate using MAPPO. Emergent flanking, fire concentration, and mutual support.

Epics

  • [x] E2.1 — PettingZoo Multi-Agent Environment
  • [x] E2.2 — MAPPO Implementation (Centralized Critic)
  • [x] E2.3 — 2v2 Curriculum
  • [x] E2.4 — Coordination Metrics & Analysis
  • [x] E2.5 — Scale to NvN (up to 6v6)
  • [x] E2.6 — Multi-Agent Self-Play
  • [x] E2.7 — v2 Documentation & Release

Milestones

  • M5: 2v2 MARL ✅
  • M6: v2 Complete ✅

Key Research Questions

  • Does parameter sharing (shared policy across all battalions) outperform independent policies?
  • What observation radius produces optimal coordination without information overload?
  • Does emergent flanking behavior appear without explicit reward for it?

Sprint Schedule (v2)

Sprint Weeks Focus Epics
S10 19–20 PettingZoo environment (2v2) E2.1
S11 21–22 MAPPO implementation E2.2
S12 23–24 2v2 curriculum + coordination metrics E2.3, E2.4
S13 25–26 NvN scaling + multi-agent self-play E2.5, E2.6
S14 27–28 v2 polish + release E2.7

v3: Hierarchical RL (Complete ✅)

Goal: Brigade and division commanders issue macro-commands to frozen battalion policies. HRL architecture matching Black (NPS 2024).

Epics

  • [x] E3.1 — SMDP / Options Framework
  • [x] E3.2 — Brigade Commander Layer
  • [x] E3.3 — Division Commander Layer
  • [x] E3.4 — Hierarchical Curriculum (bottom-up training)
  • [x] E3.5 — Temporal Abstraction Tuning
  • [x] E3.6 — Multi-Model Policy Library (per echelon)
  • [x] E3.7 — HRL Evaluation vs. Flat MARL
  • [x] E3.8 — v3 Documentation & Release

Milestones

  • M7: HRL Battalion→Brigade ✅
  • M8: v3 Complete ✅

Key Research Questions

  • Does hierarchical decomposition outperform flat MARL at scale?
  • What is the optimal temporal abstraction ratio between echelons?
  • Can brigade commanders discover novel operational maneuvers?

Sprint Schedule (v3)

Sprint Weeks Focus Epics
S15 29–30 SMDP / Options framework E3.1
S16 31–32 Brigade commander layer + HRL curriculum E3.2, E3.4
S17 33–34 Division commander + HRL evaluation E3.3, E3.7
S18 35–36 Temporal abstraction + policy library + v3 release E3.5, E3.6, E3.8

v4: League Training (Complete ✅)

Goal: AlphaStar-style league with main agents, exploiters, and league exploiters. Nash equilibrium sampling. Strategy diversity metrics.

Epics

  • [x] E4.1 — League Infrastructure (agent pool, matchmaking)
  • [x] E4.2 — Main Agent Training Loop
  • [x] E4.3 — Main Exploiter Agents
  • [x] E4.4 — League Exploiter Agents
  • [x] E4.5 — Nash Distribution Sampling
  • [x] E4.6 — Strategy Diversity Metrics
  • [x] E4.7 — Distributed Training (Ray)
  • [x] E4.8 — v4 Documentation & Release

Milestones

  • M9: League Training ✅
  • M10: v4 Complete ✅

Key Research Questions

  • Can we achieve Nash equilibrium sampling in a multi-agent wargame setting?
  • Does a diverse league produce demonstrably more robust main agents?
  • What level of distribution is required for a commercially viable league?

Sprint Schedule (v4)

Sprint Weeks Focus Epics
S19 37–38 League infrastructure + matchmaking E4.1
S20 39–40 Main agent + exploiter training E4.2, E4.3
S21 41–42 League exploiters + Nash sampling + diversity E4.4, E4.5, E4.6
S22 43–44 Distributed training (Ray/RLlib) + v4 release E4.7, E4.8

v5: Analysis & Interface (Complete ✅)

Goal: Turn the trained system into a useful wargaming tool. Human-vs-AI play, COA analysis, strategy visualization.

Epics

  • [x] E5.1 — Human-playable interface (envs/human_env.py, scripts/play.py)
  • [x] E5.2 — Course of Action (COA) generator (analysis/coa_generator.py, api/coa_endpoint.py)
  • [x] E5.3 — Strategy explainability (analysis/saliency.py, notebooks/explainability_demo.ipynb)
  • [x] E5.4 — Historical scenario validation (envs/scenarios/historical.py, configs/scenarios/historical/)
  • [x] E5.5 — Export trained policies for deployment (scripts/export_policy.py, docker/policy_server/)

Milestones

  • M11: Interface & Analysis ✅
  • M12: v5 Complete ✅

Key Research Questions

  • Do trained policies reproduce historically documented tactics?
  • Can the COA generator surface novel strategies missed by human planners?
  • How much does explainability improve operator trust in AI-generated COAs?

v6: Physics-Accurate Simulation (Planned 🔲)

Goal: Replace the abstract 2D simulation with a historically-grounded physics model: terrain elevation, line-of-sight, realistic weapon ranges and reload cycles, formation mechanics, morale cascades, supply consumption, and weather effects. This is the foundation that makes all subsequent versions tactically meaningful.

Epics

  • [ ] E6.1 — Terrain Elevation & Line-of-Sight Engine
  • [ ] E6.2 — Realistic Weapon Ranges, Accuracy & Reload Cycles
  • [ ] E6.3 — Morale, Cohesion & Rout Mechanics
  • [ ] E6.4 — Formation System (Line, Column, Square, Skirmish)
  • [ ] E6.5 — Supply, Ammunition & Fatigue Model
  • [ ] E6.6 — Weather & Time-of-Day Effects
  • [ ] E6.7 — v6 Documentation & Release

Milestones

  • M13: Physics Simulation
  • M14: v6 Complete

Key Research Questions

  • Does terrain-aware training produce qualitatively different emergent tactics?
  • What is the minimum physics fidelity required for historically-plausible agent behaviour?
  • Can agents discover fire-and-movement doctrine without explicit reward?

Sprint Schedule (v6)

Sprint Weeks Focus Epics
S26 45–46 Terrain engine + weapon system E6.1, E6.2
S27 47–48 Formations + morale + rout E6.3, E6.4
S28 49–50 Logistics + weather + v6 release E6.5, E6.6, E6.7

v7: Operational Scale — Corps Command (Planned 🔲)

Goal: Extend the HRL stack to corps level (3–5 divisions per side on a 20–50 km² map). Introduce road networks, strategic supply chains, and operational objectives (capture, interdict, fix-and-flank).

Epics

  • [ ] E7.1 — Corps-Level Operational Environment
  • [ ] E7.2 — Strategic Supply & Logistics Network
  • [ ] E7.3 — Multi-Corps Self-Play & League Extension
  • [ ] E7.4 — v7 Documentation & Release

Milestones

  • M15: Corps Command
  • M16: v7 Complete

Key Research Questions

  • Does corps-level HRL discover Napoleon's corps maneuver system independently?
  • What map scale is needed to make supply interdiction a decisive operational factor?
  • Does Nash equilibrium sampling still prevent strategy collapse at corps scale?

Sprint Schedule (v7)

Sprint Weeks Focus Epics
S29 51–52 Corps env + road network + objectives E7.1
S30 53–54 Strategic supply + corps league E7.2, E7.3
S31 55–56 v7 release E7.4

v8: Transformer Policy & Attention Architecture (Planned 🔲)

Goal: Replace fixed-size concatenation observations with variable-length entity-token sequences processed by a multi-head self-attention transformer. Add recurrent memory for fog-of-war scenarios. Systematic scaling study.

Epics

  • [ ] E8.1 — Entity-Based Observation & Transformer Policy
  • [ ] E8.2 — Memory Module (LSTM / Temporal Context)
  • [ ] E8.3 — Model Scaling & Hyperparameter Study
  • [ ] E8.4 — v8 Documentation & Release

Milestones

  • M17: Transformer Policy
  • M18: v8 Complete

Key Research Questions

  • Does entity-based transformer encoding outperform flat MLP at 8v8+?
  • Does recurrent memory provide meaningful advantage under fog-of-war?
  • What is the optimal model size for the performance–latency Pareto frontier?

Sprint Schedule (v8)

Sprint Weeks Focus Epics
S32 57–58 Entity encoder + transformer policy E8.1
S33 59–60 Memory module + scaling study + v8 release E8.2, E8.3, E8.4

v9: Human-in-the-Loop & Decision Support (Planned 🔲)

Goal: Build a web-based wargaming interface, an AI-assisted COA planning tool at corps scale, and an after-action review system with DAgger-style human feedback integration.

Epics

  • [ ] E9.1 — Interactive Web-Based Wargame Interface
  • [ ] E9.2 — AI-Assisted Course of Action (COA) Planning Tool
  • [ ] E9.3 — After-Action Review & Training Feedback Loop
  • [ ] E9.4 — v9 Documentation & Release

Milestones

  • M19: Decision Support
  • M20: v9 Complete

Key Research Questions

  • Do AI-generated COAs outperform expert human planners on novel scenarios?
  • Does DAgger-style human feedback improve performance on human-designed scenarios?
  • What level of explainability is required for operator trust in AI COAs?

Sprint Schedule (v9)

Sprint Weeks Focus Epics
S34 61–62 Web interface + AI policy server E9.1
S35 63–64 COA tool + AAR + v9 release E9.2, E9.3, E9.4

v10: Multi-Domain & Joint Operations (Planned 🔲)

Goal: Add naval units, elevate cavalry and artillery to independent operational arms, and enable joint combined-arms operations (land + sea).

Epics

  • [ ] E10.1 — Naval Unit Type & Coastal Operations
  • [ ] E10.2 — Cavalry Arm as Independent Maneuver Force
  • [ ] E10.3 — Artillery Arm: Grand Battery & Counter-Battery
  • [ ] E10.4 — v10 Documentation & Release

Milestones

  • M21: Joint Operations
  • M22: v10 Complete

Key Research Questions

  • Does emergent joint combined-arms doctrine appear without explicit reward?
  • Does naval fire support change amphibious assault tactics discovered by agents?
  • Does cavalry reconnaissance measurably improve corps-level decision quality?

Sprint Schedule (v10)

Sprint Weeks Focus Epics
S36 65–66 Naval units + coastal operations E10.1
S37 67–68 Cavalry corps + grand battery + v10 release E10.2, E10.3, E10.4

v11: Real-World Data & Transfer (Planned 🔲)

Goal: Import 50+ historical Napoleonic battle OOBs and GIS terrain for real battle sites. Collect expert demonstrations via the v9 interface. Validate and fine-tune agents on real-world data.

Epics

  • [ ] E11.1 — Historical Battle Database & Scenario Importer
  • [ ] E11.2 — GIS Terrain Import (real-world maps)
  • [ ] E11.3 — Expert Demonstration Collection & Imitation Learning
  • [ ] E11.4 — v11 Documentation & Release

Milestones

  • M23: Real-World Transfer
  • M24: v11 Complete

Key Research Questions

  • Does zero-shot transfer to real terrain lose less than 20 % win rate vs. procedural?
  • Does behaviour cloning from expert demonstrations accelerate RL convergence?
  • Do agents reproduce historical Napoleonic maneuvers on real battlefield terrain?

Sprint Schedule (v11)

Sprint Weeks Focus Epics
S38 69–70 Historical database + GIS import E11.1, E11.2
S39 71–72 Expert demonstrations + BC pre-training + v11 release E11.3, E11.4

v12: Foundation Model & Open Research Platform (Planned 🔲)

Goal: Train WFM-1 — a single large transformer policy generalising across all scenarios, scales, and unit types. Open-source the full stack as a reproducible research benchmark (WargamesBench). Submit the system paper.

Epics

  • [ ] E12.1 — Wargames Foundation Model (WFM-1)
  • [ ] E12.2 — Open Research Platform & Public Benchmark (WargamesBench)
  • [ ] E12.3 — Modern-Era Extension (v12 Stretch Goal)
  • [ ] E12.4 — v12 Documentation, Paper & Release

Milestones

  • M25: Foundation Model
  • M26: v12 Complete

Key Research Questions

  • Can a single foundation model generalise across battalion, brigade, and corps echelons?
  • Does multi-task training on all scenario types improve zero-shot generalisation?
  • Can WFM-1 transfer to a new historical era (WW1) with < 50k fine-tuning steps?

Sprint Schedule (v12)

Sprint Weeks Focus Epics
S40 73–74 WFM-1 architecture + multi-task training E12.1
S41 75–76 Open platform + paper + v12 LTS release E12.2, E12.3, E12.4

Sprint Schedule

Sprints are 2 weeks. Sprint planning happens every other Monday.

v1 Sprints (Active)

Sprint Weeks Focus
S01 1–2 Bootstrap, env setup, first training run
S02 3–4 Sim engine, combat resolution
S03 5–6 Reward shaping, curriculum L1–L3
S04 7–8 Terrain system, curriculum L4
S05 9–10 Generalization, randomized params
S06 11–12 Self-play infrastructure
S07 13–14 Self-play training, Elo tracking
S08 15–16 Evaluation, visualization, v1 polish
S09 17–18 v1 release, v2 planning

v2 Sprints (Planned)

Sprint Weeks Focus
S10 19–20 PettingZoo multi-agent environment (2v2)
S11 21–22 MAPPO implementation
S12 23–24 2v2 curriculum + coordination metrics
S13 25–26 NvN scaling + multi-agent self-play
S14 27–28 v2 polish + release

v3 Sprints (Planned)

Sprint Weeks Focus
S15 29–30 SMDP / Options framework
S16 31–32 Brigade commander layer + HRL curriculum
S17 33–34 Division commander + HRL evaluation
S18 35–36 Temporal abstraction + policy library + v3 release

v4 Sprints (Planned)

Sprint Weeks Focus
S19 37–38 League infrastructure + matchmaking
S20 39–40 Main agent + exploiter training
S21 41–42 League exploiters + Nash sampling + diversity
S22 43–44 Distributed training (Ray/RLlib) + v4 release

v5 Sprints (Complete ✅)

Sprint Weeks Focus
S23 (v5) Human interface + COA generator
S24 (v5) Explainability + historical validation
S25 (v5) Policy export + v5 release

v6 Sprints (Planned)

Sprint Weeks Focus
S26 45–46 Terrain engine + weapon system
S27 47–48 Formations + morale + rout
S28 49–50 Logistics + weather + v6 release

v7 Sprints (Planned)

Sprint Weeks Focus
S29 51–52 Corps env + road network + objectives
S30 53–54 Strategic supply + corps league
S31 55–56 v7 release

v8 Sprints (Planned)

Sprint Weeks Focus
S32 57–58 Entity encoder + transformer policy
S33 59–60 Memory module + scaling study + v8 release

v9 Sprints (Planned)

Sprint Weeks Focus
S34 61–62 Web interface + AI policy server
S35 63–64 COA tool + AAR + v9 release

v10 Sprints (Planned)

Sprint Weeks Focus
S36 65–66 Naval units + coastal operations
S37 67–68 Cavalry corps + grand battery + v10 release

v11 Sprints (Planned)

Sprint Weeks Focus
S38 69–70 Historical database + GIS import
S39 71–72 Expert demonstrations + BC pre-training + v11 release

v12 Sprints (Planned)

Sprint Weeks Focus
S40 73–74 WFM-1 architecture + multi-task training
S41 75–76 Open platform + paper + v12 LTS release