07. Multi-Agent Workflows and Agentic Swarms#
In the real world, no single agent can solve every problem optimally. As tasks grow in uncertainty, dimensionality, and interdependence — such as strategy games, simulations, robotics, or real-time business systems — we naturally evolve from single-agent reasoning to multi-agent workflows. In this tutorial, we see the first sparks of Super Agents. An AI Super-Agent is an orchestration system that coordinates multiple specialized AI agents to solve complex problems requiring diverse capabilities.
These workflows mirror how humans collaborate:
🗳️ Democratic committees balance diverse perspectives.
🧭 Hierarchical managers coordinate specialists under limited resources.
⚖️ Actor-Critic systems separate exploration (actor) from judgment (critic).
Each pattern encodes a different philosophy of coordination — distributing intelligence across specialized roles that communicate, negotiate, and arbitrate toward a shared goal.
⚙️ What Are Multi-Agent Workflows?#
A multi-agent workflow is a structured network of reasoning and action nodes — planners, evaluators, arbiters, memory modules — that interact through explicit channels rather than a single monolithic prompt.
Think of it as a graph of decision-making where:
Nodes = agents (LLMs, heuristics, or functions).
Edges = communication or dependency between them.
Memory = shared context that persists across steps.
Arbitration = how conflicting opinions are resolved.
This structure enables:
Parallel specialization (multiple evaluators in parallel).
Conditional routing (managers deciding who to consult).
Resource budgeting (decide when to skip expensive reasoning).
Explainability & debugging (explicit traces of who decided what).
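Stripped of any particular framework, this is just a graph of typed callables over shared memory. Here is a minimal, self-contained sketch (the `Workflow` class and node names are illustrative, not a library API):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Workflow:
    """Toy decision graph: nodes transform a shared memory dict, order gives the edges."""
    nodes: Dict[str, Callable[[dict], dict]] = field(default_factory=dict)
    order: List[str] = field(default_factory=list)

    def run(self, memory: dict) -> dict:
        for name in self.order:
            memory = self.nodes[name](memory)  # each agent reads and writes shared context
        return memory

wf = Workflow(
    nodes={
        "planner": lambda m: {**m, "candidates": ["move_a", "switch_b"]},
        "critic": lambda m: {**m, "scores": [0.8, 0.3]},
        # arbitration: pick the candidate with the highest critic score
        "arbiter": lambda m: {**m, "decision": m["candidates"][m["scores"].index(max(m["scores"]))]},
    },
    order=["planner", "critic", "arbiter"],
)
print(wf.run({})["decision"])  # -> move_a
```

The patterns in this tutorial replace the lambdas with LLM agents and the dict with a validated Pydantic state, but the shape (nodes, edges, memory, arbitration) stays the same.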
🧩 Enter Pydantic Graphs#
Building and managing these interactions manually is painful — tracking state, type safety, branching, and parallel execution can get messy fast.
Pydantic Graphs solves this elegantly by combining:
✅ Typed data flow from Pydantic models — ensuring every node’s input and output are structured, validated, and traceable.
🕸️ Graph orchestration — defining agents and their dependencies as composable, inspectable workflows.
🔁 Parallel & conditional execution — automatically handling fan-out (multiple evaluators) and routing logic (manager/critic decisions).
🧾 Transparent traces — every step’s inputs, outputs, and reasoning can be logged, visualized, and replayed.
Together, they turn the messy spaghetti of agent calls into a declarative decision graph — a scalable foundation for complex, memory-aware, multi-agent systems.
🧩 What is poke-env?#
poke-env is a Python interface to the Pokémon Showdown battle simulator, providing an environment for reinforcement learning and AI experiments.
It exposes each battle as a structured API — giving access to game state (Pokémon, moves, types, HP, etc.) and allowing agents to pick legal actions programmatically.
In our workflow, we’ll use poke-env as the testbed to:
⚔️ Pit different multi-agent strategies (democratic, manager, actor-critic) against each other.
📊 Compare performance through metrics like win rate, turns survived, and move efficiency.
🧠 Benchmark reasoning styles — seeing how coordination strategies translate into competitive outcomes.
Before running experiments, we’ll start a local Pokémon Showdown server instance. This spins up a self-contained battle environment where our agents can safely train, plan, and battle — making Pokémon the perfect arena for testing agentic intelligence in action.
from src.pokemon_showdown_setup import run_pokemon_showdown
pokemon_container = run_pokemon_showdown()
🟢 Container already running: pokemon-showdown (4a670e8e0ee1)
🧪 Getting Started with poke-env#
Before building our custom multi-agent workflows, let’s first understand how the poke-env battle environment works.
It allows us to easily simulate Pokémon battles between automated agents — here, we’ll start with two simple RandomPlayer agents that pick legal moves at random.
By running a quick cross-evaluation, we can see how poke-env orchestrates matches, tracks results, and reports win rates — forming the foundation on which our more sophisticated, reasoning-based agents will later compete.
from poke_env.player.player import Player
from poke_env import RandomPlayer, cross_evaluate
from tabulate import tabulate
first_player = RandomPlayer()
second_player = RandomPlayer()
players = [first_player, second_player]
async def test_cross_evaluation(players, n_challenges=5):
    cross_evaluation = await cross_evaluate(players, n_challenges=n_challenges)
    table = [["-"] + [p.username for p in players]]
    for p_1, results in cross_evaluation.items():
        table.append([p_1] + [cross_evaluation[p_1][p_2] for p_2 in results])
    return tabulate(table)

print(await test_cross_evaluation(players))
-------------- -------------- --------------
- RandomPlayer 1 RandomPlayer 2
RandomPlayer 1 0.4
RandomPlayer 2 0.6
-------------- -------------- --------------
Let’s see an example battle in action.
You can also view the full battle here.
⚡ Creating a “Max Damage” Baseline#
To add another simple benchmark beyond the RandomPlayer, we’ll define a MaxDamagePlayer — an agent that always selects the move with the highest base power.
This gives us a more deterministic and aggressive baseline that prioritizes raw damage output over safety or strategy. By comparing our Pydantic AI agent against both Random and MaxDamage players, we can see whether reasoning and memory-aware planning lead to better decision-making than brute-force move selection.
class MaxDamagePlayer(Player):
    def choose_move(self, battle):
        if battle.available_moves:
            # Pick the legal move with the highest base power
            best_move = max(battle.available_moves, key=lambda move: move.base_power)
            if battle.can_tera:
                return self.create_order(best_move, terastallize=True)
            return self.create_order(best_move)
        else:
            # No attacking moves available (e.g., forced switch): fall back to random
            return self.choose_random_move(battle)
players = [first_player, MaxDamagePlayer()]
print(await test_cross_evaluation(players))
----------------- -------------- -----------------
- RandomPlayer 1 MaxDamagePlayer 1
RandomPlayer 1 0.0
MaxDamagePlayer 1 1.0
----------------- -------------- -----------------
🎮 Pokémon battle mechanics — and how we encode them for our agents#
Let’s now build a simple agent that does the same thing: use the battle context to choose the optimal action.
Core mechanics (what the agent must reason about):
Turn-based actions: Each turn you either use a move or switch. Faster Pokémon usually act first; priority can override speed.
Types & STAB: Moves have types (e.g., Electric). Effectiveness depends on attacker vs defender types; using a move matching the user’s type grants STAB (bonus damage).
Accuracy & PP: Moves can miss (accuracy < 100) and have limited PP (uses).
HP & fainting: A Pokémon faints at 0 HP; you win by fainting all of the opponent’s Pokémon.
Information limits: You only know the opponent’s revealed Pokémon and partial info about their sets.
Switching & tempo: Switching preserves a weakened Pokémon, but concedes tempo (opponent gets a “free” hit).
Status/hazards/weather (omitted here for brevity): These exist in the simulator; we can add them later as fields.
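The mechanics above can be folded into a crude per-move score. The sketch below is a toy heuristic, not the simulator’s actual damage formula, and its type chart covers only the matchups shown; it just illustrates how base power, accuracy, STAB, and effectiveness combine:

```python
# Toy move-scoring heuristic (NOT the real damage formula).
# Partial, illustrative type chart: (attacking type, defending type) -> multiplier.
EFFECTIVENESS = {
    ("electric", "water"): 2.0,
    ("electric", "ground"): 0.0,   # immunity
    ("fire", "grass"): 2.0,
}

def expected_score(base_power, accuracy, move_type, user_types, defender_types):
    stab = 1.5 if move_type in user_types else 1.0        # same-type attack bonus
    eff = 1.0
    for d in defender_types:                              # dual types multiply
        eff *= EFFECTIVENESS.get((move_type, d), 1.0)
    return base_power * accuracy * stab * eff             # accuracy discounts misses

# Thunderbolt (Electric, 90 BP, 100% accuracy) from an Electric user vs a Water type:
print(expected_score(90, 1.0, "electric", ["electric"], ["water"]))  # 270.0
```

This is roughly the reasoning we will ask the LLM to do implicitly; encoding the inputs explicitly in the context schema is what makes that possible.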
🧱 Our context schema (how we feed the LLM the game state)#
We transform poke-env’s Battle into a typed, LLM-friendly snapshot:
TeamMon: one entry per Pokémon (both sides) with: species, fractional hp, fainted, and types.
MoveOption: one entry per legal move this turn with: move_id, base_power, accuracy, move_type, pp, priority.
SwitchOption: one entry per legal switch target with: species, hp, fainted, types.
AgentContext: the full decision frame the agent sees:
turn: current turn number.
you_active / opp_active: currently active Pokémon on both sides.
you_team: your full team (known).
opp_known: only revealed opponent Pokémon (respecting partial observability).
legal_moves / legal_switches: the only actions you may take now.
past_actions: a short episodic memory string list (e.g., summaries of last turns).
🛠️ How the code builds this context#
_pokemon_to_teammon(p) safely converts a poke-env Pokemon into our TeamMon schema (species, hp%, types).
In build_context(battle, past_actions):
We iterate battle.available_moves to populate MoveOption (capturing damage proxies via base power, reliability via accuracy, tempo via priority, and resources via PP).
We iterate battle.available_switches to populate SwitchOption (capturing survivability options).
We map your full battle.team into you_team and the opponent’s revealed team into opp_known (partial info).
We capture actives (you_active, opp_active) and the turn counter.
We attach past_actions so the LLM can reason with short-term memory.
agent_context_to_string(ctx) serializes the AgentContext to pretty JSON, ideal for prompting an LLM agent.
Result: every decision step provides a compact, validated, and complete view of what matters now, aligning game mechanics with agent reasoning (damage, risk, tempo, information, and legal constraints).
from __future__ import annotations
from typing import List, Optional, Dict, Any, Literal
from dataclasses import dataclass
from pydantic import BaseModel, Field
from pydantic_ai import Agent
from poke_env.battle.battle import Battle, Pokemon
class TeamMon(BaseModel):
    species: str
    hp: Optional[float] = None
    fainted: bool = False
    types: List[str] = []
    boosts: Optional[Dict[str, int]] = None
    status: Optional[str] = None
    must_recharge: Optional[bool] = None

class MoveOption(BaseModel):
    move_id: str
    base_power: Optional[int] = None
    accuracy: Optional[float] = None
    move_type: Optional[str] = None
    pp: Optional[int] = None
    priority: int = 0

class SwitchOption(BaseModel):
    species: str
    hp: Optional[float] = None
    fainted: bool = False
    types: List[str] = []

class AgentContext(BaseModel):
    turn: int
    weather: Dict[str, Any]
    # You
    you_active: Optional[str]
    you_team: List[TeamMon]
    # Opponent
    opp_active: Optional[str]
    opp_known: List[TeamMon]
    # Legals
    legal_moves: List[MoveOption]
    legal_switches: List[SwitchOption]
    # Short episodic memory (last few actions / summaries)
    past_actions: List[str] = []

def _pokemon_to_teammon(p: Pokemon) -> TeamMon:
    return TeamMon(
        species=p.species,
        hp=p.current_hp_fraction,
        fainted=p.fainted,
        boosts=p.boosts,
        status=p.status.name if p.status else None,  # Status enum -> plain string for the schema
        must_recharge=p.must_recharge,
        types=[t.name for t in p.types or []],
    )
def build_context(battle: Battle, past_actions: List[str]) -> AgentContext:
    # legal moves
    legal_moves: List[MoveOption] = []
    for m in battle.available_moves:
        legal_moves.append(MoveOption(
            move_id=m.id,
            base_power=m.base_power,
            accuracy=m.accuracy,
            move_type=m.type.name,
            pp=m.current_pp,
            priority=m.priority,
        ))

    # legal switches
    legal_switches: List[SwitchOption] = []
    for p in battle.available_switches:
        legal_switches.append(SwitchOption(
            species=p.species,
            hp=p.current_hp_fraction,
            fainted=p.fainted,
            types=[t.name for t in (p.types or [])],
        ))

    # teams
    your_team = [_pokemon_to_teammon(poke) for poke in battle.team.values()]
    opp_known = [_pokemon_to_teammon(poke) for poke in battle.opponent_team.values() if poke._revealed]  # revealed only

    return AgentContext(
        turn=battle.turn,
        weather={str(k): v for k, v in battle.weather.items()},  # Weather enum keys -> strings
        you_active=battle.active_pokemon.species,
        you_team=your_team,
        opp_active=battle.opponent_active_pokemon.species,
        opp_known=opp_known,
        legal_moves=legal_moves,
        legal_switches=legal_switches,
        past_actions=past_actions,
    )

def agent_context_to_string(ctx: AgentContext) -> str:
    return ctx.model_dump_json(indent=2)
🤖 A minimal “thinking player”#
Goal: turn the JSON context we built into a single legal action (move or switch) using a typed LLM agent, and keep a tiny episodic memory of what we did.
1) Structured output contract = Decision
We define a Pydantic schema that the LLM must fill:
kind: "move" or "switch".
move_id / switch_species: only one is required, depending on kind.
rationale: short explanation (useful for logs and later learning).
This keeps the model honest and makes post-processing trivial.
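The cross-field requirement ("only one is required depending on kind") can itself be enforced by the schema. This is an optional tightening, not part of the tutorial's code; it assumes pydantic v2's model_validator:

```python
from typing import Literal, Optional
from pydantic import BaseModel, model_validator

class StrictDecision(BaseModel):
    """Variant of the Decision contract that rejects inconsistent field combinations."""
    kind: Literal["move", "switch"]
    move_id: Optional[str] = None
    switch_species: Optional[str] = None
    rationale: str

    @model_validator(mode="after")
    def check_required_field(self):
        # The field matching `kind` must be present; the other may stay None.
        if self.kind == "move" and not self.move_id:
            raise ValueError("kind='move' requires move_id")
        if self.kind == "switch" and not self.switch_species:
            raise ValueError("kind='switch' requires switch_species")
        return self

StrictDecision(kind="move", move_id="thunderbolt", rationale="STAB, super-effective")  # validates
# StrictDecision(kind="switch", rationale="retreat")  # would raise a ValidationError
```

With structured outputs, a validation error like this is surfaced to the LLM as a retry prompt, so the model gets a chance to correct itself instead of silently producing an unusable decision.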
2) The LLM-powered player = PydanticLLMPlayer
Extends poke-env's Player.
Sets up a Pydantic AI Agent (self.battle_agent) with:
A system prompt encoding a simple policy: prefer high-accuracy, super-effective moves; switch if danger is high or moves are poor; never invent illegal actions.
output_type=Decision, so the model must return a valid, typed object.
3) Decision loop = choose_move(...)
Build context: ctx = build_context(battle, past_actions=self._past_actions) serializes the current game state + short episodic memory.
Call the agent: decision = self.battle_agent.run_sync(agent_context_to_string(ctx)).output; the LLM reads the JSON and returns a validated Decision.
Legality mapping:
If kind == "move", find the exact move_id in battle.available_moves and create_order(m).
If kind == "switch", match switch_species in battle.available_switches and create_order(p).
We append a human-readable summary to _past_actions for the next turn's context.
Safety fallback If, for any reason, the decision isn’t legal (should be rare), we choose a random legal action so the game continues.
4) Why this works well
Typed outputs remove prompt-engineering brittleness (no regex parsing or guesswork).
Context → Decision → Action is clean, auditable, and easy to extend (plug in evaluators/critics later).
The episodic memory (_past_actions) gives the agent short-term continuity across turns without blowing up context size.
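One practical refinement (an assumption on our part, not something the tutorial's player does) is to bound that memory explicitly, e.g. with a deque, so the serialized context cannot grow without limit over a long battle. The maxlen of 6 here is an arbitrary illustrative budget:

```python
from collections import deque

# Sketch: cap episodic memory so the serialized context stays small.
past_actions = deque(maxlen=6)  # keeps only the 6 most recent entries
for turn in range(1, 11):
    past_actions.append(f"T{turn}: MOVE thunderbolt")

print(list(past_actions))  # entries for T5..T10; T1..T4 were evicted
```

Because deque silently drops the oldest entries, the prompt size per turn stays roughly constant regardless of battle length.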
from rich import print as rprint
import nest_asyncio
import logfire
from poke_env import Player, RandomPlayer
logfire.configure(send_to_logfire=False) # set to true if you want to use logfire console
nest_asyncio.apply()
class Decision(BaseModel):
    kind: Literal["move", "switch"] = Field(description="Choose 'move' or 'switch'.")
    move_id: Optional[str] = Field(default=None, description="Required if kind == 'move'")
    switch_species: Optional[str] = Field(default=None, description="Required if kind == 'switch'")
    rationale: str

class PydanticLLMPlayer(Player):
    def __init__(self, name: str, model: str = "openrouter:openai/gpt-4o-mini", **kwargs):
        super().__init__(**kwargs)
        self.name = name
        self._past_actions: List[str] = []
        self.battle_agent = Agent(
            model=model,
            system_prompt=(
                "You are a Pokémon battle planner. "
                "Given the current battle context, choose ONE legal action. "
                "Prefer high-accuracy, super-effective moves; "
                "switch if active Pokémon risks being KO'd or has no good moves. "
                "Never invent illegal actions."
            ),
            output_type=Decision,
        )

    def choose_move(self, battle: Battle):
        # Build structured context for the agent
        ctx = build_context(battle, past_actions=self._past_actions)
        # Run agent to get decision
        decision = self.battle_agent.run_sync(agent_context_to_string(ctx)).output
        if battle.turn <= 1:
            rprint("CONTEXT:", ctx)
            rprint("DECISION:", decision)
        else:
            rprint(f"T{ctx.turn} DECISION:", decision.rationale)

        # Map Decision -> poke-env action
        if decision.kind == "move":
            # find the matching legal move
            for m in battle.available_moves:
                if m.id == decision.move_id:
                    self._past_actions.append(
                        f"T{ctx.turn}: MOVE {decision.move_id} ({ctx.you_active} vs {ctx.opp_active})"
                    )
                    return self.create_order(m)
        elif decision.kind == "switch":
            # find the matching legal switch
            for p in battle.available_switches:
                if p.species == decision.switch_species:
                    self._past_actions.append(
                        f"T{ctx.turn}: SWITCH to {decision.switch_species} (from {ctx.you_active})"
                    )
                    return self.create_order(p)

        # Fallback: if the agent suggested an illegal action (shouldn't happen), choose randomly
        self._past_actions.append(f"T{ctx.turn}: FALLBACK random")
        return self.choose_random_move(battle)
Logfire project URL: https://logfire-eu.pydantic.dev/shreshthtuli/agenticai
⚔️ Running our first Agentic battle#
Now that we’ve built our LLM-powered Pokémon agent, it’s time to see it in action!
Here we instantiate the PydanticLLMPlayer and let it battle a RandomPlayer for a single match.
When the battle runs:
Each turn, the agentic_player builds a structured AgentContext (game state + short memory).
The LLM agent reasons over that context and outputs a typed Decision (move or switch).
The environment executes that decision, updates the game state, and loops until one side has fainted all of the other's Pokémon.
This quick match serves as a smoke test — verifying that our agent can read the environment, reason with context, and select legal actions correctly before we scale up to multi-agent graphs and tournaments.
agentic_player = PydanticLLMPlayer(name="LLM Agent")
await agentic_player.battle_against(RandomPlayer(), n_battles=1)
CONTEXT: AgentContext( turn=1, weather={}, you_active='clawitzer', you_team=[ TeamMon( species='clawitzer', hp=1.0, fainted=False, types=['WATER'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='volcarona', hp=1.0, fainted=False, types=['BUG', 'FIRE'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='delphox', hp=1.0, fainted=False, types=['FIRE', 'PSYCHIC'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='golduck', hp=1.0, fainted=False, types=['WATER'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='raichualola', hp=1.0, fainted=False, types=['ELECTRIC', 'PSYCHIC'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='sneasler', hp=1.0, fainted=False, types=['FIGHTING', 'POISON'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ) ], opp_active='toedscruel', opp_known=[ TeamMon( species='toedscruel', hp=1.0, fainted=False, types=['GROUND', 'GRASS'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ) ], legal_moves=[ MoveOption(move_id='aurasphere', base_power=80, accuracy=1.0, move_type='FIGHTING', pp=32, priority=0), MoveOption(move_id='uturn', base_power=70, accuracy=1.0, move_type='BUG', pp=32, priority=0), MoveOption(move_id='dragonpulse', base_power=85, accuracy=1.0, move_type='DRAGON', pp=16, priority=0), MoveOption(move_id='waterpulse', base_power=60, accuracy=1.0, move_type='WATER', pp=32, priority=0) ], legal_switches=[ SwitchOption(species='volcarona', 
hp=1.0, fainted=False, types=['BUG', 'FIRE']), SwitchOption(species='delphox', hp=1.0, fainted=False, types=['FIRE', 'PSYCHIC']), SwitchOption(species='golduck', hp=1.0, fainted=False, types=['WATER']), SwitchOption(species='raichualola', hp=1.0, fainted=False, types=['ELECTRIC', 'PSYCHIC']), SwitchOption(species='sneasler', hp=1.0, fainted=False, types=['FIGHTING', 'POISON']) ], past_actions=[] )
DECISION: Decision( kind='move', move_id='aurasphere', switch_species=None, rationale='Aurasphere is a Fighting-type move and is super-effective against Toedscruel, which is part Ground-type. It has 100% accuracy, making it a reliable choice to potentially knock out the opponent.' )
T2 DECISION: Aurasphere is a high-accuracy Fighting-type move that is super effective against Toedscruel's Water-type, making it the best choice to maximize damage.
T3 DECISION: Aurasphere is a high-accuracy move that is super-effective against Duraludon, which is currently at low health (15% HP). This move can potentially knock it out.
T4 DECISION: Clawitzer has used 'aurasphere' for the last three turns against Water-type opponents and its health is reduced. Switching to Volcarona, which has full HP and is not at risk of being KO'd, allows for a fresh offensive strategy next turn.
T5 DECISION: Using Fire Blast as it is a high-power move and super-effective against the opposing Water/Dark type Samurott, making it an optimal attack choice.
T6 DECISION: Fire Blast is a high-powered and super-effective move against Toedscruel, which is a Water type Pokémon. It has good accuracy (85%), and with Volcarona's remaining HP being full, it can afford to make this attack.
T7 DECISION: Fire Blast is super-effective against Samurott Hisuian, which is a dual Water/Dark type Pokémon. With Volcarona at full HP (1.0), using Fire Blast takes advantage of its high power and accuracy, which is beneficial given the circumstances.
T8 DECISION: Fire Blast is a high-accuracy move that is super effective against Samurott-Hisui, which is currently low on HP. This move can potentially knock it out and maximize the chances to win this turn.
T9 DECISION: Using Fire Blast is a high-accuracy, high-damage (110 base power) move against Okidogi, which is weak to Fire-type attacks, making it super effective.
T10 DECISION: Volcarona is at very low health (3.5%), making it highly vulnerable to being knocked out. Switching to Delphox, which is at full health (100%), will provide a safer option for the battle.
T11 DECISION: Psyshock is a high-accuracy move that is super-effective against Cramorant's Psychic-type weakness. It deals 80 base power damage and has a 100% chance to hit, making it the best option.
T12 DECISION: Fire Blast is a high-damage Fire-type move, which is super-effective against Cramorant's Grass-type aspect, maximizing my damage potential in this turn.
T13 DECISION: Psyshock is a high-accuracy (100%) Psychic type move that is super-effective against Okidogi (Poison/Fighting). This move will deal substantial damage.
T14 DECISION: Psyshock is a high-accuracy move and is super effective against Glaceon's Ice type, making it the best option to deal significant damage.
T15 DECISION: Fire Blast is a high-power, high-accuracy move that is still viable. While Psyshock is super effective against Glaceon because it is an Ice type, Fire Blast also deals significant damage and has a chance to burn, offering additional tactical advantages.
T16 DECISION: Psyshock is a high-accuracy Psychic-type move and is super-effective against Toedscruel, which is a Water-type Pokémon. Using it can potentially KO Toedscruel, allowing for a better position in the battle.
T17 DECISION: Psyshock is a high-accuracy move that is super effective against Toedscruel's Psychic typing, and it can potentially knock it out given its low health.
T18 DECISION: Psyshock is a high-accuracy Psychic move that is super effective against Cramorant, which is weak to Psychic-type moves. Delphox also has a good chance of knocking out Cramorant given its current HP.
T19 DECISION: Psyshock is a high-accuracy and super-effective move against Cramorant, which has low HP. It's likely to knock it out and secure a significant advantage in the battle.
T20 DECISION: Psyshock is a high-accuracy move (100%) with a good base power (80) that can deal effective damage to Okidogi, which does not resist Psychic-type moves.
🕸️ Introducing Pydantic Graphs — the foundation for structured multi-agent workflows#
So far, our agent acted as a single decision-maker: it observed context, reasoned once, and returned a move. But as environments grow in complexity — multiple objectives, conflicting strategies, limited time — we need many specialized agents working together.
That’s where 🧩 Pydantic Graphs come in.
⚙️ What are Pydantic Graphs?#
Pydantic Graphs extend the idea of typed LLM workflows: instead of chaining prompts manually, you define a graph of agents — each node is a typed, callable component (Agent, Tool, or function), and edges represent how their structured outputs flow into each other.
Each node’s input/output types are enforced by Pydantic models, guaranteeing that ✅ every agent receives valid structured data, ✅ workflows are composable, debuggable, and inspectable, ✅ and parallel/conditional execution (“run these 3 evaluators in parallel”) becomes trivial.
🤝 Why multi-agent workflows?#
Real decision problems rarely have one “best” heuristic — they’re multi-objective:
Tactical reward vs safety (damage vs survivability)
Short-term payoff vs long-term setup
Exploration vs exploitation
Multi-agent graphs let you distribute cognition:
Each node/agent handles a sub-skill (planner, tactician, risk, scout).
Coordination logic (e.g., a manager or arbiter) fuses their reasoning.
Memory and arbitration layers can be swapped independently (for ablations).
This architecture naturally scales to agentic swarms — large ensembles of specialized agents that coordinate dynamically, forming emergent intelligence beyond a single model’s scope.
🔀 Static vs Dynamic Query Routing#
In a basic “Manager” agent, the routing (which specialists to call) is static, hard-coded in advance: “always call the Tactician, call Risk if danger > 0.6, call the Scout every 3 turns”.
Dynamic routing, enabled by Pydantic Graphs, makes this adaptive:
Each agent’s outputs (or intermediate metadata like uncertainty, cost, or confidence) can dynamically decide the next edges to traverse.
If the planner returns low-confidence moves, the graph might automatically trigger the Risk Officer or Critic path.
If confidence is high, it can skip extra steps to save latency or tokens.
🧩 Benefit: Resource-aware, self-adapting workflows that scale gracefully — the system “thinks harder” only when needed.
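Framework details aside, the routing decision itself is just a conditional on the planner's metadata. A plain-Python sketch of the policy (all names and thresholds are illustrative assumptions, not part of any library):

```python
def route_after_planner(confidence: float, budget_tokens: int) -> list:
    """Decide which specialist paths to traverse next (illustrative thresholds)."""
    path = ["tactician"]                 # always get a value estimate
    if confidence < 0.5:
        path += ["risk", "critic"]       # low confidence -> think harder
    if budget_tokens < 1000:
        path = ["tactician"]             # tight budget -> cheapest path only
    return path

print(route_after_planner(confidence=0.3, budget_tokens=5000))  # ['tactician', 'risk', 'critic']
print(route_after_planner(confidence=0.9, budget_tokens=500))   # ['tactician']
```

In Pydantic Graph terms, this corresponds to giving a node's run() method a union return type over several possible next nodes and returning whichever branch the metadata selects.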
✏️ Query Rewriting#
Another advanced feature is query rewriting — when incoming queries or contexts are transformed before being passed to downstream agents. In Pokémon terms, before the planner decides, a context rewriter might:
Simplify redundant details (“ignore irrelevant side conditions”), or
Add derived features (“this move is likely super-effective against Water”).
This lets different specialists receive domain-specific representations of the same state, improving efficiency and interpretability.
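As a concrete illustration, a rewriter for the tactician might keep only the turn counter and a ranked view of the legal moves, with a derived "expected value" feature. This is a hypothetical sketch over plain dicts shaped like our context schema, not code from the tutorial's pipeline:

```python
def rewrite_for_tactician(ctx: dict) -> dict:
    """Sketch: derive the features a tactician cares about and drop the rest."""
    return {
        "turn": ctx["turn"],
        # derived feature: expected power = base_power * accuracy (misses discounted)
        "moves_ranked": sorted(
            ({"id": m["move_id"], "ev": (m["base_power"] or 0) * (m["accuracy"] or 1.0)}
             for m in ctx["legal_moves"]),
            key=lambda m: m["ev"], reverse=True,
        ),
    }

ctx = {"turn": 3, "legal_moves": [
    {"move_id": "waterpulse", "base_power": 60, "accuracy": 1.0},
    {"move_id": "hydropump", "base_power": 110, "accuracy": 0.8},
]}
print(rewrite_for_tactician(ctx)["moves_ranked"][0]["id"])  # hydropump (110*0.8 = 88 > 60)
```

A risk assessor would get a different rewrite of the same state (HP fractions, threat estimates), so each specialist's prompt stays small and on-topic.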
🚀 Why it matters
Together, dynamic routing and query rewriting turn a static, hand-crafted pipeline into a living cognitive graph:
💡 Adaptive: reasoning depth scales with uncertainty or stakes.
🧠 Modular: new skills or evaluators can be plugged in as new nodes.
⚖️ Efficient: token and time budgets are managed intelligently.
🔍 Transparent: every decision path and intermediate output is traceable.
By using Pydantic Graphs, we can finally move from “prompt chains” to structured, interpretable agentic systems — the same architectural leap that turns simple agents into full-fledged, cooperative AI swarms.
from pydantic_graph import BaseNode, End, Graph, GraphRunContext
class PlanCandidate(BaseModel):
    kind: Literal["move", "switch"]
    move_id: Optional[str] = None
    switch_species: Optional[str] = None
    rationale: str

class Plan(BaseModel):
    candidates: List[PlanCandidate]

class EvalScore(BaseModel):
    score: float
    notes: Optional[str] = None

class GraphState(BaseModel):
    context: AgentContext
    plan: Optional[Plan] = None
    tactician_scores: Optional[List[EvalScore]] = None
    risk_scores: Optional[List[EvalScore]] = None
    scout_scores: Optional[List[EvalScore]] = None
    final_decision: Optional[PlanCandidate] = None
🧭 The Manager–Coordinator Paradigm in Agentic Swarms#
Let’s now implement our first multi-agent workflow: managerial multi-agent coordination pattern through a structured Pydantic Graph — where each node acts as a specialized agent, and together they form a coordinated decision-making swarm.
🕸️ The Idea: Manager + Specialists = Smarter Decisions#
Instead of relying on a single monolithic model, this setup distributes reasoning across multiple specialized roles — just like a management hierarchy in human organizations:
PlannerNode (Coordinator) 🧠 → proposes candidate actions (moves/switches).
TacticianNode ⚔️ → evaluates each candidate for expected value (damage, tempo).
RiskNode 🛡️ → evaluates safety and survivability.
ScoutNode 🔍 → evaluates information gain (learning about opponent’s hidden Pokémon).
DecisionNode (Manager) 🧩 → aggregates all scores and makes the final move selection.
Each node operates independently but shares a common state (GraphState) that persists through the workflow — this gives the system continuity, explainability, and structured memory across reasoning steps.
⚙️ How Pydantic Graphs Enable This#
Pydantic Graphs make this explicitly declarative:
Each node inherits from BaseNode and defines an async run() method that updates a shared GraphState.
Nodes specify their next node, e.g., PlannerNode → TacticianNode → RiskNode → ScoutNode → DecisionNode → End.
The Graph object (planner_graph) defines the entire workflow and its state type (GraphState), ensuring all data between nodes remains valid and typed.
The graph runtime (GraphRunContext) automatically handles execution order, state persistence, and error handling/retries.
This means the graph acts as an orchestration layer over multiple LLMs — a mini “swarm intelligence” system where reasoning flows like information through an organization chart.
🧩 Why the Manager-Coordinator Model Matters#
Decomposition of reasoning: Each agent focuses on a narrow cognitive skill — simplifying prompts, improving interpretability, and reducing hallucinations.
Parallelism and composability: Multiple evaluators can be executed concurrently, and new agents (e.g., “Healer Advisor”, “Weather Analyst”) can be plugged in without refactoring the graph.
Explainability: Every step is transparent — you can inspect the planner’s candidates, each specialist’s scores, and the rationale behind the final decision.
Dynamic scalability: The manager can later evolve to dynamic routing, consulting only relevant specialists based on battle context or uncertainty — enabling true adaptive swarms.
@dataclass
class PlannerNode(BaseNode):
    planner_agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "You propose 2-4 legal actions for the given Pokémon battle context. "
            "Prefer super-effective, high-accuracy moves; consider switches if HP is low or risk is high. "
            "Do NOT invent illegal actions."
        ),
        output_type=Plan,
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> TacticianNode:
        state = context.state
        plans = (await self.planner_agent.run(agent_context_to_string(state.context))).output
        state.plan = plans
        return TacticianNode()
@dataclass
class TacticianNode(BaseNode):
    tactician_agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "You are a Pokémon battle tactician. "
            "Score each candidate 0..1 for expected value (damage + board advantage)."
        ),
        output_type=List[EvalScore],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> RiskNode:
        state = context.state
        assert state.plan is not None, "Plan must be set before TacticianNode runs."
        prompt = agent_context_to_string(state.context) + "\n\n" + state.plan.model_dump_json(indent=2)
        scores = (await self.tactician_agent.run(prompt)).output
        state.tactician_scores = scores
        return RiskNode()
@dataclass
class RiskNode(BaseNode):
    risk_agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "You are a Pokémon battle risk assessor. "
            "Score each candidate 0..1 for risk (chance of failure, negative outcomes)."
        ),
        output_type=List[EvalScore],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> ScoutNode:
        state = context.state
        assert state.plan is not None, "Plan must be set before RiskNode runs."
        prompt = agent_context_to_string(state.context) + "\n\n" + state.plan.model_dump_json(indent=2)
        scores = (await self.risk_agent.run(prompt)).output
        state.risk_scores = scores
        return ScoutNode()
@dataclass
class ScoutNode(BaseNode):
    scout_agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "You are a Pokémon battle scout. "
            "Score each candidate 0..1 for information gain (revealing opponent's unknowns)."
        ),
        output_type=List[EvalScore],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> DecisionNode:
        state = context.state
        assert state.plan is not None, "Plan must be set before ScoutNode runs."
        prompt = agent_context_to_string(state.context) + "\n\n" + state.plan.model_dump_json(indent=2)
        scores = (await self.scout_agent.run(prompt)).output
        state.scout_scores = scores
        return DecisionNode()
@dataclass
class DecisionNode(BaseNode):
    decision_agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "You are a Pokémon battle decision maker. "
            "Using the provided scores from tactician, risk, and scout, "
            "select the best candidate action to take."
        ),
        output_type=PlanCandidate,
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> End:
        state = context.state
        assert state.tactician_scores is not None, "Tactician scores must be set before DecisionNode runs."
        assert state.risk_scores is not None, "Risk scores must be set before DecisionNode runs."
        assert state.scout_scores is not None, "Scout scores must be set before DecisionNode runs."
        prompt = agent_context_to_string(state.context) + "\n\n"
        prompt += "Planned Candidates:\n" + state.plan.model_dump_json(indent=2) + "\n\n"
        prompt += "Tactician Scores:\n" + str([es.model_dump_json(indent=2) for es in state.tactician_scores]) + "\n\n"
        prompt += "Risk Scores:\n" + str([es.model_dump_json(indent=2) for es in state.risk_scores]) + "\n\n"
        prompt += "Scout Scores:\n" + str([es.model_dump_json(indent=2) for es in state.scout_scores]) + "\n\n"
        decision = (await self.decision_agent.run(prompt)).output
        state.final_decision = decision
        return End(state.final_decision)
planner_graph = Graph(nodes=[PlannerNode, TacticianNode, RiskNode, ScoutNode, DecisionNode], state_type=GraphState)
example_agent_context = AgentContext(
    turn=1,
    weather={},
    you_active="Pikachu",
    you_team=[TeamMon(species="Pikachu", hp=1.0, fainted=False, types=["Electric"], boosts={"atk": 1}, status=None, must_recharge=False)],
    opp_active="Bulbasaur",
    opp_known=[TeamMon(species="Bulbasaur", hp=1.0, fainted=False, types=["Grass", "Poison"], boosts={"def": -1}, status="paralyzed", must_recharge=False)],
    legal_moves=[MoveOption(move_id="Thunderbolt", base_power=90, accuracy=100, priority=0)],
    legal_switches=[SwitchOption(species="Charizard", hp=1.0, fainted=False, types=["Fire", "Flying"])],
    past_actions=[],
)
result = await planner_graph.run(PlannerNode(), state=GraphState(context=example_agent_context))
rprint(result.state.final_decision)
23:16:39.013 run graph planner_graph
23:16:39.014 run node PlannerNode
23:17:22.155 run node TacticianNode
23:17:43.325 run node RiskNode
23:18:17.161 run node ScoutNode
23:18:53.984 run node DecisionNode
PlanCandidate( kind='move', move_id='Thunderbolt', switch_species=None, rationale='Choose Thunderbolt: 100% accuracy and high power; Pikachu has +1 Atk while Bulbasaur is -1 Def and paralyzed, making an immediate KO extremely likely. Switching concedes a free turn and forfeits the probable instant elimination—attacking now maximizes EV and minimizes risk.' )
Let’s see it in action!
from rich import print as rprint
import nest_asyncio
import logfire
import asyncio
logfire.configure(send_to_logfire=False) # set to true if you want to use logfire console
nest_asyncio.apply()
class PydanticGraphAgent(Player):
    def __init__(self, name: str, agentic_graph: Graph, first_node: BaseNode, **kwargs):
        super().__init__(**kwargs)
        self.name = name
        self._past_actions: List[str] = []
        self.graph = agentic_graph
        self.first_node = first_node

    def choose_move(self, battle: Battle):
        # Build structured context for the agent
        ctx = build_context(battle, past_actions=self._past_actions)
        # Run the graph to get a decision
        result = asyncio.run(self.graph.run(self.first_node, state=GraphState(context=ctx)))
        decision = result.state.final_decision
        if battle.turn <= 1:
            rprint("CONTEXT:", ctx)
            rprint("DECISION:", decision)
        else:
            rprint(f"T{ctx.turn} DECISION:", decision.rationale)
        # Map Decision → poke-env action
        if decision.kind == "move":
            # find the matching legal move
            for m in battle.available_moves:
                if m.id == decision.move_id:
                    self._past_actions.append(
                        f"T{ctx.turn}: MOVE {decision.move_id} ({ctx.you_active} vs {ctx.opp_active})"
                    )
                    return self.create_order(m)
        elif decision.kind == "switch":
            # find the matching legal switch
            for p in battle.available_switches:
                if p.species == decision.switch_species:
                    self._past_actions.append(
                        f"T{ctx.turn}: SWITCH to {decision.switch_species} (from {ctx.you_active})"
                    )
                    return self.create_order(p)
        # Fallback: if the agent suggested an illegal action (shouldn't happen), choose randomly
        self._past_actions.append(f"T{ctx.turn}: FALLBACK random")
        return self.choose_random_move(battle)
Logfire project URL: https://logfire-eu.pydantic.dev/shreshthtuli/agenticai
coordination_graph = Graph(nodes=[PlannerNode, TacticianNode, RiskNode, ScoutNode, DecisionNode], state_type=GraphState)
coordination_player = PydanticGraphAgent(name="LLM Agent", agentic_graph=coordination_graph, first_node=PlannerNode())
await coordination_player.battle_against(RandomPlayer(), n_battles=1)
22:31:19.383 run graph None
22:31:19.386 run node PlannerNode
22:31:47.861 run node TacticianNode
22:32:17.785 run node RiskNode
22:32:49.956 run node ScoutNode
22:33:20.686 run node DecisionNode
CONTEXT: AgentContext( turn=1, you_active='hitmontop', you_team=[ TeamMon(species='hitmontop', hp=1.0, fainted=False, types=['FIGHTING']), TeamMon(species='palafin', hp=1.0, fainted=False, types=['WATER']), TeamMon(species='brambleghast', hp=1.0, fainted=False, types=['GRASS', 'GHOST']), TeamMon(species='keldeoresolute', hp=1.0, fainted=False, types=['WATER', 'FIGHTING']), TeamMon(species='goodra', hp=1.0, fainted=False, types=['DRAGON']), TeamMon(species='pyroar', hp=1.0, fainted=False, types=['FIRE', 'NORMAL']) ], opp_active='phione', opp_known=[TeamMon(species='phione', hp=1.0, fainted=False, types=['WATER'])], legal_moves=[ MoveOption(move_id='rapidspin', base_power=50, accuracy=1.0, move_type='NORMAL', pp=64, priority=0), MoveOption(move_id='stoneedge', base_power=100, accuracy=0.8, move_type='ROCK', pp=8, priority=0), MoveOption(move_id='suckerpunch', base_power=70, accuracy=1.0, move_type='DARK', pp=8, priority=1), MoveOption(move_id='closecombat', base_power=120, accuracy=1.0, move_type='FIGHTING', pp=8, priority=0) ], legal_switches=[ SwitchOption(species='palafin', hp=1.0, fainted=False, types=['WATER']), SwitchOption(species='brambleghast', hp=1.0, fainted=False, types=['GRASS', 'GHOST']), SwitchOption(species='keldeoresolute', hp=1.0, fainted=False, types=['WATER', 'FIGHTING']), SwitchOption(species='goodra', hp=1.0, fainted=False, types=['DRAGON']), SwitchOption(species='pyroar', hp=1.0, fainted=False, types=['FIRE', 'NORMAL']) ], past_actions=[] )
DECISION: PlanCandidate( kind='move', move_id='suckerpunch', switch_species=None, rationale="Choose Sucker Punch: highest combined tactical and scouting value. Priority (+1) lets you beat Phione if it attacks, preserves board state vs being KO'd, and provides strong information about whether Phione intended to attack. The moderate risk of failing vs Protect/status/switch is acceptable given the upside on turn 1." )
Result:
You can also view the full battle here.
🗳️ Democratic multi-agent swarms#
Now let’s move on to the next agentic swarm model: democratic orchestration. Here,
a Planner (or several planners) proposes several legal candidates (moves/switches),
multiple independent voters each judge every candidate through a different lens, and
a Tally node picks the action with the most YES votes.
Nodes & roles
PlannerNode → produces 3–4 legal, diverse candidates for the current battle context.
Voters (parallelizable):
AccuracyVoterNode – prefer high-reliability actions (≥90% accuracy or a safe switch).
TypeMatchupVoterNode – reward good type effectiveness or an improved matchup after a switch.
TempoVoterNode – prefer momentum (threaten a KO, force a switch, set up safely).
PPVoterNode – favor conserving scarce PP/resources.
DiversityVoterNode – encourage non-redundant options (coverage/status/switch variety).
TallyNode → sums the 0/1 votes per candidate and returns the majority winner (ties break by first max), or combines multiple plans, with their critiques/rationales, into the best plan.
Each voter returns a list of 0/1 (YES/NO) values aligned with plan.candidates, keeping the interface simple and debuggable. A simpler version of this democratic debate idea has also been shown by Andrej Karpathy’s LLM Council.
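The tally rule described above — sum the YES votes per candidate and take the first maximum — can be sketched in plain Python. This is an illustrative standalone function, not the tutorial's TallyNode:

```python
from typing import List

def tally(vote_vectors: List[List[int]]) -> int:
    """Return the index of the candidate with the most YES votes.

    Each inner list is one voter's 0/1 votes, aligned with the candidates.
    """
    n = len(vote_vectors[0])
    totals = [0] * n
    for votes in vote_vectors:
        for i, v in enumerate(votes):
            totals[i] += int(v)
    # max() returns the first maximum, giving a deterministic tie break
    return max(range(n), key=lambda i: totals[i])

# Three voters, three candidates: candidates 0 and 1 tie at 2 votes,
# so the first max (index 0) wins.
print(tally([[1, 1, 0], [1, 0, 1], [0, 1, 0]]))  # → 0
```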
🧠 When to use Democratic swarms vs Manager–Coordinator#
Use Democratic when:
You want robustness via diversity: many simple judges smooth out any one agent’s bias.
The task benefits from ensemble wisdom and parallel scoring of options.
You need transparent preference profiles (“why did we pick this?” → look at voter tallies).
Latency budget allows fan-out to several voters.
Use Manager–Coordinator when:
You need budget-aware routing (call specialists only when danger/uncertainty is high).
The task has a clear decision funnel (plan → specific specialists → decision).
You want conditional depth (think harder only when needed) for tighter SLAs.
You prefer a single final authority aggregating nuanced scores/metrics.
Rule of thumb:
Exploration, variety, early prototyping → start with Democracy.
Production with SLAs, cost constraints → move to Manager/Coordinator (dynamic routing, early exit).
🧬 Mixed-model ensembles (per-agent LLMs)#
Each node can use a different LLM (as shown: OpenAI, Anthropic, Google, xAI, Qwen) to specialize strengths:
Models with longer context or stronger reasoning can power the Planner or Type voter.
Faster/cheaper models can handle Accuracy/PP voters at scale.
Mixing providers reduces correlated failure modes and improves ensemble reliability.
Benefit: You get a portfolio effect—diverse models + diverse criteria → more stable decisions under uncertainty.
🧾 Why this pattern is nice to teach & extend#
Simple contract: voters return [0/1, …]; the tally is trivial to audit.
Parallel-friendly: voters can run concurrently for low wall-time.
Composable: add/remove voters without touching the rest of the graph.
Explainable: log plan.candidates plus each voter’s vector to visualize support per option.
Next steps: try replacing 0/1 votes with ranked ballots (Borda/Condorcet), or add confidence-weighted voting to blend democratic and managerial ideas.
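As a sketch of that first next step, a Borda count replaces each voter's 0/1 vector with a ranked ballot; a candidate earns (n − position) points per ballot. The function below is illustrative, not part of the tutorial's graph:

```python
from typing import List

def borda_winner(ballots: List[List[int]]) -> int:
    """Each ballot lists candidate indices best-first.

    A candidate at position p on a ballot of n candidates earns n - p points;
    the highest total wins (first max breaks ties deterministically).
    """
    n = len(ballots[0])
    scores = [0] * n
    for ballot in ballots:
        for position, candidate in enumerate(ballot):
            scores[candidate] += n - position
    return max(range(n), key=lambda i: scores[i])

# Candidate 1 wins overall (8 points) despite candidate 0 taking
# one first-place vote (6 points).
print(borda_winner([[0, 1, 2], [1, 0, 2], [1, 2, 0]]))  # → 1
```

Because ballots carry ordering information, Borda rewards broadly acceptable candidates over polarizing ones — a useful property when voters judge through very different lenses.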
from __future__ import annotations
from typing import List, Optional, Literal, Dict
from dataclasses import dataclass
from pydantic import BaseModel, Field
from pydantic_ai import Agent
from pydantic_graph import BaseNode, End, Graph, GraphRunContext
from rich import print as rprint
class PlanCandidate(BaseModel):
    kind: Literal["move", "switch"]
    move_id: Optional[str] = None
    switch_species: Optional[str] = None
    rationale: str

class Plan(BaseModel):
    candidates: List[PlanCandidate]

class GraphState(BaseModel):
    context: AgentContext
    plan: Optional[Plan] = None
    accuracy_votes: Optional[List[int]] = None
    type_votes: Optional[List[int]] = None
    tempo_votes: Optional[List[int]] = None
    pp_votes: Optional[List[int]] = None
    diversity_votes: Optional[List[int]] = None
    final_decision: Optional[PlanCandidate] = None
@dataclass
class PlannerNode(BaseNode):
    planner_agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "You are a Pokémon move planner. From the given context, propose 3-4 LEGAL actions "
            "(moves or switches). Prefer super-effective, high-accuracy moves; include at least "
            "one safe SWITCH if current matchup looks poor. Do NOT invent illegal actions."
        ),
        output_type=Plan,
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "AccuracyVoterNode":
        state = context.state
        plan = (await self.planner_agent.run(agent_context_to_string(state.context))).output
        if len(plan.candidates) < 2 and plan.candidates:
            # Duplicate via a copy (not an alias), so editing the fallback's
            # rationale doesn't also mutate the original candidate.
            fallback = plan.candidates[0].model_copy()
            fallback.rationale = "Fallback duplicate to enable voting."
            plan.candidates = plan.candidates + [fallback]
        state.plan = plan
        return AccuracyVoterNode()
@dataclass
class AccuracyVoterNode(BaseNode):
    agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "Accuracy Voter: For each candidate, vote 1 if the action is high-reliability "
            "(move accuracy >= 90% or a SWITCH that avoids a likely miss/KO), else 0. "
            "Return a Python list of 0/1 of the same length as candidates, no extra text."
        ),
        output_type=List[int],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "TypeMatchupVoterNode":
        state = context.state
        assert state.plan is not None
        prompt = (
            agent_context_to_string(state.context)
            + "\n\nCANDIDATES:\n"
            + state.plan.model_dump_json(indent=2)
        )
        votes = (await self.agent.run(prompt)).output
        state.accuracy_votes = votes
        return TypeMatchupVoterNode()

@dataclass
class TypeMatchupVoterNode(BaseNode):
    agent = Agent(
        model="openrouter:anthropic/claude-sonnet-4.5",
        system_prompt=(
            "Type Matchup Voter: For each candidate, vote 1 if the MOVE is likely super-effective "
            "or at least neutral (avoid not-very-effective/immunity), or if a SWITCH improves the type matchup; "
            "otherwise 0. Return a Python list of 0/1, same length as candidates."
        ),
        output_type=List[int],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "TempoVoterNode":
        state = context.state
        assert state.plan is not None
        prompt = (
            agent_context_to_string(state.context)
            + "\n\nCANDIDATES:\n"
            + state.plan.model_dump_json(indent=2)
        )
        votes = (await self.agent.run(prompt)).output
        state.type_votes = votes
        return TempoVoterNode()

@dataclass
class TempoVoterNode(BaseNode):
    agent = Agent(
        model="openrouter:google/gemini-2.5-flash",
        system_prompt=(
            "Tempo Voter: Vote 1 for candidates that are likely to seize or keep momentum this turn "
            "(e.g., fast KO, force a switch, gain setup safely); otherwise 0. "
            "Return a Python list of 0/1 aligned with candidates."
        ),
        output_type=List[int],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "PPVoterNode":
        state = context.state
        assert state.plan is not None
        prompt = (
            agent_context_to_string(state.context)
            + "\n\nCANDIDATES:\n"
            + state.plan.model_dump_json(indent=2)
        )
        votes = (await self.agent.run(prompt)).output
        state.tempo_votes = votes
        return PPVoterNode()

@dataclass
class PPVoterNode(BaseNode):
    agent = Agent(
        model="openrouter:x-ai/grok-4-fast",
        system_prompt=(
            "PP Conservation Voter: Prefer conserving scarce PP; vote 1 when the candidate either "
            "uses a common PP move for chip damage or SWITCHES to preserve a key low-PP move; else 0. "
            "Return a Python list of 0/1 aligned with candidates."
        ),
        output_type=List[int],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "DiversityVoterNode":
        state = context.state
        assert state.plan is not None
        prompt = (
            agent_context_to_string(state.context)
            + "\n\nCANDIDATES:\n"
            + state.plan.model_dump_json(indent=2)
        )
        votes = (await self.agent.run(prompt)).output
        state.pp_votes = votes
        return DiversityVoterNode()

@dataclass
class DiversityVoterNode(BaseNode):
    agent = Agent(
        model="openrouter:qwen/qwen3-next-80b-a3b-thinking",
        system_prompt=(
            "Diversity Voter: Encourage non-redundant options. Vote 1 when the candidate adds "
            "coverage not present in other candidates this turn (e.g., different target, status vs raw damage, "
            "or SWITCH to change matchup); else 0. Return a Python list of 0/1 aligned with candidates."
        ),
        output_type=List[int],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "TallyNode":
        state = context.state
        assert state.plan is not None
        prompt = (
            agent_context_to_string(state.context)
            + "\n\nCANDIDATES:\n"
            + state.plan.model_dump_json(indent=2)
        )
        votes = (await self.agent.run(prompt)).output
        state.diversity_votes = votes
        return TallyNode()
@dataclass
class TallyNode(BaseNode):
    """Aggregate votes across voters and pick the candidate with the most YES votes."""

    async def run(self, context: GraphRunContext[GraphState]) -> End:
        s = context.state
        assert s.plan is not None
        buckets: List[List[int]] = []
        for name in ["accuracy_votes", "type_votes", "tempo_votes", "pp_votes", "diversity_votes"]:
            votes = getattr(s, name)
            if votes is not None:
                buckets.append(votes)
        n_candidates = len(s.plan.candidates)
        totals = [0] * n_candidates
        for bucket in buckets:
            if len(bucket) != n_candidates:
                # Defensive: truncate/pad to align
                bucket = (bucket + [0] * n_candidates)[:n_candidates]
            for i, v in enumerate(bucket):
                totals[i] += int(v)
        # Majority pick (max votes); deterministic tiebreak = first max
        best_idx = max(range(n_candidates), key=lambda i: totals[i])
        s.final_decision = s.plan.candidates[best_idx]
        # Optional: print a quick audit
        rprint({"totals": totals, "chosen_index": best_idx, "chosen": s.final_decision})
        return End(s.final_decision)
democracy_graph = Graph(
    nodes=[PlannerNode, AccuracyVoterNode, TypeMatchupVoterNode, TempoVoterNode, PPVoterNode, DiversityVoterNode, TallyNode],
    state_type=GraphState,
)
example_agent_context = AgentContext(
    turn=1,
    weather={},
    you_active="Pikachu",
    you_team=[TeamMon(species="Pikachu", hp=1.0, fainted=False, types=["Electric"], boosts={"atk": 1}, status=None, must_recharge=False)],
    opp_active="Bulbasaur",
    opp_known=[TeamMon(species="Bulbasaur", hp=1.0, fainted=False, types=["Grass", "Poison"], boosts={"def": -1}, status="paralyzed", must_recharge=False)],
    legal_moves=[MoveOption(move_id="Thunderbolt", base_power=90, accuracy=100, priority=0)],
    legal_switches=[SwitchOption(species="Charizard", hp=1.0, fainted=False, types=["Fire", "Flying"])],
    past_actions=[],
)
result = await democracy_graph.run(PlannerNode(), state=GraphState(context=example_agent_context))
rprint("Final Democratic Decision:", result.state.final_decision)
23:20:42.677 run graph democracy_graph
23:20:42.677 run node PlannerNode
23:21:17.224 run node AccuracyVoterNode
23:21:44.775 run node TypeMatchupVoterNode
23:21:48.946 run node TempoVoterNode
23:21:50.458 run node PPVoterNode
23:21:59.919 run node DiversityVoterNode
23:24:35.191 run node TallyNode
{ 'totals': [3, 3, 2], 'chosen_index': 0, 'chosen': PlanCandidate( kind='move', move_id='Thunderbolt', switch_species=None, rationale='Hit now with Thunderbolt — 100% accuracy and guaranteed damage this turn. Although Electric is not very effective vs Grass/Poison Bulbasaur, this is your only immediate attacking option and it may chip or finish a weakened/paralyzed foe.' ) }
Final Democratic Decision: PlanCandidate( kind='move', move_id='Thunderbolt', switch_species=None, rationale='Hit now with Thunderbolt — 100% accuracy and guaranteed damage this turn. Although Electric is not very effective vs Grass/Poison Bulbasaur, this is your only immediate attacking option and it may chip or finish a weakened/paralyzed foe.' )
🎭 Actor–Critic Multi-Agent Workflow#
Our final agentic swarm model is the actor–critic style workflow, again implemented using pydantic-graph. This model is inspired by reinforcement learning but adapted for multi-LLM reasoning. Here, we explicitly separate proposal and evaluation, allowing the agent swarm to reason iteratively about value and risk before acting.
🧩 Roles and flow#
The graph proceeds through four main nodes:
ActorNode – the policy generator. Proposes 3–4 legal moves or switches based on the current Pokémon context. Behaves like a policy network, outputting candidate actions with rationales.
CriticNode – the value estimator. Evaluates each candidate with a Q-value (expected outcome), risk, and confidence score. This acts as a “value network” estimating how good each candidate really is.
ImproveNode – the policy improver (optional). If the best candidate is too risky or has a low Q-value, the improver agent asks the actor to refine its plan. The critic is then re-invoked to rescore the improved candidates. This mimics the actor–critic policy-improvement loop in RL.
SelectorNode – the final decision layer. Combines critic outputs into an adjusted score: \(\text{AdjustedScore} = Q \times (1 - \lambda \cdot \text{risk}) \times (0.5 + 0.5 \times \text{confidence})\). Picks the candidate with the highest adjusted value and terminates the graph with an End node.
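Plugging illustrative numbers into this formula (with the same λ = 0.35 default used later in SelectorNode) shows how the risk penalty can flip the raw Q ordering:

```python
def adjusted_score(q: float, risk: float, confidence: float, lam: float = 0.35) -> float:
    """AdjustedScore = Q * (1 - λ*risk) * (0.5 + 0.5*confidence)."""
    return q * (1.0 - lam * risk) * (0.5 + 0.5 * confidence)

# A high-Q but risky move vs. a safer, lower-Q alternative:
risky = adjusted_score(q=1.2, risk=0.8, confidence=0.9)  # 1.2 * 0.72 * 0.95 ≈ 0.821
safe = adjusted_score(q=1.0, risk=0.1, confidence=0.9)   # 1.0 * 0.965 * 0.95 ≈ 0.917
print(safe > risky)  # → True: the safer option wins despite its lower raw Q
```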
🧠 What makes this pattern powerful#
Iterative refinement – Unlike the manager-coordinator (hierarchical) or democratic (ensemble) designs, the actor–critic loop learns from its own evaluation.
Value-based reasoning – The critic explicitly quantifies the expected reward of each move, enabling long-term strategic play rather than greedy local choices.
Adaptive depth – The ImproveNode only triggers refinement when quality or safety drops, giving us dynamic compute allocation.
Interpretability – Q-values, risk, and confidence are visible for each decision, so you can trace why the agent preferred one move over another.
⚙️ Comparing paradigms#
| Workflow Type | Nature | Example Use | Pros | Trade-offs |
|---|---|---|---|---|
| Manager–Coordinator | Hierarchical | Strategic planning under constraints | Modular, dynamic routing | Slight overhead for routing logic |
| Democratic | Ensemble | Collective judgment / robustness | High diversity, fault tolerance | Higher latency, no feedback loop |
| Actor–Critic | Iterative feedback | Adaptive value-based control | Learns/refines actions, interpretable | Slightly more compute per turn |
🚀 Why it fits Pokémon and beyond#
Battles require balancing expected gain vs. survivability, just like value-based RL tasks.
The critic captures contextual trade-offs (damage, tempo, risk), while the actor continuously learns what kinds of proposals score best.
This same structure can generalize to decision-making agents in finance, robotics, or multi-stage planning — anywhere feedback-driven refinement is useful.
from __future__ import annotations
from typing import List, Optional, Literal
from dataclasses import dataclass
from pydantic import BaseModel, Field
from pydantic_ai import Agent
from pydantic_graph import BaseNode, End, Graph, GraphRunContext
from rich import print as rprint
class CriticScore(BaseModel):
    index: int = Field(description="Index into Plan.candidates[]")
    q_value: float = Field(ge=0.0, description="Estimated value; higher is better")
    risk: float = Field(ge=0.0, le=1.0, description="0 safe, 1 very risky")
    confidence: float = Field(ge=0.0, le=1.0, description="Critic confidence in this score")
    notes: Optional[str] = None

class GraphState(BaseModel):
    context: AgentContext
    plan: Optional[Plan] = None
    critic_scores: Optional[List[CriticScore]] = None
    refined: bool = False
    final_decision: Optional[PlanCandidate] = None
@dataclass
class ActorNode(BaseNode):
    actor = Agent(
        model="openrouter:google/gemini-2.5-pro",
        system_prompt=(
            "ACTOR: Propose 3-4 LEGAL actions (moves or switches) for the current Pokémon context.\n"
            "Favor super-effective, high-accuracy moves; include a safe SWITCH if matchup is bad.\n"
            "Do NOT invent illegal actions. Keep rationales concise."
        ),
        output_type=Plan,
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "CriticNode":
        s = context.state
        plan = (await self.actor.run(agent_context_to_string(s.context))).output
        s.plan = plan
        return CriticNode()
@dataclass
class CriticNode(BaseNode):
    critic = Agent(
        model="openrouter:anthropic/claude-sonnet-4.5",
        system_prompt=(
            "CRITIC: For each candidate, estimate a Q-value in [0, +inf) capturing expected outcome "
            "(damage, survival, tempo) for THIS TURN and near future. Also output risk in [0,1] "
            "(0=safe,1=dangerous) and confidence in [0,1]. Keep notes brief. "
            "Return a list aligned with candidates using fields: index, q_value, risk, confidence, notes."
        ),
        output_type=List[CriticScore],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "ImproveNode":
        s = context.state
        assert s.plan is not None
        prompt = agent_context_to_string(s.context) + "\n\nCANDIDATES:\n" + s.plan.model_dump_json(indent=2)
        scores = (await self.critic.run(prompt)).output
        # Defensive: clamp scores and align indices
        n = len(s.plan.candidates)
        clean = []
        for sc in scores:
            i = max(0, min(n - 1, int(sc.index)))
            clean.append(CriticScore(
                index=i,
                q_value=max(0.0, float(sc.q_value)),
                risk=min(1.0, max(0.0, float(sc.risk))),
                confidence=min(1.0, max(0.0, float(sc.confidence))),
                notes=sc.notes,
            ))
        s.critic_scores = clean
        return ImproveNode()
@dataclass
class ImproveNode(BaseNode):
    """Optional one-step policy improvement: if best Q is weak or risk is high, ask actor to refine once."""

    improver = Agent(
        model="openrouter:openai/gpt-5",
        system_prompt=(
            "IMPROVER: Given context, current candidates, and critic feedback, produce up to 2 REFINED "
            "legal alternatives that address the critic's concerns (e.g., too risky, low value). "
            "If current best is already strong, return an empty list to keep it."
        ),
        output_type=List[PlanCandidate],
        retries=2,
    )
    # thresholds for triggering refinement
    min_good_q: float = 0.75
    max_ok_risk: float = 0.65

    # Both outgoing edges must appear in the return annotation so
    # pydantic-graph can build the full graph.
    async def run(self, context: GraphRunContext[GraphState]) -> "CriticNode | SelectorNode":
        s = context.state
        assert s.plan is not None and s.critic_scores is not None
        # Determine if refinement is needed
        best = max(s.critic_scores, key=lambda x: x.q_value)
        need_refine = (best.q_value < self.min_good_q) or (best.risk > self.max_ok_risk)
        if not need_refine or s.refined:
            return SelectorNode()
        prompt = (
            agent_context_to_string(s.context)
            + "\n\nCANDIDATES:\n" + s.plan.model_dump_json(indent=2)
            + "\n\nCRITIC:\n" + "\n".join(
                f"[{c.index}] Q={c.q_value:.2f} risk={c.risk:.2f} conf={c.confidence:.2f} {c.notes or ''}"
                for c in s.critic_scores
            )
        )
        new_opts = (await self.improver.run(prompt)).output
        if new_opts:
            # merge refined options (append, keep old too); cap to avoid prompt bloat
            s.plan = Plan(candidates=(s.plan.candidates + new_opts)[:6])
            s.refined = True
            return CriticNode()  # re-score with critic after refinement
        return SelectorNode()
@dataclass
class SelectorNode(BaseNode):
    """Pick argmax of an adjusted score: Q * (1 - λ*risk) with confidence weighting."""

    lambda_risk: float = 0.35

    async def run(self, context: GraphRunContext[GraphState]) -> End:
        s = context.state
        assert s.plan is not None and s.critic_scores is not None
        adjusted = []
        for sc in s.critic_scores:
            adj = sc.q_value * (1.0 - self.lambda_risk * sc.risk) * (0.5 + 0.5 * sc.confidence)
            adjusted.append((sc.index, adj))
        best_idx, _ = max(adjusted, key=lambda t: t[1])
        s.final_decision = s.plan.candidates[best_idx]
        return End(s.final_decision)
actor_critic_graph = Graph(
    nodes=[ActorNode, CriticNode, ImproveNode, SelectorNode],
    state_type=GraphState,
)
example_agent_context = AgentContext(
    turn=1,
    weather={},
    you_active="Pikachu",
    you_team=[TeamMon(species="Pikachu", hp=1.0, fainted=False, types=["Electric"], boosts={"atk": 1}, status=None, must_recharge=False)],
    opp_active="Bulbasaur",
    opp_known=[TeamMon(species="Bulbasaur", hp=1.0, fainted=False, types=["Grass", "Poison"], boosts={"def": -1}, status="paralyzed", must_recharge=False)],
    legal_moves=[MoveOption(move_id="Thunderbolt", base_power=90, accuracy=100, priority=0)],
    legal_switches=[SwitchOption(species="Charizard", hp=1.0, fainted=False, types=["Fire", "Flying"])],
    past_actions=[],
)
result = await actor_critic_graph.run(ActorNode(), state=GraphState(context=example_agent_context))
rprint("Actor–Critic Decision:", result.state.final_decision)
23:33:07.115 run graph actor_critic_graph
23:33:07.117 run node ActorNode
23:33:22.450 run node CriticNode
23:33:30.335 run node ImproveNode
23:33:30.337 run node SelectorNode
Actor–Critic Decision: PlanCandidate( kind='move', move_id='Thunderbolt', switch_species=None, rationale='Thunderbolt is a strong, reliable STAB move with 100% accuracy. Bulbasaur is paralyzed, so it may not even move. This is a good offensive option.' )
actor_critic_graph = Graph(
    nodes=[ActorNode, CriticNode, ImproveNode, SelectorNode],
    state_type=GraphState,
)
actor_critic_agent = PydanticGraphAgent(name="LLM Agent", agentic_graph=actor_critic_graph, first_node=ActorNode())
await actor_critic_agent.battle_against(RandomPlayer(), n_battles=1)
23:33:35.277 run graph None
23:33:35.280 run node ActorNode
23:33:49.671 run node CriticNode
23:34:00.227 run node ImproveNode
23:34:00.229 run node SelectorNode
T5 DECISION: A safe switch. Ironthorns has a good defensive typing (Rock/Electric) against Mightyena's likely Dark-type STAB moves and can threaten back with its own powerful attacks.
T6 DECISION: Deals STAB damage and allows a safe switch to a better-positioned Pokémon like Krookodile, adapting to the opponent's move.
T6 DECISION: Mightyena is a Dark-type pokemon, so a Bug-type move like Megahorn would be super-effective. Ironthorns is faster and should be able to knock out Mightyena with this move.
T7 DECISION: Close Combat is a super-effective, high-power move that will likely knock out Mightyena.
T8 DECISION: This is another safe switch. Zapdos resists Flying-type attacks and can hit Enamorus with a super-effective Electric-type STAB move.
T9 DECISION: Ironthorns resists Normal-type attacks, making it a safe Pokémon to switch into against Maushold. It can absorb a potential Population Bomb and retaliate.
T10 DECISION: Volt Switch deals damage and allows a pivot to another Pokémon. This is a safe, strategic option to maintain momentum and react to a potential switch from the opponent.
T10 DECISION: Krookodile is immune to Jolteon's electric attacks and can threaten it with a super-effective ground type move. It is a high-risk high-reward play.
T11 DECISION: Enamorus is immune to Earthquake and threatens Krookodile with its Fairy typing. Iron Thorns has a type advantage, resisting Enamorus's Flying-type moves and can hit back with super-effective Rock or Electric-type attacks.
T12 DECISION: Krookodile is immune to Jolteon's Electric-type STAB moves and can OHKO it with a Ground-type move in return.
T13 DECISION: Earthquake is a super-effective STAB move against Jolteon, which is very likely to knock it out. Given Krookodile's immunity to Electric attacks, this is a strong offensive choice.
T14 DECISION: Earthquake is Krookodile's strongest and most reliable STAB move against Maushold. It has high power and perfect accuracy, making it the most straightforward and effective offensive option.
T15 DECISION: Iron Thorns is an excellent switch-in. Its Rock/Electric typing gives it a 4x super-effective advantage against Enamorus's Flying type, and it resists Flying-type moves.
23:39:25.454 run node ActorNode
23:39:41.633 run node CriticNode
23:39:51.240 run node ImproveNode
23:39:51.240 run node SelectorNode
T16 DECISION: Wild Charge is a powerful, super-effective STAB move with 100% accuracy. It has a high chance to knock out Enamorus in one hit.
23:39:51.250 run graph None
23:39:51.250 run node ActorNode
23:40:13.728 run node CriticNode
23:40:24.176 run node ImproveNode
23:40:24.177 run node SelectorNode
T17 DECISION: Volt Switch deals damage and allows a switch to a healthier Pokémon like Zapdos or Feraligatr, which can more safely absorb a hit from Maushold.
23:40:24.184 run graph None
23:40:24.185 run node ActorNode
23:40:43.283 run node CriticNode
23:40:51.208 run node ImproveNode
23:40:51.208 run node SelectorNode
T17 DECISION: Switching to Zapdos is another strong choice. It has good defensive typing, resisting potential Fighting-type coverage moves from Maushold. Zapdos is also very fast and can threaten with powerful special attacks.
23:40:51.217 run graph None
23:40:51.218 run node ActorNode
23:41:17.314 run node CriticNode
23:41:26.949 run node ImproveNode
23:41:26.949 run node SelectorNode
T18 DECISION: Discharge is a reliable STAB move with 100% accuracy, offering consistent damage against Maushold without the risk of missing.
23:41:26.951 run graph None
23:41:26.951 run node ActorNode
23:41:47.572 run node CriticNode
23:41:56.741 run node ImproveNode
23:41:56.741 run node SelectorNode
T19 DECISION: Maushold is asleep and at low health. Discharge is a reliable STAB move with 100% accuracy that could secure the knockout.
23:41:56.750 run graph None
23:41:56.750 run node ActorNode
23:42:10.500 run node CriticNode
23:42:18.964 run node ImproveNode
23:42:18.967 run node SelectorNode
CONTEXT: AgentContext( turn=1, weather={}, you_active='deoxys', you_team=[ TeamMon( species='deoxys', hp=1.0, fainted=False, types=['PSYCHIC'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='brutebonnet', hp=1.0, fainted=False, types=['GRASS', 'DARK'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='weavile', hp=1.0, fainted=False, types=['DARK', 'ICE'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='regice', hp=1.0, fainted=False, types=['ICE'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='fluttermane', hp=1.0, fainted=False, types=['GHOST', 'FAIRY'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='zapdosgalar', hp=1.0, fainted=False, types=['FIGHTING', 'FLYING'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ) ], opp_active='dedenne', opp_known=[ TeamMon( species='dedenne', hp=1.0, fainted=False, types=['ELECTRIC', 'FAIRY'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ) ], legal_moves=[ MoveOption(move_id='psychoboost', base_power=140, accuracy=0.9, move_type='PSYCHIC', pp=8, priority=0), MoveOption(move_id='knockoff', base_power=65, accuracy=1.0, move_type='DARK', pp=32, priority=0), MoveOption(move_id='extremespeed', base_power=80, accuracy=1.0, move_type='NORMAL', pp=8, priority=2), MoveOption(move_id='superpower', base_power=120, accuracy=1.0, move_type='FIGHTING', pp=8, priority=0) ], legal_switches=[ SwitchOption(species='brutebonnet', 
hp=1.0, fainted=False, types=['GRASS', 'DARK']), SwitchOption(species='weavile', hp=1.0, fainted=False, types=['DARK', 'ICE']), SwitchOption(species='regice', hp=1.0, fainted=False, types=['ICE']), SwitchOption(species='fluttermane', hp=1.0, fainted=False, types=['GHOST', 'FAIRY']), SwitchOption(species='zapdosgalar', hp=1.0, fainted=False, types=['FIGHTING', 'FLYING']) ], past_actions=[ 'T5: SWITCH to ironthorns (from krookodile)', 'T6: MOVE voltswitch (ironthorns vs mightyena)', 'T6: FALLBACK random', 'T7: MOVE closecombat (braviary vs mightyena)', 'T8: SWITCH to zapdos (from braviary)', 'T9: SWITCH to ironthorns (from zapdos)', 'T10: MOVE voltswitch (ironthorns vs mausholdfour)', 'T10: SWITCH to krookodile (from ironthorns)', 'T11: SWITCH to ironthorns (from krookodile)', 'T12: SWITCH to krookodile (from ironthorns)', 'T13: MOVE earthquake (krookodile vs jolteon)', 'T14: MOVE earthquake (krookodile vs mausholdfour)', 'T15: SWITCH to ironthorns (from krookodile)', 'T16: MOVE wildcharge (ironthorns vs enamorus)', 'T17: MOVE voltswitch (ironthorns vs mausholdfour)', 'T17: SWITCH to zapdos (from ironthorns)', 'T18: MOVE discharge (zapdos vs mausholdfour)', 'T19: MOVE discharge (zapdos vs mausholdfour)' ] )
DECISION: PlanCandidate( kind='switch', move_id=None, switch_species='fluttermane', rationale='Flutter Mane resists Electric-type moves and can threaten Dedenne with its STAB moves on the following turn. This is a relatively safe switch.' )
23:42:19.033 run graph None
23:42:19.034 run node ActorNode
23:42:40.585 run node CriticNode
23:42:50.718 run node ImproveNode
23:42:50.719 run node SelectorNode
T2 DECISION: High power STAB move that will deal significant damage to Glalie. Fluttermane is faster, so it will attack first.
23:42:50.731 run graph None
23:42:50.732 run node ActorNode
23:43:10.551 run node CriticNode
23:43:18.699 run node ImproveNode
23:43:18.700 run node SelectorNode
T3 DECISION: Moonblast is Flutter Mane's strongest STAB move and is likely to knock out the weakened Skuntank.
23:43:18.717 run graph None
23:43:18.719 run node ActorNode
23:43:38.243 run node CriticNode
23:43:49.050 run node ImproveNode
23:43:49.051 run node SelectorNode
T4 DECISION: Moonblast is a 4x super-effective STAB move against Garchomp, and with Flutter Mane's high speed and special attack, it has a strong chance of securing a one-hit knockout.
23:43:49.066 run graph None
23:43:49.069 run node ActorNode
23:44:12.956 run node CriticNode
23:44:23.324 run node ImproveNode
23:44:23.325 run node SelectorNode
T5 DECISION: This is your strongest neutral attack against Dedenne. Given Flutter Mane's high Speed and Special Attack, this is a reliable way to deal significant damage.
23:44:23.338 run graph None
23:44:23.339 run node ActorNode
23:44:38.240 run node CriticNode
23:44:47.859 run node ImproveNode
23:44:47.861 run node SelectorNode
T6 DECISION: Moonblast is your strongest STAB move against Dedenne and is very likely to knock it out, as Dedenne is already at 52% HP.
23:44:47.886 run graph None
23:44:47.888 run node ActorNode
23:45:08.940 run node CriticNode
23:45:18.377 run node ImproveNode
23:45:18.378 run node SelectorNode
T7 DECISION: Regice has massive special defense and resists Ice-type attacks, making it a very safe switch-in against Glalie.
23:45:18.384 run graph None
23:45:18.386 run node ActorNode
23:45:37.245 run node CriticNode
23:45:44.577 run node ImproveNode
23:45:44.577 run node SelectorNode
T8 DECISION: Weavile resists Glalie's STAB Ice-type moves and can hit back with its own super-effective attacks.
23:45:44.585 run graph None
23:45:44.586 run node ActorNode
23:46:01.809 run node CriticNode
23:46:09.654 run node ImproveNode
23:46:09.655 run node SelectorNode
T9 DECISION: Weavile is faster than Glalie. Knock Off deals solid neutral damage and removes Glalie's item, which could be a threat.
23:46:09.671 run graph None
23:46:09.673 run node ActorNode
23:46:26.144 run node CriticNode
23:46:35.060 run node ImproveNode
23:46:35.061 run node SelectorNode
T10 DECISION: Weavile is faster and can deal significant neutral damage with Knock Off, potentially knocking out Glalie. This is a strong offensive option.
23:46:35.074 run graph None
23:46:35.076 run node ActorNode
23:46:53.692 run node CriticNode
23:47:01.924 run node ImproveNode
23:47:01.925 run node SelectorNode
T11 DECISION: Guaranteed to hit first and has 100% accuracy. Glalie is at low HP, so this priority STAB move should secure the knockout.
23:47:01.935 run graph None
23:47:01.935 run node ActorNode
23:47:23.878 run node CriticNode
23:47:32.369 run node ImproveNode
23:47:32.370 run node SelectorNode
T12 DECISION: This is the safest option. With its priority, Ice Shard is guaranteed to hit first, and with 100% accuracy, it's very likely to knock out the weakened Glalie without any risk.
23:47:32.384 run graph None
23:47:32.385 run node ActorNode
23:47:51.924 run node CriticNode
23:48:01.713 run node ImproveNode
23:48:01.713 run node SelectorNode
T13 DECISION: Brute Bonnet's Grass typing provides a 4x resistance to Pikachu's Electric STAB moves, making it a very safe switch-in.
23:48:01.729 run graph None
23:48:01.730 run node ActorNode
23:48:24.346 run node CriticNode
23:48:35.492 run node ImproveNode
23:48:35.492 run node SelectorNode
T14 DECISION: Puts the opponent to sleep with 100% accuracy, neutralizing the immediate threat and creating an opportunity for a follow-up attack or a safe switch.
23:48:35.532 run graph None
23:48:35.534 run node ActorNode
23:49:01.947 run node CriticNode
23:49:10.117 run node ImproveNode
23:49:10.117 run node SelectorNode
T15 DECISION: This is your strongest attack. It's a super-effective STAB move against a sleeping opponent, so you are guaranteed to land a powerful hit.
23:49:10.154 run graph None
23:49:10.155 run node ActorNode
23:49:35.530 run node CriticNode
23:49:44.780 run node ImproveNode
23:49:44.780 run node SelectorNode
T16 DECISION: Zapdos-Galar has a 4x resistance to Ground and a resistance to Dark, making it an excellent switch-in against Krookodile. It's at full health and can threaten a super-effective Fighting-type move.
23:49:44.800 run graph None
23:49:44.802 run node ActorNode
23:50:09.285 run node CriticNode
23:50:17.974 run node ImproveNode
23:50:17.974 run node SelectorNode
T17 DECISION: Brave Bird is a powerful STAB move that can deal significant neutral damage to Pikachu. This is a high-risk, high-reward play that could KO Pikachu if it doesn't have a lot of investment in bulk.
23:50:17.982 run graph None
23:50:17.982 run node ActorNode
23:50:35.423 run node CriticNode
23:50:42.465 run node ImproveNode
23:50:42.466 run node SelectorNode
T18 DECISION: Since Pikachu is asleep, this is a free turn to set up. Bulk Up will boost Zapdos-Galar's Attack and Defense, making it much stronger against the incoming Krookodile.
23:50:42.473 run graph None
23:50:42.474 run node ActorNode
23:51:06.331 run node CriticNode
23:51:17.538 run node ImproveNode
23:51:17.538 run node SelectorNode
T19 DECISION: This move will KO the opposing Pikachu and is also super-effective against the likely switch-in, Krookodile.
23:51:17.565 run graph None
23:51:17.565 run node ActorNode
23:51:40.014 run node CriticNode
23:51:51.301 run node ImproveNode
23:51:51.302 run node SelectorNode
T20 DECISION: Close Combat is a super-effective STAB move that, when combined with the +2 Attack boost, has a very high chance of knocking out Krookodile.
23:51:51.311 run graph None
23:51:51.311 run node ActorNode
23:52:08.270 run node CriticNode
23:52:19.718 run node ImproveNode
23:52:19.718 run node SelectorNode
CONTEXT: AgentContext( turn=1, weather={}, you_active='regidrago', you_team=[ TeamMon( species='regidrago', hp=1.0, fainted=False, types=['DRAGON'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='veluza', hp=1.0, fainted=False, types=['WATER', 'PSYCHIC'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='eelektross', hp=1.0, fainted=False, types=['ELECTRIC'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='gliscor', hp=1.0, fainted=False, types=['GROUND', 'FLYING'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='dewgong', hp=1.0, fainted=False, types=['WATER', 'ICE'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='volcarona', hp=1.0, fainted=False, types=['BUG', 'FIRE'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ) ], opp_active='ampharos', opp_known=[ TeamMon( species='ampharos', hp=1.0, fainted=False, types=['ELECTRIC'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ) ], legal_moves=[ MoveOption(move_id='dragondance', base_power=0, accuracy=1.0, move_type='DRAGON', pp=32, priority=0), MoveOption(move_id='outrage', base_power=120, accuracy=1.0, move_type='DRAGON', pp=16, priority=0), MoveOption(move_id='dracometeor', base_power=130, accuracy=0.9, move_type='DRAGON', pp=8, priority=0), MoveOption(move_id='earthquake', base_power=100, accuracy=1.0, move_type='GROUND', pp=16, priority=0) ], legal_switches=[ SwitchOption(species='veluza', hp=1.0, 
fainted=False, types=['WATER', 'PSYCHIC']), SwitchOption(species='eelektross', hp=1.0, fainted=False, types=['ELECTRIC']), SwitchOption(species='gliscor', hp=1.0, fainted=False, types=['GROUND', 'FLYING']), SwitchOption(species='dewgong', hp=1.0, fainted=False, types=['WATER', 'ICE']), SwitchOption(species='volcarona', hp=1.0, fainted=False, types=['BUG', 'FIRE']) ], past_actions=[ 'T5: SWITCH to ironthorns (from krookodile)', 'T6: MOVE voltswitch (ironthorns vs mightyena)', 'T6: FALLBACK random', 'T7: MOVE closecombat (braviary vs mightyena)', 'T8: SWITCH to zapdos (from braviary)', 'T9: SWITCH to ironthorns (from zapdos)', 'T10: MOVE voltswitch (ironthorns vs mausholdfour)', 'T10: SWITCH to krookodile (from ironthorns)', 'T11: SWITCH to ironthorns (from krookodile)', 'T12: SWITCH to krookodile (from ironthorns)', 'T13: MOVE earthquake (krookodile vs jolteon)', 'T14: MOVE earthquake (krookodile vs mausholdfour)', 'T15: SWITCH to ironthorns (from krookodile)', 'T16: MOVE wildcharge (ironthorns vs enamorus)', 'T17: MOVE voltswitch (ironthorns vs mausholdfour)', 'T17: SWITCH to zapdos (from ironthorns)', 'T18: MOVE discharge (zapdos vs mausholdfour)', 'T19: MOVE discharge (zapdos vs mausholdfour)', 'T1: SWITCH to fluttermane (from deoxys)', 'T2: MOVE moonblast (fluttermane vs glalie)', 'T3: MOVE moonblast (fluttermane vs skuntank)', 'T4: MOVE moonblast (fluttermane vs garchomp)', 'T5: MOVE shadowball (fluttermane vs dedenne)', 'T6: MOVE moonblast (fluttermane vs dedenne)', 'T7: SWITCH to regice (from fluttermane)', 'T8: SWITCH to weavile (from regice)', 'T9: MOVE knockoff (weavile vs glalie)', 'T10: MOVE knockoff (weavile vs glalie)', 'T11: MOVE iceshard (weavile vs glalie)', 'T12: MOVE iceshard (weavile vs glalie)', 'T13: SWITCH to brutebonnet (from weavile)', 'T14: MOVE spore (brutebonnet vs pikachuhoenn)', 'T15: MOVE seedbomb (brutebonnet vs pikachuhoenn)', 'T16: SWITCH to zapdosgalar (from brutebonnet)', 'T17: MOVE bravebird (zapdosgalar vs pikachuhoenn)', 'T18: 
MOVE bulkup (zapdosgalar vs pikachuhoenn)', 'T19: MOVE closecombat (zapdosgalar vs pikachuhoenn)', 'T20: MOVE closecombat (zapdosgalar vs krookodile)' ] )
DECISION: PlanCandidate( kind='move', move_id='earthquake', switch_species=None, rationale='Earthquake is a super-effective, high-power, and perfectly accurate move against the opposing Ampharos. It is the strongest and most reliable offensive option.' )
23:52:19.761 run graph None
23:52:19.761 run node ActorNode
23:52:38.907 run node CriticNode
23:52:48.803 run node ImproveNode
23:52:48.804 run node SelectorNode
T2 DECISION: Volcarona has a major advantage, resisting Cresselia's likely Ice and Fairy moves and threatening with super-effective Bug-type attacks.
23:52:48.810 run graph None
23:52:48.810 run node ActorNode
23:53:06.655 run node CriticNode
23:53:14.731 run node ImproveNode
23:53:14.731 run node SelectorNode
T3 DECISION: Bug Buzz is a powerful STAB move that is super-effective against Cresselia's base Psychic typing, dealing significant damage despite its Special Defense boost.
23:53:14.741 run graph None
23:53:14.741 run node ActorNode
23:53:40.216 run node CriticNode
23:53:52.998 run node ImproveNode
23:53:53.000 run node SelectorNode
T4 DECISION: Bug Buzz is a powerful super-effective STAB move. With 100% accuracy, it's the most reliable way to inflict significant damage on Cresselia, even with its Special Defense boost.
23:53:53.018 run graph None
23:53:53.020 run node ActorNode
23:54:13.774 run node CriticNode
23:54:22.119 run node ImproveNode
23:54:22.120 run node SelectorNode
T5 DECISION: Gliscor is immune to Electric-type attacks, making it a perfect switch-in against a Terastallized Electric Cresselia. It can then threaten with super-effective Ground-type moves.
23:54:22.126 run graph None
23:54:22.126 run node ActorNode
23:54:41.783 run node CriticNode
23:54:50.920 run node ImproveNode
23:54:50.920 run node SelectorNode
T6 DECISION: Gliscor has a 4x weakness to Delibird's Ice-type attacks. Switching to Eelektross is a safe and advantageous move, as it resists Flying-type moves and can counter with super-effective Electric-type attacks.
23:54:50.926 run graph None
23:54:50.927 run node ActorNode
23:55:02.354 run node CriticNode
23:55:11.527 run node ImproveNode
23:55:11.527 run node SelectorNode
T7 DECISION: Drain Punch is super-effective (Fighting vs. Dark/Fire) and has 100% accuracy, making it a strong and reliable offensive option to deal significant damage to Houndoom.
23:55:11.548 run graph None
23:55:11.551 run node ActorNode
23:55:28.942 run node CriticNode
23:55:37.917 run node ImproveNode
23:55:37.917 run node SelectorNode
T8 DECISION: Gliscor is immune to Falinks's Fighting-type STAB moves, making it a very safe switch-in. It can threaten Falinks back with its own STABs.
23:55:37.941 run graph None
23:55:37.944 run node ActorNode
23:56:00.686 run node CriticNode
23:56:11.033 run node ImproveNode
23:56:11.033 run node SelectorNode
T9 DECISION: Earthquake is Gliscor's strongest move against Falinks and has a high chance of knocking it out at its current health. Gliscor's typing resists Falinks's STAB moves, making this a safe and powerful offensive option.
23:56:11.055 run graph None
23:56:11.057 run node ActorNode
23:56:25.091 run node CriticNode
23:56:33.863 run node ImproveNode
23:56:33.863 run node SelectorNode
T10 DECISION: Earthquake is a powerful STAB move that should be able to knock out the already weakened Falinks, especially with its lowered defense.
23:56:33.876 run graph None
23:56:33.877 run node ActorNode
23:56:52.833 run node CriticNode
23:57:03.311 run node ImproveNode
23:57:03.312 run node SelectorNode
T11 DECISION: Switching to Eelektross is a safe move. It resists Ampharos's Electric STAB moves and is only hit neutrally by potential Ice-type coverage. Its Levitate ability also makes it immune to any predicted Ground-type attacks.
23:57:03.318 run graph None
23:57:03.318 run node ActorNode
23:57:23.176 run node CriticNode
23:57:33.817 run node ImproveNode
23:57:33.818 run node SelectorNode
T12 DECISION: This move has 100% accuracy and should be sufficient to knock out the low-HP Falinks, making it a safer option than Supercell Slam.
23:57:33.833 run graph None
23:57:33.834 run node ActorNode
23:57:51.153 run node CriticNode
2025-11-10 23:57:55,507 - PydanticGraphAge 1 - ERROR - Unhandled exception raised while handling message:
>battle-gen9randombattle-279
|request|{"active":[{"moves":[{"move":"Drain Punch","id":"drainpunch","pp":15,"maxpp":16,"target":"normal","disabled":false},{"move":"Supercell Slam","id":"supercellslam","pp":24,"maxpp":24,"target":"normal","disabled":false},{"move":"Coil","id":"coil","pp":32,"maxpp":32,"target":"self","disabled":false},{"move":"Fire Punch","id":"firepunch","pp":23,"maxpp":24,"target":"normal","disabled":false}],"canTerastallize":"Fighting"}],"side":{"name":"PydanticGraphAge 1","id":"p1","pokemon":[{"ident":"p1: Eelektross","details":"Eelektross, L87, F","condition":"127/290","active":true,"stats":{"atk":250,"def":189,"spa":232,"spd":189,"spe":137},"moves":["drainpunch","supercellslam","coil","firepunch"],"baseAbility":"levitate","item":"leftovers","pokeball":"pokeball","ability":"levitate","commanding":false,"reviving":false,"teraType":"Fighting","terastallized":""},{"ident":"p1: Veluza","details":"Veluza, L85, F","condition":"292/292","active":false,"stats":{"atk":222,"def":173,"spa":181,"spd":159,"spe":168},"moves":["aquacutter","aquajet","psychocut","nightslash"],"baseAbility":"sharpness","item":"choiceband","pokeball":"pokeball","ability":"sharpness","commanding":false,"reviving":false,"teraType":"Dark","terastallized":""},{"ident":"p1: Gliscor","details":"Gliscor, L76, F","condition":"192/239 tox","active":false,"stats":{"atk":188,"def":234,"spa":112,"spd":158,"spe":188},"moves":["protect","toxic","knockoff","earthquake"],"baseAbility":"poisonheal","item":"toxicorb","pokeball":"pokeball","ability":"poisonheal","commanding":false,"reviving":false,"teraType":"Water","terastallized":""},{"ident":"p1: Volcarona","details":"Volcarona, L77, M","condition":"160/257","active":false,"stats":{"atk":97,"def":145,"spa":252,"spd":206,"spe":199},"moves":["bugbuzz","quiverdance","terablast","fireblast"],"baseAbility":"swarm","item":"heavydutyboots","pokeball":"pokeball","ability":"swarm","commanding":false,"reviving":false,"teraType":"Water","terastallized":""},{"ident":"p1: 
Dewgong","details":"Dewgong, L94, F","condition":"322/322","active":false,"stats":{"atk":185,"def":204,"spa":185,"spd":232,"spe":185},"moves":["surf","knockoff","tripleaxel","encore"],"baseAbility":"thickfat","item":"heavydutyboots","pokeball":"pokeball","ability":"thickfat","commanding":false,"reviving":false,"teraType":"Grass","terastallized":""},{"ident":"p1: Regidrago","details":"Regidrago, L77","condition":"435/435","active":false,"stats":{"atk":199,"def":122,"spa":199,"spd":122,"spe":168},"moves":["dragondance","outrage","dracometeor","earthquake"],"baseAbility":"dragonsmaw","item":"lumberry","pokeball":"pokeball","ability":"dragonsmaw","commanding":false,"reviving":false,"teraType":"Dragon","terastallized":""}]},"rqid":30}
Traceback (most recent call last):
File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\poke_env\ps_client\ps_client.py", line 142, in _handle_message
await self._handle_battle_message(split_messages) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\poke_env\player\player.py", line 291, in _handle_battle_message
await self._handle_battle_request(battle)
File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\poke_env\player\player.py", line 416, in _handle_battle_request
choice = self.choose_move(battle)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\SHRESHTH\AppData\Local\Temp\ipykernel_27708\3706119269.py", line 21, in choose_move
result = asyncio.run(self.graph.run(self.first_node, state=GraphState(context=ctx)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\nest_asyncio.py", line 30, in run
return loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\nest_asyncio.py", line 98, in run_until_complete
return f.result()
^^^^^^^^^^
File "C:\Users\SHRESHTH\AppData\Local\Programs\Python\Python312\Lib\asyncio\futures.py", line 203, in result
raise self._exception.with_traceback(self._exception_tb)
File "C:\Users\SHRESHTH\AppData\Local\Programs\Python\Python312\Lib\asyncio\tasks.py", line 304, in __step_run_and_handle_result
result = coro.send(None)
^^^^^^^^^^^^^^^
File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\pydantic_graph\graph.py", line 152, in run
async for _node in graph_run:
File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\pydantic_graph\graph.py", line 766, in __anext__
return await self.next(self._next_node)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\pydantic_graph\graph.py", line 739, in next
self._next_node = await node.run(ctx)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\SHRESHTH\AppData\Local\Temp\ipykernel_27708\3721675952.py", line 102, in run
best = max(s.critic_scores, key=lambda x: x.q_value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: max() iterable argument is empty
23:57:55.504 run node ImproveNode
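The crash above comes from the selection step calling `max()` on an empty `critic_scores` list: if the critic fails to return any scores for a turn, `max()` raises `ValueError`. A minimal defensive sketch of that step, using hypothetical stand-ins for the notebook's state types (the real class definitions aren't reproduced here), guards the empty case:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-ins for the notebook's state types.
@dataclass
class CriticScore:
    plan_idx: int
    q_value: float

@dataclass
class GraphState:
    critic_scores: List[CriticScore] = field(default_factory=list)

def select_best(state: GraphState) -> Optional[CriticScore]:
    """Pick the highest-scoring candidate, or None if the critic
    produced no scores (the case that crashed the run above).
    The caller can then fall back to a random legal action."""
    if not state.critic_scores:
        return None
    return max(state.critic_scores, key=lambda s: s.q_value)

state = GraphState(critic_scores=[CriticScore(0, 0.4), CriticScore(1, 0.9)])
print(select_best(state).plan_idx)   # highest q_value wins -> 1
print(select_best(GraphState()))     # empty -> None, not ValueError
```

Equivalently, Python's built-in `max(..., default=None)` avoids the explicit check; either way, the fallback keeps a single bad critic turn from killing the whole battle loop.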
Let’s see this battle in action:
You can also view the full battle here.
🏁 Conclusion#
In this tutorial, we explored how multi-agent workflows enable structured, interpretable reasoning in complex environments — using Pokémon battles as our sandbox.
We built and compared three distinct coordination paradigms:
🧠 Manager–Coordinator: hierarchical and resource-aware — perfect for structured, goal-driven orchestration.
🗳️ Democratic Swarms: ensemble-style decision-making — robust and diverse, harnessing the collective judgment of many agents.
🎭 Actor–Critic: feedback-driven refinement — adaptive and value-based, blending planning with learned evaluation.
Each architecture has strengths depending on your problem structure, latency budget, and decision uncertainty. Together, they form the foundation of Agentic AI systems — not just single agents, but entire cognitive ecosystems working in concert.
🌐 Beyond these: other agentic swarm paradigms#
Modern agent research spans several other fascinating coordination styles that extend or hybridize these ideas:
🕸️ Blackboard Systems / Shared Memory Agents — agents read and write to a common “knowledge blackboard” (e.g., HUB, LangGraph Shared Memory, ReAct with tool state).
🪶 Evolutionary or Genetic Agent Swarms — agents mutate and compete (e.g., EvoAgents, Yuan et al. (2024)) to explore strategy space efficiently.
🔁 Self-Refining Swarms — multiple agents monitor and fine-tune their own reasoning, using reflection or secondary evaluators (e.g., SwarmAgentic, Zhang et al. (2025)).
Together, these paradigms form a growing ecosystem of Agentic Swarms — distributed reasoning systems where intelligence emerges from structured interaction rather than single-model dominance.
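To make the blackboard pattern concrete, here is a toy sketch (all names are hypothetical, not drawn from any of the cited systems): agents post findings under topic keys to a shared store, and any agent can read the accumulated knowledge before deciding.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

class Blackboard:
    """A shared store that agents post to and read from by topic key."""
    def __init__(self) -> None:
        self._entries: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def post(self, topic: str, agent: str, note: str) -> None:
        # Any agent can contribute knowledge under a topic.
        self._entries[topic].append((agent, note))

    def read(self, topic: str) -> List[Tuple[str, str]]:
        # Any agent can consult everything posted so far.
        return list(self._entries[topic])

board = Blackboard()
board.post("threats", "scout", "opponent's active resists Electric")
board.post("threats", "critic", "our active is slower this turn")
for agent, note in board.read("threats"):
    print(f"{agent}: {note}")
```

The appeal of the pattern is decoupling: agents never call each other directly, so adding or removing a specialist only changes who reads and writes which topics.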
🔮 What’s next#
So far, we’ve focused on how agents reason and collaborate. In the next tutorial, we’ll explore how to select the optimal model for each agent automatically — using meta-evaluation, model profiling, and cost–quality trade-offs to dynamically assign the best LLM to each sub-task.
We’ll essentially teach our system to self-optimize its model choices — the first step toward autonomous orchestration in large multi-agent setups.