07. Multi-Agent Workflows and Agentic Swarms#

In the real world, no single agent can solve every problem optimally. As tasks grow in uncertainty, dimensionality, and interdependence — such as strategy games, simulations, robotics, or real-time business systems — we naturally evolve from single-agent reasoning to multi-agent workflows. In this tutorial, we see the first sparks of Super Agents. An AI Super-Agent is an orchestration system that coordinates multiple specialized AI agents to solve complex problems requiring diverse capabilities.

These workflows mirror how humans collaborate:

  • 🗳️ Democratic committees balance diverse perspectives.

  • 🧭 Hierarchical managers coordinate specialists under limited resources.

  • ⚖️ Actor-Critic systems separate exploration (actor) from judgment (critic).

Each pattern encodes a different philosophy of coordination — distributing intelligence across specialized roles that communicate, negotiate, and arbitrate toward a shared goal.

⚙️ What Are Multi-Agent Workflows?#

A multi-agent workflow is a structured network of reasoning and action nodes — planners, evaluators, arbiters, memory modules — that interact through explicit channels rather than a single monolithic prompt.

Think of it as a graph of decision-making where:

  • Nodes = agents (LLMs, heuristics, or functions).

  • Edges = communication or dependency between them.

  • Memory = shared context that persists across steps.

  • Arbitration = how conflicting opinions are resolved.

This structure enables:

  • Parallel specialization (multiple evaluators in parallel).

  • Conditional routing (managers deciding who to consult).

  • Resource budgeting (decide when to skip expensive reasoning).

  • Explainability & debugging (explicit traces of who decided what).
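Before reaching for any framework, the four ingredients above (nodes, edges, memory, arbitration) can be sketched in a few lines of plain Python. Everything here is illustrative: the node names and the majority-vote arbiter are hypothetical stand-ins, not part of any library:

```python
# A minimal decision-graph sketch: nodes are plain functions, edges are the
# node names they return, memory is a shared dict, and arbitration is a
# majority vote. All names are illustrative.
from collections import Counter

def planner(memory):
    # Propose candidate actions into the shared memory.
    memory["candidates"] = ["attack", "defend", "attack"]
    return "arbiter"  # edge: which node to consult next

def arbiter(memory):
    # Resolve conflicting opinions by majority vote.
    votes = Counter(memory["candidates"])
    memory["decision"] = votes.most_common(1)[0][0]
    return None  # terminal node

def run_graph(start, memory):
    nodes = {"planner": planner, "arbiter": arbiter}
    node = start
    while node is not None:
        node = nodes[node](memory)
    return memory["decision"]

print(run_graph("planner", {}))  # -> attack
```

The shared `memory` dict is the crude ancestor of the typed `GraphState` we build later: it persists across nodes, so every step's inputs and outputs stay inspectable.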

🧩 Enter Pydantic Graphs#

Building and managing these interactions manually is painful — tracking state, type safety, branching, and parallel execution can get messy fast.

Pydantic Graphs solve this elegantly by combining:

  • Typed data flow from Pydantic models — ensuring every node’s input and output are structured, validated, and traceable.

  • 🕸️ Graph orchestration — defining agents and their dependencies as composable, inspectable workflows.

  • 🔁 Parallel & conditional execution — automatically handling fan-out (multiple evaluators) and routing logic (manager/critic decisions).

  • 🧾 Transparent traces — every step’s inputs, outputs, and reasoning can be logged, visualized, and replayed.

Together, they turn the messy spaghetti of agent calls into a declarative decision graph — a scalable foundation for complex, memory-aware, multi-agent systems.

🧩 What is poke-env?#

poke-env is a Python interface to the Pokémon Showdown battle simulator, providing an environment for reinforcement learning and AI experiments. It exposes each battle as a structured API — giving access to game state (Pokémon, moves, types, HP, etc.) and allowing agents to pick legal actions programmatically.

In our workflow, we’ll use poke-env as the testbed to:

  • ⚔️ Pit different multi-agent strategies (democratic, manager, actor-critic) against each other.

  • 📊 Compare performance through metrics like win rate, turns survived, and move efficiency.

  • 🧠 Benchmark reasoning styles — seeing how coordination strategies translate into competitive outcomes.

Before running experiments, we’ll start a local Pokémon Showdown server instance. This spins up a self-contained battle environment where our agents can safely train, plan, and battle — making Pokémon the perfect arena for testing agentic intelligence in action.

from src.pokemon_showdown_setup import run_pokemon_showdown

pokemon_container = run_pokemon_showdown()
🟢 Container already running: pokemon-showdown (4a670e8e0ee1)

🧪 Getting Started with poke-env#

Before building our custom multi-agent workflows, let’s first understand how the poke-env battle environment works. It allows us to easily simulate Pokémon battles between automated agents — here, we’ll start with two simple RandomPlayer agents that pick legal moves at random.

By running a quick cross-evaluation, we can see how poke-env orchestrates matches, tracks results, and reports win rates — forming the foundation on which our more sophisticated, reasoning-based agents will later compete.

from poke_env.player.player import Player
from poke_env import RandomPlayer, cross_evaluate

from tabulate import tabulate

first_player = RandomPlayer()
second_player = RandomPlayer()

players = [first_player, second_player]

async def test_cross_evaluation(players, n_challenges=5):
    cross_evaluation = await cross_evaluate(players, n_challenges=n_challenges)

    table = [["-"] + [p.username for p in players]]
    for p_1, results in cross_evaluation.items():
        table.append([p_1] + [cross_evaluation[p_1][p_2] for p_2 in results])

    return tabulate(table)

print(await test_cross_evaluation(players))
--------------  --------------  --------------
-               RandomPlayer 1  RandomPlayer 2
RandomPlayer 1                  0.4
RandomPlayer 2  0.6
--------------  --------------  --------------

Let’s see an example battle in action.


⚡ Creating a “Max Damage” Baseline#

To add another simple benchmark beyond the RandomPlayer, we’ll define a MaxDamagePlayer — an agent that always selects the move with the highest base power.

This gives us a more deterministic and aggressive baseline that prioritizes raw damage output over safety or strategy. By comparing our Pydantic AI agent against both Random and MaxDamage players, we can see whether reasoning and memory-aware planning lead to better decision-making than brute-force move selection.

class MaxDamagePlayer(Player):
    def choose_move(self, battle):
        if battle.available_moves:
            # Pick the legal move with the highest base power
            best_move = max(battle.available_moves, key=lambda move: move.base_power)

            # Terastallize when available for the extra damage boost
            if battle.can_tera:
                return self.create_order(best_move, terastallize=True)

            return self.create_order(best_move)
        else:
            # No attacking moves available (e.g. a forced switch): act randomly
            return self.choose_random_move(battle)
        
players = [first_player, MaxDamagePlayer()]

print(await test_cross_evaluation(players))
-----------------  --------------  -----------------
-                  RandomPlayer 1  MaxDamagePlayer 1
RandomPlayer 1                     0.0
MaxDamagePlayer 1  1.0
-----------------  --------------  -----------------

🎮 Pokémon battle mechanics — and how we encode them for our agents#

Let’s now build a simple agent that does the same thing: use the battle context to choose the best available action.

Core mechanics (what the agent must reason about):

  • Turn-based actions: Each turn you either use a move or switch. Faster Pokémon usually act first; priority can override speed.

  • Types & STAB: Moves have types (e.g., Electric). Effectiveness depends on attacker vs defender types; using a move matching the user’s type grants STAB (bonus damage).

  • Accuracy & PP: Moves can miss (accuracy < 100) and have limited PP (uses).

  • HP & fainting: A Pokémon faints at 0 HP; you win by fainting all of the opponent’s Pokémon.

  • Information limits: You only know the opponent’s revealed Pokémon and partial info about their sets.

  • Switching & tempo: Switching preserves a weakened Pokémon, but concedes tempo (opponent gets a “free” hit).

  • Status/hazards/weather (omitted here for brevity): These exist in the simulator; we can add them later as fields.
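Type effectiveness and STAB are easy to make concrete. The sketch below uses a tiny, partial type chart (a real chart covers all 18 types and far more matchups); the multipliers shown are the standard ones, but the function itself is just an illustration, not how poke-env computes damage:

```python
# Illustrative damage-multiplier sketch: type effectiveness plus STAB.
# CHART is a tiny excerpt; unlisted matchups default to neutral (1.0).
CHART = {
    ("ELECTRIC", "WATER"): 2.0,   # super effective
    ("ELECTRIC", "GROUND"): 0.0,  # immune
    ("FIRE", "WATER"): 0.5,       # not very effective
}

def damage_multiplier(move_type, attacker_types, defender_types):
    mult = 1.0
    # Dual-typed defenders multiply both matchups together.
    for d in defender_types:
        mult *= CHART.get((move_type, d), 1.0)
    # STAB: same-type attack bonus when the move matches the user's type.
    if move_type in attacker_types:
        mult *= 1.5
    return mult

print(damage_multiplier("ELECTRIC", ["ELECTRIC"], ["WATER"]))  # -> 3.0
```

This is exactly the kind of arithmetic our LLM agents will have to reason about from the context we feed them.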

🧱 Our context schema (how we feed the LLM the game state)#

We transform poke-env’s Battle into a typed, LLM-friendly snapshot:

  • TeamMon: one entry per Pokémon (both sides) with:

    • species, fractional hp, fainted, and types (plus boosts, status, and a must_recharge flag).

  • MoveOption: one entry per legal move this turn with:

    • move_id, base_power, accuracy, move_type, pp, priority.

  • SwitchOption: one entry per legal switch target with:

    • species, hp, fainted, types.

  • AgentContext: the full decision frame the agent sees:

    • turn: current turn number.

    • you_active / opp_active: currently active Pokémon on both sides.

    • you_team: your full team (known).

    • opp_known: only revealed opponent Pokémon (respecting partial observability).

    • legal_moves / legal_switches: the only actions you may take now.

    • past_actions: a short episodic memory string list (e.g., summaries of last turns).

🛠️ How the code builds this context#

  • _pokemon_to_teammon(p) safely converts a poke-env Pokemon into our TeamMon schema (species, hp%, types).

  • In build_context(battle, past_actions):

    • We iterate battle.available_moves to populate MoveOption (capturing damage proxies via base power, reliability via accuracy, tempo via priority, and resource via PP).

    • We iterate battle.available_switches to populate SwitchOption (capturing survivability options).

    • We map your full battle.team into you_team and the opponent’s revealed team into opp_known (partial info).

    • We capture actives (you_active, opp_active) and the turn counter.

    • We attach past_actions so the LLM can reason with short-term memory.

  • agent_context_to_string(ctx) serializes the AgentContext to pretty JSON, ideal for prompting an LLM agent.

Result: every decision step provides a compact, validated, and complete view of what matters now, aligning game mechanics with agent reasoning (damage, risk, tempo, information, and legal constraints).

from __future__ import annotations
from typing import List, Optional, Dict, Any, Literal
from dataclasses import dataclass

from pydantic import BaseModel, Field
from pydantic_ai import Agent

from poke_env.battle.battle import Battle, Pokemon

class TeamMon(BaseModel):
    species: str
    hp: Optional[float] = None
    fainted: bool = False
    types: List[str] = []
    boosts: Optional[Dict[str, int]] = None
    status: Optional[str] = None
    must_recharge: Optional[bool] = None

class MoveOption(BaseModel):
    move_id: str
    base_power: Optional[int] = None
    accuracy: Optional[float] = None
    move_type: Optional[str] = None
    pp: Optional[int] = None
    priority: int = 0

class SwitchOption(BaseModel):
    species: str
    hp: Optional[float] = None
    fainted: bool = False
    types: List[str] = []

class AgentContext(BaseModel):
    turn: int
    weather: Dict[str, Any]
    # You
    you_active: Optional[str]
    you_team: List[TeamMon]
    # Opponent
    opp_active: Optional[str]
    opp_known: List[TeamMon]
    # Legals
    legal_moves: List[MoveOption]
    legal_switches: List[SwitchOption]
    # Short episodic memory (last few actions / summaries)
    past_actions: List[str] = []

def _pokemon_to_teammon(p: Pokemon) -> TeamMon:
    return TeamMon(
        species=p.species,
        hp=p.current_hp_fraction,
        fainted=p.fainted,
        boosts=p.boosts,
        status=p.status.name if p.status else None,  # Status enum -> plain string for the schema
        must_recharge=p.must_recharge,
        types=[t.name for t in p.types or []],
    )

def build_context(battle: Battle, past_actions: List[str]) -> AgentContext:
    # legal moves
    legal_moves: List[MoveOption] = []
    for m in battle.available_moves:
        legal_moves.append(MoveOption(
            move_id=m.id,
            base_power=m.base_power,
            accuracy=m.accuracy,
            move_type=m.type.name,
            pp=m.current_pp,
            priority=m.priority,
        ))

    # legal switches
    legal_switches: List[SwitchOption] = []
    for p in battle.available_switches:
        legal_switches.append(SwitchOption(
            species=p.species,
            hp=p.current_hp_fraction,
            fainted=p.fainted,
            types=[t.name for t in (p.types or [])],
        ))

    # teams
    your_team = [_pokemon_to_teammon(poke) for poke in battle.team.values()]
    opp_known = [_pokemon_to_teammon(poke) for poke in battle.opponent_team.values() if poke._revealed] # revealed only

    return AgentContext(
        turn=battle.turn,
        weather={w.name: v for w, v in battle.weather.items()},  # Weather enum keys -> strings
        you_active=battle.active_pokemon.species,
        you_team=your_team,
        opp_active=battle.opponent_active_pokemon.species,
        opp_known=opp_known,
        legal_moves=legal_moves,
        legal_switches=legal_switches,
        past_actions=past_actions, 
    )

def agent_context_to_string(ctx: AgentContext) -> str:
    return ctx.model_dump_json(indent=2)

🤖 A minimal “thinking player”#

Goal: turn the JSON context we built into a single legal action (move or switch) using a typed LLM agent, and keep a tiny episodic memory of what we did.

1) Structured output contract = Decision

  • We define a Pydantic schema that the LLM must fill:

    • kind: "move" or "switch".

    • move_id / switch_species: only one is required depending on kind.

    • rationale: short explanation (useful for logs and later learning).

  • This keeps the model honest and makes post-processing trivial.
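To see why a typed contract “keeps the model honest”, here is a standalone sketch assuming Pydantic v2 (the version whose API the rest of this tutorial uses). A well-formed payload parses into a typed object; a payload with an invalid kind is rejected at the schema boundary before it ever reaches our game logic:

```python
from typing import Literal, Optional
from pydantic import BaseModel, ValidationError

class Decision(BaseModel):
    kind: Literal["move", "switch"]
    move_id: Optional[str] = None
    switch_species: Optional[str] = None
    rationale: str

# Valid payload: parses into a typed object, no regex post-processing needed.
ok = Decision.model_validate(
    {"kind": "move", "move_id": "aurasphere", "rationale": "super effective"}
)
print(ok.kind)  # -> move

# Invalid payload: rejected at validation time.
try:
    Decision.model_validate({"kind": "dance", "rationale": "?"})
    rejected = False
except ValidationError:
    rejected = True
print("rejected" if rejected else "accepted")  # -> rejected
```

Pydantic AI applies exactly this validation to the model's output when we set `output_type=Decision`, retrying if the LLM produces an invalid object.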

2) The LLM-powered player = PydanticLLMPlayer

  • Extends poke-env’s Player.

  • Sets up a Pydantic AI Agent (self.battle_agent) with:

    • A system prompt encoding a simple policy: prefer high-accuracy, super-effective moves; switch if danger is high or the available moves are poor; never invent illegal actions.

    • output_type=Decision so the model must return a valid, typed object.

3) Decision loop = choose_move(...)

  1. Build context ctx = build_context(battle, past_actions=self._past_actions) → serializes the current game state + short episodic memory.

  2. Call the agent decision = self.battle_agent.run_sync(agent_context_to_string(ctx)).output → the LLM reads the JSON, returns a validated Decision.

  3. Legality mapping

    • If kind == "move", find the exact move_id in battle.available_moves and create_order(m).

    • If kind == "switch", match switch_species in battle.available_switches and create_order(p).

    • We append a human-readable summary to _past_actions for the next turn’s context.

  4. Safety fallback: if, for any reason, the decision isn’t legal (this should be rare), we choose a random legal action so the game continues.

4) Why this works well

  • Typed outputs remove prompt-engineering brittleness (no regex parsing or guesswork).

  • Context → Decision → Action is clean, auditable, and easy to extend (plug in evaluators/critics later).

  • The episodic memory (_past_actions) gives the agent short-term continuity across turns without blowing up context size.

from rich import print as rprint
import nest_asyncio
import logfire

from poke_env import Player, RandomPlayer

logfire.configure(send_to_logfire=False) # set to true if you want to use logfire console
nest_asyncio.apply()

class Decision(BaseModel):
    kind: Literal["move", "switch"] = Field(description="Choose 'move' or 'switch'.")
    move_id: Optional[str] = Field(default=None, description="Required if kind == 'move'")
    switch_species: Optional[str] = Field(default=None, description="Required if kind == 'switch'")
    rationale: str

class PydanticLLMPlayer(Player):
    def __init__(self, name: str, model: str = "openrouter:openai/gpt-4o-mini", **kwargs):
        super().__init__(**kwargs)
        self.name = name
        self._past_actions: List[str] = []
        self.battle_agent = Agent(
            model=model,
            system_prompt=(
                "You are a Pokémon battle planner. "
                "Given the current battle context, choose ONE legal action. "
                "Prefer high-accuracy, super-effective moves; "
                "switch if active Pokémon risks being KO'd or has no good moves. "
                "Never invent illegal actions."
            ),
            output_type=Decision,
        )

    def choose_move(self, battle: Battle):
        # Build structured context for the agent
        ctx = build_context(battle, past_actions=self._past_actions)
        # Run agent to get decision
        decision = self.battle_agent.run_sync(agent_context_to_string(ctx)).output
        if battle.turn <= 1:
            rprint("CONTEXT:", ctx)
            rprint("DECISION:", decision)
        else:
            rprint(f"T{ctx.turn} DECISION:", decision.rationale)

        # Map Decision → poke-env action
        if decision.kind == "move":
            # find the matching legal move
            for m in battle.available_moves:
                if m.id == decision.move_id:
                    self._past_actions.append(
                        f"T{ctx.turn}: MOVE {decision.move_id} ({ctx.you_active} vs {ctx.opp_active})"
                    )
                    return self.create_order(m)
        elif decision.kind == "switch":
            # find the matching legal switch
            for p in battle.available_switches:
                if p.species == decision.switch_species:
                    self._past_actions.append(
                        f"T{ctx.turn}: SWITCH to {decision.switch_species} (from {ctx.you_active})"
                    )
                    return self.create_order(p)

        # Fallback: if agent suggested an illegal action (shouldn't happen), choose random
        self._past_actions.append(f"T{ctx.turn}: FALLBACK random")
        return self.choose_random_move(battle)

⚔️ Running our first Agentic battle#

Now that we’ve built our LLM-powered Pokémon agent, it’s time to see it in action! Here we instantiate the PydanticLLMPlayer and let it battle a RandomPlayer for a single match.

When the battle runs:

  1. Each turn, the agentic_player builds a structured AgentContext (game state + short memory).

  2. The LLM agent reasons over that context and outputs a typed Decision (move or switch).

  3. The environment executes that decision, updates the game state, and loops until all Pokémon on one side have fainted.

This quick match serves as a smoke test — verifying that our agent can read the environment, reason with context, and select legal actions correctly before we scale up to multi-agent graphs and tournaments.

agentic_player = PydanticLLMPlayer(name="LLM Agent")

await agentic_player.battle_against(RandomPlayer(), n_battles=1)
CONTEXT:
AgentContext(
    turn=1,
    weather={},
    you_active='clawitzer',
    you_team=[
        TeamMon(
            species='clawitzer',
            hp=1.0,
            fainted=False,
            types=['WATER'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        ),
        TeamMon(
            species='volcarona',
            hp=1.0,
            fainted=False,
            types=['BUG', 'FIRE'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        ),
        TeamMon(
            species='delphox',
            hp=1.0,
            fainted=False,
            types=['FIRE', 'PSYCHIC'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        ),
        TeamMon(
            species='golduck',
            hp=1.0,
            fainted=False,
            types=['WATER'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        ),
        TeamMon(
            species='raichualola',
            hp=1.0,
            fainted=False,
            types=['ELECTRIC', 'PSYCHIC'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        ),
        TeamMon(
            species='sneasler',
            hp=1.0,
            fainted=False,
            types=['FIGHTING', 'POISON'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        )
    ],
    opp_active='toedscruel',
    opp_known=[
        TeamMon(
            species='toedscruel',
            hp=1.0,
            fainted=False,
            types=['GROUND', 'GRASS'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        )
    ],
    legal_moves=[
        MoveOption(move_id='aurasphere', base_power=80, accuracy=1.0, move_type='FIGHTING', pp=32, priority=0),
        MoveOption(move_id='uturn', base_power=70, accuracy=1.0, move_type='BUG', pp=32, priority=0),
        MoveOption(move_id='dragonpulse', base_power=85, accuracy=1.0, move_type='DRAGON', pp=16, priority=0),
        MoveOption(move_id='waterpulse', base_power=60, accuracy=1.0, move_type='WATER', pp=32, priority=0)
    ],
    legal_switches=[
        SwitchOption(species='volcarona', hp=1.0, fainted=False, types=['BUG', 'FIRE']),
        SwitchOption(species='delphox', hp=1.0, fainted=False, types=['FIRE', 'PSYCHIC']),
        SwitchOption(species='golduck', hp=1.0, fainted=False, types=['WATER']),
        SwitchOption(species='raichualola', hp=1.0, fainted=False, types=['ELECTRIC', 'PSYCHIC']),
        SwitchOption(species='sneasler', hp=1.0, fainted=False, types=['FIGHTING', 'POISON'])
    ],
    past_actions=[]
)
DECISION:
Decision(
    kind='move',
    move_id='aurasphere',
    switch_species=None,
    rationale='Aurasphere is a Fighting-type move and is super-effective against Toedscruel, which is part 
Ground-type. It has 100% accuracy, making it a reliable choice to potentially knock out the opponent.'
)
T2 DECISION: Aurasphere is a high-accuracy Fighting-type move that is super effective against Toedscruel's 
Water-type, making it the best choice to maximize damage.
T3 DECISION: Aurasphere is a high-accuracy move that is super-effective against Duraludon, which is currently at 
low health (15% HP). This move can potentially knock it out.
T4 DECISION: Clawitzer has used 'aurasphere' for the last three turns against Water-type opponents and its health 
is reduced. Switching to Volcarona, which has full HP and is not at risk of being KO'd, allows for a fresh 
offensive strategy next turn.
T5 DECISION: Using Fire Blast as it is a high-power move and super-effective against the opposing Water/Dark type 
Samurott, making it an optimal attack choice.
T6 DECISION: Fire Blast is a high-powered and super-effective move against Toedscruel, which is a Water type 
Pokémon. It has good accuracy (85%), and with Volcarona's remaining HP being full, it can afford to make this 
attack.
T7 DECISION: Fire Blast is super-effective against Samurott Hisuian, which is a dual Water/Dark type Pokémon. With 
Volcarona at full HP (1.0), using Fire Blast takes advantage of its high power and accuracy, which is beneficial 
given the circumstances.
T8 DECISION: Fire Blast is a high-accuracy move that is super effective against Samurott-Hisui, which is currently 
low on HP. This move can potentially knock it out and maximize the chances to win this turn.
T9 DECISION: Using Fire Blast is a high-accuracy, high-damage (110 base power) move against Okidogi, which is weak 
to Fire-type attacks, making it super effective.
T10 DECISION: Volcarona is at very low health (3.5%), making it highly vulnerable to being knocked out. Switching 
to Delphox, which is at full health (100%), will provide a safer option for the battle.
T11 DECISION: Psyshock is a high-accuracy move that is super-effective against Cramorant's Psychic-type weakness. 
It deals 80 base power damage and has a 100% chance to hit, making it the best option.
T12 DECISION: Fire Blast is a high-damage Fire-type move, which is super-effective against Cramorant's Grass-type 
aspect, maximizing my damage potential in this turn.
T13 DECISION: Psyshock is a high-accuracy (100%) Psychic type move that is super-effective against Okidogi 
(Poison/Fighting). This move will deal substantial damage.
T14 DECISION: Psyshock is a high-accuracy move and is super effective against Glaceon's Ice type, making it the 
best option to deal significant damage.
T15 DECISION: Fire Blast is a high-power, high-accuracy move that is still viable. While Psyshock is super 
effective against Glaceon because it is an Ice type, Fire Blast also deals significant damage and has a chance to 
burn, offering additional tactical advantages.
T16 DECISION: Psyshock is a high-accuracy Psychic-type move and is super-effective against Toedscruel, which is a 
Water-type Pokémon. Using it can potentially KO Toedscruel, allowing for a better position in the battle.
T17 DECISION: Psyshock is a high-accuracy move that is super effective against Toedscruel's Psychic typing, and it 
can potentially knock it out given its low health.
T18 DECISION: Psyshock is a high-accuracy Psychic move that is super effective against Cramorant, which is weak to 
Psychic-type moves. Delphox also has a good chance of knocking out Cramorant given its current HP.
T19 DECISION: Psyshock is a high-accuracy and super-effective move against Cramorant, which has low HP. It's likely
to knock it out and secure a significant advantage in the battle.
T20 DECISION: Psyshock is a high-accuracy move (100%) with a good base power (80) that can deal effective damage to
Okidogi, which does not resist Psychic-type moves.

🕸️ Introducing Pydantic Graphs — the foundation for structured multi-agent workflows#

So far, our agent acted as a single decision-maker: it observed context, reasoned once, and returned a move. But as environments grow in complexity — multiple objectives, conflicting strategies, limited time — we need many specialized agents working together.

That’s where 🧩 Pydantic Graphs come in.

⚙️ What are Pydantic Graphs?#

Pydantic Graphs extend the idea of typed LLM workflows: instead of chaining prompts manually, you define a graph of agents — each node is a typed, callable component (Agent, Tool, or function), and edges represent how their structured outputs flow into each other.

Each node’s input/output types are enforced by Pydantic models, guaranteeing that ✅ every agent receives valid structured data, ✅ workflows are composable, debuggable, and inspectable, ✅ and parallel/conditional execution (“run these 3 evaluators in parallel”) becomes trivial.

🤝 Why multi-agent workflows?#

Real decision problems rarely have one “best” heuristic — they’re multi-objective:

  • Tactical reward vs safety (damage vs survivability)

  • Short-term payoff vs long-term setup

  • Exploration vs exploitation

Multi-agent graphs let you distribute cognition:

  • Each node/agent handles a sub-skill (planner, tactician, risk, scout).

  • Coordination logic (e.g., a manager or arbiter) fuses their reasoning.

  • Memory and arbitration layers can be swapped independently (for ablations).

This architecture naturally scales to agentic swarms — large ensembles of specialized agents that coordinate dynamically, forming emergent intelligence beyond a single model’s scope.

🔀 Static vs Dynamic Query Routing#

In our earlier “Manager” agent, the routing (which specialists to call) was static — we hard-coded: “always call Tactician, call Risk if danger > 0.6, call Scout every 3 turns”.

Dynamic routing, enabled by Pydantic Graphs, makes this adaptive:

  • Each agent’s outputs (or intermediate metadata like uncertainty, cost, or confidence) can dynamically decide the next edges to traverse.

  • If the planner returns low-confidence moves, the graph might automatically trigger the Risk Officer or Critic path.

  • If confidence is high, it can skip extra steps to save latency or tokens.

🧩 Benefit: Resource-aware, self-adapting workflows that scale gracefully — the system “thinks harder” only when needed.
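As a hypothetical sketch of confidence-gated routing, the agents below are stubbed out as plain functions; the node names, the `confidence` field, and the 0.6 threshold are all illustrative, not part of Pydantic Graphs:

```python
# Confidence-gated routing sketch: consult the expensive risk agent only
# when the planner is unsure. All agents are stubbed as plain functions.
def planner(state):
    state["action"], state["confidence"] = "aurasphere", 0.4
    return state

def risk_officer(state):
    # Expensive extra evaluation, triggered only on the low-confidence path.
    state["risk_checked"] = True
    return state

def route(state, threshold=0.6):
    state = planner(state)
    # Dynamic edge: low confidence triggers the extra evaluation path;
    # high confidence skips it to save latency and tokens.
    if state["confidence"] < threshold:
        state = risk_officer(state)
    return state

print(route({}))  # risk path taken, since confidence 0.4 < 0.6
```

In a Pydantic Graph, the same branch lives in a node’s `run()` return type: returning `RiskNode()` versus `DecisionNode()` is the dynamic edge.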

✏️ Query Rewriting#

Another advanced feature is query rewriting — when incoming queries or contexts are transformed before being passed to downstream agents. In Pokémon terms, before the planner decides, a context rewriter might:

  • Simplify redundant details (“ignore irrelevant side conditions”), or

  • Add derived features (“this move is likely super-effective against Water”).

This lets different specialists receive domain-specific representations of the same state, improving efficiency and interpretability.
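As a sketch, a context rewriter might drop noisy fields and add a derived “likely super-effective” feature before the planner sees the state. The field names here are illustrative dict keys, not poke-env’s API:

```python
# Hypothetical query-rewriting sketch: simplify the raw context and add a
# derived feature so the downstream planner gets a tailored representation.
def rewrite_for_planner(ctx):
    # Simplify: drop details this specialist doesn't need.
    rewritten = {k: v for k, v in ctx.items() if k != "side_conditions"}
    # Derive: flag moves likely to be super-effective against the opponent.
    rewritten["se_moves"] = [
        m["id"] for m in ctx["legal_moves"]
        if m["type"] == "ELECTRIC" and "WATER" in ctx["opp_types"]
    ]
    return rewritten

ctx = {
    "opp_types": ["WATER"],
    "side_conditions": {"spikes": 1},
    "legal_moves": [{"id": "thunderbolt", "type": "ELECTRIC"},
                    {"id": "tackle", "type": "NORMAL"}],
}
print(rewrite_for_planner(ctx)["se_moves"])  # -> ['thunderbolt']
```

The tactician might get the full damage-oriented view, while the scout gets only information-gain features: same state, different representations.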

🚀 Why it matters#

Together, dynamic routing and query rewriting turn a static, hand-crafted pipeline into a living cognitive graph:

  • 💡 Adaptive: reasoning depth scales with uncertainty or stakes.

  • 🧠 Modular: new skills or evaluators can be plugged in as new nodes.

  • ⚖️ Efficient: token and time budgets are managed intelligently.

  • 🔍 Transparent: every decision path and intermediate output is traceable.

By using Pydantic Graphs, we can finally move from “prompt chains” to structured, interpretable agentic systems — the same architectural leap that turns simple agents into full-fledged, cooperative AI swarms.

from pydantic_graph import BaseNode, End, Graph, GraphRunContext

class PlanCandidate(BaseModel):
    kind: Literal["move", "switch"]
    move_id: Optional[str] = None
    switch_species: Optional[str] = None
    rationale: str

class Plan(BaseModel):
    candidates: List[PlanCandidate]

class EvalScore(BaseModel):
    score: float
    notes: Optional[str] = None

class GraphState(BaseModel):
    context: AgentContext
    plan: Optional[Plan] = None 
    tactician_scores: Optional[List[EvalScore]] = None
    risk_scores: Optional[List[EvalScore]] = None
    scout_scores: Optional[List[EvalScore]] = None
    final_decision: Optional[PlanCandidate] = None

🧭 The Manager–Coordinator Paradigm in Agentic Swarms#

Let’s now implement our first multi-agent workflow: a managerial coordination pattern expressed as a structured Pydantic Graph, where each node acts as a specialized agent and together they form a coordinated decision-making swarm.

🕸️ The Idea: Manager + Specialists = Smarter Decisions#

Instead of relying on a single monolithic model, this setup distributes reasoning across multiple specialized roles — just like a management hierarchy in human organizations:

  • PlannerNode (Coordinator) 🧠 → proposes candidate actions (moves/switches).

  • TacticianNode ⚔️ → evaluates each candidate for expected value (damage, tempo).

  • RiskNode 🛡️ → evaluates safety and survivability.

  • ScoutNode 🔍 → evaluates information gain (learning about opponent’s hidden Pokémon).

  • DecisionNode (Manager) 🧩 → aggregates all scores and makes the final move selection.

Each node operates independently but shares a common state (GraphState) that persists through the workflow — this gives the system continuity, explainability, and structured memory across reasoning steps.

⚙️ How Pydantic Graphs Enable This#

Pydantic Graphs make this explicitly declarative:

  • Each node inherits from BaseNode and defines an async run() method that updates a shared GraphState.

  • Nodes specify their next node — e.g., PlannerNode → TacticianNode → RiskNode → ScoutNode → DecisionNode → End.

  • The Graph object (planner_graph) defines the entire workflow and its state type (GraphState), ensuring all data between nodes remains valid and typed.

  • The graph runtime (GraphRunContext) automatically handles execution order, state persistence, and error handling/retries.

This means the graph acts as an orchestration layer over multiple LLMs — a mini “swarm intelligence” system where reasoning flows like information through an organization chart.

🧩 Why the Manager-Coordinator Model Matters#

  1. Decomposition of reasoning: Each agent focuses on a narrow cognitive skill — simplifying prompts, improving interpretability, and reducing hallucinations.

  2. Parallelism and composability: Multiple evaluators can be executed concurrently, and new agents (e.g., “Healer Advisor”, “Weather Analyst”) can be plugged in without refactoring the graph.

  3. Explainability: Every step is transparent — you can inspect the planner’s candidates, each specialist’s scores, and the rationale behind the final decision.

  4. Dynamic scalability: The manager can later evolve to dynamic routing, consulting only relevant specialists based on battle context or uncertainty — enabling true adaptive swarms.
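Point 4 can be made concrete with a small routing policy: the manager consults only the specialists whose input matters this turn, under a call budget. The thresholds and specialist names below are illustrative assumptions, not part of the graph above:

```python
def route_specialists(hp_fraction: float, unknown_opponents: int,
                      danger: float, budget: int) -> list[str]:
    """Pick which specialist agents to consult this turn, most valuable first."""
    consult = ["tactician"]                 # always want an expected-value estimate
    if danger > 0.5 or hp_fraction < 0.4:
        consult.append("risk")              # safety only matters when threatened
    if unknown_opponents > 0:
        consult.append("scout")             # info gain only if something is hidden
    return consult[:budget]                 # respect the per-turn LLM-call budget


# Low HP and high danger, but budget only allows two calls:
print(route_specialists(hp_fraction=0.3, unknown_opponents=2, danger=0.7, budget=2))
```

A dynamic manager built this way skips expensive evaluators on easy turns and spends its budget where uncertainty is highest.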

@dataclass
class PlannerNode(BaseNode):
    planner_agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "You propose 2-4 legal actions for the given Pokémon battle context. "
            "Prefer super-effective, high-accuracy moves; consider switches if HP is low or risk is high. "
            "Do NOT invent illegal actions."
        ),
        output_type=Plan,
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> TacticianNode:
        state = context.state
        plans = (await self.planner_agent.run(agent_context_to_string(state.context))).output
        state.plan = plans

        return TacticianNode()
    

@dataclass
class TacticianNode(BaseNode):
    tactician_agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "You are a Pokémon battle tactician. "
            "Score each candidate 0..1 for expected value (damage + board advantage)."
        ),
        output_type=List[EvalScore],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> RiskNode:
        state = context.state
        assert state.plan is not None, "Plan must be set before TacticianNode runs."
        prompt = agent_context_to_string(state.context) + "\n\n" + state.plan.model_dump_json(indent=2)
        scores = (await self.tactician_agent.run(prompt)).output
        state.tactician_scores = scores

        return RiskNode()
    
@dataclass
class RiskNode(BaseNode):
    risk_agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "You are a Pokémon battle risk assessor. "
            "Score each candidate 0..1 for risk (chance of failure, negative outcomes)."
        ),
        output_type=List[EvalScore],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> ScoutNode:
        state = context.state
        assert state.plan is not None, "Plan must be set before RiskNode runs."
        prompt = agent_context_to_string(state.context) + "\n\n" + state.plan.model_dump_json(indent=2)
        scores = (await self.risk_agent.run(prompt)).output
        state.risk_scores = scores

        return ScoutNode()
    
@dataclass
class ScoutNode(BaseNode):
    scout_agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "You are a Pokémon battle scout. "
            "Score each candidate 0..1 for information gain (revealing opponent's unknowns)."
        ),
        output_type=List[EvalScore],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> DecisionNode:
        state = context.state
        assert state.plan is not None, "Plan must be set before ScoutNode runs."
        prompt = agent_context_to_string(state.context) + "\n\n" + state.plan.model_dump_json(indent=2)
        scores = (await self.scout_agent.run(prompt)).output
        state.scout_scores = scores

        return DecisionNode()
    
@dataclass
class DecisionNode(BaseNode):
    decision_agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "You are a Pokémon battle decision maker. "
            "Using the provided scores from tactician, risk, and scout, "
            "select the best candidate action to take."
        ),
        output_type=PlanCandidate,
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> End:
        state = context.state
        assert state.tactician_scores is not None, "Tactician scores must be set before DecisionNode runs."
        assert state.risk_scores is not None, "Risk scores must be set before DecisionNode runs."
        assert state.scout_scores is not None, "Scout scores must be set before DecisionNode runs."
        assert state.plan is not None, "Plan must be set before DecisionNode runs."

        prompt = agent_context_to_string(state.context) + "\n\n"
        prompt += "Planned Candidates:\n" + state.plan.model_dump_json(indent=2) + "\n\n"
        prompt += "Tactician Scores:\n" + str([es.model_dump_json(indent=2) for es in state.tactician_scores]) + "\n\n"
        prompt += "Risk Scores:\n" + str([es.model_dump_json(indent=2) for es in state.risk_scores]) + "\n\n"
        prompt += "Scout Scores:\n" + str([es.model_dump_json(indent=2) for es in state.scout_scores]) + "\n\n"

        decision = (await self.decision_agent.run(prompt)).output
        state.final_decision = decision

        return End(state.final_decision)
    
planner_graph = Graph(nodes=[PlannerNode, TacticianNode, RiskNode, ScoutNode, DecisionNode], state_type=GraphState)  

example_agent_context = AgentContext(
    turn=1,
    weather={},
    you_active="Pikachu",
    you_team=[TeamMon(species="Pikachu", hp=1.0, fainted=False, types=["Electric"], boosts={"atk": 1}, status=None, must_recharge=False)],
    opp_active="Bulbasaur",
    opp_known=[TeamMon(species="Bulbasaur", hp=1.0, fainted=False, types=["Grass", "Poison"], boosts={"def": -1}, status="paralyzed", must_recharge=False)],
    legal_moves=[MoveOption(move_id="Thunderbolt", base_power=90, accuracy=100, priority=0)],
    legal_switches=[SwitchOption(species="Charizard", hp=1.0, fainted=False, types=["Fire", "Flying"])],
    past_actions=[],
)

result = await planner_graph.run(PlannerNode(), state=GraphState(context=example_agent_context))
rprint(result.state.final_decision)
23:16:39.013 run graph planner_graph
23:16:39.014   run node PlannerNode
23:17:22.155   run node TacticianNode
23:17:43.325   run node RiskNode
23:18:17.161   run node ScoutNode
23:18:53.984   run node DecisionNode
PlanCandidate(
    kind='move',
    move_id='Thunderbolt',
    switch_species=None,
    rationale='Choose Thunderbolt: 100% accuracy and high power; Pikachu has +1 Atk while Bulbasaur is -1 Def and 
paralyzed, making an immediate KO extremely likely. Switching concedes a free turn and forfeits the probable 
instant elimination—attacking now maximizes EV and minimizes risk.'
)

Let’s see it in action!

from rich import print as rprint
import nest_asyncio
import logfire
import asyncio

logfire.configure(send_to_logfire=False) # set to true if you want to use logfire console
nest_asyncio.apply()

class PydanticGraphAgent(Player):
    def __init__(self, name: str, agentic_graph: Graph, first_node: BaseNode, **kwargs):
        super().__init__(**kwargs)
        self.name = name
        self._past_actions: List[str] = []
        self.graph = agentic_graph
        self.first_node = first_node

    def choose_move(self, battle: Battle):
        # Build structured context for the agent
        ctx = build_context(battle, past_actions=self._past_actions)
        # Run agent to get decision
        result = asyncio.run(self.graph.run(self.first_node, state=GraphState(context=ctx)))
        decision = result.state.final_decision
        if battle.turn <= 1:
            rprint("CONTEXT:", ctx)
            rprint("DECISION:", decision)
        else:
            rprint(f"T{ctx.turn} DECISION:", decision.rationale)

        # Map Decision → poke-env action
        if decision.kind == "move":
            # find the matching legal move
            for m in battle.available_moves:
                if m.id == decision.move_id:
                    self._past_actions.append(
                        f"T{ctx.turn}: MOVE {decision.move_id} ({ctx.you_active} vs {ctx.opp_active})"
                    )
                    return self.create_order(m)
        elif decision.kind == "switch":
            # find the matching legal switch
            for p in battle.available_switches:
                if p.species == decision.switch_species:
                    self._past_actions.append(
                        f"T{ctx.turn}: SWITCH to {decision.switch_species} (from {ctx.you_active})"
                    )
                    return self.create_order(p)

        # Fallback: if agent suggested an illegal action (shouldn't happen), choose random
        self._past_actions.append(f"T{ctx.turn}: FALLBACK random")
        return self.choose_random_move(battle)
coordination_graph = Graph(nodes=[PlannerNode, TacticianNode, RiskNode, ScoutNode, DecisionNode], state_type=GraphState)  

coordination_player = PydanticGraphAgent(name="LLM Agent", agentic_graph=coordination_graph, first_node=PlannerNode())

await coordination_player.battle_against(RandomPlayer(), n_battles=1)
22:31:19.383 run graph None
22:31:19.386   run node PlannerNode
22:31:47.861   run node TacticianNode
22:32:17.785   run node RiskNode
22:32:49.956   run node ScoutNode
22:33:20.686   run node DecisionNode
CONTEXT:
AgentContext(
    turn=1,
    you_active='hitmontop',
    you_team=[
        TeamMon(species='hitmontop', hp=1.0, fainted=False, types=['FIGHTING']),
        TeamMon(species='palafin', hp=1.0, fainted=False, types=['WATER']),
        TeamMon(species='brambleghast', hp=1.0, fainted=False, types=['GRASS', 'GHOST']),
        TeamMon(species='keldeoresolute', hp=1.0, fainted=False, types=['WATER', 'FIGHTING']),
        TeamMon(species='goodra', hp=1.0, fainted=False, types=['DRAGON']),
        TeamMon(species='pyroar', hp=1.0, fainted=False, types=['FIRE', 'NORMAL'])
    ],
    opp_active='phione',
    opp_known=[TeamMon(species='phione', hp=1.0, fainted=False, types=['WATER'])],
    legal_moves=[
        MoveOption(move_id='rapidspin', base_power=50, accuracy=1.0, move_type='NORMAL', pp=64, priority=0),
        MoveOption(move_id='stoneedge', base_power=100, accuracy=0.8, move_type='ROCK', pp=8, priority=0),
        MoveOption(move_id='suckerpunch', base_power=70, accuracy=1.0, move_type='DARK', pp=8, priority=1),
        MoveOption(move_id='closecombat', base_power=120, accuracy=1.0, move_type='FIGHTING', pp=8, priority=0)
    ],
    legal_switches=[
        SwitchOption(species='palafin', hp=1.0, fainted=False, types=['WATER']),
        SwitchOption(species='brambleghast', hp=1.0, fainted=False, types=['GRASS', 'GHOST']),
        SwitchOption(species='keldeoresolute', hp=1.0, fainted=False, types=['WATER', 'FIGHTING']),
        SwitchOption(species='goodra', hp=1.0, fainted=False, types=['DRAGON']),
        SwitchOption(species='pyroar', hp=1.0, fainted=False, types=['FIRE', 'NORMAL'])
    ],
    past_actions=[]
)
DECISION:
PlanCandidate(
    kind='move',
    move_id='suckerpunch',
    switch_species=None,
    rationale="Choose Sucker Punch: highest combined tactical and scouting value. Priority (+1) lets you beat 
Phione if it attacks, preserves board state vs being KO'd, and provides strong information about whether Phione 
intended to attack. The moderate risk of failing vs Protect/status/switch is acceptable given the upside on turn 
1."
)

Result:

You can also view the full battle here.

🗳️ Democratic multi-agent swarms#

Now let’s move on to the next agentic swarm model: democratic orchestration. Here,

  1. a Planner or multiple planners, propose(s) several legal candidates (moves/switches),

  2. multiple independent voters each judge every candidate from a different lens, and

  3. a Tally node picks the action with the most YES votes.
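Stripped of the LLM calls, step 3 is just vector addition over the voters’ ballots. A pure-Python sketch of the tally (the ballot data here is made up for illustration):

```python
def majority_tally(vote_vectors: list[list[int]]) -> int:
    """Sum 0/1 votes per candidate; return the winning index (first max on ties)."""
    totals = [sum(votes) for votes in zip(*vote_vectors)]
    return max(range(len(totals)), key=lambda i: totals[i])


# Three voters judging three candidates:
ballots = [
    [1, 0, 1],  # accuracy voter
    [1, 1, 0],  # type-matchup voter
    [1, 1, 1],  # tempo voter
]
print(majority_tally(ballots))  # candidate 0 wins with 3 YES votes
```

Because every voter emits the same simple vector shape, the aggregation stays trivially auditable no matter how many voters you plug in.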

Nodes & roles

  • PlannerNode → produces 3–4 legal, diverse candidates for the current battle context.

  • Voters (parallelizable):

    • AccuracyVoterNode – prefer high-reliability actions (≥90% accuracy or safe switch).

    • TypeMatchupVoterNode – reward good type effectiveness or improved matchup after switch.

    • TempoVoterNode – prefer momentum (threaten KO, force a switch, safe setup).

    • PPVoterNode – favor conserving scarce PP/resources.

    • DiversityVoterNode – encourage non-redundant options (coverage/status/switch variety).

  • TallyNode → sums the 0/1 votes per candidate and returns the majority winner (ties broken by the first maximum), or combines multiple plans and their critiques/rationales into a single best plan.

Each voter returns a list of 0/1 (YES/NO) values aligned with plan.candidates, keeping the interface simple and debuggable. A simpler version of this democratic debate idea also appears in Andrej Karpathy’s LLM Council.

🧠 When to use Democratic swarms vs Manager–Coordinator#

Use Democratic when:

  • You want robustness via diversity: many simple judges smooth out any one agent’s bias.

  • The task benefits from ensemble wisdom and parallel scoring of options.

  • You need transparent preference profiles (“why did we pick this?” → look at voter tallies).

  • Latency budget allows fan-out to several voters.

Use Manager–Coordinator when:

  • You need budget-aware routing (call specialists only when danger/uncertainty is high).

  • The task has a clear decision funnel (plan → specific specialists → decision).

  • You want conditional depth (think harder only when needed) for tighter SLAs.

  • You prefer a single final authority aggregating nuanced scores/metrics.

Rule of thumb:

  • Exploration, variety, early prototyping → start with Democracy.

  • Production with SLAs, cost constraints → move to Manager/Coordinator (dynamic routing, early exit).

🧬 Mixed-model ensembles (per-agent LLMs)#

Each node can use a different LLM (as shown: OpenAI, Anthropic, Google, xAI, Qwen) to specialize strengths:

  • Models with longer context or stronger reasoning can power the Planner or Type voter.

  • Faster/cheaper models can handle Accuracy/PP voters at scale.

  • Mixing providers reduces correlated failure modes and improves ensemble reliability.

Benefit: You get a portfolio effect—diverse models + diverse criteria → more stable decisions under uncertainty.

🧾 Why this pattern is nice to teach & extend

  • Simple contract: voters return [0/1, …]; the tally is trivial to audit.

  • Parallel-friendly: voters can run concurrently for low wall-time.

  • Composable: add/remove voters without touching the rest of the graph.

  • Explainable: log plan.candidates + each voter’s vector to visualize support per option.

Next steps: try replacing 0/1 votes with ranked ballots (Borda/Condorcet), or add confidence-weighted voting to blend democratic and managerial ideas.
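For the ranked-ballot extension, each voter would rank all candidates best-first instead of approving them. A Borda tally — a hypothetical drop-in replacement for TallyNode’s 0/1 sums, not part of the code below — could look like this:

```python
def borda_tally(rankings: list[list[int]]) -> int:
    """Each ballot lists candidate indices best-first; a candidate earns
    (n_candidates - position - 1) points per ballot. Returns the winning index."""
    n = len(rankings[0])
    scores = [0] * n
    for ballot in rankings:
        for position, candidate in enumerate(ballot):
            scores[candidate] += n - position - 1
    return max(range(n), key=lambda i: scores[i])


# Two voters ranking three candidates; candidate 1 tops both ballots:
print(borda_tally([[1, 0, 2], [1, 2, 0]]))  # → 1
```

Borda rewards broad acceptability rather than plurality support, which tends to suppress polarizing candidates that a simple YES/NO vote might let through.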

from __future__ import annotations
from typing import List, Optional, Literal, Dict
from dataclasses import dataclass

from pydantic import BaseModel, Field
from pydantic_ai import Agent
from pydantic_graph import BaseNode, End, Graph, GraphRunContext
from rich import print as rprint

class PlanCandidate(BaseModel):
    kind: Literal["move", "switch"]
    move_id: Optional[str] = None
    switch_species: Optional[str] = None
    rationale: str

class Plan(BaseModel):
    candidates: List[PlanCandidate]

class GraphState(BaseModel):
    context: AgentContext
    plan: Optional[Plan] = None
    accuracy_votes: Optional[List[int]] = None
    type_votes: Optional[List[int]] = None
    tempo_votes: Optional[List[int]] = None
    pp_votes: Optional[List[int]] = None
    diversity_votes: Optional[List[int]] = None
    final_decision: Optional[PlanCandidate] = None

@dataclass
class PlannerNode(BaseNode):
    planner_agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "You are a Pokémon move planner. From the given context, propose 3-4 LEGAL actions "
            "(moves or switches). Prefer super-effective, high-accuracy moves; include at least "
            "one safe SWITCH if current matchup looks poor. Do NOT invent illegal actions."
        ),
        output_type=Plan,
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "AccuracyVoterNode":
        state = context.state
        plan = (await self.planner_agent.run(agent_context_to_string(state.context))).output
        if len(plan.candidates) == 1:
            # Copy the candidate so editing the fallback's rationale doesn't
            # also mutate the original (list * 2 aliases the same object).
            fallback = plan.candidates[0].model_copy(
                update={"rationale": "Fallback duplicate to enable voting."}
            )
            plan.candidates.append(fallback)
        state.plan = plan
        return AccuracyVoterNode()


@dataclass
class AccuracyVoterNode(BaseNode):
    agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "Accuracy Voter: For each candidate, vote 1 if the action is high-reliability "
            "(move accuracy >= 90% or a SWITCH that avoids a likely miss/KO), else 0. "
            "Return a Python list of 0/1 of the same length as candidates, no extra text."
        ),
        output_type=List[int],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "TypeMatchupVoterNode":
        state = context.state
        assert state.plan is not None
        prompt = (
            agent_context_to_string(state.context)
            + "\n\nCANDIDATES:\n"
            + state.plan.model_dump_json(indent=2)
        )
        votes = (await self.agent.run(prompt)).output
        state.accuracy_votes = votes
        return TypeMatchupVoterNode()


@dataclass
class TypeMatchupVoterNode(BaseNode):
    agent = Agent(
        model="openrouter:anthropic/claude-sonnet-4.5",
        system_prompt=(
            "Type Matchup Voter: For each candidate, vote 1 if the MOVE is likely super-effective "
            "or at least neutral (avoid not-very-effective/immunity), or if a SWITCH improves the type matchup; "
            "otherwise 0. Return a Python list of 0/1, same length as candidates."
        ),
        output_type=List[int],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "TempoVoterNode":
        state = context.state
        assert state.plan is not None
        prompt = (
            agent_context_to_string(state.context)
            + "\n\nCANDIDATES:\n"
            + state.plan.model_dump_json(indent=2)
        )
        votes = (await self.agent.run(prompt)).output
        state.type_votes = votes
        return TempoVoterNode()


@dataclass
class TempoVoterNode(BaseNode):
    agent = Agent(
        model="openrouter:google/gemini-2.5-flash",
        system_prompt=(
            "Tempo Voter: Vote 1 for candidates that are likely to seize or keep momentum this turn "
            "(e.g., fast KO, force a switch, gain setup safely); otherwise 0. "
            "Return a Python list of 0/1 aligned with candidates."
        ),
        output_type=List[int],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "PPVoterNode":
        state = context.state
        assert state.plan is not None
        prompt = (
            agent_context_to_string(state.context)
            + "\n\nCANDIDATES:\n"
            + state.plan.model_dump_json(indent=2)
        )
        votes = (await self.agent.run(prompt)).output
        state.tempo_votes = votes
        return PPVoterNode()


@dataclass
class PPVoterNode(BaseNode):
    agent = Agent(
        model="openrouter:x-ai/grok-4-fast",
        system_prompt=(
            "PP Conservation Voter: Prefer conserving scarce PP; vote 1 when the candidate either "
            "uses a common PP move for chip damage or SWITCHES to preserve a key low-PP move; else 0. "
            "Return a Python list of 0/1 aligned with candidates."
        ),
        output_type=List[int],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "DiversityVoterNode":
        state = context.state
        assert state.plan is not None
        prompt = (
            agent_context_to_string(state.context)
            + "\n\nCANDIDATES:\n"
            + state.plan.model_dump_json(indent=2)
        )
        votes = (await self.agent.run(prompt)).output
        state.pp_votes = votes
        return DiversityVoterNode()


@dataclass
class DiversityVoterNode(BaseNode):
    agent = Agent(
        model="openrouter:qwen/qwen3-next-80b-a3b-thinking",
        system_prompt=(
            "Diversity Voter: Encourage non-redundant options. Vote 1 when the candidate adds "
            "coverage not present in other candidates this turn (e.g., different target, status vs raw damage, "
            "or SWITCH to change matchup); else 0. Return a Python list of 0/1 aligned with candidates."
        ),
        output_type=List[int],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "TallyNode":
        state = context.state
        assert state.plan is not None
        prompt = (
            agent_context_to_string(state.context)
            + "\n\nCANDIDATES:\n"
            + state.plan.model_dump_json(indent=2)
        )
        votes = (await self.agent.run(prompt)).output
        state.diversity_votes = votes
        return TallyNode()


@dataclass
class TallyNode(BaseNode):
    """Aggregate votes across voters and pick the candidate with the most YES votes."""
    async def run(self, context: GraphRunContext[GraphState]) -> End:
        s = context.state
        assert s.plan is not None

        buckets: List[List[int]] = []
        for name in ["accuracy_votes", "type_votes", "tempo_votes", "pp_votes", "diversity_votes"]:
            votes = getattr(s, name)
            if votes is not None:
                buckets.append(votes)

        n_candidates = len(s.plan.candidates)
        totals = [0] * n_candidates
        for bucket in buckets:
            if len(bucket) != n_candidates:
                # Defensive: truncate/pad to align
                bucket = (bucket + [0] * n_candidates)[:n_candidates]
            for i, v in enumerate(bucket):
                totals[i] += int(v)

        # Majority pick (max votes); deterministic tiebreak = first max
        best_idx = max(range(n_candidates), key=lambda i: totals[i])
        s.final_decision = s.plan.candidates[best_idx]

        # Optional: print a quick audit
        rprint({"totals": totals, "chosen_index": best_idx, "chosen": s.final_decision})

        return End(s.final_decision)

democracy_graph = Graph(
    nodes=[PlannerNode, AccuracyVoterNode, TypeMatchupVoterNode, TempoVoterNode, PPVoterNode, DiversityVoterNode, TallyNode],
    state_type=GraphState,
)

example_agent_context = AgentContext(
    turn=1,
    weather={},
    you_active="Pikachu",
    you_team=[TeamMon(species="Pikachu", hp=1.0, fainted=False, types=["Electric"], boosts={"atk": 1}, status=None, must_recharge=False)],
    opp_active="Bulbasaur",
    opp_known=[TeamMon(species="Bulbasaur", hp=1.0, fainted=False, types=["Grass", "Poison"], boosts={"def": -1}, status="paralyzed", must_recharge=False)],
    legal_moves=[MoveOption(move_id="Thunderbolt", base_power=90, accuracy=100, priority=0)],
    legal_switches=[SwitchOption(species="Charizard", hp=1.0, fainted=False, types=["Fire", "Flying"])],
    past_actions=[],
)

result = await democracy_graph.run(PlannerNode(), state=GraphState(context=example_agent_context))
rprint("Final Democratic Decision:", result.state.final_decision)
23:20:42.677 run graph democracy_graph
23:20:42.677   run node PlannerNode
23:21:17.224   run node AccuracyVoterNode
23:21:44.775   run node TypeMatchupVoterNode
23:21:48.946   run node TempoVoterNode
23:21:50.458   run node PPVoterNode
23:21:59.919   run node DiversityVoterNode
23:24:35.191   run node TallyNode
{
    'totals': [3, 3, 2],
    'chosen_index': 0,
    'chosen': PlanCandidate(
        kind='move',
        move_id='Thunderbolt',
        switch_species=None,
        rationale='Hit now with Thunderbolt — 100% accuracy and guaranteed damage this turn. Although Electric is 
not very effective vs Grass/Poison Bulbasaur, this is your only immediate attacking option and it may chip or 
finish a weakened/paralyzed foe.'
    )
}
Final Democratic Decision:
PlanCandidate(
    kind='move',
    move_id='Thunderbolt',
    switch_species=None,
    rationale='Hit now with Thunderbolt — 100% accuracy and guaranteed damage this turn. Although Electric is not 
very effective vs Grass/Poison Bulbasaur, this is your only immediate attacking option and it may chip or finish a 
weakened/paralyzed foe.'
)

🎭 Actor–Critic Multi-Agent Workflow#

Our final agentic swarm model is the actor–critic style workflow, again implemented using pydantic-graph. This model is inspired by reinforcement learning but adapted for multi-LLM reasoning. Here, we explicitly separate proposal and evaluation, allowing the agent swarm to reason iteratively about value and risk before acting.

🧩 Roles and flow#

The graph proceeds through four main nodes:

  1. ActorNode – The policy generator

    • Proposes 3–4 legal moves or switches based on the current Pokémon context.

    • Behaves like a policy network, outputting candidate actions with rationales.

  2. CriticNode – The value estimator

    • Evaluates each candidate with a Q-value (expected outcome), risk, and confidence score.

    • This acts as a “value network” estimating how good each candidate really is.

  3. ImproveNode – The policy improver (optional)

    • If the best candidate is too risky or has low Q-value, the improver agent asks the actor to refine its plan.

    • The critic is then re-invoked to rescore the improved candidates.

    • This mimics the actor–critic policy improvement loop in RL.

  4. SelectorNode – The final decision layer

    • Combines critic outputs into an adjusted score: \(\text{AdjustedScore} = Q \times (1 - \lambda \cdot \text{risk}) \times (0.5 + 0.5 \times \text{confidence})\)

    • Picks the candidate with the highest adjusted value and terminates the graph with an End node.
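The selector’s formula is simple enough to compute directly. In this sketch, λ is a risk-aversion weight the text leaves open; 0.5 is an assumed default:

```python
def adjusted_score(q: float, risk: float, confidence: float, lam: float = 0.5) -> float:
    """AdjustedScore = Q * (1 - lam*risk) * (0.5 + 0.5*confidence)."""
    return q * (1 - lam * risk) * (0.5 + 0.5 * confidence)


# A risky high-Q move vs. a safe, well-understood one:
scores = [
    adjusted_score(q=0.9, risk=0.8, confidence=0.7),  # strong but dangerous
    adjusted_score(q=0.7, risk=0.1, confidence=0.9),  # safe and confident
]
best = max(range(len(scores)), key=lambda i: scores[i])
print(scores, best)  # the safe option wins despite the lower raw Q-value
```

The multiplicative form means high risk or low critic confidence can override a tempting Q-value — exactly the conservative bias you want from a final decision layer.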

🧠 What makes this pattern powerful#

  • Iterative refinement – Unlike the manager-coordinator (hierarchical) or democratic (ensemble) designs, the actor–critic loop learns from its own evaluation.

  • Value-based reasoning – The critic explicitly quantifies the expected reward of each move, enabling long-term strategic play rather than greedy local choices.

  • Adaptive depth – The ImproveNode only triggers refinement when quality or safety drops, giving us dynamic compute allocation.

  • Interpretability – Q-values, risk, and confidence are visible for each decision, so you can trace why the agent preferred one move over another.

⚙️ Comparing paradigms#

| Workflow Type | Nature | Example Use | Pros | Trade-offs |
| --- | --- | --- | --- | --- |
| Manager–Coordinator | Hierarchical | Strategic planning under constraints | Modular, dynamic routing | Slight overhead for routing logic |
| Democratic | Ensemble | Collective judgment / robustness | High diversity, fault tolerance | Higher latency, no feedback loop |
| Actor–Critic | Iterative feedback | Adaptive value-based control | Learns/refines actions, interpretable | Slightly more compute per turn |

🚀 Why it fits Pokémon and beyond#

  • Battles require balancing expected gain vs. survivability, just like value-based RL tasks.

  • The critic captures contextual trade-offs (damage, tempo, risk), while the actor continuously learns what kinds of proposals score best.

  • This same structure can generalize to decision-making agents in finance, robotics, or multi-stage planning — anywhere feedback-driven refinement is useful.

from __future__ import annotations
from typing import List, Optional, Literal
from dataclasses import dataclass

from pydantic import BaseModel, Field
from pydantic_ai import Agent
from pydantic_graph import BaseNode, End, Graph, GraphRunContext
from rich import print as rprint

class CriticScore(BaseModel):
    index: int = Field(description="Index into Plan.candidates[]")
    q_value: float = Field(ge=0.0, description="Estimated value; higher is better")
    risk: float = Field(ge=0.0, le=1.0, description="0 safe, 1 very risky")
    confidence: float = Field(ge=0.0, le=1.0, description="Critic confidence in this score")
    notes: Optional[str] = None

class GraphState(BaseModel):
    context: AgentContext
    plan: Optional[Plan] = None
    critic_scores: Optional[List[CriticScore]] = None
    refined: bool = False
    final_decision: Optional[PlanCandidate] = None

@dataclass
class ActorNode(BaseNode):
    actor = Agent(
        model="openrouter:google/gemini-2.5-pro",
        system_prompt=(
            "ACTOR: Propose 3-4 LEGAL actions (moves or switches) for the current Pokémon context.\n"
            "Favor super-effective, high-accuracy moves; include a safe SWITCH if matchup is bad.\n"
            "Do NOT invent illegal actions. Keep rationales concise."
        ),
        output_type=Plan,
        retries=3
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "CriticNode":
        s = context.state
        plan = (await self.actor.run(agent_context_to_string(s.context))).output
        s.plan = plan
        return CriticNode()


@dataclass
class CriticNode(BaseNode):
    critic = Agent(
        model="openrouter:anthropic/claude-sonnet-4.5",
        system_prompt=(
            "CRITIC: For each candidate, estimate a Q-value in [0, +inf) capturing expected outcome "
            "(damage, survival, tempo) for THIS TURN and near future. Also output risk in [0,1] "
            "(0=safe,1=dangerous) and confidence in [0,1]. Keep notes brief. "
            "Return a list aligned with candidates using fields: index, q_value, risk, confidence, notes."
        ),
        output_type=List[CriticScore],
        retries=3
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "ImproveNode":
        s = context.state
        assert s.plan is not None
        prompt = agent_context_to_string(s.context) + "\n\nCANDIDATES:\n" + s.plan.model_dump_json(indent=2)
        scores = (await self.critic.run(prompt)).output
        # Defensive: clamp and align indices
        n = len(s.plan.candidates)
        clean = []
        for sc in scores:
            i = max(0, min(n - 1, int(sc.index)))
            clean.append(CriticScore(
                index=i,
                q_value=max(0.0, float(sc.q_value)),
                risk=min(1.0, max(0.0, float(sc.risk))),
                confidence=min(1.0, max(0.0, float(sc.confidence))),
                notes=sc.notes,
            ))
        s.critic_scores = clean
        return ImproveNode()


@dataclass
class ImproveNode(BaseNode):
    """Optional one-step policy improvement: if best Q is weak or risk is high, ask actor to refine once."""
    improver = Agent(
        model="openrouter:openai/gpt-5",
        system_prompt=(
            "IMPROVER: Given context, current candidates, and critic feedback, produce up to 2 REFINED "
            "legal alternatives that address the critic's concerns (e.g., too risky, low value). "
            "If current best is already strong, return an empty list to keep it."
        ),
        output_type=List[PlanCandidate],
        retries=2
    )

    # thresholds for triggering refinement
    min_good_q: float = 0.75
    max_ok_risk: float = 0.65

    async def run(self, context: GraphRunContext[GraphState]) -> "SelectorNode":
        s = context.state
        assert s.plan is not None and s.critic_scores is not None

        # Determine if refinement is needed
        best = max(s.critic_scores, key=lambda x: x.q_value)
        need_refine = (best.q_value < self.min_good_q) or (best.risk > self.max_ok_risk)

        if not need_refine or s.refined:
            return SelectorNode()

        prompt = (
            agent_context_to_string(s.context)
            + "\n\nCANDIDATES:\n" + s.plan.model_dump_json(indent=2)
            + "\n\nCRITIC:\n" + "\n".join(f"[{c.index}] Q={c.q_value:.2f} risk={c.risk:.2f} conf={c.confidence:.2f} {c.notes or ''}"
                                          for c in s.critic_scores)
        )
        new_opts = (await self.improver.run(prompt)).output

        if new_opts:
            # merge refined options (append, keep old too)
            s.plan = Plan(candidates=(s.plan.candidates + new_opts)[:6])  # cap to avoid prompt bloat
            s.refined = True
            return CriticNode()  # re-score with critic after refinement
        else:
            return SelectorNode()


@dataclass
class SelectorNode(BaseNode):
    """Pick argmax of an adjusted score: Q * (1 - λ*risk) with confidence weighting."""
    lambda_risk: float = 0.35

    async def run(self, context: GraphRunContext[GraphState]) -> End:
        s = context.state
        assert s.plan is not None and s.critic_scores is not None

        adjusted = []
        for sc in s.critic_scores:
            adj = sc.q_value * (1.0 - self.lambda_risk * sc.risk) * (0.5 + 0.5 * sc.confidence)
            adjusted.append((sc.index, adj))

        best_idx, _ = max(adjusted, key=lambda t: t[1])
        s.final_decision = s.plan.candidates[best_idx]

        return End(s.final_decision)

actor_critic_graph = Graph(
    nodes=[ActorNode, CriticNode, ImproveNode, SelectorNode],
    state_type=GraphState,
)
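Before running the graph, it is worth checking the `SelectorNode` scoring rule in isolation: `adj = Q * (1 - λ·risk) * (0.5 + 0.5·confidence)` discounts risky candidates and shrinks the vote of an unsure critic toward a neutral 0.5 weight. A minimal sketch (the helper name `adjusted_score` is mine, not part of the graph):

```python
def adjusted_score(q_value: float, risk: float, confidence: float,
                   lambda_risk: float = 0.35) -> float:
    """Risk-discounted, confidence-weighted value, mirroring SelectorNode.run."""
    return q_value * (1.0 - lambda_risk * risk) * (0.5 + 0.5 * confidence)

# A slightly weaker but safe, high-confidence candidate beats a stronger
# but risky, low-confidence one.
safe = adjusted_score(q_value=0.80, risk=0.10, confidence=0.9)   # ≈ 0.733
risky = adjusted_score(q_value=0.95, risk=0.90, confidence=0.4)  # ≈ 0.456
assert safe > risky
```

Note that even at zero confidence a candidate keeps half its discounted value — the critic's uncertainty dampens, but never vetoes, the actor's proposal.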

example_agent_context = AgentContext(
    turn=1,
    weather={},
    you_active="Pikachu",
    you_team=[TeamMon(species="Pikachu", hp=1.0, fainted=False, types=["Electric"], boosts={"atk": 1}, status=None, must_recharge=False)],
    opp_active="Bulbasaur",
    opp_known=[TeamMon(species="Bulbasaur", hp=1.0, fainted=False, types=["Grass", "Poison"], boosts={"def": -1}, status="paralyzed", must_recharge=False)],
    legal_moves=[MoveOption(move_id="Thunderbolt", base_power=90, accuracy=100, priority=0)],
    legal_switches=[SwitchOption(species="Charizard", hp=1.0, fainted=False, types=["Fire", "Flying"])],
    past_actions=[],
)

result = await actor_critic_graph.run(ActorNode(), state=GraphState(context=example_agent_context))
rprint("Actor–Critic Decision:", result.state.final_decision)
23:33:07.115 run graph actor_critic_graph
23:33:07.117   run node ActorNode
23:33:22.450   run node CriticNode
23:33:30.335   run node ImproveNode
23:33:30.337   run node SelectorNode
Actor–Critic Decision:
PlanCandidate(
    kind='move',
    move_id='Thunderbolt',
    switch_species=None,
    rationale='Thunderbolt is a strong, reliable STAB move with 100% accuracy. Bulbasaur is paralyzed, so it may 
not even move. This is a good offensive option.'
)
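The branch taken by `ImproveNode` above (straight to `SelectorNode`, since the plan was already strong) is governed by a simple gate: refine at most once, and only when the best candidate is low-value or high-risk. That gate can be sketched as a pure predicate (the function `needs_refinement` is my own naming; the thresholds mirror the class attributes):

```python
def needs_refinement(best_q: float, best_risk: float, already_refined: bool,
                     min_good_q: float = 0.75, max_ok_risk: float = 0.65) -> bool:
    """Mirror of ImproveNode's trigger: loop back to the critic at most once,
    and only if the current best candidate is weak or dangerous."""
    if already_refined:
        return False
    return best_q < min_good_q or best_risk > max_ok_risk

assert needs_refinement(best_q=0.60, best_risk=0.20, already_refined=False)      # weak plan → refine
assert not needs_refinement(best_q=0.90, best_risk=0.30, already_refined=False)  # strong and safe → keep
assert not needs_refinement(best_q=0.60, best_risk=0.90, already_refined=True)   # one refinement max
```

The `already_refined` flag (the graph's `GraphState.refined`) is what guarantees termination: without it, a stubbornly weak plan would bounce between critic and improver forever.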

actor_critic_agent = PydanticGraphAgent(name="LLM Agent", agentic_graph=actor_critic_graph, first_node=ActorNode())

await actor_critic_agent.battle_against(RandomPlayer(), n_battles=1)
23:33:35.277 run graph None
23:33:35.280   run node ActorNode
23:33:49.671   run node CriticNode
23:34:00.227   run node ImproveNode
23:34:00.229   run node SelectorNode
T5 DECISION: A safe switch. Ironthorns has a good defensive typing (Rock/Electric) against Mightyena's likely 
Dark-type STAB moves and can threaten back with its own powerful attacks.
23:34:00.237 run graph None
23:34:00.238   run node ActorNode
23:34:30.396   run node CriticNode
23:34:40.744   run node ImproveNode
23:34:40.744   run node SelectorNode
T6 DECISION: Deals STAB damage and allows a safe switch to a better-positioned Pokémon like Krookodile, adapting to
the opponent's move.
23:34:40.752 run graph None
23:34:40.752   run node ActorNode
23:34:57.234   run node CriticNode
23:35:03.734   run node ImproveNode
23:35:03.734   run node SelectorNode
T6 DECISION: Mightyena is a Dark-type pokemon, so a Bug-type move like Megahorn would be super-effective. 
Ironthorns is faster and should be able to knock out Mightyena with this move.
23:35:03.741 run graph None
23:35:03.742   run node ActorNode
23:35:16.440   run node CriticNode
23:35:28.183   run node ImproveNode
23:35:28.183   run node SelectorNode
T7 DECISION: Close Combat is a super-effective, high-power move that will likely knock out Mightyena.
23:35:28.199 run graph None
23:35:28.199   run node ActorNode
23:35:46.445   run node CriticNode
23:35:54.060   run node ImproveNode
23:35:54.061   run node SelectorNode
T8 DECISION: This is another safe switch. Zapdos resists Flying-type attacks and can hit Enamorus with a 
super-effective Electric-type STAB move.
23:35:54.073 run graph None
23:35:54.074   run node ActorNode
23:36:07.962   run node CriticNode
23:36:15.468   run node ImproveNode
23:36:15.468   run node SelectorNode
T9 DECISION: Ironthorns resists Normal-type attacks, making it a safe Pokémon to switch into against Maushold. It 
can absorb a potential Population Bomb and retaliate.
23:36:15.474 run graph None
23:36:15.474   run node ActorNode
23:36:36.471   run node CriticNode
23:36:45.996   run node ImproveNode
23:36:45.996   run node SelectorNode
T10 DECISION: Volt Switch deals damage and allows a pivot to another Pokémon. This is a safe, strategic option to 
maintain momentum and react to a potential switch from the opponent.
23:36:46.004 run graph None
23:36:46.004   run node ActorNode
23:37:05.105   run node CriticNode
23:37:14.771   run node ImproveNode
23:37:14.771   run node SelectorNode
T10 DECISION: Krookodile is immune to Jolteon's electric attacks and can threaten it with a super-effective ground 
type move. It is a high-risk high-reward play.
23:37:14.779 run graph None
23:37:14.780   run node ActorNode
23:37:29.325   run node CriticNode
23:37:35.950   run node ImproveNode
23:37:35.950   run node SelectorNode
T11 DECISION: Enamorus is immune to Earthquake and threatens Krookodile with its Fairy typing. Iron Thorns has a 
type advantage, resisting Enamorus's Flying-type moves and can hit back with super-effective Rock or Electric-type 
attacks.
23:37:35.955 run graph None
23:37:35.956   run node ActorNode
23:37:54.459   run node CriticNode
23:38:02.806   run node ImproveNode
23:38:02.806   run node SelectorNode
T12 DECISION: Krookodile is immune to Jolteon's Electric-type STAB moves and can OHKO it with a Ground-type move in
return.
23:38:02.816 run graph None
23:38:02.816   run node ActorNode
23:38:19.945   run node CriticNode
23:38:31.231   run node ImproveNode
23:38:31.231   run node SelectorNode
T13 DECISION: Earthquake is a super-effective STAB move against Jolteon, which is very likely to knock it out. 
Given Krookodile's immunity to Electric attacks, this is a strong offensive choice.
23:38:31.235 run graph None
23:38:31.235   run node ActorNode
23:38:49.396   run node CriticNode
23:38:57.798   run node ImproveNode
23:38:57.799   run node SelectorNode
T14 DECISION: Earthquake is Krookodile's strongest and most reliable STAB move against Maushold. It has high power 
and perfect accuracy, making it the most straightforward and effective offensive option.
23:38:57.804 run graph None
23:38:57.805   run node ActorNode
23:39:15.425   run node CriticNode
23:39:25.445   run node ImproveNode
23:39:25.446   run node SelectorNode
T15 DECISION: Iron Thorns is an excellent switch-in. Its Rock/Electric typing gives it a 4x super-effective 
advantage against Enamorus's Flying type, and it resists Flying-type moves.
23:39:25.454 run graph None
23:39:25.454   run node ActorNode
23:39:41.633   run node CriticNode
23:39:51.240   run node ImproveNode
23:39:51.240   run node SelectorNode
T16 DECISION: Wild Charge is a powerful, super-effective STAB move with 100% accuracy. It has a high chance to 
knock out Enamorus in one hit.
23:39:51.250 run graph None
23:39:51.250   run node ActorNode
23:40:13.728   run node CriticNode
23:40:24.176   run node ImproveNode
23:40:24.177   run node SelectorNode
T17 DECISION: Volt Switch deals damage and allows a switch to a healthier Pokémon like Zapdos or Feraligatr, which 
can more safely absorb a hit from Maushold.
23:40:24.184 run graph None
23:40:24.185   run node ActorNode
23:40:43.283   run node CriticNode
23:40:51.208   run node ImproveNode
23:40:51.208   run node SelectorNode
T17 DECISION: Switching to Zapdos is another strong choice. It has good defensive typing, resisting potential 
Fighting-type coverage moves from Maushold. Zapdos is also very fast and can threaten with powerful special 
attacks.
23:40:51.217 run graph None
23:40:51.218   run node ActorNode
23:41:17.314   run node CriticNode
23:41:26.949   run node ImproveNode
23:41:26.949   run node SelectorNode
T18 DECISION: Discharge is a reliable STAB move with 100% accuracy, offering consistent damage against Maushold 
without the risk of missing.
23:41:26.951 run graph None
23:41:26.951   run node ActorNode
23:41:47.572   run node CriticNode
23:41:56.741   run node ImproveNode
23:41:56.741   run node SelectorNode
T19 DECISION: Maushold is asleep and at low health. Discharge is a reliable STAB move with 100% accuracy that could
secure the knockout.
23:41:56.750 run graph None
23:41:56.750   run node ActorNode
23:42:10.500   run node CriticNode
23:42:18.964   run node ImproveNode
23:42:18.967   run node SelectorNode
CONTEXT:
AgentContext(
    turn=1,
    weather={},
    you_active='deoxys',
    you_team=[
        TeamMon(
            species='deoxys',
            hp=1.0,
            fainted=False,
            types=['PSYCHIC'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        ),
        TeamMon(
            species='brutebonnet',
            hp=1.0,
            fainted=False,
            types=['GRASS', 'DARK'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        ),
        TeamMon(
            species='weavile',
            hp=1.0,
            fainted=False,
            types=['DARK', 'ICE'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        ),
        TeamMon(
            species='regice',
            hp=1.0,
            fainted=False,
            types=['ICE'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        ),
        TeamMon(
            species='fluttermane',
            hp=1.0,
            fainted=False,
            types=['GHOST', 'FAIRY'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        ),
        TeamMon(
            species='zapdosgalar',
            hp=1.0,
            fainted=False,
            types=['FIGHTING', 'FLYING'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        )
    ],
    opp_active='dedenne',
    opp_known=[
        TeamMon(
            species='dedenne',
            hp=1.0,
            fainted=False,
            types=['ELECTRIC', 'FAIRY'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        )
    ],
    legal_moves=[
        MoveOption(move_id='psychoboost', base_power=140, accuracy=0.9, move_type='PSYCHIC', pp=8, priority=0),
        MoveOption(move_id='knockoff', base_power=65, accuracy=1.0, move_type='DARK', pp=32, priority=0),
        MoveOption(move_id='extremespeed', base_power=80, accuracy=1.0, move_type='NORMAL', pp=8, priority=2),
        MoveOption(move_id='superpower', base_power=120, accuracy=1.0, move_type='FIGHTING', pp=8, priority=0)
    ],
    legal_switches=[
        SwitchOption(species='brutebonnet', hp=1.0, fainted=False, types=['GRASS', 'DARK']),
        SwitchOption(species='weavile', hp=1.0, fainted=False, types=['DARK', 'ICE']),
        SwitchOption(species='regice', hp=1.0, fainted=False, types=['ICE']),
        SwitchOption(species='fluttermane', hp=1.0, fainted=False, types=['GHOST', 'FAIRY']),
        SwitchOption(species='zapdosgalar', hp=1.0, fainted=False, types=['FIGHTING', 'FLYING'])
    ],
    past_actions=[
        'T5: SWITCH to ironthorns (from krookodile)',
        'T6: MOVE voltswitch (ironthorns vs mightyena)',
        'T6: FALLBACK random',
        'T7: MOVE closecombat (braviary vs mightyena)',
        'T8: SWITCH to zapdos (from braviary)',
        'T9: SWITCH to ironthorns (from zapdos)',
        'T10: MOVE voltswitch (ironthorns vs mausholdfour)',
        'T10: SWITCH to krookodile (from ironthorns)',
        'T11: SWITCH to ironthorns (from krookodile)',
        'T12: SWITCH to krookodile (from ironthorns)',
        'T13: MOVE earthquake (krookodile vs jolteon)',
        'T14: MOVE earthquake (krookodile vs mausholdfour)',
        'T15: SWITCH to ironthorns (from krookodile)',
        'T16: MOVE wildcharge (ironthorns vs enamorus)',
        'T17: MOVE voltswitch (ironthorns vs mausholdfour)',
        'T17: SWITCH to zapdos (from ironthorns)',
        'T18: MOVE discharge (zapdos vs mausholdfour)',
        'T19: MOVE discharge (zapdos vs mausholdfour)'
    ]
)
DECISION:
PlanCandidate(
    kind='switch',
    move_id=None,
    switch_species='fluttermane',
    rationale='Flutter Mane resists Electric-type moves and can threaten Dedenne with its STAB moves on the 
following turn. This is a relatively safe switch.'
)
23:42:19.033 run graph None
23:42:19.034   run node ActorNode
23:42:40.585   run node CriticNode
23:42:50.718   run node ImproveNode
23:42:50.719   run node SelectorNode
T2 DECISION: High power STAB move that will deal significant damage to Glalie. Fluttermane is faster, so it will 
attack first.
23:42:50.731 run graph None
23:42:50.732   run node ActorNode
23:43:10.551   run node CriticNode
23:43:18.699   run node ImproveNode
23:43:18.700   run node SelectorNode
T3 DECISION: Moonblast is Flutter Mane's strongest STAB move and is likely to knock out the weakened Skuntank.
23:43:18.717 run graph None
23:43:18.719   run node ActorNode
23:43:38.243   run node CriticNode
23:43:49.050   run node ImproveNode
23:43:49.051   run node SelectorNode
T4 DECISION: Moonblast is a 4x super-effective STAB move against Garchomp, and with Flutter Mane's high speed and 
special attack, it has a strong chance of securing a one-hit knockout.
23:43:49.066 run graph None
23:43:49.069   run node ActorNode
23:44:12.956   run node CriticNode
23:44:23.324   run node ImproveNode
23:44:23.325   run node SelectorNode
T5 DECISION: This is your strongest neutral attack against Dedenne. Given Flutter Mane's high Speed and Special 
Attack, this is a reliable way to deal significant damage.
23:44:23.338 run graph None
23:44:23.339   run node ActorNode
23:44:38.240   run node CriticNode
23:44:47.859   run node ImproveNode
23:44:47.861   run node SelectorNode
T6 DECISION: Moonblast is your strongest STAB move against Dedenne and is very likely to knock it out, as Dedenne 
is already at 52% HP.
23:44:47.886 run graph None
23:44:47.888   run node ActorNode
23:45:08.940   run node CriticNode
23:45:18.377   run node ImproveNode
23:45:18.378   run node SelectorNode
T7 DECISION: Regice has massive special defense and resists Ice-type attacks, making it a very safe switch-in 
against Glalie.
23:45:18.384 run graph None
23:45:18.386   run node ActorNode
23:45:37.245   run node CriticNode
23:45:44.577   run node ImproveNode
23:45:44.577   run node SelectorNode
T8 DECISION: Weavile resists Glalie's STAB Ice-type moves and can hit back with its own super-effective attacks.
23:45:44.585 run graph None
23:45:44.586   run node ActorNode
23:46:01.809   run node CriticNode
23:46:09.654   run node ImproveNode
23:46:09.655   run node SelectorNode
T9 DECISION: Weavile is faster than Glalie. Knock Off deals solid neutral damage and removes Glalie's item, which 
could be a threat.
23:46:09.671 run graph None
23:46:09.673   run node ActorNode
23:46:26.144   run node CriticNode
23:46:35.060   run node ImproveNode
23:46:35.061   run node SelectorNode
T10 DECISION: Weavile is faster and can deal significant neutral damage with Knock Off, potentially knocking out 
Glalie. This is a strong offensive option.
23:46:35.074 run graph None
23:46:35.076   run node ActorNode
23:46:53.692   run node CriticNode
23:47:01.924   run node ImproveNode
23:47:01.925   run node SelectorNode
T11 DECISION: Guaranteed to hit first and has 100% accuracy. Glalie is at low HP, so this priority STAB move should
secure the knockout.
23:47:01.935 run graph None
23:47:01.935   run node ActorNode
23:47:23.878   run node CriticNode
23:47:32.369   run node ImproveNode
23:47:32.370   run node SelectorNode
T12 DECISION: This is the safest option. With its priority, Ice Shard is guaranteed to hit first, and with 100% 
accuracy, it's very likely to knock out the weakened Glalie without any risk.
23:47:32.384 run graph None
23:47:32.385   run node ActorNode
23:47:51.924   run node CriticNode
23:48:01.713   run node ImproveNode
23:48:01.713   run node SelectorNode
T13 DECISION: Brute Bonnet's Grass typing provides a 4x resistance to Pikachu's Electric STAB moves, making it a 
very safe switch-in.
23:48:01.729 run graph None
23:48:01.730   run node ActorNode
23:48:24.346   run node CriticNode
23:48:35.492   run node ImproveNode
23:48:35.492   run node SelectorNode
T14 DECISION: Puts the opponent to sleep with 100% accuracy, neutralizing the immediate threat and creating an 
opportunity for a follow-up attack or a safe switch.
23:48:35.532 run graph None
23:48:35.534   run node ActorNode
23:49:01.947   run node CriticNode
23:49:10.117   run node ImproveNode
23:49:10.117   run node SelectorNode
T15 DECISION: This is your strongest attack. It's a super-effective STAB move against a sleeping opponent, so you 
are guaranteed to land a powerful hit.
23:49:10.154 run graph None
23:49:10.155   run node ActorNode
23:49:35.530   run node CriticNode
23:49:44.780   run node ImproveNode
23:49:44.780   run node SelectorNode
T16 DECISION: Zapdos-Galar has a 4x resistance to Ground and a resistance to Dark, making it an excellent switch-in
against Krookodile. It's at full health and can threaten a super-effective Fighting-type move.
23:49:44.800 run graph None
23:49:44.802   run node ActorNode
23:50:09.285   run node CriticNode
23:50:17.974   run node ImproveNode
23:50:17.974   run node SelectorNode
T17 DECISION: Brave Bird is a powerful STAB move that can deal significant neutral damage to Pikachu. This is a 
high-risk, high-reward play that could KO Pikachu if it doesn't have a-lot of investment in bulk.
23:50:17.982 run graph None
23:50:17.982   run node ActorNode
23:50:35.423   run node CriticNode
23:50:42.465   run node ImproveNode
23:50:42.466   run node SelectorNode
T18 DECISION: Since Pikachu is asleep, this is a free turn to set up. Bulk Up will boost Zapdos-Galar's Attack and 
Defense, making it much stronger against the incoming Krookodile.
23:50:42.473 run graph None
23:50:42.474   run node ActorNode
23:51:06.331   run node CriticNode
23:51:17.538   run node ImproveNode
23:51:17.538   run node SelectorNode
T19 DECISION: This move will KO the opposing Pikachu and is also super-effective against the likely switch-in, 
Krookodile.
23:51:17.565 run graph None
23:51:17.565   run node ActorNode
23:51:40.014   run node CriticNode
23:51:51.301   run node ImproveNode
23:51:51.302   run node SelectorNode
T20 DECISION: Close Combat is a super-effective STAB move that, when combined with the +2 Attack boost, has a very 
high chance of knocking out Krookodile.
23:51:51.311 run graph None
23:51:51.311   run node ActorNode
23:52:08.270   run node CriticNode
23:52:19.718   run node ImproveNode
23:52:19.718   run node SelectorNode
CONTEXT:
AgentContext(
    turn=1,
    weather={},
    you_active='regidrago',
    you_team=[
        TeamMon(
            species='regidrago',
            hp=1.0,
            fainted=False,
            types=['DRAGON'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        ),
        TeamMon(
            species='veluza',
            hp=1.0,
            fainted=False,
            types=['WATER', 'PSYCHIC'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        ),
        TeamMon(
            species='eelektross',
            hp=1.0,
            fainted=False,
            types=['ELECTRIC'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        ),
        TeamMon(
            species='gliscor',
            hp=1.0,
            fainted=False,
            types=['GROUND', 'FLYING'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        ),
        TeamMon(
            species='dewgong',
            hp=1.0,
            fainted=False,
            types=['WATER', 'ICE'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        ),
        TeamMon(
            species='volcarona',
            hp=1.0,
            fainted=False,
            types=['BUG', 'FIRE'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        )
    ],
    opp_active='ampharos',
    opp_known=[
        TeamMon(
            species='ampharos',
            hp=1.0,
            fainted=False,
            types=['ELECTRIC'],
            boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0},
            status=None,
            must_recharge=False
        )
    ],
    legal_moves=[
        MoveOption(move_id='dragondance', base_power=0, accuracy=1.0, move_type='DRAGON', pp=32, priority=0),
        MoveOption(move_id='outrage', base_power=120, accuracy=1.0, move_type='DRAGON', pp=16, priority=0),
        MoveOption(move_id='dracometeor', base_power=130, accuracy=0.9, move_type='DRAGON', pp=8, priority=0),
        MoveOption(move_id='earthquake', base_power=100, accuracy=1.0, move_type='GROUND', pp=16, priority=0)
    ],
    legal_switches=[
        SwitchOption(species='veluza', hp=1.0, fainted=False, types=['WATER', 'PSYCHIC']),
        SwitchOption(species='eelektross', hp=1.0, fainted=False, types=['ELECTRIC']),
        SwitchOption(species='gliscor', hp=1.0, fainted=False, types=['GROUND', 'FLYING']),
        SwitchOption(species='dewgong', hp=1.0, fainted=False, types=['WATER', 'ICE']),
        SwitchOption(species='volcarona', hp=1.0, fainted=False, types=['BUG', 'FIRE'])
    ],
    past_actions=[
        'T5: SWITCH to ironthorns (from krookodile)',
        'T6: MOVE voltswitch (ironthorns vs mightyena)',
        'T6: FALLBACK random',
        'T7: MOVE closecombat (braviary vs mightyena)',
        'T8: SWITCH to zapdos (from braviary)',
        'T9: SWITCH to ironthorns (from zapdos)',
        'T10: MOVE voltswitch (ironthorns vs mausholdfour)',
        'T10: SWITCH to krookodile (from ironthorns)',
        'T11: SWITCH to ironthorns (from krookodile)',
        'T12: SWITCH to krookodile (from ironthorns)',
        'T13: MOVE earthquake (krookodile vs jolteon)',
        'T14: MOVE earthquake (krookodile vs mausholdfour)',
        'T15: SWITCH to ironthorns (from krookodile)',
        'T16: MOVE wildcharge (ironthorns vs enamorus)',
        'T17: MOVE voltswitch (ironthorns vs mausholdfour)',
        'T17: SWITCH to zapdos (from ironthorns)',
        'T18: MOVE discharge (zapdos vs mausholdfour)',
        'T19: MOVE discharge (zapdos vs mausholdfour)',
        'T1: SWITCH to fluttermane (from deoxys)',
        'T2: MOVE moonblast (fluttermane vs glalie)',
        'T3: MOVE moonblast (fluttermane vs skuntank)',
        'T4: MOVE moonblast (fluttermane vs garchomp)',
        'T5: MOVE shadowball (fluttermane vs dedenne)',
        'T6: MOVE moonblast (fluttermane vs dedenne)',
        'T7: SWITCH to regice (from fluttermane)',
        'T8: SWITCH to weavile (from regice)',
        'T9: MOVE knockoff (weavile vs glalie)',
        'T10: MOVE knockoff (weavile vs glalie)',
        'T11: MOVE iceshard (weavile vs glalie)',
        'T12: MOVE iceshard (weavile vs glalie)',
        'T13: SWITCH to brutebonnet (from weavile)',
        'T14: MOVE spore (brutebonnet vs pikachuhoenn)',
        'T15: MOVE seedbomb (brutebonnet vs pikachuhoenn)',
        'T16: SWITCH to zapdosgalar (from brutebonnet)',
        'T17: MOVE bravebird (zapdosgalar vs pikachuhoenn)',
        'T18: MOVE bulkup (zapdosgalar vs pikachuhoenn)',
        'T19: MOVE closecombat (zapdosgalar vs pikachuhoenn)',
        'T20: MOVE closecombat (zapdosgalar vs krookodile)'
    ]
)
DECISION:
PlanCandidate(
    kind='move',
    move_id='earthquake',
    switch_species=None,
    rationale='Earthquake is a super-effective, high-power, and perfectly accurate move against the opposing 
Ampharos. It is the strongest and most reliable offensive option.'
)
23:52:19.761 run graph None
23:52:19.761   run node ActorNode
23:52:38.907   run node CriticNode
23:52:48.803   run node ImproveNode
23:52:48.804   run node SelectorNode
T2 DECISION: Volcarona has a major advantage, resisting Cresselia's likely Ice and Fairy moves and threatening with
super-effective Bug-type attacks.
23:52:48.810 run graph None
23:52:48.810   run node ActorNode
23:53:06.655   run node CriticNode
23:53:14.731   run node ImproveNode
23:53:14.731   run node SelectorNode
T3 DECISION: Bug Buzz is a powerful STAB move that is super-effective against Cresselia's base Psychic typing, 
dealing significant damage despite its Special Defense boost.
23:53:14.741 run graph None
23:53:14.741   run node ActorNode
23:53:40.216   run node CriticNode
23:53:52.998   run node ImproveNode
23:53:53.000   run node SelectorNode
T4 DECISION: Bug Buzz is a powerful super-effective STAB move. With 100% accuracy, it's the most reliable way to 
inflict significant damage on Cresselia, even with its Special Defense boost.
23:53:53.018 run graph None
23:53:53.020   run node ActorNode
23:54:13.774   run node CriticNode
23:54:22.119   run node ImproveNode
23:54:22.120   run node SelectorNode
T5 DECISION: Gliscor is immune to Electric-type attacks, making it a perfect switch-in against a Terastallized 
Electric Cresselia. It can then threaten with super-effective Ground-type moves.
23:54:22.126 run graph None
23:54:22.126   run node ActorNode
23:54:41.783   run node CriticNode
23:54:50.920   run node ImproveNode
23:54:50.920   run node SelectorNode
T6 DECISION: Gliscor has a 4x weakness to Delibird's Ice-type attacks. Switching to Eelektross is a safe and 
advantageous move, as it resists Flying-type moves and can counter with super-effective Electric-type attacks.
23:54:50.926 run graph None
23:54:50.927   run node ActorNode
23:55:02.354   run node CriticNode
23:55:11.527   run node ImproveNode
23:55:11.527   run node SelectorNode
T7 DECISION: Drain Punch is super-effective (Fighting vs. Dark/Fire) and has 100% accuracy, making it a strong and 
reliable offensive option to deal significant damage to Houndoom.
23:55:11.548 run graph None
23:55:11.551   run node ActorNode
23:55:28.942   run node CriticNode
23:55:37.917   run node ImproveNode
23:55:37.917   run node SelectorNode
T8 DECISION: Gliscor is immune to Falinks's Fighting-type STAB moves, making it a very safe switch-in. It can 
threaten Falinks back with its own STABs.
23:55:37.941 run graph None
23:55:37.944   run node ActorNode
23:56:00.686   run node CriticNode
23:56:11.033   run node ImproveNode
23:56:11.033   run node SelectorNode
T9 DECISION: Earthquake is Gliscor's strongest move against Falinks and has a high chance of knocking it out at its
current health. Gliscor's typing resists Falinks's STAB moves, making this a safe and powerful offensive option.
23:56:11.055 run graph None
23:56:11.057   run node ActorNode
23:56:25.091   run node CriticNode
23:56:33.863   run node ImproveNode
23:56:33.863   run node SelectorNode
T10 DECISION: Earthquake is a powerful STAB move that should be able to knock out the already weakened Falinks, 
especially with its lowered defense.
23:56:33.876 run graph None
23:56:33.877   run node ActorNode
23:56:52.833   run node CriticNode
23:57:03.311   run node ImproveNode
23:57:03.312   run node SelectorNode
T11 DECISION: Switching to Eelektross is a safe move. It resists Ampharos's Electric STAB moves and is only hit 
neutrally by potential Ice-type coverage. Its Levitate ability also makes it immune to any predicted Ground-type 
attacks.
23:57:03.318 run graph None
23:57:03.318   run node ActorNode
23:57:23.176   run node CriticNode
23:57:33.817   run node ImproveNode
23:57:33.818   run node SelectorNode
T12 DECISION: This move has 100% accuracy and should be sufficient to knock out the low-HP Falinks, making it a 
safer option than Supercell Slam.
23:57:33.833 run graph None
23:57:33.834   run node ActorNode
23:57:51.153   run node CriticNode
2025-11-10 23:57:55,507 - PydanticGraphAge 1 - ERROR - Unhandled exception raised while handling message:
>battle-gen9randombattle-279
|request|{"active":[{"moves":[{"move":"Drain Punch","id":"drainpunch","pp":15,"maxpp":16,"target":"normal","disabled":false},{"move":"Supercell Slam","id":"supercellslam","pp":24,"maxpp":24,"target":"normal","disabled":false},{"move":"Coil","id":"coil","pp":32,"maxpp":32,"target":"self","disabled":false},{"move":"Fire Punch","id":"firepunch","pp":23,"maxpp":24,"target":"normal","disabled":false}],"canTerastallize":"Fighting"}],"side":{"name":"PydanticGraphAge 1","id":"p1","pokemon":[{"ident":"p1: Eelektross","details":"Eelektross, L87, F","condition":"127/290","active":true,"stats":{"atk":250,"def":189,"spa":232,"spd":189,"spe":137},"moves":["drainpunch","supercellslam","coil","firepunch"],"baseAbility":"levitate","item":"leftovers","pokeball":"pokeball","ability":"levitate","commanding":false,"reviving":false,"teraType":"Fighting","terastallized":""},{"ident":"p1: Veluza","details":"Veluza, L85, F","condition":"292/292","active":false,"stats":{"atk":222,"def":173,"spa":181,"spd":159,"spe":168},"moves":["aquacutter","aquajet","psychocut","nightslash"],"baseAbility":"sharpness","item":"choiceband","pokeball":"pokeball","ability":"sharpness","commanding":false,"reviving":false,"teraType":"Dark","terastallized":""},{"ident":"p1: Gliscor","details":"Gliscor, L76, F","condition":"192/239 tox","active":false,"stats":{"atk":188,"def":234,"spa":112,"spd":158,"spe":188},"moves":["protect","toxic","knockoff","earthquake"],"baseAbility":"poisonheal","item":"toxicorb","pokeball":"pokeball","ability":"poisonheal","commanding":false,"reviving":false,"teraType":"Water","terastallized":""},{"ident":"p1: Volcarona","details":"Volcarona, L77, M","condition":"160/257","active":false,"stats":{"atk":97,"def":145,"spa":252,"spd":206,"spe":199},"moves":["bugbuzz","quiverdance","terablast","fireblast"],"baseAbility":"swarm","item":"heavydutyboots","pokeball":"pokeball","ability":"swarm","commanding":false,"reviving":false,"teraType":"Water","terastallized":""},{"ident":"p1: 
Dewgong","details":"Dewgong, L94, F","condition":"322/322","active":false,"stats":{"atk":185,"def":204,"spa":185,"spd":232,"spe":185},"moves":["surf","knockoff","tripleaxel","encore"],"baseAbility":"thickfat","item":"heavydutyboots","pokeball":"pokeball","ability":"thickfat","commanding":false,"reviving":false,"teraType":"Grass","terastallized":""},{"ident":"p1: Regidrago","details":"Regidrago, L77","condition":"435/435","active":false,"stats":{"atk":199,"def":122,"spa":199,"spd":122,"spe":168},"moves":["dragondance","outrage","dracometeor","earthquake"],"baseAbility":"dragonsmaw","item":"lumberry","pokeball":"pokeball","ability":"dragonsmaw","commanding":false,"reviving":false,"teraType":"Dragon","terastallized":""}]},"rqid":30}
Traceback (most recent call last):
  File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\poke_env\ps_client\ps_client.py", line 142, in _handle_message
    await self._handle_battle_message(split_messages)  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\poke_env\player\player.py", line 291, in _handle_battle_message
    await self._handle_battle_request(battle)
  File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\poke_env\player\player.py", line 416, in _handle_battle_request
    choice = self.choose_move(battle)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\SHRESHTH\AppData\Local\Temp\ipykernel_27708\3706119269.py", line 21, in choose_move
    result = asyncio.run(self.graph.run(self.first_node, state=GraphState(context=ctx)))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\nest_asyncio.py", line 30, in run
    return loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\nest_asyncio.py", line 98, in run_until_complete
    return f.result()
           ^^^^^^^^^^
  File "C:\Users\SHRESHTH\AppData\Local\Programs\Python\Python312\Lib\asyncio\futures.py", line 203, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "C:\Users\SHRESHTH\AppData\Local\Programs\Python\Python312\Lib\asyncio\tasks.py", line 304, in __step_run_and_handle_result
    result = coro.send(None)
             ^^^^^^^^^^^^^^^
  File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\pydantic_graph\graph.py", line 152, in run
    async for _node in graph_run:
  File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\pydantic_graph\graph.py", line 766, in __anext__
    return await self.next(self._next_node)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\pydantic_graph\graph.py", line 739, in next
    self._next_node = await node.run(ctx)
                      ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\SHRESHTH\AppData\Local\Temp\ipykernel_27708\3721675952.py", line 102, in run
    best = max(s.critic_scores, key=lambda x: x.q_value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: max() iterable argument is empty
23:57:55.504   run node ImproveNode
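The traceback above ends in `ValueError: max() iterable argument is empty` because the arbitration step calls `max()` on `critic_scores` before any critic has replied. A small guard with a fallback action avoids crashing mid-battle. This is a minimal sketch: `CriticScore` and `pick_best_action` are illustrative stand-ins for the notebook's actor–critic node, not its actual classes.

```python
from dataclasses import dataclass


@dataclass
class CriticScore:
    """Hypothetical stand-in for one critic's evaluation of an action."""
    action: str
    q_value: float


def pick_best_action(critic_scores: list[CriticScore],
                     fallback: str = "random") -> str:
    """Return the highest-Q action, or a fallback when no critic replied."""
    if not critic_scores:
        # max() on an empty iterable raises ValueError, so bail out early
        # with a safe default (e.g. let the player pick a random move).
        return fallback
    best = max(critic_scores, key=lambda s: s.q_value)
    return best.action


print(pick_best_action([]))  # falls back to "random"
print(pick_best_action([CriticScore("fireblast", 0.7),
                        CriticScore("protect", 0.4)]))  # fireblast
```

In the notebook's `choose_move`, the fallback could map to `self.choose_random_move(battle)` so a silent critic never aborts the whole graph run.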

Let’s see this battle in action:

You can also view the full battle here.

🏁 Conclusion#

In this tutorial, we explored how multi-agent workflows enable structured, interpretable reasoning in complex environments — using Pokémon battles as our sandbox.

We built and compared three distinct coordination paradigms:

  • 🧠 Manager–Coordinator: hierarchical and resource-aware — perfect for structured, goal-driven orchestration.

  • 🗳️ Democratic Swarms: ensemble-style decision-making — robust and diverse, harnessing the collective judgment of many agents.

  • 🎭 Actor–Critic: feedback-driven refinement — adaptive and value-based, blending planning with learned evaluation.

Each architecture has strengths depending on your problem structure, latency budget, and decision uncertainty. Together, they form the foundation of Agentic AI systems — not just single agents, but entire cognitive ecosystems working in concert.

🌐 Beyond these: other agentic swarm paradigms#

Modern agent research spans several other fascinating coordination styles that extend or hybridize these ideas:

  • 🕸️ Blackboard Systems / Shared Memory Agents — agents read and write to a common “knowledge blackboard” (e.g., HUB, LangGraph Shared Memory, ReAct with tool state).

  • 🪶 Evolutionary or Genetic Agent Swarms — agents mutate and compete (e.g., EvoAgents, Yuan et al. (2024)) to explore strategy space efficiently.

  • 🔁 Reflective / Self-Improving Swarms — agents monitor and fine-tune their own reasoning, using self-reflection or secondary evaluators (e.g., SwarmAgentic, Zhang et al. (2025)).

Together, these paradigms form a growing ecosystem of Agentic Swarms — distributed reasoning systems where intelligence emerges from structured interaction rather than single-model dominance.
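To make the first of these paradigms concrete, here is a minimal sketch of the blackboard / shared-memory pattern: agents post partial findings to a common store and react to what others wrote. All names here (`Blackboard`, `scout`, `strategist`) are illustrative, not from any library, and the type matchup is a toy stand-in for real battle analysis.

```python
class Blackboard:
    """A shared key-value store that every agent can read and write."""

    def __init__(self) -> None:
        self.entries: dict[str, object] = {}

    def post(self, key: str, value: object) -> None:
        self.entries[key] = value

    def read(self, key: str, default=None):
        return self.entries.get(key, default)


def scout(board: Blackboard) -> None:
    # A perception agent writes an observation for others to consume.
    board.post("opponent_type", "Water")


def strategist(board: Blackboard) -> str:
    # A planning agent reads the shared state and decides on a move.
    if board.read("opponent_type") == "Water":
        return "supercellslam"  # prefer the Electric move against Water
    return "drainpunch"


board = Blackboard()
scout(board)
print(strategist(board))  # supercellslam
```

The key design property is that agents never call each other directly; coordination emerges entirely from the shared state, which makes it easy to add, remove, or swap specialists.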

🔮 What’s next#

So far, we’ve focused on how agents reason and collaborate. In the next tutorial, we’ll explore how to select the optimal model for each agent automatically — using meta-evaluation, model profiling, and cost–quality trade-offs to dynamically assign the best LLM to each sub-task.

We’ll essentially teach our system to self-optimize its model choices — the first step toward autonomous orchestration in large multi-agent setups.