07. Multi-Agent Workflows and Agentic Swarms#
In the real world, no single agent can solve every problem optimally. As tasks grow in uncertainty, dimensionality, and interdependence — such as strategy games, simulations, robotics, or real-time business systems — we naturally evolve from single-agent reasoning to multi-agent workflows. In this tutorial, we see the first sparks of Super Agents. An AI Super-Agent is an orchestration system that coordinates multiple specialized AI agents to solve complex problems requiring diverse capabilities.
These workflows mirror how humans collaborate:
🗳️ Democratic committees balance diverse perspectives.
🧭 Hierarchical managers coordinate specialists under limited resources.
⚖️ Actor-Critic systems separate exploration (actor) from judgment (critic).
Each pattern encodes a different philosophy of coordination — distributing intelligence across specialized roles that communicate, negotiate, and arbitrate toward a shared goal.
⚙️ What Are Multi-Agent Workflows?#
A multi-agent workflow is a structured network of reasoning and action nodes — planners, evaluators, arbiters, memory modules — that interact through explicit channels rather than a single monolithic prompt.
Think of it as a graph of decision-making where:
Nodes = agents (LLMs, heuristics, or functions).
Edges = communication or dependency between them.
Memory = shared context that persists across steps.
Arbitration = how conflicting opinions are resolved.
This structure enables:
Parallel specialization (multiple evaluators in parallel).
Conditional routing (managers deciding who to consult).
Resource budgeting (decide when to skip expensive reasoning).
Explainability & debugging (explicit traces of who decided what).
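Stripped of any particular framework, this is just a graph of typed callables over shared memory. Here is a minimal, self-contained sketch (the `Workflow` class and node names are illustrative, not a library API):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Workflow:
    """Toy decision graph: nodes transform a shared memory dict, order gives the edges."""
    nodes: Dict[str, Callable[[dict], dict]] = field(default_factory=dict)
    order: List[str] = field(default_factory=list)

    def run(self, memory: dict) -> dict:
        for name in self.order:
            memory = self.nodes[name](memory)  # each agent reads and writes shared context
        return memory

wf = Workflow(
    nodes={
        "planner": lambda m: {**m, "candidates": ["move_a", "switch_b"]},
        "critic": lambda m: {**m, "scores": [0.8, 0.3]},
        # arbitration: pick the candidate with the highest critic score
        "arbiter": lambda m: {**m, "decision": m["candidates"][m["scores"].index(max(m["scores"]))]},
    },
    order=["planner", "critic", "arbiter"],
)
print(wf.run({})["decision"])  # -> move_a
```

The patterns in this tutorial replace the lambdas with LLM agents and the dict with a validated Pydantic state, but the shape (nodes, edges, memory, arbitration) stays the same.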
🧩 Enter Pydantic Graphs#
Building and managing these interactions manually is painful — tracking state, type safety, branching, and parallel execution can get messy fast.
Pydantic Graphs solves this elegantly by combining:
✅ Typed data flow from Pydantic models — ensuring every node’s input and output are structured, validated, and traceable.
🕸️ Graph orchestration — defining agents and their dependencies as composable, inspectable workflows.
🔁 Parallel & conditional execution — automatically handling fan-out (multiple evaluators) and routing logic (manager/critic decisions).
🧾 Transparent traces — every step’s inputs, outputs, and reasoning can be logged, visualized, and replayed.
Together, they turn the messy spaghetti of agent calls into a declarative decision graph — a scalable foundation for complex, memory-aware, multi-agent systems.
🧩 What is poke-env?#
poke-env is a Python interface to the Pokémon Showdown battle simulator, providing an environment for reinforcement learning and AI experiments.
It exposes each battle as a structured API — giving access to game state (Pokémon, moves, types, HP, etc.) and allowing agents to pick legal actions programmatically.
In our workflow, we’ll use poke-env as the testbed to:
⚔️ Pit different multi-agent strategies (democratic, manager, actor-critic) against each other.
📊 Compare performance through metrics like win rate, turns survived, and move efficiency.
🧠 Benchmark reasoning styles — seeing how coordination strategies translate into competitive outcomes.
Before running experiments, we’ll start a local Pokémon Showdown server instance. This spins up a self-contained battle environment where our agents can safely train, plan, and battle — making Pokémon the perfect arena for testing agentic intelligence in action.
from src.pokemon_showdown_setup import run_pokemon_showdown
pokemon_container = run_pokemon_showdown()
🟢 Container already running: pokemon-showdown (4a670e8e0ee1)
🧪 Getting Started with poke-env#
Before building our custom multi-agent workflows, let’s first understand how the poke-env battle environment works.
It allows us to easily simulate Pokémon battles between automated agents — here, we’ll start with two simple RandomPlayer agents that pick legal moves at random.
By running a quick cross-evaluation, we can see how poke-env orchestrates matches, tracks results, and reports win rates — forming the foundation on which our more sophisticated, reasoning-based agents will later compete.
from poke_env.player.player import Player
from poke_env import RandomPlayer, cross_evaluate
from tabulate import tabulate
first_player = RandomPlayer()
second_player = RandomPlayer()
players = [first_player, second_player]
async def test_cross_evaluation(players, n_challenges=5):
    cross_evaluation = await cross_evaluate(players, n_challenges=n_challenges)
    table = [["-"] + [p.username for p in players]]
    for p_1, results in cross_evaluation.items():
        table.append([p_1] + [cross_evaluation[p_1][p_2] for p_2 in results])
    return tabulate(table)

print(await test_cross_evaluation(players))
-------------- -------------- --------------
- RandomPlayer 1 RandomPlayer 2
RandomPlayer 1 0.4
RandomPlayer 2 0.6
-------------- -------------- --------------
Let’s see an example battle in action.
You can also view the full battle here.
⚡ Creating a “Max Damage” Baseline#
To add another simple benchmark beyond the RandomPlayer, we’ll define a MaxDamagePlayer — an agent that always selects the move with the highest base power.
This gives us a more deterministic and aggressive baseline that prioritizes raw damage output over safety or strategy. By comparing our Pydantic AI agent against both Random and MaxDamage players, we can see whether reasoning and memory-aware planning lead to better decision-making than brute-force move selection.
class MaxDamagePlayer(Player):
    def choose_move(self, battle):
        if battle.available_moves:
            # Pick the legal move with the highest base power
            best_move = max(battle.available_moves, key=lambda move: move.base_power)
            if battle.can_tera:
                return self.create_order(best_move, terastallize=True)
            return self.create_order(best_move)
        else:
            # No attacking moves available (e.g., forced switch): fall back to random
            return self.choose_random_move(battle)
players = [first_player, MaxDamagePlayer()]
print(await test_cross_evaluation(players))
----------------- -------------- -----------------
- RandomPlayer 1 MaxDamagePlayer 1
RandomPlayer 1 0.0
MaxDamagePlayer 1 1.0
----------------- -------------- -----------------
🎮 Pokémon battle mechanics — and how we encode them for our agents#
Let’s now build a simple agent that does the same thing: use the battle context to choose the optimal action.
Core mechanics (what the agent must reason about):
Turn-based actions: Each turn you either use a move or switch. Faster Pokémon usually act first; priority can override speed.
Types & STAB: Moves have types (e.g., Electric). Effectiveness depends on attacker vs defender types; using a move matching the user’s type grants STAB (bonus damage).
Accuracy & PP: Moves can miss (accuracy < 100) and have limited PP (uses).
HP & fainting: A Pokémon faints at 0 HP; you win by fainting all of the opponent’s Pokémon.
Information limits: You only know the opponent’s revealed Pokémon and partial info about their sets.
Switching & tempo: Switching preserves a weakened Pokémon, but concedes tempo (opponent gets a “free” hit).
Status/hazards/weather (omitted here for brevity): These exist in the simulator; we can add them later as fields.
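The mechanics above can be folded into a crude per-move score. The sketch below is a toy heuristic, not the simulator’s actual damage formula, and its type chart covers only the matchups shown; it just illustrates how base power, accuracy, STAB, and effectiveness combine:

```python
# Toy move-scoring heuristic (NOT the real damage formula).
# Partial, illustrative type chart: (attacking type, defending type) -> multiplier.
EFFECTIVENESS = {
    ("electric", "water"): 2.0,
    ("electric", "ground"): 0.0,   # immunity
    ("fire", "grass"): 2.0,
}

def expected_score(base_power, accuracy, move_type, user_types, defender_types):
    stab = 1.5 if move_type in user_types else 1.0        # same-type attack bonus
    eff = 1.0
    for d in defender_types:                              # dual types multiply
        eff *= EFFECTIVENESS.get((move_type, d), 1.0)
    return base_power * accuracy * stab * eff             # accuracy discounts misses

# Thunderbolt (Electric, 90 BP, 100% accuracy) from an Electric user vs a Water type:
print(expected_score(90, 1.0, "electric", ["electric"], ["water"]))  # 270.0
```

This is roughly the reasoning we will ask the LLM to do implicitly; encoding the inputs explicitly in the context schema is what makes that possible.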
🧱 Our context schema (how we feed the LLM the game state)#
We transform poke-env’s Battle into a typed, LLM-friendly snapshot:
TeamMon: one entry per Pokémon (both sides) with: species, fractional hp, fainted, and types.
MoveOption: one entry per legal move this turn with: move_id, base_power, accuracy, move_type, pp, priority.
SwitchOption: one entry per legal switch target with: species, hp, fainted, types.
AgentContext: the full decision frame the agent sees:
turn: current turn number.
you_active / opp_active: currently active Pokémon on both sides.
you_team: your full team (known).
opp_known: only revealed opponent Pokémon (respecting partial observability).
legal_moves / legal_switches: the only actions you may take now.
past_actions: a short episodic memory string list (e.g., summaries of last turns).
🛠️ How the code builds this context#
_pokemon_to_teammon(p) safely converts a poke-env Pokemon into our TeamMon schema (species, hp%, types).
In build_context(battle, past_actions):
We iterate battle.available_moves to populate MoveOption (capturing damage proxies via base power, reliability via accuracy, tempo via priority, and resources via PP).
We iterate battle.available_switches to populate SwitchOption (capturing survivability options).
We map your full battle.team into you_team and the opponent’s revealed team into opp_known (partial info).
We capture actives (you_active, opp_active) and the turn counter.
We attach past_actions so the LLM can reason with short-term memory.
agent_context_to_string(ctx) serializes the AgentContext to pretty JSON, ideal for prompting an LLM agent.
Result: every decision step provides a compact, validated, and complete view of what matters now, aligning game mechanics with agent reasoning (damage, risk, tempo, information, and legal constraints).
from __future__ import annotations
from typing import List, Optional, Dict, Any, Literal
from dataclasses import dataclass
from pydantic import BaseModel, Field
from pydantic_ai import Agent
from poke_env.battle.battle import Battle, Pokemon
class TeamMon(BaseModel):
    species: str
    hp: Optional[float] = None
    fainted: bool = False
    types: List[str] = []
    boosts: Optional[Dict[str, int]] = None
    status: Optional[str] = None
    must_recharge: Optional[bool] = None

class MoveOption(BaseModel):
    move_id: str
    base_power: Optional[int] = None
    accuracy: Optional[float] = None
    move_type: Optional[str] = None
    pp: Optional[int] = None
    priority: int = 0

class SwitchOption(BaseModel):
    species: str
    hp: Optional[float] = None
    fainted: bool = False
    types: List[str] = []

class AgentContext(BaseModel):
    turn: int
    weather: Dict[str, Any]
    # You
    you_active: Optional[str]
    you_team: List[TeamMon]
    # Opponent
    opp_active: Optional[str]
    opp_known: List[TeamMon]
    # Legals
    legal_moves: List[MoveOption]
    legal_switches: List[SwitchOption]
    # Short episodic memory (last few actions / summaries)
    past_actions: List[str] = []

def _pokemon_to_teammon(p: Pokemon) -> TeamMon:
    return TeamMon(
        species=p.species,
        hp=p.current_hp_fraction,
        fainted=p.fainted,
        boosts=p.boosts,
        status=p.status.name if p.status else None,  # Status enum -> plain string for the schema
        must_recharge=p.must_recharge,
        types=[t.name for t in p.types or []],
    )
def build_context(battle: Battle, past_actions: List[str]) -> AgentContext:
    # legal moves
    legal_moves: List[MoveOption] = []
    for m in battle.available_moves:
        legal_moves.append(MoveOption(
            move_id=m.id,
            base_power=m.base_power,
            accuracy=m.accuracy,
            move_type=m.type.name,
            pp=m.current_pp,
            priority=m.priority,
        ))

    # legal switches
    legal_switches: List[SwitchOption] = []
    for p in battle.available_switches:
        legal_switches.append(SwitchOption(
            species=p.species,
            hp=p.current_hp_fraction,
            fainted=p.fainted,
            types=[t.name for t in (p.types or [])],
        ))

    # teams
    your_team = [_pokemon_to_teammon(poke) for poke in battle.team.values()]
    opp_known = [_pokemon_to_teammon(poke) for poke in battle.opponent_team.values() if poke._revealed]  # revealed only

    return AgentContext(
        turn=battle.turn,
        weather={str(k): v for k, v in battle.weather.items()},  # Weather enum keys -> strings
        you_active=battle.active_pokemon.species,
        you_team=your_team,
        opp_active=battle.opponent_active_pokemon.species,
        opp_known=opp_known,
        legal_moves=legal_moves,
        legal_switches=legal_switches,
        past_actions=past_actions,
    )

def agent_context_to_string(ctx: AgentContext) -> str:
    return ctx.model_dump_json(indent=2)
🤖 A minimal “thinking player”#
Goal: turn the JSON context we built into a single legal action (move or switch) using a typed LLM agent, and keep a tiny episodic memory of what we did.
1) Structured output contract = Decision
We define a Pydantic schema that the LLM must fill:
kind: "move" or "switch".
move_id / switch_species: only one is required, depending on kind.
rationale: short explanation (useful for logs and later learning).
This keeps the model honest and makes post-processing trivial.
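The cross-field requirement ("only one is required depending on kind") can itself be enforced by the schema. This is an optional tightening, not part of the tutorial's code; it assumes pydantic v2's model_validator:

```python
from typing import Literal, Optional
from pydantic import BaseModel, model_validator

class StrictDecision(BaseModel):
    """Variant of the Decision contract that rejects inconsistent field combinations."""
    kind: Literal["move", "switch"]
    move_id: Optional[str] = None
    switch_species: Optional[str] = None
    rationale: str

    @model_validator(mode="after")
    def check_required_field(self):
        # The field matching `kind` must be present; the other may stay None.
        if self.kind == "move" and not self.move_id:
            raise ValueError("kind='move' requires move_id")
        if self.kind == "switch" and not self.switch_species:
            raise ValueError("kind='switch' requires switch_species")
        return self

StrictDecision(kind="move", move_id="thunderbolt", rationale="STAB, super-effective")  # validates
# StrictDecision(kind="switch", rationale="retreat")  # would raise a ValidationError
```

With structured outputs, a validation error like this is surfaced to the LLM as a retry prompt, so the model gets a chance to correct itself instead of silently producing an unusable decision.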
2) The LLM-powered player = PydanticLLMPlayer
Extends poke-env's Player.
Sets up a Pydantic AI Agent (self.battle_agent) with:
A system prompt encoding a simple policy: prefer high-accuracy, super-effective moves; switch if danger is high or moves are poor; never invent illegal actions.
output_type=Decision, so the model must return a valid, typed object.
3) Decision loop = choose_move(...)
Build context: ctx = build_context(battle, past_actions=self._past_actions) serializes the current game state + short episodic memory.
Call the agent: decision = self.battle_agent.run_sync(agent_context_to_string(ctx)).output; the LLM reads the JSON and returns a validated Decision.
Legality mapping:
If kind == "move", find the exact move_id in battle.available_moves and create_order(m).
If kind == "switch", match switch_species in battle.available_switches and create_order(p).
We append a human-readable summary to _past_actions for the next turn's context.
Safety fallback If, for any reason, the decision isn’t legal (should be rare), we choose a random legal action so the game continues.
4) Why this works well
Typed outputs remove prompt-engineering brittleness (no regex parsing or guesswork).
Context → Decision → Action is clean, auditable, and easy to extend (plug in evaluators/critics later).
The episodic memory (_past_actions) gives the agent short-term continuity across turns without blowing up context size.
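One practical refinement (an assumption on our part, not something the tutorial's player does) is to bound that memory explicitly, e.g. with a deque, so the serialized context cannot grow without limit over a long battle. The maxlen of 6 here is an arbitrary illustrative budget:

```python
from collections import deque

# Sketch: cap episodic memory so the serialized context stays small.
past_actions = deque(maxlen=6)  # keeps only the 6 most recent entries
for turn in range(1, 11):
    past_actions.append(f"T{turn}: MOVE thunderbolt")

print(list(past_actions))  # entries for T5..T10; T1..T4 were evicted
```

Because deque silently drops the oldest entries, the prompt size per turn stays roughly constant regardless of battle length.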
from rich import print as rprint
import nest_asyncio
import logfire
from poke_env import Player, RandomPlayer
logfire.configure(send_to_logfire=False) # set to true if you want to use logfire console
nest_asyncio.apply()
class Decision(BaseModel):
    kind: Literal["move", "switch"] = Field(description="Choose 'move' or 'switch'.")
    move_id: Optional[str] = Field(default=None, description="Required if kind == 'move'")
    switch_species: Optional[str] = Field(default=None, description="Required if kind == 'switch'")
    rationale: str

class PydanticLLMPlayer(Player):
    def __init__(self, name: str, model: str = "openrouter:openai/gpt-4o-mini", **kwargs):
        super().__init__(**kwargs)
        self.name = name
        self._past_actions: List[str] = []
        self.battle_agent = Agent(
            model=model,
            system_prompt=(
                "You are a Pokémon battle planner. "
                "Given the current battle context, choose ONE legal action. "
                "Prefer high-accuracy, super-effective moves; "
                "switch if active Pokémon risks being KO'd or has no good moves. "
                "Never invent illegal actions."
            ),
            output_type=Decision,
        )

    def choose_move(self, battle: Battle):
        # Build structured context for the agent
        ctx = build_context(battle, past_actions=self._past_actions)
        # Run agent to get decision
        decision = self.battle_agent.run_sync(agent_context_to_string(ctx)).output
        if battle.turn <= 1:
            rprint("CONTEXT:", ctx)
            rprint("DECISION:", decision)
        else:
            rprint(f"T{ctx.turn} DECISION:", decision.rationale)

        # Map Decision -> poke-env action
        if decision.kind == "move":
            # find the matching legal move
            for m in battle.available_moves:
                if m.id == decision.move_id:
                    self._past_actions.append(
                        f"T{ctx.turn}: MOVE {decision.move_id} ({ctx.you_active} vs {ctx.opp_active})"
                    )
                    return self.create_order(m)
        elif decision.kind == "switch":
            # find the matching legal switch
            for p in battle.available_switches:
                if p.species == decision.switch_species:
                    self._past_actions.append(
                        f"T{ctx.turn}: SWITCH to {decision.switch_species} (from {ctx.you_active})"
                    )
                    return self.create_order(p)

        # Fallback: if the agent suggested an illegal action (shouldn't happen), choose randomly
        self._past_actions.append(f"T{ctx.turn}: FALLBACK random")
        return self.choose_random_move(battle)
Logfire project URL: https://logfire-eu.pydantic.dev/shreshthtuli/agenticai
⚔️ Running our first Agentic battle#
Now that we’ve built our LLM-powered Pokémon agent, it’s time to see it in action!
Here we instantiate the PydanticLLMPlayer and let it battle a RandomPlayer for a single match.
When the battle runs:
Each turn, the agentic_player builds a structured AgentContext (game state + short memory).
The LLM agent reasons over that context and outputs a typed Decision (move or switch).
The environment executes that decision, updates the game state, and loops until one side has fainted all of the other's Pokémon.
This quick match serves as a smoke test — verifying that our agent can read the environment, reason with context, and select legal actions correctly before we scale up to multi-agent graphs and tournaments.
agentic_player = PydanticLLMPlayer(name="LLM Agent")
await agentic_player.battle_against(RandomPlayer(), n_battles=1)
CONTEXT: AgentContext( turn=1, weather={}, you_active='clawitzer', you_team=[ TeamMon( species='clawitzer', hp=1.0, fainted=False, types=['WATER'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='volcarona', hp=1.0, fainted=False, types=['BUG', 'FIRE'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='delphox', hp=1.0, fainted=False, types=['FIRE', 'PSYCHIC'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='golduck', hp=1.0, fainted=False, types=['WATER'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='raichualola', hp=1.0, fainted=False, types=['ELECTRIC', 'PSYCHIC'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='sneasler', hp=1.0, fainted=False, types=['FIGHTING', 'POISON'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ) ], opp_active='toedscruel', opp_known=[ TeamMon( species='toedscruel', hp=1.0, fainted=False, types=['GROUND', 'GRASS'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ) ], legal_moves=[ MoveOption(move_id='aurasphere', base_power=80, accuracy=1.0, move_type='FIGHTING', pp=32, priority=0), MoveOption(move_id='uturn', base_power=70, accuracy=1.0, move_type='BUG', pp=32, priority=0), MoveOption(move_id='dragonpulse', base_power=85, accuracy=1.0, move_type='DRAGON', pp=16, priority=0), MoveOption(move_id='waterpulse', base_power=60, accuracy=1.0, move_type='WATER', pp=32, priority=0) ], legal_switches=[ SwitchOption(species='volcarona', 
hp=1.0, fainted=False, types=['BUG', 'FIRE']), SwitchOption(species='delphox', hp=1.0, fainted=False, types=['FIRE', 'PSYCHIC']), SwitchOption(species='golduck', hp=1.0, fainted=False, types=['WATER']), SwitchOption(species='raichualola', hp=1.0, fainted=False, types=['ELECTRIC', 'PSYCHIC']), SwitchOption(species='sneasler', hp=1.0, fainted=False, types=['FIGHTING', 'POISON']) ], past_actions=[] )
DECISION: Decision( kind='move', move_id='aurasphere', switch_species=None, rationale='Aurasphere is a Fighting-type move and is super-effective against Toedscruel, which is part Ground-type. It has 100% accuracy, making it a reliable choice to potentially knock out the opponent.' )
T2 DECISION: Aurasphere is a high-accuracy Fighting-type move that is super effective against Toedscruel's Water-type, making it the best choice to maximize damage.
T3 DECISION: Aurasphere is a high-accuracy move that is super-effective against Duraludon, which is currently at low health (15% HP). This move can potentially knock it out.
T4 DECISION: Clawitzer has used 'aurasphere' for the last three turns against Water-type opponents and its health is reduced. Switching to Volcarona, which has full HP and is not at risk of being KO'd, allows for a fresh offensive strategy next turn.
T5 DECISION: Using Fire Blast as it is a high-power move and super-effective against the opposing Water/Dark type Samurott, making it an optimal attack choice.
T6 DECISION: Fire Blast is a high-powered and super-effective move against Toedscruel, which is a Water type Pokémon. It has good accuracy (85%), and with Volcarona's remaining HP being full, it can afford to make this attack.
T7 DECISION: Fire Blast is super-effective against Samurott Hisuian, which is a dual Water/Dark type Pokémon. With Volcarona at full HP (1.0), using Fire Blast takes advantage of its high power and accuracy, which is beneficial given the circumstances.
T8 DECISION: Fire Blast is a high-accuracy move that is super effective against Samurott-Hisui, which is currently low on HP. This move can potentially knock it out and maximize the chances to win this turn.
T9 DECISION: Using Fire Blast is a high-accuracy, high-damage (110 base power) move against Okidogi, which is weak to Fire-type attacks, making it super effective.
T10 DECISION: Volcarona is at very low health (3.5%), making it highly vulnerable to being knocked out. Switching to Delphox, which is at full health (100%), will provide a safer option for the battle.
T11 DECISION: Psyshock is a high-accuracy move that is super-effective against Cramorant's Psychic-type weakness. It deals 80 base power damage and has a 100% chance to hit, making it the best option.
T12 DECISION: Fire Blast is a high-damage Fire-type move, which is super-effective against Cramorant's Grass-type aspect, maximizing my damage potential in this turn.
T13 DECISION: Psyshock is a high-accuracy (100%) Psychic type move that is super-effective against Okidogi (Poison/Fighting). This move will deal substantial damage.
T14 DECISION: Psyshock is a high-accuracy move and is super effective against Glaceon's Ice type, making it the best option to deal significant damage.
T15 DECISION: Fire Blast is a high-power, high-accuracy move that is still viable. While Psyshock is super effective against Glaceon because it is an Ice type, Fire Blast also deals significant damage and has a chance to burn, offering additional tactical advantages.
T16 DECISION: Psyshock is a high-accuracy Psychic-type move and is super-effective against Toedscruel, which is a Water-type Pokémon. Using it can potentially KO Toedscruel, allowing for a better position in the battle.
T17 DECISION: Psyshock is a high-accuracy move that is super effective against Toedscruel's Psychic typing, and it can potentially knock it out given its low health.
T18 DECISION: Psyshock is a high-accuracy Psychic move that is super effective against Cramorant, which is weak to Psychic-type moves. Delphox also has a good chance of knocking out Cramorant given its current HP.
T19 DECISION: Psyshock is a high-accuracy and super-effective move against Cramorant, which has low HP. It's likely to knock it out and secure a significant advantage in the battle.
T20 DECISION: Psyshock is a high-accuracy move (100%) with a good base power (80) that can deal effective damage to Okidogi, which does not resist Psychic-type moves.
🕸️ Introducing Pydantic Graphs — the foundation for structured multi-agent workflows#
So far, our agent acted as a single decision-maker: it observed context, reasoned once, and returned a move. But as environments grow in complexity — multiple objectives, conflicting strategies, limited time — we need many specialized agents working together.
That’s where 🧩 Pydantic Graphs come in.
⚙️ What are Pydantic Graphs?#
Pydantic Graphs extend the idea of typed LLM workflows: instead of chaining prompts manually, you define a graph of agents — each node is a typed, callable component (Agent, Tool, or function), and edges represent how their structured outputs flow into each other.
Each node’s input/output types are enforced by Pydantic models, guaranteeing that ✅ every agent receives valid structured data, ✅ workflows are composable, debuggable, and inspectable, ✅ and parallel/conditional execution (“run these 3 evaluators in parallel”) becomes trivial.
🤝 Why multi-agent workflows?#
Real decision problems rarely have one “best” heuristic — they’re multi-objective:
Tactical reward vs safety (damage vs survivability)
Short-term payoff vs long-term setup
Exploration vs exploitation
Multi-agent graphs let you distribute cognition:
Each node/agent handles a sub-skill (planner, tactician, risk, scout).
Coordination logic (e.g., a manager or arbiter) fuses their reasoning.
Memory and arbitration layers can be swapped independently (for ablations).
This architecture naturally scales to agentic swarms — large ensembles of specialized agents that coordinate dynamically, forming emergent intelligence beyond a single model’s scope.
🔀 Static vs Dynamic Query Routing#
In a basic “Manager” agent, the routing (which specialists to call) is static, hard-coded in advance: “always call the Tactician, call Risk if danger > 0.6, call the Scout every 3 turns”.
Dynamic routing, enabled by Pydantic Graphs, makes this adaptive:
Each agent’s outputs (or intermediate metadata like uncertainty, cost, or confidence) can dynamically decide the next edges to traverse.
If the planner returns low-confidence moves, the graph might automatically trigger the Risk Officer or Critic path.
If confidence is high, it can skip extra steps to save latency or tokens.
🧩 Benefit: Resource-aware, self-adapting workflows that scale gracefully — the system “thinks harder” only when needed.
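Framework details aside, the routing decision itself is just a conditional on the planner's metadata. A plain-Python sketch of the policy (all names and thresholds are illustrative assumptions, not part of any library):

```python
def route_after_planner(confidence: float, budget_tokens: int) -> list:
    """Decide which specialist paths to traverse next (illustrative thresholds)."""
    path = ["tactician"]                 # always get a value estimate
    if confidence < 0.5:
        path += ["risk", "critic"]       # low confidence -> think harder
    if budget_tokens < 1000:
        path = ["tactician"]             # tight budget -> cheapest path only
    return path

print(route_after_planner(confidence=0.3, budget_tokens=5000))  # ['tactician', 'risk', 'critic']
print(route_after_planner(confidence=0.9, budget_tokens=500))   # ['tactician']
```

In Pydantic Graph terms, this corresponds to giving a node's run() method a union return type over several possible next nodes and returning whichever branch the metadata selects.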
✏️ Query Rewriting#
Another advanced feature is query rewriting — when incoming queries or contexts are transformed before being passed to downstream agents. In Pokémon terms, before the planner decides, a context rewriter might:
Simplify redundant details (“ignore irrelevant side conditions”), or
Add derived features (“this move is likely super-effective against Water”).
This lets different specialists receive domain-specific representations of the same state, improving efficiency and interpretability.
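As a concrete illustration, a rewriter for the tactician might keep only the turn counter and a ranked view of the legal moves, with a derived "expected value" feature. This is a hypothetical sketch over plain dicts shaped like our context schema, not code from the tutorial's pipeline:

```python
def rewrite_for_tactician(ctx: dict) -> dict:
    """Sketch: derive the features a tactician cares about and drop the rest."""
    return {
        "turn": ctx["turn"],
        # derived feature: expected power = base_power * accuracy (misses discounted)
        "moves_ranked": sorted(
            ({"id": m["move_id"], "ev": (m["base_power"] or 0) * (m["accuracy"] or 1.0)}
             for m in ctx["legal_moves"]),
            key=lambda m: m["ev"], reverse=True,
        ),
    }

ctx = {"turn": 3, "legal_moves": [
    {"move_id": "waterpulse", "base_power": 60, "accuracy": 1.0},
    {"move_id": "hydropump", "base_power": 110, "accuracy": 0.8},
]}
print(rewrite_for_tactician(ctx)["moves_ranked"][0]["id"])  # hydropump (110*0.8 = 88 > 60)
```

A risk assessor would get a different rewrite of the same state (HP fractions, threat estimates), so each specialist's prompt stays small and on-topic.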
🚀 Why it matters
Together, dynamic routing and query rewriting turn a static, hand-crafted pipeline into a living cognitive graph:
💡 Adaptive: reasoning depth scales with uncertainty or stakes.
🧠 Modular: new skills or evaluators can be plugged in as new nodes.
⚖️ Efficient: token and time budgets are managed intelligently.
🔍 Transparent: every decision path and intermediate output is traceable.
By using Pydantic Graphs, we can finally move from “prompt chains” to structured, interpretable agentic systems — the same architectural leap that turns simple agents into full-fledged, cooperative AI swarms.
from pydantic_graph import BaseNode, End, Graph, GraphRunContext
class PlanCandidate(BaseModel):
    kind: Literal["move", "switch"]
    move_id: Optional[str] = None
    switch_species: Optional[str] = None
    rationale: str

class Plan(BaseModel):
    candidates: List[PlanCandidate]

class EvalScore(BaseModel):
    score: float
    notes: Optional[str] = None

class GraphState(BaseModel):
    context: AgentContext
    plan: Optional[Plan] = None
    tactician_scores: Optional[List[EvalScore]] = None
    risk_scores: Optional[List[EvalScore]] = None
    scout_scores: Optional[List[EvalScore]] = None
    final_decision: Optional[PlanCandidate] = None
🧭 The Manager–Coordinator Paradigm in Agentic Swarms#
Let’s now implement our first multi-agent workflow: managerial multi-agent coordination pattern through a structured Pydantic Graph — where each node acts as a specialized agent, and together they form a coordinated decision-making swarm.
🕸️ The Idea: Manager + Specialists = Smarter Decisions#
Instead of relying on a single monolithic model, this setup distributes reasoning across multiple specialized roles — just like a management hierarchy in human organizations:
PlannerNode (Coordinator) 🧠 → proposes candidate actions (moves/switches).
TacticianNode ⚔️ → evaluates each candidate for expected value (damage, tempo).
RiskNode 🛡️ → evaluates safety and survivability.
ScoutNode 🔍 → evaluates information gain (learning about opponent’s hidden Pokémon).
DecisionNode (Manager) 🧩 → aggregates all scores and makes the final move selection.
Each node operates independently but shares a common state (GraphState) that persists through the workflow — this gives the system continuity, explainability, and structured memory across reasoning steps.
⚙️ How Pydantic Graphs Enable This#
Pydantic Graphs make this explicitly declarative:
Each node inherits from BaseNode and defines an async run() method that updates a shared GraphState.
Nodes specify their next node, e.g., PlannerNode → TacticianNode → RiskNode → ScoutNode → DecisionNode → End.
The Graph object (planner_graph) defines the entire workflow and its state type (GraphState), ensuring all data between nodes remains valid and typed.
The graph runtime (GraphRunContext) automatically handles execution order, state persistence, and error handling/retries.
This means the graph acts as an orchestration layer over multiple LLMs — a mini “swarm intelligence” system where reasoning flows like information through an organization chart.
🧩 Why the Manager-Coordinator Model Matters#
Decomposition of reasoning: Each agent focuses on a narrow cognitive skill — simplifying prompts, improving interpretability, and reducing hallucinations.
Parallelism and composability: Multiple evaluators can be executed concurrently, and new agents (e.g., “Healer Advisor”, “Weather Analyst”) can be plugged in without refactoring the graph.
Explainability: Every step is transparent — you can inspect the planner’s candidates, each specialist’s scores, and the rationale behind the final decision.
Dynamic scalability: The manager can later evolve to dynamic routing, consulting only relevant specialists based on battle context or uncertainty — enabling true adaptive swarms.
@dataclass
class PlannerNode(BaseNode):
    planner_agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "You propose 2-4 legal actions for the given Pokémon battle context. "
            "Prefer super-effective, high-accuracy moves; consider switches if HP is low or risk is high. "
            "Do NOT invent illegal actions."
        ),
        output_type=Plan,
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> TacticianNode:
        state = context.state
        plans = (await self.planner_agent.run(agent_context_to_string(state.context))).output
        state.plan = plans
        return TacticianNode()
@dataclass
class TacticianNode(BaseNode):
    tactician_agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "You are a Pokémon battle tactician. "
            "Score each candidate 0..1 for expected value (damage + board advantage)."
        ),
        output_type=List[EvalScore],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> RiskNode:
        state = context.state
        assert state.plan is not None, "Plan must be set before TacticianNode runs."
        prompt = agent_context_to_string(state.context) + "\n\n" + state.plan.model_dump_json(indent=2)
        scores = (await self.tactician_agent.run(prompt)).output
        state.tactician_scores = scores
        return RiskNode()
@dataclass
class RiskNode(BaseNode):
    risk_agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "You are a Pokémon battle risk assessor. "
            "Score each candidate 0..1 for risk (chance of failure, negative outcomes)."
        ),
        output_type=List[EvalScore],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> ScoutNode:
        state = context.state
        assert state.plan is not None, "Plan must be set before RiskNode runs."
        prompt = agent_context_to_string(state.context) + "\n\n" + state.plan.model_dump_json(indent=2)
        scores = (await self.risk_agent.run(prompt)).output
        state.risk_scores = scores
        return ScoutNode()
@dataclass
class ScoutNode(BaseNode):
    scout_agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "You are a Pokémon battle scout. "
            "Score each candidate 0..1 for information gain (revealing opponent's unknowns)."
        ),
        output_type=List[EvalScore],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> DecisionNode:
        state = context.state
        assert state.plan is not None, "Plan must be set before ScoutNode runs."
        prompt = agent_context_to_string(state.context) + "\n\n" + state.plan.model_dump_json(indent=2)
        scores = (await self.scout_agent.run(prompt)).output
        state.scout_scores = scores
        return DecisionNode()
@dataclass
class DecisionNode(BaseNode):
    decision_agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "You are a Pokémon battle decision maker. "
            "Using the provided scores from tactician, risk, and scout, "
            "select the best candidate action to take."
        ),
        output_type=PlanCandidate,
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> End:
        state = context.state
        assert state.tactician_scores is not None, "Tactician scores must be set before DecisionNode runs."
        assert state.risk_scores is not None, "Risk scores must be set before DecisionNode runs."
        assert state.scout_scores is not None, "Scout scores must be set before DecisionNode runs."
        prompt = agent_context_to_string(state.context) + "\n\n"
        prompt += "Planned Candidates:\n" + state.plan.model_dump_json(indent=2) + "\n\n"
        prompt += "Tactician Scores:\n" + str([es.model_dump_json(indent=2) for es in state.tactician_scores]) + "\n\n"
        prompt += "Risk Scores:\n" + str([es.model_dump_json(indent=2) for es in state.risk_scores]) + "\n\n"
        prompt += "Scout Scores:\n" + str([es.model_dump_json(indent=2) for es in state.scout_scores]) + "\n\n"
        decision = (await self.decision_agent.run(prompt)).output
        state.final_decision = decision
        return End(state.final_decision)
planner_graph = Graph(nodes=[PlannerNode, TacticianNode, RiskNode, ScoutNode, DecisionNode], state_type=GraphState)
example_agent_context = AgentContext(
    turn=1,
    weather={},
    you_active="Pikachu",
    you_team=[TeamMon(species="Pikachu", hp=1.0, fainted=False, types=["Electric"], boosts={"atk": 1}, status=None, must_recharge=False)],
    opp_active="Bulbasaur",
    opp_known=[TeamMon(species="Bulbasaur", hp=1.0, fainted=False, types=["Grass", "Poison"], boosts={"def": -1}, status="paralyzed", must_recharge=False)],
    legal_moves=[MoveOption(move_id="Thunderbolt", base_power=90, accuracy=100, priority=0)],
    legal_switches=[SwitchOption(species="Charizard", hp=1.0, fainted=False, types=["Fire", "Flying"])],
    past_actions=[],
)
result = await planner_graph.run(PlannerNode(), state=GraphState(context=example_agent_context))
rprint(result.state.final_decision)
23:16:39.013 run graph planner_graph
23:16:39.014 run node PlannerNode
23:17:22.155 run node TacticianNode
23:17:43.325 run node RiskNode
23:18:17.161 run node ScoutNode
23:18:53.984 run node DecisionNode
PlanCandidate( kind='move', move_id='Thunderbolt', switch_species=None, rationale='Choose Thunderbolt: 100% accuracy and high power; Pikachu has +1 Atk while Bulbasaur is -1 Def and paralyzed, making an immediate KO extremely likely. Switching concedes a free turn and forfeits the probable instant elimination—attacking now maximizes EV and minimizes risk.' )
Let’s see it in action!
from rich import print as rprint
import nest_asyncio
import logfire
import asyncio
logfire.configure(send_to_logfire=False) # set to true if you want to use logfire console
nest_asyncio.apply()
class PydanticGraphAgent(Player):
    def __init__(self, name: str, agentic_graph: Graph, first_node: BaseNode, **kwargs):
        super().__init__(**kwargs)
        self.name = name
        self._past_actions: List[str] = []
        self.graph = agentic_graph
        self.first_node = first_node

    def choose_move(self, battle: Battle):
        # Build structured context for the agent
        ctx = build_context(battle, past_actions=self._past_actions)
        # Run the graph to get a decision
        result = asyncio.run(self.graph.run(self.first_node, state=GraphState(context=ctx)))
        decision = result.state.final_decision
        if battle.turn <= 1:
            rprint("CONTEXT:", ctx)
            rprint("DECISION:", decision)
        else:
            rprint(f"T{ctx.turn} DECISION:", decision.rationale)
        # Map Decision → poke-env action
        if decision.kind == "move":
            # find the matching legal move
            for m in battle.available_moves:
                if m.id == decision.move_id:
                    self._past_actions.append(
                        f"T{ctx.turn}: MOVE {decision.move_id} ({ctx.you_active} vs {ctx.opp_active})"
                    )
                    return self.create_order(m)
        elif decision.kind == "switch":
            # find the matching legal switch
            for p in battle.available_switches:
                if p.species == decision.switch_species:
                    self._past_actions.append(
                        f"T{ctx.turn}: SWITCH to {decision.switch_species} (from {ctx.you_active})"
                    )
                    return self.create_order(p)
        # Fallback: if the agent suggested an illegal action (shouldn't happen), choose randomly
        self._past_actions.append(f"T{ctx.turn}: FALLBACK random")
        return self.choose_random_move(battle)
Logfire project URL: https://logfire-eu.pydantic.dev/shreshthtuli/agenticai
coordination_graph = Graph(nodes=[PlannerNode, TacticianNode, RiskNode, ScoutNode, DecisionNode], state_type=GraphState)
coordination_player = PydanticGraphAgent(name="LLM Agent", agentic_graph=coordination_graph, first_node=PlannerNode())
await coordination_player.battle_against(RandomPlayer(), n_battles=1)
22:31:19.383 run graph None
22:31:19.386 run node PlannerNode
22:31:47.861 run node TacticianNode
22:32:17.785 run node RiskNode
22:32:49.956 run node ScoutNode
22:33:20.686 run node DecisionNode
CONTEXT: AgentContext( turn=1, you_active='hitmontop', you_team=[ TeamMon(species='hitmontop', hp=1.0, fainted=False, types=['FIGHTING']), TeamMon(species='palafin', hp=1.0, fainted=False, types=['WATER']), TeamMon(species='brambleghast', hp=1.0, fainted=False, types=['GRASS', 'GHOST']), TeamMon(species='keldeoresolute', hp=1.0, fainted=False, types=['WATER', 'FIGHTING']), TeamMon(species='goodra', hp=1.0, fainted=False, types=['DRAGON']), TeamMon(species='pyroar', hp=1.0, fainted=False, types=['FIRE', 'NORMAL']) ], opp_active='phione', opp_known=[TeamMon(species='phione', hp=1.0, fainted=False, types=['WATER'])], legal_moves=[ MoveOption(move_id='rapidspin', base_power=50, accuracy=1.0, move_type='NORMAL', pp=64, priority=0), MoveOption(move_id='stoneedge', base_power=100, accuracy=0.8, move_type='ROCK', pp=8, priority=0), MoveOption(move_id='suckerpunch', base_power=70, accuracy=1.0, move_type='DARK', pp=8, priority=1), MoveOption(move_id='closecombat', base_power=120, accuracy=1.0, move_type='FIGHTING', pp=8, priority=0) ], legal_switches=[ SwitchOption(species='palafin', hp=1.0, fainted=False, types=['WATER']), SwitchOption(species='brambleghast', hp=1.0, fainted=False, types=['GRASS', 'GHOST']), SwitchOption(species='keldeoresolute', hp=1.0, fainted=False, types=['WATER', 'FIGHTING']), SwitchOption(species='goodra', hp=1.0, fainted=False, types=['DRAGON']), SwitchOption(species='pyroar', hp=1.0, fainted=False, types=['FIRE', 'NORMAL']) ], past_actions=[] )
DECISION: PlanCandidate( kind='move', move_id='suckerpunch', switch_species=None, rationale="Choose Sucker Punch: highest combined tactical and scouting value. Priority (+1) lets you beat Phione if it attacks, preserves board state vs being KO'd, and provides strong information about whether Phione intended to attack. The moderate risk of failing vs Protect/status/switch is acceptable given the upside on turn 1." )
Result:
You can also view the full battle here.
🗳️ Democratic multi-agent swarms#
Now let’s move on to the next agentic swarm model: democratic orchestration. Here,
a Planner (or several planners) proposes several legal candidates (moves/switches),
multiple independent voters each judge every candidate through a different lens, and
a Tally node picks the action with the most YES votes.
Nodes & roles
PlannerNode → produces 3–4 legal, diverse candidates for the current battle context.
Voters (parallelizable):
AccuracyVoterNode – prefer high-reliability actions (≥90% accuracy or a safe switch).
TypeMatchupVoterNode – reward good type effectiveness or an improved matchup after a switch.
TempoVoterNode – prefer momentum (threaten a KO, force a switch, set up safely).
PPVoterNode – favor conserving scarce PP/resources.
DiversityVoterNode – encourage non-redundant options (coverage/status/switch variety).
TallyNode → sums the 0/1 votes per candidate and returns the majority winner (ties break by first max), or combines multiple plans, with their critiques/rationales, into the best plan.
Each voter returns a list of 0/1 (YES/NO) values aligned with plan.candidates, keeping the interface simple and debuggable. A simpler version of this democratic debate idea has also been shown by Andrej Karpathy’s LLM Council.
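The tally rule described above — sum the YES votes per candidate and take the first maximum — can be sketched in plain Python. This is an illustrative standalone function, not the tutorial's TallyNode:

```python
from typing import List

def tally(vote_vectors: List[List[int]]) -> int:
    """Return the index of the candidate with the most YES votes.

    Each inner list is one voter's 0/1 votes, aligned with the candidates.
    """
    n = len(vote_vectors[0])
    totals = [0] * n
    for votes in vote_vectors:
        for i, v in enumerate(votes):
            totals[i] += int(v)
    # max() returns the first maximum, giving a deterministic tie break
    return max(range(n), key=lambda i: totals[i])

# Three voters, three candidates: candidates 0 and 1 tie at 2 votes,
# so the first max (index 0) wins.
print(tally([[1, 1, 0], [1, 0, 1], [0, 1, 0]]))  # → 0
```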
🧠 When to use Democratic swarms vs Manager–Coordinator#
Use Democratic when:
You want robustness via diversity: many simple judges smooth out any one agent’s bias.
The task benefits from ensemble wisdom and parallel scoring of options.
You need transparent preference profiles (“why did we pick this?” → look at voter tallies).
Latency budget allows fan-out to several voters.
Use Manager–Coordinator when:
You need budget-aware routing (call specialists only when danger/uncertainty is high).
The task has a clear decision funnel (plan → specific specialists → decision).
You want conditional depth (think harder only when needed) for tighter SLAs.
You prefer a single final authority aggregating nuanced scores/metrics.
Rule of thumb:
Exploration, variety, early prototyping → start with Democracy.
Production with SLAs, cost constraints → move to Manager/Coordinator (dynamic routing, early exit).
🧬 Mixed-model ensembles (per-agent LLMs)#
Each node can use a different LLM (as shown: OpenAI, Anthropic, Google, xAI, Qwen) to specialize strengths:
Models with longer context or stronger reasoning can power the Planner or Type voter.
Faster/cheaper models can handle Accuracy/PP voters at scale.
Mixing providers reduces correlated failure modes and improves ensemble reliability.
Benefit: You get a portfolio effect—diverse models + diverse criteria → more stable decisions under uncertainty.
🧾 Why this pattern is nice to teach & extend#
Simple contract: voters return [0/1, …]; the tally is trivial to audit.
Parallel-friendly: voters can run concurrently for low wall-time.
Composable: add/remove voters without touching the rest of the graph.
Explainable: log plan.candidates plus each voter’s vector to visualize support per option.
Next steps: try replacing 0/1 votes with ranked ballots (Borda/Condorcet), or add confidence-weighted voting to blend democratic and managerial ideas.
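As a sketch of that first next step, a Borda count replaces each voter's 0/1 vector with a ranked ballot; a candidate earns (n − position) points per ballot. The function below is illustrative, not part of the tutorial's graph:

```python
from typing import List

def borda_winner(ballots: List[List[int]]) -> int:
    """Each ballot lists candidate indices best-first.

    A candidate at position p on a ballot of n candidates earns n - p points;
    the highest total wins (first max breaks ties deterministically).
    """
    n = len(ballots[0])
    scores = [0] * n
    for ballot in ballots:
        for position, candidate in enumerate(ballot):
            scores[candidate] += n - position
    return max(range(n), key=lambda i: scores[i])

# Candidate 1 wins overall (8 points) despite candidate 0 taking
# one first-place vote (6 points).
print(borda_winner([[0, 1, 2], [1, 0, 2], [1, 2, 0]]))  # → 1
```

Because ballots carry ordering information, Borda rewards broadly acceptable candidates over polarizing ones — a useful property when voters judge through very different lenses.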
from __future__ import annotations
from typing import List, Optional, Literal, Dict
from dataclasses import dataclass
from pydantic import BaseModel, Field
from pydantic_ai import Agent
from pydantic_graph import BaseNode, End, Graph, GraphRunContext
from rich import print as rprint
class PlanCandidate(BaseModel):
    kind: Literal["move", "switch"]
    move_id: Optional[str] = None
    switch_species: Optional[str] = None
    rationale: str

class Plan(BaseModel):
    candidates: List[PlanCandidate]

class GraphState(BaseModel):
    context: AgentContext
    plan: Optional[Plan] = None
    accuracy_votes: Optional[List[int]] = None
    type_votes: Optional[List[int]] = None
    tempo_votes: Optional[List[int]] = None
    pp_votes: Optional[List[int]] = None
    diversity_votes: Optional[List[int]] = None
    final_decision: Optional[PlanCandidate] = None
@dataclass
class PlannerNode(BaseNode):
    planner_agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "You are a Pokémon move planner. From the given context, propose 3-4 LEGAL actions "
            "(moves or switches). Prefer super-effective, high-accuracy moves; include at least "
            "one safe SWITCH if current matchup looks poor. Do NOT invent illegal actions."
        ),
        output_type=Plan,
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "AccuracyVoterNode":
        state = context.state
        plan = (await self.planner_agent.run(agent_context_to_string(state.context))).output
        if len(plan.candidates) < 2 and plan.candidates:
            # Duplicate via a copy (not an alias), so editing the fallback's
            # rationale doesn't also mutate the original candidate.
            fallback = plan.candidates[0].model_copy()
            fallback.rationale = "Fallback duplicate to enable voting."
            plan.candidates = plan.candidates + [fallback]
        state.plan = plan
        return AccuracyVoterNode()
@dataclass
class AccuracyVoterNode(BaseNode):
    agent = Agent(
        model="openrouter:openai/gpt-5-mini",
        system_prompt=(
            "Accuracy Voter: For each candidate, vote 1 if the action is high-reliability "
            "(move accuracy >= 90% or a SWITCH that avoids a likely miss/KO), else 0. "
            "Return a Python list of 0/1 of the same length as candidates, no extra text."
        ),
        output_type=List[int],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "TypeMatchupVoterNode":
        state = context.state
        assert state.plan is not None
        prompt = (
            agent_context_to_string(state.context)
            + "\n\nCANDIDATES:\n"
            + state.plan.model_dump_json(indent=2)
        )
        votes = (await self.agent.run(prompt)).output
        state.accuracy_votes = votes
        return TypeMatchupVoterNode()

@dataclass
class TypeMatchupVoterNode(BaseNode):
    agent = Agent(
        model="openrouter:anthropic/claude-sonnet-4.5",
        system_prompt=(
            "Type Matchup Voter: For each candidate, vote 1 if the MOVE is likely super-effective "
            "or at least neutral (avoid not-very-effective/immunity), or if a SWITCH improves the type matchup; "
            "otherwise 0. Return a Python list of 0/1, same length as candidates."
        ),
        output_type=List[int],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "TempoVoterNode":
        state = context.state
        assert state.plan is not None
        prompt = (
            agent_context_to_string(state.context)
            + "\n\nCANDIDATES:\n"
            + state.plan.model_dump_json(indent=2)
        )
        votes = (await self.agent.run(prompt)).output
        state.type_votes = votes
        return TempoVoterNode()

@dataclass
class TempoVoterNode(BaseNode):
    agent = Agent(
        model="openrouter:google/gemini-2.5-flash",
        system_prompt=(
            "Tempo Voter: Vote 1 for candidates that are likely to seize or keep momentum this turn "
            "(e.g., fast KO, force a switch, gain setup safely); otherwise 0. "
            "Return a Python list of 0/1 aligned with candidates."
        ),
        output_type=List[int],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "PPVoterNode":
        state = context.state
        assert state.plan is not None
        prompt = (
            agent_context_to_string(state.context)
            + "\n\nCANDIDATES:\n"
            + state.plan.model_dump_json(indent=2)
        )
        votes = (await self.agent.run(prompt)).output
        state.tempo_votes = votes
        return PPVoterNode()

@dataclass
class PPVoterNode(BaseNode):
    agent = Agent(
        model="openrouter:x-ai/grok-4-fast",
        system_prompt=(
            "PP Conservation Voter: Prefer conserving scarce PP; vote 1 when the candidate either "
            "uses a common PP move for chip damage or SWITCHES to preserve a key low-PP move; else 0. "
            "Return a Python list of 0/1 aligned with candidates."
        ),
        output_type=List[int],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "DiversityVoterNode":
        state = context.state
        assert state.plan is not None
        prompt = (
            agent_context_to_string(state.context)
            + "\n\nCANDIDATES:\n"
            + state.plan.model_dump_json(indent=2)
        )
        votes = (await self.agent.run(prompt)).output
        state.pp_votes = votes
        return DiversityVoterNode()

@dataclass
class DiversityVoterNode(BaseNode):
    agent = Agent(
        model="openrouter:qwen/qwen3-next-80b-a3b-thinking",
        system_prompt=(
            "Diversity Voter: Encourage non-redundant options. Vote 1 when the candidate adds "
            "coverage not present in other candidates this turn (e.g., different target, status vs raw damage, "
            "or SWITCH to change matchup); else 0. Return a Python list of 0/1 aligned with candidates."
        ),
        output_type=List[int],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "TallyNode":
        state = context.state
        assert state.plan is not None
        prompt = (
            agent_context_to_string(state.context)
            + "\n\nCANDIDATES:\n"
            + state.plan.model_dump_json(indent=2)
        )
        votes = (await self.agent.run(prompt)).output
        state.diversity_votes = votes
        return TallyNode()
@dataclass
class TallyNode(BaseNode):
    """Aggregate votes across voters and pick the candidate with the most YES votes."""

    async def run(self, context: GraphRunContext[GraphState]) -> End:
        s = context.state
        assert s.plan is not None
        buckets: List[List[int]] = []
        for name in ["accuracy_votes", "type_votes", "tempo_votes", "pp_votes", "diversity_votes"]:
            votes = getattr(s, name)
            if votes is not None:
                buckets.append(votes)
        n_candidates = len(s.plan.candidates)
        totals = [0] * n_candidates
        for bucket in buckets:
            if len(bucket) != n_candidates:
                # Defensive: truncate/pad to align
                bucket = (bucket + [0] * n_candidates)[:n_candidates]
            for i, v in enumerate(bucket):
                totals[i] += int(v)
        # Majority pick (max votes); deterministic tiebreak = first max
        best_idx = max(range(n_candidates), key=lambda i: totals[i])
        s.final_decision = s.plan.candidates[best_idx]
        # Optional: print a quick audit
        rprint({"totals": totals, "chosen_index": best_idx, "chosen": s.final_decision})
        return End(s.final_decision)
democracy_graph = Graph(
    nodes=[PlannerNode, AccuracyVoterNode, TypeMatchupVoterNode, TempoVoterNode, PPVoterNode, DiversityVoterNode, TallyNode],
    state_type=GraphState,
)
example_agent_context = AgentContext(
    turn=1,
    weather={},
    you_active="Pikachu",
    you_team=[TeamMon(species="Pikachu", hp=1.0, fainted=False, types=["Electric"], boosts={"atk": 1}, status=None, must_recharge=False)],
    opp_active="Bulbasaur",
    opp_known=[TeamMon(species="Bulbasaur", hp=1.0, fainted=False, types=["Grass", "Poison"], boosts={"def": -1}, status="paralyzed", must_recharge=False)],
    legal_moves=[MoveOption(move_id="Thunderbolt", base_power=90, accuracy=100, priority=0)],
    legal_switches=[SwitchOption(species="Charizard", hp=1.0, fainted=False, types=["Fire", "Flying"])],
    past_actions=[],
)
result = await democracy_graph.run(PlannerNode(), state=GraphState(context=example_agent_context))
rprint("Final Democratic Decision:", result.state.final_decision)
23:20:42.677 run graph democracy_graph
23:20:42.677 run node PlannerNode
23:21:17.224 run node AccuracyVoterNode
23:21:44.775 run node TypeMatchupVoterNode
23:21:48.946 run node TempoVoterNode
23:21:50.458 run node PPVoterNode
23:21:59.919 run node DiversityVoterNode
23:24:35.191 run node TallyNode
{ 'totals': [3, 3, 2], 'chosen_index': 0, 'chosen': PlanCandidate( kind='move', move_id='Thunderbolt', switch_species=None, rationale='Hit now with Thunderbolt — 100% accuracy and guaranteed damage this turn. Although Electric is not very effective vs Grass/Poison Bulbasaur, this is your only immediate attacking option and it may chip or finish a weakened/paralyzed foe.' ) }
Final Democratic Decision: PlanCandidate( kind='move', move_id='Thunderbolt', switch_species=None, rationale='Hit now with Thunderbolt — 100% accuracy and guaranteed damage this turn. Although Electric is not very effective vs Grass/Poison Bulbasaur, this is your only immediate attacking option and it may chip or finish a weakened/paralyzed foe.' )
🎭 Actor–Critic Multi-Agent Workflow#
Our final agentic swarm model is the actor–critic style workflow, again implemented using pydantic-graph. This model is inspired by reinforcement learning but adapted for multi-LLM reasoning. Here, we explicitly separate proposal and evaluation, allowing the agent swarm to reason iteratively about value and risk before acting.
🧩 Roles and flow#
The graph proceeds through four main nodes:
ActorNode – the policy generator. Proposes 3–4 legal moves or switches based on the current Pokémon context. Behaves like a policy network, outputting candidate actions with rationales.
CriticNode – the value estimator. Evaluates each candidate with a Q-value (expected outcome), risk, and confidence score. This acts as a “value network” estimating how good each candidate really is.
ImproveNode – the policy improver (optional). If the best candidate is too risky or has a low Q-value, the improver agent asks the actor to refine its plan. The critic is then re-invoked to rescore the improved candidates. This mimics the actor–critic policy-improvement loop in RL.
SelectorNode – the final decision layer. Combines critic outputs into an adjusted score: \(\text{AdjustedScore} = Q \times (1 - \lambda \cdot \text{risk}) \times (0.5 + 0.5 \times \text{confidence})\). Picks the candidate with the highest adjusted value and terminates the graph with an End node.
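Plugging illustrative numbers into this formula (with the same λ = 0.35 default used later in SelectorNode) shows how the risk penalty can flip the raw Q ordering:

```python
def adjusted_score(q: float, risk: float, confidence: float, lam: float = 0.35) -> float:
    """AdjustedScore = Q * (1 - λ*risk) * (0.5 + 0.5*confidence)."""
    return q * (1.0 - lam * risk) * (0.5 + 0.5 * confidence)

# A high-Q but risky move vs. a safer, lower-Q alternative:
risky = adjusted_score(q=1.2, risk=0.8, confidence=0.9)  # 1.2 * 0.72 * 0.95 ≈ 0.821
safe = adjusted_score(q=1.0, risk=0.1, confidence=0.9)   # 1.0 * 0.965 * 0.95 ≈ 0.917
print(safe > risky)  # → True: the safer option wins despite its lower raw Q
```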
🧠 What makes this pattern powerful#
Iterative refinement – Unlike the manager-coordinator (hierarchical) or democratic (ensemble) designs, the actor–critic loop learns from its own evaluation.
Value-based reasoning – The critic explicitly quantifies the expected reward of each move, enabling long-term strategic play rather than greedy local choices.
Adaptive depth – The ImproveNode only triggers refinement when quality or safety drops, giving us dynamic compute allocation.
Interpretability – Q-values, risk, and confidence are visible for each decision, so you can trace why the agent preferred one move over another.
⚙️ Comparing paradigms#
| Workflow Type | Nature | Example Use | Pros | Trade-offs |
|---|---|---|---|---|
| Manager–Coordinator | Hierarchical | Strategic planning under constraints | Modular, dynamic routing | Slight overhead for routing logic |
| Democratic | Ensemble | Collective judgment / robustness | High diversity, fault tolerance | Higher latency, no feedback loop |
| Actor–Critic | Iterative feedback | Adaptive value-based control | Learns/refines actions, interpretable | Slightly more compute per turn |
🚀 Why it fits Pokémon and beyond#
Battles require balancing expected gain vs. survivability, just like value-based RL tasks.
The critic captures contextual trade-offs (damage, tempo, risk), while the actor continuously learns what kinds of proposals score best.
This same structure can generalize to decision-making agents in finance, robotics, or multi-stage planning — anywhere feedback-driven refinement is useful.
from __future__ import annotations
from typing import List, Optional, Literal
from dataclasses import dataclass
from pydantic import BaseModel, Field
from pydantic_ai import Agent
from pydantic_graph import BaseNode, End, Graph, GraphRunContext
from rich import print as rprint
class CriticScore(BaseModel):
    index: int = Field(description="Index into Plan.candidates[]")
    q_value: float = Field(ge=0.0, description="Estimated value; higher is better")
    risk: float = Field(ge=0.0, le=1.0, description="0 safe, 1 very risky")
    confidence: float = Field(ge=0.0, le=1.0, description="Critic confidence in this score")
    notes: Optional[str] = None

class GraphState(BaseModel):
    context: AgentContext
    plan: Optional[Plan] = None
    critic_scores: Optional[List[CriticScore]] = None
    refined: bool = False
    final_decision: Optional[PlanCandidate] = None
@dataclass
class ActorNode(BaseNode):
    actor = Agent(
        model="openrouter:google/gemini-2.5-pro",
        system_prompt=(
            "ACTOR: Propose 3-4 LEGAL actions (moves or switches) for the current Pokémon context.\n"
            "Favor super-effective, high-accuracy moves; include a safe SWITCH if matchup is bad.\n"
            "Do NOT invent illegal actions. Keep rationales concise."
        ),
        output_type=Plan,
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "CriticNode":
        s = context.state
        plan = (await self.actor.run(agent_context_to_string(s.context))).output
        s.plan = plan
        return CriticNode()
@dataclass
class CriticNode(BaseNode):
    critic = Agent(
        model="openrouter:anthropic/claude-sonnet-4.5",
        system_prompt=(
            "CRITIC: For each candidate, estimate a Q-value in [0, +inf) capturing expected outcome "
            "(damage, survival, tempo) for THIS TURN and near future. Also output risk in [0,1] "
            "(0=safe,1=dangerous) and confidence in [0,1]. Keep notes brief. "
            "Return a list aligned with candidates using fields: index, q_value, risk, confidence, notes."
        ),
        output_type=List[CriticScore],
        retries=3,
    )

    async def run(self, context: GraphRunContext[GraphState]) -> "ImproveNode":
        s = context.state
        assert s.plan is not None
        prompt = agent_context_to_string(s.context) + "\n\nCANDIDATES:\n" + s.plan.model_dump_json(indent=2)
        scores = (await self.critic.run(prompt)).output
        # Defensive: clamp scores and align indices
        n = len(s.plan.candidates)
        clean = []
        for sc in scores:
            i = max(0, min(n - 1, int(sc.index)))
            clean.append(CriticScore(
                index=i,
                q_value=max(0.0, float(sc.q_value)),
                risk=min(1.0, max(0.0, float(sc.risk))),
                confidence=min(1.0, max(0.0, float(sc.confidence))),
                notes=sc.notes,
            ))
        s.critic_scores = clean
        return ImproveNode()
@dataclass
class ImproveNode(BaseNode):
    """Optional one-step policy improvement: if best Q is weak or risk is high, ask actor to refine once."""

    improver = Agent(
        model="openrouter:openai/gpt-5",
        system_prompt=(
            "IMPROVER: Given context, current candidates, and critic feedback, produce up to 2 REFINED "
            "legal alternatives that address the critic's concerns (e.g., too risky, low value). "
            "If current best is already strong, return an empty list to keep it."
        ),
        output_type=List[PlanCandidate],
        retries=2,
    )
    # thresholds for triggering refinement
    min_good_q: float = 0.75
    max_ok_risk: float = 0.65

    # Both outgoing edges must appear in the return annotation so
    # pydantic-graph can build the full graph.
    async def run(self, context: GraphRunContext[GraphState]) -> "CriticNode | SelectorNode":
        s = context.state
        assert s.plan is not None and s.critic_scores is not None
        # Determine if refinement is needed
        best = max(s.critic_scores, key=lambda x: x.q_value)
        need_refine = (best.q_value < self.min_good_q) or (best.risk > self.max_ok_risk)
        if not need_refine or s.refined:
            return SelectorNode()
        prompt = (
            agent_context_to_string(s.context)
            + "\n\nCANDIDATES:\n" + s.plan.model_dump_json(indent=2)
            + "\n\nCRITIC:\n" + "\n".join(
                f"[{c.index}] Q={c.q_value:.2f} risk={c.risk:.2f} conf={c.confidence:.2f} {c.notes or ''}"
                for c in s.critic_scores
            )
        )
        new_opts = (await self.improver.run(prompt)).output
        if new_opts:
            # merge refined options (append, keep old too); cap to avoid prompt bloat
            s.plan = Plan(candidates=(s.plan.candidates + new_opts)[:6])
            s.refined = True
            return CriticNode()  # re-score with critic after refinement
        return SelectorNode()
@dataclass
class SelectorNode(BaseNode):
    """Pick argmax of an adjusted score: Q * (1 - λ*risk) with confidence weighting."""

    lambda_risk: float = 0.35

    async def run(self, context: GraphRunContext[GraphState]) -> End:
        s = context.state
        assert s.plan is not None and s.critic_scores is not None
        adjusted = []
        for sc in s.critic_scores:
            adj = sc.q_value * (1.0 - self.lambda_risk * sc.risk) * (0.5 + 0.5 * sc.confidence)
            adjusted.append((sc.index, adj))
        best_idx, _ = max(adjusted, key=lambda t: t[1])
        s.final_decision = s.plan.candidates[best_idx]
        return End(s.final_decision)
actor_critic_graph = Graph(
    nodes=[ActorNode, CriticNode, ImproveNode, SelectorNode],
    state_type=GraphState,
)
example_agent_context = AgentContext(
    turn=1,
    weather={},
    you_active="Pikachu",
    you_team=[TeamMon(species="Pikachu", hp=1.0, fainted=False, types=["Electric"], boosts={"atk": 1}, status=None, must_recharge=False)],
    opp_active="Bulbasaur",
    opp_known=[TeamMon(species="Bulbasaur", hp=1.0, fainted=False, types=["Grass", "Poison"], boosts={"def": -1}, status="paralyzed", must_recharge=False)],
    legal_moves=[MoveOption(move_id="Thunderbolt", base_power=90, accuracy=100, priority=0)],
    legal_switches=[SwitchOption(species="Charizard", hp=1.0, fainted=False, types=["Fire", "Flying"])],
    past_actions=[],
)
result = await actor_critic_graph.run(ActorNode(), state=GraphState(context=example_agent_context))
rprint("Actor–Critic Decision:", result.state.final_decision)
23:33:07.115 run graph actor_critic_graph
23:33:07.117 run node ActorNode
23:33:22.450 run node CriticNode
23:33:30.335 run node ImproveNode
23:33:30.337 run node SelectorNode
Actor–Critic Decision: PlanCandidate( kind='move', move_id='Thunderbolt', switch_species=None, rationale='Thunderbolt is a strong, reliable STAB move with 100% accuracy. Bulbasaur is paralyzed, so it may not even move. This is a good offensive option.' )
actor_critic_graph = Graph(
    nodes=[ActorNode, CriticNode, ImproveNode, SelectorNode],
    state_type=GraphState,
)
actor_critic_agent = PydanticGraphAgent(name="LLM Agent", agentic_graph=actor_critic_graph, first_node=ActorNode())
await actor_critic_agent.battle_against(RandomPlayer(), n_battles=1)
23:33:35.277 run graph None
23:33:35.280 run node ActorNode
23:33:49.671 run node CriticNode
23:34:00.227 run node ImproveNode
23:34:00.229 run node SelectorNode
T5 DECISION: A safe switch. Ironthorns has a good defensive typing (Rock/Electric) against Mightyena's likely Dark-type STAB moves and can threaten back with its own powerful attacks.
T6 DECISION: Deals STAB damage and allows a safe switch to a better-positioned Pokémon like Krookodile, adapting to the opponent's move.
T6 DECISION: Mightyena is a Dark-type pokemon, so a Bug-type move like Megahorn would be super-effective. Ironthorns is faster and should be able to knock out Mightyena with this move.
T7 DECISION: Close Combat is a super-effective, high-power move that will likely knock out Mightyena.
T8 DECISION: This is another safe switch. Zapdos resists Flying-type attacks and can hit Enamorus with a super-effective Electric-type STAB move.
T9 DECISION: Ironthorns resists Normal-type attacks, making it a safe Pokémon to switch into against Maushold. It can absorb a potential Population Bomb and retaliate.
T10 DECISION: Volt Switch deals damage and allows a pivot to another Pokémon. This is a safe, strategic option to maintain momentum and react to a potential switch from the opponent.
T10 DECISION: Krookodile is immune to Jolteon's electric attacks and can threaten it with a super-effective ground type move. It is a high-risk high-reward play.
T11 DECISION: Enamorus is immune to Earthquake and threatens Krookodile with its Fairy typing. Iron Thorns has a type advantage, resisting Enamorus's Flying-type moves and can hit back with super-effective Rock or Electric-type attacks.
T12 DECISION: Krookodile is immune to Jolteon's Electric-type STAB moves and can OHKO it with a Ground-type move in return.
T13 DECISION: Earthquake is a super-effective STAB move against Jolteon, which is very likely to knock it out. Given Krookodile's immunity to Electric attacks, this is a strong offensive choice.
T14 DECISION: Earthquake is Krookodile's strongest and most reliable STAB move against Maushold. It has high power and perfect accuracy, making it the most straightforward and effective offensive option.
T15 DECISION: Iron Thorns is an excellent switch-in. Its Rock/Electric typing gives it a 4x super-effective advantage against Enamorus's Flying type, and it resists Flying-type moves.
23:39:25.454 run node ActorNode
23:39:41.633 run node CriticNode
23:39:51.240 run node ImproveNode
23:39:51.240 run node SelectorNode
T16 DECISION: Wild Charge is a powerful, super-effective STAB move with 100% accuracy. It has a high chance to knock out Enamorus in one hit.
23:39:51.250 run graph None
23:39:51.250 run node ActorNode
23:40:13.728 run node CriticNode
23:40:24.176 run node ImproveNode
23:40:24.177 run node SelectorNode
T17 DECISION: Volt Switch deals damage and allows a switch to a healthier Pokémon like Zapdos or Feraligatr, which can more safely absorb a hit from Maushold.
23:40:24.184 run graph None
23:40:24.185 run node ActorNode
23:40:43.283 run node CriticNode
23:40:51.208 run node ImproveNode
23:40:51.208 run node SelectorNode
T17 DECISION: Switching to Zapdos is another strong choice. It has good defensive typing, resisting potential Fighting-type coverage moves from Maushold. Zapdos is also very fast and can threaten with powerful special attacks.
23:40:51.217 run graph None
23:40:51.218 run node ActorNode
23:41:17.314 run node CriticNode
23:41:26.949 run node ImproveNode
23:41:26.949 run node SelectorNode
T18 DECISION: Discharge is a reliable STAB move with 100% accuracy, offering consistent damage against Maushold without the risk of missing.
23:41:26.951 run graph None
23:41:26.951 run node ActorNode
23:41:47.572 run node CriticNode
23:41:56.741 run node ImproveNode
23:41:56.741 run node SelectorNode
T19 DECISION: Maushold is asleep and at low health. Discharge is a reliable STAB move with 100% accuracy that could secure the knockout.
23:41:56.750 run graph None
23:41:56.750 run node ActorNode
23:42:10.500 run node CriticNode
23:42:18.964 run node ImproveNode
23:42:18.967 run node SelectorNode
CONTEXT: AgentContext( turn=1, weather={}, you_active='deoxys', you_team=[ TeamMon( species='deoxys', hp=1.0, fainted=False, types=['PSYCHIC'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='brutebonnet', hp=1.0, fainted=False, types=['GRASS', 'DARK'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='weavile', hp=1.0, fainted=False, types=['DARK', 'ICE'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='regice', hp=1.0, fainted=False, types=['ICE'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='fluttermane', hp=1.0, fainted=False, types=['GHOST', 'FAIRY'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='zapdosgalar', hp=1.0, fainted=False, types=['FIGHTING', 'FLYING'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ) ], opp_active='dedenne', opp_known=[ TeamMon( species='dedenne', hp=1.0, fainted=False, types=['ELECTRIC', 'FAIRY'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ) ], legal_moves=[ MoveOption(move_id='psychoboost', base_power=140, accuracy=0.9, move_type='PSYCHIC', pp=8, priority=0), MoveOption(move_id='knockoff', base_power=65, accuracy=1.0, move_type='DARK', pp=32, priority=0), MoveOption(move_id='extremespeed', base_power=80, accuracy=1.0, move_type='NORMAL', pp=8, priority=2), MoveOption(move_id='superpower', base_power=120, accuracy=1.0, move_type='FIGHTING', pp=8, priority=0) ], legal_switches=[ SwitchOption(species='brutebonnet', 
hp=1.0, fainted=False, types=['GRASS', 'DARK']), SwitchOption(species='weavile', hp=1.0, fainted=False, types=['DARK', 'ICE']), SwitchOption(species='regice', hp=1.0, fainted=False, types=['ICE']), SwitchOption(species='fluttermane', hp=1.0, fainted=False, types=['GHOST', 'FAIRY']), SwitchOption(species='zapdosgalar', hp=1.0, fainted=False, types=['FIGHTING', 'FLYING']) ], past_actions=[ 'T5: SWITCH to ironthorns (from krookodile)', 'T6: MOVE voltswitch (ironthorns vs mightyena)', 'T6: FALLBACK random', 'T7: MOVE closecombat (braviary vs mightyena)', 'T8: SWITCH to zapdos (from braviary)', 'T9: SWITCH to ironthorns (from zapdos)', 'T10: MOVE voltswitch (ironthorns vs mausholdfour)', 'T10: SWITCH to krookodile (from ironthorns)', 'T11: SWITCH to ironthorns (from krookodile)', 'T12: SWITCH to krookodile (from ironthorns)', 'T13: MOVE earthquake (krookodile vs jolteon)', 'T14: MOVE earthquake (krookodile vs mausholdfour)', 'T15: SWITCH to ironthorns (from krookodile)', 'T16: MOVE wildcharge (ironthorns vs enamorus)', 'T17: MOVE voltswitch (ironthorns vs mausholdfour)', 'T17: SWITCH to zapdos (from ironthorns)', 'T18: MOVE discharge (zapdos vs mausholdfour)', 'T19: MOVE discharge (zapdos vs mausholdfour)' ] )
DECISION: PlanCandidate( kind='switch', move_id=None, switch_species='fluttermane', rationale='Flutter Mane resists Electric-type moves and can threaten Dedenne with its STAB moves on the following turn. This is a relatively safe switch.' )
23:42:19.033 run graph None
23:42:19.034 run node ActorNode
23:42:40.585 run node CriticNode
23:42:50.718 run node ImproveNode
23:42:50.719 run node SelectorNode
T2 DECISION: High power STAB move that will deal significant damage to Glalie. Fluttermane is faster, so it will attack first.
23:42:50.731 run graph None
23:42:50.732 run node ActorNode
23:43:10.551 run node CriticNode
23:43:18.699 run node ImproveNode
23:43:18.700 run node SelectorNode
T3 DECISION: Moonblast is Flutter Mane's strongest STAB move and is likely to knock out the weakened Skuntank.
23:43:18.717 run graph None
23:43:18.719 run node ActorNode
23:43:38.243 run node CriticNode
23:43:49.050 run node ImproveNode
23:43:49.051 run node SelectorNode
T4 DECISION: Moonblast is a 4x super-effective STAB move against Garchomp, and with Flutter Mane's high speed and special attack, it has a strong chance of securing a one-hit knockout.
23:43:49.066 run graph None
23:43:49.069 run node ActorNode
23:44:12.956 run node CriticNode
23:44:23.324 run node ImproveNode
23:44:23.325 run node SelectorNode
T5 DECISION: This is your strongest neutral attack against Dedenne. Given Flutter Mane's high Speed and Special Attack, this is a reliable way to deal significant damage.
23:44:23.338 run graph None
23:44:23.339 run node ActorNode
23:44:38.240 run node CriticNode
23:44:47.859 run node ImproveNode
23:44:47.861 run node SelectorNode
T6 DECISION: Moonblast is your strongest STAB move against Dedenne and is very likely to knock it out, as Dedenne is already at 52% HP.
23:44:47.886 run graph None
23:44:47.888 run node ActorNode
23:45:08.940 run node CriticNode
23:45:18.377 run node ImproveNode
23:45:18.378 run node SelectorNode
T7 DECISION: Regice has massive special defense and resists Ice-type attacks, making it a very safe switch-in against Glalie.
23:45:18.384 run graph None
23:45:18.386 run node ActorNode
23:45:37.245 run node CriticNode
23:45:44.577 run node ImproveNode
23:45:44.577 run node SelectorNode
T8 DECISION: Weavile resists Glalie's STAB Ice-type moves and can hit back with its own super-effective attacks.
23:45:44.585 run graph None
23:45:44.586 run node ActorNode
23:46:01.809 run node CriticNode
23:46:09.654 run node ImproveNode
23:46:09.655 run node SelectorNode
T9 DECISION: Weavile is faster than Glalie. Knock Off deals solid neutral damage and removes Glalie's item, which could be a threat.
23:46:09.671 run graph None
23:46:09.673 run node ActorNode
23:46:26.144 run node CriticNode
23:46:35.060 run node ImproveNode
23:46:35.061 run node SelectorNode
T10 DECISION: Weavile is faster and can deal significant neutral damage with Knock Off, potentially knocking out Glalie. This is a strong offensive option.
23:46:35.074 run graph None
23:46:35.076 run node ActorNode
23:46:53.692 run node CriticNode
23:47:01.924 run node ImproveNode
23:47:01.925 run node SelectorNode
T11 DECISION: Guaranteed to hit first and has 100% accuracy. Glalie is at low HP, so this priority STAB move should secure the knockout.
23:47:01.935 run graph None
23:47:01.935 run node ActorNode
23:47:23.878 run node CriticNode
23:47:32.369 run node ImproveNode
23:47:32.370 run node SelectorNode
T12 DECISION: This is the safest option. With its priority, Ice Shard is guaranteed to hit first, and with 100% accuracy, it's very likely to knock out the weakened Glalie without any risk.
23:47:32.384 run graph None
23:47:32.385 run node ActorNode
23:47:51.924 run node CriticNode
23:48:01.713 run node ImproveNode
23:48:01.713 run node SelectorNode
T13 DECISION: Brute Bonnet's Grass typing provides a 4x resistance to Pikachu's Electric STAB moves, making it a very safe switch-in.
23:48:01.729 run graph None
23:48:01.730 run node ActorNode
23:48:24.346 run node CriticNode
23:48:35.492 run node ImproveNode
23:48:35.492 run node SelectorNode
T14 DECISION: Puts the opponent to sleep with 100% accuracy, neutralizing the immediate threat and creating an opportunity for a follow-up attack or a safe switch.
23:48:35.532 run graph None
23:48:35.534 run node ActorNode
23:49:01.947 run node CriticNode
23:49:10.117 run node ImproveNode
23:49:10.117 run node SelectorNode
T15 DECISION: This is your strongest attack. It's a super-effective STAB move against a sleeping opponent, so you are guaranteed to land a powerful hit.
23:49:10.154 run graph None
23:49:10.155 run node ActorNode
23:49:35.530 run node CriticNode
23:49:44.780 run node ImproveNode
23:49:44.780 run node SelectorNode
T16 DECISION: Zapdos-Galar has a 4x resistance to Ground and a resistance to Dark, making it an excellent switch-in against Krookodile. It's at full health and can threaten a super-effective Fighting-type move.
23:49:44.800 run graph None
23:49:44.802 run node ActorNode
23:50:09.285 run node CriticNode
23:50:17.974 run node ImproveNode
23:50:17.974 run node SelectorNode
T17 DECISION: Brave Bird is a powerful STAB move that can deal significant neutral damage to Pikachu. This is a high-risk, high-reward play that could KO Pikachu if it doesn't have a lot of investment in bulk.
23:50:17.982 run graph None
23:50:17.982 run node ActorNode
23:50:35.423 run node CriticNode
23:50:42.465 run node ImproveNode
23:50:42.466 run node SelectorNode
T18 DECISION: Since Pikachu is asleep, this is a free turn to set up. Bulk Up will boost Zapdos-Galar's Attack and Defense, making it much stronger against the incoming Krookodile.
23:50:42.473 run graph None
23:50:42.474 run node ActorNode
23:51:06.331 run node CriticNode
23:51:17.538 run node ImproveNode
23:51:17.538 run node SelectorNode
T19 DECISION: This move will KO the opposing Pikachu and is also super-effective against the likely switch-in, Krookodile.
23:51:17.565 run graph None
23:51:17.565 run node ActorNode
23:51:40.014 run node CriticNode
23:51:51.301 run node ImproveNode
23:51:51.302 run node SelectorNode
T20 DECISION: Close Combat is a super-effective STAB move that, when combined with the +2 Attack boost, has a very high chance of knocking out Krookodile.
23:51:51.311 run graph None
23:51:51.311 run node ActorNode
23:52:08.270 run node CriticNode
23:52:19.718 run node ImproveNode
23:52:19.718 run node SelectorNode
CONTEXT: AgentContext( turn=1, weather={}, you_active='regidrago', you_team=[ TeamMon( species='regidrago', hp=1.0, fainted=False, types=['DRAGON'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='veluza', hp=1.0, fainted=False, types=['WATER', 'PSYCHIC'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='eelektross', hp=1.0, fainted=False, types=['ELECTRIC'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='gliscor', hp=1.0, fainted=False, types=['GROUND', 'FLYING'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='dewgong', hp=1.0, fainted=False, types=['WATER', 'ICE'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ), TeamMon( species='volcarona', hp=1.0, fainted=False, types=['BUG', 'FIRE'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ) ], opp_active='ampharos', opp_known=[ TeamMon( species='ampharos', hp=1.0, fainted=False, types=['ELECTRIC'], boosts={'accuracy': 0, 'atk': 0, 'def': 0, 'evasion': 0, 'spa': 0, 'spd': 0, 'spe': 0}, status=None, must_recharge=False ) ], legal_moves=[ MoveOption(move_id='dragondance', base_power=0, accuracy=1.0, move_type='DRAGON', pp=32, priority=0), MoveOption(move_id='outrage', base_power=120, accuracy=1.0, move_type='DRAGON', pp=16, priority=0), MoveOption(move_id='dracometeor', base_power=130, accuracy=0.9, move_type='DRAGON', pp=8, priority=0), MoveOption(move_id='earthquake', base_power=100, accuracy=1.0, move_type='GROUND', pp=16, priority=0) ], legal_switches=[ SwitchOption(species='veluza', hp=1.0, 
fainted=False, types=['WATER', 'PSYCHIC']), SwitchOption(species='eelektross', hp=1.0, fainted=False, types=['ELECTRIC']), SwitchOption(species='gliscor', hp=1.0, fainted=False, types=['GROUND', 'FLYING']), SwitchOption(species='dewgong', hp=1.0, fainted=False, types=['WATER', 'ICE']), SwitchOption(species='volcarona', hp=1.0, fainted=False, types=['BUG', 'FIRE']) ], past_actions=[ 'T5: SWITCH to ironthorns (from krookodile)', 'T6: MOVE voltswitch (ironthorns vs mightyena)', 'T6: FALLBACK random', 'T7: MOVE closecombat (braviary vs mightyena)', 'T8: SWITCH to zapdos (from braviary)', 'T9: SWITCH to ironthorns (from zapdos)', 'T10: MOVE voltswitch (ironthorns vs mausholdfour)', 'T10: SWITCH to krookodile (from ironthorns)', 'T11: SWITCH to ironthorns (from krookodile)', 'T12: SWITCH to krookodile (from ironthorns)', 'T13: MOVE earthquake (krookodile vs jolteon)', 'T14: MOVE earthquake (krookodile vs mausholdfour)', 'T15: SWITCH to ironthorns (from krookodile)', 'T16: MOVE wildcharge (ironthorns vs enamorus)', 'T17: MOVE voltswitch (ironthorns vs mausholdfour)', 'T17: SWITCH to zapdos (from ironthorns)', 'T18: MOVE discharge (zapdos vs mausholdfour)', 'T19: MOVE discharge (zapdos vs mausholdfour)', 'T1: SWITCH to fluttermane (from deoxys)', 'T2: MOVE moonblast (fluttermane vs glalie)', 'T3: MOVE moonblast (fluttermane vs skuntank)', 'T4: MOVE moonblast (fluttermane vs garchomp)', 'T5: MOVE shadowball (fluttermane vs dedenne)', 'T6: MOVE moonblast (fluttermane vs dedenne)', 'T7: SWITCH to regice (from fluttermane)', 'T8: SWITCH to weavile (from regice)', 'T9: MOVE knockoff (weavile vs glalie)', 'T10: MOVE knockoff (weavile vs glalie)', 'T11: MOVE iceshard (weavile vs glalie)', 'T12: MOVE iceshard (weavile vs glalie)', 'T13: SWITCH to brutebonnet (from weavile)', 'T14: MOVE spore (brutebonnet vs pikachuhoenn)', 'T15: MOVE seedbomb (brutebonnet vs pikachuhoenn)', 'T16: SWITCH to zapdosgalar (from brutebonnet)', 'T17: MOVE bravebird (zapdosgalar vs pikachuhoenn)', 'T18: 
MOVE bulkup (zapdosgalar vs pikachuhoenn)', 'T19: MOVE closecombat (zapdosgalar vs pikachuhoenn)', 'T20: MOVE closecombat (zapdosgalar vs krookodile)' ] )
DECISION: PlanCandidate( kind='move', move_id='earthquake', switch_species=None, rationale='Earthquake is a super-effective, high-power, and perfectly accurate move against the opposing Ampharos. It is the strongest and most reliable offensive option.' )
23:52:19.761 run graph None
23:52:19.761 run node ActorNode
23:52:38.907 run node CriticNode
23:52:48.803 run node ImproveNode
23:52:48.804 run node SelectorNode
T2 DECISION: Volcarona has a major advantage, resisting Cresselia's likely Ice and Fairy moves and threatening with super-effective Bug-type attacks.
23:52:48.810 run graph None
23:52:48.810 run node ActorNode
23:53:06.655 run node CriticNode
23:53:14.731 run node ImproveNode
23:53:14.731 run node SelectorNode
T3 DECISION: Bug Buzz is a powerful STAB move that is super-effective against Cresselia's base Psychic typing, dealing significant damage despite its Special Defense boost.
23:53:14.741 run graph None
23:53:14.741 run node ActorNode
23:53:40.216 run node CriticNode
23:53:52.998 run node ImproveNode
23:53:53.000 run node SelectorNode
T4 DECISION: Bug Buzz is a powerful super-effective STAB move. With 100% accuracy, it's the most reliable way to inflict significant damage on Cresselia, even with its Special Defense boost.
23:53:53.018 run graph None
23:53:53.020 run node ActorNode
23:54:13.774 run node CriticNode
23:54:22.119 run node ImproveNode
23:54:22.120 run node SelectorNode
T5 DECISION: Gliscor is immune to Electric-type attacks, making it a perfect switch-in against a Terastallized Electric Cresselia. It can then threaten with super-effective Ground-type moves.
23:54:22.126 run graph None
23:54:22.126 run node ActorNode
23:54:41.783 run node CriticNode
23:54:50.920 run node ImproveNode
23:54:50.920 run node SelectorNode
T6 DECISION: Gliscor has a 4x weakness to Delibird's Ice-type attacks. Switching to Eelektross is a safe and advantageous move, as it resists Flying-type moves and can counter with super-effective Electric-type attacks.
23:54:50.926 run graph None
23:54:50.927 run node ActorNode
23:55:02.354 run node CriticNode
23:55:11.527 run node ImproveNode
23:55:11.527 run node SelectorNode
T7 DECISION: Drain Punch is super-effective (Fighting vs. Dark/Fire) and has 100% accuracy, making it a strong and reliable offensive option to deal significant damage to Houndoom.
23:55:11.548 run graph None
23:55:11.551 run node ActorNode
23:55:28.942 run node CriticNode
23:55:37.917 run node ImproveNode
23:55:37.917 run node SelectorNode
T8 DECISION: Gliscor is immune to Falinks's Fighting-type STAB moves, making it a very safe switch-in. It can threaten Falinks back with its own STABs.
23:55:37.941 run graph None
23:55:37.944 run node ActorNode
23:56:00.686 run node CriticNode
23:56:11.033 run node ImproveNode
23:56:11.033 run node SelectorNode
T9 DECISION: Earthquake is Gliscor's strongest move against Falinks and has a high chance of knocking it out at its current health. Gliscor's typing resists Falinks's STAB moves, making this a safe and powerful offensive option.
23:56:11.055 run graph None
23:56:11.057 run node ActorNode
23:56:25.091 run node CriticNode
23:56:33.863 run node ImproveNode
23:56:33.863 run node SelectorNode
T10 DECISION: Earthquake is a powerful STAB move that should be able to knock out the already weakened Falinks, especially with its lowered defense.
23:56:33.876 run graph None
23:56:33.877 run node ActorNode
23:56:52.833 run node CriticNode
23:57:03.311 run node ImproveNode
23:57:03.312 run node SelectorNode
T11 DECISION: Switching to Eelektross is a safe move. It resists Ampharos's Electric STAB moves and is only hit neutrally by potential Ice-type coverage. Its Levitate ability also makes it immune to any predicted Ground-type attacks.
23:57:03.318 run graph None
23:57:03.318 run node ActorNode
23:57:23.176 run node CriticNode
23:57:33.817 run node ImproveNode
23:57:33.818 run node SelectorNode
T12 DECISION: This move has 100% accuracy and should be sufficient to knock out the low-HP Falinks, making it a safer option than Supercell Slam.
23:57:33.833 run graph None
23:57:33.834 run node ActorNode
23:57:51.153 run node CriticNode
2025-11-10 23:57:55,507 - PydanticGraphAge 1 - ERROR - Unhandled exception raised while handling message:
>battle-gen9randombattle-279
|request|{"active":[{"moves":[{"move":"Drain Punch","id":"drainpunch","pp":15,"maxpp":16,"target":"normal","disabled":false},{"move":"Supercell Slam","id":"supercellslam","pp":24,"maxpp":24,"target":"normal","disabled":false},{"move":"Coil","id":"coil","pp":32,"maxpp":32,"target":"self","disabled":false},{"move":"Fire Punch","id":"firepunch","pp":23,"maxpp":24,"target":"normal","disabled":false}],"canTerastallize":"Fighting"}],"side":{"name":"PydanticGraphAge 1","id":"p1","pokemon":[{"ident":"p1: Eelektross","details":"Eelektross, L87, F","condition":"127/290","active":true,"stats":{"atk":250,"def":189,"spa":232,"spd":189,"spe":137},"moves":["drainpunch","supercellslam","coil","firepunch"],"baseAbility":"levitate","item":"leftovers","pokeball":"pokeball","ability":"levitate","commanding":false,"reviving":false,"teraType":"Fighting","terastallized":""},{"ident":"p1: Veluza","details":"Veluza, L85, F","condition":"292/292","active":false,"stats":{"atk":222,"def":173,"spa":181,"spd":159,"spe":168},"moves":["aquacutter","aquajet","psychocut","nightslash"],"baseAbility":"sharpness","item":"choiceband","pokeball":"pokeball","ability":"sharpness","commanding":false,"reviving":false,"teraType":"Dark","terastallized":""},{"ident":"p1: Gliscor","details":"Gliscor, L76, F","condition":"192/239 tox","active":false,"stats":{"atk":188,"def":234,"spa":112,"spd":158,"spe":188},"moves":["protect","toxic","knockoff","earthquake"],"baseAbility":"poisonheal","item":"toxicorb","pokeball":"pokeball","ability":"poisonheal","commanding":false,"reviving":false,"teraType":"Water","terastallized":""},{"ident":"p1: Volcarona","details":"Volcarona, L77, M","condition":"160/257","active":false,"stats":{"atk":97,"def":145,"spa":252,"spd":206,"spe":199},"moves":["bugbuzz","quiverdance","terablast","fireblast"],"baseAbility":"swarm","item":"heavydutyboots","pokeball":"pokeball","ability":"swarm","commanding":false,"reviving":false,"teraType":"Water","terastallized":""},{"ident":"p1: 
Dewgong","details":"Dewgong, L94, F","condition":"322/322","active":false,"stats":{"atk":185,"def":204,"spa":185,"spd":232,"spe":185},"moves":["surf","knockoff","tripleaxel","encore"],"baseAbility":"thickfat","item":"heavydutyboots","pokeball":"pokeball","ability":"thickfat","commanding":false,"reviving":false,"teraType":"Grass","terastallized":""},{"ident":"p1: Regidrago","details":"Regidrago, L77","condition":"435/435","active":false,"stats":{"atk":199,"def":122,"spa":199,"spd":122,"spe":168},"moves":["dragondance","outrage","dracometeor","earthquake"],"baseAbility":"dragonsmaw","item":"lumberry","pokeball":"pokeball","ability":"dragonsmaw","commanding":false,"reviving":false,"teraType":"Dragon","terastallized":""}]},"rqid":30}
Traceback (most recent call last):
File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\poke_env\ps_client\ps_client.py", line 142, in _handle_message
await self._handle_battle_message(split_messages) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\poke_env\player\player.py", line 291, in _handle_battle_message
await self._handle_battle_request(battle)
File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\poke_env\player\player.py", line 416, in _handle_battle_request
choice = self.choose_move(battle)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\SHRESHTH\AppData\Local\Temp\ipykernel_27708\3706119269.py", line 21, in choose_move
result = asyncio.run(self.graph.run(self.first_node, state=GraphState(context=ctx)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\nest_asyncio.py", line 30, in run
return loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\nest_asyncio.py", line 98, in run_until_complete
return f.result()
^^^^^^^^^^
File "C:\Users\SHRESHTH\AppData\Local\Programs\Python\Python312\Lib\asyncio\futures.py", line 203, in result
raise self._exception.with_traceback(self._exception_tb)
File "C:\Users\SHRESHTH\AppData\Local\Programs\Python\Python312\Lib\asyncio\tasks.py", line 304, in __step_run_and_handle_result
result = coro.send(None)
^^^^^^^^^^^^^^^
File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\pydantic_graph\graph.py", line 152, in run
async for _node in graph_run:
File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\pydantic_graph\graph.py", line 766, in __anext__
return await self.next(self._next_node)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\SHRESHTH\Desktop\build-your-own-super-agents\.venv\Lib\site-packages\pydantic_graph\graph.py", line 739, in next
self._next_node = await node.run(ctx)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\SHRESHTH\AppData\Local\Temp\ipykernel_27708\3721675952.py", line 102, in run
best = max(s.critic_scores, key=lambda x: x.q_value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: max() iterable argument is empty
23:57:55.504 run node ImproveNode
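The crash above comes from the selection step calling `max()` on an empty `critic_scores` list: if the critic fails to return any scores for a turn, `max()` raises `ValueError`. A minimal defensive sketch of that step, using hypothetical stand-ins for the notebook's state types (the real class definitions aren't reproduced here), guards the empty case:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-ins for the notebook's state types.
@dataclass
class CriticScore:
    plan_idx: int
    q_value: float

@dataclass
class GraphState:
    critic_scores: List[CriticScore] = field(default_factory=list)

def select_best(state: GraphState) -> Optional[CriticScore]:
    """Pick the highest-scoring candidate, or None if the critic
    produced no scores (the case that crashed the run above).
    The caller can then fall back to a random legal action."""
    if not state.critic_scores:
        return None
    return max(state.critic_scores, key=lambda s: s.q_value)

state = GraphState(critic_scores=[CriticScore(0, 0.4), CriticScore(1, 0.9)])
print(select_best(state).plan_idx)   # highest q_value wins -> 1
print(select_best(GraphState()))     # empty -> None, not ValueError
```

Equivalently, Python's built-in `max(..., default=None)` avoids the explicit check; either way, the fallback keeps a single bad critic turn from killing the whole battle loop.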
Let’s see this battle in action:
You can also view the full battle here.
🏁 Conclusion#
In this tutorial, we explored how multi-agent workflows enable structured, interpretable reasoning in complex environments — using Pokémon battles as our sandbox.
We built and compared three distinct coordination paradigms:
🧠 Manager–Coordinator: hierarchical and resource-aware — perfect for structured, goal-driven orchestration.
🗳️ Democratic Swarms: ensemble-style decision-making — robust and diverse, harnessing the collective judgment of many agents.
🎭 Actor–Critic: feedback-driven refinement — adaptive and value-based, blending planning with learned evaluation.
Each architecture has strengths depending on your problem structure, latency budget, and decision uncertainty. Together, they form the foundation of Agentic AI systems — not just single agents, but entire cognitive ecosystems working in concert.
🌐 Beyond these: other agentic swarm paradigms#
Modern agent research spans several other fascinating coordination styles that extend or hybridize these ideas:
🕸️ Blackboard Systems / Shared Memory Agents — agents read and write to a common “knowledge blackboard” (e.g., HUB, LangGraph Shared Memory, ReAct with tool state).
🪶 Evolutionary or Genetic Agent Swarms — agents mutate and compete (e.g., EvoAgents, Yuan et al. (2024)) to explore strategy space efficiently.
🔁 Self-Refining Swarms — multiple agents monitor and fine-tune their own reasoning, using reflection or secondary evaluators (e.g., SwarmAgentic, Zhang et al. (2025)).
Together, these paradigms form a growing ecosystem of Agentic Swarms — distributed reasoning systems where intelligence emerges from structured interaction rather than single-model dominance.
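To make the blackboard pattern concrete, here is a toy sketch (all names are hypothetical, not drawn from any of the cited systems): agents post findings under topic keys to a shared store, and any agent can read the accumulated knowledge before deciding.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

class Blackboard:
    """A shared store that agents post to and read from by topic key."""
    def __init__(self) -> None:
        self._entries: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def post(self, topic: str, agent: str, note: str) -> None:
        # Any agent can contribute knowledge under a topic.
        self._entries[topic].append((agent, note))

    def read(self, topic: str) -> List[Tuple[str, str]]:
        # Any agent can consult everything posted so far.
        return list(self._entries[topic])

board = Blackboard()
board.post("threats", "scout", "opponent's active resists Electric")
board.post("threats", "critic", "our active is slower this turn")
for agent, note in board.read("threats"):
    print(f"{agent}: {note}")
```

The appeal of the pattern is decoupling: agents never call each other directly, so adding or removing a specialist only changes who reads and writes which topics.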
🔮 What’s next#
So far, we’ve focused on how agents reason and collaborate. In the next tutorial, we’ll explore how to select the optimal model for each agent automatically — using meta-evaluation, model profiling, and cost–quality trade-offs to dynamically assign the best LLM to each sub-task.
We’ll essentially teach our system to self-optimize its model choices — the first step toward autonomous orchestration in large multi-agent setups.