Agents & State Machines
How LLMs Run Multi-Step Tasks — and Why Most Agent Bugs Are State Bugs
Single-turn inference is well understood. But the moment a model starts taking actions, checking results, and deciding what to do next, you're dealing with a fundamentally different kind of system — one where state management matters as much as the model itself.
What an agent actually is
The word "agent" gets used loosely. For the purposes of inference engineering, an agent is a system where a language model makes decisions that affect what happens next — including what inputs it receives in subsequent calls. The model is in a loop, not just answering a single question.
The minimal definition: an agent runs the model, observes the result, takes some action based on that result, and then runs the model again. This loop continues until some termination condition is met. Everything else — memory, tools, planning, multi-agent coordination — is built on top of that basic structure.
What distinguishes this from single-turn inference is the presence of state. Between each model call, something about the world (or the agent's knowledge of it) has changed. Managing that state — what to track, how to represent it, when to update it, and how to recover when it goes wrong — is where most of the real engineering work in agentic systems lives.
// The key distinction
Single-turn inference: input → model → output. Done. Agentic inference: the output feeds back into the input. The model's decisions shape what it sees next. This creates feedback loops, and feedback loops require careful state management.
What a state machine is
A state machine (formally: a finite state automaton) is one of the oldest and most useful abstractions in computer science. The concept is simple: a system is always in exactly one state from a defined set of possible states. Events or conditions cause transitions from one state to another. Each state has defined behaviour, and each transition has defined conditions.
That's the whole thing. States, transitions, conditions. The power comes from making implicit behaviour explicit.
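The classic example is a turnstile: it is locked until a coin is inserted, then unlocked until someone pushes through. A minimal sketch in Python (the representation is illustrative):

```python
# A turnstile has two states and two events. Anything not listed in the
# transition table is an illegal move and simply leaves the state unchanged.
TRANSITIONS = {
    ("locked", "coin"): "unlocked",   # paying unlocks the arm
    ("unlocked", "push"): "locked",   # walking through re-locks it
}

def step(state: str, event: str) -> str:
    """Return the next state; unknown (state, event) pairs are no-ops."""
    return TRANSITIONS.get((state, event), state)

assert step("locked", "push") == "locked"    # pushing a locked turnstile does nothing
assert step("locked", "coin") == "unlocked"
assert step("unlocked", "push") == "locked"
```

The entire behaviour of the system fits in one small table, and every possible (state, event) pair has a defined outcome.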
State machines have a key property that makes them valuable: they make illegal states unrepresentable. If you've defined your states correctly, the system can never be in a state you haven't thought about. Compare this to a tangle of boolean flags and conditionals — where you can easily end up in a state you never intended, because no one made the implicit states explicit.
The three parts of any state machine
Every state machine has the same three components, regardless of complexity:
- States — the finite set of configurations the system can be in. The system is always in exactly one of them.
- Transitions — the permitted moves from one state to another.
- Conditions — the events or guards that trigger each transition.
Agents as state machines
An LLM agent is a state machine whether you model it that way or not. The question is whether your state machine is explicit (designed, documented, testable) or implicit (scattered across prompts, conditionals, and glue code that nobody fully understands).
Consider a basic research agent that can browse the web and answer questions. Even something this simple has multiple states:
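One plausible set of states for such an agent (the names are illustrative) can be written as an enum, along with an explicit table of which transitions are legal, so every piece of orchestration code refers to the same finite set:

```python
from enum import Enum, auto

class AgentState(Enum):
    PLANNING = auto()      # deciding what information is needed
    SEARCHING = auto()     # a web search has been issued
    READING = auto()       # extracting content from a fetched page
    SYNTHESISING = auto()  # composing an answer from gathered evidence
    ERROR = auto()         # a tool call failed; decide whether to retry
    DONE = auto()          # final answer produced

# Which transitions are legal. Anything not listed here is a bug, not a feature.
ALLOWED = {
    AgentState.PLANNING: {AgentState.SEARCHING, AgentState.SYNTHESISING},
    AgentState.SEARCHING: {AgentState.READING, AgentState.ERROR},
    AgentState.READING: {AgentState.PLANNING, AgentState.SYNTHESISING, AgentState.ERROR},
    AgentState.SYNTHESISING: {AgentState.DONE, AgentState.PLANNING},
    AgentState.ERROR: {AgentState.SEARCHING, AgentState.DONE},
    AgentState.DONE: set(),
}
```

Note that ERROR and DONE are states in their own right, not afterthoughts, and DONE has no outgoing transitions: it is terminal by construction.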
Without the state machine framing, this same agent is usually implemented as a chain of if/else conditions and prompt checks. It works until an unexpected transition happens — a tool returns a malformed response, the model decides to call a tool that's out of scope, a loop runs longer than expected. At that point, undefined behaviour takes over.
The inference loop in an agent
From an inference engineering perspective, an agent is a process that runs the model repeatedly, with each call's output feeding the next call's input. Each model call is stateless — the model has no memory between calls — so the agent must construct the full context window on every iteration.
// The agent inference loop
Environment / Observation
The current state of the world is serialised into text: tool results, memory contents, previous turns, system context. This becomes the model's input.
Model Inference
A single forward pass. The model produces an output: either a final answer, a tool call, or a reasoning step. From the model's perspective, this is just inference — it has no awareness of being in a loop.
Action / Tool Execution
If the model called a tool, the agent executes it: web search, code execution, database query, API call. The result is an observation that feeds back into the environment.
State Update & Termination Check
The agent's state machine transitions based on what just happened. Is the task complete? Has a termination condition been met? Is the context window approaching its limit? If not, construct the next context and loop.
Notice that the model itself is just one component in this loop. Everything around it — context construction, tool dispatch, state transitions, termination conditions — is orchestration code. This is why "building an agent" is primarily a software engineering problem, not a prompting problem.
Context window as working memory
Each model call starts from zero. The model has no memory of previous calls — it sees only what is in its context window right now. This means the agent's context window is its working memory. Everything the model needs to know to make a good decision must be present in the context.
This creates a hard constraint: context windows are finite. An agent that naively appends every observation, tool result, and reasoning step will eventually hit the context limit and either fail or produce degraded outputs as the beginning of the context rolls off. Managing what goes into the context — and what gets summarised, compressed, or evicted — is state management.
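A minimal eviction policy, as a sketch: always keep the original task, keep the most recent observations, and drop (or in a real system, summarise) older ones once a token budget is exceeded. The 4-characters-per-token estimate is a rough heuristic standing in for a real tokeniser:

```python
def fit_to_budget(task: str, observations: list[str], budget_tokens: int = 8000) -> list[str]:
    """Keep the task plus as many recent observations as fit the budget."""
    est = lambda s: len(s) // 4          # crude token estimate; use a real tokeniser in practice
    kept: list[str] = []
    remaining = budget_tokens - est(task)
    for obs in reversed(observations):   # newest first: recent steps matter most
        if est(obs) > remaining:
            kept.append("[earlier steps summarised/evicted]")
            break
        kept.append(obs)
        remaining -= est(obs)
    return [task] + list(reversed(kept))
```

The key design choice is that the task prompt is never evicted: losing it is exactly the goal-drift failure described below.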
// The context window trap
Long-running agents often fail not because the model can't do the task, but because the context window fills up with irrelevant earlier steps, the model loses track of the original goal, and its outputs degrade. Context management is agent state management. They're the same problem.
Where agents fail — a state machine diagnosis
Most agent failures map directly to state machine failure modes. Once you see them this way, they become much easier to prevent:
| Failure mode | What it looks like | State machine cause |
|---|---|---|
| Infinite loops | Agent keeps calling the same tool or asking the same question | No transition out of the current state; missing termination condition |
| Silent tool failure | Tool returns an error; agent proceeds as if it succeeded | No ERROR state defined; tool result not checked before transition |
| Goal drift | Agent ends up working on a subtask and forgets the original task | State doesn't encode the top-level goal; context window crowded out original instruction |
| Hallucinated tool calls | Model invents a tool that doesn't exist or calls with wrong arguments | Valid transitions not constrained; model can attempt any transition, including invalid ones |
| Premature termination | Agent decides task is done when it isn't | Termination condition too loose; model's DONE state doesn't match task completion criteria |
| Unrecoverable error | One bad tool call crashes the entire agent run | No retry/recovery transitions defined from ERROR state |
Designing agent states well
The practical implication of treating your agent as a state machine is that you have to decide, upfront, what your states are. This sounds obvious, but most agent implementations skip it entirely, jumping straight to the LLM call and hoping the model figures out what to do.
Good state design has a few properties. States should be meaningful — they should represent a real distinction in what the agent is doing or what information is available. States should be mutually exclusive — the agent is in exactly one state at a time. And states should be complete — every plausible situation should map to one of your defined states, including failure modes.
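Concretely, a useful discipline is to write down, for each state: what it means, which transitions out of it are legal, and what bounds it. One way to record that, as an illustrative sketch:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StateSpec:
    name: str
    description: str               # what this state means, in one sentence
    allowed_next: frozenset[str]   # legal transitions out of this state
    max_visits: int = 10           # bound: a state revisited too often signals a loop

SEARCHING = StateSpec(
    name="SEARCHING",
    description="A web search has been issued; awaiting results.",
    allowed_next=frozenset({"READING", "ERROR"}),
    max_visits=5,
)
```

A table like this doubles as documentation and as data the orchestration code can check transitions against at runtime.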
Structured outputs as transition guards
One practical technique for enforcing state machine behaviour is to constrain the model's outputs structurally. If the model can only output a predefined set of actions — defined by a schema, an enum, or a grammar — then invalid transitions are impossible by construction. The model cannot call a tool that doesn't exist if tool calls are validated against a strict schema before execution.
This is one of the main practical benefits of structured output generation. The constraint isn't just aesthetic tidiness — it's a way of making the agent's transition graph enforceable rather than aspirational.
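A minimal sketch of this idea: validate a model-proposed tool call against a declared registry before executing it. The tool names and the required-argument check are illustrative stand-ins for a real JSON-schema validation step:

```python
from typing import Optional

TOOLS = {
    # tool name -> required argument names (a stand-in for a full schema)
    "web_search": {"query"},
    "fetch_page": {"url"},
}

def validate_tool_call(name: str, args: dict) -> Optional[str]:
    """Return an error message if the call is invalid, else None."""
    if name not in TOOLS:
        return f"unknown tool: {name}"   # hallucinated tool: rejected, never executed
    missing = TOOLS[name] - args.keys()
    if missing:
        return f"{name} missing arguments: {sorted(missing)}"
    return None

# An invalid transition becomes a recoverable observation fed back to the model,
# not an executed action.
assert validate_tool_call("web_search", {"query": "state machines"}) is None
assert validate_tool_call("teleport", {}) == "unknown tool: teleport"
```

The error string is deliberately returned rather than raised: feeding it back as an observation gives the model a chance to correct itself on the next iteration.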
// State machines and reliability
An agent with an explicit state machine is dramatically easier to test, debug, and monitor than one without. You can log state transitions, write tests for specific transitions, and build alerts for states that should be transient but persist. The state machine gives you a vocabulary for describing what went wrong.
The inference cost of agents
Every iteration of the agent loop is at minimum one model call. Multi-step agents make many calls. This has a direct, compounding effect on latency and cost that single-turn inference does not have.
A task that takes ten model calls at 2 seconds per call takes at minimum 20 seconds — even with zero overhead. If each call has a long context (accumulated tool results and history), the per-call cost rises too, because prefill cost scales with context length. Long-running agents on large models can be expensive to run at scale.
This is why agent design and inference efficiency are not separate concerns. Decisions like context compression strategy, which model to use per step (a smaller model may suffice for tool call routing; only use the large model for synthesis), and how aggressively to cache prefixes — these are both agent design decisions and inference engineering decisions simultaneously.
Multi-agent systems
When multiple agents coordinate — a planner agent that breaks down tasks, specialist agents that execute subtasks, a critic agent that reviews outputs — you have a system of communicating state machines. Each agent has its own state. The coordination layer has its own state. Messages between agents are transitions.
The same principles apply, just at a higher level. The coordination layer needs its own explicit states (what are all the states the system as a whole can be in?), defined transitions, and explicit failure handling. The most common failure in multi-agent systems is treating inter-agent communication as reliable when it isn't: an agent that receives no response from a subagent can end up in an undefined state unless that case has its own transition.
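As a sketch, here is a coordinator that dispatches a subtask and treats a missing or rejected reply as an explicit transition to a recovery state rather than an undefined one (all names are illustrative):

```python
from enum import Enum, auto

class CoordState(Enum):
    DISPATCHED = auto()   # subtask sent to a specialist agent
    REVIEWING = auto()    # a reply arrived; the critic checks it
    RETRYING = auto()     # no reply, or a bad reply: an explicit recovery state
    FAILED = auto()       # terminal: retries exhausted
    DONE = auto()         # terminal: reply accepted

def coordinate(run_subagent, review, max_retries: int = 2) -> CoordState:
    state, attempts = CoordState.DISPATCHED, 0
    while True:
        if state is CoordState.DISPATCHED:
            reply = run_subagent()   # returns None on timeout / no response
            state = CoordState.REVIEWING if reply is not None else CoordState.RETRYING
        elif state is CoordState.REVIEWING:
            state = CoordState.DONE if review(reply) else CoordState.RETRYING
        elif state is CoordState.RETRYING:
            attempts += 1
            state = CoordState.FAILED if attempts > max_retries else CoordState.DISPATCHED
        else:
            return state             # DONE or FAILED: terminal
```

The point is the RETRYING state: a silent subagent produces a defined transition, and the run ends in FAILED rather than hanging or proceeding on a missing result.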
// In short
An agent is a loop around a stateless model, and that loop is a state machine whether or not you design it as one. Define the states explicitly, constrain the transitions, give every state an exit (including error states), and treat the context window as the state you are managing. Most agent bugs are state bugs.