Back to Writing
AI Engineering 2026-05-01

Designing AI Agents as Phase-Gated Pipelines

Most agentic systems fail because one prompt tries to do everything. Splitting work into discrete phases — each with a single deliverable and enforced constraints — produces dramatically more reliable results. Here's the architecture.


There’s a seductive simplicity to the single-prompt agent: one system message, one loop, one model that does it all. It investigates the problem, writes the fix, validates the output, creates the PR, and posts the summary. It sounds clean.

In practice, it’s a disaster.

The model context becomes a graveyard of partially-completed thoughts. Exploration bleeds into implementation. Implementation bleeds into validation. The agent makes file edits in the middle of an investigation turn and overwrites its own reasoning. By the time it tries to create the PR, it has no clear picture of what it actually changed or why.

The fix is simple in concept: phases.


What a Phase Is

A phase is a unit of work with exactly one deliverable and explicitly forbidden actions.

  • Refine: reads the problem, locates the code, produces a structured plan. Cannot edit files.
  • Execute: reads the plan, makes the minimal code changes. Cannot investigate — that’s done.
  • Validate: runs static analysis, checks blast radius, pushes the branch, creates the PR. Cannot write new logic.
  • Review: reads the diff, checks for regressions and style. Cannot push.
  • Learn: extracts reusable patterns, updates the knowledge base. Cannot change code.

Each phase gets its own system prompt that makes the constraints explicit. Not “try to avoid editing files” — but “do NOT call edit_code or write_file. You are in REFINE. The EXECUTE phase does the edits.”


Enforcing the Contract at the Prompt Level

The constraints aren’t enforced by a wrapper that blocks tool calls. They’re enforced by the system prompt. This is intentional — it trains the model to self-police rather than silently swallow tool errors.

A refine system prompt might open with:

## DO NOT EDIT FILES
You are in REFINE. You produce a PLAN.
Do NOT call edit_code, write_file, edit_file, or generate_unified_diff.
The EXECUTE phase does the edits.

When the model has been told clearly what its deliverable is, it stops trying to do everything else. The cognitive load of “should I fix this now or plan it?” is eliminated.


The Deliverable Contract

Each phase must produce a specific structured output. Refine produces a YAML plan. Execute produces committed file changes. Validate produces a PR URL and a Jira comment. If a phase completes without its deliverable, the run is considered failed — not just incomplete.

This matters because the deliverable is the interface between phases. The execute agent doesn’t read a transcript of what refine said — it reads the YAML. Structured output forces the refine agent to compress its findings into something actionable, which also catches cases where the agent explored a lot but concluded nothing.


Phase Isolation Prevents Context Contamination

When a single agent handles everything, the model’s context carries both the investigation artifacts (raw grep output, partial theories, dead ends) and the implementation state (which files were edited, what the diff looks like). These interfere with each other.

Splitting into phases means each agent starts with a clean context scoped to its job. The execute agent only knows: here is the plan, here are the files. It doesn’t know about the three alternative theories the refine agent considered and discarded.

This produces smaller, more focused context windows — which produces better outputs.


When to Split Further

Three to five phases is enough for most engineering workflows. The temptation is to keep adding phases, but each boundary adds latency and coordination overhead.

The right signal to split a phase is when a single phase is regularly doing two things that could fail independently. If your “validate” phase both runs tests and creates the PR, a test failure now blocks the PR — even if the PR should still be created as a draft for human review. Split them, and each failure mode becomes independently recoverable.

Phase-gated pipelines aren’t a framework. They’re a discipline. The constraint isn’t technical — it’s a decision to give each agent exactly one job and hold it accountable to a single deliverable.