L6: Plan Mode and Iterative Refinement

In the previous lessons, we customized Claude Code's environment to make it highly capable. However, giving an autonomous agent access to your file system and build tools introduces significant risk. If you ask an agent to "refactor the authentication module," and it immediately starts deleting and rewriting dozens of files, a hallucination could break your entire repository.

To mitigate this, architects enforce Plan Mode and Iterative Refinement, shifting the workflow from "Zero-Shot Execution" to "Plan-and-Solve."

1. The Danger of "Zero-Shot" Execution in Codebases

When developers use standard chat interfaces, they are accustomed to zero-shot generation: you ask a question, and the LLM spits out the complete answer.

If you apply this directly to Claude Code (e.g., "Build a new payment gateway endpoint"), the agent might try to write the database schema, the API routes, the frontend components, and the tests in one massive, chaotic loop.

  • The Failure State: By step 4, the agent may have lost track of the names and interfaces it chose in step 1. It writes 500 lines of code, runs a test, gets 50 errors, and then spirals: each attempted fix introduces new errors, and the session burns through its context window without converging.

2. What is Plan Mode?

Plan Mode is an architectural workflow where the agent is strictly forbidden from executing code or modifying files until it has written a proposed plan and received human approval.

The Workflow:

  1. Exploration: The agent uses read-only tools (grep, ls, reading files) to understand the current state of the codebase.

  2. Drafting: The agent outputs a markdown checklist (e.g., Step 1: Update schema, Step 2: Write tests, Step 3: Implement controller).

  3. Human-in-the-Loop (HITL): The developer reviews the plan. They can spot architectural flaws before any code is written (e.g., "No, do not use Redis for this, use Postgres").

  4. Execution: Once approved, the agent executes the plan one step at a time.
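The four stages above can be read as a small state machine in which execution is unreachable until a human approves the plan. The following is a minimal sketch of that idea, not how Claude Code implements it internally; the names Phase and PlanSession are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Phase(Enum):
    EXPLORE = auto()  # read-only inspection; a plan may be drafted
    REVIEW = auto()   # plan drafted, waiting on the human
    EXECUTE = auto()  # plan approved; steps run one at a time
    DONE = auto()     # every step completed


@dataclass
class PlanSession:
    """Hypothetical tracker for one plan-then-execute session."""
    phase: Phase = Phase.EXPLORE
    steps: list = field(default_factory=list)
    completed: int = 0

    def draft(self, steps):
        # A plan can only be (re)drafted while exploring.
        if self.phase is not Phase.EXPLORE:
            raise RuntimeError("a plan can only be drafted during exploration")
        if not steps:
            raise ValueError("a plan needs at least one step")
        self.steps = list(steps)
        self.completed = 0
        self.phase = Phase.REVIEW

    def review(self, approved):
        # Rejection sends the agent back to exploration to revise the plan.
        if self.phase is not Phase.REVIEW:
            raise RuntimeError("there is no drafted plan to review")
        self.phase = Phase.EXECUTE if approved else Phase.EXPLORE

    def next_step(self):
        # The gate: no step is handed out unless the plan was approved.
        if self.phase is not Phase.EXECUTE:
            raise PermissionError("plan has not been approved")
        return self.steps[self.completed]

    def complete_step(self):
        self.next_step()  # raises PermissionError if not executing
        self.completed += 1
        if self.completed == len(self.steps):
            self.phase = Phase.DONE
```

The design choice worth noting is that rejection returns the session to exploration instead of ending it, which mirrors the reviewer saying "No, use Postgres" and the agent revising the plan.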

3. Enforcing Plan Mode via Architecture

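Plan Mode works best when it is enforced structurally instead of being left as a polite request. Claude Code ships a built-in plan mode that keeps the agent to read-only actions until you accept its plan, and architects can reinforce the same workflow through CLAUDE.md or system-level prompts. Keep in mind that instruction files steer the model's behavior; on their own they are not a hard technical guarantee.

Example CLAUDE.md Enforcement: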

"For any task requiring more than one file modification:

  1. You MUST first output a numbered step-by-step plan.

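  2. You MUST pause and wait for explicit human approval before writing any code (or call an approval tool if your setup defines one; ask_human_for_approval is a hypothetical name used here for illustration).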

  3. Once approved, you MUST execute only ONE step at a time, verifying the step is complete before moving to the next."
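Because a CLAUDE.md rule is still an instruction to the model, a harder guarantee needs a check that runs outside the model. The sketch below shows one possible gate that refuses write-capable tools until the plan is approved. Everything specific in it is an assumption to verify against your own setup: the event shape (a JSON object with a tool_name key, modeled on what Claude Code's pre-tool hooks are documented to receive), the set of tool names treated as mutating, and the convention that exit code 2 means "block this call."

```python
import json
import sys

# Assumption: tools that can change files or run commands.
MUTATING_TOOLS = {"Write", "Edit", "Bash"}

# Assumption: exit-code convention expected by the hook runner.
ALLOW, BLOCK = 0, 2


def gate(event, approved):
    """Return (exit_code, message) for one pending tool call."""
    tool = event.get("tool_name", "")
    if tool in MUTATING_TOOLS and not approved:
        return BLOCK, f"{tool} blocked: present a plan and wait for approval first."
    return ALLOW, ""


def run_hook(raw_event, approved):
    """Decide on one tool call described by a JSON string."""
    code, message = gate(json.loads(raw_event), approved)
    if message:
        # The reason is written to stderr so it can be surfaced to the agent.
        print(message, file=sys.stderr)
    return code
```

In a real hook script, raw_event would come from standard input and approved from some durable signal, for example the presence of a marker file that the reviewer creates; both of those wiring details are left out here on purpose.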

Some advanced architectures even force the agent to write the plan into a temporary PLAN.md file in the root directory, allowing the agent to continuously read the file to track its progress across a long session.
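To make the PLAN.md idea concrete, here is a minimal sketch of how progress could be tracked in such a file. It assumes the plan is stored as a markdown task list ("- [ ]" for open, "- [x]" for done); that format and the helper names next_open_step and mark_done are illustrative choices, not something Claude Code prescribes.

```python
import re
from typing import Optional

# Matches lines such as "- [ ] Write tests" or "* [x] Update schema".
CHECKBOX = re.compile(r"^(\s*[-*]\s*\[)([ xX])(\]\s*)(.+)$")


def next_open_step(plan_text: str) -> Optional[str]:
    """Return the first unchecked step, or None when the plan is finished."""
    for line in plan_text.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(2) == " ":
            return match.group(4).strip()
    return None


def mark_done(plan_text: str, step: str) -> str:
    """Return the plan text with the first open occurrence of `step` checked."""
    updated = []
    done = False
    for line in plan_text.splitlines():
        match = CHECKBOX.match(line)
        if (not done and match and match.group(2) == " "
                and match.group(4).strip() == step):
            line = f"{match.group(1)}x{match.group(3)}{match.group(4)}"
            done = True
        updated.append(line)
    if not done:
        raise ValueError(f"no open step named {step!r}")
    return "\n".join(updated) + "\n"
```

Keeping the state in a file means that progress survives even when the conversation history is summarized or cleared: the agent only has to re-read the file to find the next open step.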

4. Iterative Refinement and the TDD Pattern

Once the plan is approved, how should the agent execute it? The most reliable agentic workflow is the Test-Driven Development (TDD) Pattern.

LLMs perform markedly better when they have a deterministic way to prove their code works.

  • Step 1 (Write the Test): The agent writes a unit test for the specific feature it is about to build.

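  • Step 2 (Run & Fail): The agent runs the test suite (e.g., npm test). The test fails because the feature doesn't exist yet. This failure is useful signal for the agent: it confirms that the test really exercises the missing behavior, and it supplies a concrete error message to work against.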

  • Step 3 (Implement): The agent writes the minimum amount of code required to make the test pass.

  • Step 4 (Run & Pass): The agent runs the test again. If it fails, it reads the error and refines the code. If it passes, it checks the item off its plan.

By enforcing the TDD pattern, you tightly bound the agent's task. It is no longer trying to "build a feature"; it is simply trying to "make this specific error message go away," which LLMs are exceptionally good at.
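The four TDD steps can be sketched as a bounded loop. This is an illustrative outline, not a description of what Claude Code runs internally: run_tests, red_green, RunResult, and the max_attempts limit are hypothetical names, and the implement callback stands in for whatever the agent does to edit the code.

```python
import subprocess
from typing import Callable, NamedTuple


class RunResult(NamedTuple):
    passed: bool
    output: str


def run_tests(command):
    """Run a test command such as ["npm", "test"] and capture its output."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return RunResult(proc.returncode == 0, proc.stdout + proc.stderr)


def red_green(run: Callable[[], RunResult],
              implement: Callable[[str], None],
              max_attempts: int = 3) -> int:
    """Drive one red-green cycle; return the number of attempts used."""
    first = run()
    if first.passed:
        # A test that passes before the feature exists proves nothing.
        raise RuntimeError("test passed before implementation")
    feedback = first.output
    for attempt in range(1, max_attempts + 1):
        implement(feedback)       # write or refine code from the error text
        result = run()
        if result.passed:
            return attempt        # green: the plan item can be checked off
        feedback = result.output  # still red: the new error drives the next try
    raise RuntimeError(f"still failing after {max_attempts} attempts")
```

The attempt limit is a deliberate design choice: when the loop does not converge, it stops and raises instead of looping indefinitely, which is the point at which a human should look at the plan again.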

5. Managing Context During Iteration

A major challenge of Iterative Refinement is that running tests and compilers generates large terminal outputs. If an agent loops through 5 test failures, the context window can fill up with thousands of lines of stack traces.

Architectural Best Practice:

As an architect, you must train developers (or configure your agentic workflows) to aggressively manage session state during execution.

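  • After a major step in the plan is completed successfully, use context-management commands (like /compact in Claude Code, which replaces the conversation history with a summary) so that the raw terminal outputs from earlier steps no longer occupy the context window. When the next step needs none of the earlier history, /clear starts from an empty context instead.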

  • This ensures that as the agent moves to Step 3, it is not distracted by the debugging logs from Step 1, preventing context degradation.
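A complementary tactic is to shrink test output before it ever reaches the context window. The sketch below keeps the first and last lines of a long log and drops the middle; the function name condense_output and its default limits are hypothetical, and it assumes you control a wrapper script through which the agent runs its tests.

```python
def condense_output(output: str, max_lines: int = 40, head: int = 10) -> str:
    """Keep the first `head` and last `max_lines - head` lines of a long log."""
    if not 0 <= head < max_lines:
        raise ValueError("head must satisfy 0 <= head < max_lines")
    lines = output.splitlines()
    if len(lines) <= max_lines:
        return output  # short enough: pass through unchanged
    tail = max_lines - head
    omitted = len(lines) - max_lines
    marker = f"... [{omitted} lines omitted] ..."
    return "\n".join(lines[:head] + [marker] + lines[-tail:])
```

Keeping both ends is a judgment call: the start of a test run usually names the failing test, and the end usually carries the summary and the final error, while the middle is mostly repeated stack frames.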