L3: Multi-Pass Review Strategies

L3: Multi-Pass Review Strategies

Multi-Pass Review Strategies

In the previous lessons, we relied on explicit instructions, few-shot examples, and strict JSON schemas to get the right output on the first try. We also built Validation-Retry loops to catch schema and logic errors. However, when generating complex, nuanced outputs—like enterprise-grade code, architectural documents, or heavily constrained creative text—a single pass is rarely perfect.

This lesson covers Multi-Pass Review Strategies , the architectural practice of using LLMs to critique and iteratively refine their own (or another agent's) work before showing it to the user.

1. The Limitation of Single-Pass Generation

When you ask an LLM to generate a complex output (e.g., "Write a Python script that connects to AWS, processes this data, and handles all edge cases"), the model's attention is split across multiple competing priorities: syntax, logic, security, and fulfilling the core request.

The Architectural Reality: The more constraints you place in a single prompt, the higher the probability that the model will "drop" one. It might write perfect logic but forget your instruction to add inline comments.

2. What is a Multi-Pass Strategy?

A multi-pass strategy breaks generation down into distinct, sequential API calls. Instead of asking for the final product immediately, you architect a pipeline:

  1. Pass 1 (The Draft): The model focuses entirely on solving the core problem.

  2. Pass 2 (The Critique): A model is asked only to read the draft and identify flaws against a strict rubric.

  3. Pass 3 (The Refinement): A model takes the original draft and the critique, and generates the final polished version.

By isolating the "creation" phase from the "review" phase, you allow the model to dedicate 100% of its context window and attention mechanisms to one specific cognitive task at a time.

3. Pattern 1: Self-Correction (Single Agent)

The simplest implementation is a linear Self-Correction loop within a single session state.

The Workflow:

  1. You prompt Claude to generate the code.

  2. Claude outputs the code.

  3. You immediately append a new user message to the same conversation history: "Review the code you just wrote. Does it strictly adhere to the PEP-8 style guide and handle network timeouts? List any violations, then output the corrected code."

  • Advantages: Easy to implement, low latency overhead, maintains session context.

  • Drawbacks (Confirmation Bias): Because the model shares the same context window as its original draft, it is mathematically biased toward its own previous reasoning. It will often "rubber-stamp" its own work, missing subtle logic errors.

4. Pattern 2: The Critic/Reviewer (Multi-Agent)

For production-grade GenAI systems, architects prefer the Multi-Agent Review Pattern (heavily utilized in CI/CD pipelines, as discussed in Module 3).

Instead of asking the same agent to review its work, you spin up a completely isolated, fresh API call with a different System Prompt.

The Architecture:

  • The Generator Agent: Prompted to be creative and solve the problem.

  • The Reviewer Agent: Prompted to act as a strict Quality Assurance auditor. Its system prompt contains the explicit grading rubric.

Why this works: The Reviewer Agent has no memory of why the Generator Agent wrote the code the way it did. It evaluates the output purely on the merits of the text against the rubric, completely eliminating LLM confirmation bias.

5. Designing the Critique Prompt

The success of a multi-pass system relies entirely on how you instruct the Reviewer Agent. If you just say "Make this better," the Reviewer might unnecessarily rewrite perfect code just to fulfill the prompt.

Architectural Best Practice: The "Critique-Only" Constraint

Do not let the Reviewer Agent rewrite the output. Force it to only output a structured list of flaws.

Reviewer Prompt Example:

"You are a strict QA Reviewer. Read the drafted SQL query below. Evaluate it strictly against these three rules: 1. No SELECT * statements. 2. Must use table aliases. 3. Must include a WHERE clause.

Output a JSON array of violations. If there are no violations, output an empty array[]. DO NOT output the corrected SQL."

Your application code then takes that JSON array. If it is empty, the draft is approved and sent to the user. If it has items, your code passes the array back to the Generator Agent: "Your draft had the following violations: [Array]. Fix them."

6. The Architect's Dilemma: Cost vs. Quality

Multi-pass workflows produce significantly higher-quality outputs, but they come with severe architectural trade-offs:

  • Token Multipliers: A 3-pass workflow consumes at least 3x the input tokens and 3x the output tokens of a single-pass workflow.

  • Latency: If a single API call takes 4 seconds, a 3-pass review takes 12+ seconds. This is unacceptable for a real-time customer service chatbot.

When to use Multi-Pass:

Reserve this pattern for high-stakes, asynchronous tasks. Generating code, drafting legal contracts, summarizing critical medical records, or running background CI/CD checks are perfect use cases where absolute accuracy vastly outweighs a few seconds of latency.