L2: Validation-Retry Loops

Validation-Retry Loops

In Lesson 4.3, we established that using the Tool Use API guarantees that Claude will output data matching your JSON schema's structure and types. However, structural validity does not guarantee logical validity.

If your schema asks for an integer representing a user's age, Claude might output 999. The JSON parses perfectly, but the business logic is broken. To build truly resilient systems, architects implement Validation-Retry Loops.

1. The "Hope is Not a Strategy" Principle

In a standard LLM implementation, developers take the generated JSON, assume it is correct, and pipe it directly into their database. This is a fragile architecture.

The Architectural Reality: LLMs are probabilistic. They will occasionally hallucinate impossible dates, violate math constraints, or invent data not present in the source text.

You cannot engineer a prompt that is 100% immune to logical errors.
Therefore, your architecture must assume the model will fail and provide a mechanism for it to fix its own mistakes.

2. Anatomy of the Validation-Retry Loop

The Validation-Retry Loop is a programmatic safety net built into your application code. It sits between Claude and your database.

The Workflow:

Generation: Claude outputs a structured JSON response via a tool call.
Validation: Your application code catches the JSON and runs it through a deterministic validator (e.g., Pydantic in Python, or Zod in TypeScript).
The Fork:
- If Valid: The data is saved, and the workflow continues.
- If Invalid: The application intercepts the error, blocks the database write, and triggers the Retry phase.
The Re-Prompt: The application sends a new message back to Claude containing the exact error message and asks it to try again.

3. Schema Validation vs. Business Logic Validation

An architect must validate data at two distinct levels before accepting it from an agent.

Level 1: Schema/Type Validation: Did Claude output the right data types? (e.g., A string instead of an array). Modern SDKs and the Anthropic API catch most of these natively, but your code must still verify them.
Level 2: Business Logic Validation: This is where the Retry Loop shines.
- Example: You are building a flight booking agent. Claude extracts the departure date as 2024-05-01 and the return date as 2024-04-20.
- Schema validation passes (both are valid ISO date strings).
- Business logic validation fails (the return date is before the departure date).

4. Designing Specific Error Feedback

When your validation logic trips, how you pass that error back to Claude determines whether the retry will succeed.

As discussed in earlier modules, you must treat the error message as a dynamic prompt.

Poor Feedback: "Error: Invalid date logic. Try again." (Claude might change the departure date instead of the return date, or guess randomly).
Architectural Feedback: "Validation Error: The 'return_date' (2024-04-20) cannot be before the 'departure_date' (2024-05-01). Please review the user's request and update the 'return_date' accordingly."

By explicitly pointing out which key failed, what value it held, and why it broke the rule, you give the LLM the exact context it needs to self-correct on the very next iteration.

5. Circuit Breakers and Pattern Tracking

A poorly designed Retry Loop will result in an infinite loop. If Claude does not understand the error, it will repeatedly output the exact same broken JSON, consuming tokens endlessly.

Architectural Safeguards:

Themax_retries Limit: You must implement a strict circuit breaker. Typically, architects allow 2 or 3 retries.
Graceful Escalation: If retry_count > 3, the loop must break. The system should return a safe failure state to the user (e.g., "I'm having trouble verifying those dates, could you please type them out again?").
Telemetry and Pattern Tracking: Every time a validation error triggers a retry, it must be logged in your system telemetry. If you notice that Claude is consistently failing the return_date validation across hundreds of user sessions, it means your original System Prompt or Tool Description is ambiguous. The retry loop acts as an automated diagnostic tool, telling the architect exactly where the prompt needs to be rewritten.