L7: Structured Error Responses

Structured Error Responses That Enable Agent Recovery

In traditional software development, when a function fails, it throws an exception. This exception is usually logged for a developer to read later, and the user is shown a generic "500 Internal Server Error."

In an agentic architecture, Claude takes the developer's place as the reader of the error. If a tool fails, returning a raw stack trace or a generic error code tends to make the agent guess at a cause, give up, or blindly repeat the same failed action. To build resilient systems, architects must design structured error responses that teach the agent how to recover.

1. The Problem with Raw Exceptions

When an agent triggers a tool (e.g., query_database) and your application code encounters an error, that error must be sent back to Claude as a tool_result.

If you send back a raw Python stack trace:

  • Token Waste: Stack traces are long and can consume hundreds of tokens per failure.

  • Context Degradation: The technical jargon clutters the context window, distracting Claude from the primary user goal.

  • Lack of Direction: A stack trace tells Claude what broke, but not how to fix it.
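
One way to keep the trace out of the context window is to log it on the server and hand the model only a short summary. The following is a minimal sketch, not a definitive implementation: run_tool_safely and the always-failing query_database stub are hypothetical names, and the tool_result block layout (including the is_error flag) is assumed to follow the Anthropic Messages API tool-use format, so check it against the current documentation.

```python
import json
import logging
import traceback

logger = logging.getLogger("tools")


def run_tool_safely(tool_fn, tool_use_id, **kwargs):
    """Run a tool and wrap the outcome in a tool_result block (hypothetical helper)."""
    try:
        result = tool_fn(**kwargs)
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": json.dumps(result),
        }
    except Exception as exc:
        # The full trace goes to the server log, where a human developer can read it.
        logger.error("tool %s failed:\n%s", tool_fn.__name__, traceback.format_exc())
        # The model only sees a compact summary, truncated to bound token use.
        summary = {
            "status": "error",
            "error_code": type(exc).__name__,
            "message": str(exc)[:200],
        }
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": json.dumps(summary),
            "is_error": True,  # assumed field from the Messages API tool-use format
        }


def query_database(table):
    # Hypothetical tool that always fails, for demonstration only.
    raise ConnectionError(f"could not reach database for table '{table}'")


block = run_tool_safely(query_database, "toolu_demo", table="users")
print(block["content"])
```

The summary here still lacks direction; it only solves the token and clutter problems. Guidance on what to do next is a separate concern.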

2. The "Error as a Prompt" Paradigm

Architects must treat tool errors not as system failures, but as dynamic prompts. When a tool fails, the error message returned to the model should be explicitly written to guide the model's next step.

  • Poor Error Response: {"error": "KeyError: 'user_id'"}

    • Agent Reaction: The agent might try to guess a user_id or just tell the user the system is broken.

  • Architectural Error Response: {"error": "Missing parameter. The 'user_id' is required to search this table. If you only have a user's email, use the 'get_id_by_email' tool first, then retry this tool."}

    • Agent Reaction: The agent immediately pivots, uses the email lookup tool, and successfully recovers the workflow without the user ever knowing there was a hiccup.
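
The contrast above can be sketched in code by catching the KeyError and replacing it with guidance. This is an illustrative sketch under stated assumptions: search_users and the RECOVERY_HINTS table are hypothetical, and a real handler would query an actual data store.

```python
import json

# Hypothetical mapping from a missing parameter to guidance the model can act on.
RECOVERY_HINTS = {
    "user_id": (
        "The 'user_id' is required to search this table. If you only have a "
        "user's email, use the 'get_id_by_email' tool first, then retry this tool."
    ),
}


def search_users(params):
    """Hypothetical tool handler: look up a user row by id."""
    try:
        user_id = params["user_id"]
    except KeyError as exc:
        missing = exc.args[0]  # the key that was not found
        hint = RECOVERY_HINTS.get(
            missing, f"Provide the '{missing}' parameter and retry."
        )
        # Return guidance instead of the raw "KeyError: 'user_id'".
        return {"error": f"Missing parameter. {hint}"}
    return {"status": "success", "data": [{"user_id": user_id}]}


print(json.dumps(search_users({"email": "jane@example.com"}), indent=2))
```

Keeping the hints in one table makes it easier to review the wording the model will actually see.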

3. Access Failures vs. Empty Results (Crucial Distinction)

One of the most common architectural failures is confusing an Access Failure with an Empty Result. If you do not structure these differently, Claude is likely to draw the wrong conclusion: it may treat a working tool as broken, or report missing data when it simply lacked access.

Scenario: Claude queries a database for an employee named "John Doe".

  • Empty Result (Valid Execution, No Data):

    • Incorrect return: {"error": "Not Found"} -> Claude assumes the tool is broken.

    • Structured return: {"status": "success", "data": [], "message": "Query executed successfully, but 0 records matched 'John Doe'."} -> Claude knows the tool works, but the data doesn't exist. It can confidently tell the user that no employee record matches "John Doe".

  • Access Failure (Invalid Execution):

    • Structured return: {"status": "error", "error_type": "AuthException", "message": "You do not have permission to access the HR table. Escalate to the human user to request an admin override."} -> Claude knows it cannot complete the task and triggers the escalation path.
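
A minimal sketch of a tool that keeps the two cases apart is shown below. The in-memory EMPLOYEES list, the ALLOWED_TABLES set, and the query_table function are hypothetical stand-ins for a real database and permission check; only the returned payload shapes come from the scenario above.

```python
# Hypothetical in-memory stand-ins for a real database and permission check.
EMPLOYEES = [{"name": "Jane Roe", "department": "Finance"}]
ALLOWED_TABLES = {"employees"}


def query_table(table, name):
    """Return a success envelope for empty results; reserve errors for real failures."""
    if table not in ALLOWED_TABLES:
        # Access failure: the query never ran, so say so and name the escalation path.
        return {
            "status": "error",
            "error_type": "AuthException",
            "message": (
                f"You do not have permission to access the {table} table. "
                "Escalate to the human user to request an admin override."
            ),
        }
    matches = [row for row in EMPLOYEES if row["name"] == name]
    if not matches:
        # Empty result: the query ran fine, there is just no matching data.
        message = f"Query executed successfully, but 0 records matched '{name}'."
    else:
        message = f"Query executed successfully, {len(matches)} record(s) matched '{name}'."
    return {"status": "success", "data": matches, "message": message}


empty = query_table("employees", "John Doe")
denied = query_table("HR", "John Doe")
print(empty["status"], denied["status"])
```

The key design choice is that "status" describes whether the tool executed, not whether it found anything.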

4. Designing the Structured Error Payload

To ensure consistency across a multi-agent system, all tools should return errors matching a standardized JSON schema. This allows Claude to instantly recognize when a tool has failed.

A production-grade error payload should include:

  1. error_code: A short, stable category code that both Claude and your application code can match on (e.g., INVALID_SCHEMA, TIMEOUT, UNAUTHORIZED).

  2. message: A plain-English explanation of the failure.

  3. recovery_hint: Explicit instructions on what Claude should do next.

Example Payload:

JSON

{
  "status": "error",
  "error_code": "INVALID_DATE_FORMAT",
  "message": "The API rejected the date format '04-15-2026'.",
  "recovery_hint": "Convert the date to ISO 8601 format (YYYY-MM-DD) and retry the tool execution."
}
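
To keep every tool on the same schema, the payload can be built from a single shared type rather than hand-written dictionaries. A minimal sketch, assuming a hypothetical ToolError dataclass whose fields mirror the payload above:

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ToolError:
    """Hypothetical shared error type; field names mirror the example payload."""

    error_code: str
    message: str
    recovery_hint: str
    status: str = "error"

    def to_json(self) -> str:
        # Serialize to the JSON string that is returned to the model.
        return json.dumps(asdict(self))


err = ToolError(
    error_code="INVALID_DATE_FORMAT",
    message="The API rejected the date format '04-15-2026'.",
    recovery_hint=(
        "Convert the date to ISO 8601 format (YYYY-MM-DD) "
        "and retry the tool execution."
    ),
)
payload = json.loads(err.to_json())
print(payload["error_code"])
```

Because all three descriptive fields are required constructor arguments, a tool author cannot emit an error without a recovery_hint.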
  

5. The Validation-Retry Loop

Structured errors are the foundation of the Validation-Retry Loop, a core resilience pattern in agent architecture.

  1. Action: Claude generates JSON to call an external API.

  2. Validation: Your application code intercepts the JSON and validates it against the schema. It notices Claude hallucinated a parameter.

  3. Feedback (Structured Error): Your application blocks the API call and returns a structured error to Claude, pointing out the exact hallucinated parameter.

  4. Recovery: Claude reads the error, self-corrects the JSON, and tries again.

  5. Iteration Limits: To prevent infinite loops (e.g., Claude repeatedly failing to fix the JSON), your application code must track retry attempts. If retry_count > 3, the application returns a final error: {"status": "fatal", "message": "Multiple failures. Abort task and ask human for clarification."}
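
The five steps above can be sketched as a small loop. This is an illustration under stated assumptions, not a definitive implementation: ALLOWED_PARAMS is a hypothetical schema, propose_call stands in for the model (it receives the last structured error, or None on the first attempt), and execute stands in for the external API.

```python
MAX_RETRIES = 3
ALLOWED_PARAMS = {"city", "date"}  # hypothetical schema for a hypothetical API


def validate(call):
    """Return a structured error for the first unknown parameter, else None."""
    unknown = sorted(set(call) - ALLOWED_PARAMS)
    if unknown:
        return {
            "status": "error",
            "error_code": "INVALID_SCHEMA",
            "message": f"Unknown parameter '{unknown[0]}'.",
            "recovery_hint": (
                f"Remove '{unknown[0]}' and use only: {sorted(ALLOWED_PARAMS)}."
            ),
        }
    return None


def run_with_retries(propose_call, execute):
    """Validate each proposed call; stop with a fatal error once retry_count > 3."""
    feedback = None
    retry_count = 0
    while True:
        call = propose_call(feedback)   # 1. Action
        feedback = validate(call)       # 2. Validation
        if feedback is None:
            return execute(call)        # the call is valid, so let it through
        retry_count += 1                # 3. Feedback is passed to the next attempt
        if retry_count > MAX_RETRIES:   # 5. Iteration limit
            return {
                "status": "fatal",
                "message": "Multiple failures. Abort task and ask human for clarification.",
            }


def fake_model(feedback):
    # Stand-in for Claude: drops the invented parameter once it has seen an error.
    if feedback is None:
        return {"city": "Paris", "zip_code": "75001"}
    return {"city": "Paris"}


result = run_with_retries(fake_model, lambda call: {"status": "success", "data": call})
print(result)
```

With retry_count > 3 as the cutoff, the model gets one initial attempt plus three retries before the fatal payload is returned.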