L3: Tool Use for Reliable Structured Output

L3: Tool Use for Reliable Structured Output

Tool Use for Reliable Structured Output

In the previous lessons, we discussed how to use prompts and a few-shot examples to encourage Claude to format its text correctly. However, when you need to pipe Claude's output directly into a database or an application backend, "encouraging" the model is not enough. You need absolute, mathematical certainty that the output will be strictly formatted JSON.

This lesson covers the most reliable architectural pattern for data extraction: using Tool Use (Function Calling) not to take an action, but purely to force structured output.

1. The Problem with Text-Based JSON Extraction

Historically, developers tried to extract JSON by writing prompts like: "Analyze this receipt and output the data in JSON format. Do not include any other text."

This approach is fundamentally flawed in production for three reasons:

  • Conversational Filler: LLMs are trained to be chatty. Even with strict instructions, Claude might occasionally output: "Here is the JSON you requested: \njson { ... } " This instantly breaks standard JSON parsers (JSON.parse() will throw an error).

  • Schema Hallucinations: The model might invent new keys (e.g., returning "cost" instead of "total_amount") or change data types (returning a string "100" instead of an integer 100).

  • Markdown Wrappers: The model almost always wraps the output in markdown code blocks, requiring you to write regex to strip the backticks before parsing.

2. The "Data Extractor" Tool Pattern

To solve this, architects completely abandon text-based JSON prompting. Instead, they use the Tool Use API.

Normally, a tool is used to interact with the outside world (e.g., query_database). In this pattern, you define a "dummy tool" whose sole purpose is to act as a structured container for Claude's thoughts.

The Workflow:

  1. You define a tool called something like record_receipt_data.

  2. You define the exact JSON schema this tool requires (e.g., merchant_name (string), total (number), date (string)).

  3. You prompt Claude: "Analyze this receipt and use therecord_receipt_data tool to save the information."

  4. Claude outputs a tool_use API response.

  5. Your application code intercepts this response, extracts the perfectly formatted JSON arguments, saves them to your database, and simply terminates the agentic loop without actually executing a local function.

3. Forcing Execution with tool_choice

If you simply give Claude the tool, it might still choose to talk to the user instead of using it. To guarantee structured output, you must use the tool_choice parameter in the API request.

By setting the API payload to explicitly force a specific tool, you physically strip the model of its ability to output conversational text.

Architectural Implementation:

JSON

{
  "model": "claude-3-5-sonnet-20240620",
  "messages": [...],
  "tools": [
    {
      "name": "extract_clinical_notes",
      "description": "Extracts symptoms and diagnoses from the medical text.",
      "input_schema": { ... }
    }
  ],
  "tool_choice": { "type": "tool", "name": "extract_clinical_notes" }
}
  

Result: Claude's response will contain zero conversational text. It will immediately output the requested JSON object, guaranteed to match your schema's required keys and data types.

4. Schema-Level Prompt Engineering

When using this pattern, the JSON Schema itself becomes your prompt. You do not put the extraction rules in the main System Prompt; you put them directly into the schema descriptions.

  • Poor Schema Design:

"properties": { "sentiment": { "type": "string" } }

  • Architectural Schema Design:

"properties": { "sentiment": { "type": "string", "enum": ["POSITIVE", "NEGATIVE", "NEUTRAL"], "description": "The overall sentiment of the customer review. Default to NEUTRAL if the review is just a question." } }

By moving the instructions directly into the property descriptions, you place the rules exactly where the model's attention is focused at the exact moment it is generating that specific piece of data.

5. Architectural Benefits of Structured Output Tools

Mandating the Tool Use pattern for all data extraction provides massive system-wide benefits:

  • Deterministic Parsing: You never have to write regex to strip markdown backticks again.

  • Type Safety: If your schema demands an array of integers, the Anthropic API guarantees Claude will attempt to output an array of integers.

  • Seamless Downstream Integration: Because the output matches your exact schema, you can pipe the tool's JSON arguments directly into your ORM (e.g., SQLAlchemy, Prisma) or validate them instantly with Pydantic or Zod without intermediate transformation layers.