L4: Tool Distribution Across Agents

In earlier modules, we established the Hub-and-Spoke architecture to prevent context degradation. However, a multi-agent architecture is not just about distributing conversation; it is fundamentally about distributing tools. This lesson covers the architectural principles for dividing a set of functions across multiple agents to maximize reliability and security.

1. The "Overloaded Agent" Problem

When developers first build an LLM application, they typically pile every available function—search_web, query_sql, send_email, read_jira, write_confluence—into a single agent's tools array.

This creates immediate architectural failures:

Token Bloat: Every tool requires a JSON schema and a description. Giving an agent 30 tools can consume 5,000+ tokens before the conversation even begins, inflating costs on every API call.
Attention Degradation: LLMs struggle with "distraction." The more tools an agent has, and the more those tools resemble one another, the more likely it is to select the wrong one or to hallucinate a hybrid of two similar tools.
Security Risks: If a single agent has both read_unfiltered_customer_data and send_public_tweet, a prompt injection attack has a direct path to a massive data leak.
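
To make the Token Bloat point concrete, here is a rough sketch of how you might estimate the prompt overhead of a tools array. The tool shape, the make_tool and estimate_tokens helpers, and the 4-characters-per-token ratio are all illustrative assumptions; a real count depends on your provider's tokenizer and schema format.

```python
import json

def make_tool(name: str, description: str, params: dict) -> dict:
    """Build a tool definition in a generic JSON-schema style (hypothetical shape)."""
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": params,
            "required": list(params),
        },
    }

def estimate_tokens(tools: list[dict], chars_per_token: float = 4.0) -> int:
    """Crude estimate: serialized length divided by an assumed chars-per-token ratio."""
    return int(len(json.dumps(tools)) / chars_per_token)

# Thirty placeholder tools, each with a short description and one string parameter.
tools = [
    make_tool(
        f"tool_{i}",
        "Does one narrowly scoped thing. " * 5,
        {"query": {"type": "string", "description": "Free-text input."}},
    )
    for i in range(30)
]

print("30 tools:", estimate_tokens(tools), "estimated tokens")
print(" 5 tools:", estimate_tokens(tools[:5]), "estimated tokens")
```

Because the tools array is sent with every request, this overhead is paid again on each API call, not once per session.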

2. The Principle of Least Privilege

Borrowed from traditional cybersecurity, the Principle of Least Privilege is mandatory in agentic architecture. An agent should only be granted the exact tools necessary to complete its specific, narrowly defined prompt.

If a subagent's job is to format a drafted email into HTML, it absolutely should not have access to the send_email tool or the query_database tool. It should only have text-manipulation tools. This creates an architectural firewall. Even if the formatting agent hallucinates or goes rogue, it physically lacks the tools to cause external damage.
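
One way to express this firewall in code is an allow-list applied when the agent is constructed. This is a minimal sketch, not a prescribed implementation: the Agent class, the ALL_TOOLS registry, and the stub to_html and send_email functions are hypothetical stand-ins.

```python
from typing import Callable

# Hypothetical tool implementations (stand-ins for real integrations).
def to_html(text: str) -> str:
    return "<p>" + text.replace("\n", "<br>") + "</p>"

def send_email(body: str) -> str:
    return "sent"

ALL_TOOLS: dict[str, Callable[[str], str]] = {
    "to_html": to_html,
    "send_email": send_email,
}

class Agent:
    def __init__(self, name: str, allowed: set[str]):
        self.name = name
        # Only the granted tools are copied in; everything else is unreachable.
        self.tools = {tool_name: ALL_TOOLS[tool_name] for tool_name in allowed}

    def call_tool(self, tool_name: str, arg: str) -> str:
        if tool_name not in self.tools:
            raise PermissionError(f"{self.name} has no tool named {tool_name!r}")
        return self.tools[tool_name](arg)

formatter = Agent("formatter", allowed={"to_html"})
print(formatter.call_tool("to_html", "Hello\nWorld"))

try:
    formatter.call_tool("send_email", "leaked data")
except PermissionError as err:
    print("blocked:", err)
```

The check lives in the runtime rather than in the prompt, so it holds even if the model is manipulated into requesting a tool it was never given.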

3. Domain-Driven Tool Grouping

To refactor an overloaded agent, architects group tools by functional domain and assign each domain to a specialized subagent.

Example Refactoring:

Before (1 Agent): Has sql_read, sql_write, get_weather, send_slack, draft_email.
After (Hub & Spokes):
- Data Subagent: Given only sql_read and sql_write. Prompted to act as a strict DBA.
- Comms Subagent: Given only send_slack and draft_email. Prompted to act as a corporate communications liaison.
- Coordinator (Hub): Given no direct execution tools.
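
The refactoring above can be written down as plain data and checked mechanically. The sketch below is illustrative: the dictionary keys and both helper functions are assumed names. Note that the example does not assign get_weather to any agent, so the sketch leaves it unassigned and simply reports it.

```python
BEFORE = ["sql_read", "sql_write", "get_weather", "send_slack", "draft_email"]

AFTER = {
    "data_subagent": ["sql_read", "sql_write"],
    "comms_subagent": ["send_slack", "draft_email"],
    "coordinator": [],  # the hub holds no direct execution tools
}

def overlapping_tools(groups: dict[str, list[str]]) -> set[str]:
    """Return tools that appear in more than one agent's group."""
    seen: set[str] = set()
    dupes: set[str] = set()
    for tool_names in groups.values():
        for tool_name in tool_names:
            if tool_name in seen:
                dupes.add(tool_name)
            seen.add(tool_name)
    return dupes

def unassigned_tools(original: list[str], groups: dict[str, list[str]]) -> list[str]:
    """Return tools from the original agent that no subagent received."""
    assigned = {t for tool_names in groups.values() for t in tool_names}
    return [t for t in original if t not in assigned]

print("overlaps:", overlapping_tools(AFTER))
print("unassigned:", unassigned_tools(BEFORE, AFTER))
```

A check like this is useful in a refactor because it surfaces tools that were silently dropped or accidentally granted to two domains.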

4. The Coordinator's "Meta-Tools"

If the Coordinator (Hub) handles the user but has no execution tools, how does work get done?

The Coordinator is given Meta-Tools. To the Coordinator, a subagent is just a tool.

Its tools array looks like this:

ask_data_subagent(query: string)
ask_comms_subagent(draft: string, channel: string)
ask_human_for_approval(summary: string)

Architectural Advantage: The Coordinator does not need to know the schema of the SQL database or the API parameters for Slack. It only needs to know how to write a clear instruction (query) to pass to the subagent. The subagent holds the complex schemas. This drastically reduces the cognitive load on the Coordinator model.
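
A minimal sketch of how a runtime might route the Coordinator's meta-tool calls to subagents follows. The subagents here are stubs that return strings; in a real system each would run its own model call with its own tools. The dispatch function and the {"name": ..., "arguments": {...}} call shape are assumptions for illustration, and ask_human_for_approval is omitted for brevity.

```python
from typing import Callable

# Stub subagents (hypothetical): each would normally wrap a separate model call.
def data_subagent(query: str) -> str:
    return f"[data] handled: {query}"

def comms_subagent(draft: str, channel: str) -> str:
    return f"[comms] posted to {channel}: {draft}"

# From the Coordinator's point of view, each subagent is just a named tool.
META_TOOLS: dict[str, Callable[..., str]] = {
    "ask_data_subagent": data_subagent,
    "ask_comms_subagent": comms_subagent,
}

def dispatch(tool_call: dict) -> str:
    """Route a Coordinator tool call to the matching subagent."""
    handler = META_TOOLS.get(tool_call["name"])
    if handler is None:
        raise KeyError(f"unknown meta-tool: {tool_call['name']}")
    return handler(**tool_call["arguments"])

result = dispatch(
    {"name": "ask_data_subagent", "arguments": {"query": "count active users"}}
)
print(result)
```

The Coordinator only ever sees plain-language parameters such as query, draft, and channel; the SQL and Slack schemas stay inside the subagents.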

5. Shared Tools (The Exceptions)

While most tools should be strictly isolated, there are specific utility tools that architects intentionally distribute to multiple agents:

Calculation Tools: A simple calculator or math_evaluator tool is often given to almost all subagents, as LLMs are notoriously bad at native arithmetic.
Time/Date Tools: If your application is time-sensitive, giving a lightweight get_current_utc_time tool to multiple agents prevents them from hallucinating deadlines.
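
Both kinds of shared utility are small, side-effect-free functions, which is what makes them safe to hand out widely. The sketch below shows one possible shape, assuming a math_evaluator restricted to basic arithmetic (parsed with Python's ast module rather than eval) and a get_current_utc_time that returns an ISO 8601 string. The supported operator set is an illustrative choice.

```python
import ast
import operator
from datetime import datetime, timezone

# Only these binary operators are accepted; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def math_evaluator(expression: str) -> float:
    """Evaluate +, -, *, / on numeric literals without calling eval()."""
    def _eval(node: ast.AST) -> float:
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)

def get_current_utc_time() -> str:
    """Return the current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()

# The same two utilities can be granted to several subagents.
SHARED_TOOLS = {
    "math_evaluator": math_evaluator,
    "get_current_utc_time": get_current_utc_time,
}

print(math_evaluator("2 + 3 * 4"))
print(get_current_utc_time())
```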

6. Architectural Trade-offs: Latency vs. Accuracy

Distributing tools across agents is not a free lunch. You are trading latency for accuracy.

Single Agent: 1 API call -> generates tool JSON -> executes. (Fast, but prone to logic errors).
Distributed Agents: Coordinator API call -> Subagent API call -> generates tool JSON -> executes -> Subagent API call (returns to Coordinator) -> Coordinator API call (synthesizes). (Much slower, with somewhat higher token usage, but typically far more reliable).

As an architect, you must evaluate the use case. If you are building a real-time voice assistant, you might keep a slightly "overloaded" single agent to reduce latency. If you are building a background data-processing pipeline where accuracy is paramount, you strictly distribute the tools across a multi-agent hub-and-spoke.
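
A back-of-the-envelope latency comparison can help with that evaluation. The sketch below counts the model calls in the two sequences above (one for the single agent, four for the distributed flow) and applies assumed timings. The 1.5 seconds per model call and 0.5 seconds of tool execution are placeholder numbers, not measurements; substitute your own.

```python
def total_latency(model_calls: int, seconds_per_call: float, tool_seconds: float) -> float:
    """Sequential latency: every model call plus one tool execution."""
    return model_calls * seconds_per_call + tool_seconds

SINGLE_AGENT_CALLS = 1   # agent call that emits the tool JSON
DISTRIBUTED_CALLS = 4    # coordinator, subagent, subagent (result), coordinator (synthesis)

# Placeholder timings (assumptions, not benchmarks).
SECONDS_PER_CALL = 1.5
TOOL_SECONDS = 0.5

single = total_latency(SINGLE_AGENT_CALLS, SECONDS_PER_CALL, TOOL_SECONDS)
distributed = total_latency(DISTRIBUTED_CALLS, SECONDS_PER_CALL, TOOL_SECONDS)

print(f"single agent: {single:.1f}s")
print(f"distributed:  {distributed:.1f}s")
```

Under these assumptions the distributed flow takes a little over three times as long per request, which is why latency-sensitive products may accept a broader single agent.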