L2: What Is Claude? LLMs Explained for Developers

As a developer transitioning into AI architecture, the first conceptual hurdle is shifting your mindset from traditional software engineering to working with Large Language Models (LLMs). This lesson establishes the foundational mechanics of LLMs, with a specific focus on Anthropic's Claude.

1. The Paradigm Shift: Deterministic vs. Probabilistic Computing

Traditional code is deterministic: if X, then Y. You write explicit rules, and the output is perfectly predictable.

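LLMs like Claude operate on a probabilistic model. They do not look up facts the way a database does; instead, they are sophisticated prediction engines. Given a sequence of text, the model assigns a probability to each possible next piece of text (a token), and the output is built one token at a time by sampling from those probabilities. The same prompt can therefore produce different outputs on different runs.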
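To make the idea of a prediction engine concrete, here is a minimal sketch of next-token sampling. The prompt, the candidate tokens, and their probabilities are all invented for illustration; a real model scores a vocabulary of many thousands of tokens and repeats this step for every token it generates.

```python
import random

# Hypothetical next-token probabilities for the prompt "The capital of France is".
# These numbers are made up for illustration only.
NEXT_TOKEN_PROBS = {" Paris": 0.90, " a": 0.05, " located": 0.03, " Lyon": 0.02}


def sample_next_token(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


def most_likely_token(probs):
    """Greedy choice: always take the highest-probability token."""
    return max(probs, key=probs.get)


rng = random.Random(0)  # seeded so a run is repeatable
samples = [sample_next_token(NEXT_TOKEN_PROBS, rng) for _ in range(1000)]

print(repr(most_likely_token(NEXT_TOKEN_PROBS)))
print({tok: samples.count(tok) for tok in NEXT_TOKEN_PROBS})
```

The greedy choice is always the same token, while the sampled results are dominated by the most likely token but occasionally land on the others. That spread is the behavior your prompts and constraints are meant to narrow.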

The Developer's Challenge: You are no longer writing strict logical pathways. You are designing constraints, contexts, and instructions (prompts) to guide a probabilistic engine toward a reliable, deterministic-like outcome.

2. What Exactly is Claude?

Claude is a family of advanced LLMs developed by Anthropic, an AI research and safety company. Claude is heavily optimized for enterprise and developer use cases, prioritizing steerability, predictable outputs, and safety.

The "Constitutional AI" Difference

What sets Claude apart is less its architecture than how it is trained. Anthropic uses a method called Constitutional AI. Instead of relying purely on human feedback to tell the model what is "good" or "bad" (RLHF, Reinforcement Learning from Human Feedback), Claude is given a "constitution" (a set of principles based on human rights, helpfulness, and safety). During training, the model critiques and revises its own responses against this constitution, and that AI-generated feedback is used alongside human feedback.

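Why this matters for architects: This training is intended to make Claude more resistant to jailbreaks, more reliable in enterprise environments, and less prone to generating harmful or off-brand outputs when deployed in production apps. No model is immune to misuse, so production systems still need their own input validation and output checks.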

3. Core LLM Concepts Every Developer Must Know

Before making an API call, you must understand the physics of the model:

Tokens: LLMs do not read words; they read tokens. A token is a chunk of characters. In English, 1 token is roughly 3/4 of a word (or ~4 characters). "Apple" might be one token, while "Architecture" might be split into two or three. API pricing and limits are strictly calculated in tokens, not words or bytes.
The Context Window: This is Claude's short-term memory for a single API call. It includes both your input (the prompt) and Claude's output. Claude has a massive context window (often 200,000 tokens, which is roughly a 500-page book).
Statelessness: This is a critical architectural concept. The Claude API has no memory of previous calls. If you are building a chatbot or an agent, you (the architect) are responsible for storing the conversation history and sending the entire relevant transcript back to Claude with every new request.
Temperature: A parameter (ranging from 0.0 to 1.0) that controls the randomness of the output.
- Temperature = 0.0: Most focused and repeatable, though outputs are still not guaranteed to be identical across runs (Best for coding, data extraction, JSON generation).
- Temperature = 1.0: Highly creative, varied (Best for brainstorming or creative writing).
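The four concepts above meet in the request body you assemble for every call. The following is a minimal sketch, not a definitive implementation: the `Conversation` class, the 4-characters-per-token estimate, and the trimming strategy are assumptions made for illustration, and the model name is a placeholder. The field names (`model`, `max_tokens`, `temperature`, `system`, `messages`) follow the general shape of the Messages API, so check the current API reference before relying on them. No network call is made.

```python
# Assumed values for illustration only; check the documentation for your model.
CONTEXT_WINDOW_TOKENS = 200_000
CHARS_PER_TOKEN = 4  # rough English heuristic from the text above


def estimate_tokens(text):
    """Very rough estimate; exact counts come from the model's tokenizer."""
    return max(1, len(text) // CHARS_PER_TOKEN)


class Conversation:
    """Client-side history, because the API keeps no state between calls."""

    def __init__(self, system_prompt, max_output_tokens=1024):
        self.system_prompt = system_prompt
        self.max_output_tokens = max_output_tokens
        self.messages = []

    def add(self, role, text):
        self.messages.append({"role": role, "content": text})

    def _input_tokens(self, messages):
        total = estimate_tokens(self.system_prompt)
        return total + sum(estimate_tokens(m["content"]) for m in messages)

    def build_request(self, temperature=0.0, budget=CONTEXT_WINDOW_TOKENS):
        """Return a request body holding as much recent history as fits."""
        kept = list(self.messages)
        # Input and output share the window, so reserve room for the reply.
        while (len(kept) > 1
               and self._input_tokens(kept) + self.max_output_tokens > budget):
            kept.pop(0)  # drop the oldest turn first
        # Keep the transcript starting on a user turn.
        while len(kept) > 1 and kept[0]["role"] != "user":
            kept.pop(0)
        return {
            "model": "MODEL_NAME_PLACEHOLDER",
            "max_tokens": self.max_output_tokens,
            "temperature": temperature,
            "system": self.system_prompt,
            "messages": kept,
        }


chat = Conversation("You are a concise assistant.", max_output_tokens=50)
chat.add("user", "My name is Ada.")
chat.add("assistant", "Nice to meet you, Ada.")
chat.add("user", "What is my name?")

request = chat.build_request(temperature=0.0)
print(len(request["messages"]))  # the whole history is resent on every call
```

Dropping the oldest turns is the simplest trimming policy. Summarizing older turns or retrieving only the relevant ones are common alternatives when early context matters.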

4. Claude's Architectural Advantages (The "Why Claude?" Checklist)

When designing systems around Claude, architects leverage these specific strengths:

Massive Context Processing: The ability to ingest entire codebases, massive PDFs, or long database dumps in a single prompt.
Nuanced Reasoning: Claude performs exceptionally well on complex, multi-step logic tasks where other models might lose the thread.
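Native Tool Use (Function Calling): Claude is highly tuned to output structured data (like JSON) that matches a schema you provide. Claude does not execute anything itself: it returns a structured request, and your application runs the corresponding action (calling an external API, running a SQL query, or interacting with an environment) and sends the result back to Claude.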
Prompt Caching: A feature that allows developers to cache large system prompts or contextual documents, which can significantly reduce latency and cost for repeated API calls that share the same prompt prefix.
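The tool use and prompt caching items above can be sketched without calling the API. In this example the `get_weather` tool, its fake data, the tool-use ID, and the helper names are all hypothetical, and the model's response is simulated rather than real. The dictionary shapes (`input_schema`, `tool_use`, `tool_result`, `cache_control`) reflect the Messages API as commonly documented; treat them as assumptions to verify against the current reference.

```python
import json

# Tool definition: a name, a description, and a JSON Schema for the input.
# The tool itself is hypothetical.
GET_WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Return the current temperature for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def get_weather(city):
    """Stand-in for a real external API call."""
    fake_data = {"Paris": 18, "Oslo": 7}
    return {"city": city, "temp_c": fake_data.get(city)}


# Maps tool names the model may request to local Python functions.
LOCAL_TOOLS = {"get_weather": get_weather}


def run_tool_call(block):
    """Execute one requested tool call and wrap the result for the next request."""
    handler = LOCAL_TOOLS.get(block["name"])
    if handler is None:
        content, is_error = f"Unknown tool: {block['name']}", True
    else:
        content, is_error = json.dumps(handler(**block["input"])), False
    return {
        "type": "tool_result",
        "tool_use_id": block["id"],
        "content": content,
        "is_error": is_error,
    }


def cached_system_block(text):
    """Mark a large, stable system prompt as cacheable."""
    return {"type": "text", "text": text, "cache_control": {"type": "ephemeral"}}


# Simulated model output; a real block would arrive in an API response.
simulated_block = {
    "type": "tool_use",
    "id": "toolu_example_123",
    "name": "get_weather",
    "input": {"city": "Paris"},
}

result = run_tool_call(simulated_block)
print(result["content"])
```

The key design point is the division of labor: the model only proposes a call, and your code decides whether and how to execute it. That dispatch step is also the natural place for validation, permission checks, and logging.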