L2: GitHub Structure, Commit Standards

GitHub Structure, Commit Standards, and the Message Batches API

In Lesson 9.1, we integrated Claude with Rally to track why work is being done. Now, we must govern how the work is saved and reviewed. Version control is the central nervous system of the SDLC. If autonomous agents are allowed to push code wildly, your repository will quickly devolve into an unreviewable mess of broken branches.

This lesson covers how AI Architects structure GitHub repositories for agentic interaction, enforce strict commit standards, and utilize the Anthropic Message Batches API to process massive codebase operations at enterprise scale.

1. Agentic Repository Structure

Human engineers can navigate chaotic branching strategies through tribal knowledge. Autonomous agents require absolute predictability. If an agent does not know whether to push to main, dev, or a feature branch, it will freeze or hallucinate a workflow.

The Architectural Standard:

You must enforce a strict GitOps branching strategy and codify it in your CLAUDE.md.

Branch Naming Conventions: Instruct Claude to natively tie branches to the Rally/Jira integration. "When creating a new branch, you must use the formatfeature/[TICKET-ID]-[short-description] (e.g., feature/REQ-102-auth-bypass)."
The.github Directory: The agent must understand that the .github/workflows directory is the infrastructure boundary. You explicitly instruct standard Development Agents: "You are forbidden from modifying YAML files in the.github directory unless explicitly authorized by the DevOps Subagent."

2. Enforcing Commit Standards

The most common and dangerous anti-pattern when using agents like Claude Code is the "Mega-Commit." If an agent refactors 20 files and pushes a single commit with the message "Updated files", a human reviewer cannot possibly audit the blast radius of that change.

Architectural Constraints for Version Control:

You must force the agent to write code like a senior open-source contributor.

Atomic Commits: "You must commit your work incrementally. Do not wait until the entire feature is built. Commit the backend controller first, then commit the unit tests, then commit the frontend component."
Conventional Commits: Force Claude to use the industry-standard Conventional Commits format. "Every commit message must follow this schema:type(scope): description. Valid types are feat, fix, docs, style, refactor, test, chore."
The Traceability Hook: "The footer of the commit message MUST contain the Rally/Jira ticket ID:Resolves: #REQ-102."

3. The PR Reviewer Agent (Automated Code Review)

The most immediate ROI in SDLC automation is the PR Reviewer Agent. Instead of waiting 24 hours for a senior engineer to review a Pull Request, you hook Claude directly into GitHub Actions.

The Workflow: When a PR is opened, a GitHub Action triggers a Claude API call, passing the raw git diff as the prompt.
The Prompt: "You are a Senior Staff Engineer. Review this PR diff. Check for security vulnerabilities, violation of theCLAUDE.md engineering standards, and Big-O performance bottlenecks. Output your findings as a JSON array of file paths, line numbers, and actionable review comments."
The Execution: A lightweight Node.js/Python script parses Claude's JSON and uses the GitHub API to post the comments directly onto the specific lines of code in the PR interface, blocking the merge until the issues are resolved.

4. The Message Batches API (Scaling the Architectures)

If your PR Reviewer Agent analyzes one small PR, a standard synchronous API call works perfectly. But what if you need an agent to audit 10,000 legacy Python files for Python 2 to 3 migration? Or process 5,000 daily user feedback logs?

Executing 10,000 synchronous API calls will instantly hit your enterprise rate limits (Tokens Per Minute/Requests Per Minute) and cause catastrophic pipeline timeouts.

The Solution: Anthropic's Message Batches API

The Message Batches API is an asynchronous architecture designed specifically for massive, non-real-time parallel processing. It allows you to send up to 10,000 independent LLM requests in a single file, which Anthropic processes in the background.

Architectural Advantages:

Rate Limit Bypass: Batch requests do not count against your standard real-time API rate limits.
Massive Cost Reduction: Because Anthropic processes these requests during off-peak compute hours (typically completing within 24 hours), the API costs are discounted by 50%.
Resilience: If one request in the batch fails, the other 9,999 continue processing perfectly.

5. Architecting a Batch Processing Workflow

For tasks like massive repository refactoring or bulk log analysis, architects design a specific pipeline:

Data Preparation (Local): Your application scripts gather the 10,000 files or PR diffs and construct a single .jsonl (JSON Lines) file. Every line is a complete, isolated Anthropic API request payload containing the prompt and the code.
Upload & Initiate: Your system uploads the .jsonl file to Anthropic and calls the POST /v1/messages/batches endpoint.
Asynchronous Polling: Anthropic returns a batch_id. Your CI/CD pipeline or chron job polls the API (e.g., every 30 minutes) to check the batch status (in_progress, ended).
Retrieval & Execution: Once complete, your system downloads the result .jsonl file. It iterates through the responses—perhaps extracting 10,000 refactored code blocks—and programmatically opens automated Pull Requests for the human engineering team to review at their leisure.