The Security Tax — Cutting AI Agent Context Burn From 31% to 5% Without Removing a Single Rule
Table of Contents
Two months ago I showed you your AI agent reads your SSH keys. Then your 87 unapproved tools. Then Claude Code's source code leaked. I built 7 layers of defense. They worked. They also nearly killed my flow. Every commit, push, and merge had me pressing Enter to approve. Last week I hit 31% context burn before doing any actual work.
So I rebuilt the architecture.
Not the security posture. The loading mechanics.
The 31% Problem
My setup had become technically correct and operationally painful.
Every git operation needed approval. git add, git commit, git push, branch switches, PR commands. That was annoying but defensible. I wanted manual friction before anything touched history or left my machine.
The bigger problem was startup.
My /os command had turned into a ritual. Read the main rules. Read session discipline. Read active priorities. Read the handoff brief. Read recent sessions. Read the Codex prompt queue. Read enough surrounding docs to avoid stepping on an active branch.
Each individual read made sense when I added it. Together they meant the agent was burning context before the real task started.CLAUDE.md was the worst offender. It had grown to 491 lines and roughly 14,100 tokens. The migration target was closer to 2,500 tokens for a high-signal project file. Mine was 5.6x that.
The /os startup file reads were worse: about 245K tokens across 15 mandatory full reads before the agent did useful work. On a 1M-token model that sounds tolerable. In practice, the useful attention budget was already polluted.
I had built a safety layer that worked by making every session re-read the entire operating manual.
What The Research Said
The fix started with a sentence from Anthropic's "Effective context engineering for AI agents": good context engineering means finding the smallest possible set of high-signal tokens for the desired outcome.
That sounds obvious until you audit your own setup.
The same article argues for "just in time" context. Do not preload the whole corpus. Give the agent lightweight references, paths, and rules for when to retrieve detail. Let it pull the right file when the task actually needs it.
Shrivu Shankar's Claude Code setup gave me a useful comparison point. His professional monorepo burns about 20K tokens, or 10%, in a fresh session. My setup was hitting 31% after /clear.
I also looked at two small open-source patterns. shihchengwei-lab/claude-code-session-kit uses 4 hooks, tested across 92 real sessions, to preserve continuity without hand-rebuilding context. tzu-hsun/recap stores session journals and rotates them at about 2000 lines instead of letting one memory file grow forever.
The lesson was not "delete rules." The lesson was "stop loading rules as prose when they can be loaded as structure."
The 4-Phase Migration
Phase A was the obvious cut: slim CLAUDE.md.
I moved from a 491-line, 14.1K-token file to 146 lines and roughly 2.4K tokens. The root file became a lookup table: what matters, where to find the full procedure, when to load it.
Stable rules moved into 12 skills:
git-flow
read-before-edit
n8n-safety
multi-tenant
supabase-patterns
nextjs-patterns
i18n-patterns
session-discipline
plan-mode-decision
doc-discipline
output-style
vault-sync
The important part: these were not deleted. They moved from always-loaded prose into task-triggered instructions.
If the agent touches n8n workflow JSON, it reads n8n-safety. If it edits a route with tenant scope, it reads multi-tenant and supabase-patterns. If it prepares a PR, it reads git-flow. Same rules, lower baseline.
Phase B replaced startup rereads with a SessionStart hook.
Instead of telling the agent to read 15 files at the start of every session, the hook injects a compact status snapshot: latest migration, tests count, recent commits, open branches, today's handoff brief, top priorities, and current repo state.
That pushed the startup load from roughly 245K tokens to about 5K. More importantly, it made the context current without asking the agent to scrape the whole repo every time.
Phase C extended the hook beyond "state" into "direction."
The hook now surfaces a recommended next action, checks for yesterday's close gaps, and detects Codex-friendly parallel work. This came from a specific pain: I was starting sessions by asking "what should I do next?" even though the answer was already encoded in handoff docs and active priorities.
That is wasted cognition. Machines are good at mechanical summarization. Let the hook do that.
Phase D was the part most people skip: enforce the important failure mode below the agent.
Migration 079 added protect_published_content_ideas, a database trigger that blocks accidental status drift for published blog posts while content_posts still says they are live. That came after a quarantine SQL query pulled live posts out of the public blog index.
This matters because context architecture is not only about speed. If a rule is truly load-bearing, it should become a guardrail in the system, not just a sentence in CLAUDE.md.
Each phase landed as a separate commit and merge:
Phase A: e5c0a38 -> 16e7567
Phase B: 9b6a465 -> 4ddbe6a
Mig 079: 8371a79 -> 7deaddf
That made the migration reversible. If the slim file failed, I could roll back Phase A. If the hook injected bad state, I could disable Phase B. If the DB trigger blocked a legitimate action, the rollback was documented in the migration file.
The Numbers
| Layer | Before | After | Change |
|---|---:|---:|---:|
| CLAUDE.md | ~14.1K tokens | ~2.4K tokens | -83% |
| /os startup reads | ~245K tokens | ~5K tokens | -98% |
| Fresh /clear context burn | ~270K tokens | ~15K tokens | -94% |
| Context window used | 31% | ~1.5% | -29.5 points |
The math is not subtle. I was spending more context on remembering how to work than on the work itself.
The Defense Stayed Intact
This is the important part.
The deny lists from the 87-tools audit stayed. The PreToolUse hooks stayed. Manual approval for write-sensitive operations stayed. Network isolation stayed. The rules around production data, tenant isolation, and credential access stayed.
What changed was when the rules enter context.
Before, the agent loaded the whole defense manual at session start. After the migration, it loads a compact baseline, then retrieves the relevant rule set when the task touches that domain.
That distinction matters. The risks from the first three posts did not disappear. SSH keys are still dangerous. Unapproved MCP tools are still dangerous. Background agent behavior is still dangerous. Late-night database edits are still dangerous.
The fix was not to trust the agent more. It was to stop confusing "safe" with "read every rule every time."
For ConnectEngine OS, this is now the pattern I want across the product: small default context, strong runtime boundaries, and hard checks where a mistake would leak data or break production.
The Compound Engineering Trap
Every rule was right when I added it.
The SSH-key audit needed deny lists. The MCP audit needed tool approval. The n8n incidents needed workflow rules. The tenant-scope bugs needed database patterns. The session handoff mistakes needed close discipline.
But 23 hard rules plus 17 session-discipline rules turned into compound-interest debt. Not because the rules were bad. Because I never reviewed their loading cost.
That is the trap. Rules added after incidents are often correct. Rules that never get re-architected become bottlenecks.
The fix was structural: skills, hooks, and just-in-time retrieval.
Audit your CLAUDE.md token count. If it is over 3,000 tokens, you are paying for it on every turn. Move stable rules into skills under .claude/skills/. Move startup state into a SessionStart hook under .claude/hooks/. Keep CLAUDE.md as a lookup table, not a brain dump.
The defense layer does not have to cost flow. Mine took about 3 hours to migrate, and I am getting that back every session.
Tobias Koehler
Founder, ConnectEngine