Token Trim
Your AI reads your system instruction files at the start of every session. CLAUDE.md, gemini.md, .cursorrules, whatever you call yours. Every byte in those files eats context window before you type a single word.
The formatting in those files is for you. The AI does not need it. Strip the formatting and you get the same behavior with half the tokens.
Real results from this system: 7 files, 139KB down to 65KB. 53% reduction. No information lost. No behavior changes. The AI reads the compressed versions exactly the same way it read the originals.
The problem
System instruction files grow. You add a section for a new workflow. A memory file picks up another entry. A month in, your CLAUDE.md is 14KB and your memory files total 125KB. That is context window you are paying for with every session, every message, every tool call.
Markdown formatting accounts for a surprising amount of that. Bold markers, horizontal rules, backtick-wrapped paths, blank lines between list items, full English sentences where a phrase would do. All of it tokenizes. None of it changes what the AI does.
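To get a feel for the cost, you can estimate tokens from file size. This is a rough sketch only: it assumes the common rule of thumb of roughly 4 characters per token for English text, and real counts will vary by model and tokenizer.

```python
# Rough token estimate for an instruction file.
# Assumes the ~4 characters-per-token heuristic for English
# text; actual tokenizer counts vary by model.
from pathlib import Path

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate token count from character length."""
    return round(len(text) / chars_per_token)

def file_token_estimate(path: str) -> int:
    """Estimate tokens for a file on disk (path is illustrative)."""
    return estimate_tokens(Path(path).read_text(encoding="utf-8"))

# A 139 KB set of instruction files is roughly 35,000 tokens
# loaded before the first user message.
print(estimate_tokens("x" * 139_087))
```

Under that heuristic, a 139KB instruction set costs on the order of 35,000 tokens per session before you say anything.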
The method
Five rules. Apply them mechanically.
1. Strip markdown formatting. Remove bold markers (**), horizontal rules (---), backtick wrapping around file paths and code terms. The AI knows what a file path is without backticks telling it.
2. Remove blank lines. Blank lines between list items, between sections, after headings. Keep one blank line between major sections for your own scanning. Delete the rest.
3. Use shorthand notation. Replace full sentences with compressed phrases. Use equals signs, periods, and parentheticals instead of prose.
4. Collapse lists into inline text. A bulleted list of three items becomes one line with periods or commas separating them.
5. Remove redundant context. If a fact appears in two places, keep one. If a sentence restates what the heading already says, delete it.
Before and after
This is the Session Initialization section from a real CLAUDE.md file. The original first, then the compressed version.
Before (410 bytes)
## Session Initialization
**COLD START -- minimum reads only. Load nothing
else until the task requires it.**
1. Read `C:\Users\tgaum\claude\assistant\state.md`
(if it exists). Extract carry-forwards only.
2. Read `C:\Users\tgaum\claude\projects\
time-sensitive.md` (8-line table, ~100 tokens).
3. Call `memory_session_start` with session
context (surfaces relevant memories from
MCP server).
4. Generate greeting. STOP. Do not read
PROJECT_REGISTRY.md, workflow-candidates.md,
or Calendar until Thomas requests them or
they are directly needed by the task.
After (270 bytes)
## Session Initialization
COLD START minimum reads only. load nothing
else until task requires it.
1. Read state.md(if exists) extract
carry-forwards only
2. Read time-sensitive.md(8-line table
~100 tokens)
3. Call memory_session_start with session
context
4. Generate greeting. STOP. do not read
PROJECT_REGISTRY.md workflow-candidates.md
or Calendar until thomas requests or
task needs
Same information. Same behavior. 34% smaller in this section alone (410 bytes down to 270). The savings compound across hundreds of lines.
Memory file example
Memory and topic files compress even harder because they tend to be verbose. This is from a workflow notes file.
Before
## Delegation Policy
- Agent dispatch decision gate (Thomas's rule,
2026-03-07): 1) Can an agent do it?
2) Will deploying the agent cost more tokens
than doing it directly? If #2 is yes, just
do it. No agent overhead for low-bandwidth
operations.
- Model routing for cost AND speed: Structured
extraction -> Haiku (seconds).
Judgment/synthesis -> Sonnet (moderate).
Complex editorial -> Opus (slowest). Human
wait time matters as much as token cost.
After
delegation: 1)can agent do it 2)will
agent cost more tokens than direct?
if #2 yes just do it. model routing:
structured extraction=haiku(seconds)
judgment/synthesis=sonnet(moderate)
complex editorial=opus(slowest) human
wait time matters as much as token
cost (2026-03-07)
The original file: 19,487 bytes. After compression: 11,211 bytes. 42% reduction. The AI reads both versions identically.
Full results
| File | Before | After | Reduction |
|---|---|---|---|
| CLAUDE.md | 13,860 B | 10,415 B | 25% |
| workflow.md | 19,487 B | 11,211 B | 42% |
| browser-automation.md | 18,573 B | 9,706 B | 48% |
| errors-and-solutions.md | 46,571 B | 11,885 B | 74% |
| editorial-judgment.md | 29,439 B | 14,567 B | 51% |
| publishing-pipeline.md | 6,513 B | 4,121 B | 37% |
| analytics.md | 4,644 B | 3,088 B | 34% |
| Total | 139,087 B | 64,993 B | 53% |
The biggest wins come from files with the most prose. errors-and-solutions.md dropped 74% because it was full of narrative explanations that compressed into single-line entries.
What not to compress
Files that humans read regularly should stay readable. README files, documentation, anything you share with other people. This technique is for files the AI reads and humans rarely touch.
Also: do not compress code. Code formatting carries semantic meaning. Indentation, line breaks, and whitespace in code are functional, not decorative.
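If you script any of the compression, guard against touching code. This sketch passes fenced blocks through untouched while compressing prose; it assumes your files mark code with triple-backtick fences, and the example transform is illustrative.

```python
def compress_outside_code(text: str, compress) -> str:
    """Apply a line-level compression function only to prose,
    passing fenced code blocks (```...```) through verbatim."""
    out, in_code = [], False
    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            in_code = not in_code
            out.append(line)            # keep the fence itself
        elif in_code:
            out.append(line)            # code: keep verbatim
        else:
            compressed = compress(line)
            if compressed is not None:  # None means "drop line"
                out.append(compressed)
    return "\n".join(out)

# Example transform: drop blank lines, strip bold markers.
drop_blanks = lambda ln: None if not ln.strip() else ln.replace("**", "")
doc = "**Note**\n\n```python\n\nx = 1  # blank line above is kept\n```\n"
print(compress_outside_code(doc, drop_blanks))
```

The blank line inside the fence survives; the one in the prose does not.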
Platform compatibility
This works anywhere an AI loads instructions from a file.
- Claude Code: CLAUDE.md and memory files in .claude/
- Gemini CLI: GEMINI.md, loaded hierarchically
- Cursor: .cursorrules and .cursor/ files
- Windsurf: .windsurfrules
- Any custom system: If your prompt loads from a file, compress the file
The AI does not care about your formatting. It parses meaning from text. Give it less text with the same meaning and it performs the same way with more room to work.
How to do it
You can ask the AI to compress its own instruction files. This prompt works:
Read [filename]. Compress it for token efficiency.
Rules: strip markdown formatting (bold, backticks,
horizontal rules). Remove blank lines between items.
Use shorthand notation (equals signs, parentheticals,
commas instead of full sentences). Collapse bulleted
lists into inline text. Remove redundant context.
Preserve all semantic meaning. Output the compressed
version.
Review the output. Diff it against the original. Confirm nothing was lost. Save the original as a backup (filename-original.md). Replace the active file with the compressed version.
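The backup-and-diff step can be scripted. This is a minimal sketch using Python's standard library; the filenames are illustrative, and it deliberately stops short of replacing the active file so you review the diff first.

```python
import difflib
import shutil
from pathlib import Path

def backup_and_diff(original: str, compressed: str) -> str:
    """Save the original as filename-original.md and return a
    unified diff of the two versions for review. After confirming
    nothing was lost, replace the active file yourself, e.g.
    shutil.move(compressed, original)."""
    orig, comp = Path(original), Path(compressed)
    backup = orig.with_name(orig.stem + "-original" + orig.suffix)
    shutil.copy2(orig, backup)          # keep the original safe
    return "".join(difflib.unified_diff(
        orig.read_text(encoding="utf-8").splitlines(keepends=True),
        comp.read_text(encoding="utf-8").splitlines(keepends=True),
        fromfile=str(orig), tofile=str(comp),
    ))
```

Read the diff line by line: every removed line should be formatting or redundancy, never a fact or an instruction.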
That is the entire process. No tooling required. The AI compresses its own files, and you verify the result.