Token Trim
Your AI reads your system instruction files at the start of every session. CLAUDE.md, gemini.md, .cursorrules, whatever you call yours. Every byte in those files eats context window before you type a single word.
The formatting in those files is for you. The AI does not need it. Strip the formatting and you get the same behavior with half the tokens.
Real results from this system: 7 files, 139KB down to 65KB. 53% reduction. No information lost. No behavior changes. The AI reads the compressed versions exactly the same way it read the originals.
The problem
System instruction files grow. You add a section for a new workflow. A memory file picks up another entry. A month in, your CLAUDE.md is 14KB and your memory files total 125KB. That is context window you are paying for with every session, every message, every tool call.
Markdown formatting accounts for a surprising amount of that. Bold markers, horizontal rules, backtick-wrapped paths, blank lines between list items, full English sentences where a phrase would do. All of it tokenizes. None of it changes what the AI does.
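To get a feel for the cost, you can estimate tokens from file size. This is a rough sketch only: it assumes the common rule of thumb of roughly 4 characters per token for English text, and real counts will vary by model and tokenizer.

```python
# Rough token estimate for an instruction file.
# Assumes the ~4 characters-per-token heuristic for English
# text; actual tokenizer counts vary by model.
from pathlib import Path

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate token count from character length."""
    return round(len(text) / chars_per_token)

def file_token_estimate(path: str) -> int:
    """Estimate tokens for a file on disk (path is illustrative)."""
    return estimate_tokens(Path(path).read_text(encoding="utf-8"))

# A 139 KB set of instruction files is roughly 35,000 tokens
# loaded before the first user message.
print(estimate_tokens("x" * 139_087))
```

Under that heuristic, a 139KB instruction set costs on the order of 35,000 tokens per session before you say anything.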
The method
Five rules. Apply them mechanically.
1. Strip markdown formatting. Remove bold markers (**), horizontal rules (---), backtick wrapping around file paths and code terms. The AI knows what a file path is without backticks telling it.
2. Remove blank lines. Blank lines between list items, between sections, after headings. Keep one blank line between major sections for your own scanning. Delete the rest.
3. Use shorthand notation. Replace full sentences with compressed phrases. Use equals signs, periods, and parentheticals instead of prose.
4. Collapse lists into inline text. A bulleted list of three items becomes one line with periods or commas separating them.
5. Remove redundant context. If a fact appears in two places, keep one. If a sentence restates what the heading already says, delete it.
Before and after
This is the Session Initialization section from a real CLAUDE.md file. The original first, then the compressed version.
Before (410 bytes)
## Session Initialization
**COLD START -- minimum reads only. Load nothing
else until the task requires it.**
1. Read `C:\Users\tgaum\claude\assistant\state.md`
(if it exists). Extract carry-forwards only.
2. Read `C:\Users\tgaum\claude\projects\
time-sensitive.md` (8-line table, ~100 tokens).
3. Call `memory_session_start` with session
context (surfaces relevant memories from
MCP server).
4. Generate greeting. STOP. Do not read
PROJECT_REGISTRY.md, workflow-candidates.md,
or Calendar until Thomas requests them or
they are directly needed by the task.
After (270 bytes)
## Session Initialization
COLD START minimum reads only. load nothing
else until task requires it.
1. Read state.md(if exists) extract
carry-forwards only
2. Read time-sensitive.md(8-line table
~100 tokens)
3. Call memory_session_start with session
context
4. Generate greeting. STOP. do not read
PROJECT_REGISTRY.md workflow-candidates.md
or Calendar until thomas requests or
task needs
Same information. Same behavior. 34% smaller in this section alone (410 bytes down to 270). The savings compound across hundreds of lines.
Memory file example
Memory and topic files compress even harder because they tend to be verbose. This is from a workflow notes file.
Before
## Delegation Policy
- Agent dispatch decision gate (Thomas's rule,
2026-03-07): 1) Can an agent do it?
2) Will deploying the agent cost more tokens
than doing it directly? If #2 is yes, just
do it. No agent overhead for low-bandwidth
operations.
- Model routing for cost AND speed: Structured
extraction -> Haiku (seconds).
Judgment/synthesis -> Sonnet (moderate).
Complex editorial -> Opus (slowest). Human
wait time matters as much as token cost.
After
delegation: 1)can agent do it 2)will
agent cost more tokens than direct?
if #2 yes just do it. model routing:
structured extraction=haiku(seconds)
judgment/synthesis=sonnet(moderate)
complex editorial=opus(slowest) human
wait time matters as much as token
cost (2026-03-07)
The original file: 19,487 bytes. After compression: 11,211 bytes. 42% reduction. The AI reads both versions identically.
Full results
| File | Before | After | Reduction |
|---|---|---|---|
| CLAUDE.md | 13,860 B | 10,415 B | 25% |
| workflow.md | 19,487 B | 11,211 B | 42% |
| browser-automation.md | 18,573 B | 9,706 B | 48% |
| errors-and-solutions.md | 46,571 B | 11,885 B | 74% |
| editorial-judgment.md | 29,439 B | 14,567 B | 51% |
| publishing-pipeline.md | 6,513 B | 4,121 B | 37% |
| analytics.md | 4,644 B | 3,088 B | 34% |
| Total | 139,087 B | 64,993 B | 53% |
The biggest wins come from files with the most prose. errors-and-solutions.md dropped 74% because it was full of narrative explanations that compressed into single-line entries.
What not to compress
Files that humans read regularly should stay readable. README files, documentation, anything you share with other people. This technique is for files the AI reads and humans rarely touch.
Also: do not compress code. Code formatting carries semantic meaning. Indentation, line breaks, and whitespace in code are functional, not decorative.
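If you script any of the compression, guard against touching code. This sketch passes fenced blocks through untouched while compressing prose; it assumes your files mark code with triple-backtick fences, and the example transform is illustrative.

```python
def compress_outside_code(text: str, compress) -> str:
    """Apply a line-level compression function only to prose,
    passing fenced code blocks (```...```) through verbatim."""
    out, in_code = [], False
    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            in_code = not in_code
            out.append(line)            # keep the fence itself
        elif in_code:
            out.append(line)            # code: keep verbatim
        else:
            compressed = compress(line)
            if compressed is not None:  # None means "drop line"
                out.append(compressed)
    return "\n".join(out)

# Example transform: drop blank lines, strip bold markers.
drop_blanks = lambda ln: None if not ln.strip() else ln.replace("**", "")
doc = "**Note**\n\n```python\n\nx = 1  # blank line above is kept\n```\n"
print(compress_outside_code(doc, drop_blanks))
```

The blank line inside the fence survives; the one in the prose does not.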
Platform compatibility
This works anywhere an AI loads instructions from a file.
- Claude Code: CLAUDE.md and memory files in .claude/
- Gemini CLI: GEMINI.md, loaded hierarchically
- Cursor: .cursorrules and .cursor/ files
- Windsurf: .windsurfrules
- Any custom system: If your prompt loads from a file, compress the file
The AI does not care about your formatting. It parses meaning from text. Give it less text with the same meaning and it performs the same way with more room to work.
How to do it
You can ask the AI to compress its own instruction files. This prompt works:
Read [filename]. Compress it for token efficiency.
Rules: strip markdown formatting (bold, backticks,
horizontal rules). Remove blank lines between items.
Use shorthand notation (equals signs, parentheticals,
commas instead of full sentences). Collapse bulleted
lists into inline text. Remove redundant context.
Preserve all semantic meaning. Output the compressed
version.
Review the output. Diff it against the original. Confirm nothing was lost. Save the original as a backup (filename-original.md). Replace the active file with the compressed version.
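The backup-and-diff step can be scripted. This is a minimal sketch using Python's standard library; the filenames are illustrative, and it deliberately stops short of replacing the active file so you review the diff first.

```python
import difflib
import shutil
from pathlib import Path

def backup_and_diff(original: str, compressed: str) -> str:
    """Save the original as filename-original.md and return a
    unified diff of the two versions for review. After confirming
    nothing was lost, replace the active file yourself, e.g.
    shutil.move(compressed, original)."""
    orig, comp = Path(original), Path(compressed)
    backup = orig.with_name(orig.stem + "-original" + orig.suffix)
    shutil.copy2(orig, backup)          # keep the original safe
    return "".join(difflib.unified_diff(
        orig.read_text(encoding="utf-8").splitlines(keepends=True),
        comp.read_text(encoding="utf-8").splitlines(keepends=True),
        fromfile=str(orig), tofile=str(comp),
    ))
```

Read the diff line by line: every removed line should be formatting or redundancy, never a fact or an instruction.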
That is the entire process. No tooling required. The AI compresses its own files, and you verify the result.