Token Reduction Strategies

Advanced strategies to reduce token consumption: @ commands, context management, and RTK techniques.

Learning objectives

  • Use @ commands to target context precisely
  • Apply context reduction techniques
  • Monitor and optimize token costs

Why it matters

Every token Claude reads costs money, consumes rate-limit quota, and takes time.

Cost. Tokens are billed on every message, not just the text you type. Claude receives the full conversation history on each turn, so a session that starts at 500 tokens can balloon past 8,000 by turn ten. On a busy team running dozens of sessions per day, a bloated context window becomes a real line item.
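
As a rough illustration of that growth, the sketch below models a session whose first turn sends 500 input tokens and whose history grows by a fixed amount each turn. The 850-token growth per turn is an assumption chosen to match the figures in the text, not a measured value, and the model ignores any caching discounts.

```python
def input_tokens_per_turn(first_turn: int, growth_per_turn: int, turns: int) -> list[int]:
    """Input tokens sent on each turn when the full history is resent every time."""
    return [first_turn + growth_per_turn * turn for turn in range(turns)]


# Assumed numbers: 500 tokens on turn one, about 850 new tokens of history per turn.
per_turn = input_tokens_per_turn(first_turn=500, growth_per_turn=850, turns=10)

print(per_turn[-1])   # input tokens sent on turn ten
print(sum(per_turn))  # input tokens billed across all ten turns
```

Under these assumptions the tenth turn alone sends 8,150 input tokens, and the session as a whole is billed for 43,250, because earlier turns are paid for again each time they are resent.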

Rate limits. The Claude API enforces tokens-per-minute limits. A context that consumes 6,000 tokens per turn leaves far less headroom than one consuming 600. When you hit a rate limit mid-task, the session stalls. Tighter contexts give you more turns before that happens.
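
The headroom difference can be made concrete with a small calculation. The 30,000 tokens-per-minute limit below is a hypothetical figure for illustration; actual limits depend on your account and model.

```python
def turns_per_minute(tokens_per_minute_limit: int, tokens_per_turn: int) -> int:
    """Whole turns that fit inside one minute's token budget."""
    return tokens_per_minute_limit // tokens_per_turn


# Hypothetical limit for illustration only; check your own account's limits.
LIMIT = 30_000

print(turns_per_minute(LIMIT, 6_000))  # bloated context
print(turns_per_minute(LIMIT, 600))    # focused context
```

With that assumed limit, the 6,000-token context allows 5 turns per minute and the 600-token context allows 50.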

Quality and speed. This one surprises people. A larger context does not produce better answers; it often produces worse ones. Claude has to separate signal from noise across everything you have included, so an irrelevant file, a stale conversation turn, or a 2,000-word CLAUDE.md all compete for the model’s attention. Shorter, focused context tends to produce more accurate and faster responses. Less really is more.

/cost

The /cost command prints the token usage and estimated cost for the current session. Run it after any substantial task to build intuition for what different kinds of work actually cost.

> /cost
Session cost: $0.031
  Input tokens:  12,840  ($0.025)
  Output tokens:  1,204  ($0.006)

Total since last /clear

The breakdown separates input from output tokens. Input tokens are almost always the larger number — and the one most worth optimizing, because they include everything Claude has read this session.

The counter resets when you run /clear. If you are comparing the cost of two approaches, run /clear, do the work, then run /cost for a clean reading.
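
If you record /cost readings to compare approaches, a small parser can help. The sketch below assumes the output looks like the sample shown above; the real format may differ between Claude Code versions, and parse_cost is a hypothetical helper, not part of any tool.

```python
import re

# Sample copied from the /cost output shown in this section.
SAMPLE = """\
Session cost: $0.031
  Input tokens:  12,840  ($0.025)
  Output tokens:  1,204  ($0.006)
"""


def parse_cost(report: str) -> dict[str, int]:
    """Extract input and output token counts from a /cost-style report."""
    counts = {}
    for kind in ("Input", "Output"):
        match = re.search(rf"{kind} tokens:\s+([\d,]+)", report)
        if match is None:
            raise ValueError(f"no '{kind} tokens' line found")
        counts[kind.lower()] = int(match.group(1).replace(",", ""))
    return counts


counts = parse_cost(SAMPLE)
input_share = counts["input"] / (counts["input"] + counts["output"])
print(counts, f"input share: {input_share:.0%}")
```

For the sample session, input tokens make up about 91% of all tokens, which is why they are the first place to look for savings.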

Context management

Several techniques let you control what Claude can see before it starts reasoning.

The @ prefix

The @ prefix lets you include a specific file or directory in your prompt rather than letting Claude decide what to read. Compare these two prompts:

# Vague — Claude may read many files to orient itself
> claude "fix the bug in the transform module"

# Targeted — Claude reads exactly one file
> claude "@src/pipeline/transform.py the test on line 42 fails — fix it"

The targeted version is not just cheaper. It also produces a more focused fix because Claude is not reasoning about the rest of the codebase.
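
To get a feel for how much a given @ reference might pull in, a rough size estimate is enough. The sketch below assumes about four characters per token, which is a common heuristic and not Claude's actual tokenizer; estimate_tokens is a hypothetical helper.

```python
from pathlib import Path


def estimate_tokens(path: Path, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for a file, or for every file under a directory."""
    files = [path] if path.is_file() else [p for p in path.rglob("*") if p.is_file()]
    total_chars = sum(len(f.read_text(errors="ignore")) for f in files)
    return round(total_chars / chars_per_token)


if __name__ == "__main__":
    # Paths taken from the examples in this section; adjust to your repository.
    for target in (Path("src/pipeline/transform.py"), Path("src/pipeline")):
        if target.exists():
            print(target, estimate_tokens(target))
```

Comparing the estimate for a single file with the estimate for its parent directory shows how much a vaguer reference could cost.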

Claude’s file exploration is scoped to the working directory. A narrower working directory means fewer files in scope by default. Inside an interactive session, use ! to run a shell command inline and scope the session to a subdirectory:

> ! cd src/features/
> the normalization function is producing NaN for single-row inputs — debug it

The ! prefix runs the command in the current shell without leaving the Claude Code session. After ! cd, Claude’s file exploration and relative path resolution are both anchored to that subdirectory for the rest of the session.

/clear

/clear wipes the entire conversation history and resets the session cost counter. Use it whenever you are starting a genuinely different task. The context from the previous task adds noise and cost to the next one — there is no benefit to carrying it over.

The habit to build: treat /clear the way you would treat opening a new terminal tab. Each distinct problem gets a fresh start.

/compact

/compact compresses the conversation history into a summary and continues from there. It is less aggressive than /clear — you keep a condensed record of what was decided and done, but the raw back-and-forth is replaced with a summary.

Use /compact when you are mid-task and the context is growing large but you do not want to lose the thread. A good rule of thumb: if /cost shows your session above $0.05 and you still have significant work ahead, run /compact to reclaim headroom.
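
The rules of thumb for /clear and /compact can be written down as a small decision function. This is only a sketch of the guidance in this section; next_command and its parameters are hypothetical, and the $0.05 threshold is the rule of thumb above, not a hard limit.

```python
def next_command(session_cost_usd: float, same_task: bool, threshold_usd: float = 0.05) -> str:
    """Suggest a context command using the rules of thumb from this section."""
    if not same_task:
        return "/clear"      # new task: previous context is only noise
    if session_cost_usd > threshold_usd:
        return "/compact"    # same task, but the context has grown expensive
    return "continue"        # same task and still cheap: change nothing


print(next_command(0.031, same_task=True))
print(next_command(0.08, same_task=True))
print(next_command(0.08, same_task=False))
```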

Checking CLAUDE.md size

CLAUDE.md is loaded at the start of every session. A 2,000-token CLAUDE.md costs roughly $0.006 per session at Sonnet pricing — small in isolation, but across a team running 50 sessions a day, that is $0.30 daily, or around $110 per year, just from loading instructions that may not be relevant to most tasks.
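
The arithmetic behind those figures is shown below. The $3 per million input tokens rate is the one implied by the numbers in the text; treat it as an assumption and check current pricing before relying on it.

```python
CLAUDE_MD_TOKENS = 2_000
PRICE_PER_MILLION_INPUT_USD = 3.00  # assumed Sonnet input rate; verify against current pricing
SESSIONS_PER_DAY = 50

per_session = CLAUDE_MD_TOKENS * PRICE_PER_MILLION_INPUT_USD / 1_000_000
per_day = per_session * SESSIONS_PER_DAY
per_year = per_day * 365

print(f"per session: ${per_session:.3f}")
print(f"per day:     ${per_day:.2f}")
print(f"per year:    ${per_year:.2f}")
```

Changing CLAUDE_MD_TOKENS or SESSIONS_PER_DAY to match your own team gives a quick estimate of what trimming the file is worth.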

The fix is to keep CLAUDE.md lean and move specialized instructions into skills. Detailed code review criteria, deployment checklists, or dataset documentation belong in purpose-specific skills — not in CLAUDE.md. Claude only loads a skill when you invoke it, so the cost is incurred only when the instruction is actually needed.

A practical split:

Belongs in CLAUDE.md                 | Belongs in a skill
Project name and language            | Detailed code review criteria
Key commands (install, test, lint)   | Deployment runbook
Directory structure (brief)          | Dataset schema documentation
Coding conventions (2–3 lines)       | PR checklist

Tip

Run wc -w CLAUDE.md to check your word count. Aim to stay under 500 words. If you are over that, look for blocks of text that are only relevant to specific tasks — those are candidates to extract into skills.
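
The same check can be scripted, for example as part of a pre-commit step. The sketch below counts whitespace-separated words the way wc -w does and compares the result with the 500-word target; the helper names are hypothetical.

```python
from pathlib import Path

WORD_BUDGET = 500  # target from the tip above


def word_count(text: str) -> int:
    """Count whitespace-separated words, as `wc -w` does."""
    return len(text.split())


def over_budget(text: str, budget: int = WORD_BUDGET) -> bool:
    """True when the text has more words than the budget allows."""
    return word_count(text) > budget


if __name__ == "__main__":
    path = Path("CLAUDE.md")
    if path.exists():
        words = word_count(path.read_text())
        status = "over" if words > WORD_BUDGET else "within"
        print(f"{words} words, {status} the {WORD_BUDGET}-word budget")
```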

Semantic context retrieval via MCP

For very large codebases where even a targeted @src/features/ pulls in several thousand tokens, the next step is semantic retrieval: indexing the codebase and fetching only the chunks relevant to the current query, rather than including whole files.

This is achievable through the MCP ecosystem. Tools that expose a search or retrieval interface as an MCP server can be registered with Claude Code (for example with claude mcp add, or in a project-level .mcp.json file), letting Claude query for relevant context before responding.

The pattern is worth knowing even if you do not need it immediately. For most codebases under a few thousand lines, @file targeting and /compact cover the majority of cases. Semantic retrieval becomes relevant when you regularly hit context limits or work across a repo too large to navigate by file reference alone.
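
To show the shape of the idea, here is a toy retriever that ranks code chunks by how many words they share with the query. Word overlap is a deliberately simple stand-in for the embedding similarity a real retrieval server would likely use, and the chunk names and contents are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased identifier-like words in a piece of text."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def top_chunks(query: str, chunks: dict[str, str], k: int = 2) -> list[str]:
    """Return the names of up to k chunks sharing the most words with the query."""
    query_words = tokenize(query)
    scores = {name: len(query_words & tokenize(body)) for name, body in chunks.items()}
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return [name for name in ranked[:k] if scores[name] > 0]


# Hypothetical chunks standing in for an indexed codebase.
CHUNKS = {
    "features/normalize.py::normalize": "def normalize(rows): return (rows - mean) / std  # NaN for single row",
    "pipeline/transform.py::transform": "def transform(batch): apply schema and cast types",
    "io/loader.py::load_csv": "def load_csv(path): read rows from disk",
}

print(top_chunks("normalize returns NaN for single row inputs", CHUNKS, k=1))
```

Only the matching chunk would be sent to the model, rather than every file under the directory.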

Third-party tools

The built-in Claude Code commands handle most situations, but dedicated tools go further by compressing what Claude reads before it even enters the context window.

RTK

RTK is a CLI proxy that intercepts tool outputs — git status, pytest, npm install, container logs, linter output — and compresses them before Claude sees the result. It installs a PreToolUse hook into Claude Code that rewrites commands transparently: Claude calls git status, RTK intercepts, compresses the output, and returns a compact summary. The model never sees the rewrite.

RTK token compression flow
direction: right

without: Without RTK {
  direction: right
  claude: Claude
  shell: Shell
  git: git

  claude -> shell: git status
  shell -> git
  git -> claude: ~2,000 tokens (raw)
}

with: With RTK {
  direction: right
  claude: Claude
  rtk: RTK
  git: git

  claude -> rtk: git status
  rtk -> git
  git -> rtk: raw output
  rtk -> claude: ~200 tokens (filtered)
}

Install RTK and wire it into Claude Code globally:

$ cargo install rtk
$ rtk init -g

The -g flag writes the hook into your global Claude Code settings so it applies to every project.

To see how much RTK has saved in a session:

$ rtk gain
Commands intercepted: 47
Tokens before: 18,420
Tokens after:   3,104
Reduction:        83%

RTK ships with compression rules for 100+ commands. For a large repository with frequent git and test runner calls, the savings are substantial.
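
The general idea of output compression can be sketched in a few lines. The filter below is an illustration only and does not reproduce RTK's actual rules: it keeps failure lines and the final summary from a pytest-style log, and the reduction helper reproduces the percentage from the rtk gain output above.

```python
def compress_test_log(raw: str) -> str:
    """Keep failure lines and the final summary; drop passing tests and blanks."""
    lines = [line for line in raw.splitlines() if line.strip()]
    kept = [line for line in lines if "FAILED" in line or "ERROR" in line]
    if lines and lines[-1] not in kept:
        kept.append(lines[-1])  # the last line is assumed to be the summary
    return "\n".join(kept)


def reduction(before: int, after: int) -> float:
    """Fraction of tokens removed."""
    return 1 - after / before


# Made-up log for illustration.
RAW = """\
tests/test_a.py::test_one PASSED
tests/test_a.py::test_two PASSED
tests/test_b.py::test_three FAILED

1 failed, 2 passed in 0.12s
"""

print(compress_test_log(RAW))
print(f"{reduction(18_420, 3_104):.0%}")  # figures from the rtk gain output above
```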

Note

RTK complements the built-in strategies above; it does not replace them. Start with /compact, @file targeting, and a lean CLAUDE.md. Add RTK once you have measured that the built-in strategies alone are not enough.