Tokens, Context and Pricing

Understand what tokens are, how the context window works, and how to keep your Claude Code usage costs under control.

Learning objectives

  • Understand what a token is and how to count them
  • Understand how the context window works
  • Learn strategies to optimize usage costs

What is a token?

Every character you type and every word Claude returns passes through the same unit of measurement: the token. A token is roughly 3–4 characters of English text, which works out to about three quarters of a word. Code tends to be more token-dense than prose because identifiers, punctuation, and indentation each consume tokens.

Some concrete examples give a better feel for the scale:

  • "python" → 1 token
  • "def calculate_churn_rate(df):" → approximately 8 tokens
  • A 200-line Python file → roughly 1,500–2,000 tokens

To get a feel for the density difference: a typical paragraph of English prose lands around 100–150 tokens, while a visually similar block of Python code will often run 300–500 tokens because of identifiers, symbols, brackets, and whitespace.
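A rough character-based estimate is often good enough for back-of-the-envelope budgeting. The sketch below is a heuristic, not a real tokenizer: it assumes roughly 4 characters per token for prose and a denser ratio for code, and both ratios are illustrative assumptions rather than measured values.

```python
# Heuristic token estimator -- NOT a real tokenizer, just a rough guide.
# Assumed ratios: ~4 chars/token for prose, ~3 chars/token for code.

def estimate_tokens(text: str, is_code: bool = False) -> int:
    """Estimate token count from character length."""
    chars_per_token = 3.0 if is_code else 4.0  # code is more token-dense
    return max(1, round(len(text) / chars_per_token))

prose = "A typical paragraph of English prose about churn analysis."
snippet = "def calculate_churn_rate(df):"

print(estimate_tokens(prose))          # prose estimate
print(estimate_tokens(snippet, True))  # code estimate
```

For real counts, use /cost or a proper tokenizer; the heuristic only tells you whether something is hundreds or thousands of tokens.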

Note

The /cost command shows token usage for the current session. Run it after any large task to build intuition for what different types of work actually cost.

What is context?

The context window is the total amount of text Claude can see at any given moment. It is not just your latest message. It holds the entire active session: your CLAUDE.md, the conversation so far, any files you loaded with @, and every tool output Claude generated along the way.

All of that content competes for the same fixed space. When the window fills, Claude has to compress or drop older material. You may notice responses becoming less precise about early conversation details — that is the window filling up.

What fills the context window

  • CLAUDE.md
  • Conversation history
  • @ file references
  • Tool outputs

All four streams feed into a single context window, and that window is what the model actually sees on each request.

Each turn in a conversation re-sends the full accumulated context to the model. A session with a large CLAUDE.md, several @ file references, and twenty back-and-forth messages is sending a lot of tokens on every single request — not just the first one.
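The accumulation described above compounds quickly. The sketch below models it with made-up numbers (the CLAUDE.md size, file sizes, and per-turn growth are all hypothetical) to show how input tokens scale over a session, assuming the full context is re-sent on every request.

```python
# Sketch of input-token accumulation across turns, assuming the full
# context is re-sent on every request. All token counts are hypothetical.

CLAUDE_MD = 2_000   # hypothetical CLAUDE.md size in tokens
FILES = 5_000       # hypothetical @-loaded file references
PER_TURN = 800      # hypothetical tokens added per exchange

def input_tokens_for_turn(turn: int) -> int:
    """Tokens sent on a given turn: fixed context plus all prior exchanges."""
    return CLAUDE_MD + FILES + PER_TURN * turn

total = sum(input_tokens_for_turn(t) for t in range(1, 21))
print(total)  # cumulative input tokens over a 20-turn session
```

Even with modest per-turn growth, the fixed context (CLAUDE.md plus files) is billed twenty times over, which is why trimming it pays off more than trimming any single message.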

How is pricing computed?

Claude charges separately for input tokens (everything sent to the model) and output tokens (everything Claude writes back). Input is cheaper than output. For practical work, input costs dominate because the context you send grows with every turn while responses stay relatively short.
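The billing model above reduces to a simple formula. The sketch below uses placeholder prices, not Anthropic's actual rates, to show the shape of the calculation: input and output tokens are metered separately, each at a per-million-token rate.

```python
# Minimal per-request cost calculation. The prices are placeholders
# for illustration, not Anthropic's actual rates.

INPUT_PRICE_PER_MTOK = 3.00    # hypothetical $ per million input tokens
OUTPUT_PRICE_PER_MTOK = 15.00  # hypothetical $ per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000

# A late-session request: large accumulated context, short response.
print(round(request_cost(50_000, 1_500), 4))
```

Note how the 50,000 input tokens outweigh the 1,500 output tokens in the total even though output is priced higher per token; this is the sense in which input costs dominate practical sessions.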

Claude Code uses Sonnet as its default model. Haiku is faster and significantly cheaper, suited for lighter tasks. Opus is the most capable and most expensive, warranted for genuinely complex reasoning work. Approximate relative cost: Haiku is around 5x cheaper per token than Sonnet, and Sonnet around 5x cheaper than Opus.

Prices change. What doesn’t: a large CLAUDE.md loaded at the start of every session accrues cost on every request, not just the first one. A 2,000-token CLAUDE.md sent across 30 turns costs the same in input tokens as sending a 60,000-token document once.
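The arithmetic behind that equivalence is worth making explicit:

```python
# Checking the claim: a 2,000-token CLAUDE.md re-sent on each of 30 turns
# consumes the same input tokens as one 60,000-token document sent once.

claude_md_tokens = 2_000
turns = 30

repeated_total = claude_md_tokens * turns
print(repeated_total)  # 60000
```

Every token you cut from CLAUDE.md is therefore saved once per turn, for the life of every session.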

Tip

Lesson 10, Token Reduction Strategies, covers concrete techniques for trimming context without losing capability. The strategies there apply directly to the patterns introduced here.