# Claude Code Training — Complete Reference

---

# Part 1 — Claude Code: Zero to Proficient

---

## 1. Prompt Mode and Prefixes

<SlideStart />

## The Three Prompt Modes

Claude Code has three operating modes. Each one defines what Claude is allowed to do during a session.

You start every session in **Default mode**. The other two are opt-in — activated when the situation calls for them.

<SlideEnd />

<SlideStart />

### Default Mode

**Default mode** is the starting point for every session. Claude can read files, write files, run commands, and act on its own judgment.

There is nothing to configure — you are already in Default mode the moment you open Claude Code.

<Callout type="tip">Default mode is appropriate for most tasks. Use the other modes only in specific situations, not as a habit.</Callout>

<SlideEnd />

<SlideStart />

### Plan Mode

**Plan mode** removes the ability to act. Claude can read and reason freely, but it will not edit any file or run any command. It describes what it *would* do instead.

**When to use it:**
- Before a large refactor you have not fully mapped out
- When touching code you do not fully understand
- Any time you want to review an approach before anything changes

**Activate:** `Shift+Tab` twice, or type `/plan` — **Exit:** `Shift+Tab` once, or `/plan` again
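
A typical Plan-mode loop, sketched with an illustrative file name:

```bash
# Toggle Plan mode on, review the approach, then execute
> /plan
> @src/pipeline.py migrate the CSV loading to polars. how would you approach it?
# ... Claude describes the steps without touching anything ...
> /plan
> go ahead with steps 1-3 from your plan
```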

<Callout type="tip">Plan mode costs nothing and can save you from a bad refactor. Run it first on any task that touches unfamiliar files, then switch back once you are satisfied with the plan.</Callout>

<SlideEnd />

<SlideStart />

### Accept Edits Mode

**Accept edits mode** does the opposite of Plan mode. Claude applies file edits automatically, without stopping to ask for confirmation on each change.

**When to use it:**
- After reviewing a plan — you understand what will happen
- When running a batch of changes without interruption

**Activate:** `Shift+Tab` once from Default mode — **Exit:** `Shift+Tab` once to return to Default mode

<Callout type="warning">Only use Accept edits mode after reviewing a plan first. Changes apply immediately — there is no confirmation prompt to catch mistakes.</Callout>

<SlideEnd />

<SlideStart />

## The Input Prefixes

Prefixes are single characters that change how Claude Code interprets your input. `!`, `&`, and `#` go at the start of the message; `@` goes immediately before a file or directory path, anywhere in the message.

| Prefix | Name | What it does |
|--------|------|--------------|
| `@` | File reference | Include a file or directory in context |
| `!` | Shell command | Run a command and pass its output to Claude |
| `&` | Background task | Run a command in the background while Claude continues |
| `#` | Comment | Record a note in the session log without sending it to Claude |

Prefixes work in both interactive and non-interactive sessions. You can reference several files with `@` in a single message; `!`, `&`, and `#` each claim the whole input, so send them as separate messages, in sequence.

<SlideEnd />

<SlideStart />

### @ — File Reference

The `@` prefix includes a specific file or directory in Claude's context for that message. Without it, Claude may read several files to orient itself. With it, you point directly to what is relevant.

```bash
# Include a single file
> @src/model.py explain the forward pass

# Include multiple files at once
> @src/preprocessing.py @tests/test_preprocessing.py the test is failing — why?
```

This reduces token usage and focuses Claude on the relevant code. In a large project with dozens of modules, you will see fewer unnecessary file reads.

<SlideEnd />

<SlideStart />

### ! — Shell Command

The `!` prefix enters bash mode — the input is run directly as a shell command, without Claude interpreting it. The command output is added to the conversation context. You then ask Claude about it in the next message.

```bash
# Step 1: run the command
! pytest tests/test_model.py -v

# Step 2: ask Claude about the output
> what is causing the assertion error in test_predict?
```

```bash
# Step 1: add the git log to context
! git log --oneline -10

# Step 2: ask Claude to reason about it
> which of these commits is most likely to have introduced the regression?
```

This is particularly useful for data scientists: run a training script, a dataset shape check, or a quick diagnostic snippet, then ask Claude about the output without leaving the session.

<SlideEnd />

<SlideStart />

### & — Background Task

The `&` prefix runs a command in the background. Claude Code returns a background task ID immediately and can continue responding to prompts while the command executes.

```bash
# Start a long training run in the background
> & python src/train.py --config configs/default.yaml

# Continue working while it runs
> @src/evaluate.py review the evaluation logic while the training runs
```

Use it for anything slow — training jobs, test suites, data pipelines — where waiting is just wasted time.

<SlideEnd />

<SlideStart />

### # — Comment

The `#` prefix records your input in the session log without sending it to Claude as a prompt. Claude does not respond to it.

Use it to leave a note mid-session without breaking your flow:

```bash
# Trying the sklearn approach first — may switch to custom transformer if it fails
> @src/features/transform.py refactor the imputation step to use ColumnTransformer
```

The comment stays visible in your session history as a reference for yourself, while Claude proceeds as if it wasn't there.

<SlideEnd />

<SlideStart />

### Combining Prefixes

The prefixes complement each other within the same session. Use them in sequence to build up the context Claude needs before asking your question.

```bash
# Add a file to context
> @src/train.py

# Run a command to get its output into context
! python -c "import src.train; print(src.train.__version__)"

# Now ask Claude with full context available
> does the version in the file match what Python reports?
```

By the time you ask the question, Claude has the file and the command output in context.

<SlideEnd />

---

## 2. What is CLAUDE.md?

<SlideStart />

## What is a CLAUDE.md?

CLAUDE.md is a plain Markdown file that Claude Code reads automatically at the start of every session. Place it at the root of your project and it acts as a permanent briefing — loaded before Claude Code touches a single line of code.

The right mental model: it's the message you'd give a new senior engineer on their first day. The tech stack, the exact commands to run, the conventions your team follows, and the rules that aren't obvious from the code. Not a tutorial. Not a README. Operational context.

Without CLAUDE.md, Claude Code starts each session with no knowledge of your project. It may make assumptions that conflict with your setup, ask questions you've answered before, or repeat mistakes you've already corrected. CLAUDE.md eliminates that friction permanently.

Every session, Claude Code loads the file into its context. The content counts toward your token usage — one reason to keep it concise and relevant.

<Callout type="tip">
  CLAUDE.md is read automatically every session. Write it once, benefit from it indefinitely. You will never need to re-explain your project at the start of a conversation.
</Callout>

<SlideEnd />

<SlideStart />

## How to create a CLAUDE.md

There are two starting points depending on where you are with your project.

**New project:** create `CLAUDE.md` at the root manually and fill it in from scratch. Even a ten-line file that lists your stack and your build commands is a meaningful starting point.

**Existing project:** use the `/init` command inside a Claude Code session. Claude Code scans the repository, infers what it can, and generates a starter file for you to review and refine. This is the recommended path — it ensures the file reflects reality rather than intentions.

```bash title="Inside a Claude Code session"
> /init
```

Claude Code will write a first draft of `CLAUDE.md` based on what it finds. Review it, trim anything inaccurate, and add the conventions it could not infer.

<SlideEnd />

<SlideStart />

### How CLAUDE.md files are loaded

There is not just one CLAUDE.md. Claude Code discovers and concatenates multiple files every session — understanding how they load is what lets you use them effectively.

**Discovery — upward traversal.** When you launch Claude Code in a directory, it walks up the directory tree and loads every `CLAUDE.md` and `CLAUDE.local.md` it finds along the way. If you are in `~/projects/churn/src/features/`, Claude loads instructions from that directory, then `src/`, then `churn/`, then `projects/`. You get inherited context automatically.

**Subdirectory files load lazily.** Files in directories *below* your working directory are not loaded at session start. They are loaded on-demand, the first time Claude reads a file in that subdirectory. This keeps the initial context lean on large repos.

**Concatenation, not overriding.** All discovered files are appended together in load order — none replaces another. Within a single directory, `CLAUDE.local.md` is appended after `CLAUDE.md`, so your personal notes land last at that level and take effect when instructions conflict.

**The five sources, in load order:**

| Source | Path | Scope | Committed? |
|--------|------|-------|-----------|
| Managed | set by organization | All machines | N/A |
| Global | `~/.claude/CLAUDE.md` | Your machine | No |
| Project | `./CLAUDE.md` | Current repo | Yes |
| Local | `./CLAUDE.local.md` | Your machine only | No — gitignore it |
| Path rules | `.claude/rules/*.md` | On-demand, by glob | Yes |
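
Since `CLAUDE.local.md` is personal and machine-specific, keep it out of the shared repo. A one-line sketch:

```bash
# Ensure personal notes never get committed
echo "CLAUDE.local.md" >> .gitignore
```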

<DiagramViewer title="CLAUDE.md — load order and traversal">
```d2
direction: right

anc: "Ancestor dirs\n(CLAUDE.md each)"
global: "~/.claude/CLAUDE.md\n(global)"
proj: "./CLAUDE.md\n(project)"
local: "./CLAUDE.local.md\n(local)"
ctx: "Session context\n(concatenated)"
sub: "Subdirectory CLAUDE.md\n(lazy — on demand)"

global -> ctx: "1 loaded at start"
anc -> ctx: "2 loaded at start"
proj -> ctx: "3 loaded at start"
local -> ctx: "4 loaded at start"
sub -> ctx: "5 on first file access"
```
</DiagramViewer>

Most teams start with a single project-level `CLAUDE.md`. Add the others when a real need appears — not speculatively.

<SlideEnd />

<SlideStart />

## Importing files with @

CLAUDE.md files can pull in other files using the `@path/to/file` syntax. The imported content is expanded inline before Claude reads the file — useful for splitting a large CLAUDE.md into focused pieces, or reusing shared rules across projects.

```markdown title="CLAUDE.md"
# CLAUDE.md — Churn Model

See @README.md for project overview.
See @package.json for available scripts.

## Git workflow
@docs/git-conventions.md

## Conventions
- Type hints on all public functions
```

A few rules to know:
- Paths are **relative to the file containing the import**, not the working directory.
- Imports can chain — an imported file can import others, up to **5 hops deep**.
- The first time Claude loads a file from outside the project, it shows an approval dialog listing the files to be imported.
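
As a sketch of the split this enables (the file name and contents are illustrative), the detailed rules live in their own file and `CLAUDE.md` keeps a single `@` line:

```bash
# Create the imported rules file referenced as @docs/git-conventions.md above
mkdir -p docs
cat > docs/git-conventions.md <<'EOF'
- Branch names: feat/<topic> or fix/<topic>
- Conventional commit messages; squash-merge to main
EOF
```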

<Callout type="tip">
  Use `@` imports to keep your project-level `CLAUDE.md` short while keeping detailed rules (deployment steps, dataset conventions) in their own files that only get included when referenced.
</Callout>

<SlideEnd />

<SlideStart />

## Template

Here is a realistic minimal template for a Python ML project. Inner commands are indented to avoid nested code fence issues when the outer block is rendered.

```markdown title="CLAUDE.md"
# CLAUDE.md — Customer Churn Model

## Project
Python 3.11 churn prediction pipeline. Stack: scikit-learn, pandas, FastAPI.
Training data lives in data/raw/ (not committed to git).

## Commands

    pip install -e ".[dev]"          # install with dev dependencies
    pytest tests/ -v                 # run tests
    ruff check src/                  # lint
    python src/train.py --config configs/default.yaml  # train

## Conventions
- Type hints on all public functions
- Docstrings in English, numpy format
- Models versioned under models/v{major}/
- Never hardcode secrets — load from .env via python-dotenv
- IMPORTANT: do not modify data/raw/ — treat it as read-only

## Notes for Claude
- Dataset is confidential — never reproduce real rows in examples or comments
- Prefer sklearn Pipeline over manual feature transforms
- The feature engineering logic in src/features/ is intentionally verbose — do not refactor it unless explicitly asked
```

Sections: **Project** sets the stage, **Commands** gives Claude the exact strings it needs to run things, **Conventions** captures team rules, and **Notes for Claude** is where you put operational constraints that are not apparent from reading the code.

<Callout type="danger">
  Never put secrets, passwords, or API keys in CLAUDE.md. This file is committed to git and visible to anyone with access to the repository.
</Callout>

<SlideEnd />

<SlideStart />

## Dos and don'ts

| Do | Don't |
|----|-------|
| List exact install, build, test, and lint commands | Describe file structure — Claude can read the repo |
| Write team conventions Claude cannot guess from the code | Write general programming advice |
| Add "never do X" rules for known failure modes | Repeat information already in the README |
| Keep it under 100 lines | Let it grow unbounded |
| Commit `CLAUDE.md` to git | Commit `CLAUDE.local.md` — add it to `.gitignore` |
| Use `IMPORTANT:` to flag critical rules | Include secrets or API keys |

<SlideEnd />

<SlideStart />

## What makes a CLAUDE.md effective

Boris Cherny (the creator of Claude Code at Anthropic) has distilled what separates useful CLAUDE.md files from ones that get ignored or become outdated:

**Write what Claude cannot infer from the code.** Claude Code can read your files. It does not need a description of your directory structure or a tour of your modules. What it cannot read is intent: why a decision was made, what constraint a convention exists to enforce, what failure mode a rule is protecting against. Write the *why*.

**Keep it short and actionable.** A focused 20-line file outperforms a 500-line document. CLAUDE.md is loaded on every session, which means verbosity has a real cost in tokens and in the attention Claude Code gives to any single instruction.

**Use it for operational instructions, not documentation.** CLAUDE.md is not a README. It tells Claude Code how to behave. It is not the place to explain the business problem or onboard a human reader.

**Update it when something goes wrong.** If Claude Code repeats the same mistake twice, that is a signal: add a "do not" rule. CLAUDE.md improves through iteration. Do not try to write the perfect file up front — write a good-enough file and add rules as you discover what Claude Code needs to know.

The `IMPORTANT:` prefix is worth knowing. When you prepend a rule with `IMPORTANT:`, Claude Code treats it as higher priority. Use it sparingly, for the rules where getting it wrong has real consequences.

**Use HTML comments for maintainer notes.** Block-level HTML comments (`<!-- like this -->`) are stripped from CLAUDE.md before Claude reads it. They cost zero tokens and are invisible to the model — useful for notes to yourself about why a rule exists, or reminders to update a section.

```markdown
<!-- Last updated 2026-03: added after repeated sklearn mistake — revisit when model changes -->
- Prefer sklearn Pipeline over manual feature transforms
```

<Callout type="tip">
  Boris Cherny (the creator of Claude Code at Anthropic) shares 72 practical tips on how he uses Claude Code day-to-day, including his team's CLAUDE.md methodology, at [howborisusesclaudecode.com](https://howborisusesclaudecode.com/).
</Callout>

<SlideEnd />

---

## 3. Tokens, Context and Pricing

<SlideStart />

## What is a token?

Every character you type and every word Claude returns passes through the same unit of measurement: the token. A token is roughly 3–4 characters of English text, which works out to about three quarters of a word. Code tends to be more token-dense than prose because identifiers, punctuation, and indentation each consume tokens.

Some concrete examples give a better feel for the scale:

- `"python"` → 1 token
- `"def calculate_churn_rate(df):"` → approximately 8 tokens
- A 200-line Python file → roughly 1,500–2,000 tokens

To understand the density difference: a typical paragraph of English prose lands around 100–150 tokens, while a comparable amount of Python code often runs 300–500 tokens due to symbols, brackets, and whitespace handling.
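
For a quick estimate without calling an API, the 3–4 characters-per-token rule is enough. A heuristic sketch:

```bash
# Rough token estimate for a file (~4 characters per token)
wc -c < src/model.py | awk '{printf "~%d tokens\n", $1 / 4}'
```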

<Callout type="note">
The `/cost` command shows token usage for the current session. Run it after any large task to build intuition for what different types of work actually cost.
</Callout>

<SlideEnd />

<SlideStart />

## What is context?

The context window is the total amount of text Claude can see at any given moment. It is not just your latest message. It holds the entire active session: your CLAUDE.md, the conversation so far, any files you loaded with `@`, and every tool output Claude generated along the way.

All of that content competes for the same fixed space. When the window fills, Claude has to compress or drop older material. You may notice responses becoming less precise about early conversation details — that is the window filling up.

<DiagramViewer title="What fills the context window">
```d2
direction: right

clamd: CLAUDE.md
convo: Conversation history
files: "@files"
tools: Tool outputs
window: Context window
model: Model

clamd -> window
convo -> window
files -> window
tools -> window
window -> model
```
</DiagramViewer>

Each turn in a conversation re-sends the full accumulated context to the model. A session with a large CLAUDE.md, several `@` file references, and twenty back-and-forth messages is sending a lot of tokens on every single request — not just the first one.

<SlideEnd />

<SlideStart />

## How is pricing computed?

Claude charges separately for input tokens (everything sent to the model) and output tokens (everything Claude writes back). Input is cheaper than output. For practical work, input costs dominate because the context you send grows with every turn while responses stay relatively short.

Claude Code uses Sonnet as its default model. Haiku is faster and significantly cheaper, suited for lighter tasks. Opus is the most capable and most expensive, warranted for genuinely complex reasoning work. Exact ratios shift between model generations, but the ordering holds: Haiku is many times cheaper per token than Opus, with Sonnet sitting in the middle.

Prices change. What doesn't: a large CLAUDE.md loaded at the start of every session accrues cost on every request, not just the first one. A 2,000-token CLAUDE.md sent across 30 turns costs the same as sending a 60,000-token document once.
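
The arithmetic is worth doing once. A sketch:

```bash
# A 2,000-token CLAUDE.md re-sent on every one of 30 turns
echo $((2000 * 30))   # 60000 input tokens from the briefing alone
```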

<Callout type="tip">
Lesson 10, [Token Reduction Strategies](/lessons/token-reduction), covers concrete techniques for trimming context without losing capability. The strategies there apply directly to the patterns introduced here.
</Callout>

<SlideEnd />

---

## 4. Basic Commands

<SlideStart />

## What is a command?

When you work with Claude Code, you will encounter three distinct types of input. Understanding the difference between them saves a lot of confusion early on.

A **slash command** is something you type directly in the interactive session that controls the session itself. Commands like `/help`, `/clear`, and `/cost` do not ask Claude to do anything with your code. They manage the state of the conversation — resetting history, showing token usage, switching models.

A **shell command** is a terminal instruction you want to run without leaving Claude Code. Prefix it with `!` and it executes in your working directory. `! git status` runs git status and shows you the output. `! python -m pytest` runs your test suite. The result is visible in the session and Claude can read it.

A **prompt** is plain text — your natural language message to Claude. "Refactor this function to use list comprehension" is a prompt. So is "Why is this model overfitting?". Prompts are what you send most of the time.

The distinction matters. `/clear` actually resets the session; typing `clear the conversation` as a prompt merely asks Claude, which cannot reset its own history. Likewise, `! ls` actually runs `ls`, while `list the files` asks Claude to describe what it thinks is in the directory, which may or may not be accurate.
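
A concrete side-by-side, as an in-session sketch:

```bash
# Deterministic: runs ls and puts the real output in context
! ls src/

# Interpretive: Claude answers from whatever it has already read
> list the files in src/
```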

<SlideEnd />

<SlideStart />

## How to find all available commands

There are two places to look, depending on where you are.

From your terminal, before starting a session:

```bash
> claude --help
```

This shows CLI flags — options you pass when launching Claude Code, like `--model`, `--verbose`, or `--version`. It is the right place to look when you want to understand how to start a session differently.

Once inside a running session, type:

```bash
/help
```

This shows every slash command available in the current version of Claude Code, with a one-line description of each. The list is longer than most people expect. Claude Code is updated regularly and new commands appear without much fanfare, so running `/help` in an actual session is more reliable than any documentation.

The output format is consistent: command name on the left, short description on the right. Scan it vertically and you will spot commands worth exploring.

<SlideEnd />

<SlideStart />

## Most common and useful commands

Here are the commands you will reach for most often, grouped by what they do.

**Starting Claude Code from the terminal:**

```bash
> claude                                         # [1]
> claude "fix the bug in src/model.py"           # [2]
> claude --model haiku "summarize this file"     # [3]
```

`# [1]` Opens an interactive session in your current directory. Claude loads `CLAUDE.md` automatically if one exists.

`# [2]` Starts a session and immediately sends that string as the first prompt. Useful when you already know the task.

`# [3]` Starts a session with a specific model. Haiku is faster and cheaper; Opus is more capable. You can also switch models mid-session with `/model`.

**Inside a session:**

| Command | What it does | When to use it |
|---|---|---|
| `/help` | Lists all available slash commands | When you want to discover what is available |
| `/clear` | Clears conversation history | When context is stale or too large |
| `/compact` | Compresses history to save tokens | When you want to continue but reduce cost |
| `/cost` | Shows token usage and estimated cost | After a long session, before continuing |
| `/model` | Switch model (Haiku / Sonnet / Opus) | When a task needs more or less capability |
| `/plan` | Toggles Plan mode (Claude proposes, does not edit) | When you want to review before any file changes |
| `/init` | Generates a `CLAUDE.md` for the current project | When starting on a new codebase |
| `! <cmd>` | Runs a shell command inline | `! git diff`, `! pytest`, `! ls src/` |

`/clear` and `/compact` are worth understanding as a pair. `/clear` wipes the entire history and starts fresh — useful when you have gone down a dead end. `/compact` keeps recent context but summarizes older turns — useful when a session is going well but getting expensive.

`/plan` is particularly useful when you are about to make a large change and want to see Claude's approach before it touches anything. Toggle it on, describe the task, review the plan, then toggle it off when you are ready to proceed.

`/init` is a one-time operation per project. It reads your codebase and generates a `CLAUDE.md` with sensible defaults. Run it once in a new repo and then edit the file to fit your actual conventions.

<SlideEnd />

---

## 5. Custom Commands

<SlideStart />

## How to create a custom command

Custom commands are plain Markdown files stored in `.claude/commands/`. The filename determines the command name: `review-pr.md` becomes `/review-pr`, `summarize.md` becomes `/summarize`. Claude Code scans that directory automatically — no registration step required.

To create your first command:

```bash
# Create the commands directory if it doesn't exist
> mkdir -p .claude/commands

# Create a command file
> touch .claude/commands/review-pr.md
```

Open `review-pr.md` and write the prompt you want Claude to receive when you type `/review-pr`. No frontmatter required — the file content is the prompt template. Any text typed after the command name is passed in as `$ARGUMENTS`.

For example, if your file contains `Review the pull request at $ARGUMENTS for correctness and style` and you type `/review-pr #142`, Claude receives `Review the pull request at #142 for correctness and style`.
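
Written out as a shell sketch (this is one possible version of the prompt):

```bash
# Fill in the command template created above
cat > .claude/commands/review-pr.md <<'EOF'
Review the pull request at $ARGUMENTS for correctness and style.
Report issues as a numbered list with a severity (high/medium/low) for each.
EOF
```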

<DiagramViewer title="How a custom command resolves">
```d2
direction: right

cmd: "/review-notebook notebooks/churn.ipynb"
file: .claude/commands/review-notebook.md
prompt: "Prompt with arguments"
claude: Claude Code

cmd -> file
file -> prompt
prompt -> claude
```
</DiagramViewer>

<SlideEnd />

<SlideStart />

## Template

Here is a complete, realistic command for a DS/ML workflow. It reviews a Jupyter notebook for common issues that affect reproducibility and model correctness.

```markdown title=".claude/commands/review-notebook.md"
Review the notebook at $ARGUMENTS for:
1. Reproducibility — are random seeds set? are paths relative?
2. Data leakage — does any preprocessing use future information?
3. Code quality — are functions documented? is logic reusable?
4. Output — are all cells run in order? are large outputs cleared?

Provide a structured report with a severity level (high/medium/low) for each issue found.
```

To invoke it:

```bash
/review-notebook notebooks/churn_model.ipynb
```

Claude replaces `$ARGUMENTS` with `notebooks/churn_model.ipynb` and runs the prompt. You can reuse the same command on any notebook in your project — no edits needed.

`$ARGUMENTS` captures everything after the command name as a single string. For instance, `/compare-models models/v1.pkl models/v2.pkl` passes `models/v1.pkl models/v2.pkl` as one block of text.

When you need individual arguments separately, use positional placeholders: `$1` for the first, `$2` for the second, and so on, following the shell convention. A command that compares two model versions might use:

```markdown title=".claude/commands/compare-models.md"
Compare $1 and $2. Focus on accuracy, latency, and memory footprint.
Output a structured table with one row per metric.
```

Running `/compare-models models/v1.pkl models/v2.pkl` substitutes `models/v1.pkl` for `$1` and `models/v2.pkl` for `$2`.

A few things worth getting right when writing command templates:

- Be explicit about the output format. If you want a numbered list with severity labels, say so.
- Do not assume Claude knows your project context. If the command depends on a convention (like "all data is in `data/raw/`"), state it in the template.
- Write the prompt as carefully as you would write it by hand. The command is not magic — it is just a saved prompt.

<SlideEnd />

<SlideStart />

## Dos and don'ts

| Do | Don't |
|----|-------|
| Write commands for workflows you repeat more than twice a week | Create commands for one-off tasks |
| Use `$ARGUMENTS` to make commands reusable | Hardcode specific file paths |
| Keep command files in `.claude/commands/` and commit them | Store commands in ad-hoc places outside the project |
| Write the prompt as clearly as you'd write it manually | Assume Claude will infer missing context |
| Test the command on a real case before sharing | Share untested commands with the team |

The test-before-sharing rule matters more than it looks. A command that works well on one notebook may produce shallow results on another if the prompt is underspecified. Run it on two or three real cases before committing it as a team standard.

<SlideEnd />

---

## 6. Skills and Meta-Skills

<SlideStart />

## What is a skill?

A skill is a Markdown file stored in a subdirectory of `.claude/skills/` that contains a reusable prompt template. On the surface that sounds similar to a custom command, and the two do share the same idea: write the instruction once, reuse it many times. The difference is how and when they are invoked.

Custom commands are triggered explicitly — you type `/review` and Claude runs it. Skills can be invoked the same way, but they can also be picked up without explicit invocation. Claude reads each skill's `description` field alongside the current task and decides whether to apply the skill. That makes the `description` field the most important part of a well-written skill: write it precisely, and Claude will reach for the skill in the right situations. Write it vaguely, and the skill will rarely be used.

Skills are also richer than commands. They can carry longer instructions, structured output requirements, and domain-specific guidance that would be unwieldy to type in a prompt every time.

<SlideEnd />

<SlideStart />

## How are skills activated?

Skills are activated in two ways.

**Manually** — you reference the skill directly in a prompt, or you document it in CLAUDE.md so Claude knows to use it for a specific class of task:

```markdown title="CLAUDE.md"
## Skills
When reviewing code quality, use the skill at `.claude/skills/code-review.md`.
```

With this in CLAUDE.md, any time you ask Claude to review code, it will pull in that skill's instructions automatically, without you having to say "use the code-review skill" every time.

**Implicitly** — Claude reads the `description` field of each loaded skill and decides whether the skill is relevant to the current task. This is a judgment call by the model, not a keyword lookup. A precise description like "Review Python code for quality, type safety, and DS best practices" gives Claude enough signal to invoke the skill when you ask for a code review. A vague one like "helps with code" does not.

The two activation paths are complementary. CLAUDE.md references are explicit and reliable — use them for skills you want applied consistently. Implicit activation through the `description` field works well once the description is specific enough to leave no ambiguity.

<DiagramViewer title="Custom command vs. skill invocation">
```d2
direction: right

slash: User types /command
exec: Claude runs command
prompt: User types a prompt
check: Read skill descriptions
invoke: Invoke skill automatically
direct: Respond without skill

slash -> exec
prompt -> check
check -> invoke: match found
check -> direct: no match
```
</DiagramViewer>

<SlideEnd />

<SlideStart />

## How to create a skill?

A skill file has YAML frontmatter with at minimum a `name` and a `description`, followed by the prompt body — the actual instructions Claude will follow when the skill is invoked.

Here is a complete example for a Python code review skill:

```markdown title=".claude/skills/code-review/SKILL.md"
---
name: code-review
description: Review Python code for quality, type safety, and DS best practices
---
Review the provided code for:
1. Type hints — all functions must have type annotations
2. Docstrings — public functions need a one-line summary
3. Error handling — no bare `except:` clauses
4. Data practices — no hardcoded paths, no global state for model objects

For each issue: file, line number, severity (high/medium/low), and suggested fix.
```

A few things worth noting about this structure. The `description` is written from Claude's perspective — it describes what the skill does, not what you want to happen. That is what makes automatic activation work. The prompt body is numbered and specific: it tells Claude exactly what to check and exactly what format to use for the output. Vague instructions in the body produce vague output.

You can also add a `paths` field to the frontmatter to scope the skill to specific file patterns (e.g. `paths: "src/**/*.py"`), but for most use cases a well-written `description` is sufficient.

<SlideEnd />

<SlideStart />

## Dos and don'ts

| Do | Don't |
|----|-------|
| Write a clear `description` so Claude knows when to invoke the skill | Leave `description` vague or empty |
| Keep each skill focused on one responsibility | Put multiple unrelated tasks in one skill |
| Commit skills to `.claude/skills/` in git | Keep skills local if your team would benefit from them |
| Reference skills from CLAUDE.md for reliable activation | Rely only on automatic invocation for critical workflows |
| Test the skill on a real case before committing | Share untested skills with your team |

The single most common mistake is writing a skill with a weak description. If the description says something like "helps with code", Claude has no basis for automatic activation. Compare that to "Review Python code for quality, type safety, and DS best practices" — that is specific enough for Claude to match against real tasks.

Scope is the second common problem. A skill that tries to handle data validation, model training, and documentation all at once is hard to invoke correctly and hard to maintain. One skill, one job.

<SlideEnd />

<SlideStart />

## Metaskill: skill-creator

Once you have a few skills in your project, a natural next step is to standardize how new ones get written. The most effective way to do that is with a skill that creates other skills — a `skill-creator`.

A `skill-creator` skill contains instructions that tell Claude how to author skill files: what frontmatter fields are required, how the `description` should be phrased, what level of detail belongs in the prompt body, and any project-specific standards your team follows. Once it exists, you can say "create a skill for validating dataset schemas" and Claude generates a properly structured `.md` file that matches your conventions.

```markdown title=".claude/skills/skill-creator/SKILL.md"
---
name: skill-creator
description: Create a new Claude Code skill file following project conventions
---
When asked to create a new skill, create a directory in `.claude/skills/` and place a `SKILL.md` file inside it:

Frontmatter:
- `name`: kebab-case, describes the action (e.g. `validate-schema`, `summarize-notebook`)
- `description`: one sentence, starts with a verb, specific enough for automatic activation

Body:
- Numbered steps for multi-part tasks
- Specify output format explicitly if the result needs a consistent structure
- No open-ended instructions — every step should have a clear stopping condition

Save the file and confirm the path. Do not activate the skill until the user reviews it.
```

Every new skill follows the same format. The collection stays consistent without anyone having to enforce it.

<SlideEnd />

---

## 7. Agents

<SlideStart />

## What is an agent?

An agent is a specialized Claude Code subprocess. It has its own context window, its own set of allowed tools, and an optional system prompt that defines how it behaves. It runs in isolation from the main conversation — it does not share history, state, or tool access with the process that spawned it.

Claude Code itself is an agent. When you open a session and type a prompt, you are talking to the main agent. That main agent can then spawn sub-agents to handle specific delegated work.

Three things define an agent:
- A **name** and **description** that tell Claude when and why to invoke it
- A list of **tools** it is allowed to use
- An optional **system prompt** that shapes its focus and constraints

Agents are stored as Markdown files in `.claude/agents/`. They live alongside your CLAUDE.md and custom commands, under version control, as part of your project.

The main reason to use agents is parallelism. The main Claude Code process can spawn multiple sub-agents simultaneously, each working independently on a separate task. When they finish, results come back to the main process. For work that is naturally decomposable — running tests while generating documentation, checking multiple data schemas at once — this is a significant speedup over sequential prompting.
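
In practice, delegation is triggered from an ordinary prompt. A sketch, with hypothetical agent names:

```bash
> use the test-runner agent on the auth changes, and in parallel use the doc-writer agent to draft docstrings for src/features/
```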

<SlideEnd />

<SlideStart />

## How are agents activated?

Agents are activated in three ways.

**Automatically**: Claude Code reads the `description` field of every agent in `.claude/agents/` and uses it to decide whether a given task should be delegated. When a user prompt or an in-session task matches an agent's description closely enough, Claude spawns it without being explicitly told to.

**By name in natural language**: You can write "use the test-runner agent" in your prompt. Claude decides whether to delegate — it is a suggestion, not a guarantee.

**By `@`-mention**: Type `@` and pick the agent from the typeahead (the same way you `@`-mention files). This **guarantees** that specific agent runs, regardless of what Claude would otherwise decide.

```
@"test-runner (agent)" check the auth module changes
```

You can also type the mention manually without the picker: `@agent-test-runner`. The `@`-mention controls which agent Claude invokes — your full message still goes to Claude, which writes the agent's task prompt based on what you asked.

<DiagramViewer title="Parallel agent execution">
```d2
direction: right

user: User prompt
main: Claude Code
a: Agent A
b: Agent B
results: Results merged
response: Response

user -> main
main -> a
main -> b
a -> results
b -> results
results -> main
main -> response
```
</DiagramViewer>

Parallelism is what makes agents distinct from commands and skills. If two tasks are independent — running the test suite and updating docstrings, for example — there is no reason to wait for one before starting the other. Agents make that possible.

Because Claude decides when to auto-invoke based on the description, writing a precise description is one of the most important parts of defining an agent. A vague description means Claude either over-invokes it or ignores it. A specific description ("run the test suite and report failures — invoke after any code change") leaves no ambiguity.

<SlideEnd />

<SlideStart />

## Skill vs agent

This is the distinction that confuses most people when they first encounter agents.

| | Skill | Agent |
|--|-------|-------|
| What it is | Prompt template or slash command | Full Claude subprocess with isolated context |
| Scope | Runs in the current conversation | Runs in isolation |
| Parallelism | No | Yes — multiple agents can run at the same time |
| Tools | Inherits from the parent process | Configurable per agent |
| Use case | Recurring prompts, structured workflows | Complex multi-step tasks, parallel work |
| Created with | `.claude/skills/<name>/SKILL.md` | `.claude/agents/<name>.md` |

The clearest one-sentence summary: a skill tells the current Claude what to do; an agent is a separate Claude that goes off and does it.

Use a skill when you want to standardize how you interact with Claude in a session. Use an agent when you want Claude to delegate a task to another process entirely, run it in isolation, and bring the results back.

For DS/ML work, the line often falls here: a skill might format your prompt for running an experiment or summarize a training log. An agent would run the validation suite against a new dataset, or generate documentation for a module you just wrote, while you or another agent continues working on something else.

<SlideEnd />

<SlideStart />

## How to create an agent

Create a Markdown file in `.claude/agents/` with a YAML frontmatter block followed by the system prompt in the body. No command required — Claude Code loads any `.md` file it finds in that directory.

```bash
> touch .claude/agents/test-runner.md
```

Here is a complete, realistic example:

```markdown title=".claude/agents/test-runner.md"
---
name: test-runner
description: Run the test suite and report failures. Invoke after any code change. # [1]
model: sonnet  # [2]
---

You are a focused test runner. Run the project test suite, identify failing  # [3]
tests, and report results clearly. Do not fix bugs — only report what failed
and why.
```

1. The description drives automatic invocation. Be specific about when this agent should be used.
2. The model field is optional. Omit it to inherit the session default (Sonnet).
3. The markdown body is the agent's system prompt. Explicitly telling it what not to do (fix bugs) prevents scope creep and keeps the agent focused.

<SlideEnd />

<SlideStart />

## Dos and don'ts

| Do | Don't |
|----|-------|
| Give each agent a single, well-defined responsibility | Create agents that do everything |
| Restrict tools to the minimum needed | Give all tools to every agent |
| Write a specific description so Claude knows when to invoke it | Leave the description vague |
| Use agents for parallelizable, isolated work | Use agents for simple one-step tasks — a skill is better |
| Commit `.claude/agents/` to git | Keep agents local when the whole team would benefit |

<SlideEnd />

---

## 8. Configuration Sharing

<SlideStart />

## What is a plugin?

A plugin is a packaged collection of Claude Code configuration. It bundles skills, agents, custom slash commands, hooks, and optionally MCP server definitions into a single installable unit.

Structurally, a plugin is a git repository with a `.claude/` directory at its root. That directory follows the same layout Claude Code already uses for project-level configuration, so there is nothing new to learn about the format itself — the novelty is that the whole thing travels together as a unit that anyone can install with one command.

This distinction matters. You have already seen how individual commands or skills improve a specific task. A plugin takes that further: it captures an entire workflow. A data science team might ship one plugin that includes a code-review skill tuned for ML code, an agent that orchestrates test runs, and a `/summarize-notebook` command — all versioned together and installable in seconds.

<DiagramViewer title="What a plugin bundles">
```d2
plugin: Plugin repo {
  commands: commands/
  skills: skills/
  agents: agents/
  settings: settings.json
}
```
</DiagramViewer>

<SlideEnd />

<SlideStart />

## How to create a plugin

A plugin is a git repository with this directory structure:

```
my-plugin/
├── .claude/
│   ├── commands/       # slash commands
│   ├── skills/         # skill files
│   ├── agents/         # agent definitions
│   └── settings.json   # hooks and MCP server configs
└── README.md
```

Nothing about this structure is special to plugins — it is the same `.claude/` layout used in any project. The difference is intent: a plugin repo exists solely to be shared and installed, not to hold application code.

Here is a concrete example. Say your team builds a `ds-workflow` plugin:

```
ds-workflow/
├── .claude/
│   ├── commands/
│   │   └── summarize-notebook.md    # [1]
│   ├── skills/
│   │   └── code-review.md           # [2]
│   ├── agents/
│   │   └── test-runner.md           # [3]
│   └── settings.json
└── README.md
```

- **[1]** `/summarize-notebook` — a command that takes a notebook path and produces a plain-English summary of what it does, suitable for pull request descriptions
- **[2]** `code-review` skill — instructs Claude on ML-specific review criteria: data leakage, pipeline correctness, random seed consistency
- **[3]** `test-runner` agent — an agent definition that locates test files, runs them, and reports failures with suggested fixes

Each file is written the same way you would write a standalone command or skill. The plugin just groups them under a single repo so they can be versioned, reviewed, and installed together.

To publish the plugin, push the repository to GitHub. That is sufficient — no build step, no packaging format, no registry submission required.
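
Publishing is plain git. A sketch, with an illustrative remote URL:

```bash
# Version and publish the plugin like any other repo
git init && git add . && git commit -m "ds-workflow plugin v0.1"
git remote add origin git@github.com:your-org/ds-workflow.git
git push -u origin main
```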

<SlideEnd />

<SlideStart />

## How plugins work — marketplace

Claude Code installs plugins from git URLs. When you run an install command, Claude Code clones the plugin repository and merges its `.claude/` contents into your project or global configuration. The plugin's commands show up in `/help`, its skills become available to the model, and its agent definitions are active immediately.

The community shares plugins on GitHub. Common conventions include the `claude-code-plugin` topic tag on repositories and curated lists maintained in the Claude Code documentation. A more formal marketplace at claude.ai/marketplace is also evolving — check the official Claude Code docs for the current state, since this ecosystem is still developing quickly.

Plugins are activated per project or globally, depending on where you install them. A plugin installed globally is available in every session; one installed into a specific project is scoped to that project's `.claude/` directory.

<DiagramViewer title="Plugin installation and activation flow">
```d2
direction: down

git: "GitHub / git URL"
local: "Local .claude/ directory"
cm: commands/
sm: skills/
am: agents/
im: settings.json
active: Active in session

git -> local: plugin install
local -> cm
local -> sm
local -> am
local -> im
cm -> active
sm -> active
am -> active
im -> active
```
</DiagramViewer>

<SlideEnd />

<SlideStart />

## How to install a plugin

Installation takes one command:

```bash
# Install a plugin from a git URL
> claude plugin install https://github.com/example/ds-workflow-plugin   # [1]
```

- **[1]** Claude Code clones the repository, reads its `.claude/` directory, and merges the contents into your configuration. After this completes, the plugin's commands and agents are immediately available.

Once installed, you can confirm the plugin's commands are active:

```bash
> claude
/help    # [2]
```

- **[2]** The `/help` output now lists the plugin's slash commands alongside your own. No restart required.

If you prefer to install manually — or if your organization has a private plugin repo that requires SSH access — you can also add the plugin directly to your Claude Code settings file and point it at the local path after cloning.

<Callout type="tip">
  The most effective team setup is a shared internal plugin in your organization's private git repo. One install command and everyone gets the same workflow — same commands, same skills, same agent behaviors. When a command improves, you update the plugin repo, teammates pull, and the change is live for everyone.
</Callout>

<SlideEnd />

---

## 9. MCP Integrations

<SlideStart />

## What is MCP?

MCP stands for Model Context Protocol. It is an open standard that lets Claude Code connect to external tools and data sources through a defined server protocol. An MCP server exposes a set of **tools** that Claude can call — read from a database, query an API, run a search, fetch a document. From Claude's perspective, an MCP tool looks exactly like a built-in tool such as Bash or Read. The protocol standardizes how those tools are described, called, and responded to.

The architecture has three moving parts: the Claude Code client, the MCP server, and the external system that server wraps.

<DiagramViewer title="MCP architecture">
```d2
direction: right

claude: Claude Code
client: MCP Client
server: MCP Server
external: {
  db: Database
  api: API
  tool: Tool
}

claude <-> client
client <-> server
server <-> external.db
server <-> external.api
server <-> external.tool
```
</DiagramViewer>

Claude Code acts as the MCP client. It sends a tool call with structured inputs to the MCP server, which handles the request against the external system and returns a structured response. Claude never touches the external system directly — all communication goes through the server.

<SlideEnd />

<SlideStart />

## Why use MCP in Claude Code?

Before MCP, giving Claude access to an internal system meant writing a skill that told Claude to construct a specific `curl` command, pass the right headers, handle authentication inline, and parse whatever JSON came back. That works for a one-off task. It breaks when the API changes, when a colleague uses the same skill against a different environment, or when the endpoint requires more than one call to get a useful result.

MCP separates the integration logic from the instructions. You write a server once. The server exposes typed tools with defined inputs and outputs. Claude calls those tools the same way it calls Bash — with validated parameters, without needing to know anything about the underlying API.

There is another advantage that matters specifically in team settings: MCP servers are reusable across any MCP-compatible client. A server your team writes for Claude Code today can be used by other tools that adopt the same protocol. You write the integration once and share it.

<SlideEnd />

<SlideStart />

## MCP vs creating manual skills

Consider a common DS/ML task: querying a feature store to retrieve training features for a given entity.

**Manual skill approach.** You write a skill file that instructs Claude to run a specific `curl` command against the feature store REST API, substitute the entity ID into the URL, set the `Authorization` header using a token it reads from the environment, and then parse the JSON array from the response. Claude follows the instructions, but there is no type checking on inputs, no schema for the response, and the skill breaks silently if the endpoint path or response shape changes.

**MCP approach.** You write a `feature-store` MCP server that exposes a single tool: `get_features`. Claude calls it with named, typed parameters and receives a structured object back. The server handles authentication, retries, and response parsing internally.

Here is what an MCP tool definition looks like in Python using the official SDK:

```python title="feature_store_server.py"
from mcp.server import Server
from mcp.types import Tool

server = Server("feature-store")

@server.list_tools()
async def list_tools() -> list[Tool]:
    # Advertise one tool; Claude reads this schema to construct valid calls
    return [
        Tool(
            name="get_features",                             # [1]
            description="Fetch features for a given entity from the feature store.",
            inputSchema={                                    # [2]
                "type": "object",
                "properties": {
                    "entity_id": {"type": "string"},
                    "feature_names": {
                        "type": "array",
                        "items": {"type": "string"},
                    },
                },
                "required": ["entity_id", "feature_names"],
            },
        )
    ]
```

1. The tool name is what Claude uses when it decides to call this tool. Keep names short and verb-noun.
2. The input schema is a standard JSON Schema object. Claude uses it to construct valid calls — no prompt engineering required to get the right shape.

When Claude calls `get_features`, it passes `entity_id` and `feature_names` as structured arguments. The server validates them against the schema before executing anything.

<SlideEnd />

<SlideStart />

## MCP servers available by default

Claude Code does not bundle MCP servers, but the MCP community maintains a set of official servers that cover the most common integrations. These install on demand via `npx`. The ones most relevant to DS/ML work:

| Server | Package | What it gives Claude |
|--------|---------|----------------------|
| PostgreSQL | `@modelcontextprotocol/server-postgres` | Run SQL queries against a Postgres database |
| SQLite | `@modelcontextprotocol/server-sqlite` | Query local SQLite files |
| Filesystem | `@modelcontextprotocol/server-filesystem` | Structured file operations beyond the built-in Read/Write tools |
| GitHub | `@modelcontextprotocol/server-github` | Issues, pull requests, repository search |
| Brave Search | `@modelcontextprotocol/server-brave-search` | Live web search |

To enable a server, add it to `.claude/settings.json` under `mcpServers`. Each entry specifies the command to launch the server and any arguments it needs:

```json title=".claude/settings.json"
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"]
    }
  }
}
```

Once the config is saved, restart your Claude Code session. The server starts automatically and its tools appear in Claude's tool list for the duration of that session.
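
Before relying on a server in a session, exercise it in isolation. The MCP Inspector can launch a server and list its tools interactively (assumes Node.js is installed; the connection string is illustrative):

```bash
# Launch the inspector against the Postgres server
npx @modelcontextprotocol/inspector \
  npx -y @modelcontextprotocol/server-postgres postgresql://localhost/mydb
```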

For credentials, use environment variables. The `env` field in the server config accepts references to your shell environment:

```json title=".claude/settings.json"
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres"],
      "env": {
        "POSTGRES_URL": "${POSTGRES_URL}"
      }
    }
  }
}
```

Set `POSTGRES_URL` in your shell or in a `.env` file that is not committed to git.
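
A minimal sketch of that setup (values are placeholders):

```bash
# Keep the credential in an uncommitted .env file
echo 'POSTGRES_URL=postgresql://user:pass@localhost:5432/mydb' >> .env
echo '.env' >> .gitignore
```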

<SlideEnd />

<SlideStart />

## Dos and don'ts

| Do | Don't |
|----|-------|
| Use MCP for stable, typed integrations with internal systems | Use MCP for one-off scripts — a skill is simpler |
| Restrict MCP server permissions to read-only when possible | Give write access without careful review |
| Test MCP tools in isolation before using them in Claude sessions | Assume Claude will handle MCP errors gracefully |
| Document what each MCP tool does in CLAUDE.md | Leave MCP tools undocumented |
| Keep MCP server config in `.claude/settings.json` committed to git | Store credentials in settings.json |

<Callout type="danger">
  Never store API keys, database passwords, or any credentials directly in `.claude/settings.json`. That file is committed to git and will end up in version history. Use environment variables and reference them with `${VAR_NAME}` in the `env` field.
</Callout>

<SlideEnd />

---

## 10. Token Reduction Strategies

<SlideStart />

## Why it matters

Every token Claude reads costs money, consumes rate-limit quota, and takes time.

**Cost.** Tokens are billed on every message, not just the text you type. Claude receives the full conversation history on each turn, so a session that starts at 500 tokens can balloon past 8,000 by turn ten. On a busy team running dozens of sessions per day, a bloated context window becomes a real line item.

**Rate limits.** The Claude API enforces tokens-per-minute limits. A context that consumes 6,000 tokens per turn leaves far less headroom than one consuming 600. When you hit a rate limit mid-task, the session stalls. Tighter contexts give you more turns before that happens.

**Quality and speed.** This one surprises people. A larger context does not produce better answers — it often produces worse ones. Claude has to filter signal from noise across everything you have included. An irrelevant file, a stale conversation turn, or a 2,000-word CLAUDE.md all compete for the model's attention. Shorter, focused context produces more accurate, faster responses. Less really is more.

<SlideEnd />

<SlideStart />

## /cost

The `/cost` command prints the token usage and estimated cost for the current session. Run it after any substantial task to build intuition for what different kinds of work actually cost.

```bash
> /cost
```

```
Session cost: $0.031
  Input tokens:  12,840  ($0.025)
  Output tokens:  1,204  ($0.006)

Total since last /clear
```

The breakdown separates input from output tokens. Input tokens are almost always the larger number — and the one most worth optimizing, because they include everything Claude has read this session.

The counter resets when you run `/clear`. If you are comparing the cost of two approaches, run `/clear`, do the work, then run `/cost` for a clean reading.
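
The measurement loop, as an in-session sketch:

```bash
> /clear
> @src/features/build.py refactor the date parsing to use vectorized pandas
> /cost   # clean reading for this approach alone
```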

<SlideEnd />

<SlideStart />

## Context management

Two techniques let you control exactly what Claude can see before it starts reasoning.

### The @ prefix

The `@` prefix lets you include a specific file or directory in your prompt rather than letting Claude decide what to read. Compare these two prompts:

```bash
# Vague — Claude may read many files to orient itself
> claude "fix the bug in the transform module"

# Targeted — Claude reads exactly one file
> claude "@src/pipeline/transform.py the test on line 42 fails — fix it"
```

The targeted version is not just cheaper. It also produces a more focused fix because Claude is not reasoning about the rest of the codebase.

### Navigating with cd

Claude's file exploration is scoped to the working directory. A narrower working directory means fewer files in scope by default. Inside an interactive session, use `!` to run a shell command inline and scope the session to a subdirectory:

```bash
> ! cd src/features/
> the normalization function is producing NaN for single-row inputs — debug it
```

The `!` prefix runs the command in the current shell without leaving the Claude Code session. After `! cd`, Claude's file exploration and relative path resolution are both anchored to that subdirectory for the rest of the session.

<SlideEnd />

<SlideStart />

### /clear

`/clear` wipes the entire conversation history and resets the session cost counter. Use it whenever you are starting a genuinely different task. The context from the previous task adds noise and cost to the next one — there is no benefit to carrying it over.

The habit to build: treat `/clear` the way you would treat opening a new terminal tab. Each distinct problem gets a fresh start.

<SlideEnd />

<SlideStart />

### /compact

`/compact` compresses the conversation history into a summary and continues from there. It is less aggressive than `/clear` — you keep a condensed record of what was decided and done, but the raw back-and-forth is replaced with a summary.

Use `/compact` when you are mid-task and the context is growing large but you do not want to lose the thread. A good rule of thumb: if `/cost` shows your session above $0.05 and you still have significant work ahead, run `/compact` to reclaim headroom.
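
A sketch of that mid-task checkpoint:

```bash
# Cost is climbing and work remains: compact instead of clearing
> /cost          # shows the session at, say, $0.07
> /compact       # history becomes a summary; the thread survives
> now implement the remaining acceptance criteria
```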

<SlideEnd />

<SlideStart />

## Checking CLAUDE.md size

CLAUDE.md is loaded at the start of every session. A 2,000-token CLAUDE.md costs roughly $0.006 per session at Sonnet pricing — small in isolation, but across a team running 50 sessions a day, that is $0.30 daily, or around $110 per year, just from loading instructions that may not be relevant to most tasks.

The fix is to keep CLAUDE.md lean and move specialized instructions into skills. Detailed code review criteria, deployment checklists, or dataset documentation belong in purpose-specific skills — not in CLAUDE.md. Claude only loads a skill when you invoke it, so the cost is incurred only when the instruction is actually needed.

A practical split:

| Belongs in CLAUDE.md | Belongs in a skill |
|---|---|
| Project name and language | Detailed code review criteria |
| Key commands (install, test, lint) | Deployment runbook |
| Directory structure (brief) | Dataset schema documentation |
| Coding conventions (2–3 lines) | PR checklist |
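
As a sketch of the lean side of that split, a CLAUDE.md can be this short (project details are illustrative):

```markdown title="CLAUDE.md"
# churn-model

Python 3.11, pandas 2.x, scikit-learn. Feature pipeline and training for churn prediction.

## Commands
- Install: `pip install -e ".[dev]"`
- Test: `pytest tests/`
- Lint: `ruff check src/`

## Layout
- `src/pipeline/`: feature engineering
- `src/eval/`: metrics and reports

## Conventions
Type hints everywhere. No notebooks under `src/`.
```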

<Callout type="tip">
  Run `wc -w CLAUDE.md` to check your word count. Aim to stay under 500 words. If you are over that, look for blocks of text that are only relevant to specific tasks — those are candidates to extract into skills.
</Callout>

<SlideEnd />

<SlideStart />

## Semantic context retrieval via MCP

For very large codebases where even a targeted `@src/features/` pulls in several thousand tokens, the next step is semantic retrieval: indexing the codebase and fetching only the chunks relevant to the current query, rather than including whole files.

This is achievable through the MCP ecosystem. Tools that expose a search or retrieval interface as an MCP server can be wired into Claude Code via `.claude/settings.json`, letting Claude query for relevant context automatically before responding.

The pattern is worth knowing even if you do not need it immediately. For most codebases under a few thousand lines, `@file` targeting and `/compact` cover the majority of cases. Semantic retrieval becomes relevant when you regularly hit context limits or work across a repo too large to navigate by file reference alone.
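
Registration lands in your Claude Code settings; one way to add a server from the CLI is the `claude mcp add` helper. The server name and command below are placeholders, not a real tool:

```bash
# Register a hypothetical semantic-search MCP server with Claude Code.
# Substitute the actual server binary and flags for the tool you choose.
$ claude mcp add code-search -- code-search-mcp --index .
```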

<SlideEnd />

<SlideStart />

## Third-party tools

The built-in Claude Code commands handle most situations, but dedicated tools go further by compressing what Claude reads before it even enters the context window.

### RTK

[RTK](https://github.com/rtk-ai/rtk) is a CLI proxy that intercepts tool outputs — `git status`, `pytest`, `npm install`, container logs, linter output — and compresses them before Claude sees the result. It installs a `PreToolUse` hook into Claude Code that rewrites commands transparently: Claude calls `git status`, RTK intercepts, compresses the output, and returns a compact summary. The model never sees the rewrite.

<DiagramViewer title="RTK token compression flow">
```d2
direction: right

without: Without RTK {
  direction: right
  claude: Claude
  shell: Shell
  git: git

  claude -> shell: git status
  shell -> git
  git -> claude: ~2,000 tokens (raw)
}

with: With RTK {
  direction: right
  claude: Claude
  rtk: RTK
  git: git

  claude -> rtk: git status
  rtk -> git
  git -> rtk: raw output
  rtk -> claude: ~200 tokens (filtered)
}
```
</DiagramViewer>

Install RTK and wire it into Claude Code globally:

```bash
$ cargo install rtk
$ rtk init -g
```

The `-g` flag writes the hook into your global Claude Code settings so it applies to every project.

To see how much RTK has saved in a session:

```bash
$ rtk gain
```

```
Commands intercepted: 47
Tokens before: 18,420
Tokens after:   3,104
Reduction:        83%
```

RTK ships with compression rules for 100+ commands. For a large repository with frequent `git` and test runner calls, the savings are substantial.

<Callout type="note">
  RTK complements the built-in strategies above; it does not replace them. Start with `/compact`, `@file` targeting, and a lean CLAUDE.md. Add RTK once you have measured that those alone are not enough.
</Callout>

<SlideEnd />

---

# Part 2 — Spec-Driven Development

---

## 1. Why Spec-Driven Development?

<SlideStart />

## The basic workflow

Most people start working with Claude Code the same way. You open a session, describe what you want in plain language, Claude writes some code, you read it, ask for changes, and repeat. For a short, well-bounded task — rename a column, write a unit test, fix a specific bug — this works well.

The problem appears when the task grows. Building a feature engineering pipeline. Designing a model evaluation harness. Refactoring a data preprocessing module that touches ten files. These tasks have many moving parts, implicit constraints, and decisions that need to be made before the first line of code is written. The basic workflow has no mechanism for any of that. You just start, and you figure it out as you go.

That approach produces a recognizable pattern: a long, wandering session that ends with code that sort of works, a nagging feeling that some decisions were wrong, and no clear record of what was decided or why. The next time you start Claude Code, that context is gone.

<SlideEnd />

<SlideStart />

## Where it breaks down

Here are the specific failure modes, each illustrated with an example from a feature engineering pipeline project.

<SlideEnd />

<SlideStart />

### Scope creep through negotiation

You ask Claude to build a pipeline that handles missing values. Claude implements median imputation everywhere. You say "actually, use mode for categorical columns." Claude changes it. You realize it should be configurable per column. You say that. Claude rewrites again, but now the API has changed. You spend ten messages negotiating the interface rather than building the thing. No individual response was wrong — Claude was responding reasonably each time. But without a clear spec, you were defining requirements through trial and error, one rewrite at a time.

<DiagramViewer title="Scope creep through negotiation">
```d2
direction: right

prompt: "missing value\npipeline"
v1: v1\nmedian everywhere
fb1: "mode for\ncategoricals"
v2: v2\nmedian + mode
fb2: "configurable\nper column"
v3: v3\nAPI changed
fb3: "broke the\ntraining loop"

prompt -> v1
v1 -> fb1
fb1 -> v2
v2 -> fb2
fb2 -> v3
v3 -> fb3
```
</DiagramViewer>

<SlideEnd />

<SlideStart />

### Lost context between sessions

Claude Code has no memory across sessions. Every `/clear` or new session starts from zero. In a long project, you will make dozens of decisions: this column gets log-transformed, this feature is out of scope, imputation happens before scaling, not after. Those decisions live in the conversation. Once the conversation ends, they are gone. In the next session, Claude makes different choices — reasonable ones, but different. You spend time re-explaining, and sometimes you don't realize the choices diverged until the model performance changes.

<DiagramViewer title="Lost context between sessions">
```d2
direction: right

s1: Session 1 {
  direction: down
  d1: log-transform income
  d2: impute before scaling
  d3: drop correlated features
}

gap: /clear\n(context gone)

s2: Session 2 {
  direction: down
  d1: standard scaling
  d2: scale before impute
  d3: keep all features
}

s1 -> gap
gap -> s2
```
</DiagramViewer>

<SlideEnd />

<SlideStart />

### Misaligned assumptions

Claude will make assumptions when your prompt leaves gaps. When you say "build a feature engineering pipeline," Claude assumes something about the input data format, the output schema, the sklearn version, whether the result should be a `Pipeline` object or a plain function. Those assumptions are reasonable in the abstract. They may not match your actual system. You typically discover the mismatch 200 lines in, when the pipeline fails to fit inside your training loop.

<DiagramViewer title="Misaligned assumptions">
```d2
direction: right

prompt: "build a feature\nengineering pipeline"

claude: Claude assumes {
  direction: down
  a1: input is a DataFrame
  a2: returns a Pipeline object
  a3: sklearn 1.x API
}

actual: Your system expects {
  direction: down
  a1: input is a numpy array
  a2: returns a plain function
  a3: sklearn 0.24 API
}

conflict: Mismatch found\nat line 200

prompt -> claude
prompt -> actual
claude -> conflict
actual -> conflict
```
</DiagramViewer>

<SlideEnd />

<SlideStart />

### Non-reproducible process

You finish a session and the pipeline works. The model trains. The evaluation looks good. Two weeks later, a colleague asks how the pipeline handles outliers. You look at the code and can't tell — there's no comment, no doc, and the conversation where you worked it out is gone. The code is the only artifact, and code doesn't explain the decisions behind it. A conversation is not a specification.

<DiagramViewer title="Non-reproducible process">
```d2
direction: right

session: Session {
  direction: down
  c1: "clip outliers at p99?"
  c2: "yes, clip them"
  c3: "or winsorize instead?"
  c4: "let's winsorize"
}

clear: /clear

code: pipeline.py {
  direction: down
  l1: winsorize(df, limits=0.01)
}

question: "Why winsorize?\nWhat threshold?"

session -> clear: conversation ends
clear -> code: only artifact remains
code -> question: no answer
```
</DiagramViewer>

<SlideEnd />

<SlideStart />

### No review surface

A pull request with fifty changed files and no written specification is difficult to review. A reviewer can read the code, but they can't check "is this what the team agreed to?" without a written description of intent. The only source of truth is the conversation, and conversations don't diff.

<DiagramViewer title="No review surface">
```d2
direction: right

pr: Pull request\n(50 files changed)

reviewer: Reviewer

spec: Written spec {
  shape: document
}

question: "Is this what\nwe agreed to?"

pr -> reviewer
reviewer -> spec: looks for
spec -> question: does not exist
reviewer -> question: can't answer
```
</DiagramViewer>

<SlideEnd />

<SlideStart />

## The cost for DS/MLEs specifically

Software engineers face these problems too, but data scientists face them more acutely. Here is why.

Pipelines have implicit data contracts. A function that processes a pandas DataFrame embeds assumptions about column names, dtypes, the presence or absence of nulls, the range of values, and the meaning of each column. None of that is in the function signature. When Claude writes a pipeline based on a vague description, those assumptions get baked in silently. If they don't match the actual data, the pipeline may run without error and produce subtly wrong results — wrong features, silent data leakage, or an output schema that breaks downstream training.

"The code works" is not the same as "the experiment is valid." In software, a passing test suite is usually sufficient evidence that the code does what it should. In machine learning, you also need to know that the preprocessing is correct, the train/test split is sound, the label is defined consistently, and the evaluation metric matches the business goal. Claude can write code that runs correctly and still get these things wrong if the spec doesn't address them.

Experiments are inherently exploratory. You often don't know exactly what you want until you've tried a few things. That's fine — exploration is part of the work. But there's a difference between exploring deliberately and exploring because you never wrote down what you were trying to build. The basic workflow conflates the two. A spec lets you keep the exploration but eliminate the accidental ambiguity.

Decisions accumulate invisibly. A feature engineering pipeline might encode thirty decisions: which features to include, how to handle each type of missing value, what transformations to apply, what the output schema looks like. In the basic workflow, those decisions get made implicitly during the session and never written down. When a model unexpectedly degrades on new data, you need to audit those decisions. Without a written spec, you're auditing code that may not fully reflect the intent behind it.

<SlideEnd />

<SlideStart />

## What would solve this?

The failure modes above have a common root: there is no shared, written artifact that describes what should be built before Claude starts building it. The conversation substitutes for that artifact, and conversations are a poor substitute — they're ephemeral, hard to review, and don't force clarity upfront.

The solution is a specification: a structured, written description of the task that exists before any code is written. Not a vague README paragraph. Not a list of bullet points in a comment. A document that answers the hard questions explicitly — inputs, outputs, constraints, edge cases, what's out of scope — in a form that both you and Claude can treat as a contract.

When Claude has a spec, it can implement against it rather than infer your intent from a loose description. It can flag when the spec is ambiguous before writing code that embeds a wrong assumption. It can validate its own output against the acceptance criteria. And when the session ends, the spec persists — so the next session starts with context, not from scratch.

This is Spec-Driven Development. The next lesson covers what it actually looks like: how a spec is structured, what belongs in one, and how to write specs that work well with Claude Code.

<SlideEnd />

---

## 2. Spec-Driven Development Concepts

<SlideStart />

## What is Spec-Driven Development?

Spec-Driven Development is a workflow where you write a structured specification before asking Claude to implement anything. The spec is a Markdown document — or a small set of documents — that describes what to build, the data contracts, the expected behavior, constraints, and acceptance criteria. Claude reads the spec and builds to it.

The spec is the source of truth. Not the conversation. Not the code. Not Claude's assumptions about what you probably meant.

A good spec has four properties:

**Readable by Claude.** Write it in plain language. No diagrams that only exist in your head, no acronyms that aren't defined somewhere in the file.

**Specific enough to converge.** If you handed the same spec to two different Claude sessions with no other context, they should make the same key design decisions. If they'd diverge significantly, the spec needs more precision.

**Small enough to fit in context.** A spec is not a novel. If it fills the context window, split it. Each spec should cover one coherent piece of work.

**Reviewable before any code is written.** The whole point is that a human reads it and signs off before implementation starts. If no one reviews the spec, you've skipped the alignment step that makes SDD worthwhile.

<SlideEnd />

<SlideStart />

## Basic Workflow vs SDD

The difference between a basic Claude session and SDD comes down to where alignment happens.

In a basic workflow, you describe what you want in a prompt, Claude builds something, you look at the code, realize it's not quite right, and you correct it. This loop repeats until the result is close enough. The alignment is happening through the code itself — which is the most expensive place for it to happen.

In SDD, you write the spec first. A human reviews it. Only then does Claude build. You review the code against the spec rather than against a vague mental model. There's a shared, written definition of "done" from the start.

<DiagramViewer title="Basic workflow vs Spec-Driven Development">
```d2
direction: right

basic: Basic Workflow {
  direction: down
  a: User prompt
  b: Claude builds
  c: User reviews code
  d: Correct?
  e: Done

  a -> b
  b -> c
  c -> d
  d -> a: no
  d -> e: yes
}

sdd: Spec-Driven Development {
  direction: down
  a: User writes spec
  b: Human reviews spec
  c: Claude builds to spec
  d: User reviews code
  e: Done

  a -> b
  b -> c
  c -> d
  d -> e
}
```
</DiagramViewer>

Correcting a spec costs minutes. Correcting code costs hours, and sometimes requires explaining to Claude why the thing it just built isn't what you wanted — a conversation that could have been a one-line edit to a spec document.

<SlideEnd />

<SlideStart />

## How It's Organized

A spec typically has five sections. None of them are optional.

**Goal.** One paragraph. What does this component do, and why does it exist? This is the context Claude needs to make sensible decisions when the spec doesn't cover every edge case.

**Inputs / Outputs.** Data schemas, types, and formats. For a Python function: the argument types and the return type. For an API: the request body schema and the response structure. Be precise — "a DataFrame" is not specific enough; "a pandas DataFrame with columns `customer_id` (str), `tenure_months` (int), `monthly_spend` (float)" is.

**Behavior.** What the component does, step by step. This is not pseudocode, but it's close. Describe the logic at a level where an engineer could implement it without guessing.

**Constraints.** What the component must not do. Performance requirements. External dependencies and their versions. Compatibility requirements with other parts of the system.

**Acceptance criteria.** The conditions under which you'll call this done. Each criterion must be testable. "Fast" is not a criterion. "Processes 100,000 rows in under 3 seconds on a standard laptop" is.

Here's a minimal spec for a feature engineering function in a churn prediction model:

```markdown title="specs/feature-engineering-tenure-buckets.md"
## Goal

Bin the `tenure_months` column into four labeled categories for use as a
categorical feature in the churn model. Avoids treating tenure as linear
when the relationship with churn is non-linear in our data.

## Inputs / Outputs

Input: pandas DataFrame with a `tenure_months` column (int, non-negative).
Output: the same DataFrame with a new column `tenure_bucket` (str, categorical).

## Behavior

1. Apply the following fixed bins: 0–6 months → "new", 7–24 → "growing",
   25–60 → "established", 61+ → "mature".
2. Bins are closed on the right (i.e., 6 maps to "new", 7 maps to "growing").
3. The original `tenure_months` column is preserved unchanged.
4. The function must be stateless — no fitting step, same bins every call.

## Constraints

- Python 3.11+, pandas 2.x
- No external dependencies beyond pandas
- Must work inside a sklearn Pipeline (implements fit/transform interface)

## Acceptance Criteria

- [ ] Given a DataFrame with tenure_months [0, 6, 7, 24, 25, 60, 61, 120],
      the output tenure_bucket values are
      ["new", "new", "growing", "growing", "established", "established", "mature", "mature"].
- [ ] Given a DataFrame with no rows, the function returns an empty DataFrame
      with the tenure_bucket column present.
- [ ] The function raises ValueError if tenure_months contains negative values.
```

Short. Concrete. Reviewable in two minutes.

<SlideEnd />

<SlideStart />

## When Is SDD Better?

SDD adds overhead. Writing a spec, reviewing it, then implementing to it takes more time than just prompting Claude directly. For small tasks, that overhead isn't worth it.

SDD pays off when one or more of these is true:

- The task is large — more than roughly two hours of implementation work
- Multiple people need to agree on what gets built before work starts
- The output has a data contract: an API response schema, a model input format, a file format that other code depends on
- You'll need to reproduce or extend this work in a future session
- The work spans multiple Claude sessions, where context from session one won't carry to session two

A one-liner like `@src/utils.py add a type hint to this function` doesn't need a spec. "Build a feature store integration for our training pipeline" does.

<Callout type="tip">
  If you're unsure whether a task needs a spec, ask yourself: could two different engineers read this prompt and build the same thing? If the answer is no, write a spec.
</Callout>

<SlideEnd />

---

## 3. Spec-Driven Frameworks

<SlideStart />

## OpenSpec

OpenSpec (openspec.dev) is a framework for writing machine-readable specifications. It defines a structured YAML/Markdown format that both tools and LLMs can parse consistently, without ambiguity about where to find the goal, the interfaces, or the acceptance criteria.

Every spec in OpenSpec follows a fixed schema: a goal section, interface definitions, behavior descriptions, and tests. That predictability is the point. When all your specs share the same structure, you can validate them automatically, diff them in pull requests, and feed them to any tool that understands the format.

OpenSpec specs live in version control alongside the code they describe. The spec and the implementation evolve together, which makes it harder for documentation to drift out of sync.

This approach works best for teams that need consistency across multiple projects or contributors. If five engineers are each writing specs their own way, OpenSpec gives you a shared language without forcing a heavy process on anyone.

<SlideEnd />

<SlideStart />

## spec-kit

spec-kit (github.com/github/spec-kit) is GitHub's approach to spec-driven development. It was designed for teams using GitHub Copilot and centers on lightweight spec files stored in `.github/specs/`. Before any code is written, a spec file describing the feature lands in that directory and goes through pull request review.
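
The on-disk convention is minimal: one spec file per feature (the file name here is illustrative):

```
.github/
└── specs/
    └── feature-store-integration.md
```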

The integration with GitHub's review process is the practical advantage here. Reviewers see the spec before they see the implementation. If the spec is wrong or incomplete, that gets caught before any code exists, which is a much cheaper point to fix it.

spec-kit suits teams that are already working inside GitHub and using Copilot for code generation. The tooling assumption is built in — specs are artifacts that Copilot can read to generate implementations that match the intent. If your team is not on that stack, the integration benefits disappear and you are left with a Markdown convention you could replicate yourself.

<SlideEnd />

<SlideStart />

## BMAD

BMAD (Breakthrough Method for Agile Development, bmad.fr) is the most structured of the four options. It defines four roles — Analyst, Architect, Developer, QA — and a corresponding set of artifact types: Brief, PRD, Architecture doc, and Stories.

The workflow moves linearly through these roles. The Analyst gathers requirements and produces a Brief and PRD. The Architect translates those into a technical Architecture doc. The Developer writes code against Stories derived from that architecture. The QA validates the output against the acceptance criteria in those Stories.

<DiagramViewer title="BMAD workflow">
```d2
direction: right

analyst: Analyst
architect: Architect
developer: Developer
qa: QA

analyst -> architect: "Brief + PRD"
architect -> developer: Architecture doc
developer -> qa: "Stories + code"
```
</DiagramViewer>

That sounds heavyweight, and on a large team it is deliberately so: every Story traces back through the Architecture doc to the PRD, and that traceability is the point for organizations that need audit trails or where multiple people are building together with Claude.

For a solo data scientist, BMAD is worth understanding differently. You are not adopting four roles — you are adopting four thinking modes. Writing a brief before you architect, and an architecture before you code, enforces a discipline that prevents the most common failure mode in AI-assisted development: jumping to implementation before the problem is fully understood.

BMAD is a strong fit for MLOps projects where pipelines, models, APIs, and monitoring are separate concerns owned by different people, and where a requirement that was misunderstood at the start costs weeks to fix later.

<SlideEnd />

<SlideStart />

## Custom approaches

Many teams do not need a full framework. They need a template — something that ensures everyone asks the same questions before starting work.

The start-work project (github.com/prillcode/start-work) is a useful reference. It is a shell script combined with a Markdown template that generates a spec file at the start of any new Claude session. The automation is minimal; the value is in the habit it enforces.

The minimal version of this approach is a `SPEC.md` template committed to your repository. Before opening a new Claude session for a feature, fill it in. That single practice eliminates most of the context loss that happens when you start coding without a clear goal.

```markdown title="SPEC.md"
# Feature: [name]

## Goal
[One paragraph]

## Inputs
[Data types and formats]

## Outputs
[Data types and formats]

## Behavior
[Step by step]

## Constraints
[What it must NOT do]

## Done when
[Testable acceptance criteria]
```

The six sections cover what the feature does, what it touches, and how you know it is finished. That last section matters most. The "Done when" section forces you to write acceptance criteria before any code exists — and that single habit has the biggest impact on what Claude actually builds.

You can extend this template as your needs grow. Add a section for data schemas if you work with structured data. Add a rollback plan if you are deploying models. The format is yours to adapt.
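
To turn that into a habit, a start-work-style helper needs only a few lines of shell. A sketch, assuming the template above is committed at the repository root; the script name and paths are illustrative:

```bash
#!/usr/bin/env bash
# new-spec.sh: copy the SPEC.md template into specs/ and review it with Claude.
set -euo pipefail

feature="$1"                                # e.g. ./new-spec.sh tenure-buckets
spec="specs/${feature}.md"

mkdir -p specs
cp SPEC.md "$spec"
"${EDITOR:-vi}" "$spec"                     # fill in the six sections by hand

# Ask Claude to poke holes in the spec before any code is written
claude "@${spec} review this spec for ambiguities and missing acceptance criteria"
```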

<SlideEnd />

<SlideStart />

## Comparison

| | OpenSpec | spec-kit | BMAD | Custom |
|---|---|---|---|---|
| Complexity | Medium | Low | High | Low |
| Tooling | Yes | Yes (GitHub) | Yes | None |
| Best for | Cross-team consistency | GitHub-centric teams | Large structured projects | Small teams, fast start |
| Learning curve | Medium | Low | High | Minimal |
| DS/ML fit | Good | Moderate | Good for MLOps | Best for experimentation |

<Callout type="tip">
  If you are a solo data scientist or a small team, start with a custom SPEC.md template. Adopt a framework only when the pain of inconsistency outweighs the cost of learning it.
</Callout>

<SlideEnd />

---

# Cheatsheets

---

## Prompt Modes & Input Prefixes

## Prompt modes

| Mode | How to activate | What Claude can do | When to use |
|------|-----------------|-------------------|-------------|
| Default | Launch Claude Code (always on by default) | Read files, edit files, run commands | Most everyday work |
| Plan | `Shift+Tab` twice, or `/plan` | Read and reason only — no edits, no commands | Before large or risky changes |
| Accept edits | `Shift+Tab` once | Auto-apply file edits without per-change confirmation | Trusted batch edits after reviewing a plan |

## Input prefixes

| Prefix | Name | Purpose | Example |
|--------|------|---------|---------|
| `@` | File reference | Include a specific file or directory in context | `@src/model.py` |
| `!` | Shell command | Run a command inline and pass its output to Claude | `! pytest tests/ -v` |
| `&` | Background task | Run a command in the background; Claude keeps responding | `& python src/train.py` |
| `#` | Comment | Record a note in the session log without sending it to Claude | `# tried median imputation, no effect` |

---

## Tokens, Context & Pricing

## Term definitions

| Term | Definition |
|------|------------|
| Token | The basic unit of text processed by the model. Roughly 3–4 characters or ¾ of an English word. |
| Context window | The total space available for the current session: CLAUDE.md + conversation + files + tool outputs. |
| Input tokens | All tokens sent to the model in a request. Includes everything in the context window at that moment. |
| Output tokens | Tokens in Claude's response. Billed at a higher rate than input tokens. |
| `/cost` | Claude Code command that reports token usage and estimated cost for the current session. |

## Model cost tiers (relative)

| Model | Speed | Cost | Best for |
|-------|-------|------|----------|
| Haiku | Fastest | Cheapest | Quick lookups, simple edits, high-volume tasks |
| Sonnet | Balanced | Mid-range | Default for most Claude Code work |
| Opus | Slowest | Most expensive | Complex reasoning, architecture decisions |

*Exact per-token pricing changes. Check [anthropic.com/pricing](https://www.anthropic.com/pricing) for current rates.*

---

## CLI Commands

## All commands

| Command | Purpose | Example |
|---------|---------|---------|
| `claude` | Open interactive session | `$ claude` |
| `claude "..."` | Session with initial prompt | `$ claude "explain src/pipeline.py"` |
| `claude --model <id>` | Open session with specific model | `$ claude --model opus` |
| `claude --help` | Show CLI flags | `$ claude --help` |
| `/help` | List all slash commands | `/help` |
| `/clear` | Wipe conversation history | `/clear` |
| `/compact` | Compress history, keep recent context | `/compact` |
| `/cost` | Show token usage and cost | `/cost` |
| `/model` | Switch model mid-session | `/model` |
| `/plan` | Toggle Plan mode (no file edits) | `/plan` |
| `/init` | Generate CLAUDE.md for current project | `/init` |
| `! <cmd>` | Run shell command inline | `! git status` |
| `& <cmd>` | Run command in background | `& python src/train.py` |
| `@<file>` | Add file to context (input prefix, not a command) | `@src/train.py` |

---

## Custom Commands

## Where commands live

```
.claude/
└── skills/
    ├── review-notebook/
    │   └── SKILL.md          # → /review-notebook
    ├── summarize-pr/
    │   └── SKILL.md          # → /summarize-pr
    └── check-leakage/
        └── SKILL.md          # → /check-leakage
```

## Naming convention

- Directory name becomes the command name
- Use kebab-case: `review-pr`, not `reviewPR` or `review_pr`
- Keep names short and action-oriented: `/doc`, `/review`, `/test`

## SKILL.md structure

```markdown
---
name: review-notebook
description: Review a Jupyter notebook for reproducibility and data leakage
---
Review the notebook at $ARGUMENTS for...
```

## `$ARGUMENTS` syntax

| You type | `$ARGUMENTS` value |
|----------|--------------------|
| `/review src/model.py` | `src/model.py` |
| `/compare models/v1.pkl models/v2.pkl` | `models/v1.pkl models/v2.pkl` |
| `/review` (no argument) | *(empty string)* |

## How to invoke

Skills in `.claude/skills/` are loaded automatically — no activation step needed.

```bash
$ claude
> /review-notebook notebooks/churn_model.ipynb
> /summarize-pr #87
> /check-leakage src/features/
```

---

## Token Reduction Strategies

## Techniques at a glance

| Technique | When to use | Token impact |
|-----------|-------------|--------------|
| `@file` | Target a specific file | High — avoids broad exploration |
| `cd` into subdirectory | Working on one component | Medium — narrows file discovery scope |
| `/clear` | Starting a new, unrelated task | High — resets full history |
| `/compact` | Mid-task, context growing large | Medium — compresses history to summary |
| Trim CLAUDE.md | Always — keep under 500 words | Medium — saves tokens on every session |
| Move instructions to skills | When instructions are task-specific | Medium — loads only when invoked |
| MCP retrieval tool | Large codebase, semantic search needed | High — loads relevant chunks, not whole files |
