The three things that matter most
If you change only three habits, change these: keep a lean CLAUDE.md so every turn does not carry a tax, manage context deliberately with /compact and /clear instead of letting one session sprawl, and push verbose work like log reading and test runs into subagents. Everything else is a refinement of these three.
Write a CLAUDE.md that earns its tokens
CLAUDE.md is injected into every request, so a bloated one is a fixed tax on every turn. Keep it tight — well under 500 tokens is a good target. Put durable facts in it: build and test commands, project conventions, where things live, and hard rules. Leave out anything that changes per task or that the agent can discover by reading a file. If a rule is really a repeatable action, it usually belongs in a hook or a skill, not in the always-on context.
Manage context with /compact and /clear
A coding session accumulates files, diffs, tool output, and failed attempts, and every new turn can carry more than the last. Use /clear when you switch to unrelated work so the next task starts clean. Use /compact when the current state still matters but the history is noisy — just remember compaction is lossy, so it summarizes rather than perfectly preserves. The instinct to keep one long session open because it is convenient is the most common source of surprise token bills.
Use subagents — but know the trade-off
Subagents are the strongest tool for keeping your main context clean: hand a noisy job (read this 10,000-line log, run the suite, fetch these docs) to a subagent, and only a short summary returns to your main thread. The honest caveat is that each subagent spins up its own context and instructions, so heavy fan-out can multiply total tokens even as it keeps any single thread lean. Delegate the genuinely verbose work; do not spawn a subagent for a one-line task.
Preprocess before the model ever sees it
The cheapest token is the one you never send. A hook that greps a log for ERROR and returns the matching lines turns tens of thousands of tokens into hundreds. Filter PDFs, lockfiles, generated output, and repo dumps before they reach the prompt. Paste the failing function and the exact error, not the whole file and the full terminal history.
Common mistakes that quietly cost you
Leaving extended thinking on for tasks that do not need it, asking broad questions (refactor the whole module) instead of scoped ones (fix this function), keeping a single session alive across unrelated tasks, and letting an agent loop on the same error for several turns without intervening. Each is small in isolation and expensive in aggregate.
Know whether the habits are actually working
Best practices reduce waste, but it is hard to tell by feel. Tokens 4 Breakfast keeps your Claude Code usage, 5-hour window, and weekly pressure in the macOS menu bar, so you can see whether a leaner CLAUDE.md or more aggressive compaction is actually moving the number. It is free for one provider and takes about two minutes to set up.