Guide · Cost5 min read

Reasoning effort is a hidden line item on your AI bill

Reasoning models think before they answer, and those thinking tokens bill at the same rate as the answer. Crank effort to maximum on every request and you can pay many times over for quality you never needed. The fix is simple: match effort to the shape of the task, not its importance. Here is how to choose, model by model, and how to see what it is costing you.

Quick answer

Reasoning effort controls how many hidden thinking tokens a model spends before answering, and those tokens bill at output rates. Use low or minimal effort for recall, extraction, and formatting; keep medium as your default; reserve high or max for architecture, hard debugging, and genuine multi-step reasoning. A task that spends 4,000 thinking tokens before a 500-token answer can cost roughly 9x the bare answer, while accuracy gains from extra thinking flatten quickly. Tokens 4 Breakfast shows the token spend that effort drives, by session and model, so a high-effort habit does not quietly inflate your bill.

Download Free for macOS Read the token waste guide

Step by step

The full walkthrough

Each step stands on its own — skip to the one that matches where you are.

Know what reasoning effort actually is
Reasoning models produce hidden thinking tokens before the visible answer, and higher effort means a larger thinking budget. The catch is that those thinking tokens are billed at the same rate as output tokens, so effort is a direct multiplier on cost, not a free quality dial.
Use one rule: match effort to the shape of the task
Match effort to how much reasoning the task needs, not how important it feels. Recall, extraction, reformatting, and simple edits need almost none, so use low or minimal. Multi-step problems with real depth, like architecture, tricky debugging, or algorithms, are where high effort earns its cost. When unsure, start at medium and escalate only on evidence.
Set effort per model
Claude Code exposes Low, Medium, High, and Max, and you can cap the ceiling with MAX_THINKING_TOKENS (8000 is a sensible default) or dial it per task with /effort. Note that the default recently shifted toward high, which can quietly raise costs unless you override it. OpenAI's GPT-5.x and Codex take a reasoning_effort from none through xhigh, and Gemini exposes a thinking budget. The idea is the same everywhere: spend less thinking on shallow tasks.
Do the cost math before you default to maximum
A request that burns 4,000 thinking tokens before a 500-token answer costs roughly 9x the bare answer. Multiply that across hundreds of daily calls and max effort becomes one of the largest lines on your bill. The payoff is not linear either: accuracy from extra thinking improves logarithmically, with steep early gains that flatten fast, so most of the spend past medium buys very little.
See what effort is costing you, then dial it
You cannot tune what you cannot see. Tokens 4 Breakfast tracks the token spend your effort settings drive, broken out by session and model, live in the Mac menu bar. When a day spikes, you can tell whether a high-effort habit or an Opus-heavy session is behind it, and dial effort down where it was never needed.

Pro tips

Escalate on evidence, not on nerves: test a task at medium first, and only raise effort if the result is actually wrong.
Cap the ceiling once with MAX_THINKING_TOKENS=8000 so no single request can run away on thinking tokens.
Defaults are creeping toward high effort. Check your config and override it for the classes of task that never needed it.

FAQ

Common questions

Short, direct answers to the things people ask most about this.

Do reasoning or thinking tokens cost money?

Yes. The hidden thinking tokens a model spends before answering are billed at the same rate as output tokens. Higher reasoning effort means more of them, so effort directly raises cost.

Is higher reasoning effort always better quality?

No. Accuracy from extra thinking improves logarithmically, with steep early gains that flatten quickly. On recall, extraction, and formatting it adds cost with little or no quality gain, so reserve high effort for genuine multi-step reasoning.

How do I see how much reasoning effort is costing me?

Tokens 4 Breakfast shows your token spend by session and model in the Mac menu bar, so you can spot when a high-effort habit or an expensive model is inflating your bill and adjust before the invoice arrives.

Keep reading

Deep dive: how to choose reasoning effort Reduce AI token waste Claude Code cost tracker

Reasoning effort is a hidden line item on your AI bill

The full walkthrough

Know what reasoning effort actually is

Use one rule: match effort to the shape of the task

Set effort per model

Do the cost math before you default to maximum

See what effort is costing you, then dial it

Common questions

Do reasoning or thinking tokens cost money?

Is higher reasoning effort always better quality?

How do I see how much reasoning effort is costing me?