AI agent
Controlling agent cost
Pick models per task, cap context, set monthly budget — three knobs to keep bills sane.
Why bills explode
"I just asked it to organize a few docs and it cost $20" — because the agent calls tools more than once. A complex task may run 50+ tool-call rounds, each shipping the current context (tens of thousands of tokens) back to the model. Sonnet at $3/M input × 50 rounds × 30K context = 1.5M input tokens = $4.50 for one task.
The fix is not "use the agent less" — it is "configure routing properly": cheap models for simple work, strong models for hard problems, concurrency caps so you don't blow the budget in one shot.
Three core habits
- Simple tasks → haiku / mini (10× cheaper)
- AI fields default to manual refresh, not auto
- Set a monthly cap in Settings → Billing
Model routing (Pro)
Pro unlocks mounting multiple providers at once and routing per task type. For example: code edits → sonnet, summaries → haiku, deep research → opus.
{
"routing": {
"default": "anthropic:claude-sonnet-4-5",
"rules": [
{
"when": { "tool": ["Read", "grep", "glob"] },
"use": "anthropic:claude-haiku-4"
},
{
"when": { "subagent": "research-summarizer" },
"use": "openai:gpt-4o-mini"
},
{
"when": { "context_tokens_gt": 80000 },
"use": "anthropic:claude-opus-4"
}
]
}
}Hard budget gates
- Settings → Billing → Monthly cap: hit it and the agent halts
- Per-session cap: stop automatically when a chat uses up its budget
- Per-tool-call cap: reject a call whose output exceeds X tokens
- Pair with a
Stophook to log token usage to a table for monthly review
Shrinking context size
The longer a chat lives, the bigger the context, the higher the cost. Three moves help immediately: use @ to reference precisely instead of dumping a folder, start a New Chat periodically, and offload long work to subagents so the parent stays slim.
Keep .kition/agent.md under 2KB — it loads into the system prompt every turn. Put project knowledge into searchable docs rather than stuffing it here.