Cost monitoring for LLM APIs over time

Question

Accepted Answer

Three layers of cost surveillance. **(1) Per-call accounting.** Log input tokens, output tokens, cached tokens (if applicable), model snapshot, provider, and cost per call. Most providers return token counts in the response; otherwise tokenize locally with `tiktoken` (OpenAI) or the provider's tokenizer. Multiply by the current published per-1K rate. **(2) Aggregate trend monitoring.** Daily and weekly totals broken down by snapshot. Watch for: average tokens per call drifting up (the model got more verbose), cache-hit rate drifting down (your prompts changed or provider tightened caching), per-request cost on a fixed eval suite drifting up (provider raised prices or tokenizer changed). **(3) Provider-side benchmark.** Run a fixed eval suite against each snapshot daily and track cost-per-suite-run as a metric. This isolates provider-side changes from your-side changes. Anthropic and OpenAI have both adjusted prices mid-snapshot historically — `gpt-4o` got cheaper in Aug 2024; Claude prompt caching was added and altered pricing math. Without a fixed baseline you cannot detect these. Tokenizer changes are a sneaky cost issue. Claude 3 and Claude 3.5 use slightly different tokenization; same English text can tokenize to different counts, affecting both cost (per-token billing) and context-window usage. ModelWatch tracks per-snapshot cost-per-call on its standard eval suite and alerts when the unit cost moves >5 percent week-over-week.