LLM Cost Planning Calculator
Model AI inference spend across providers and workloads. Estimate spend, compare models side by side, and calculate unit economics for individual features, before you ship and before costs surprise you.
Workload Configuration
- Input tokens per request (prompt + system + context)
- Output tokens per request (completion length)
- Cached input tokens per request (prompt cache reads, if enabled)
- Expected monthly call volume
- Cost / request: $0.002940 (incl. 5% retries)
- Cost / 1k requests: $2.940
- Monthly estimate: $29.40 (at 10,000 req/mo)
- Annual estimate: $352.80 (at current volume)
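As a sketch of the arithmetic behind these figures, assuming illustrative rates of $0.50 per million input tokens and $2.50 per million output tokens (placeholder values chosen to reproduce the example, not any provider's actual prices):

```python
# Illustrative per-token rates (assumed; not real provider prices).
INPUT_PRICE = 0.50 / 1_000_000    # $ per input token
OUTPUT_PRICE = 2.50 / 1_000_000   # $ per output token

def cost_per_request(input_tokens: int, output_tokens: int,
                     retry_rate: float = 0.05) -> float:
    """Expected cost per nominal request, inflated by the retry rate."""
    base = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
    return base * (1 + retry_rate)

# An assumed workload of 1,600 input / 800 output tokens at these rates
# reproduces the example figures: $0.00294/request, $29.40/month, $352.80/year.
per_req = cost_per_request(1600, 800)
monthly = per_req * 10_000
annual = monthly * 12
```

The retry multiplier models expected overhead: each nominal request costs `(1 + retry_rate)` times its base token cost on average.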
Cost Breakdown per Request
Largest cost driver: Output tokens (71% of request cost). Output tokens are typically 4–6× more expensive than input. Shorter completions or explicit max_tokens limits have the highest cost impact.
A 5% retry rate adds $0.000140 of expected retry overhead per nominal request.
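A minimal sketch of this breakdown, using assumed illustrative rates ($0.50/M input, $2.50/M output) and an assumed 1,600 input / 800 output token workload:

```python
def request_breakdown(input_tokens, output_tokens,
                      input_price, output_price, retry_rate):
    """Split a request's expected cost into input, output, and retry parts."""
    input_cost = input_tokens * input_price
    output_cost = output_tokens * output_price
    base = input_cost + output_cost
    return {
        "input": input_cost,
        "output": output_cost,
        "output_share": output_cost / base,   # fraction driven by output tokens
        "retry_overhead": base * retry_rate,  # expected extra cost per request
    }

# Assumed example workload at the assumed rates.
b = request_breakdown(1600, 800, 0.50e-6, 2.50e-6, 0.05)
```

With these assumptions the output share comes out near 71% and the retry overhead at $0.000140, matching the figures above.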
Scenario Planning
Scenarios model realistic token and volume variations. Use the optimized row as a target state for prompt engineering and caching work.
| Scenario | Description | Monthly | vs Base |
|---|---|---|---|
| Base case | As configured above | $29.40 | base |
| Optimized with caching | 70% of input context cached; 25% shorter completions | $20.60 | -30% |
| With prompt caching (70%) | 70% cached input tokens; full output tokens | $25.20 | -14% |
| High output | 2.5× output tokens (verbose completions or CoT) | $60.90 | +107% |
| Worst case | 1.5× input, 2.5× output, no caching, 20% retries | $74.40 | +153% |
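Scenario rows like these can be computed mechanically as multipliers on a base case. A sketch, assuming illustrative rates of $0.50/M input and $2.50/M output with a 1,600/800-token base (the cache-read rate here is a further assumption, so cached rows will vary with the real discount):

```python
def scenario_monthly(base_in=1600, base_out=800, calls=10_000,
                     in_price=0.50e-6, out_price=2.50e-6, cache_price=0.05e-6,
                     in_mult=1.0, out_mult=1.0, cached_frac=0.0,
                     retry_rate=0.05):
    """Monthly cost for a scenario expressed as multipliers on the base case."""
    # Blend uncached and cache-read input pricing by the cached fraction.
    blended_in = (1 - cached_frac) * in_price + cached_frac * cache_price
    per_req = (base_in * in_mult * blended_in
               + base_out * out_mult * out_price) * (1 + retry_rate)
    return per_req * calls

base = scenario_monthly()                               # $29.40
high_output = scenario_monthly(out_mult=2.5)            # $60.90
worst = scenario_monthly(in_mult=1.5, out_mult=2.5,
                         retry_rate=0.20)               # $74.40
```

Under these assumptions the base, high-output, and worst-case rows reproduce exactly; the two caching rows depend on the provider's actual cache-read discount.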
Pricing assumptions & limitations
Prices are drawn from the CostLynx pricing catalog (updated periodically) or your connected DB if available. Provider rates change without notice — verify current rates on the provider's pricing page before making financial commitments.
Cached tokens use prompt-cache read rates (typically 50–90% discount on input price). Cache eligibility depends on provider-specific minimum prefix lengths and TTL policies.
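The blended input rate implied by that discount range can be sketched as follows (the exact discount is provider-specific; the 90% figure below is an assumption for illustration):

```python
def effective_input_price(input_price, cached_frac, cache_discount):
    """Blended $/input-token when cached prefix tokens bill at a discount.

    cache_discount falls in the 50-90% range noted above; verify the
    actual value against the provider's rate card.
    """
    cache_read_price = input_price * (1 - cache_discount)
    return (1 - cached_frac) * input_price + cached_frac * cache_read_price

# 70% of input cached at an assumed 90% read discount
# -> effective rate is 37% of the uncached input price.
rate = effective_input_price(1.00e-6, 0.70, 0.90)
```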
Output tokens are typically 4–6× more expensive than input. Chain-of-thought reasoning models (o1, o3) may use significantly more reasoning tokens internally, which are not reflected in your output token count.
This tool does not account for batch API discounts, committed-use agreements, free-tier allocations, or regional pricing differences. Actual invoiced cost may differ.