
LLM Cost Planning Calculator

Model AI inference spend across providers and workloads. Estimate costs, compare models side by side, and calculate unit economics for individual features, before you ship and before costs surprise you.

OpenAI · Anthropic · Google Gemini · Scenario planning · Growth projection · Unit economics & margin analysis

Workload Configuration

Inputs: prompt + system + context · completion length · prompt cache reads (if enabled) · expected monthly call volume

Cost / request: $0.002940 (incl. 5% retries)
Cost / 1k requests: $2.940
Monthly estimate: $29.40 (10,000 req/mo)
Annual estimate: $352.80 (at current volume)
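The summary figures follow from a simple expected-cost calculation: per-request input and output cost, inflated by the retry rate, then scaled by volume. A minimal sketch in Python (function and variable names are illustrative, not the calculator's actual implementation):

```python
def request_cost(input_cost: float, output_cost: float, retry_rate: float = 0.05) -> float:
    """Expected cost per nominal request, including retry overhead."""
    return (input_cost + output_cost) * (1 + retry_rate)

per_req = request_cost(0.000800, 0.002000, retry_rate=0.05)  # ≈ $0.002940
monthly = per_req * 10_000                                   # ≈ $29.40 at 10,000 req/mo
annual = monthly * 12                                        # ≈ $352.80
```

Treating retries as a uniform multiplier assumes a retried call re-sends the full prompt and regenerates the full completion, which matches how the overhead line below is computed.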

Cost Breakdown per Request

Input: 29% · $0.000800/req
Output: 71% · $0.002000/req

Largest cost driver: output tokens (71% of request cost). Output tokens are typically 4–6× more expensive than input, so shortening completions or setting an explicit max_tokens limit has the highest cost impact.

A retry rate of 5% adds $0.000140 of expected retry overhead per nominal request.

Scenario Planning

Scenarios model realistic token and volume variations. Use the optimized row as a target state for prompt engineering and caching work.

Scenario                     Description                                            Monthly   vs Base
Base case                    As configured above                                    $29.40    base
Optimized with caching       70% of input context cached; 25% shorter completions   $20.60    -30%
With prompt caching (70%)    70% cached input tokens; full output tokens            $25.20    -14%
High output scenario         2.5× output tokens (verbose completions or CoT)        $60.90    +107%
Worst case                   1.5× input, 2.5× output, no caching, 20% retries       $74.40    +153%
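Each scenario is the base formula with token multipliers, an optional caching discount, and a retry rate applied. A sketch of how the multiplier scenarios compose (parameter names are illustrative; the cache-discount path is included for completeness but the tool's exact read-rate discount is not stated here):

```python
def monthly_cost(in_cost, out_cost, volume=10_000, retry=0.05,
                 in_mult=1.0, out_mult=1.0, cached_frac=0.0, cache_discount=0.0):
    """Monthly spend for one scenario: token multipliers, optional prompt
    caching on the input side, and expected retry overhead."""
    eff_in = in_cost * in_mult * (1 - cached_frac * cache_discount)
    return (eff_in + out_cost * out_mult) * (1 + retry) * volume

base = monthly_cost(0.000800, 0.002000)                   # ≈ $29.40
high_output = monthly_cost(0.000800, 0.002000,
                           out_mult=2.5)                  # ≈ $60.90 (+107%)
worst = monthly_cost(0.000800, 0.002000, retry=0.20,
                     in_mult=1.5, out_mult=2.5)           # ≈ $74.40 (+153%)
```

Note how the worst case compounds: token multipliers scale the base cost before the 20% retry rate multiplies the whole thing, which is why it lands at +153% rather than the sum of the individual effects.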

Pricing assumptions & limitations

Prices are drawn from the CostLynx pricing catalog (updated periodically) or your connected DB if available. Provider rates change without notice — verify current rates on the provider's pricing page before making financial commitments.

Cached tokens use prompt-cache read rates (typically 50–90% discount on input price). Cache eligibility depends on provider-specific minimum prefix lengths and TTL policies.
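The blended input cost is the weighted average of full-price and cache-read tokens. A small sketch under assumed figures (the $3.00/MTok rate and the 75% read discount are hypothetical examples within the 50–90% range above, not any provider's published pricing):

```python
def effective_input_cost(input_tokens: int, price_per_mtok: float,
                         cached_frac: float, cache_read_discount: float) -> float:
    """Expected USD cost of the input side, blending uncached tokens at full
    price with cached tokens at the discounted cache-read rate."""
    full = input_tokens * (1 - cached_frac) * price_per_mtok / 1e6
    cached = input_tokens * cached_frac * price_per_mtok * (1 - cache_read_discount) / 1e6
    return full + cached

# 1,000 input tokens at a hypothetical $3.00/MTok, 70% cached at a 75% read discount:
blended = effective_input_cost(1_000, 3.00, 0.70, 0.75)  # ≈ $0.001425 vs $0.003000 uncached
```

In this example caching cuts input cost by roughly half, which is why the cached scenarios in the table above move the monthly total meaningfully even though output tokens dominate the per-request breakdown.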

Output tokens are typically 4–6× more expensive than input. Chain-of-thought reasoning models (o1, o3) may use significantly more reasoning tokens internally, which are not reflected in your output token count.

This tool does not account for batch API discounts, committed-use agreements, free-tier allocations, or regional pricing differences. Actual invoiced cost may differ.