AI API pricing is opaque, inconsistent, and changes frequently. We built costofaicalc.com to give developers and teams a single, clear tool for understanding and forecasting AI infrastructure costs.

No paywalls. No accounts. Just open the calculator you need and get answers in seconds. Prices are sourced directly from provider documentation and updated when changes are announced.

Missing a model?

If a model or pricing update is missing, let us know and we'll add it. Include the model name, provider, input and output prices ($/1M tokens), and optionally a source URL.

01 — Calculator

Token Cost Calculator

Estimate the API cost of a single call, or of an average request, for any major language model.

Model

Input tokens (prompt)

Prompt, system message, context, documents

Output tokens (completion)

Generated response length

Paste your prompt

Token count is estimated at ~4 characters per token

Expected output tokens

Number of requests

Multiply for batch estimation

Total cost

$0.0000

per request

Input cost $0.0000

Output cost $0.0000

Input price —

Output price —

Total tokens —

Cost per 1M tokens —

1,000 requests —

1M requests —

Daily (100 req/day) —

Monthly (3k req/month) —

Tip: Output tokens typically cost 2–5× as much as input tokens. With providers that support prompt caching, caching repeated prompts can cut input costs by up to about 90%.
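The arithmetic behind a per-request estimate is simple enough to sketch. The snippet below is an illustration only, not the site's own code: the function names are hypothetical, the $3 and $15 per 1M token prices are placeholder values rather than any provider's real rates, and the token estimate uses the ~4 characters per token rule of thumb mentioned above.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count from character length; real tokenizers will differ."""
    return round(len(text) / chars_per_token)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float,
                 requests: int = 1) -> dict:
    """Cost breakdown in dollars; prices are quoted in $ per 1M tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    per_request = input_cost + output_cost
    return {
        "input_cost": input_cost,
        "output_cost": output_cost,
        "per_request": per_request,
        "total": per_request * requests,  # scaled by the number of requests
    }


# Hypothetical prices: $3 per 1M input tokens, $15 per 1M output tokens.
cost = request_cost(1_000, 500, input_price_per_m=3.0,
                    output_price_per_m=15.0, requests=1_000)
print(f"per request: ${cost['per_request']:.4f}")
print(f"1,000 requests: ${cost['total']:.2f}")
```

With these placeholder prices, a request with 1,000 input tokens and 500 output tokens comes to roughly $0.0105, of which the output side accounts for most of the cost.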

03 — Calculator

Monthly Spend Estimator

Forecast your monthly API bill from real usage patterns. Set request volumes and average token counts to see a full cost breakdown.

Model

Daily requests

Avg input tokens per request

Include system prompt in this count

Avg output tokens per request

Prompt cache hit rate (%)

Cache hits typically cost about 10% of the normal input price

Growth rate (% per month)

Month 1 cost

$0.00

Month 6 cost (with growth)

$0.00

Month 12 cost

$0.00

Monthly requests —

Monthly input tokens —

Monthly output tokens —

Cache savings —

Cost per request —

At scale: Costs compound quickly with growth. Consider negotiating volume discounts at $10k+/month spend. Switching to a smaller model can cut costs 80–95%.
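A forecast of this kind can be approximated in a few lines. The sketch below rests on several assumptions that may not match the estimator itself: a 30-day month, a cache hit rate applied uniformly to all input tokens, cached tokens billed at 10% of the input price with no cache-write surcharge, and growth compounding monthly so that month n costs month 1 times (1 + g)^(n-1). All names and prices are hypothetical.

```python
def monthly_spend(daily_requests, avg_input_tokens, avg_output_tokens,
                  input_price_per_m, output_price_per_m,
                  cache_hit_rate_pct=0.0, cached_price_fraction=0.10,
                  days_per_month=30):
    """Month-1 cost breakdown; prices are in $ per 1M tokens."""
    monthly_requests = daily_requests * days_per_month
    input_tokens = monthly_requests * avg_input_tokens
    output_tokens = monthly_requests * avg_output_tokens

    hit = cache_hit_rate_pct / 100
    full_input_cost = input_tokens / 1_000_000 * input_price_per_m
    # Assumption: the cached share is billed at a fraction of the input
    # price, and the remaining share at the full price.
    input_cost = full_input_cost * ((1 - hit) + hit * cached_price_fraction)
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    total = input_cost + output_cost

    return {
        "monthly_requests": monthly_requests,
        "monthly_input_tokens": input_tokens,
        "monthly_output_tokens": output_tokens,
        "cache_savings": full_input_cost - input_cost,
        "month_1_cost": total,
        "cost_per_request": total / monthly_requests if monthly_requests else 0.0,
    }


def cost_in_month(month_1_cost, growth_pct, month):
    """Assumption: usage (and so cost) compounds by growth_pct each month."""
    return month_1_cost * (1 + growth_pct / 100) ** (month - 1)


# Hypothetical workload and prices ($3 in, $15 out per 1M tokens).
spend = monthly_spend(1_000, 1_000, 500, 3.0, 15.0, cache_hit_rate_pct=50)
print(f"month 1:  ${spend['month_1_cost']:.2f}")
print(f"month 6:  ${cost_in_month(spend['month_1_cost'], 10, 6):.2f}")
print(f"month 12: ${cost_in_month(spend['month_1_cost'], 10, 12):.2f}")
```

Compounding is why the month 12 figure can be several times the month 1 figure even at modest growth rates: at 10% per month, costs grow by a factor of about 2.85 over eleven months.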

04 — Calculator

ROI Calculator

Calculate the true return on your AI investment. Compare labor costs saved against API fees to find net benefit and payback period.

Task type

Tasks per month

Time per task without AI (minutes)

Time per task with AI (minutes)

Hourly labor cost (fully loaded, $)

Include salary, benefits, overhead

Monthly API cost ($)

Use the Monthly Spend Estimator to calculate this

Implementation / setup cost ($)

One-time: dev time, tooling, testing

Monthly net savings

Payback period

—

Labor saved (monthly) —

API cost (monthly) —

Time saved per task —

Total hours saved/mo —

Efficiency gain —

Annual benefit —

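The comparison of labor saved against API fees can be written out as a short calculation. This is a sketch under stated assumptions, not the calculator's actual logic: the function name is hypothetical, payback period is taken as the one-time setup cost divided by monthly net savings, and no payback is reported when net savings are zero or negative.

```python
def roi(tasks_per_month, minutes_without_ai, minutes_with_ai,
        hourly_labor_cost, monthly_api_cost, setup_cost):
    """Monthly ROI figures from time saved per task and fully loaded labor cost."""
    minutes_saved = minutes_without_ai - minutes_with_ai
    hours_saved = tasks_per_month * minutes_saved / 60
    labor_saved = hours_saved * hourly_labor_cost
    net_savings = labor_saved - monthly_api_cost

    # Assumption: payback = one-time setup cost / monthly net savings.
    payback_months = setup_cost / net_savings if net_savings > 0 else None
    efficiency_gain_pct = (
        minutes_saved / minutes_without_ai * 100 if minutes_without_ai else 0.0
    )
    return {
        "minutes_saved_per_task": minutes_saved,
        "hours_saved_per_month": hours_saved,
        "labor_saved": labor_saved,
        "net_savings": net_savings,
        "payback_months": payback_months,
        "efficiency_gain_pct": efficiency_gain_pct,
    }


# Hypothetical inputs: 500 tasks/month, 15 min down to 5 min per task,
# $60/hour fully loaded, $200/month API spend, $5,000 setup.
result = roi(500, 15, 5, 60, 200, 5_000)
print(f"net savings per month: ${result['net_savings']:.2f}")
print(f"payback period: {result['payback_months']:.1f} months")
```

In this example the labor saved is about $5,000 per month, so the $200 API bill barely moves the result; the outcome is far more sensitive to the time-per-task inputs than to API pricing.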

05 — Calculator

Context Window Cost

Understand the true cost of large context windows, document ingestion, and long conversations.

Model

System prompt (tokens)

Documents / knowledge base (tokens)

~750 words ≈ 1,000 tokens

Conversation turns (messages back and forth)

Tokens per user message

Tokens per assistant response

Daily conversations

Cost per conversation

$0.0000

Total context at end —

Avg input per turn —

Total input tokens —

Total output tokens —

Daily cost —

Monthly cost —

Context grows fast. In a 10-turn conversation, the input for turn 10 includes all previous messages. Long system prompts and documents are resent on every turn, which is a major hidden cost.
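The growth described above can be made concrete with a small simulation. The sketch assumes that one turn is one user message followed by one assistant response, that the full history plus system prompt and documents is resent as input on every turn, and that no prompt caching or truncation is applied. The function name and prices are hypothetical.

```python
def conversation_cost(system_tokens, document_tokens, turns,
                      user_tokens, assistant_tokens,
                      input_price_per_m, output_price_per_m):
    """Token totals and cost for one multi-turn conversation."""
    history = 0       # tokens from earlier user/assistant messages
    total_input = 0
    for _ in range(turns):
        # Each turn resends system prompt, documents, history, and the new message.
        total_input += system_tokens + document_tokens + history + user_tokens
        history += user_tokens + assistant_tokens
    total_output = turns * assistant_tokens

    cost = (total_input * input_price_per_m
            + total_output * output_price_per_m) / 1_000_000
    return {
        "total_input_tokens": total_input,
        "total_output_tokens": total_output,
        "avg_input_per_turn": total_input / turns if turns else 0,
        "context_at_end": system_tokens + document_tokens + history,
        "cost_per_conversation": cost,
    }


# Hypothetical conversation: 500-token system prompt, 2,000 tokens of
# documents, 10 turns, 100-token user messages, 300-token responses.
c = conversation_cost(500, 2_000, 10, 100, 300, 3.0, 15.0)
print(c["total_input_tokens"], c["total_output_tokens"], c["context_at_end"])
print(f"cost per conversation: ${c['cost_per_conversation']:.4f}")
```

Here the conversation ends with 6,500 tokens of context but bills 44,000 input tokens in total, because input grows quadratically with the number of turns while output grows only linearly. Multiplying the per-conversation cost by daily conversations gives a daily figure.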

06 — Calculator

Batch Processing Cost

Compare real-time API calls vs batch processing. Batch APIs (like Anthropic's) offer 50% discounts for async workloads.

Model

Total requests to process

Avg input tokens per request

Avg output tokens per request

Batch discount (%)

Anthropic Batch API: 50%. OpenAI Batch API: 50%.

Real-time cost

—

Batch cost

—

You save

—

Total input tokens —

Total output tokens —

Real-time per request —

Batch per request —

Savings % —

Best for: Bulk data enrichment, document processing, large-scale inference, overnight jobs. Batch APIs accept the same request payloads but run asynchronously, with results typically returned within 24 hours. Not suitable for real-time user interactions.
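The real-time versus batch comparison reduces to applying a percentage discount to the same token totals. The sketch below assumes the discount applies equally to input and output tokens; the function name and the $3 and $15 per 1M token prices are hypothetical.

```python
def batch_comparison(total_requests, avg_input_tokens, avg_output_tokens,
                     input_price_per_m, output_price_per_m,
                     batch_discount_pct=50.0):
    """Compare real-time and batch cost for the same workload."""
    per_request = (avg_input_tokens * input_price_per_m
                   + avg_output_tokens * output_price_per_m) / 1_000_000
    realtime_cost = per_request * total_requests
    # Assumption: the discount applies uniformly to input and output tokens.
    batch_cost = realtime_cost * (1 - batch_discount_pct / 100)
    return {
        "total_input_tokens": total_requests * avg_input_tokens,
        "total_output_tokens": total_requests * avg_output_tokens,
        "realtime_per_request": per_request,
        "batch_per_request": per_request * (1 - batch_discount_pct / 100),
        "realtime_cost": realtime_cost,
        "batch_cost": batch_cost,
        "savings": realtime_cost - batch_cost,
    }


# Hypothetical job: 100,000 requests at 1,000 input and 500 output tokens each.
job = batch_comparison(100_000, 1_000, 500, 3.0, 15.0)
print(f"real-time: ${job['realtime_cost']:.2f}")
print(f"batch:     ${job['batch_cost']:.2f}")
print(f"you save:  ${job['savings']:.2f}")
```

Because the discount is a flat percentage, the savings scale linearly with volume; the trade-off is latency, not payload format.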