AI API Pricing Comparison

Every major model's API cost in one table. Prices are in USD per million tokens and are updated monthly.

Gemini 2.0 Flash is the cheapest quality option at $0.10/M input. For flagship models, GPT-4o offers the best price-to-performance ratio. Claude Opus 4 is the most expensive but leads in long-document analysis and coding tasks.

All Models — March 2026

| Model | Provider | Input ($/M tokens) | Output ($/M tokens) | Context (tokens) | Best For |
|---|---|---|---|---|---|
| GPT-4o | OpenAI | $2.50 | $10.00 | 128K | Multimodal, general |
| Claude Opus 4 | Anthropic | $15.00 | $75.00 | 200K | Long docs, analysis |
| Claude Sonnet 4 | Anthropic | $3.00 | $15.00 | 200K | Balanced performance |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1M | Massive context |
| GPT-o3 | OpenAI | $10.00 | $40.00 | 200K | Complex reasoning |
| GPT-o3-mini | OpenAI | $1.10 | $4.40 | 200K | Budget reasoning |
| Grok 3 | xAI | $3.00 | $15.00 | 131K | Real-time data |
| Llama 4 Maverick | Meta | $0.20 | $0.60 | 1M | Cost efficiency |
| Mistral Large 2 | Mistral | $2.00 | $6.00 | 128K | EU compliance |
| Qwen 3 235B | Alibaba | $0.80 | $3.20 | 128K | Multilingual |
| Command R+ | Cohere | $2.50 | $10.00 | 128K | RAG / Enterprise |
| Claude Haiku 4.5 | Anthropic | $0.80 | $4.00 | 200K | Fast & cheap |
| GPT-4o-mini | OpenAI | $0.15 | $0.60 | 128K | Budget tasks |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | 1M | Cheapest quality |
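
To turn the per-million rates above into the cost of a single request, multiply each token count by its rate and divide by one million. The sketch below illustrates this for a few rows of the table; the dictionary layout, the function name `request_cost`, and the example token counts are assumptions made for illustration, not part of any provider's SDK.

```python
# (input, output) prices in USD per million tokens, copied from the table.
# The dictionary layout and function name are illustrative assumptions.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "Claude Opus 4": (15.00, 75.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Gemini 2.0 Flash": (0.10, 0.40),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens and 2,000 output tokens.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 10_000, 2_000):.4f}")
```

Because output rates are several times higher than input rates, workloads that generate long responses are affected most by the output column.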

Key Takeaways

  • Cheapest flagship: GPT-4o at $2.50/$10.00 per M tokens
  • Best budget: Gemini 2.0 Flash at $0.10/$0.40 — 25x cheaper than GPT-4o
  • Largest context: Gemini 2.5 Pro, Gemini 2.0 Flash, and Llama 4 Maverick at 1M tokens (roughly 2,500 pages)
  • Open source winner: Llama 4 Maverick at $0.20/$0.60 via API providers
  • Most expensive: Claude Opus 4 at $15/$75 — justified for complex analysis
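
The comparisons in this list can be checked directly against the table. The following is a minimal sketch, assuming the table's prices are loaded into a plain dictionary; the names `PRICES` and `price_ratio` are hypothetical helpers, not an existing library.

```python
# (input, output) prices in USD per million tokens, copied from the table.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "Claude Opus 4": (15.00, 75.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "GPT-o3": (10.00, 40.00),
    "GPT-o3-mini": (1.10, 4.40),
    "Grok 3": (3.00, 15.00),
    "Llama 4 Maverick": (0.20, 0.60),
    "Mistral Large 2": (2.00, 6.00),
    "Qwen 3 235B": (0.80, 3.20),
    "Command R+": (2.50, 10.00),
    "Claude Haiku 4.5": (0.80, 4.00),
    "GPT-4o-mini": (0.15, 0.60),
    "Gemini 2.0 Flash": (0.10, 0.40),
}


def price_ratio(model_a: str, model_b: str) -> tuple[float, float]:
    """Return how many times more model_a costs than model_b (input, output)."""
    a_in, a_out = PRICES[model_a]
    b_in, b_out = PRICES[model_b]
    return a_in / b_in, a_out / b_out


# Rank by input price to find the two ends of the table.
cheapest = min(PRICES, key=lambda m: PRICES[m][0])
most_expensive = max(PRICES, key=lambda m: PRICES[m][0])

if __name__ == "__main__":
    in_ratio, out_ratio = price_ratio("GPT-4o", "Gemini 2.0 Flash")
    print(f"GPT-4o vs Gemini 2.0 Flash: {in_ratio:.0f}x input, {out_ratio:.0f}x output")
    print(f"Cheapest by input price: {cheapest}")
    print(f"Most expensive by input price: {most_expensive}")
```

Ranking here uses input price only; a workload dominated by generated text might rank models differently if sorted by the output column instead.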

Frequently Asked Questions

What are tokens?

Tokens are chunks of text; one token is roughly 3/4 of an English word. "Hello world" is 2 tokens. Pricing is per million tokens (M), so at $1/M input, processing about 750,000 words costs about $1.
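
The arithmetic behind that estimate can be sketched as follows. It relies on the 3/4-word rule of thumb only; real tokenizers differ by model and by language, so treat the result as a rough budget figure. The function names are illustrative assumptions.

```python
# Rule of thumb from the text: one token is about 3/4 of a word.
# Actual token counts depend on the model's tokenizer and the language.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Estimate the token count for a given number of words."""
    return round(word_count / WORDS_PER_TOKEN)


def input_cost(word_count: int, price_per_million: float) -> float:
    """Estimate the USD input cost for a text of word_count words."""
    return estimate_tokens(word_count) * price_per_million / 1_000_000


if __name__ == "__main__":
    words = 750_000
    print(f"{words:,} words is about {estimate_tokens(words):,} tokens")
    print(f"Cost at $1.00/M input: ${input_cost(words, 1.00):.2f}")
```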

Why is output more expensive than input?

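Generating text (output) requires more computation per token than reading it (input). Input tokens can be processed in parallel in a single pass, while output tokens are produced one at a time, with each new token requiring another pass through the model. That sequential generation is what makes output more GPU-intensive than encoding input.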

Which model offers the best value?

For most use cases, Gemini 2.0 Flash offers the best quality-per-dollar ratio. For tasks requiring top-tier reasoning, Claude Sonnet 4 offers the best balance of capability and cost.