To get a better grasp of per-token pricing, I found a very well-made and very didactic article (in English). It explains what input tokens are (the questions you ask, and so on) and what output tokens are (the AI's answers, which cost more), points out that even punctuation is counted, covers the role of conversation history in an exchange with an AI, and gives examples of the prices charged by the various providers...
One of the problems for companies that use AI is that the total cost is hard to predict unless limits are written into the contract.
See this article from MindStudio:
What Is Token-Based Pricing for AI Models
Understand AI model pricing. Learn how token-based pricing works, why output tokens cost more than input, and how to estimate costs across providers.
MindStudio Team · February 6, 2026
Understanding Tokens: The Currency of AI
When you use AI models like GPT-4, Claude, or Gemini, you’re charged based on tokens. A token is a small chunk of text that AI models process. Think of tokens as the fundamental unit of work in AI systems.
Here’s a simple breakdown:
- 1,000 tokens equals roughly 750 words in English
- The word “hello” is typically one token
- The word “tokenization” might be split into two tokens: “token” and “ization”
- Punctuation marks and spaces count as tokens too
AI models don’t read text the way humans do. They split text into tokens and convert each token into a numerical ID before processing it. Every prompt you send and every response you get consumes tokens. And every token costs money.
How Token-Based Pricing Works
Token-based pricing is straightforward: you pay for what you use. Most AI providers charge separately for input tokens (what you send) and output tokens (what the model generates).
The basic formula looks like this:
Total Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)
For example, if you send a 500-token prompt to GPT-4 and get back a 200-token response:
- Input cost: 500 tokens × $0.01 per 1,000 tokens = $0.005
- Output cost: 200 tokens × $0.03 per 1,000 tokens = $0.006
- Total: $0.011 per request
This seems cheap for a single request. But at 10,000 requests a day (say, 10,000 daily users sending one request each), you’re looking at $110 per day, or $3,300 over a 30-day month. Scale matters.
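The formula and the worked example above can be reproduced in a few lines of Python. This is a minimal sketch. `request_cost` is a hypothetical helper, and the $0.01 / $0.03 per 1,000-token rates are the article's illustrative GPT-4 figures. The daily and monthly totals assume one request per user per day and a 30-day month, which is what the $110 and $3,300 figures imply.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Total Cost = input tokens x input price + output tokens x output price.

    Prices are per 1,000 tokens, as in the article's worked example.
    """
    input_cost = input_tokens / 1000 * input_price_per_1k
    output_cost = output_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost


# The article's example: 500-token prompt, 200-token response.
per_request = request_cost(500, 200, 0.01, 0.03)

# Assumption: 10,000 requests per day (one per user) and a 30-day month.
per_day = per_request * 10_000
per_month = per_day * 30

print(f"per request: ${per_request:.3f}")
print(f"per day:     ${per_day:,.2f}")
print(f"per month:   ${per_month:,.2f}")
```

Keeping the prices as parameters makes it easy to rerun the same estimate with another provider's rates.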
Input vs. Output Token Pricing
Output tokens almost always cost more than input tokens. Here’s why: the model can process all the tokens of a prompt in parallel, but it must generate output one token at a time, running a full pass through the network for each token it produces. This makes generation slower and more expensive to serve.
Typical pricing patterns in January 2026:
- Input tokens: $0.15 to $5.00 per million tokens
- Output tokens: $0.60 to $25.00 per million tokens
- Output tokens typically cost 3-5x more than input tokens
Token Counting Isn’t Universal
Different AI providers count tokens differently. Each model uses its own tokenizer, which means the same text can produce different token counts across providers.
A developer testing three models found the same text produced:
- Model A: 7 tokens
- Model B: 8 tokens
- Model C: 9 tokens
This matters for cost estimation. You can’t assume token counts transfer directly between providers.
Common Tokenization Methods
Most modern AI models use subword tokenization approaches:
- Byte-Pair Encoding (BPE): Used by OpenAI’s GPT models
- WordPiece: Common in Google’s models
- SentencePiece: Used by various open-source models
Each method splits text differently. BPE might handle “unhappiness” as “un” + “happiness”, while another tokenizer might keep it as one unit.
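To make the BPE idea concrete, here is a toy byte-pair-encoding trainer in Python. It is a simplified sketch, not any provider's tokenizer. It works on characters rather than bytes and uses a made-up six-word corpus. It also breaks ties between equally frequent pairs by comparing the pairs themselves, so that the result is deterministic. It repeatedly merges the most frequent adjacent pair of symbols, then applies the learned merges, in order, to a word.

```python
from collections import Counter


def pair_counts(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    counts = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts


def merge_pair(symbols, pair):
    """Replace each occurrence of `pair` in `symbols` with one joined symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges, always taking the most frequent pair."""
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        counts = pair_counts(words)
        if not counts:
            break  # every word is already a single symbol
        # Ties are broken by comparing the pairs themselves, for determinism.
        best = max(counts, key=lambda p: (counts[p], p))
        merges.append(best)
        merged_words = Counter()
        for symbols, freq in words.items():
            merged_words[merge_pair(symbols, best)] += freq
        words = merged_words
    return merges


def tokenize(word, merges):
    """Split a word into characters, then apply the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


corpus = "happy happy happiness unhappy unhappiness unhappy"
for n in (3, 10):
    print(n, tokenize("unhappiness", train_bpe(corpus, n)))
# 3 ['u', 'n', 'happ', 'i', 'n', 'e', 's', 's']
# 10 ['un', 'happ', 'iness']
```

With 3 learned merges, “unhappiness” comes out as 8 tokens. With 10 merges it comes out as 3 (“un”, “happ”, “iness”), which is not the split a human would pick. The same text yields different counts depending on the vocabulary a tokenizer learned. This is a toy version of why providers report different token counts for identical input.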
AI Model Pricing Comparison
Token pricing varies dramatically across providers. As of January 2026, here’s what major models charge:
Budget-Friendly Options
Gemini 2.0 Flash Lite and Gemini 1.5 Flash lead in affordability at $0.08 per million input tokens and $0.30 per million output tokens.
GPT-4o Mini offers strong value at $0.15 input and $0.60 output per million tokens. It delivers GPT-4 level quality at 93% lower cost with multimodal capabilities.
Mid-Range Models
GPT-4o: $2.50 input, $10.00 output per million tokens
Claude 3.5 Sonnet: $3.00 input, $15.00 output per million tokens
Gemini 2.0 Pro: $1.25 input, $5.00 output per million tokens
Premium Models
Claude Opus 4.5: $5.00 input, $25.00 output per million tokens. This model handles complex reasoning tasks and offers a 200K-token context window.
GPT-5 (reasoning models): $15.00 input, $75.00 output per million tokens. These models use extended chain-of-thought processes for advanced problem-solving.
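Applying the cost formula to the prices listed above shows how far apart the tiers are for an identical workload. The sketch below uses the article's per-million-token figures as given, not checked against current provider pages. It assumes a hypothetical workload of 10,000 requests a day, each with 500 input and 200 output tokens. It ignores reasoning tokens, batch discounts and caching.

```python
# Per-million-token prices (USD) as listed in the article, January 2026.
PRICES = {
    "Gemini 2.0 Flash Lite": (0.08, 0.30),
    "GPT-4o Mini": (0.15, 0.60),
    "Gemini 2.0 Pro": (1.25, 5.00),
    "GPT-4o": (2.50, 10.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude Opus 4.5": (5.00, 25.00),
    "GPT-5 (reasoning)": (15.00, 75.00),
}


def daily_cost(input_price, output_price, requests, input_tokens, output_tokens):
    """Daily cost in USD for `requests` identical requests per day."""
    total_in = requests * input_tokens
    total_out = requests * output_tokens
    return (total_in * input_price + total_out * output_price) / 1_000_000


# Hypothetical workload: 10,000 requests/day, 500 input + 200 output tokens each.
costs = {
    model: daily_cost(inp, out, 10_000, 500, 200)
    for model, (inp, out) in PRICES.items()
}
for model, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{model:<22} ${cost:>8.2f}/day")
```

On these figures, the same traffic costs about $1 a day on Gemini 2.0 Flash Lite and $225 a day on the GPT-5 tier. That is a factor of roughly 225, before any reasoning tokens are counted.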
Specialized Pricing
Some providers offer additional pricing tiers:
- Batch API: 50% discount for non-urgent workloads with 24-hour turnaround
- Prompt caching: Cached tokens cost roughly 10x less than regular input tokens
- Reasoning tokens: Separate pricing for internal reasoning steps, often 10-30x more expensive
Vendors also adjust pricing in non-obvious ways via “multiplier tables” rather than raw per-token rates. GitHub Copilot’s new multiplier table, for example, raised effective costs on several models without changing the headline price-per-token. This is a reminder to check how each provider actually bills, not just what they list.
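Here is a rough way to see how the batch and caching discounts change a bill. This is a sketch under stated assumptions:

- cached input tokens are billed at 0.1x the normal input rate (the article's “roughly 10x less”);
- the batch discount is a flat 50% on the whole request;
- the two discounts stack multiplicatively, which not every provider does.

It also ignores any extra charge for writing to the cache. The function name and the 8,000-token cached prefix are hypothetical. The prices are the GPT-4o figures listed in the comparison section.

```python
def discounted_request_cost(input_tokens, output_tokens, input_price, output_price,
                            cached_tokens=0, cache_multiplier=0.1, batch=False):
    """Cost in USD; prices are per million tokens.

    cached_tokens: the part of input_tokens served from the prompt cache,
    billed at cache_multiplier x the normal input price (assumed 0.1).
    batch: if True, apply an assumed flat 50% discount to the whole request.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    cost = (uncached * input_price
            + cached_tokens * input_price * cache_multiplier
            + output_tokens * output_price) / 1_000_000
    return cost * 0.5 if batch else cost


# 10,000 input tokens, of which a hypothetical 8,000 are a reused, cached prefix.
base = discounted_request_cost(10_000, 500, 2.50, 10.00)
cached = discounted_request_cost(10_000, 500, 2.50, 10.00, cached_tokens=8_000)
batched = discounted_request_cost(10_000, 500, 2.50, 10.00, batch=True)
both = discounted_request_cost(10_000, 500, 2.50, 10.00, cached_tokens=8_000, batch=True)
print(f"base ${base:.4f}, cached ${cached:.4f}, batch ${batched:.4f}, both ${both:.4f}")
```

Caching pays off when a large, stable prefix (system prompt, tool definitions) is repeated across requests. Batching fits workloads that can wait for the turnaround window.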
What Affects Token Costs
Several factors influence how many tokens you consume and what you pay:
Prompt Length
Longer prompts consume more input tokens. A detailed system prompt with examples and instructions might use 2,000-5,000 tokens before you even send user input.
Context matters too. If you’re building a chatbot that maintains conversation history, each exchange adds tokens. A 10-turn conversation can easily accumulate 15,000+ tokens.
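Resending the whole history on every turn makes billed input grow much faster than the conversation itself. The sketch below assumes fixed message sizes: a 500-token system prompt, 300-token user messages and 700-token replies. These are illustrative numbers, not measurements, and `conversation_input_tokens` is a hypothetical helper.

```python
def conversation_input_tokens(turns, user_tokens, reply_tokens, system_tokens=0):
    """Input tokens billed on each turn when the full history is resent.

    Assumes every user message and every reply has a fixed size, which is a
    simplification: real conversations vary from turn to turn.
    """
    history = 0
    per_turn = []
    for _ in range(turns):
        # Each request carries the system prompt, all prior exchanges, and the new message.
        prompt = system_tokens + history + user_tokens
        per_turn.append(prompt)
        history += user_tokens + reply_tokens  # this exchange is resent next time
    return per_turn


per_turn = conversation_input_tokens(turns=10, user_tokens=300,
                                     reply_tokens=700, system_tokens=500)
print(per_turn[0], per_turn[-1], sum(per_turn))  # 800 9800 53000
```

The tenth request alone carries 9,800 input tokens, but the ten requests together bill 53,000, because each turn pays again for everything before it. Total billed input therefore grows roughly with the square of the number of turns.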
Response Length
Output tokens often account for a large share of the bill. They cost several times more per token than input, and responses can be much longer than the questions that prompt them. A support chatbot generating 500-word answers produces far more tokens than the brief questions it receives.
Context Window Size
Context windows determine how much information a model can process at once. Larger windows enable more sophisticated analysis, but filling them increases token consumption.
Common context window sizes in 2026:
- Small models: 4K-32K tokens
- Standard models: 128K-200K tokens
- Extended models: 1M-10M tokens
Models with larger context windows often charge more per token, especially for prompts exceeding certain thresholds. Some providers use tiered pricing where tokens 0-128K cost less than tokens 128K-256K.
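The sentence about tiered pricing can be read two ways:

- the whole prompt is billed at the higher rate once it crosses the threshold;
- only the tokens past the threshold are billed at the higher rate.

The sketch below computes both. It uses a hypothetical 128K-token threshold and hypothetical rates of $1.25 and $2.50 per million input tokens. Check which scheme and which rates a given provider actually uses.

```python
THRESHOLD = 128_000
LOW_RATE, HIGH_RATE = 1.25, 2.50  # hypothetical USD per million input tokens


def threshold_cost(prompt_tokens):
    """Whole prompt billed at the higher rate once it exceeds the threshold."""
    rate = HIGH_RATE if prompt_tokens > THRESHOLD else LOW_RATE
    return prompt_tokens * rate / 1_000_000


def marginal_cost(prompt_tokens):
    """Only the tokens beyond the threshold are billed at the higher rate."""
    low = min(prompt_tokens, THRESHOLD)
    high = max(prompt_tokens - THRESHOLD, 0)
    return (low * LOW_RATE + high * HIGH_RATE) / 1_000_000


for n in (100_000, 128_000, 200_000):
    print(f"{n:>7,} tokens: threshold ${threshold_cost(n):.3f}, "
          f"marginal ${marginal_cost(n):.3f}")
```

Below the threshold the two schemes agree. For a 200K-token prompt, the threshold scheme charges $0.50 and the marginal scheme $0.34.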
Language and Script
Non-English text typically requires more tokens. Content that takes a given number of tokens in English might need 20-30% more in languages like Arabic, Chinese, or Hindi.
This happens because most AI models were trained primarily on English text. Their tokenizers are optimized for English word patterns, making other languages less efficient to encode.
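One contributing factor can be seen without any tokenizer library, at least for the byte-level BPE tokenizers used by GPT models. These tokenizers start from UTF-8 bytes, and most non-Latin scripts need two or three bytes per code point. Unless the tokenizer has learned merges for those byte sequences, the text breaks into more pieces. The snippet below only counts bytes; it does not reproduce any model's actual token counts, and the greetings are just examples.

```python
# The same greeting in four scripts; byte-level tokenizers see the UTF-8 bytes.
greetings = {
    "English": "hello",
    "Arabic": "مرحبا",
    "Chinese": "你好",
    "Hindi": "नमस्ते",
}

for language, word in greetings.items():
    n_bytes = len(word.encode("utf-8"))
    print(f"{language:<8} {len(word):>2} code points, {n_bytes:>2} UTF-8 bytes")
```

“hello” is 5 bytes. The Arabic greeting is 10 bytes, the Chinese one 6 bytes for only two characters, and the Hindi one 18.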
Technical Content
Code, mathematical formulas, and technical jargon often tokenize inefficiently. Special characters, indentation, and structured data formats can inflate token counts by 30-40% compared to plain text.
Model Architecture
Different models have different vocabulary sizes, which affects tokenization efficiency. Models with larger vocabularies (like GPT-OSS-120B, with 200,019 tokens) can often represent the same text in fewer tokens than models with smaller vocabularies.
Hidden Token Costs
The tokens you see in your prompts and responses aren’t the only ones you pay for:
System Prompts
Many applications include hidden system prompts that set behavior and context. These prompts can add 500-3,000 tokens to every request.
Tool Definitions
If your AI agent uses tools or functions, each tool definition adds tokens to your context. A chatbot with access to 10 different APIs might consume an extra 2,000-5,000 tokens per request just for tool descriptions.
Retrieval-Augmented Generation (RAG)
RAG systems retrieve relevant information from databases before generating responses. This retrieved context adds 2,000-10,000 tokens per query, depending on your retrieval settings.
Conversation History
Maintaining conversation context means sending previous messages with each new request. A 5-turn conversation might accumulate 8,000-12,000 tokens of history.
Reasoning Tokens
Advanced reasoning models like GPT-5 generate internal reasoning traces before producing final answers. These “thinking tokens” can multiply your costs by 10-30x for complex queries.
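Adding these hidden components to a single request shows how they can outweigh the visible exchange. All the token counts below are assumptions chosen from within the ranges quoted above. The prices are the GPT-4o figures from the comparison section, and reasoning tokens are left out entirely.

```python
# Illustrative per-request token budget; each figure is an assumption
# picked from inside the ranges the article quotes.
hidden_input = {
    "system prompt": 1_500,         # article: 500-3,000
    "tool definitions": 3_000,      # article: 2,000-5,000 for ~10 tools
    "RAG context": 5_000,           # article: 2,000-10,000
    "conversation history": 8_000,  # article: 8,000-12,000 for 5 turns
}
user_message = 200  # visible input (assumed)
answer = 400        # visible output (assumed)
INPUT_PRICE, OUTPUT_PRICE = 2.50, 10.00  # GPT-4o rates listed above, USD per million


def cost(input_tokens, output_tokens):
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


visible_only = cost(user_message, answer)
full = cost(user_message + sum(hidden_input.values()), answer)
print(f"visible tokens only: ${visible_only:.4f}")
print(f"with hidden context: ${full:.4f} ({full / visible_only:.1f}x)")
```

With these assumptions, the hidden context makes the request about 10.7 times more expensive than the user's message and the answer alone.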
...
https://www.mindstudio.ai/blog/token-based-pricing
Mind you, MindStudio itself sells a service giving access to 200 AI models, so its presentation is not necessarily completely objective.