La Fibre
Bistro => Thread started by: alain_p on 01 June 2026 at 22:38:49
-
There are several aspects to the return on investment in AI. Of course, for the semiconductor manufacturers that make GPUs, RAM, SSDs, or even electrical equipment, such as Nvidia, TSMC, Samsung, SK hynix, Micron, AMD..., the boom in AI investment, hundreds of billions of euros, is an extraordinary financial windfall that pays back several times over the investments made to develop new GPUs, convert lines that made DRAM into lines that make HBM RAM, etc.
We know, for example, that given Samsung's huge profits, 5 billion euros in a single quarter, the employees of Samsung's semiconductor division threatened to go on strike and managed to obtain an incredible bonus of €290,000 per person on average:
https://www.bfmtv.com/economie/entreprises/ils-vont-toucher-290-000-euros-chacun-en-moyenne-apres-avoir-menace-de-faire-greve-78-000-employes-de-samsung-electronics-approuvent-l-accord-sur-une-mega-prime-liee-aux-profits-de-l-ia_AD-202605270095.html
It is also an incredible windfall for all the businesses that build data centers or rent out data center services.
For the companies that develop AI models, such as OpenAI, Mistral..., the situation is more mixed. Of course, their valuations are immense, often more than 1,000 billion dollars, but they burn tens of billions per month without, for now, having matching revenues. They all hope to get there in the coming years, but that requires customers willing to pay. And that may prove difficult.
For the GAFAM, the situation is mixed. On one hand, they have data center capacity that they rent to AI companies, for sums that can run into the tens of billions, often with cross-investments (they lend money to AI companies, which then rent their infrastructure). They are also investing hundreds of billions of dollars in new, very power-hungry data centers, hoping to be able to sell that capacity afterwards...
And on the other hand, they are software development companies that encourage their employees to use AI, by showcasing those who use it the most. This led to tokenmaxxing, that is, employees maximizing their AI token usage to look good to their employers.
And that is where some of them started to see that it was costing them a lot, because the token is the billing unit of AI companies.
Admittedly, these tokens are sold at prices that may seem low, around $10 per million tokens, but when employees consume hundreds of billions of them, it ends up representing very significant costs.
For example, Axios published a news item on this subject on May 28, quoting a consultant who said that one of their clients had spent $500 million in a single month on tokens for Claude, Anthropic's AI agent.
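To put these orders of magnitude side by side, here is a quick back-of-the-envelope calculation. It assumes one flat price of $10 per million tokens (the illustrative figure mentioned above); real bills mix input, output and cached tokens at different rates, so treat it as a rough sketch only. The function names are made up for the example.

```python
# Back-of-the-envelope token arithmetic.
# Assumption: a single flat price of $10 per million tokens (illustrative only).
PRICE_PER_MILLION_USD = 10.0


def cost_usd(tokens: int) -> float:
    """Cost in dollars for a given number of tokens at the flat price."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_USD


def tokens_for_budget(budget_usd: float) -> float:
    """Number of tokens that a given budget buys at the flat price."""
    return budget_usd / PRICE_PER_MILLION_USD * 1_000_000


# 100 billion tokens at this price.
print(f"${cost_usd(100_000_000_000):,.0f}")  # $1,000,000

# Token volume implied by a $500 million monthly bill at this price.
print(f"{tokens_for_budget(500_000_000):,.0f} tokens")  # 50,000,000,000,000 tokens
```

In other words, at this flat price a few hundred billion tokens cost a few million dollars, while a $500 million month would imply tens of trillions of tokens (or a higher effective price per token).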
This is yet to be confirmed, but some suspect that the company in question, which would have to be very large, could be Microsoft, which, after introducing Claude into its teams and products (Office...), abruptly decided in mid-May to end its contract with Anthropic at the end of June, which happens to be the end of its fiscal year.
There is a growing feeling that AIs could become more expensive than paying human employees. In software development, the value of AI for speeding up work and boosting productivity seems fairly obvious, but at what cost? In other sectors, it is much less obvious. And the return on investment even less so.
Microsoft has stopped its leaderboards of the employees consuming the most tokens, and Amazon recently did the same:
https://next.ink/brief-article/bruler-des-tokens-nest-pas-travailler-amazon-ferme-son-classement-ia-interne/
See the Axios article:
AI sticker shock hits corporate America
Madison Mills - May 28, 2026 - Technology
Corporate leaders are starting to question whether soaring AI spending is delivering meaningful returns.
Why it matters: Companies that rushed to embrace AI are now confronting ballooning IT costs, uncertain productivity gains and growing employee skepticism.
Driving the news: Microsoft canceled most of its Claude Code licenses, in part over costs, according to The Verge, and Uber's COO said AI costs are getting "harder to justify."
- An AI consultant tells Axios one of their clients recently spent half a billion dollars in a single month after failing to put usage limits on Claude licenses for employees.
- Companies are citing AI's ability to automate jobs as a cause for layoffs, though Anuj Kapur, CEO of CloudBees, told Axios that workforce cuts may simply be "the only lever they can pull" to offset their AI bills.
- Consumer sentiment around AI is also nosediving, and employees are rebelling against the use of the technology at work.
What they're saying: The enterprise is undergoing a "healthy swing" away from AI overuse — or "tokenmaxxing," the push to burn as many AI tokens as possible — Ali Ansari, CEO of model training firm Micro1, told Axios.
- Ansari hopes this correction will push companies toward more efficient AI use.
- While the market views these tools as working equally well across the enterprise, Ansari says "the reality of AI right now is that it only works for coding."
- That disconnect can drive up IT bills without leading to high return on investment in agents, he said.
Friction point: Corporate AI adoption is running into four unique problems.
- Use cases: "Most people default to automating tasks they dislike rather than tasks most valuable to the company," Sophia Velastegui, CEO of Velastegui Ventures and former chief AI officer at Microsoft, told Axios. Instead, they should focus on using AI to drive revenue.
- Costs: One CTO told Axios that employees were using AI models to check the weather. That gets expensive fast: Enterprise AI plans are not truly "all you can eat," and even simple chatbot queries can carry heavy token costs.
- Humans: We are the bottleneck to more efficient adoption, as we're still catching up on AI. Leadership isn't always helping: Throwing AI licenses at the wall and seeing what sticks (or what Velastegui calls the "thousand flowers bloom" approach) isn't leading to tangible returns, she said.
- Data: When enterprises are hesitant to give AI agents unfettered access to proprietary data, those agents become less effective, Josh Pantony, CEO of Boosted.ai, which focuses on AI tools for finance, told Axios.
What we're watching: Whether companies get more disciplined about AI use. Or overcorrect and clamp down.
https://www.axios.com/2026/05/28/ai-spending-roi-enterprise-costs
As for individuals, many have taken out AI subscriptions (I think OpenAI mentioned 44 million subscribers), but mostly on introductory offers, and most are likely to drop out if prices go up and once the hype has passed.
https://intelligence-artificielle.developpez.com/actu/382714/OpenAI-prevoit-une-chute-de-80-pourcent-du-nombre-d-abonnements-a-ChatGPT-Plus-passant-de-44-millions-en-2025-a-9-millions-en-2026-et-espere-echapper-a-ce-desastre-en-augmentant-la-publicite/
-
Hi,
I hope you didn't put your PER (retirement savings plan) into this .... :-\
-
To better understand the pay-per-token model, I found a very well-written, very instructive article, in English, explaining what input tokens are (the questions asked, etc.) and what output tokens are (the AI's answers, which cost more), noting that even punctuation marks are counted, and covering the role of history in an exchange with the AI, examples of prices charged by the various providers...
One of the problems for companies using AI is that the total cost is hard to predict unless limits are written into the contract.
See this article from MindStudio:
What Is Token-Based Pricing for AI Models
Understand AI model pricing. Learn how token-based pricing works, why output tokens cost more than input, and how to estimate costs across providers.
MindStudio Team · February 6, 2026
Understanding Tokens: The Currency of AI
When you use AI models like GPT-4, Claude, or Gemini, you’re charged based on tokens. A token is a small chunk of text that AI models process. Think of tokens as the fundamental unit of work in AI systems.
Here’s a simple breakdown:
- 1,000 tokens equals roughly 750 words in English
- The word “hello” is typically one token
- The word “tokenization” might be split into two tokens: “token” and “ization”
- Punctuation marks and spaces count as tokens too
AI models don’t read text the way humans do. They convert everything into numerical representations called tokens. Every prompt you send and every response you get consumes tokens. And every token costs money.
How Token-Based Pricing Works
Token-based pricing is straightforward: you pay for what you use. Most AI providers charge separately for input tokens (what you send) and output tokens (what the model generates).
The basic formula looks like this:
Total Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)
For example, if you send a 500-token prompt to GPT-4 and get back a 200-token response:
- Input cost: 500 tokens × $0.01 per 1,000 tokens = $0.005
- Output cost: 200 tokens × $0.03 per 1,000 tokens = $0.006
- Total: $0.011 per request
This seems cheap for a single request. But multiply that by 10,000 daily users, and you’re looking at $110 per day, or $3,300 per month. Scale matters.
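The arithmetic above can be reproduced in a few lines. This is only a sketch: the function name is made up, and the daily and monthly figures assume one request per user per day and a 30-day month, which the article does not state explicitly.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Total cost = (input tokens x input price) + (output tokens x output price)."""
    input_cost = input_tokens / 1000 * input_price_per_1k
    output_cost = output_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost


# Figures from the example above: 500 input tokens, 200 output tokens.
per_request = request_cost(500, 200, input_price_per_1k=0.01, output_price_per_1k=0.03)

# Assumptions: 10,000 requests per day (one per user) and a 30-day month.
per_day = per_request * 10_000
per_month = per_day * 30

print(f"per request: ${per_request:.3f}")  # $0.011
print(f"per day:     ${per_day:,.2f}")     # $110.00
print(f"per month:   ${per_month:,.2f}")   # $3,300.00
```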
Input vs. Output Token Pricing
Output tokens almost always cost more than input tokens. Here’s why: generating text requires more computational work than processing it. The model needs to predict each token one at a time, running complex calculations for every word it produces.
Typical pricing patterns in January 2026:
- Input tokens: $0.15 to $5.00 per million tokens
- Output tokens: $0.60 to $25.00 per million tokens
- Output tokens typically cost 3-5x more than input tokens
Token Counting Isn’t Universal
Different AI providers count tokens differently. Each model uses its own tokenizer, which means the same text can produce different token counts across providers.
A developer testing three models found the same text produced:
- Model A: 7 tokens
- Model B: 8 tokens
- Model C: 9 tokens
This matters for cost estimation. You can’t assume token counts transfer directly between providers.
Common Tokenization Methods
Most modern AI models use subword tokenization approaches:
- Byte-Pair Encoding (BPE): Used by OpenAI’s GPT models
- WordPiece: Common in Google’s models
- SentencePiece: Used by various open-source models
Each method splits text differently. BPE might handle “unhappiness” as “un-happiness” while another tokenizer might keep it as one unit.
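To make the idea of subword merging more concrete, here is a toy character-level BPE sketch. It is not the tokenizer of any real provider (production tokenizers work on bytes, pre-split the text and use vocabularies learned from huge corpora); the tiny corpus and all function names are assumptions chosen for illustration.

```python
from collections import Counter
from typing import List, Optional, Tuple

Pair = Tuple[str, str]


def most_frequent_pair(words: List[List[str]]) -> Optional[Pair]:
    """Return the most frequent adjacent symbol pair, or None if no pair exists."""
    pairs: Counter = Counter()
    for symbols in words:
        pairs.update(zip(symbols, symbols[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(symbols: List[str], pair: Pair) -> List[str]:
    """Replace every occurrence of `pair` in `symbols` with the merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def train_bpe(corpus: List[str], num_merges: int) -> List[Pair]:
    """Learn up to `num_merges` merge rules from a list of words."""
    words = [list(word) for word in corpus]
    merges: List[Pair] = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = [merge_pair(symbols, pair) for symbols in words]
    return merges


def tokenize(word: str, merges: List[Pair]) -> List[str]:
    """Split a word into characters, then apply the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


# Hypothetical four-word corpus, just enough to learn a few merges.
merges = train_bpe(["happy", "happiness", "unhappy", "unhappiness"], num_merges=5)
print(merges)
print(tokenize("unhappiness", merges))
```

With a different corpus or a different number of merges, the same word comes out as a different number of tokens, which is the reason token counts do not transfer between providers.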
AI Model Pricing Comparison
Token pricing varies dramatically across providers. As of January 2026, here’s what major models charge:
Budget-Friendly Options
Gemini 2.0 Flash Lite and Gemini 1.5 Flash lead in affordability at $0.08 per million input tokens and $0.30 per million output tokens.
GPT-4o Mini offers strong value at $0.15 input and $0.60 output per million tokens. It delivers GPT-4 level quality at 93% lower cost with multimodal capabilities.
Mid-Range Models
GPT-4o: $2.50 input, $10.00 output per million tokens
Claude 3.5 Sonnet: $3.00 input, $15.00 output per million tokens
Gemini 2.0 Pro: $1.25 input, $5.00 output per million tokens
Premium Models
Claude Opus 4.5: $5.00 input, $25.00 output per million tokens. This model handles complex reasoning tasks and offers 200K token context windows.
GPT-5 (reasoning models): $15.00 input, $75.00 output per million tokens. These models use extended chain-of-thought processes for advanced problem-solving.
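To see what these price gaps mean for a given workload, the listed prices can be plugged into a small comparison. The prices are copied from the figures quoted above; the monthly workload (100 million input tokens, 20 million output tokens) and the function name are hypothetical.

```python
# USD per million tokens as (input, output), copied from the figures quoted above.
PRICES = {
    "GPT-4o Mini": (0.15, 0.60),
    "Gemini 2.0 Pro": (1.25, 5.00),
    "GPT-4o": (2.50, 10.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude Opus 4.5": (5.00, 25.00),
    "GPT-5 (reasoning)": (15.00, 75.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a workload on the given model."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical monthly workload.
INPUT_TOKENS = 100_000_000
OUTPUT_TOKENS = 20_000_000

# Print the models from cheapest to most expensive for this workload.
for model in sorted(PRICES, key=lambda m: workload_cost(m, INPUT_TOKENS, OUTPUT_TOKENS)):
    cost = workload_cost(model, INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{model:<20} ${cost:>10,.2f}")
```

For this workload the same traffic costs $27 on GPT-4o Mini, $450 on GPT-4o and $1,000 on Claude Opus 4.5, so the choice of model matters far more than small prompt optimizations.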
Specialized Pricing
Some providers offer additional pricing tiers:
- Batch API: 50% discount for non-urgent workloads with 24-hour turnaround
- Prompt caching: Cached tokens cost roughly 10x less than regular input tokens
- Reasoning tokens: Separate pricing for internal reasoning steps, often 10-30x more expensive
Vendors also adjust pricing in non-obvious ways via “multiplier tables” rather than raw per-token rates. GitHub Copilot’s new multiplier table, for example, raised effective costs on several models without changing the headline price-per-token, a reminder to check how each provider actually bills, not just what they list.
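As a rough illustration of how the batch and caching discounts listed above change a bill, here is a small sketch. It assumes cached tokens are billed at one tenth of the normal input price ("roughly 10x less") and that batch processing halves the cost; the exact rules, and whether discounts can be combined, vary by provider. The function names and the example workload are hypothetical.

```python
def cached_input_cost(tokens: int, price_per_million: float,
                      cached_fraction: float, cached_price_ratio: float = 0.1) -> float:
    """Input cost in USD when a fraction of the tokens is served from the prompt cache.

    Assumption: cached tokens cost `cached_price_ratio` times the normal input price.
    """
    cached = tokens * cached_fraction
    fresh = tokens - cached
    return (fresh * price_per_million
            + cached * price_per_million * cached_price_ratio) / 1_000_000


def batch_cost(standard_cost: float, discount: float = 0.5) -> float:
    """Cost in USD after a batch discount (assumed to be 50% by default)."""
    return standard_cost * (1 - discount)


# Hypothetical workload: 10M input tokens at $3.00 per million tokens.
full_price = cached_input_cost(10_000_000, 3.00, cached_fraction=0.0)
with_cache = cached_input_cost(10_000_000, 3.00, cached_fraction=0.8)

print(f"no caching:  ${full_price:.2f}")              # $30.00
print(f"80% cached:  ${with_cache:.2f}")              # $8.40
print(f"batch only:  ${batch_cost(full_price):.2f}")  # $15.00
```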
What Affects Token Costs
Several factors influence how many tokens you consume and what you pay:
Prompt Length
Longer prompts consume more input tokens. A detailed system prompt with examples and instructions might use 2,000-5,000 tokens before you even send user input.
Context matters too. If you’re building a chatbot that maintains conversation history, each exchange adds tokens. A 10-turn conversation can easily accumulate 15,000+ tokens.
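The reason history gets expensive is that the whole conversation is resent as input on every turn, so the billed input grows much faster than the conversation itself. A minimal sketch, with hypothetical message sizes (2,000-token system prompt, 150-token user messages, 400-token replies) and a made-up function name:

```python
def billed_input_tokens(turns: int, system_tokens: int,
                        user_tokens: int, reply_tokens: int) -> int:
    """Total input tokens billed over a conversation that resends its history each turn."""
    total = 0
    history = 0  # tokens of earlier user messages and replies
    for _ in range(turns):
        # Each request carries the system prompt, the history so far and the new message.
        total += system_tokens + history + user_tokens
        # The new message and the model's reply become part of the history.
        history += user_tokens + reply_tokens
    return total


# Hypothetical sizes: 2,000-token system prompt, 150-token questions, 400-token replies.
print(billed_input_tokens(1, 2000, 150, 400))   # 2150
print(billed_input_tokens(10, 2000, 150, 400))  # 46250
```

Under these assumptions the tenth request alone carries 7,100 input tokens, and the ten turns together are billed for 46,250 input tokens, even though the user only typed 1,500 tokens in total.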
Response Length
Output token costs dominate most bills because responses are typically longer than prompts. A support chatbot generating 500-word answers consumes far more tokens than the brief questions it receives.
Context Window Size
Context windows determine how much information a model can process at once. Larger windows enable more sophisticated analysis but increase token consumption.
Common context window sizes in 2026:
- Small models: 4K-32K tokens
- Standard models: 128K-200K tokens
- Extended models: 1M-10M tokens
Models with larger context windows often charge more per token, especially for prompts exceeding certain thresholds. Some providers use tiered pricing where tokens 0-128K cost less than tokens 128K-256K.
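Tiered pricing of this kind can be sketched as follows. The thresholds and prices are hypothetical, and the sketch applies the higher rate only to the tokens above each threshold; some providers may instead apply the higher rate to the whole prompt once it crosses the threshold, so check the actual billing rules.

```python
from typing import List, Tuple


def tiered_prompt_cost(prompt_tokens: int, tiers: List[Tuple[int, float]]) -> float:
    """Cost in USD of a prompt under marginal tiered pricing.

    `tiers` is a list of (upper_bound_in_tokens, price_per_million_tokens),
    sorted by upper bound. Each tier's price applies only to the tokens
    that fall between the previous bound and its own bound.
    """
    if prompt_tokens > tiers[-1][0]:
        raise ValueError("prompt exceeds the largest tier")
    cost, lower = 0.0, 0
    for upper, price in tiers:
        if prompt_tokens <= lower:
            break
        billable = min(prompt_tokens, upper) - lower
        cost += billable * price / 1_000_000
        lower = upper
    return cost


# Hypothetical tiers: $1.25 per million up to 128K tokens, $2.50 per million up to 256K.
TIERS = [(128_000, 1.25), (256_000, 2.50)]

print(f"${tiered_prompt_cost(100_000, TIERS):.3f}")  # $0.125
print(f"${tiered_prompt_cost(200_000, TIERS):.3f}")  # $0.340
```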
Language and Script
Non-English text typically requires more tokens. The same meaning expressed in English might need 20-30% more tokens in languages like Arabic, Chinese, or Hindi.
This happens because most AI models were trained primarily on English text. Their tokenizers are optimized for English word patterns, making other languages less efficient to encode.
Technical Content
Code, mathematical formulas, and technical jargon often tokenize inefficiently. Special characters, indentation, and structured data formats can inflate token counts by 30-40% compared to plain text.
Model Architecture
Different models have different vocabulary sizes, which affects tokenization efficiency. Models with larger vocabularies (like GPT-OSS-120B with 200,019 tokens) can represent text more efficiently than models with smaller vocabularies.
Hidden Token Costs
The tokens you see in your prompts and responses aren’t the only ones you pay for:
System Prompts
Many applications include hidden system prompts that set behavior and context. These prompts can add 500-3,000 tokens to every request.
Tool Definitions
If your AI agent uses tools or functions, each tool definition adds tokens to your context. A chatbot with access to 10 different APIs might consume an extra 2,000-5,000 tokens per request just for tool descriptions.
Retrieval-Augmented Generation (RAG)
RAG systems retrieve relevant information from databases before generating responses. This retrieved context adds 2,000-10,000 tokens per query, depending on your retrieval settings.
Conversation History
Maintaining conversation context means sending previous messages with each new request. A 5-turn conversation might accumulate 8,000-12,000 tokens of history.
Reasoning Tokens
Advanced reasoning models like GPT-5 generate internal reasoning traces before producing final answers. These “thinking tokens” can multiply your costs by 10-30x for complex queries.
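Adding up the ranges quoted in this section gives an idea of how much input a single request can carry before the user has typed anything. The token ranges are the ones given above; the $3.00 per million input price is an assumption, and reasoning tokens are left out because they are billed separately.

```python
# (low, high) token ranges per request, taken from the section above.
OVERHEAD_RANGES = {
    "system prompt": (500, 3_000),
    "tool definitions": (2_000, 5_000),
    "retrieved context (RAG)": (2_000, 10_000),
    "conversation history": (8_000, 12_000),
}

# Assumed input price in USD per million tokens (illustrative only).
INPUT_PRICE_PER_MILLION = 3.00

low_tokens = sum(low for low, _ in OVERHEAD_RANGES.values())
high_tokens = sum(high for _, high in OVERHEAD_RANGES.values())

low_cost = low_tokens * INPUT_PRICE_PER_MILLION / 1_000_000
high_cost = high_tokens * INPUT_PRICE_PER_MILLION / 1_000_000

print(f"hidden input per request: {low_tokens:,} to {high_tokens:,} tokens")
print(f"hidden cost per request:  ${low_cost:.4f} to ${high_cost:.4f}")
```

Under these assumptions a request can carry between 12,500 and 30,000 hidden input tokens, which is why a short question can still produce a noticeable charge.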
...
https://www.mindstudio.ai/blog/token-based-pricing
That said, MindStudio itself sells an offering that gives access to 200 AI models, so its presentation is not necessarily completely objective.
-
Hi,
I hope you didn't put your PER (retirement savings plan) into this .... :-\
Well no, I am very wary of the AI speculative bubble.