Source URL: https://www.tomtunguz.com/trillion-token-race/
Source: Tomasz Tunguz
Title: Beyond a Trillion: The Token Race
Feedly Summary: One trillion tokens per day. Is that a lot?
“And when we look narrowly at just the number of tokens served by Foundry APIs, we processed over 100t tokens this quarter, up 5x year over year, including a record 50t tokens last month alone.”
In April, Microsoft shared a statistic revealing that their Foundry product is processing about 1.7t tokens per month.
Yesterday, Vipul shared that Together.ai is processing 2t tokens of open-source inference daily.
In July, Google announced a staggering number:
“At I/O in May, we announced that we processed 480 trillion monthly tokens across our surfaces. Since then we have doubled that number, now processing over 980 trillion monthly tokens, a remarkable increase.”
| Company | Daily Tokens (trillions) | vs Microsoft Foundry | Date |
|---|---:|---:|---|
| Google | 32.7 | 574x | July 2025 |
| Together | 2.0 | 35x | September 2025 |
| Microsoft Foundry | 0.057 | 1x | April 2025 |
Google processes 32.7t daily, 16x more than Together & 574x more than Microsoft Foundry’s April volume.
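The daily figures in the table are simple unit conversions of the quoted monthly numbers. Here is a minimal sketch of that arithmetic, assuming 30-day months and the table's rounding; all inputs are the figures quoted above:

```python
# Back-of-envelope arithmetic behind the table above.
# Assumptions (mine, not the post's): 30-day months, and daily figures
# rounded to the table's precision before computing the ratios.

monthly_tokens_t = {           # trillions of tokens per month, as quoted
    "Google": 980.0,           # July 2025
    "Microsoft Foundry": 1.7,  # April 2025
}

# Convert to daily rates; one decimal for large figures, three for small ones.
daily_tokens_t = {name: round(m / 30, 1 if m > 10 else 3)
                  for name, m in monthly_tokens_t.items()}
daily_tokens_t["Together"] = 2.0   # already reported as a daily figure (Sept 2025)

baseline = daily_tokens_t["Microsoft Foundry"]
for name, per_day in sorted(daily_tokens_t.items(), key=lambda kv: -kv[1]):
    print(f"{name:>17}: {per_day:6.3f}t/day  (~{per_day / baseline:.0f}x Foundry)")
```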
From these figures, we can draw a few hypotheses:
Open-source inference is a single-digit fraction of total inference. It's unclear what fraction of Google's inference tokens come from its open-source models like Gemma. But if we assume Anthropic & OpenAI serve 5t-10t tokens per day¹ & are entirely closed-source, & that Azure is roughly similar in size to Google, then open-source inference is likely around 1-3% of total inference.²
Agents are early. Microsoft’s data point suggests the agents within GitHub, Visual Studio, Copilot Studio, & Microsoft Fabric contribute less than 1% of overall AI inference on Azure.
With Microsoft expected to invest $80 billion & Google $85 billion in AI data center infrastructure this year, each company's AI inference workloads should increase significantly, both from new hardware coming online & from algorithmic improvements.
“Through software optimization alone, we are delivering 90% more tokens for the same GPU compared to a year ago.”
Microsoft is squeezing more digital lemonade from their GPUs, & Google must be doing something similar.
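As a toy illustration of how these gains compound, consider the sketch below: the 1.9x per-GPU figure comes from Microsoft's quote, while the fleet-growth number is a made-up placeholder, not anything either company has disclosed.

```python
# Toy compounding of software and hardware gains in inference capacity.
# software_gain comes from the quote above ("90% more tokens for the same GPU");
# fleet_growth is hypothetical, chosen only to illustrate the multiplication.

software_gain = 1.9   # tokens per GPU vs. a year ago, per Microsoft's quote
fleet_growth = 1.5    # hypothetical: 50% more accelerators deployed

capacity_growth = software_gain * fleet_growth
print(f"Effective inference capacity: ~{capacity_growth:.2f}x year over year")
# -> ~2.85x before any change in demand mix or model efficiency
```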
When will we see the first 10t or 50t AI tokens processed per day? It can’t be far off now.
¹ Estimates from thin air! ↩︎
² Google & Azure at 33t tokens per day each, Together & 5 other neoclouds at roughly 2t tokens per day each, & Anthropic & OpenAI at 5t tokens per day each gives us 88t tokens per day. If we assume 5% of Google's tokens are from open-source models, that's 1.65t tokens per day, or roughly 1.9% of total inference. Again, very rough math. ↩︎
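The arithmetic in the second footnote, reproduced as a quick sketch; every input is one of the rough assumptions stated above, not a reported figure:

```python
# Rough open-source share estimate from the footnote above.
# All inputs are the footnote's assumptions, in trillions of tokens per day.

google = 33
azure = 33
neoclouds = 6 * 2   # Together plus five other neoclouds at ~2t/day each
labs = 2 * 5        # Anthropic & OpenAI at ~5t/day each

total = google + azure + neoclouds + labs   # 88t tokens per day
open_source = 0.05 * google                 # assume 5% of Google's tokens come from open-source models

print(f"Total: ~{total}t/day; open-source: ~{open_source:.2f}t/day "
      f"(~{100 * open_source / total:.1f}% of inference)")
# -> Total: ~88t/day; open-source: ~1.65t/day (~1.9% of inference)
```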
AI Summary and Description: Yes
Summary: The text provides an analysis of the increasing token processing capabilities of several major tech companies involved in AI, specifically focusing on Google, Microsoft, and Together.ai. It highlights significant growth rates in token volumes for AI inference and raises questions about the proportion of open-source versus closed-source inference. This insight is particularly relevant for professionals monitoring trends in AI performance and resource allocation.
Detailed Description: The provided text outlines substantial statistics regarding token processing in the AI domain, highlighting rapid growth in the number of tokens being processed by major companies.
– **Key Statistics:**
  – Microsoft Foundry: over 100 trillion tokens processed in the quarter, up 5x year over year.
  – Google: processing over 980 trillion tokens monthly, double the 480 trillion announced at I/O in May.
  – Together.ai: processing 2 trillion open-source inference tokens daily.
– **Comparative Analysis:**
  – Google processes 32.7 trillion tokens daily, vastly outpacing Together.ai (2 trillion) and Microsoft Foundry (0.057 trillion).
  – The figures suggest that open-source inference may account for only a small fraction (roughly 1-3%) of overall AI token processing.
– **Future Projections:**
  – With substantial AI infrastructure investments projected this year (Microsoft at $80 billion, Google at $85 billion), AI inference workloads are expected to rise significantly through both new hardware and algorithmic improvements.
– **Performance Optimizations:**
  – Microsoft reports delivering 90% more tokens per GPU than a year ago through software optimization alone.
– **Implications for AI Development:**
  – The rapid increase in token processing capacity raises the prospect of new milestones in daily token volume (e.g., 10t or 50t tokens per day), which may reshape the landscape of AI applications and performance metrics.
This text serves as a valuable resource for security and compliance professionals seeking to understand the evolving metrics of AI development, production efficiency, and potential vulnerabilities associated with the increasing scale of AI inference transactions. With the growing reliance on token data, there are clear implications for security measures around data integrity, access controls, and compliance with regulations related to AI and data processing.