Source URL: https://www.tomtunguz.com/trillion-token-race/
Source: Tomasz Tunguz
Title: Beyond a Trillion: The Token Race
Feedly Summary: One trillion tokens per day. Is that a lot?
“And when we look narrowly at just the number of tokens served by Foundry APIs, we processed over 100t tokens this quarter, up 5x year over year, including a record 50t tokens last month alone.”
In April, Microsoft shared a statistic revealing that their Foundry product is processing about 1.7t tokens per month.
Yesterday, Vipul shared that Together.ai is processing 2t tokens of open-source inference daily.
In July, Google announced a staggering number:
“At I/O in May, we announced that we processed 480 trillion monthly tokens across our surfaces. Since then we have doubled that number, now processing over 980 trillion monthly tokens, a remarkable increase.”
| Company | Daily Tokens (trillions) | vs Microsoft Foundry | Date |
|---|---:|---:|---|
| Google | 32.7 | 574x | July 2025 |
| Together | 2.0 | 35x | September 2025 |
| Microsoft Foundry | 0.057 | 1x | April 2025 |
Google processes 32.7t daily, 16x more than Together & 574x more than Microsoft Foundry’s April volume.
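The daily figures in the table are simple unit conversions of the quoted monthly numbers. Here is a minimal sketch of that arithmetic, assuming 30-day months and the table's rounding; all inputs are the figures quoted above:

```python
# Back-of-envelope arithmetic behind the table above.
# Assumptions (mine, not the post's): 30-day months, and daily figures
# rounded to the table's precision before computing the ratios.

monthly_tokens_t = {           # trillions of tokens per month, as quoted
    "Google": 980.0,           # July 2025
    "Microsoft Foundry": 1.7,  # April 2025
}

# Convert to daily rates; one decimal for large figures, three for small ones.
daily_tokens_t = {name: round(m / 30, 1 if m > 10 else 3)
                  for name, m in monthly_tokens_t.items()}
daily_tokens_t["Together"] = 2.0   # already reported as a daily figure (Sept 2025)

baseline = daily_tokens_t["Microsoft Foundry"]
for name, per_day in sorted(daily_tokens_t.items(), key=lambda kv: -kv[1]):
    print(f"{name:>17}: {per_day:6.3f}t/day  (~{per_day / baseline:.0f}x Foundry)")
```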
From these figures, we can draw a few hypotheses:
Open-source inference is a single-digit fraction of total inference. It's unclear what fraction of Google's inference tokens come from its open-source models like Gemma. But if we assume Anthropic & OpenAI serve 5t-10t tokens per day¹ & are entirely closed-source, & that Azure is roughly similar in size to Google, then open-source inference is likely around 1-3% of total inference.²
Agents are early. Microsoft’s data point suggests the agents within GitHub, Visual Studio, Copilot Studio, & Microsoft Fabric contribute less than 1% of overall AI inference on Azure.
With Microsoft expected to invest $80 billion & Google $85 billion in AI data center infrastructure this year, each company's AI inference workloads should increase significantly, both from new hardware coming online & from algorithmic improvements.
“Through software optimization alone, we are delivering 90% more tokens for the same GPU compared to a year ago.”
Microsoft is squeezing more digital lemonade from their GPUs, & Google must be doing something similar.
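As a toy illustration of how these gains compound, consider the sketch below: the 1.9x per-GPU figure comes from Microsoft's quote, while the fleet-growth number is a made-up placeholder, not anything either company has disclosed.

```python
# Toy compounding of software and hardware gains in inference capacity.
# software_gain comes from the quote above ("90% more tokens for the same GPU");
# fleet_growth is hypothetical, chosen only to illustrate the multiplication.

software_gain = 1.9   # tokens per GPU vs. a year ago, per Microsoft's quote
fleet_growth = 1.5    # hypothetical: 50% more accelerators deployed

capacity_growth = software_gain * fleet_growth
print(f"Effective inference capacity: ~{capacity_growth:.2f}x year over year")
# -> ~2.85x before any change in demand mix or model efficiency
```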
When will we see the first 10t or 50t AI tokens processed per day? It can’t be far off now.
¹ Estimates from thin air! ↩︎
² Google & Azure at 33t tokens per day each, Together & 5 other neoclouds at roughly 2t tokens per day each, & Anthropic & OpenAI at 5t tokens per day each gives us 88t tokens per day. If we assume 5% of Google's tokens are from open-source models, that's 1.65t tokens per day, or roughly 1.9% of total inference. Again, very rough math. ↩︎
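The arithmetic in the second footnote, reproduced as a quick sketch; every input is one of the rough assumptions stated above, not a reported figure:

```python
# Rough open-source share estimate from the footnote above.
# All inputs are the footnote's assumptions, in trillions of tokens per day.

google = 33
azure = 33
neoclouds = 6 * 2   # Together plus five other neoclouds at ~2t/day each
labs = 2 * 5        # Anthropic & OpenAI at ~5t/day each

total = google + azure + neoclouds + labs   # 88t tokens per day
open_source = 0.05 * google                 # assume 5% of Google's tokens come from open-source models

print(f"Total: ~{total}t/day; open-source: ~{open_source:.2f}t/day "
      f"(~{100 * open_source / total:.1f}% of inference)")
# -> Total: ~88t/day; open-source: ~1.65t/day (~1.9% of inference)
```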
AI Summary and Description: Yes
Summary: The text provides an analysis of the increasing token processing capabilities of several major tech companies involved in AI, specifically focusing on Google, Microsoft, and Together.ai. It highlights significant growth rates in token volumes for AI inference and raises questions about the proportion of open-source versus closed-source inference. This insight is particularly relevant for professionals monitoring trends in AI performance and resource allocation.
Detailed Description: The provided text outlines substantial statistics regarding token processing in the AI domain, highlighting rapid growth in the number of tokens being processed by major companies.
– **Key Statistics:**
  – Microsoft Foundry: over 100 trillion tokens processed in the quarter, up 5x year over year.
  – Google: processing over 980 trillion tokens monthly, double the 480 trillion announced at I/O in May.
  – Together.ai: processing 2 trillion open-source inference tokens daily.
– **Comparative Analysis:**
  – Google processes 32.7 trillion tokens daily, vastly outpacing Together.ai (2 trillion) and Microsoft Foundry (0.057 trillion).
  – The figures suggest that open-source inference may account for only a small fraction (roughly 1-3%) of overall AI token processing.
– **Future Projections:**
  – With substantial AI infrastructure investments projected this year (Microsoft at $80 billion, Google at $85 billion), AI inference workloads are expected to rise significantly through both new hardware and algorithmic improvements.
– **Performance Optimizations:**
  – Microsoft reports delivering 90% more tokens per GPU than a year ago through software optimization alone.
– **Implications for AI Development:**
  – The rapid increase in token processing capacity raises the prospect of new milestones in daily token volume (e.g., 10t or 50t tokens per day), which may reshape the landscape of AI applications and performance metrics.
This text serves as a valuable resource for security and compliance professionals seeking to understand the evolving metrics of AI development, production efficiency, and potential vulnerabilities associated with the increasing scale of AI inference transactions. With the growing reliance on token data, there are clear implications for security measures around data integrity, access controls, and compliance with regulations related to AI and data processing.