Cloud Blog: How Jina AI built its 100-billion-token web grounding system with Cloud Run GPUs

Jul 11, 2025

—

Source URL: https://cloud.google.com/blog/products/application-development/how-jina-ai-built-its-100-billion-token-web-grounding-system-with-cloud-run-gpus/
Source: Cloud Blog
Title: How Jina AI built its 100-billion-token web grounding system with Cloud Run GPUs

Feedly Summary: Editor’s note: The Jina AI Reader is a specialized tool that transforms raw web content from URLs or local files into a clean, structured, and LLM-friendly format. In this post, Han Xiao details how Cloud Run empowers Jina AI to build a secure, reliable, and massively scalable web scraping system that remains economically viable. This post explores the collaborative innovation, technical hurdles, and breakthrough achievements behind Jina Reader, a web grounding system now processing 100 billion tokens daily.

Since Jina Reader launched in April 2024, its explosive growth to more than 10 million requests and 100 billion tokens daily has confirmed huge demand for reliable, LLM-friendly web content. Jina Reader isn’t just another scraper: it changes how AI systems consume web content by transforming raw, noisy web pages into clean, structured markdown.
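
For a sense of scale, the headline figures can be turned into per-second averages. This is plain arithmetic on the numbers quoted above. Because the request and token counts are lower bounds and the traffic is bursty, real peaks sit well above these averages.

```python
# Back-of-the-envelope averages from the headline figures quoted above.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

requests_per_day = 10_000_000         # "more than 10 million requests"
tokens_per_day = 100_000_000_000      # "100 billion tokens"

tokens_per_request = tokens_per_day / requests_per_day
requests_per_second = requests_per_day / SECONDS_PER_DAY
tokens_per_second = tokens_per_day / SECONDS_PER_DAY

print(f"~{tokens_per_request:,.0f} tokens per request")
print(f"~{requests_per_second:,.1f} requests per second")
print(f"~{tokens_per_second:,.0f} tokens per second")
```

That works out to roughly 10,000 tokens per request, about 116 requests per second, and about 1.16 million tokens per second on average.
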
The core challenge for any AI system processing web data is the “web grounding problem.” Modern websites are a chaotic mix of content, ads, tracking scripts, and dynamic JavaScript, so the noise far outweighs the signal. Traditional scrapers struggle with this complexity, often failing on dynamic single-page applications or producing unusable, ungrounded data for LLMs. Jina Reader’s breakthrough, ReaderLM-v2, is a purpose-built 1.5-billion-parameter language model that intelligently extracts content; trained on millions of documents, it understands web structure beyond simple rules.
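
To make the contrast concrete, here is a minimal sketch of the kind of rule-based extraction that traditional scrapers rely on, written with Python's standard-library `html.parser`. It is not ReaderLM-v2 or Jina Reader code. The tag list in `NOISE_TAGS` and the names `NaiveTextExtractor` and `extract_text` are illustrative assumptions.

```python
from html.parser import HTMLParser

# Hypothetical rule set: tags whose contents a naive extractor simply drops.
NOISE_TAGS = {"script", "style", "nav", "footer", "aside", "noscript"}


class NaiveTextExtractor(HTMLParser):
    """Collects visible text, skipping anything nested inside a noise tag."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside one or more noise tags
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are outside every noise tag.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    parser = NaiveTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


page = ("<html><head><style>body{color:red}</style></head><body>"
        "<nav>Home | About</nav><h1>Title</h1><p>Real content.</p>"
        "<script>track()</script><footer>(c) 2025</footer></body></html>")
print(extract_text(page))
```

On a static page like this one it strips the obvious noise and keeps "Title" and "Real content." On a single-page application whose content is injected by JavaScript, however, the initial HTML may hold nothing but an empty mount point and a script tag, so the extractor returns an empty string. It also flattens headings, lists, and tables into bare lines, which is exactly the structure a markdown-producing model like ReaderLM-v2 is meant to preserve.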

Figure 1: Jina Reader: a sophisticated browser automation system

Cloud Run: The engine behind Jina Reader’s scale
Jina Reader faced the inherent burstiness and unpredictability of web scraping workloads. With traditional virtual machine setups, that meant either costly over-provisioning or critical failures under load. Google Cloud Run became the essential solution, enabling Jina Reader to build a web scraping system that is secure, reliable, massively scalable, and economically viable.

The web grounding app (the browser automation system that scrapes and cleans web content) is hosted on Cloud Run (CPU). It runs full Chrome browser instances.

ReaderLM-v2 is a purpose-built 1.5-billion-parameter language model for HTML-to-markdown conversion that runs on Cloud Run with serverless GPUs.

Cloud Run directly addressed several critical issues:

Optimized Performance: The deep collaboration between Jina Reader and Google Cloud engineering was essential. We jointly optimized container lifecycle management for browser automation, reducing startup times from over 10 seconds to under two seconds through prewarming, optimized images, and intelligent resource allocation. For ReaderLM-v2, Google’s team helped create custom container configurations to efficiently run a 1.5-billion-parameter model on Cloud Run GPUs. The on-demand scaling and fast start capabilities of Cloud Run GPUs were critical in helping optimize model performance, directly impacting our ability to process 100 billion tokens daily.
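
The post does not publish its prewarming code. As a hedged illustration of the general idea only, the sketch below keeps a small pool of pre-created resources so a request can take one that is already started instead of paying the startup cost itself. `WarmPool` and its methods are hypothetical names, and the factory stands in for something expensive such as launching headless Chrome.

```python
import queue
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class WarmPool(Generic[T]):
    """Hypothetical pool that keeps `size` pre-created resources ready."""

    def __init__(self, factory: Callable[[], T], size: int):
        self._factory = factory
        self._ready: "queue.Queue[T]" = queue.Queue()
        for _ in range(size):  # pay the startup cost before any request arrives
            self._ready.put(factory())

    def acquire(self) -> T:
        try:
            return self._ready.get_nowait()  # warm path: already started
        except queue.Empty:
            return self._factory()           # cold path: pool exhausted

    def release_and_replace(self, resource: T,
                            discard: Callable[[T], None]) -> None:
        """Throw away a used resource and top the pool up with a fresh one."""
        discard(resource)
        self._ready.put(self._factory())
```

Discarding each used resource and replacing it, rather than returning it to the pool, matches the ephemeral one-browser-per-request model the post describes. In a production service the replacement would be created in the background so it stays off the request path.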

Figure 2: On-demand AI inference with Cloud Run GPUs (hosting ReaderLM-v2 model)

True Scale-to-Zero Serverless: Cloud Run can run full Chrome browser instances and scale them down to zero when idle, which keeps operations cost-effective. Each request spawns an isolated container with its own headless Chrome, and crucially, these containers disappear when the request is done. This ephemeral nature is vital for processing untrusted web content, mitigating security risks and memory leaks.
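
Cloud Run provides this isolation at the container level. The same throw-away-after-use principle can be shown inside a single process with a context manager that gives each request a private scratch directory (for example, a browser profile) and deletes it even when the handler fails. This is a hypothetical sketch: `ephemeral_profile` and `handle_request` are illustrative names, and no browser is actually launched.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def ephemeral_profile() -> Iterator[Path]:
    """Yield a fresh private directory and delete it afterwards,
    even if the request handler raises."""
    path = Path(tempfile.mkdtemp(prefix="browser-profile-"))
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)


def handle_request(url: str) -> str:
    with ephemeral_profile() as profile:
        # Hypothetical: a real handler would launch headless Chrome with
        # --user-data-dir pointing at `profile` and render `url` here.
        (profile / "visited.txt").write_text(url)
        return (profile / "visited.txt").read_text()
```

Process-level cleanup like this is weaker than a fresh container, which also discards memory, child processes, and anything else a malicious page might leave behind, so it complements rather than replaces container isolation.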

Global Multi-Regional Deployment: Cloud Run’s global presence ensures requests are processed close to both the users and target websites. This significantly minimizes latency and boosts success rates, even against geo-restricted content.
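
Serving from the right region means balancing two distances: client to region, and region to target site. The sketch below picks the region with the lowest combined round-trip time, assuming both have already been measured. The region names are real Google Cloud regions, but the latency figures and the `pick_region` helper are hypothetical, and real traffic routing would normally be handled by a global load balancer rather than application code.

```python
from typing import Mapping


def pick_region(user_rtt_ms: Mapping[str, float],
                target_rtt_ms: Mapping[str, float],
                target_weight: float = 1.0) -> str:
    """Return the region minimizing client->region RTT plus weighted
    region->target RTT. Only regions measured in both maps are considered."""
    # Sorted so that ties resolve deterministically.
    candidates = sorted(user_rtt_ms.keys() & target_rtt_ms.keys())
    if not candidates:
        raise ValueError("no region has both measurements")
    return min(candidates,
               key=lambda r: user_rtt_ms[r] + target_weight * target_rtt_ms[r])


# Hypothetical measurements: a client in Europe fetching a site hosted in Japan.
user = {"europe-west1": 15, "us-central1": 110, "asia-northeast1": 230}
target = {"europe-west1": 240, "us-central1": 150, "asia-northeast1": 5}
print(pick_region(user, target))
```

With these numbers `asia-northeast1` wins (235 ms combined) even though it is farthest from the client, because the target site is local to it. Raising `target_weight` reflects that rendering a page usually makes many round trips to the target but only one to the client.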

Massive & Automatic Scaling: The platform seamlessly scales from a handful to over 1,000 container instances during peak traffic, handling the unpredictable nature of web scraping without manual intervention.
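
A rough way to see why bursts push instance counts past 1,000 is Little's law: requests in flight equal arrival rate times average time in the system. The sketch assumes one request per instance (matching the one-browser-per-container model) and a hypothetical 5-second render time. It is a capacity estimate, not Cloud Run's actual autoscaling algorithm, which the post does not describe.

```python
import math


def instances_needed(requests_per_second: float,
                     avg_latency_s: float,
                     concurrency_per_instance: int = 1) -> int:
    """Estimate instances via Little's law (in-flight = rate x latency),
    divided by how many requests one instance serves at once."""
    in_flight = requests_per_second * avg_latency_s
    return math.ceil(in_flight / concurrency_per_instance)


# Hypothetical: 5-second renders, one request per instance.
print(instances_needed(120, 5.0))  # roughly the average implied by 10M requests/day
print(instances_needed(400, 5.0))  # a burst
```

At 120 requests per second, roughly the average implied by 10 million requests per day, this gives 600 instances. A burst to 400 requests per second needs 2,000. Allowing several concurrent requests per instance divides the count accordingly.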

Economic Viability: With Cloud Run’s pay-per-use model, Jina Reader can offer a generous free tier to end users while maintaining profitability even with substantial monthly usage. This pricing flexibility was fundamental to our widespread adoption.
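
The economics come down to what you are billed for. The sketch compares instance-hours for a fixed fleet sized for the busiest hour against pay-per-use billing that tracks demand hour by hour. The demand profile and function names are invented, and real Cloud Run billing is metered in finer units (CPU and memory per second), so treat this as an illustration of the shape of the saving, not a price quote.

```python
from typing import List


def provisioned_instance_hours(hourly_demand: List[int]) -> int:
    """Fixed fleet sized for the busiest hour and kept running all day."""
    return max(hourly_demand) * len(hourly_demand)


def pay_per_use_instance_hours(hourly_demand: List[int]) -> int:
    """Scale-to-zero: billed only for the instances each hour actually needs."""
    return sum(hourly_demand)


# Hypothetical day: 6 idle hours, 17 steady hours, 1 peak hour.
demand = [0] * 6 + [100] * 17 + [1000]
print(provisioned_instance_hours(demand), pay_per_use_instance_hours(demand))
```

For this example day the fixed fleet uses 24,000 instance-hours while pay-per-use uses 2,700. The same property explains the DDoS episode the post describes: a traffic spike adds cost only for as long as it lasts.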

Resilience and Operational Excellence: During a recent sustained DDoS attack, Cloud Run’s serverless architecture proved invaluable. It scaled up to absorb massive loads (over 100,000 requests per minute), while intelligent rate limiting filtered malicious traffic. Critically, costs returned to normal immediately after the attack subsided due to its scale-to-zero capability. The system has maintained over 99.9% uptime.
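
The post does not say how its rate limiting is implemented. A common technique is a per-client token bucket: each client gets a burst allowance that refills at a steady rate, so a flood from one source is rejected while ordinary users are unaffected. Below is a minimal sketch; `TokenBucket` and `PerClientLimiter` are hypothetical names, and the clock is injectable so the behavior is easy to test.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientLimiter:
    """One bucket per client key (for example an API key or source IP)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client: str) -> bool:
        bucket = self.buckets.get(client)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client] = bucket
        return bucket.allow()
```

Because Cloud Run runs many independent instances, in-memory buckets like these only limit traffic per instance. A shared limit would need a central store or enforcement in front of the service, and the bucket map would need to evict idle clients.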

Conclusion
Building Jina Reader on Google Cloud Run proved that AI capabilities and cloud-native architecture are complementary. Cloud Run’s unique capabilities — serverless GPUs, container isolation, global deployment and scale-to-zero economics — made the architecture possible. Our close partnership demonstrates that deep integration between AI-first systems and modern cloud infrastructure can create capabilities previously thought impossible, enabling us to process 100 billion tokens every day.
You can discover more about Cloud Run GPUs on our product page, and if you want to learn how to host a large language model on Cloud Run, watch this video.


AI Summary and Description

**Summary:** The text discusses Jina AI Reader’s advanced web scraping system, powered by Google Cloud Run. It highlights the challenges of processing chaotic web data and presents technical solutions that enable high-volume token processing. The performance, scalability, and economic viability of the system are underscored, particularly within the context of AI and cloud infrastructure.

**Detailed Description:**

– **Jina AI Reader:** A specialized tool designed for transforming raw web content into structured formats suitable for AI systems, focusing on high scalability and reliability in web scraping.

– **Core Innovation – ReaderLM-v2:**
– This is a custom 1.5-billion-parameter language model designed for HTML-to-markdown conversion and trained extensively on web structure.
– Addresses the “web grounding problem,” making it effective at extracting usable data from the complex HTML structure of modern web pages.

– **Cloud Run Benefits:**
– **Performance Optimization:** Joint efforts between Jina AI and Google Cloud engineers cut container startup times for browser automation from over 10 seconds to under two seconds and produced custom container configurations for running ReaderLM-v2 on Cloud Run GPUs.
– **Scale-to-Zero Serverless Architecture:** Enables cost-effective operation by creating isolated containers for each request that terminate immediately once done, mitigating security risks associated with processing untrusted content.
– **Global Deployment and Low Latency:** Cloud Run’s global presence lets requests be processed close to both users and target websites, minimizing latency and improving success rates, even for geo-restricted content.
– **Automatic Scaling:** The platform scales automatically from a handful to over 1,000 container instances during peak traffic, which is crucial for fluctuating web scraping activity.
– **Economic Viability:** The pay-per-use model allows Jina Reader to maintain a generous free tier for users while ensuring profitability.

– **Resilience Against Threats:**
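– Demonstrated robustness during a sustained DDoS attack: the serverless architecture scaled to absorb over 100,000 requests per minute while rate limiting filtered malicious traffic, and costs returned to normal once the attack subsided. The system has maintained over 99.9% uptime.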

– **Conclusion:**
– The synergy between AI capabilities and cloud-native platforms like Google Cloud Run, combining serverless GPUs, container isolation, global deployment, and scale-to-zero economics, made it possible to process 100 billion tokens per day.
– The discussion reflects broader implications for cloud computing security and capabilities, relevant for professionals in the domain seeking innovative solutions to complex data processing challenges.
