Tag: inference workloads
-
Tomasz Tunguz: Beyond a Trillion: The Token Race
Source URL: https://www.tomtunguz.com/trillion-token-race/
Feedly Summary: One trillion tokens per day. Is that a lot? “And when we look narrowly at just the number of tokens served by Foundry APIs, we processed over 100t tokens this quarter, up 5x year over year, including a record…
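A quick back-of-the-envelope check on the headline question, as a sketch (assuming “100t” means 100 trillion tokens and a quarter of roughly 91 days; both readings are mine, not the post's):

    # Convert "100 trillion tokens per quarter" into a daily rate.
    # Assumptions: "100t" = 100 trillion, and a quarter is ~91 days.
    tokens_per_quarter = 100e12
    days_per_quarter = 91
    tokens_per_day = tokens_per_quarter / days_per_quarter
    print(f"{tokens_per_day:.2e} tokens/day")  # ~1.1e+12, i.e. on the order of a trillion tokens per day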
-
Cloud Blog: GKE network interface at 10: From core connectivity to the AI backbone
Source URL: https://cloud.google.com/blog/products/networking/gke-network-interface-from-kubenet-to-ebpfcilium-to-dranet/
Feedly Summary: It’s hard to believe it’s been over 10 years since Kubernetes first set sail, fundamentally changing how we build, deploy, and manage applications. Google Cloud was at the forefront of the Kubernetes revolution with…
-
AWS News Blog: Announcing Amazon EC2 M4 and M4 Pro Mac instances
Source URL: https://aws.amazon.com/blogs/aws/announcing-amazon-ec2-m4-and-m4-pro-mac-instances/
Feedly Summary: AWS has launched new EC2 M4 and M4 Pro Mac instances based on the Apple M4 Mac mini, offering improved performance over previous generations and featuring up to 48 GB of memory and 2 TB of storage for iOS/macOS development workloads.…
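For context on how EC2 Mac instances are typically provisioned, a minimal boto3 sketch. Mac instances run on Dedicated Hosts, so a host is allocated first and the instance is launched onto it; the "mac-m4.metal" instance-type name and the AMI ID below are placeholder assumptions, not identifiers confirmed by the announcement:

    # Sketch: provisioning an EC2 Mac instance with boto3.
    # EC2 Mac instances require a Dedicated Host; allocate one, then launch onto it.
    # NOTE: "mac-m4.metal" is an assumed instance-type name for the M4 generation,
    # and the ImageId is a placeholder for a macOS AMI in your region.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    host = ec2.allocate_hosts(
        InstanceType="mac-m4.metal",   # assumed name; check the region's offerings
        AvailabilityZone="us-east-1a",
        Quantity=1,
    )
    host_id = host["HostIds"][0]

    ec2.run_instances(
        ImageId="ami-xxxxxxxx",        # placeholder macOS AMI ID
        InstanceType="mac-m4.metal",
        MinCount=1,
        MaxCount=1,
        Placement={"Tenancy": "host", "HostId": host_id},
    )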
-
Cloud Blog: Scaling high-performance inference cost-effectively
Source URL: https://cloud.google.com/blog/products/ai-machine-learning/gke-inference-gateway-and-quickstart-are-ga/
Feedly Summary: At Google Cloud Next 2025, we announced new inference capabilities with GKE Inference Gateway, including support for vLLM on TPUs, Ironwood TPUs, and Anywhere Cache. Our inference solution is based on AI Hypercomputer, a system built on our experience running models like…
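For a sense of what serving traffic through such a gateway looks like, a minimal client sketch assuming the gateway fronts vLLM's OpenAI-compatible API; the gateway address and model name are placeholders for a specific deployment, not values from the post:

    # Sketch: sending a completion request to a vLLM server behind an inference gateway.
    # Assumptions: the gateway exposes vLLM's OpenAI-compatible /v1/completions endpoint;
    # GATEWAY_HOST and the model name are placeholders for your own deployment.
    import requests

    GATEWAY_HOST = "http://GATEWAY_IP"   # external IP or hostname of the gateway

    resp = requests.post(
        f"{GATEWAY_HOST}/v1/completions",
        json={
            "model": "meta-llama/Llama-3.1-8B-Instruct",  # whichever model the pool serves
            "prompt": "Summarize GKE Inference Gateway in one sentence.",
            "max_tokens": 64,
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["text"])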
-
Cloud Blog: Supercharge your AI: GKE inference reference architecture, your blueprint for production-ready inference
Source URL: https://cloud.google.com/blog/topics/developers-practitioners/supercharge-your-ai-gke-inference-reference-architecture-your-blueprint-for-production-ready-inference/
Feedly Summary: The age of AI is here, and organizations everywhere are racing to deploy powerful models to drive innovation, enhance products, and create entirely new user experiences. But moving from a trained model in a…
-
Cloud Blog: Understanding Calendar mode for Dynamic Workload Scheduler: Reserve ML GPUs and TPUs
Source URL: https://cloud.google.com/blog/products/compute/dynamic-workload-scheduler-calendar-mode-reserves-gpus-and-tpus/
Feedly Summary: Organizations need ML compute resources that can accommodate bursty peaks and periodic troughs. That means the consumption models for AI infrastructure need to evolve to be more cost-efficient, provide term flexibility, and support rapid…