Tag: inference workloads

  • Cloud Blog: From LLMs to image generation: Accelerate inference workloads with AI Hypercomputer

    Source URL: https://cloud.google.com/blog/products/compute/ai-hypercomputer-inference-updates-for-google-cloud-tpu-and-gpu/
    Feedly Summary: From retail to gaming, from code generation to customer care, an increasing number of organizations are running LLM-based applications, with 78% of organizations in development or production today. As the number of generative AI applications…

  • Cloud Blog: New GKE inference capabilities reduce costs, tail latency and increase throughput

    Source URL: https://cloud.google.com/blog/products/containers-kubernetes/understanding-new-gke-inference-capabilities/
    Feedly Summary: When it comes to AI, inference is where today’s generative AI models can solve real-world business problems. Google Kubernetes Engine (GKE) is seeing increasing adoption of gen AI inference. For example, customers like HubX run…

  • Cloud Blog: Introducing Ironwood TPUs and new innovations in AI Hypercomputer

    Source URL: https://cloud.google.com/blog/products/compute/whats-new-with-ai-hypercomputer/
    Feedly Summary: Today’s innovation isn’t born in a lab or at a drafting board; it’s built on the bedrock of AI infrastructure. AI workloads have new and unique demands — addressing these requires a finely crafted combination of hardware…

  • Cloud Blog: Google, Bytedance, and Red Hat make Kubernetes generative AI inference aware

    Source URL: https://cloud.google.com/blog/products/containers-kubernetes/google-bytedance-and-red-hat-improve-ai-on-kubernetes/
    Feedly Summary: Over the past ten years, Kubernetes has become the leading platform for deploying cloud-native applications and microservices, backed by an extensive community and boasting a comprehensive feature set for managing distributed systems. Today, we are…

  • Cloud Blog: Introducing Multi-Cluster Orchestrator: Scale your Kubernetes workloads across regions

    Source URL: https://cloud.google.com/blog/products/containers-kubernetes/multi-cluster-orchestrator-for-cross-region-kubernetes-workloads/
    Feedly Summary: Today, we’re excited to announce the public preview of Multi-Cluster Orchestrator, a new service designed to streamline and simplify the management of workloads across Kubernetes clusters. Multi-Cluster Orchestrator lets platform and application teams optimize resource utilization, enhance…

  • Cloud Blog: An inside look into Google’s AI innovations: AI Luminaries at Cloud Next

    Source URL: https://cloud.google.com/blog/topics/google-cloud-next/register-for-ai-luminaries-at-google-cloud-next/
    Feedly Summary: Today, I’m pleased to announce the launch of AI Luminaries programming at the upcoming Google Cloud Next conference. This is a unique forum where some of the top researchers, scientists, and technology leaders in…

  • The Register: Nvidia won the AI training race, but inference is still anyone’s game

    Source URL: https://www.theregister.com/2025/03/12/training_inference_shift/
    Feedly Summary: When it’s all abstracted by an API endpoint, do you even care what’s behind the curtain? Comment With the exception of custom cloud silicon, like Google’s TPUs or Amazon’s Trainium ASICs, the vast majority…

  • Cloud Blog: How to deploy serverless AI with Gemma 3 on Cloud Run

    Source URL: https://cloud.google.com/blog/products/ai-machine-learning/serverless-ai-with-gemma-3-on-cloud-run/
    Feedly Summary: Today, we introduced Gemma 3, a family of lightweight, open models built with the cutting-edge technology behind Gemini 2.0. The Gemma 3 family of models has been designed for speed and portability, empowering developers to…

  • Hacker News: Fire-Flyer File System from DeepSeek

    Source URL: https://github.com/deepseek-ai/3FS
    Feedly Summary: The Fire-Flyer File System (3FS) is a distributed file system designed to optimize AI training and inference workloads by harnessing modern hardware capabilities. The text discusses its performance, a benchmarking approach using the GraySort…