Source URL: https://cloud.google.com/blog/products/containers-kubernetes/container-optimized-compute-delivers-autoscaling-for-autopilot/
Source: Cloud Blog
Title: GKE under the hood: Container-optimized compute delivers fast autoscaling for Autopilot
Feedly Summary: The promise of Google Kubernetes Engine (GKE) is the power of Kubernetes with ease of management, including planning and creating clusters, deploying and managing applications, configuring networking, ensuring security, and scaling workloads. However, when it comes to autoscaling workloads, customers tell us the fully managed mode of operation, GKE Autopilot, hasn’t always delivered the speed and efficiency they need. That’s because autoscaling a Kubernetes cluster involves creating and adding new nodes, which can sometimes take several minutes. That’s just not good enough for high-volume, fast-scale applications.
Enter the container-optimized compute platform for GKE Autopilot, a completely reimagined autoscaling stack for GKE that we introduced earlier this year. In this blog, we take a deeper look at autoscaling in GKE Autopilot, and how to start using the new container-optimized compute platform for your workloads today.
Understanding GKE Autopilot and its scaling challenges
With GKE Autopilot, the fully managed mode of GKE, users are primarily responsible for their applications, while GKE takes on the heavy lifting of managing nodes and node pools, creating new nodes, and scaling applications. With traditional Autopilot, if an application needed to scale quickly, GKE first had to provision new nodes onto which the application could scale, which sometimes took several minutes.
To work around this, users often employed techniques like “balloon pods”: dummy pods with low priority whose only job is to hold onto nodes, helping ensure immediate capacity for demanding scaling use cases. However, this approach is costly, since it holds onto actively unused resources, and it is also difficult to maintain.
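The balloon-pod workaround described above can be sketched as follows. This is a minimal illustration, not from the original post: the names (`balloon-priority`, `balloon`), the replica count, and the resource requests are all hypothetical, and the `pause` container is used only because it idles cheaply while reserving its requested capacity.

```shell
# A PriorityClass with negative priority and no preemption, so balloon pods
# are evicted first when real workloads need the capacity they hold.
kubectl apply -f - <<'EOF'
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: balloon-priority
value: -10
preemptionPolicy: Never
globalDefault: false
description: "Low-priority placeholder pods that reserve spare node capacity."
EOF

# Dummy pods that do nothing but hold onto node capacity. When a real pod
# needs room, the scheduler preempts these balloons instead of waiting for
# a new node to be provisioned.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: balloon
spec:
  replicas: 5
  selector:
    matchLabels: {app: balloon}
  template:
    metadata:
      labels: {app: balloon}
    spec:
      priorityClassName: balloon-priority
      terminationGracePeriodSeconds: 0
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        resources:
          requests: {cpu: "1", memory: 1Gi}
EOF
```

Note the cost trade-off the article points out: those five reserved CPUs are billed whether or not a real workload ever uses them.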
Introducing the container-optimized compute platform
We developed the container-optimized compute platform with a clear mission: to provide you with near-real-time, vertically and horizontally scalable compute capacity precisely when you need it, at optimal price and performance. We achieved this through a fundamental redesign of GKE’s underlying compute stack.
The container-optimized compute platform runs GKE Autopilot nodes on a new family of virtual machines that can be dynamically resized while they are running, starting from fractions of a CPU, all without disrupting workloads. To improve the speed of scaling and resizing, GKE clusters now also maintain a pool of dedicated pre-provisioned compute capacity that can be automatically allocated to workloads in response to increased resource demands. Importantly, because with GKE Autopilot you only pay for the compute capacity you request, this pre-provisioned capacity does not affect your bill.
The result is a flexible compute platform that provides capacity where and when it’s required. Key improvements include:
Up to 7x faster pod scheduling time compared to clusters without container-optimized compute
Significantly improved application response times for applications with autoscaling enabled
Introduction of in-place pod resize in Kubernetes 1.33, allowing for pod resizing without disruption
The container-optimized compute platform also includes a pre-enabled high-performance Horizontal Pod Autoscaler (HPA) profile, which delivers:
Highly consistent horizontal scaling reaction times
Up to 3x faster HPA calculations
Higher resolution metrics, leading to improved scheduling decisions
Accelerated performance for up to 1000 HPA objects
All these features are now available out of the box in GKE Autopilot 1.32 or later.
The power of the new platform is evident in demonstrations where replica counts are rapidly scaled, showcasing how quickly new pods get scheduled.
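A demo of that kind can be reproduced with ordinary kubectl commands. This is a sketch, not the article's own demo: the deployment name `web`, its label, and the replica count are assumptions.

```shell
# Scale a deployment sharply upward to exercise the autoscaling stack.
kubectl scale deployment web --replicas=50

# Watch pods move from Pending to Running. On a container-optimized compute
# cluster, pods should land on the pre-provisioned capacity instead of
# waiting minutes for new nodes to be created.
kubectl get pods -l app=web --watch
```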
How to leverage container-optimized compute
To benefit from these improvements in GKE Autopilot, simply create a new GKE Autopilot cluster based on GKE Autopilot 1.32 or later.
```
gcloud container clusters create-auto <cluster_name> \
    --location=<region> \
    --project=<project_id>
```
If your existing cluster is on an older version, upgrade it to 1.32 or newer to benefit from the container-optimized compute platform’s new features.
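For a cluster that has not yet picked up 1.32 through its release channel, the control plane can be upgraded manually. A hedged sketch, reusing the placeholders from the create command above; the exact patch version available depends on your cluster's release channel:

```shell
# Check the current control-plane version of the cluster.
gcloud container clusters describe <cluster_name> \
    --location=<region> \
    --format="value(currentMasterVersion)"

# Upgrade the control plane to a 1.32+ version; Autopilot then rolls the
# nodes forward automatically.
gcloud container clusters upgrade <cluster_name> \
    --location=<region> \
    --master \
    --cluster-version=1.32
```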
To optimize performance, we recommend that you use the general-purpose compute class for your workload. While the container-optimized compute platform supports various types of workloads, it works best with services that scale gradually and have small resource requests (2 CPU or less), such as web applications.
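A workload matching that recommendation might look like the following. This is an illustrative example, not from the original post: the name `hello-web`, the replica count, and the requests are assumptions (the container image is Google's public `hello-app` sample). In Autopilot, the general-purpose compute class is the default, so no compute-class selector is needed here.

```shell
# A small web service with sub-2-CPU requests, the profile the
# container-optimized compute platform is tuned for.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web
spec:
  replicas: 3
  selector:
    matchLabels: {app: hello-web}
  template:
    metadata:
      labels: {app: hello-web}
    spec:
      containers:
      - name: server
        image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
        ports:
        - containerPort: 8080
        resources:
          requests: {cpu: 500m, memory: 512Mi}
EOF
```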
While the container-optimized compute platform is versatile, it is not currently suitable for specific deployment types:
One-pod-per-node deployments, such as those enforced through pod anti-affinity
Batch workloads
The container-optimized compute platform marks a significant leap forward in improving application autoscaling within GKE and will unlock more capabilities in the future. We encourage you to try it out today in GKE Autopilot.
AI Summary and Description: Yes
Summary: The text discusses the advancements made in Google Kubernetes Engine (GKE) Autopilot’s autoscaling capabilities through the introduction of a container-optimized compute platform. This innovation aims to address previous scaling challenges, enhancing speed and efficiency for Kubernetes workloads, particularly in high-demand environments.
Detailed Description: The text outlines the improvements introduced by Google in GKE Autopilot’s autoscaling functionality aimed at optimizing performance and efficiency. Key points include:
– **Background**: GKE Autopilot simplifies Kubernetes management, allowing users to focus on applications while Google manages the underlying infrastructure.
– **Scaling Challenges**: Traditional GKE Autopilot faced delays during autoscaling, as provisioning new nodes could take several minutes. Users resorted to inefficient workarounds (e.g., “balloon pods”) to mitigate delays, which were costly and hard to maintain.
– **Container-Optimized Compute Platform**:
– The new platform allows for dynamic resizing of virtual machines running GKE Autopilot without disrupting workloads, achieving near-real-time scaling.
– Pre-provisioned compute capacity can be allocated automatically in response to increased resource demands, optimizing cost as users only pay for what they use.
– **Key Improvements**:
– Up to 7x faster pod scheduling compared to prior configurations.
– Enhanced application responsiveness for auto-scaling applications.
– Introduction of in-place pod resizing for seamless adjustments.
– High-performance Horizontal Pod Autoscaler (HPA) delivering up to 3x faster calculations with higher resolution metrics.
– **Utilization Instructions**: To harness these enhancements, users can create new clusters with GKE Autopilot 1.32 or later, or upgrade existing clusters.
– **Limitations**: While versatile, the container-optimized compute platform is not ideal for certain deployment types like single-pod per node or batch workloads.
This update is particularly significant for security, privacy, and compliance professionals as it improves operational efficiency and resource utilization in cloud environments, thereby potentially reducing vulnerabilities related to excessive resource allocation or inefficient management practices. Understanding these advancements can help professionals better secure and optimize their cloud infrastructures.