Cloud Blog: GPUs when you need them: Introducing Flex-start VMs

Sep 25, 2025

—

Source URL: https://cloud.google.com/blog/products/compute/introducing-flex-start-vms-for-the-compute-engine-instance-api/

Feedly Summary: Innovating with AI requires accelerators such as GPUs that can be hard to come by in times of extreme demand. To address this challenge, we offer Dynamic Workload Scheduler (DWS), a service that optimizes access to compute resources when and where you need them. In July, we announced Calendar mode in DWS to provide short-term ML capacity without long-term commitments, and today, we are taking the next step: the general availability (GA) of Flex-start VMs.

Available through the Compute Engine instance API, gcloud CLI, and the Google Cloud console, Flex-start VMs provide a simple and direct way to create single VM instances that can wait for in-demand GPUs. This makes it easy to integrate this flexible consumption option into your existing workflows and schedulers.

What are Flex-start VMs?

Flex-start VMs, powered by Dynamic Workload Scheduler, introduce a highly differentiated consumption model that’s a first among major cloud providers, letting you create single VM instances that provide fair and improved access to GPUs. Flex-start VMs are ideal for defined-duration tasks such as AI model fine-tuning, batch inference, HPC, and research experiments that don’t need to start immediately. In exchange for being flexible with start time, you get two major benefits:

Dramatically improved resource obtainability: By allowing your capacity requests to persist in a queue for up to two hours, you increase the likelihood of securing resources, without needing to build your own retry logic.

Cost-effective pricing: Flex-start VM SKUs offer significant discounts compared to standard on-demand pricing, making cutting-edge accelerators more accessible.

Flex-start VMs can run uninterrupted for a maximum of seven days and consume preemptible quota.

A new way to request capacity

With Flex-start VMs, you can now choose how your request is handled if capacity isn’t immediately available using a single parameter: request-valid-for-duration.

Without this parameter, when creating a VM, Compute Engine makes a short, best-effort attempt (about 90 seconds) to secure your resources. If capacity is available, your VM is provisioned. If not, the request fails quickly with a stockout error. This “fail-fast” behavior is good for workflows where you need an answer immediately so you can make scheduling decisions such as trying another zone or falling back to a different machine type.

However, for workloads that can wait, you can now make a persistent capacity request by setting the request-valid-for-duration flag. Select a period between 90 seconds and 2 hours to instruct Compute Engine to hold your request in a queue. Your VM enters a PENDING state, and the system works to provision your resources as they become available within your specified timeframe. This “get-in-line” approach provides a fair and managed way to access hardware, transforming the user experience from one of repeated manual retries to a simple, one-time request.
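
The choice between the two behaviors can be sketched in a few lines of Python. This is a minimal illustration, not an official client: the helper name `build_params` is hypothetical, the field name `request_valid_for_duration` is taken from the JSON snippet later in this post, and treating the 90-second and 2-hour bounds as inclusive is an assumption.

```python
MIN_WAIT_SECONDS = 90           # lower bound quoted in the post
MAX_WAIT_SECONDS = 2 * 60 * 60  # upper bound quoted in the post (2 hours)


def build_params(wait_seconds=None):
    """Return the "params" object for a VM create request (hypothetical helper).

    wait_seconds=None -> fail-fast: the field is omitted entirely.
    wait_seconds=int  -> queued: ask for the request to be held that long.
    """
    if wait_seconds is None:
        return {}
    # Assumption: both bounds are inclusive.
    if not MIN_WAIT_SECONDS <= wait_seconds <= MAX_WAIT_SECONDS:
        raise ValueError(
            f"wait must be between {MIN_WAIT_SECONDS} and "
            f"{MAX_WAIT_SECONDS} seconds, got {wait_seconds}"
        )
    # The snippet in the post encodes seconds as a string.
    return {"request_valid_for_duration": {"seconds": str(wait_seconds)}}


fail_fast = build_params()      # no queueing requested
queued = build_params(2 * 3600) # wait in the queue for up to two hours
```

Validating the range locally means an out-of-range wait is caught before the request is sent, rather than surfacing as an API error.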

Key features of Flex-start VMs

Flex-start VMs offer several critical features for flexibility and ease of use:

Direct instance API access: Integration with instances.insert, or via a single CLI command, lets you create single Flex-start VMs simply and directly, making it easy to integrate them into custom schedulers and workflows.

Stop and start capabilities: You have full control over your Flex-start VMs. For instance, you can stop an instance to pause billing and release the underlying resources. Then, when you’re ready to resume it, simply issue a start command to place a new capacity request. Once the capacity is successfully provisioned, the seven-day maximum run duration clock resets.

Configurable termination action: For many advanced use cases, you can set instanceTerminationAction = STOP so that when your VM’s seven-day runtime expires, the instance is stopped rather than deleted. This preserves your VM’s configuration, including its IP address and boot disk, saving on setup time for subsequent runs.
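
As a rough sketch of how the run-duration limit and termination action fit together, the following Python builds a `scheduling` object. The helper name `build_scheduling` is hypothetical. The field names `provisioningModel` and `maxRunDuration` come from the JSON snippet in this post and `instanceTerminationAction` from the text above; using `"DELETE"` as the alternative to `"STOP"` is an assumption based on the "stopped rather than deleted" wording.

```python
MAX_RUN_SECONDS = 7 * 24 * 60 * 60  # seven-day maximum run duration (604800 s)


def build_scheduling(max_run_seconds, stop_on_expiry=True):
    """Return a "scheduling" object for a Flex-start VM (hypothetical helper)."""
    if not 0 < max_run_seconds <= MAX_RUN_SECONDS:
        raise ValueError(
            f"max run duration must be between 1 and {MAX_RUN_SECONDS} seconds"
        )
    return {
        "provisioningModel": "FLEX_START",
        "maxRunDuration": {"seconds": str(max_run_seconds)},
        # STOP keeps the VM's configuration when the run duration expires;
        # "DELETE" as the other value is an assumption, not stated in the post.
        "instanceTerminationAction": "STOP" if stop_on_expiry else "DELETE",
    }


three_day_run = build_scheduling(3 * 24 * 60 * 60)
```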

What customers have to say

Customers across research and industry are using Flex-start VMs to improve their access to scarce accelerators.

“Our custom scheduling environment demands precise control and direct API access. The GA of Flex-start in the Instance API, particularly with its stop/start capabilities and configurable termination, is a game-changer. It allows us to seamlessly integrate this new, highly-efficient consumption model into our complex workflows, maximizing both our resource utilization and performance.” – Ragnar Kjørstad, Systems Engineer, Hudson River Trading (HRT)

“For our critical anti-fraud model training, Flex-start VMs are a game-changer. The queuing mechanism gives us reliable access to powerful A100 GPUs, which enhances our development cycles and security offerings at a significant performance-to-cost advantage.” – Bakai Zhamgyrchiev, Head of ML, Oz Forensics

Get started today

Getting started with a queued Flex-start VM is straightforward. You can create one using a gcloud command or directly through the API.

gcloud example (to wait in queue):

[gcloud command not preserved in the source]

API Request Snippet (JSON):

    {
      "name": "my-flex-start-vm",
      "machineType": "zones/us-central1-a/machineTypes/a3-megagpu-8g",
      "scheduling": {
        "provisioningModel": "FLEX_START",
        "maxRunDuration": {
          "seconds": "259200"
        }
      },
      "params": {
        "request_valid_for_duration": {
          "seconds": "7200"
        }
      },
      …
    }
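
The durations in the request snippet are given in seconds, which is easy to misread. The short Python check below parses only the fields shown in the snippet (the elided remainder is left out) and converts the two durations to human-readable form. The helper name `duration` is illustrative only.

```python
import json
from datetime import timedelta

# Only the fields shown in the post's snippet; the elided part is omitted.
SNIPPET = """
{
  "name": "my-flex-start-vm",
  "machineType": "zones/us-central1-a/machineTypes/a3-megagpu-8g",
  "scheduling": {
    "provisioningModel": "FLEX_START",
    "maxRunDuration": {"seconds": "259200"}
  },
  "params": {
    "request_valid_for_duration": {"seconds": "7200"}
  }
}
"""


def duration(field):
    """Convert a {"seconds": "<integer as string>"} object to a timedelta."""
    return timedelta(seconds=int(field["seconds"]))


body = json.loads(SNIPPET)
max_run = duration(body["scheduling"]["maxRunDuration"])
queue_wait = duration(body["params"]["request_valid_for_duration"])

print(max_run)     # 3 days, 0:00:00
print(queue_wait)  # 2:00:00
```

So the example asks for a three-day run, within the seven-day maximum, and the full two-hour queue wait.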

Flex-start VM support in the Instance API is a direct response to the need for more efficient, reliable, and fair access to high-demand AI accelerators. Its queuing mechanism lets you integrate the Flex-start consumption model into your existing workflows easily, so you can spend less time architecting retry loops for on-demand access. To learn more and try Flex-start VMs today, see the documentation and pricing information.

AI Summary and Description: Yes

**Summary:** The text introduces Flex-start VMs, a new service offered by Google Cloud that optimizes GPU access for AI workloads. This service simplifies and improves the efficiency of accessing high-demand compute resources, which is a crucial advancement for organizations engaged in AI and machine learning.

**Detailed Description:**
The text discusses the capabilities and benefits of Flex-start VMs, a novel feature integrated with Google’s Dynamic Workload Scheduler (DWS). It addresses the challenge of acquiring GPUs in times of high demand, providing an innovative solution that allows users to manage their resources more effectively. The following points outline the significance of this development:

– **Flexible Start:** Flex-start VMs let a VM request wait for GPU availability rather than fail immediately.
– **Queue Mechanism:** A capacity request can be held in a queue for up to two hours, which improves the odds of obtaining resources without repeated failed attempts.
– **Cost Efficiency:** Flex-start VM SKUs are discounted compared to standard on-demand pricing, making advanced accelerators more attainable.
– **User Experience Improvement:** A single parameter, “request-valid-for-duration,” replaces repeated manual retries with a one-time request.

**Key Features of Flex-start VMs:**
– **Direct Access via API:** Integration with the Compute Engine instances.insert API and the gcloud CLI makes it easy to create single Flex-start VMs from existing workflows and custom schedulers.
– **Stop and Start Controls:** Users can stop a VM to pause billing and release its resources. Starting it again places a new capacity request, and the seven-day run limit resets once capacity is provisioned.
– **Configurable Termination Action:** Setting instanceTerminationAction to STOP stops the VM instead of deleting it when its maximum run duration expires, preserving its configuration (including IP address and boot disk) for later runs.

**User Testimonials:**
– **Ragnar Kjørstad from Hudson River Trading:** Highlights how Flex-start VMs improved resource utilization and workflow integration.
– **Bakai Zhamgyrchiev from Oz Forensics:** Emphasizes the tool’s reliability in accessing powerful GPUs for critical ML tasks, enhancing development cycles and security offerings.

**Conclusion:**
Flex-start VMs are a significant step for enterprises that want to run AI and machine learning workloads efficiently, because they improve access to scarce GPU resources. The service replaces manual retry loops with a queued request and introduces a fair consumption model suited to defined-duration, compute-intensive tasks. Organizations can improve operational efficiency and reduce costs, which makes this a timely development for security and compliance professionals as well.
