Source URL: https://cloud.google.com/blog/products/containers-kubernetes/run-openais-new-gpt-oss-model-at-scale-with-gke/
Source: Cloud Blog
Title: Run OpenAI’s new gpt-oss model at scale with Google Kubernetes Engine
Feedly Summary: It’s exciting to see OpenAI contribute to the open ecosystem with the release of their new open weights model, gpt-oss. In keeping with our commitment to provide the best platform for open AI innovation, we’re announcing immediate support for deploying gpt-oss-120b and gpt-oss-20b on Google Kubernetes Engine (GKE). To help customers make informed decisions while deploying their infrastructure, we’re publishing detailed benchmarks of gpt-oss-120b on Google Cloud accelerators. You can access them here.

This continues our support for a broad and diverse ecosystem of models, from Google’s own Gemma family, to models like Llama 4, and now, OpenAI’s gpt-oss. We believe that offering choice and leveraging the best of the open community is critical for the future of AI.

Run demanding AI workloads at scale

The gpt-oss models are large and require significant computational power, typically multiple NVIDIA H100 Tensor Core GPUs for optimal performance. This is where Google Cloud and GKE shine. GKE is designed to handle large-scale, mission-critical workloads, providing the scalability and performance needed to serve today’s most demanding models. With GKE, you can leverage Google Cloud’s advanced infrastructure, including both GPU and TPU accelerators, to power your generative AI applications.
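As a rough illustration of the infrastructure involved, the sketch below provisions a GKE cluster with an A3 node pool carrying NVIDIA H100 80GB GPUs. This is not a recipe from the post itself: the cluster name, zone, and node counts are placeholders, and the machine type and driver options should be verified against current GKE documentation.

```
# Illustrative sketch only: names, zone, and sizing are placeholders.
gcloud container clusters create gpt-oss-demo \
  --zone=us-central1-a \
  --num-nodes=1

# A3 high-GPU machines bundle 8 NVIDIA H100 80GB GPUs per node.
gcloud container node-pools create h100-pool \
  --cluster=gpt-oss-demo \
  --zone=us-central1-a \
  --machine-type=a3-highgpu-8g \
  --accelerator=type=nvidia-h100-80gb,count=8,gpu-driver-version=latest \
  --num-nodes=1
```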
Get started in minutes with GKE Inference Quickstart

To make deploying gpt-oss as simple as possible, we have made optimized configurations available through our GKE Inference Quickstart (GIQ) tool. GIQ provides validated, performance-tuned deployment recipes that let you serve state-of-the-art models with just a few clicks. Instead of manually configuring complex YAML files, you can use our pre-built configurations to get up and running quickly.
GKE Inference Quickstart provides benchmarking and quick-start capabilities to ensure you are running with the best possible performance. You can learn more about how to use it in our official documentation.
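Before applying a recipe, it can help to confirm that your cluster actually exposes GPU capacity. A minimal check, assuming the standard GKE setup in which NVIDIA GPUs surface as the nvidia.com/gpu extended resource:

```
# List nodes with their allocatable GPU counts; GKE exposes NVIDIA GPUs
# as the extended resource "nvidia.com/gpu" (the dot is escaped for kubectl).
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
```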
You can also deploy the new OpenAI gpt-oss model via the gcloud CLI: set up access to the weights from the OpenAI organization on Hugging Face, then use the gcloud CLI to deploy the model on a GKE cluster with the appropriate accelerators. For example:
```
gcloud alpha container ai profiles manifests create \
  --model=openai/gpt-oss-20b \
  --model-server=vllm \
  --accelerator-type=nvidia-h100-80gb \
  --target-ntpot-milliseconds=200
```
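From there, one way to wire things together, as a hedged sketch: it assumes the command emits Kubernetes manifests on stdout, that the generated workload reads a Hugging Face token from a secret named hf-secret, and that the resulting service is named vllm-service — all placeholders to check against the generated manifest. vLLM itself serves an OpenAI-compatible API.

```
# Assumption: the generated recipe reads a Hugging Face token from this secret.
kubectl create secret generic hf-secret \
  --from-literal=hf_api_token="${HF_TOKEN}"

# Assumption: manifests are written to stdout; capture and apply them.
gcloud alpha container ai profiles manifests create \
  --model=openai/gpt-oss-20b \
  --model-server=vllm \
  --accelerator-type=nvidia-h100-80gb \
  --target-ntpot-milliseconds=200 > gpt-oss.yaml
kubectl apply -f gpt-oss.yaml

# vLLM exposes an OpenAI-compatible API; the service name is a placeholder.
kubectl port-forward service/vllm-service 8000:8000 &
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-oss-20b", "messages": [{"role": "user", "content": "Hello!"}]}'
```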
Our commitment to open models

Our support for gpt-oss is part of a broader, systematic effort to bring the most popular open models to GKE as soon as they are released, while also giving customers detailed benchmarks to make informed choices when deploying their infrastructure. Get started with the new OpenAI gpt-oss model on GKE today.
AI Summary and Description: Yes
Summary: The announcement highlights OpenAI’s release of the gpt-oss model and Google Cloud’s immediate support for deploying it on Google Kubernetes Engine (GKE). This move aims to facilitate infrastructure deployment for AI workloads, enabling users to leverage powerful computing resources effectively.
Detailed Description: The text discusses OpenAI’s gpt-oss model and its integration into Google Cloud’s infrastructure, specifically through GKE. Key points include:
- **Release of gpt-oss Models**: OpenAI has introduced the gpt-oss models, named gpt-oss-120b and gpt-oss-20b, contributing to an open ecosystem of AI models.
- **Deployment on GKE**:
  - Google Cloud is immediately supporting these models, emphasizing the platform’s commitment to open AI innovation.
  - GKE is designed to manage large-scale, mission-critical workloads, which is essential for running demanding AI models.
- **Computational Requirements**:
  - The gpt-oss models are resource-intensive, likely necessitating multiple NVIDIA H100 Tensor Core GPUs for effective performance.
- **Performance and Scalability**:
  - Google Cloud’s infrastructure supports both GPU and TPU accelerators, providing the scalability and performance needed for generative AI applications.
- **GKE Inference Quickstart (GIQ)**:
  - Google offers GIQ, which provides optimized configurations and validated deployment recipes to simplify the deployment process.
  - Users can quickly start serving state-of-the-art models without manual YAML configuration.
- **Benchmarking and CLI Deployment**:
  - Customers receive detailed benchmarks to assist in making informed decisions regarding model deployment on their infrastructure.
  - Users can deploy the gpt-oss models via the gcloud CLI with a straightforward command.
- **Commitment to Open Models**:
  - Google Cloud’s ongoing support for popular open models like gpt-oss reflects a systematic effort to integrate and provide benchmarks for these resources.
This announcement provides essential insights into how organizations can effectively deploy and scale AI workloads using Google Cloud, reinforcing the significance of infrastructure and support for open model ecosystems in AI development.