model serving – Experimental News Clipping Site

Cloud Blog: Connect Spark data pipelines to Gemini and other AI models with Dataproc ML library

Oct 3, 2025

—

by

Source URL: https://cloud.google.com/blog/products/data-analytics/gemini-and-vertex-ai-for-spark-with-dataproc-ml-library/ Source: Cloud Blog Title: Connect Spark data pipelines to Gemini and other AI models with Dataproc ML library Feedly Summary: Many data science teams rely on Apache Spark running on Dataproc managed clusters for powerful, large-scale data preparation. As these teams look to connect their data pipelines directly to machine learning models,…

Cloud Blog: Scaling high-performance inference cost-effectively

Sep 10, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/products/ai-machine-learning/gke-inference-gateway-and-quickstart-are-ga/ Source: Cloud Blog Title: Scaling high-performance inference cost-effectively Feedly Summary: At Google Cloud Next 2025, we announced new inference capabilities with GKE Inference Gateway, including support for vLLM on TPUs, Ironwood TPUs, and Anywhere Cache. Our inference solution is based on AI Hypercomputer, a system built on our experience running models like…

Simon Willison’s Weblog: Anthropic status: Model output quality

Sep 9, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Sep/9/anthropic-model-output-quality/ Source: Simon Willison’s Weblog Title: Anthropic status: Model output quality Feedly Summary: Anthropic status: Model output quality Anthropic previously reported model serving bugs that affected Claude Opus 4 and 4.1 for 56.5 hours. They’ve now fixed additional bugs affecting “a small percentage" of Sonnet 4 requests for almost a month, plus a…

Cloud Blog: Start and scale your apps faster with improved container image streaming in GKE

Aug 13, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/products/containers-kubernetes/improving-gke-container-image-streaming-for-faster-app-startup/ Source: Cloud Blog Title: Start and scale your apps faster with improved container image streaming in GKE Feedly Summary: In today’s fast-paced cloud-native world, the speed at which your applications can start and scale is paramount. Faster pod startup times mean quicker responses to user demand, more efficient resource utilization, and a…

Cloud Blog: Your guide to taking an open model from discovery to a production-ready endpoint on Vertex AI

Jul 25, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/products/ai-machine-learning/take-an-open-model-from-discovery-to-endpoint-on-vertex-ai/ Source: Cloud Blog Title: Your guide to taking an open model from discovery to a production-ready endpoint on Vertex AI Feedly Summary: Developers building with gen AI are increasingly drawn to open models for their power and flexibility. But customizing and deploying them can be a huge challenge. You’re often left wrestling…

Cloud Blog: Implementing High-Performance LLM Serving on GKE: An Inference Gateway Walkthrough

Jul 16, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/topics/developers-practitioners/implementing-high-performance-llm-serving-on-gke-an-inference-gateway-walkthrough/ Source: Cloud Blog Title: Implementing High-Performance LLM Serving on GKE: An Inference Gateway Walkthrough Feedly Summary: The excitement around open Large Language Models like Gemma, Llama, Mistral, and Qwen is evident, but developers quickly hit a wall. How do you deploy them effectively at scale? Traditional load balancing algorithms fall short, as…

Cloud Blog: Accelerate your gen AI: Deploy Llama4 & DeepSeek on AI Hypercomputer with new recipes

Jun 6, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/products/ai-machine-learning/deploying-llama4-and-deepseek-on-ai-hypercomputer/ Source: Cloud Blog Title: Accelerate your gen AI: Deploy Llama4 & DeepSeek on AI Hypercomputer with new recipes Feedly Summary: The pace of innovation in open-source AI is breathtaking, with models like Meta’s Llama4 and DeepSeek AI’s DeepSeek. However, deploying and optimizing large, powerful models can be complex and resource-intensive. Developers and…

The Register: A closer look at Dynamo, Nvidia’s ‘operating system’ for AI inference

Mar 23, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.theregister.com/2025/03/23/nvidia_dynamo/ Source: The Register Title: A closer look at Dynamo, Nvidia’s ‘operating system’ for AI inference Feedly Summary: GPU goliath claims tech can boost throughput by 2x for Hopper, up to 30x for Blackwell GTC Nvidia’s Blackwell Ultra and upcoming Vera and Rubin CPUs and GPUs dominated the conversation at the corp’s GPU…

Cloud Blog: Accelerate AI/ML workloads using Cloud Storage hierarchical namespace

Mar 17, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/products/storage-data-transfer/cloud-storage-hierarchical-namespace-improves-aiml-checkpointing/ Source: Cloud Blog Title: Accelerate AI/ML workloads using Cloud Storage hierarchical namespace Feedly Summary: As AI and machine learning (ML) workloads continue to grow, the infrastructure supporting them must evolve to meet their unique demands. Here on the Google Cloud Storage team, we’re committed to providing AI/ML practitioners with tools to optimize…

Cloud Blog: Announcing Gemma 3 on Vertex AI

Mar 12, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/products/ai-machine-learning/announcing-gemma-3-on-vertex-ai/ Source: Cloud Blog Title: Announcing Gemma 3 on Vertex AI Feedly Summary: Today, we’re sharing the new Gemma 3 model is available on Vertex AI Model Garden, giving you immediate access for fine-tuning and deployment. You can quickly adapt Gemma 3 to your use case using Vertex AI’s pre-built containers and deployment…

Tag: model serving