model serving – Page 2 – Experimental News Clipping Site

Cloud Blog: Guide: Our top four AI Hypercomputer use cases, reference architectures and tutorials

Mar 7, 2025

—

by

Source URL: https://cloud.google.com/blog/products/ai-machine-learning/ai-hypercomputer-4-use-cases-tutorials-and-guides/ Source: Cloud Blog Title: Guide: Our top four AI Hypercomputer use cases, reference architectures and tutorials Feedly Summary: AI Hypercomputer is a fully integrated supercomputing architecture for AI workloads – and it’s easier to use than you think. In this blog, we break down four common use cases, including reference architectures and…

Cloud Blog: How to calculate your AI costs on Google Cloud

Mar 3, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/topics/cost-management/unlock-the-true-cost-of-enterprise-ai-on-google-cloud/ Source: Cloud Blog Title: How to calculate your AI costs on Google Cloud Feedly Summary: What is the true cost of enterprise AI? As a technology leader and a steward of company resources, understanding these costs isn’t just prudent – it’s essential for sustainable AI adoption. To help, we’ll unveil a comprehensive…

Cloud Blog: Transforming data: How Vodafone Italy modernized its data architecture in the cloud

Feb 26, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/topics/telecommunications/vodafone-italy-modernizes-with-amdocs-and-google-cloud/ Source: Cloud Blog Title: Transforming data: How Vodafone Italy modernized its data architecture in the cloud Feedly Summary: Vodafone Italy is reshaping its operations by building a modernized, AI-ready data architecture on Google Cloud, designed to enhance process efficiency, scalability, and real-time data processing. This transformation, powered by Vodafone Italy’s cloud-based platform…

Cloud Blog: An SRE’s guide to optimizing ML systems with MLOps pipelines

Feb 20, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/products/devops-sre/applying-sre-principles-to-your-mlops-pipelines/ Source: Cloud Blog Title: An SRE’s guide to optimizing ML systems with MLOps pipelines Feedly Summary: Picture this: you’re an Site Reliability Engineer (SRE) responsible for the systems that power your company’s machine learning (ML) services. What do you do to ensure you have a reliable ML service, how do you know…

Simon Willison’s Weblog: Qwen2.5-1M: Deploy Your Own Qwen with Context Length up to 1M Tokens

Jan 26, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://simonwillison.net/2025/Jan/26/qwen25-1m/ Source: Simon Willison’s Weblog Title: Qwen2.5-1M: Deploy Your Own Qwen with Context Length up to 1M Tokens Feedly Summary: Qwen2.5-1M: Deploy Your Own Qwen with Context Length up to 1M Tokens Very significant new release from Alibaba’s Qwen team. Their openly licensed (sometimes Apache 2, sometimes Qwen license, I’ve had trouble keeping…

Hacker News: Max GPU: A new GenAI native serving stac

Dec 17, 2024

—

by

system automation

in Uncategorized

Source URL: https://www.modular.com/blog/introducing-max-24-6-a-gpu-native-generative-ai-platform Source: Hacker News Title: Max GPU: A new GenAI native serving stac Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses the introduction of MAX 24.6 and MAX GPU, a cutting-edge infrastructure platform designed specifically for Generative AI workloads. It emphasizes innovations in AI infrastructure aimed at improving performance…

Cloud Blog: How to deploy and serve multi-host gen AI large open models over GKE

Nov 8, 2024

—

by

system automation

in Uncategorized

Source URL: https://cloud.google.com/blog/products/ai-machine-learning/deploy-and-serve-open-models-over-google-kubernetes-engine/ Source: Cloud Blog Title: How to deploy and serve multi-host gen AI large open models over GKE Feedly Summary: Context As generative AI experiences explosive growth fueled by advancements in LLMs (Large Language Models), access to open models is more critical than ever for developers. Open models are publicly available pre-trained foundational…

Cloud Blog: PyTorch/XLA 2.5: vLLM support and an improved developer experience

Oct 31, 2024

—

by

system automation

in Uncategorized

Source URL: https://cloud.google.com/blog/products/ai-machine-learning/whats-new-with-pytorchxla-2-5/ Source: Cloud Blog Title: PyTorch/XLA 2.5: vLLM support and an improved developer experience Feedly Summary: Machine learning engineers are bullish on PyTorch/XLA, a Python package that uses the XLA deep learning compiler to connect the PyTorch deep learning framework and Cloud TPUs. And now, PyTorch/XLA 2.5 is here, along with a set…

Tag: model serving