Tag: model serving
-
Simon Willison’s Weblog: Qwen2.5-1M: Deploy Your Own Qwen with Context Length up to 1M Tokens
Source URL: https://simonwillison.net/2025/Jan/26/qwen25-1m/ Source: Simon Willison’s Weblog Title: Qwen2.5-1M: Deploy Your Own Qwen with Context Length up to 1M Tokens Feedly Summary: Qwen2.5-1M: Deploy Your Own Qwen with Context Length up to 1M Tokens Very significant new release from Alibaba’s Qwen team. Their openly licensed (sometimes Apache 2, sometimes Qwen license, I’ve had trouble keeping…
-
Hacker News: Max GPU: A new GenAI native serving stac
Source URL: https://www.modular.com/blog/introducing-max-24-6-a-gpu-native-generative-ai-platform Source: Hacker News Title: Max GPU: A new GenAI native serving stac Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses the introduction of MAX 24.6 and MAX GPU, a cutting-edge infrastructure platform designed specifically for Generative AI workloads. It emphasizes innovations in AI infrastructure aimed at improving performance…
-
Cloud Blog: PyTorch/XLA 2.5: vLLM support and an improved developer experience
Source URL: https://cloud.google.com/blog/products/ai-machine-learning/whats-new-with-pytorchxla-2-5/ Source: Cloud Blog Title: PyTorch/XLA 2.5: vLLM support and an improved developer experience Feedly Summary: Machine learning engineers are bullish on PyTorch/XLA, a Python package that uses the XLA deep learning compiler to connect the PyTorch deep learning framework and Cloud TPUs. And now, PyTorch/XLA 2.5 is here, along with a set…