inference engine – Experimental News Clipping Site

Cloud Blog: 150 of the latest AI use cases from leading startups and digital natives

Oct 8, 2025

—

by

Source URL: https://cloud.google.com/blog/topics/startups/150-ai-use-cases-leading-startups-and-digital-natives/ Source: Cloud Blog Title: 150 of the latest AI use cases from leading startups and digital natives Feedly Summary: We recently hosted our first-ever AI Builders Forum, where we gathered with hundreds of the top founders, VCs, advisors, researchers, and teams powering the startups who are building the future with AI. And…

Docker: Docker MCP Toolkit: MCP Servers That Just Work

Sep 29, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.docker.com/blog/mcp-toolkit-mcp-servers-that-just-work/ Source: Docker Title: Docker MCP Toolkit: MCP Servers That Just Work Feedly Summary: Today, we want to highlight Docker MCP Toolkit, a free feature in Docker Desktop that gives you access to more than 200 MCP servers. It’s the easiest and most secure way to run MCP servers locally for your AI…

Cloud Blog: Fast and efficient AI inference with new NVIDIA Dynamo recipe on AI Hypercomputer

Sep 10, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/products/compute/ai-inference-recipe-using-nvidia-dynamo-with-ai-hypercomputer/ Source: Cloud Blog Title: Fast and efficient AI inference with new NVIDIA Dynamo recipe on AI Hypercomputer Feedly Summary: As generative AI becomes more widespread, it’s important for developers and ML engineers to be able to easily configure infrastructure that supports efficient AI inference, i.e., using a trained AI model to make…

The Cloudflare Blog: How Cloudflare runs more AI models on fewer GPUs: A technical deep-dive

Aug 27, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://blog.cloudflare.com/how-cloudflare-runs-more-ai-models-on-fewer-gpus/ Source: The Cloudflare Blog Title: How Cloudflare runs more AI models on fewer GPUs: A technical deep-dive Feedly Summary: Cloudflare built an internal platform called Omni. This platform uses lightweight isolation and memory over-commitment to run multiple AI models on a single GPU. AI Summary and Description: Yes Summary: The text discusses…

The Cloudflare Blog: How we built the most efficient inference engine for Cloudflare’s network

Aug 27, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://blog.cloudflare.com/cloudflares-most-efficient-ai-inference-engine/ Source: The Cloudflare Blog Title: How we built the most efficient inference engine for Cloudflare’s network Feedly Summary: Infire is an LLM inference engine that employs a range of techniques to maximize resource utilization, allowing us to serve AI models more efficiently with better performance for Cloudflare workloads. AI Summary and Description:…

Docker: Powering Local AI Together: Docker Model Runner on Hugging Face

Jul 16, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.docker.com/blog/docker-model-runner-on-hugging-face/ Source: Docker Title: Powering Local AI Together: Docker Model Runner on Hugging Face Feedly Summary: At Docker, we always believe in the power of community and collaboration. It reminds me of what Robert Axelrod said in The Evolution of Cooperation: “The key to doing well lies not in overcoming others, but in…

Docker: Why Docker Chose OCI Artifacts for AI Model Packaging

Jun 18, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.docker.com/blog/why-docker-chose-oci-artifacts-for-ai-model-packaging/ Source: Docker Title: Why Docker Chose OCI Artifacts for AI Model Packaging Feedly Summary: As AI development accelerates, developers need tools that let them move fast without having to reinvent their workflows. Docker Model Runner introduces a new specification for packaging large language models (LLMs) as OCI artifacts — a format developers…

Docker: Behind the scenes: How we designed Docker Model Runner and what’s next

Jun 18, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.docker.com/blog/behind-the-scenes-how-we-designed-docker-model-runner-and-whats-next/ Source: Docker Title: Behind the scenes: How we designed Docker Model Runner and what’s next Feedly Summary: The last few years have made it clear that AI models will continue to be a fundamental component of many applications. The catch is that they’re also a fundamentally different type of component, with complex…

Cloud Blog: Accelerate your gen AI: Deploy Llama4 & DeepSeek on AI Hypercomputer with new recipes

Jun 6, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/products/ai-machine-learning/deploying-llama4-and-deepseek-on-ai-hypercomputer/ Source: Cloud Blog Title: Accelerate your gen AI: Deploy Llama4 & DeepSeek on AI Hypercomputer with new recipes Feedly Summary: The pace of innovation in open-source AI is breathtaking, with models like Meta’s Llama4 and DeepSeek AI’s DeepSeek. However, deploying and optimizing large, powerful models can be complex and resource-intensive. Developers and…

Cloud Blog: Introducing the next generation of AI inference, powered by llm-d

May 20, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/products/ai-machine-learning/enhancing-vllm-for-distributed-inference-with-llm-d/ Source: Cloud Blog Title: Introducing the next generation of AI inference, powered by llm-d Feedly Summary: As the world transitions from prototyping AI solutions to deploying AI at scale, efficient AI inference is becoming the gating factor. Two years ago, the challenge was the ever-growing size of AI models. Cloud infrastructure providers…

Tag: inference engine