Tag: Huggingface
-
Simon Willison’s Weblog: Load Llama-3.2 WebGPU in your browser from a local folder
Source URL: https://simonwillison.net/2025/Sep/8/webgpu-local-folder/#atom-everything
Feedly Summary: Inspired by a comment on Hacker News I decided to see if it was possible to modify the transformers.js-examples/tree/main/llama-3.2-webgpu Llama 3.2 chat demo (online here,…
-
Cloud Blog: Scalable AI starts with storage: Guide to model artifact strategies
Source URL: https://cloud.google.com/blog/topics/developers-practitioners/scalable-ai-starts-with-storage-guide-to-model-artifact-strategies/
Feedly Summary: Managing large model artifacts is a common bottleneck in MLOps. Baking models into container images leads to slow, monolithic deployments, and downloading them at startup introduces significant delays. This guide explores a better way: decoupling your…
-
Simon Willison’s Weblog: My 2.5 year old laptop can write Space Invaders in JavaScript now
Source URL: https://simonwillison.net/2025/Jul/29/space-invaders/
Feedly Summary: I wrote about the new GLM-4.5 model family yesterday – new open weight (MIT licensed) models from Z.ai in China which their benchmarks claim score highly in coding even against models such as…
-
Cloud Blog: Implementing High-Performance LLM Serving on GKE: An Inference Gateway Walkthrough
Source URL: https://cloud.google.com/blog/topics/developers-practitioners/implementing-high-performance-llm-serving-on-gke-an-inference-gateway-walkthrough/
Feedly Summary: The excitement around open Large Language Models like Gemma, Llama, Mistral, and Qwen is evident, but developers quickly hit a wall. How do you deploy them effectively at scale? Traditional load balancing algorithms fall short, as…
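The claim that traditional load balancing falls short for LLM serving can be illustrated with a toy simulation of my own (not code from the walkthrough): per-request cost varies by orders of magnitude with prompt and output length, so spreading *requests* evenly, as round-robin does, does not spread *work* evenly, while a "least outstanding work" policy keeps replicas balanced. The cost distribution and replica count below are invented for illustration.

```python
import random

def round_robin(costs, replicas):
    """Assign request i to replica i % replicas, ignoring request cost."""
    load = [0] * replicas
    for i, c in enumerate(costs):
        load[i % replicas] += c
    return load

def least_loaded(costs, replicas):
    """Assign each request to the replica with the least queued work."""
    load = [0] * replicas
    for c in costs:
        load[load.index(min(load))] += c
    return load

def imbalance(load):
    """Gap between the busiest and idlest replica."""
    return max(load) - min(load)

if __name__ == "__main__":
    rng = random.Random(0)
    # Skewed per-request costs (think: tokens to generate) -- mostly
    # cheap requests with an occasional very expensive one.
    costs = [rng.choice([1, 1, 1, 2, 50]) for _ in range(1000)]
    print("round-robin imbalance:", imbalance(round_robin(costs, 4)))
    print("least-loaded imbalance:", imbalance(least_loaded(costs, 4)))
```

The greedy least-loaded policy bounds the final imbalance by the cost of a single request, whereas round-robin's imbalance grows with the variance of the cost distribution.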
-
Cloud Blog: How to enable real time semantic search and RAG applications with Dataflow ML
Source URL: https://cloud.google.com/blog/topics/developers-practitioners/create-and-retrieve-embeddings-with-a-few-lines-of-dataflow-ml-code/
Feedly Summary: Embeddings are a cornerstone of modern semantic search and Retrieval Augmented Generation (RAG) applications. In short, they enable applications to understand and interact with information on a deeper, conceptual level. In this post,…
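The core idea the post builds on can be sketched in a few lines. This is my own toy example, not Dataflow ML code: the vectors are hand-made stand-ins for real model output (a real pipeline would call an embedding model), but they show how cosine similarity lets semantically related texts rank higher than unrelated ones even without shared keywords.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hand-made 3-d "embeddings" standing in for a real model's output.
docs = {
    "feline pet":   [0.9, 0.1, 0.0],
    "house cat":    [0.8, 0.2, 0.1],
    "stock market": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of the query "cat"

# Rank documents nearest-first; the cat-related texts beat "stock market"
# despite sharing no words with the query string.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)
```

In a production pipeline the embedding call and the nearest-neighbour lookup are exactly the steps Dataflow ML and a vector store take over; the geometry stays the same.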
-
Simon Willison’s Weblog: moonshotai/Kimi-K2-Instruct
Source URL: https://simonwillison.net/2025/Jul/11/kimi-k2/#atom-everything
Feedly Summary: Colossal new open weights model release today from Moonshot AI, a two-year-old Chinese AI lab with a name inspired by Pink Floyd’s album The Dark Side of the Moon. My HuggingFace storage calculator says the repository is 958.52 GB. It’s a…
-
Cisco Security Blog: Securing an Exponentially Growing (AI) Supply Chain
Source URL: https://feedpress.me/link/23535/17085587/securing-an-exponentially-growing-ai-supply-chain
Feedly Summary: Foundation AI’s Cerberus is a 24/7 guard for the AI supply chain, analyzing models as they enter HuggingFace and sharing results with Cisco Security products.
AI Summary and Description: Yes
Summary: Foundation AI’s Cerberus introduces a continuous monitoring…
-
Simon Willison’s Weblog: model.yaml
Source URL: https://simonwillison.net/2025/Jun/21/model-yaml/#atom-everything
Feedly Summary: From their GitHub repo it looks like this effort quietly launched a couple of months ago, driven by the LM Studio team. Their goal is to specify an “open standard for defining cross-platform, composable AI models”. A model can be defined using a…
-
Simon Willison’s Weblog: Comma v0.1 1T and 2T – 7B LLMs trained on openly licensed text
Source URL: https://simonwillison.net/2025/Jun/7/comma/#atom-everything
Feedly Summary: It’s been a long time coming, but we finally have some promising LLMs to try out which are trained entirely on openly licensed text! EleutherAI released the Pile four and a half…
-
Cloud Blog: Accelerate your gen AI: Deploy Llama4 & DeepSeek on AI Hypercomputer with new recipes
Source URL: https://cloud.google.com/blog/products/ai-machine-learning/deploying-llama4-and-deepseek-on-ai-hypercomputer/
Feedly Summary: The pace of innovation in open-source AI is breathtaking, with models like Meta’s Llama4 and DeepSeek AI’s DeepSeek. However, deploying and optimizing large, powerful models can be complex and resource-intensive. Developers and…