Simon Willison’s Weblog: Structured Generation w/ SmolLM2 running in browser & WebGPU

Source URL: https://simonwillison.net/2024/Nov/29/structured-generation-smollm2-webgpu/#atom-everything
Source: Simon Willison’s Weblog
Title: Structured Generation w/ SmolLM2 running in browser & WebGPU

Feedly Summary: Structured Generation w/ SmolLM2 running in browser & WebGPU
Extraordinary demo by Vaibhav Srivastav. Here’s Hugging Face’s SmolLM2-1.7B-Instruct running directly in a web browser (using WebGPU, so requires Chrome for the moment) demonstrating structured text extraction, converting a text description of an image into a structured GitHub issue defined using JSON schema.

The page loads 924.8MB of model data (according to this script to sum up files in window.caches) and performs everything in-browser. I did not know a model this small could produce such useful results.
Here’s the source code for the demo. It’s around 200 lines of code, 50 of which are the JSON schema describing the data to be extracted.
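The demo's actual schema isn't reproduced in this post, but a GitHub-issue extraction schema along those lines might look like the following sketch, written as a Python dict for brevity (the demo itself is JavaScript, and every field name here is illustrative, not taken from the demo):

```python
import json

# Hypothetical sketch of a JSON schema for extracting a GitHub issue.
# The demo's real schema (~50 lines of JavaScript) is longer and more detailed.
issue_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "Short issue title"},
        "labels": {
            "type": "array",
            "items": {"type": "string", "enum": ["bug", "enhancement", "question"]},
        },
        "body": {"type": "string", "description": "Markdown issue body"},
    },
    "required": ["title", "body"],
}

# The schema is handed to the generation library as plain JSON text.
print(json.dumps(issue_schema, indent=2))
```

A schema like this is what lets a small model produce machine-parseable output: the structure is enforced by the decoder, so the model only has to fill in the values.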
The real secret sauce here is the web-llm library by MLC. This library has made loading and executing prompts through LLMs in the browser shockingly easy, and it recently incorporated support for MLC’s XGrammar library (also available in Python), which implements both JSON schema and EBNF-based structured output guidance.
Via @reach-vb.hf.co
Tags: webassembly, hugging-face, webgpu, generative-ai, mlc, ai, llms

AI Summary and Description: Yes

Summary: The text discusses an impressive demonstration of the SmolLM2 model running in a web browser using WebGPU, showcasing its capability for structured text extraction. This is relevant for professionals in AI security and infrastructure security because fully client-side inference changes where data is processed and which external services must be trusted.

Detailed Description:
– The demonstration by Vaibhav Srivastav features Hugging Face’s SmolLM2-1.7B-Instruct executing directly in a web browser via WebGPU (currently Chrome-only). Running the model locally shifts significant computation onto the client and removes the dependency on external inference servers.
– Key functionalities of the demo include:
  – **Structured Text Extraction**: The model converts textual descriptions of images into structured GitHub issues defined through a JSON schema, demonstrating the practical applicability of AI in software development and issue tracking.
  – **In-Browser Execution**: Loading 924.8MB of model data and running inference entirely in-browser points toward more decentralized AI solutions, reducing the data exposure risks typically associated with cloud computing.
  – **Compact Codebase**: The demo’s source code comprises around 200 lines, 50 of which are the JSON schema guiding the data extraction.
  – **Integration of Libraries**: The web-llm library by MLC simplifies loading and executing LLMs in the browser, with support for both JSON schema and EBNF-based structured output guidance.
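Even with constrained decoding guaranteeing well-formed JSON, downstream code typically still parses and sanity-checks the extracted issue before using it. A minimal stdlib sketch of that step (field names are hypothetical, not the demo's actual schema):

```python
import json

def check_issue(raw: str) -> dict:
    """Parse model output and enforce a couple of required string fields.
    Field names here are illustrative, not taken from the demo's schema."""
    issue = json.loads(raw)
    for field in ("title", "body"):
        if not isinstance(issue.get(field), str):
            raise ValueError(f"missing or non-string field: {field}")
    return issue

sample = '{"title": "Crash on image upload", "body": "Steps to reproduce: ..."}'
print(check_issue(sample)["title"])  # prints: Crash on image upload
```

Keeping this validation on the consuming side means the browser demo's output can be fed into an issue tracker's API with the same guarantees as any other untrusted JSON input.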

This development underscores the importance of compliant and secure infrastructures as AI models become increasingly capable of running directly in browsers, which may reduce vulnerabilities associated with cloud environments and make powerful AI tools more accessible to developers.

– **Tags to Note**: The mention of WebAssembly, Hugging Face, WebGPU, generative AI, MLC, AI, and LLMs denotes the convergence of cutting-edge technologies that are shaping the future of AI and its applications in a secure digital landscape.

In summary, this content is significant for security and compliance professionals, illuminating emerging trends in the secure deployment of AI within web-based environments and the potential enhancement of development processes through structured data extraction capabilities.