Source URL: https://simonwillison.net/2025/Feb/5/s1-the-6-r1-competitor/
Source: Simon Willison’s Weblog
Title: S1: The $6 R1 Competitor?
Feedly Summary: S1: The $6 R1 Competitor?
Tim Kellogg shares his notes on a new paper, s1: Simple test-time scaling, which describes an inference-scaling model fine-tuned on top of Qwen2.5-32B-Instruct for just $6 – the cost of 26 minutes on 16 NVIDIA H100 GPUs.
Tim highlights the most exciting result:
After sifting their dataset of 56K examples down to just the best 1K, they found that the core 1K is all that’s needed to achieve o1-preview performance on a 32B model.
The paper describes a technique called “Budget forcing”:
To enforce a minimum, we suppress the generation of the end-of-thinking token delimiter and optionally append the string “Wait” to the model’s current reasoning trace to encourage the model to reflect on its current generation
That’s the same trick Theia Vogel described a few weeks ago.
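Here’s a minimal sketch of how budget forcing could be implemented with Hugging Face transformers, assuming the model emits an explicit end-of-thinking delimiter. The delimiter string and generation settings below are placeholder assumptions, not the paper’s exact implementation:

# Sketch of budget forcing: stop generation at the end-of-thinking
# delimiter, strip it, append "Wait", and let the model keep reasoning.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "simplescaling/s1-32B"
END_OF_THINKING = "<|im_end|>"  # assumption: substitute the model's real delimiter

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

def budget_forced_generate(prompt, min_rounds=2, max_new_tokens=512):
    # Force at least min_rounds of reasoning: each round halts at the
    # end-of-thinking token, which we remove and replace with "Wait"
    eot_id = tokenizer.convert_tokens_to_ids(END_OF_THINKING)
    text = prompt
    for _ in range(min_rounds):
        inputs = tokenizer(text, return_tensors="pt").to(model.device)
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            eos_token_id=eot_id,  # stop when the model tries to finish thinking
        )
        text = tokenizer.decode(output[0], skip_special_tokens=False)
        # Suppress the delimiter and nudge the model to reflect further
        text = text.replace(END_OF_THINKING, "").rstrip() + "\nWait"
    return text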
Here’s the s1-32B model on Hugging Face. I found a GGUF version of it at brittlewis12/s1-32B-GGUF, which I ran using Ollama like so:
ollama run hf.co/brittlewis12/s1-32B-GGUF:Q4_0
I also found those 1,000 samples on Hugging Face in the simplescaling/s1K data repository there.
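The dataset can also be loaded directly with the Hugging Face datasets library for a quick look; this sketch assumes the default "train" split (the "question" column appears in the query below):

# Load all 1,000 s1K examples straight from the Hub
from datasets import load_dataset

s1k = load_dataset("simplescaling/s1K", split="train")
print(len(s1k))            # expected: 1000
print(s1k[0]["question"])  # peek at the first question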
I used DuckDB to convert the parquet file to CSV (and turn one VARCHAR[] column into JSON):
COPY (
    SELECT
        solution,
        question,
        cot_type,
        source_type,
        metadata,
        cot,
        json_array(thinking_trajectories) as thinking_trajectories,
        attempt
    FROM 's1k-00001.parquet'
) TO 'output.csv' (HEADER, DELIMITER ',');
Then I loaded that CSV into sqlite-utils so I could use the convert command to turn a Python data structure into JSON using json.dumps() and eval():
# Load into SQLite
sqlite-utils insert s1k.db s1k output.csv --csv
# Fix that column
sqlite-utils convert s1k.db s1k metadata 'json.dumps(eval(value))' --import json
# Dump that back out to CSV
sqlite-utils rows s1k.db s1k --csv > s1k.csv
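The convert step works because each metadata cell contains a Python dict literal rather than JSON: eval() parses it and json.dumps() re-serializes it. A toy illustration of that transformation, with a hypothetical cell value:

import json

# Illustration only: the cell value below is hypothetical, not taken from s1K
value = "{'difficulty': 8, 'source': 'olympiads'}"  # Python literal, single quotes
print(json.dumps(eval(value)))
# prints: {"difficulty": 8, "source": "olympiads"}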
Here’s that CSV in a Gist, which means I can load it into Datasette Lite.
It really is a tiny amount of training data. It’s mostly math and science, but there are also 15 cryptic crossword examples.
Tags: duckdb, datasette-lite, inference-scaling, ai, ollama, llms, datasette, generative-ai, qwen
AI Summary and Description: Yes
Summary: The text discusses a new model called “s1: Simple test-time scaling,” which showcases an inference-scaling technique built by fine-tuning an existing generative AI model. The findings highlight significant performance gains from a minimal dataset, making notable contributions to the field of AI, particularly in scaling and efficiency.
Detailed Description:
The text outlines Tim Kellogg’s notes on a technical paper about the s1: Simple test-time scaling model, specifically how it builds on the Qwen2.5-32B-Instruct model for cost-effective inference scaling. Below are the key points of interest:
- **Cost Efficiency**: The model was fine-tuned for just $6, the cost of 26 minutes on 16 NVIDIA H100 GPUs.
- **Data Reduction**: Through rigorous filtering, only 1,000 high-quality examples from an initial dataset of 56,000 were needed for the 32-billion-parameter model to reach o1-preview-level performance. This indicates a potential shift towards reducing the need for large datasets when fine-tuning neural networks.
- **Technique – Budget Forcing**: The paper introduces “Budget forcing,” a method to encourage deeper reasoning in AI model outputs by manipulating token generation. This involves suppressing the end-of-thinking token and appending prompts to facilitate reflective generation.
- **Model Accessibility**: The s1-32B model can be accessed via Hugging Face, which indicates the importance of community and collaborative resources in the AI domain.
- **Data Handling Tools**: The author utilizes DuckDB and SQLite to manage and transform data formats (from parquet to CSV and JSON), illustrating the practical applications of data management tools in AI.
- **Training Data Composition**: The reduced dataset predominantly includes math and science problems, alongside 15 cryptic crossword examples, showing a tailored focus for model training.
Implications for security and compliance professionals:
- **Efficiency in AI Deployment**: Understanding the implications of reduced training data may assist professionals in managing AI-related resources more effectively and responsibly, which could influence data governance strategies.
- **Data Handling and Management**: The methods applied in handling data (DuckDB, SQLite) highlight the importance of robust data management practices, which are critical for any compliance and security frameworks concerning data privacy and integrity in AI applications.
- **Scalability Innovations**: Lastly, the advancements in inference scaling may prompt further investigation into AI deployment strategies, including cost-effectiveness and performance metrics critical for compliance and security assessments in cloud computing environments.
Overall, the findings underscore a significant advancement in generative AI, which can lead to broader implications for cloud-based infrastructures and their security protocols.