Source URL: https://simonwillison.net/2025/Jul/29/space-invaders/
Source: Simon Willison’s Weblog
Title: My 2.5 year old laptop can write Space Invaders in JavaScript now
Feedly Summary: I wrote about the new GLM-4.5 model family yesterday: new open weight (MIT licensed) models from Z.ai in China whose benchmarks claim strong scores on coding, even against models such as Claude Sonnet 4.
The models are pretty big – the smaller GLM-4.5 Air model is still 106 billion total parameters, which is 205.78GB on Hugging Face.
Ivan Fioravanti built this 44GB 3bit quantized version for MLX, specifically sized so people with 64GB machines could have a chance of running it. I tried it out… and it works extremely well.
I fed it the following prompt:
Write an HTML and JavaScript page implementing space invaders
And it churned away for a while and produced the following:
Clearly this isn’t a particularly novel example, but I still think it’s noteworthy that a model running on my 2.5 year old laptop (a 64GB MacBook Pro M2) is able to produce code like this – especially code that worked first time with no further edits needed.
How I ran the model
I had to run it using the current main branch of the mlx-lm library (to ensure I had this commit adding glm4_moe support). I ran that using uv like this:
uv run \
  --with 'https://github.com/ml-explore/mlx-lm/archive/489e63376b963ac02b3b7223f778dbecc164716b.zip' \
  python
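(If you'd rather skip the interactive interpreter, mlx-lm also ships a mlx_lm.generate command-line tool that should handle the same job in one shot. I believe the flags look like this, but double-check with --help:)

uv run \
  --with 'https://github.com/ml-explore/mlx-lm/archive/489e63376b963ac02b3b7223f778dbecc164716b.zip' \
  python -m mlx_lm.generate \
  --model mlx-community/GLM-4.5-Air-3bit \
  --prompt "Write an HTML and JavaScript page implementing space invaders" \
  --max-tokens 8192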
Then in that Python interpreter I used the standard recipe for running MLX models:
from mlx_lm import load, generate
model, tokenizer = load("mlx-community/GLM-4.5-Air-3bit")
That downloaded 44GB of model weights to my ~/.cache/huggingface/hub/models--mlx-community--GLM-4.5-Air-3bit folder.
Then:
prompt = "Write an HTML and JavaScript page implementing space invaders"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True
)
response = generate(
    model, tokenizer,
    prompt=prompt,
    verbose=True,
    max_tokens=8192
)
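generate() returns the completion as a string, so if you want to try the result without copying and pasting you can pull the HTML out of the response and write it to a file. Here's a rough sketch, which assumes the model wraps the page in a ```html fenced block (it may not always):

import re

# Grab the contents of a ```html fenced block if there is one,
# otherwise fall back to saving the whole response.
match = re.search(r"```html\n(.*?)```", response, re.DOTALL)
html = match.group(1) if match else response

with open("space-invaders.html", "w") as f:
    f.write(html)

# Then open space-invaders.html in your browser.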
The response started like this:
Player spaceship that can move left/right and shoot
Enemy invaders that move in formation and shoot back
Score tracking
Lives/health system
Game over conditions […]
Followed by the HTML and this debugging output:
Prompt: 14 tokens, 14.095 tokens-per-sec
Generation: 4193 tokens, 25.564 tokens-per-sec
Peak memory: 47.687 GB
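That works out to 4193 / 25.564 ≈ 164 seconds, so a bit under three minutes of generation time.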
You can see the full transcript here, or view the source on GitHub, or try it out in your browser.
A pelican for good measure
I ran my pelican benchmark against the full sized models yesterday, but I couldn’t resist trying it against this smaller 3bit model. Here’s what I got for "Generate an SVG of a pelican riding a bicycle":
Here’s the transcript for that.
In both cases the model used around 48GB of RAM at peak, leaving me with just 16GB for everything else – I had to quit quite a few apps in order to get the model to run but the speed was pretty good once it got going.
Local coding models are really good now
It’s interesting how almost every model released in 2025 has specifically targeted coding. That focus has clearly been paying off: these coding models are getting really good now.
Two years ago when I first tried LLaMA I never dreamed that the same laptop I was using then would one day be able to run models with capabilities as strong as what I’m seeing from GLM 4.5 Air – and Mistral 3.2 Small, and Gemma 3, and Qwen 3, and a host of other high quality models that have emerged over the past six months.
Tags: python, ai, generative-ai, local-llms, llms, ai-assisted-programming, uv, mlx, pelican-riding-a-bicycle
AI Summary and Description: Yes
Summary: The text discusses the new GLM-4.5 model family from Z.ai, highlighting its capabilities in coding and ease of use on standard hardware, which has significant implications for AI-assisted programming and the accessibility of advanced AI models for developers.
Detailed Description: The content provides insights into the latest developments in generative AI models, specifically the GLM-4.5 family released by Z.ai. Here are the major points from the text:
– **Model Specifications**:
– The GLM-4.5 models are described as large-scale, with the smallest variant (GLM-4.5 Air) possessing 106 billion parameters and requiring approximately 205.78GB of storage on Hugging Face.
– An optimized 3bit quantized version (44GB) was created to allow users with 64GB machines to run it effectively.
– **Practical Application**:
– The author successfully executed the model on a two-and-a-half-year-old laptop with 64GB RAM, generating functional code for a Space Invaders game in HTML and JavaScript on the first try.
– The ease of use and efficiency demonstrated by generating code without needing further edits signifies the model’s advanced capabilities in AI-assisted programming.
– **Technical Execution**:
– The process of running the model involved using the main branch of the mlx-lm library with specific commands in Python, showcasing the accessibility of the technology to developers familiar with programming environments.
– **Performance Insights**:
– The model’s response time and memory usage are detailed, noting its peak memory usage of around 48GB, which limits available resources for other applications during execution.
– The text emphasizes the growing efficiency and capability of coding models released recently, indicating a trend toward improved performance in AI-assisted coding tasks.
– **Industry Trends**:
– The author reflects on the progression of AI models, noting that there is a growing focus on coding applications in generative AI models released in 2025, suggesting a significant shift towards tools that aid developers.
These points indicate the relevance of the GLM-4.5 model for professionals in AI, software security, and cloud computing, as it reveals advancements in generative AI’s capabilities and accessibility for programming tasks. The availability and efficacy of such models can influence software development processes, streamline coding tasks, and promote further innovations in AI-assisted programming.