Tag: coding agent

  • Scott Logic: Delegating the Grunt Work: AI Agents for UI Test Development

    Source URL: https://blog.scottlogic.com/2025/10/06/delegating-grunt-work.html
    Source: Scott Logic
    Title: Delegating the Grunt Work: AI Agents for UI Test Development
    Feedly Summary: UI automation testing is valuable but time-consuming, with ongoing maintenance resulting from fragile selectors, asynchronous behaviors, and complex test paths. This blog post explores whether we can release ourselves from this burden by delegating it to…

  • Simon Willison’s Weblog: Cross-Agent Privilege Escalation: When Agents Free Each Other

    Source URL: https://simonwillison.net/2025/Sep/24/cross-agent-privilege-escalation/
    Source: Simon Willison’s Weblog
    Title: Cross-Agent Privilege Escalation: When Agents Free Each Other
    Feedly Summary: Here’s a clever new form of AI exploit from Johann Rehberger, who has coined the term Cross-Agent Privilege Escalation to describe an attack where multiple coding agents – GitHub…

  • Simon Willison’s Weblog: CompileBench: Can AI Compile 22-year-old Code?

    Source URL: https://simonwillison.net/2025/Sep/22/compilebench/
    Source: Simon Willison’s Weblog
    Title: CompileBench: Can AI Compile 22-year-old Code?
    Feedly Summary: Interesting new LLM benchmark from Piotr Grabowski and Piotr Migdał: how well can different models handle compilation challenges such as cross-compiling gucr for ARM64 architecture? This is one of my favorite applications of…

  • Simon Willison’s Weblog: httpjail

    Source URL: https://simonwillison.net/2025/Sep/19/httpjail/#atom-everything
    Source: Simon Willison’s Weblog
    Title: httpjail
    Feedly Summary: Here’s a promising new (experimental) project in the sandboxing space from Ammar Bandukwala at Coder. httpjail provides a Rust CLI tool for running an individual process against a custom configured HTTP proxy. The initial goal is to help run coding agents like Claude…

  • Docker: How to Build Secure AI Coding Agents with Cerebras and Docker Compose

    Source URL: https://www.docker.com/blog/cerebras-docker-compose-secure-ai-coding-agents/
    Source: Docker
    Title: How to Build Secure AI Coding Agents with Cerebras and Docker Compose
    Feedly Summary: In the recent article, Building Isolated AI Code Environments with Cerebras and Docker Compose, our friends at Cerebras showcased how one can build a coding agent to use the world’s fastest AI inference API from Cerebras, Docker…

  • Simon Willison’s Weblog: GPT‑5-Codex and upgrades to Codex

    Source URL: https://simonwillison.net/2025/Sep/15/gpt-5-codex/#atom-everything
    Source: Simon Willison’s Weblog
    Title: GPT‑5-Codex and upgrades to Codex
    Feedly Summary: OpenAI half-released a new model today: GPT‑5-Codex, a fine-tuned GPT-5 variant explicitly designed for their various AI-assisted programming tools. I say half-released because it’s not yet available via their API, but they “plan to make…

  • Simon Willison’s Weblog: Kimi-K2-Instruct-0905

    Source URL: https://simonwillison.net/2025/Sep/6/kimi-k2-instruct-0905/#atom-everything
    Source: Simon Willison’s Weblog
    Title: Kimi-K2-Instruct-0905
    Feedly Summary: New not-quite-MIT licensed model from Chinese Moonshot AI, a follow-up to the highly regarded Kimi-K2 model they released in July. This one is an incremental improvement – I’ve seen it referred to online as “Kimi K-2.1”. It scores a little higher on a…