vision-llms – Page 5 – Experimental News Clipping Site

Simon Willison’s Weblog: You can now run prompts against images, audio and video in your terminal using LLM

Oct 29, 2024


Source URL: https://simonwillison.net/2024/Oct/29/llm-multi-modal/#atom-everything
Source: Simon Willison’s Weblog
Title: You can now run prompts against images, audio and video in your terminal using LLM
Feedly Summary: I released LLM 0.17 last night, the latest version of my combined CLI tool and Python library for interacting with hundreds of different Large Language Models such as GPT-4o, Llama,…

Simon Willison’s Weblog: LLM Pictionary

Oct 26, 2024

by system automation in Uncategorized

Source URL: https://simonwillison.net/2024/Oct/26/llm-pictionary/
Source: Simon Willison’s Weblog
Title: LLM Pictionary
Feedly Summary: Inspired by my SVG pelicans on a bicycle, Paul Calcraft built this brilliant system where different vision LLMs can play Pictionary with each other, taking it in turns to progressively draw SVGs while the other models see if they can guess…

Simon Willison’s Weblog: Running prompts against images and PDFs with Google Gemini

Oct 23, 2024

by system automation in Uncategorized

Source URL: https://simonwillison.net/2024/Oct/23/prompt-gemini/#atom-everything
Source: Simon Willison’s Weblog
Title: Running prompts against images and PDFs with Google Gemini
Feedly Summary: New TIL. I’ve been experimenting with the Google Gemini APIs for running prompts against images and PDFs (in preparation for finally adding multi-modal support to LLM) –…

Simon Willison’s Weblog: mistral.rs

Oct 19, 2024

by system automation in Uncategorized

Source URL: https://simonwillison.net/2024/Oct/19/mistralrs/#atom-everything
Source: Simon Willison’s Weblog
Title: mistral.rs
Feedly Summary: Here’s an LLM inference library written in Rust. It’s not just for that one family of models – like how llama.cpp has grown beyond Llama, mistral.rs has grown beyond Mistral. This is the first time I’ve been able to run the Llama 3.2…

Simon Willison’s Weblog: mlx-vlm

Sep 29, 2024

by system automation in Uncategorized

Source URL: https://simonwillison.net/2024/Sep/29/mlx-vlm/#atom-everything
Source: Simon Willison’s Weblog
Title: mlx-vlm
Feedly Summary: The MLX ecosystem of libraries for running machine learning models on Apple Silicon continues to expand. Prince Canuma is actively developing this library for running vision models such as Qwen-2 VL and Pixtral and LLaVA using Python running on a Mac. I used…

Simon Willison’s Weblog: Gemini Bounding Box Visualization

Aug 26, 2024

by system automation in Uncategorized

Source URL: https://simonwillison.net/2024/Aug/26/gemini-bounding-box-visualization/#atom-everything
Source: Simon Willison’s Weblog
Title: Gemini Bounding Box Visualization
Feedly Summary: Here’s another fun tool I built with the help of Claude 3.5 Sonnet. I was browsing through Google’s Gemini documentation while researching how different multi-modal LLM APIs work when I stumbled across this note in the vision…

Tag: vision-llms
