Tag: vision-llms

  • Simon Willison’s Weblog: Running prompts against images and PDFs with Google Gemini

    Source URL: https://simonwillison.net/2024/Oct/23/prompt-gemini/#atom-everything
    Feedly Summary: New TIL. I’ve been experimenting with the Google Gemini APIs for running prompts against images and PDFs (in preparation for finally adding multi-modal support to LLM) –…
    (A minimal Python sketch of this kind of API call appears after this list.)

  • Simon Willison’s Weblog: mistral.rs

    Source URL: https://simonwillison.net/2024/Oct/19/mistralrs/#atom-everything
    Feedly Summary: Here’s an LLM inference library written in Rust. It’s not just for that one family of models – like how llama.cpp has grown beyond Llama, mistral.rs has grown beyond Mistral. This is the first time I’ve been able to run the Llama 3.2…

  • Simon Willison’s Weblog: mlx-vlm

    Source URL: https://simonwillison.net/2024/Sep/29/mlx-vlm/#atom-everything
    Feedly Summary: The MLX ecosystem of libraries for running machine learning models on Apple Silicon continues to expand. Prince Canuma is actively developing this library for running vision models such as Qwen-2 VL, Pixtral, and LLaVA with Python on a Mac. I used…

  • Simon Willison’s Weblog: Gemini Bounding Box Visualization

    Source URL: https://simonwillison.net/2024/Aug/26/gemini-bounding-box-visualization/#atom-everything
    Feedly Summary: Here’s another fun tool I built with the help of Claude 3.5 Sonnet. I was browsing through Google’s Gemini documentation while researching how different multi-modal LLM APIs work when I stumbled across this note in the vision…
    (A short bounding-box rescaling sketch appears after this list.)
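
The first entry describes running prompts against images and PDFs through the Google Gemini API. The snippet below is a minimal sketch of that kind of call using the google-generativeai Python SDK; the model name, file path and prompt are placeholder assumptions, not details taken from the post.

    # Minimal sketch: prompting Gemini with an image via google-generativeai.
    # The model name, file path and prompt text are illustrative placeholders.
    import os

    import google.generativeai as genai
    from PIL import Image

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-flash")

    # Images can be passed as PIL objects alongside the text prompt; a PDF
    # would instead be uploaded via the File API (genai.upload_file) and the
    # returned handle passed in the same list.
    image = Image.open("example.jpg")
    response = model.generate_content([image, "Describe what is in this image."])
    print(response.text)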
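
The last entry is about Gemini’s bounding box output. The sketch below assumes the format described in the Gemini vision docs – boxes as [ymin, xmin, ymax, xmax] values normalized to a 0-1000 range – so visualizing them is mostly a matter of rescaling to pixel coordinates. The sample box and file names are made up for illustration.

    # Minimal sketch: rescale an assumed Gemini-style [ymin, xmin, ymax, xmax]
    # box (0-1000 normalized) to pixels and draw it on the image with Pillow.
    # The box values and file names are illustrative placeholders.
    from PIL import Image, ImageDraw

    def to_pixels(box, width, height):
        """Convert a 0-1000 normalized [ymin, xmin, ymax, xmax] box to pixels."""
        ymin, xmin, ymax, xmax = box
        return (
            xmin / 1000 * width,
            ymin / 1000 * height,
            xmax / 1000 * width,
            ymax / 1000 * height,
        )

    image = Image.open("example.jpg")
    box = [200, 150, 750, 600]  # hypothetical model output

    draw = ImageDraw.Draw(image)
    draw.rectangle(to_pixels(box, image.width, image.height), outline="red", width=3)
    image.save("example-annotated.jpg")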