Tag: vision-llms
-
Simon Willison’s Weblog: Running prompts against images and PDFs with Google Gemini
Source URL: https://simonwillison.net/2024/Oct/23/prompt-gemini/#atom-everything
Feedly Summary: New TIL. I’ve been experimenting with the Google Gemini APIs for running prompts against images and PDFs (in preparation for finally adding multi-modal support to LLM) –…
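To give a sense of the kind of call the TIL covers, here is a minimal sketch using the google-generativeai Python package: an image passed inline with the prompt, and a PDF routed through the File API. The model name, file paths and prompt wording are placeholder assumptions, not the code from the post.

    # Minimal sketch of prompting Gemini with an image and a PDF via the
    # google-generativeai package. Model name, paths and prompts are placeholders.
    import os
    import google.generativeai as genai
    from PIL import Image

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-flash")

    # Images can be passed directly alongside the text prompt as PIL images.
    image = Image.open("photo.jpg")  # placeholder path
    print(model.generate_content(["Describe this image", image]).text)

    # Larger files such as PDFs go through the File API first; the returned
    # file reference is then included in the prompt parts.
    pdf = genai.upload_file("report.pdf")  # placeholder path
    print(model.generate_content(["Summarise this document", pdf]).text)
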
-
Simon Willison’s Weblog: mlx-vlm
Source URL: https://simonwillison.net/2024/Sep/29/mlx-vlm/#atom-everything
Feedly Summary: The MLX ecosystem of libraries for running machine learning models on Apple Silicon continues to expand. Prince Canuma is actively developing this library for running vision models such as Qwen-2 VL and Pixtral and LLaVA using Python running on a Mac. I used…
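As a rough illustration of what running one of these models locally can look like, here is a hedged sketch of mlx-vlm’s Python interface. The model identifier is a placeholder from the quantised checkpoints published under mlx-community, and the load/generate signatures are assumptions based on the project’s README at the time; the repository is the authority on the current API.

    # Hedged sketch of mlx-vlm's Python interface; the load/generate signatures
    # are assumed from the project's README and may differ between versions.
    from mlx_vlm import load, generate

    # A quantised vision model from the mlx-community collection (placeholder choice).
    model, processor = load("mlx-community/Qwen2-VL-2B-Instruct-4bit")

    # Run a prompt against a local image file.
    output = generate(
        model,
        processor,
        image="photo.jpg",  # placeholder path
        prompt="Describe this image in detail.",
        max_tokens=500,
        verbose=False,
    )
    print(output)
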
-
Simon Willison’s Weblog: Gemini Bounding Box Visualization
Source URL: https://simonwillison.net/2024/Aug/26/gemini-bounding-box-visualization/#atom-everything
Feedly Summary: Here’s another fun tool I built with the help of Claude 3.5 Sonnet. I was browsing through Google’s Gemini documentation while researching how different multi-modal LLM APIs work when I stumbled across this note in the vision…
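Gemini’s vision documentation describes bounding boxes as [ymin, xmin, ymax, xmax] values normalised to a 0-1000 range, so visualising them means scaling back to pixel coordinates before drawing. Here is a rough sketch of that scaling-and-drawing step with Pillow; the box data and its shape are hypothetical stand-ins for a parsed model response, not the code behind the tool itself.

    # Rough sketch: scale Gemini-style bounding boxes ([ymin, xmin, ymax, xmax],
    # normalised to 0-1000) back to pixel coordinates and draw them with Pillow.
    # The boxes below are hypothetical; a real run would parse them from the
    # model's response.
    from PIL import Image, ImageDraw

    image = Image.open("photo.jpg")  # placeholder path
    width, height = image.size

    boxes = [{"label": "dog", "box_2d": [120, 80, 640, 500]}]  # hypothetical data

    draw = ImageDraw.Draw(image)
    for item in boxes:
        ymin, xmin, ymax, xmax = item["box_2d"]
        # Convert 0-1000 normalised values into absolute pixel positions.
        left, right = xmin / 1000 * width, xmax / 1000 * width
        top, bottom = ymin / 1000 * height, ymax / 1000 * height
        draw.rectangle([left, top, right, bottom], outline="red", width=3)
        draw.text((left, top), item["label"], fill="red")

    image.save("photo-with-boxes.jpg")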