Simon Willison’s Weblog: Gemini 2.0 Flash "Thinking mode"

Source URL: https://simonwillison.net/2024/Dec/19/gemini-thinking-mode/#atom-everything
Source: Simon Willison’s Weblog
Title: Gemini 2.0 Flash "Thinking mode"

Feedly Summary: Those new model releases just keep on flowing. Today it’s Google’s snappily named gemini-2.0-flash-thinking-exp, their first entrant into the o1-style inference scaling class of models. I posted about a great essay about the significance of these just this morning.
From the Gemini model documentation:

Gemini 2.0 Flash Thinking Mode is an experimental model that’s trained to generate the “thinking process” the model goes through as part of its response. As a result, Thinking Mode is capable of stronger reasoning capabilities in its responses than the base Gemini 2.0 Flash model.

I just shipped llm-gemini 0.8 with support for the model. You can try it out using LLM like this:
llm install -U llm-gemini
# If you haven’t yet set a gemini key:
llm keys set gemini
# Paste key here

llm -m gemini-2.0-flash-thinking-exp-1219 "solve a harder variant of that goat lettuce wolf river puzzle"
It’s a very talkative model – 2,277 output tokens answering that prompt.
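The classic base version of that river-crossing puzzle is small enough to solve by exhaustive search, which makes a handy baseline when judging the model’s answer to a harder variant. A minimal breadth-first-search sketch (not from the post; the state encoding and item names are my own):

```python
from collections import deque

# The classic puzzle: a farmer must ferry a wolf, a goat, and a lettuce
# across a river, one item at a time, never leaving the wolf alone with
# the goat or the goat alone with the lettuce.
ITEMS = ("wolf", "goat", "lettuce")
FORBIDDEN = [{"wolf", "goat"}, {"goat", "lettuce"}]

def safe(bank):
    # A bank without the farmer is safe if it holds no forbidden pair.
    return not any(pair <= bank for pair in FORBIDDEN)

def solve():
    # State: (frozenset of items on the left bank, farmer's side).
    start = (frozenset(ITEMS), "left")
    goal = (frozenset(), "right")
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (left, side), path = queue.popleft()
        if (left, side) == goal:
            return path
        here = left if side == "left" else frozenset(ITEMS) - left
        # The farmer crosses alone (None) or with one item from his bank.
        for cargo in [None, *here]:
            new_left = set(left)
            if cargo is not None:
                if side == "left":
                    new_left.remove(cargo)
                else:
                    new_left.add(cargo)
            new_left = frozenset(new_left)
            other = "right" if side == "left" else "left"
            # The bank the farmer just left must be safe unattended.
            unattended = new_left if other == "right" else frozenset(ITEMS) - new_left
            state = (new_left, other)
            if safe(unattended) and state not in seen:
                seen.add(state)
                queue.append((state, path + [(cargo or "nothing", other)]))

print(solve())  # shortest solution: 7 crossings, starting with the goat
```

Breadth-first search guarantees the shortest solution, which for the classic puzzle is seven crossings; a harder variant like the one in the prompt above would just need a richer state and constraint set.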
A more interesting example
The best source of example prompts I’ve found so far is the Gemini 2.0 Flash Thinking cookbook – a Jupyter notebook full of demonstrations of what the model can do.
My favorite so far is this one:

What’s the area of the overlapping region?

This model is multi-modal!
Here’s how to run that example using llm-gemini:
llm -m gemini-2.0-flash-thinking-exp-1219 \
-a https://storage.googleapis.com/generativeai-downloads/images/geometry.png \
"What’s the area of the overlapping region?"
Here’s the full response, complete with MathML working. The eventual conclusion:

The final answer is 9π/4

That’s the same answer as Google provided in their example notebook, so I’m presuming it’s correct. Impressive!
Tags: google, ai, generative-ai, llms, llm, gemini, o1, inference-scaling

AI Summary and Description: Yes

Summary: The text discusses the release of Google’s Gemini 2.0 Flash Thinking Mode, an experimental AI model that exposes its reasoning process as part of its response, improving reasoning over the base Gemini 2.0 Flash model. This has implications for AI security and for the risks of deploying more capable generative AI technologies.

Detailed Description:
The content focuses on Google’s latest generative AI model, Gemini 2.0 Flash Thinking Mode, and its enhancements over previous iterations. Here are the major points:

– **Model Introduction**: Google’s Gemini 2.0 Flash Thinking Mode is Google’s first entrant into the o1-style inference-scaling class of models.
– **Enhanced Reasoning**: The model is noted for generating the thinking process behind its responses, allowing for better reasoning compared to the base model. This suggests advancements in AI interpretability and reliability, which are crucial for security applications.
– **Practical Use Cases**: The user shares how to install and set up the model using the `llm` command, demonstrating easy accessibility for users interested in testing the AI’s capabilities.
– **Multi-modal Abilities**: The model is described as multi-modal, meaning it can process and generate responses based on different types of inputs, including both text and images.
– **Example Demonstration**: A practical example is provided in which the model calculates the area of an overlapping region from a geometry image input, illustrating its computational and reasoning capabilities.

Key Insights:
– **Security Implications**: As AI systems become more capable and integrated into various applications, the potential risks related to AI security and the need for robust oversight increase. Understanding how models like Gemini function can help professionals anticipate vulnerabilities.
– **User Engagement**: Because users can easily engage with advanced models, policies around security practices and usage protocols become crucial to ensure that these tools are used responsibly and securely.

Overall, the release of such models represents progress in AI technology, but also necessitates ongoing discussions about security frameworks, governance, and compliance related to the deployment of generative AI systems.