Simon Willison’s Weblog: Nous Hermes 3

Source URL: https://simonwillison.net/2024/Nov/4/nous-hermes-3/#atom-everything
Source: Simon Willison’s Weblog
Title: Nous Hermes 3

Feedly Summary: Nous Hermes 3
The Nous Hermes family of fine-tuned models has a solid reputation. Their most recent release came out in August, based on Meta’s Llama 3.1:

Our training data aggressively encourages the model to follow the system and instruction prompts exactly and in an adaptive manner. Hermes 3 was created by fine-tuning Llama 3.1 8B, 70B and 405B, and training on a dataset of primarily synthetically generated responses. The model boasts comparable and superior performance to Llama 3.1 while unlocking deeper capabilities in reasoning and creativity.

The model weights are on Hugging Face, including GGUF versions of the 70B and 8B models. Here’s how to try the 8B model (a 4.58GB download) using the llm-gguf plugin:
llm install llm-gguf
llm gguf download-model 'https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B-GGUF/resolve/main/Hermes-3-Llama-3.1-8B.Q4_K_M.gguf' -a Hermes-3-Llama-3.1-8B
llm -m Hermes-3-Llama-3.1-8B 'hello in spanish'

Nous Research partnered with Lambda Labs to provide inference APIs. It turns out Lambda host quite a few models now, currently providing free inference to users with an API key.
I just released the first alpha of a llm-lambda-labs plugin. You can use that to try the larger 405b model (very hard to run on a consumer device) like this:
llm install llm-lambda-labs
llm keys set lambdalabs
# Paste key here
llm -m lambdalabs/hermes3-405b 'short poem about a pelican with a twist'

Here’s the source code for the new plugin, which I based on llm-mistral. The plugin uses httpx-sse to consume the stream of tokens from the API.
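To illustrate what consuming a stream of tokens involves, here is a minimal standard-library sketch of parsing server-sent events. It is not the plugin's code: the plugin relies on httpx-sse for this step. The sketch assumes the API emits OpenAI-style chunks (a `choices[].delta.content` field and a `[DONE]` sentinel), which the post does not confirm; `iter_sse_data`, `iter_tokens` and `SAMPLE_STREAM` are hypothetical names used only for this example.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete server-sent event."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            # Lines starting with a colon are comments (often keep-alives).
            continue
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # An event with no terminating blank line is dropped, not yielded.


def iter_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from a stream of chat-completion chunks.

    Assumption: chunks are OpenAI-style JSON objects and the stream
    ends with a "[DONE]" sentinel.
    """
    for payload in iter_sse_data(lines):
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Canned lines standing in for a real HTTP response body.
SAMPLE_STREAM = [
    'data: {"choices": [{"delta": {"content": "Hola"}}]}',
    "",
    ": keep-alive",
    'data: {"choices": [{"delta": {"content": ", mundo"}}]}',
    "",
    "data: [DONE]",
    "",
]

if __name__ == "__main__":
    print("".join(iter_tokens(SAMPLE_STREAM)))
```

In a real client the lines would come from a streaming HTTP response rather than a list, so each fragment could be printed as soon as it arrives.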
Tags: llm, generative-ai, llama, ai, edge-llms, llms, meta, projects, nous-research

AI Summary and Description: Yes

Summary: The text covers Hermes 3, the most recent release in the Nous Hermes family of fine-tuned models, based on Meta’s Llama 3.1. It shows how to run the 8B model locally with the `llm-gguf` plugin and how to reach the 405B model through Lambda Labs’ inference API using a new `llm-lambda-labs` plugin.

Detailed Description:
– Hermes 3 is a family of models created by fine-tuning Meta’s Llama 3.1 8B, 70B and 405B.
– **Key Features**:
– The training data consists primarily of synthetically generated responses and aggressively encourages the model to follow system and instruction prompts exactly and adaptively.
– Nous Research claims performance comparable or superior to Llama 3.1, with deeper capabilities in reasoning and creativity.

– **Model Availability**:
– The model weights are on Hugging Face, including GGUF versions of the 8B and 70B models. The 8B GGUF file used in the example is a 4.58GB download.

– **Installation and Usage**:
– The post gives the commands for running the 8B model locally via the `llm-gguf` plugin and for calling the hosted 405B model via the `llm-lambda-labs` plugin, which requires setting a Lambda Labs API key first.

– **Inference APIs**:
– Nous Research partnered with Lambda Labs to provide inference APIs, and Lambda currently offers free inference to users with an API key. This makes the 405B model, which is very hard to run on a consumer device, usable without local compute resources.

– **Research and Development**:
– The author released the first alpha of the `llm-lambda-labs` plugin. It is based on `llm-mistral` and uses `httpx-sse` to consume the stream of tokens from the API.

Overall, the release makes the Hermes 3 models available both as local downloads and through a hosted API. Security and compliance professionals should note the implications of sending prompts to remotely hosted models via APIs (e.g., data protection, access controls).