Tag: evaluator

  • Cloud Blog: Supervised Fine Tuning for Gemini: A best practices guide

    Source URL: https://cloud.google.com/blog/products/ai-machine-learning/master-gemini-sft/ Source: Cloud Blog Title: Supervised Fine Tuning for Gemini: A best practices guide Feedly Summary: Foundation models such as Gemini have revolutionized how we work, but sometimes they need guidance to excel at specific business tasks. Perhaps their answers are too long, or their summaries miss the mark. That’s where supervised fine-tuning…

  • Unit 42: Bad Likert Judge: A Novel Multi-Turn Technique to Jailbreak LLMs by Misusing Their Evaluation Capability

    Source URL: https://unit42.paloaltonetworks.com/?p=138017 Source: Unit 42 Title: Bad Likert Judge: A Novel Multi-Turn Technique to Jailbreak LLMs by Misusing Their Evaluation Capability Feedly Summary: The jailbreak technique “Bad Likert Judge” manipulates LLMs to generate harmful content using Likert scales, exposing safety gaps in LLM guardrails. The post Bad Likert Judge: A Novel Multi-Turn Technique to…

  • AWS News Blog: New RAG evaluation and LLM-as-a-judge capabilities in Amazon Bedrock

    Source URL: https://aws.amazon.com/blogs/aws/new-rag-evaluation-and-llm-as-a-judge-capabilities-in-amazon-bedrock/ Source: AWS News Blog Title: New RAG evaluation and LLM-as-a-judge capabilities in Amazon Bedrock Feedly Summary: Evaluate AI models and applications efficiently with Amazon Bedrock’s new LLM-as-a-judge capability for model evaluation and RAG evaluation for Knowledge Bases, offering a variety of quality and responsible AI metrics at scale…

  • Simon Willison’s Weblog: OpenAI O3 breakthrough high score on ARC-AGI-PUB

    Source URL: https://simonwillison.net/2024/Dec/20/openai-o3-breakthrough/#atom-everything Source: Simon Willison’s Weblog Title: OpenAI O3 breakthrough high score on ARC-AGI-PUB Feedly Summary: François Chollet is the co-founder of the ARC Prize and had advanced access to today’s o3 results. His article here is the most insightful coverage I’ve seen of o3, going beyond…

  • Hacker News: Building Effective "Agents"

    Source URL: https://www.anthropic.com/research/building-effective-agents Source: Hacker News Title: Building Effective "Agents" Feedly Summary: The text provides insights into building effective large language model (LLM) agents, emphasizing simplicity over complexity in implementations. It categorizes agentic systems, detailing workflows and frameworks that can enhance LLM capabilities, and gives practical advice for…

  • Simon Willison’s Weblog: Building effective agents

    Source URL: https://simonwillison.net/2024/Dec/20/building-effective-agents/#atom-everything Source: Simon Willison’s Weblog Title: Building effective agents Feedly Summary: My principal complaint about the term “agents” is that while it has many different potential definitions, most of the people who use it seem to assume that everyone else shares and understands the definition that they have chosen to…