Tag: evaluations

  • Simon Willison’s Weblog: Building software on top of Large Language Models

    Source URL: https://simonwillison.net/2025/May/15/building-on-llms/#atom-everything Source: Simon Willison’s Weblog Title: Building software on top of Large Language Models Feedly Summary: I presented a three hour workshop at PyCon US yesterday titled Building software on top of Large Language Models. The goal of the workshop was to give participants everything they needed to get started writing code that…

  • Cloud Blog: Evaluate your gen media models with multimodal evaluation on Vertex AI

    Source URL: https://cloud.google.com/blog/products/ai-machine-learning/evaluate-your-gen-media-models-on-vertex-ai/ Source: Cloud Blog Title: Evaluate your gen media models with multimodal evaluation on Vertex AI Feedly Summary: The world of generative AI is moving fast, with models like Lyria, Imagen, and Veo now capable of producing stunningly realistic and imaginative images and videos from simple text prompts. However, evaluating these models is…

  • Simon Willison’s Weblog: Expanding on what we missed with sycophancy

    Source URL: https://simonwillison.net/2025/May/2/what-we-missed-with-sycophancy/ Source: Simon Willison’s Weblog Title: Expanding on what we missed with sycophancy Feedly Summary: Expanding on what we missed with sycophancy I criticized OpenAI’s initial post about their recent ChatGPT sycophancy rollback as being “relatively thin" so I’m delighted that they have followed it with a much more in-depth explanation of what…

  • AWS News Blog: Amazon Nova Premier: Our most capable model for complex tasks and teacher for model distillation

    Source URL: https://aws.amazon.com/blogs/aws/amazon-nova-premier-our-most-capable-model-for-complex-tasks-and-teacher-for-model-distillation/ Source: AWS News Blog Title: Amazon Nova Premier: Our most capable model for complex tasks and teacher for model distillation Feedly Summary: Nova Premier is designed to excel at complex tasks requiring deep context understanding, multistep planning, and coordination across tools and data sources. It has capabilities for processing text, images, and…

  • Simon Willison’s Weblog: Quoting Mark Zuckerberg

    Source URL: https://simonwillison.net/2025/May/1/mark-zuckerberg/#atom-everything Source: Simon Willison’s Weblog Title: Quoting Mark Zuckerberg Feedly Summary: You also mentioned the whole Chatbot Arena thing, which I think is interesting and points to the challenge around how you do benchmarking. How do you know what models are good for which things? One of the things we’ve generally tried to…