Tag: modalities

  • Simon Willison’s Weblog: Introducing gpt-realtime

    Source URL: https://simonwillison.net/2025/Sep/1/introducing-gpt-realtime/#atom-everything Source: Simon Willison’s Weblog Title: Introducing gpt-realtime Feedly Summary: Released a few days ago (August 28th), gpt-realtime is OpenAI’s new “most advanced speech-to-speech model”. It looks like this is a replacement for the older gpt-4o-realtime-preview model that was released last October. This is a slightly confusing release. The previous realtime…

  • Cloud Blog: How to build a real-time voice agent with Gemini, Google ADK, and A2A protocol

    Source URL: https://cloud.google.com/blog/products/ai-machine-learning/build-a-real-time-voice-agent-with-gemini-adk/ Source: Cloud Blog Title: How to build a real-time voice agent with Gemini, Google ADK, and A2A protocol Feedly Summary: Building advanced conversational AI has moved well beyond text. Now, we can use AI to create real-time, voice-driven agents. However, these systems need low-latency, two-way communication, real-time information retrieval, and the ability…

  • Slashdot: Microsoft Says Voice Will Emerge as Primary Input for Next Windows

    Source URL: https://tech.slashdot.org/story/25/08/14/1441240/microsoft-says-voice-will-emerge-as-primary-input-for-next-windows?utm_source=rss1.0mainlinkanon&utm_medium=feed Source: Slashdot Title: Microsoft Says Voice Will Emerge as Primary Input for Next Windows Feedly Summary: The upcoming version of Windows will significantly evolve through the integration of AI technologies, specifically enhancing user interaction by making voice a primary input method. This transformation will leverage both…

  • AWS News Blog: Announcing Amazon Nova customization in Amazon SageMaker AI

    Source URL: https://aws.amazon.com/blogs/aws/announcing-amazon-nova-customization-in-amazon-sagemaker-ai/ Source: AWS News Blog Title: Announcing Amazon Nova customization in Amazon SageMaker AI Feedly Summary: AWS now enables extensive customization of Amazon Nova foundation models through SageMaker AI with techniques including continued pre-training, supervised fine-tuning, direct preference optimization, reinforcement learning from human feedback and model distillation to better address domain-specific requirements across…

  • Cloud Blog: Vertex AI Studio, redesigned: Your source for generative AI media models across all modalities

    Source URL: https://cloud.google.com/blog/products/ai-machine-learning/vertex-ai-studio-redesigned/ Source: Cloud Blog Title: Vertex AI Studio, redesigned: Your source for generative AI media models across all modalities Feedly Summary: Google Cloud’s Vertex AI platform makes it easy to experiment with and customize over 200 advanced foundation models – like the latest Google Gemini models, and third-party partner models such as Meta’s…

  • Simon Willison’s Weblog: Create and edit images with Gemini 2.0 in preview

    Source URL: https://simonwillison.net/2025/May/7/gemini-images-preview/#atom-everything Source: Simon Willison’s Weblog Title: Create and edit images with Gemini 2.0 in preview Feedly Summary: Gemini 2.0 Flash has had image generation capabilities for a while now, and they’re now available via the paid Gemini API – at 3.9 cents per generated…