The Cloudflare Blog: State-of-the-art image generation Leonardo models and text-to-speech Deepgram models now available in Workers AI

Source URL: https://blog.cloudflare.com/workers-ai-partner-models/
Source: The Cloudflare Blog
Title: State-of-the-art image generation Leonardo models and text-to-speech Deepgram models now available in Workers AI

Feedly Summary: We’re expanding Workers AI with new partner models from Leonardo.Ai and Deepgram. Start using state-of-the-art image generation models from Leonardo and real-time TTS and STT models from Deepgram.

Summary: Cloudflare is expanding its Workers AI platform with new generative AI models from partners Leonardo.Ai and Deepgram. The post highlights low-latency image generation and voice processing, and describes how these models run on Cloudflare’s infrastructure, which is designed for fast inference and for developers building AI applications.

Detailed Description:
The post describes how Cloudflare is extending Workers AI with partner models aimed at specific use cases, namely image generation and voice interaction. The key points are:

– **Infrastructure Enhancements**:
– Cloudflare built Workers AI on the hypothesis that AI models would become both faster and smaller, and deployed GPUs in data centers around the world to serve inference efficiently.

– **New Partnerships**:
– *Leonardo.Ai*:
– Offers generative AI models, particularly suited for low-latency image generation.
– Introduces two models:
– **Phoenix 1.0**: Excels at text rendering and prompt coherence, generating a 1024×1024 image in under 5 seconds.
– **Lucid Origin**: Focuses on photorealistic image generation, with a similar generation time.

– *Deepgram*:
– Develops voice AI models that let users interact with AI through natural speech, on the premise that voice is a higher-bandwidth channel than text.
– Provides fast speech-to-text (STT) and text-to-speech (TTS) models, intended for building low-latency voice agents on Cloudflare’s infrastructure.
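
To make the voice-agent idea concrete, one conversational turn can be sketched as three stages: transcribe the incoming audio, produce a text reply, and synthesize that reply as audio. The sketch below is illustrative only. The function names are hypothetical, the stand-in stages are fakes that never call a real model, and the middle "respond" stage (an LLM or other application logic) is an assumption that the summary does not name.

```python
from typing import Callable


def voice_turn(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],   # STT stage (hypothetical hook)
    respond: Callable[[str], str],        # reply logic, e.g. an LLM (assumption)
    synthesize: Callable[[str], bytes],   # TTS stage (hypothetical hook)
) -> bytes:
    """Run one turn: audio in -> transcript -> reply text -> audio out."""
    transcript = transcribe(audio_in)
    reply = respond(transcript)
    return synthesize(reply)


# Stand-in stages so the sketch runs without any network access.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


reply_audio = voice_turn(b"hello", fake_transcribe, fake_respond, fake_synthesize)
```

Passing the stages in as callables keeps the turn logic independent of any particular model, so a real STT or TTS call can replace a fake without changing `voice_turn`. Each stage adds to the total response time, which is why fast STT and TTS matter for a low-latency agent.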

– **Developer Tools**:
– Developers can combine these models with the rest of Cloudflare’s developer platform to build complete applications. For example:
– Host application logic in Workers alongside AI inference for image or voice generation, and use additional services such as R2 for storage.
– Use WebRTC and WebSocket support for real-time interaction in voice agents.

– **Example Implementations**:
– The post includes sample `curl` commands for calling these models through Cloudflare’s REST API.
– It also discusses improvements to how audio data is processed and transmitted, which streamline the development workflow.
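
The `curl` commands themselves are not reproduced in this summary. As a rough equivalent, the sketch below builds the same kind of REST call in Python using the standard library. The endpoint shape (`/accounts/{account_id}/ai/run/{model}` with a bearer token) follows Cloudflare's Workers AI REST API. The model ID `@cf/leonardo/phoenix-1.0` and the `prompt`, `width`, and `height` input fields are assumptions that should be checked against the model's catalog page, and the helper names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_run_request(
    account_id: str, api_token: str, model: str, payload: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs a Workers AI model."""
    url = f"{API_BASE}/accounts/{account_id}/ai/run/{model}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_model(request: urllib.request.Request, timeout: float = 60.0) -> bytes:
    """Send the request and return the raw response body (network call)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Placeholder credentials; the model ID and input fields are assumptions.
request = build_run_request(
    account_id="YOUR_ACCOUNT_ID",
    api_token="YOUR_API_TOKEN",
    model="@cf/leonardo/phoenix-1.0",
    payload={"prompt": "A lighthouse at dusk", "width": 1024, "height": 1024},
)
```

Building the request separately from sending it makes the call easy to inspect before any network traffic happens. With real credentials, `run_model(request)` would send it and return the raw response bytes, whose format depends on the model.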

– **Expanding Use Cases**:
– Cloudflare argues that its developer platform gives it a distinctive advantage in helping developers build creative solutions with generative AI, and presents these launches as a foundation for future model partnerships.

The announcement is a notable step toward offering robust AI capabilities on a scalable cloud platform, addressing the growing demand among AI developers for low-latency image and voice applications.