The Cloudflare Blog: State-of-the-art image generation Leonardo models and text-to-speech Deepgram models now available in Workers AI

Aug 27, 2025

—

Source URL: https://blog.cloudflare.com/workers-ai-partner-models/
Source: The Cloudflare Blog
Title: State-of-the-art image generation Leonardo models and text-to-speech Deepgram models now available in Workers AI

Feedly Summary: We’re expanding Workers AI with new partner models from Leonardo.Ai and Deepgram. Start using state-of-the-art image generation models from Leonardo and real-time TTS and STT models from Deepgram.

AI Summary and Description: Yes

Summary: The text discusses the expansion of Cloudflare’s Workers AI platform, emphasizing the introduction of new generative AI models from partners Leonardo.Ai and Deepgram. It highlights the capabilities of low-latency image and voice processing, showcasing the integration of various AI models with Cloudflare’s infrastructure designed for rapid inference and support for developers building AI applications.

Detailed Description:
The content reveals Cloudflare’s strategic enhancements to its Workers AI platform, focusing on generative models to cater to specific use cases such as image generation and voice interaction. Here are the significant points:

– **Infrastructure Enhancements**:
– Cloudflare built its platform on the hypothesis that AI models would become both faster and smaller, and placed specialized GPUs in its data centers worldwide to serve inference efficiently.

– **New Partnerships**:
– *Leonardo.Ai*:
– Offers generative AI models, particularly suited for low-latency image generation.
– Introduces two models:
– **Phoenix 1.0**: Excels at text rendering and prompt coherence, generating a 1024×1024 image in under 5 seconds.
– **Lucid Origin**: Focused on photorealistic image generation, achieving a similar generation time.

– *Deepgram*:
– Develops voice AI models that enable natural voice interaction with AI; the post presents voice as a higher-bandwidth channel than text.
– Workers AI offers Deepgram models for fast speech-to-text and text-to-speech, aimed at building low-latency voice agents on Cloudflare’s infrastructure.

– **Developer Tools**:
– By leveraging Workers AI, developers can integrate these AI models into broader applications effectively. For example:
– Use Workers to host application logic alongside AI for image or voice generation, utilizing additional services like R2 for storage.
– WebRTC and WebSocket support enhance real-time interactions for voice agents.

– **Example Implementations**:
– The text includes sample code (via `curl` commands) for integrating these generative models with Cloudflare’s REST API.
– It discusses improvements in how audio data is processed and transmitted, streamlining the development workflow.
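
The REST calls mentioned above can be sketched in Python instead of `curl`. This is a minimal illustration, not the code from the original post: it assumes the general Workers AI REST route (`/accounts/{account_id}/ai/run/{model}`) with a bearer token, and the model identifier and payload fields shown are assumptions that should be checked against the Workers AI model catalog. The helper name `build_run_request` is hypothetical.

```python
import json
import urllib.request

# Base URL of the Cloudflare v4 REST API.
API_BASE = "https://api.cloudflare.com/client/v4"


def build_run_request(account_id: str, api_token: str,
                      model: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs a Workers AI model.

    `model` is a catalog identifier; the value used in the example below is
    an assumption, not confirmed by this summary.
    """
    url = f"{API_BASE}/accounts/{account_id}/ai/run/{model}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder credentials; the prompt/width/height field names are
    # assumed and may differ per model.
    request = build_run_request(
        account_id="YOUR_ACCOUNT_ID",
        api_token="YOUR_API_TOKEN",
        model="@cf/leonardo/phoenix-1.0",  # assumed identifier
        payload={"prompt": "a lighthouse at dusk", "width": 1024, "height": 1024},
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Separating request construction from sending keeps the sketch testable without network access; a speech model would use the same route with a different model identifier and payload.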

– **Expanding Use Cases**:
– Cloudflare emphasizes that its developer tooling positions it to encourage creative uses of generative AI, and that these launches lay a foundation for future model partnerships.

The announcement signals a notable step in providing robust AI capabilities on a scalable cloud platform, addressing the growing demand for low-latency applications in image and voice processing, which is crucial for developers in the AI field.
