The Cloudflare Blog: Cloudflare is the best place to build realtime voice agents

Aug 29, 2025

—

Source URL: https://blog.cloudflare.com/cloudflare-realtime-voice-ai/
Source: The Cloudflare Blog

Feedly Summary: Today, we’re excited to announce new capabilities that make it easier than ever to build real-time, voice-enabled AI applications on Cloudflare’s global network.


Summary: Cloudflare's announcement covers new capabilities for building real-time voice AI applications on its services. It introduces Cloudflare Realtime Agents, a runtime for orchestrating complex AI pipelines with low latency, so that conversational interfaces respond faster and voice interaction feels more natural.

Detailed Description: This announcement highlights Cloudflare’s efforts to facilitate the development of real-time voice AI applications, offering tools that simplify complex interactions and reduce latency in processing. Key points include:

* **Introduction of Cloudflare Realtime Agents**:
– A runtime designed to streamline the orchestration of voice AI applications.
– Aimed at reducing the complexity of managing AI services by providing composable building blocks to developers.

* **Operations of Realtime Agents**:
– WebRTC connections carry audio to the nearest Cloudflare location.
– AI pipelines handle the processing stages: speech-to-text, LLM (Large Language Model) inference, and text-to-speech.

* **Key Features & Benefits**:
– **Low Latency**: Critical for natural conversation (a total response latency under 800 ms), achieved through optimized infrastructure choices.
– **Flexibility**: Supports multiple AI providers and allows personalized configurations to meet specific application needs.
– **Integration with Various Models**: Enables the use of models from OpenAI and others, allowing developers to customize the AI experience freely.

* **Technical Innovations**:
– **WebRTC**: Streams audio in real time over UDP, which helps keep latency low.
– **WebSockets Support**: Provides persistent, low-latency connections for real-time AI interactions.

* **Real-World Applications**:
– Live transcription, complex interactive voice AI applications, and seamless audio processing across global networks.

* **Deepgram Integration**: Brings advanced speech-to-text capabilities directly to the edge, where proximity to users lowers latency and improves the user experience.

* **Call to Action**: Encourages developers to leverage these new tools and engage in the open beta phase to experiment and implement their AI solutions effectively.
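
The key points above describe a pipeline of composable stages (speech-to-text, LLM inference, text-to-speech) that has to finish within the 800 ms budget. The sketch below illustrates that idea only. It is not the Cloudflare Realtime Agents API: `Stage`, `run_pipeline`, and the three `fake_*` functions are hypothetical names, and the stubs stand in for real providers so the example runs on its own.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Conversational latency budget quoted in the summary above.
LATENCY_BUDGET_MS = 800.0


@dataclass
class Stage:
    """One composable building block (hypothetical type, not a Cloudflare API)."""
    name: str
    run: Callable[[Any], Any]


@dataclass
class PipelineResult:
    output: Any
    timings_ms: Dict[str, float]

    @property
    def total_ms(self) -> float:
        return sum(self.timings_ms.values())

    @property
    def within_budget(self) -> bool:
        return self.total_ms <= LATENCY_BUDGET_MS


def run_pipeline(stages: List[Stage], payload: Any) -> PipelineResult:
    """Feed the payload through each stage in order and time every stage."""
    timings: Dict[str, float] = {}
    for stage in stages:
        start = time.perf_counter()
        payload = stage.run(payload)
        timings[stage.name] = (time.perf_counter() - start) * 1000.0
    return PipelineResult(output=payload, timings_ms=timings)


# Stub providers (assumptions for illustration; a real stage would call a model).
def fake_speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")


stages = [
    Stage("speech_to_text", fake_speech_to_text),
    Stage("llm", fake_llm),
    Stage("text_to_speech", fake_text_to_speech),
]

result = run_pipeline(stages, b"hello")
print(result.output)
print(f"total: {result.total_ms:.3f} ms, within budget: {result.within_budget}")
```

Because each stage is just a named callable, swapping one provider for another means replacing a single `Stage` entry, which mirrors the flexibility point in the list above. The per-stage timings show where the budget is being spent.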

This evolving landscape for real-time voice applications not only empowers developers with new technology but also represents a significant shift towards natural, conversational interfaces facilitated by AI advancements. Security and compliance professionals must ensure that these tools also meet required standards in data protection and user privacy as they become integral parts of real-time communication solutions.
