
For all the attention around chatbots, copilots, and agentic AI, the phone call remains one of the hardest places to put artificial intelligence to work well. That is partly because voice is still the channel customers turn to when an interaction is urgent, emotionally charged, or too complex for a quick exchange over text-based channels. It’s also because real-time voice is far less forgiving than email, messaging, or web chat. In a live call, latency is noticeable, interruptions matter, and trust can erode in seconds.
In an effort to help fill the gap between voice and other channels, Sinch has introduced Voice Relay, an early access capability designed to connect text-based AI agents directly to live phone calls over Sinch’s enterprise voice infrastructure. Alongside Voice Relay, Sinch has also rolled out broader upgrades across its enterprise voice platform, including AI-ready voice infrastructure, enhanced branded calling protection, and expanded global network capabilities.
This is more than yet another vendor adding AI to customer communications (though it is that, too); it points to where the market seems to be headed. Enterprises are no longer asking whether they should deploy AI in customer engagement, but how to make AI work in the channels that matter most when a situation is complex, time-sensitive, or high stakes. Despite the growth of digital channels and predictions that they would make voice less relevant, this trend puts voice squarely back at the center of the discussion.
McKinsey noted last year that voice remains the dominant live interaction channel, even as AI has advanced more quickly in asynchronous channels like chat and email, where latency is less visible and the interaction model is more forgiving. Metrigy, meanwhile, said its 2025-26 research found that “voice isn’t dying – it’s thriving.”
The real challenge is not the AI model
Sinch’s announcement gets at an issue that is becoming more apparent: For voice automation, the model is only part of the equation and the harder challenge is operational. A company may have a strong large language model strategy, but turning that into a live, usable phone interaction requires orchestration across speech recognition, text processing, voice synthesis, interruption handling, call routing, telecom connectivity, and quality management – all while adding minimal delay.
That is what Voice Relay is intended to deliver. According to Sinch, developers can connect an AI agent to live voice calls through a relatively simple interface, while Sinch manages the real-time conversational loop, including speech recognition, voice synthesis, interruption handling, and the underlying voice network integration. The pitch is not that Sinch is replacing the AI model layer, but that enterprises should not have to assemble and maintain the full stack required to bring the model into production over the telephone network.
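To make the division of labor concrete, here is a minimal sketch of the kind of conversational loop described above, with a text-only agent on one side and the real-time handling on the other. It is an illustration under stated assumptions, not Sinch’s API: the names `Turn`, `chunk_reply`, `run_call`, and `echo_agent` are hypothetical, speech recognition is stood in for by ready-made text, and voice synthesis is stood in for by log lines.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    """One caller turn in a simulated call (hypothetical structure)."""
    caller_text: str          # stand-in for what speech recognition produced
    barge_in_after: int = -1  # caller interrupts after this many spoken chunks; -1 = never


def chunk_reply(reply: str, size: int = 4) -> List[str]:
    """Split a reply into small word groups so playback can stop mid-sentence."""
    words = reply.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def run_call(turns: List[Turn], agent: Callable[[str], str]) -> List[str]:
    """Simulate the relay loop: transcript in, agent reply out, playback with barge-in."""
    log: List[str] = []
    for turn in turns:
        log.append(f"caller: {turn.caller_text}")
        reply = agent(turn.caller_text)  # the agent only ever sees and returns text
        for spoken, chunk in enumerate(chunk_reply(reply)):
            if turn.barge_in_after == spoken:
                # The caller started talking: stop playback and move to the next turn.
                log.append("agent: [interrupted]")
                break
            log.append(f"agent: {chunk}")  # stand-in for synthesized audio
    return log


def echo_agent(text: str) -> str:
    """Placeholder for any text-based AI agent, whatever model sits behind it."""
    return f"You said {text} and I am happy to help with that today"


if __name__ == "__main__":
    for line in run_call([Turn("billing", barge_in_after=1)], echo_agent):
        print(line)
```

The point of the sketch is the boundary: the agent function handles text only, while everything about timing and interruption sits in the surrounding loop, which is the part an infrastructure provider would take on.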
Over the past two years, enterprise AI conversations have increasingly shifted from model selection to deployment architecture. In that sense, Voice Relay fits into a wider industry pattern that sees infrastructure providers trying to become the connective tissue between AI applications and the enterprise communications environment.
Sinch is also responding to the demand for flexibility. Customers want the freedom to choose which AI models power their agents, without being locked into a single proprietary intelligence layer.
“Enterprises want the freedom to choose the AI models that power their agents. Voice Relay provides the infrastructure that connects those agents to the global voice network, delivering the real-time media, reliability and control required to run AI-powered voice interactions in production.” — Daniel Morris, Chief Product Officer, Sinch.
Voice AI only scales if it is trusted
Still, making AI agents audible is only one part of the story. Making them trustworthy is the harder and more important task.
To accommodate that need, Sinch paired its Voice Relay launch with new support for branded calling protection and broader voice security capabilities. The issue is timely, since the FCC has continued to focus on caller authentication and call branding, noting that STIR/SHAKEN helps validate whether caller ID information has been spoofed, but does not by itself tell consumers who is calling. In other words, a technically valid call is not automatically a trusted one – and that’s an important distinction.
That challenge becomes even more complicated as AI-generated voice enters the mainstream. The FBI warned last year that malicious actors are using AI-generated voice messages as part of impersonation campaigns targeting senior U.S. officials and their contacts. Pindrop’s 2025 Voice Intelligence and Security Report went further, estimating $12.5 billion in fraud losses in 2024 and noting a steep rise in deepfake-related fraud activity affecting voice channels.
For the enterprise, that means voice AI cannot be treated casually as an add-on. It must be deployed inside a framework that includes identity, verification, compliance, escalation paths, and strong governance that guides when AI should speak on its own and when a human should take over. In regulated industries, in particular, that logic may be as important as the conversational experience itself.
In that context, it’s easy to view Sinch’s Voice Relay as being less about giving AI agents a voice than about lowering the barrier to production-grade voice orchestration. The point is not that AI can talk, or that it belongs on every phone call, but that enterprises need help determining when it should speak and how to maximize trust when it does.
Edited by
Erik Linask