Ericsson’s AI-Native Frontier: Redefining Voice Integrity and Human-Machine Interaction at Carrier Scale

Ericsson seeks to spearhead a paradigm shift in telecommunications by integrating low-latency AI models and advanced security frameworks directly into the network core, transforming traditional voice calls into a programmable, high-trust platform that neutralizes deepfake threats while unlocking new monetization opportunities for global service providers.

Key Highlights:

Ericsson is pioneering a sovereign trust layer by integrating low-latency AI and identity verification directly into the telecom network core to rescue the voice channel from deepfake fraud.
Strategic investments in Cartesia and Hiya can provide a competitive edge, enabling sub-100ms AI response times and carrier-grade security that OTT providers cannot match.
The IMS data channel transforms phone calls from static audio streams into programmable, interactive platforms capable of hosting photorealistic avatars and real-time translation without external apps.
Network-native AI assistants serve as a digital concierge, gatekeeping calls by conversationally challenging suspicious dialers and summarizing legitimate interactions with 99.999% reliability.
Carrier success in 2026 relies on a dual strategy of deploying immersive AI-native experiences alongside hardened security, reclaiming value from tech platforms by making the network the arbiter of trust.

The News:

Ericsson is pivoting from incremental voice improvements to a paradigm shift by positioning the global telecom network as the primary platform for generative AI-driven voice interfaces. By integrating low-latency AI models and the IMS data channel directly into the network, the company enables carrier-grade services such as real-time translation and AI avatar agents while simultaneously deploying advanced security to combat the rising threat of deepfake fraud. For more information, read the Ericsson blog by Vik Li, VP, Investment, Mariarosaria Romano, Marketing Manager for Regulatory Solutions, and Sven Gemski, Strategic Product Manager, IMS.

Analyst Take:

Ericsson Ventures has spent recent years exploring the audio frontier, driven by the core thesis that voice will become the dominant human-machine interface with the global telecom network serving as its primary platform. In early 2025, the firm acted on this conviction by investing in Cartesia, a company whose models depart from standard diffusion or transformer-based architectures. We find that, by developing the State Space Model architecture from first principles, Cartesia’s team has created a breakthrough that aims to offer the lowest latency and most cost-effective AI model inference available across cloud, on-premises, and device environments.

Alongside this strategic investment, Ericsson is integrating AI capabilities into IMS voice calling for approximately one billion global subscribers. While some advanced use cases are currently limited by device support for the data channel mechanism, Ericsson is already partnering with AI companies to launch early services that improve subscriber experiences and enable monetization for communication service providers. Moreover, the company is actively engaging the wider industry ecosystem to accelerate the adoption of the data channel, establishing the necessary foundation to eventually realize the full potential of AI-native voice technology.

From our viewpoint, the IMS data channel represents a potential leap in telecommunications, evolving the standard 3GPP framework from a simple audio pipe into a high-performance, synchronized data highway. Unlike traditional VoLTE or VoNR, which treat voice as an isolated stream, this technology weaves a low-latency data path directly into the call session. This architectural shift is significant because it enables real-time AI augmentation, such as live translation or emotional analysis, to occur natively within the network. This removes the friction of the app economy, as users no longer need to download third-party software to access advanced features; the intelligence is inherent to the dialer itself.

By merging the rigorous reliability of carrier-grade Quality of Service (QoS) with the flexibility of modern web technologies, the IMS data channel transforms a phone call from a passive experience into a programmable, interactive platform. This opens the door for a sophisticated suite of services, including intelligent call orchestration, live captioning, and AI-driven screening, all functioning with deterministic latency that over-the-top (OTT) apps struggle to match. While AI calling can technically exist without this specific channel, its implementation provides the necessary infrastructure for intuitive visual menus and service activation, modernizing the voice experience to meet the interactive expectations of a generative AI era.

The Rise of Presence-Based Calling: Orchestrating AI and Empathy at the Network Edge

The integration of photorealistic AI avatars into the IMS call session represents a shift from voice-only support to a presence-based service model. By synthesizing Tavus’s visual realism with Cartesia’s near-instantaneous speech, enterprises are humanizing automated interactions without the friction of a video-conferencing link. The strategic advantage here lies in the biometric feedback loop; if a user opts to share their camera, the AI can decode micro-expressions and body language to adjust its tone and empathy levels in real time.

This creates a high-trust environment that emulates face-to-face retail, enabling brands to scale personal connection with the efficiency of a machine. While this can function on legacy systems, the eventual adoption of the data channel will likely turn these avatars from static talking heads into interactive agents capable of sharing visual documents or navigating menus mid-call.

Moving real-time translation and live captioning into the mobile core addresses one of the most persistent barriers in global business and accessibility: the latency tax. By hosting these translation models adjacent to the network hardware, Communication Service Providers (CSPs) eliminate the processing delays that typically hinder OTT translation apps. This transition suggests a future where language is no longer a static feature of the user, but a dynamic, real-time layer of the network itself.

We find that this capability does more than just bridge a linguistic gap; it democratizes complex global communication by removing the need for high-end device processing, ensuring that even a basic smartphone can facilitate a fluid, bilingual conversation with professional-grade accuracy and speed.

The Trust Differentiator: How Network-Native AI is Rescuing the Voice Channel from the Deepfake Era

Hiya, an Ericsson Ventures portfolio company, has provided critical data in its State of the Call 2026 report that highlights a breakdown in the social contract of telecommunications due to the weaponization of generative AI. The research reveals that deepfake technology has moved from a theoretical concern to a daily reality, with one-third of global consumers already encountering synthetic voice scams. This escalation is not just a nuisance; it is an existential threat to the voice channel, as evidenced by the 86% of unknown calls that now go unanswered. This behavioral shift suggests that the primary damage of AI-driven fraud is the silencing of the network, where legitimate communication is discarded alongside the malicious.

The analysis of this data points to a new era of security-driven churn, where a subscriber's loyalty is directly tied to their sense of digital safety. With 38% of users willing to switch providers over inadequate scam protection, the role of the CSP is shifting from a utility provider to a necessary guardian of identity. In 2026, the competitive edge for a network operator is no longer just about coverage or speed; it is about the ability to rebuild a trust layer that can filter synthetic deception, ensuring that when a phone rings, the recipient feels safe enough to answer.

The escalation of AI-driven fraud has forced a pivot in telecommunications where network-level security is no longer a premium add-on, but a fundamental requirement for survival. By integrating Hiya’s Call Qualification directly into the Ericsson IMS, carriers can provide a native defense shield that operates in real-time across all devices. This moves the battleground from the handset to the network core; rather than relying on a user to identify a scam, the network proactively labels or blocks threats. For enterprises, this shift is equally vital. With 86% of unknown calls going unanswered, the Branded Call service becomes a critical operational tool rather than a mere marketing luxury. It restores the identity layer of a phone call, ensuring that legitimate business reaches the consumer through a verified, high-trust visual handshake.

The most transformative application of this technology is the emergence of the network-native personal AI voice assistant. Unlike fragmented third-party apps, these assistants live within the IMS call session, allowing them to act as a sophisticated digital concierge that challenges suspicious callers and intercepts deepfake impersonations before they reach the subscriber.

This represents a paradigm shift in productivity and privacy: the AI can transcribe, summarize, and gatekeep conversations with the carrier-grade reliability of five nines (99.999%) uptime. By embedding these capabilities directly into the infrastructure, CSPs can offer premium tiers that provide a level of security and deterministic latency that OTT applications simply cannot replicate.
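
As a rough illustration of what the five-nines figure implies in practice, the arithmetic can be sketched in a few lines of Python. The function name and the comparison tiers below are illustrative assumptions of ours, not part of any Ericsson product or 3GPP specification; the calculation simply converts an availability percentage into the downtime it permits over a non-leap year.

```python
# Illustrative arithmetic only: converts an availability percentage into
# the maximum downtime it permits per year. All names here are hypothetical.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    tiers = [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]
    for label, pct in tiers:
        minutes = allowed_downtime_minutes(pct)
        print(f"{label} ({pct}%): {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, five nines leaves a budget of roughly 5.26 minutes of downtime per year, compared with about 52.6 minutes at four nines.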

From our perspective, this evolution signals a reclaiming of value by network operators. For decades, CSPs provided the dumb pipes while tech platforms captured the value of the app economy; however, AI voice changes the math. Because high-fidelity, real-time voice interaction is hypersensitive to latency, the structural advantage shifts back to the network edge. By hosting voice models natively, CSPs eliminate the latency tax and transform the network from a commodity into a programmable, high-integrity platform. In the race to dominate the human-machine interface, the entity that controls the network now controls the most essential element of the experience: trust.

The Battle for the Dial Tone: Mapping the Global Rivalries in AI-Native Telecommunications

Within the specialized landscape of AI-native voice and IMS-integrated telecommunications, Ericsson faces a multi-front competitive challenge from traditional networking giants, cloud hyperscalers, and specialized players. Its most direct hardware rivals include Nokia, which is leveraging its AVA AI suite to embed generative capabilities into consumer voice services, and Huawei, which continues to pioneer AI-edge integration through its proprietary Pangu models. Additionally, Samsung Electronics poses a distinct threat by using its position in the handset market to create an end-to-end AI ecosystem that bridges the gap between Galaxy device intelligence and 5G network infrastructure.

The competitive arena is further complicated by cloud hyperscalers such as Microsoft, AWS, and Google, who are attempting to bypass traditional hardware by offering AI voice as a service. Microsoft’s Azure for Operators uses cloud-native IMS solutions to run OpenAI-driven agents, while AWS uses its Wavelength edge zones to host high-speed voice processing. Google remains a threat by integrating its Gemini models and translation technology directly into carrier workflows, shifting the value of the call from the network pipe to the cloud platform.

Beyond the infrastructure layer, Ericsson and its partner Hiya contend with specialized software platforms that focus on the trust and intelligence layers of communication. Companies such as First Orion and Neustar are established in branded calling and identity authentication, providing critical verification services to major carriers. Alianza provides its cloud-native Intelligent Communications Fabric that enables service providers to integrate AI-driven security and identity verification directly into the network core, rivaling Hiya’s branded calling and fraud protection ecosystem. Meanwhile, firms such as AudioCodes act as critical gateways, bridging the gap between legacy voice systems and the newer AI-native applications that Ericsson is striving to standardize.

Finally, the underlying engine of this voice evolution is being contested by direct AI model developers. While Ericsson collaborates with Cartesia, the company must compete with the high-fidelity cloning and widespread API adoption of ElevenLabs, which has made inroads among third-party applications. OpenAI’s native voice capabilities present an OTT challenge; as these models become faster and more intuitive, Ericsson’s primary defense is its ability to offer deterministic latency and carrier-grade reliability, features that standalone web-based models struggle to guarantee at scale.

Looking Ahead

We believe that Ericsson's investments in Cartesia and Hiya can prove a competitive success by establishing a trust and performance framework that traditional cloud-based AI providers cannot easily replicate. By integrating Cartesia’s State Space Model architecture, Ericsson eliminates the latency tax of traditional transformers, enabling the sub-100ms response times necessary for AI-native voice to feel as fluid as a human conversation. Moreover, the partnership with Hiya provides a hardened network-level security layer that restores subscriber trust by proactively neutralizing deepfake fraud, transforming the network from a passive utility into an indispensable, high-integrity platform for the generative AI era.

In the landscape of 2026, market success belongs to CSPs that simultaneously deploy immersive AI-native experiences and hardened, AI-driven security protocols. By merging high-fidelity interaction with network-level identity verification, these operators are constructing a sovereign trust layer that transforms voice from a vulnerable legacy service into a secure, high-value platform. This dual strategy not only mitigates the existential threat of synthetic fraud but also reclaims the monetization edge from OTT players by making the network itself the indispensable arbiter of authentic human connection.

Author Information

Ron Westfall | Analyst In Residence

Ron Westfall is a prominent analyst in technology and business transformation. Recognized as a Top 20 Analyst by AR Insights and a TechTarget contributor, his insights are featured in major media such as CNBC, Schwab Network, and NMG Media.

His expertise covers transformative fields such as Hybrid Cloud, AI Networking, Security Infrastructure, Edge Cloud Computing, Wireline/Wireless Connectivity, and 5G-IoT. Ron bridges the gap between C-suite strategic goals and the practical needs of end users and partners, driving technology ROI for leading organizations.

@ Copyright 2026 HyperFrame Research
