Can AI Agents Replace the On-Call Rotation for Good?

PagerDuty launches its Spring 2026 release featuring autonomous SRE agents and a multi-agent operational fabric to automate incident response.

03/13/2026

Key Highlights

  • The SRE Agent now functions as a virtual responder that sits directly on escalation policies and on-call schedules.
  • PagerDuty is positioning its platform as the coordination layer for a multi-agent ecosystem using the Model Context Protocol.
  • New integrations with Anthropic and Cursor aim to push operational data upstream into the developer environment.
  • Slack remains the primary interface for the incident lifecycle with new agentic workflows for triage and coordination.

The News

PagerDuty has unveiled its Spring 2026 release, which focuses on transitioning the platform from a human-centric notification tool to an autonomous operations hub. The update introduces a virtual responder capability for its SRE Agent and deepens the integration of operational data into developer tools like GitHub Copilot and Anthropic’s Claude Code. These advancements are designed to allow AI agents to handle the initial stages of incident detection and diagnosis before a human is ever paged.

Analyst Take

We have reached a point where the traditional definition of on-call is becoming obsolete. For years, the industry has struggled with the cognitive load placed on engineers who are woken up at 3:00 AM to deal with "noisy" alerts that often require routine fixes. PagerDuty is now betting that the solution is not just better noise reduction, but the introduction of a digital colleague that can actually do the work. The shift from a passive alerting system to an active participant in the response chain is a significant move in the DevOps market.

This strategy aligns with broader enterprise trends; according to HyperFRAME Research Lens data, 72% of organizations currently treat AI as a near-term performance lever for operational efficiency rather than a primary innovation driver. We see this as a necessary evolution as system complexity outpaces the ability of humans to monitor every microservice and dependency manually.

What Was Announced

The Spring 2026 release is designed to move PagerDuty toward what it calls autonomous operations. The centerpiece is the SRE Agent as a Virtual Responder, which is designed to be added to on-call schedules and escalation policies just like a human employee. When an incident is triggered, this agent aims to deliver autonomous detection, triage, and diagnosis by inspecting the technical stack and analyzing anomalies. PagerDuty also announced a modern Slack experience built to run the full incident lifecycle within a chat channel, including agentic workflows for coordination and post-incident knowledge capture.
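To make the idea of an agent sitting on an escalation policy concrete, here is a minimal sketch in Python. It is not PagerDuty's API or implementation. The names `Incident`, `Responder`, `run_escalation`, `agent_triage`, and `human_ack` are hypothetical, and the rule that the agent only handles low-severity incidents is an assumption made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Incident:
    title: str
    severity: str  # hypothetical scale: "low" or "high"


@dataclass
class Responder:
    name: str
    is_agent: bool
    # Returns a note if this responder handles the incident, else None.
    handle: Callable[[Incident], Optional[str]]


def run_escalation(incident: Incident, tiers: List[Responder]) -> List[str]:
    """Walk the tiers in order; stop at the first responder that handles the incident."""
    log: List[str] = []
    for responder in tiers:
        note = responder.handle(incident)
        if note is not None:
            log.append(f"{responder.name}: {note}")
            return log
        log.append(f"{responder.name}: escalated")
    log.append("unhandled")
    return log


def agent_triage(incident: Incident) -> Optional[str]:
    # Assumption for illustration: the agent only takes low-severity incidents.
    if incident.severity == "low":
        return "diagnosed, restart of stale worker suggested"
    return None


def human_ack(incident: Incident) -> Optional[str]:
    return "acknowledged by human"


# The agent is tier one; a human engineer is the fallback tier.
policy = [
    Responder("sre-agent", True, agent_triage),
    Responder("on-call-engineer", False, human_ack),
]

if __name__ == "__main__":
    print(run_escalation(Incident("disk usage warning", "low"), policy))
    print(run_escalation(Incident("checkout outage", "high"), policy))
```

The point of the sketch is the ordering: the agent occupies the same slot a human would, and a human is paged only when the agent declines or cannot resolve the incident.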

Furthermore, the company introduced expanded agent-to-agent capabilities using the Model Context Protocol, or MCP. This is designed to create a multi-agent operational fabric where PagerDuty agents can interact with external agents from cloud providers or developer platforms. On the prevention side, new integrations with Anthropic, Cursor, and LangChain aim to deliver operational context to developers earlier in the lifecycle. For instance, the MCP plugin for Anthropic’s Claude Code is designed to analyze code changes against historical incident data to flag risky deployments. These updates are scheduled for a phased rollout, with the Virtual Responder entering early access in Q2 2026 and fully autonomous capabilities expected in the second half of the year.
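As a rough illustration of checking code changes against incident history, the sketch below counts how often each changed file was implicated in past incidents and flags those above a threshold. This is a simple frequency heuristic chosen for clarity, not the logic of the actual MCP plugin, which the announcement does not describe. The function `flag_risky_files`, the `HISTORY` records, and the file paths are all hypothetical.

```python
from collections import Counter
from typing import Dict, List


def flag_risky_files(
    changed_files: List[str],
    incident_history: List[dict],
    threshold: int = 2,
) -> Dict[str, int]:
    """Return changed files implicated in at least `threshold` past incidents."""
    counts: Counter = Counter()
    for incident in incident_history:
        # Count each file at most once per incident.
        for path in set(incident["files"]):
            counts[path] += 1
    return {p: counts[p] for p in changed_files if counts[p] >= threshold}


# Hypothetical incident records; real data would come from the incident platform.
HISTORY = [
    {"id": "INC-1", "files": ["billing/charge.py", "billing/retry.py"]},
    {"id": "INC-2", "files": ["billing/charge.py"]},
    {"id": "INC-3", "files": ["auth/session.py", "billing/charge.py"]},
]

if __name__ == "__main__":
    change = ["billing/charge.py", "docs/readme.md"]
    print(flag_risky_files(change, HISTORY))
```

A production system would presumably weigh recency, severity, and the nature of the change rather than raw counts, but the flow is the same: look up the history for what changed, then surface a warning before deployment.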

The strategy here is quite clear. PagerDuty is trying to capture the "human context" of incident response, which has historically been the most difficult data to digitize. When an engineer fixes a problem, the steps they took and the decisions they made are often lost in Slack threads or private notes. By embedding AI agents into these workflows, the platform aims to create a data flywheel where the system learns from every manual intervention. This is not just about automation; it is about creating a collective memory for the operations team. We find the move to use MCP particularly savvy, as it acknowledges that PagerDuty cannot be the only agent in a company’s ecosystem. By acting as the coordination layer, the company is essentially trying to become the air traffic controller for all other operational AI.

There is a pragmatism in this release that we find refreshing. Rather than promising that AI will magically fix everything, the focus is on "shift-left" prevention and the mundane tasks of triage. By pushing incident history into tools like Cursor, PagerDuty lets a developer see the operational consequences of their code while they are still writing it. This is a far more effective way to reduce downtime than simply trying to react faster once things break. We also note the focus on Slack-native workflows. Most engineers live in their chat tools during a crisis, and forcing them to toggle back and forth between a dashboard and a conversation is a recipe for errors. Bringing the agent directly into the channel feels like a natural progression of ChatOps.

Looking Ahead

Based on what we are observing, the market for incident management is bifurcating between traditional notification services and comprehensive operational clouds. The key question we will be watching is how effective these agents prove in high-stakes, "black swan" events where historical data might not provide a clear path forward. Our perspective is that while autonomous triage will yield immediate ROI by reducing fatigue, the transition to a fully autonomous responder will face significant cultural and technical hurdles. HyperFRAME Research Lens data highlights a significant "execution gap" in this area, finding that only 23% of enterprise AI/ML projects launched in the last year successfully reached production and met their original ROI objectives.

Going forward, we will closely monitor how the company executes on its Model Context Protocol integrations. The success of this strategy hinges on a vibrant ecosystem where agents from different vendors can communicate without friction. Looking at the market as a whole, the announcement places PagerDuty in direct competition with emerging "AI-native" startups that are trying to rebuild the NOC from scratch. However, PagerDuty has the advantage of a massive repository of human-labeled incident data that these newcomers lack.

The integration of AI into IT operations is no longer optional for companies managing distributed cloud architectures, and PagerDuty's roadmap aligns with this necessity. HyperFRAME will be tracking how well the company moves these features from early access to general availability in future quarters. The real test will be whether customers trust an agent not only to diagnose a problem but also to execute a remediation script in a production environment. If PagerDuty can prove the safety and reliability of these autonomous actions, it will fundamentally change the economics of site reliability engineering.

Author Information

Steven Dickens | CEO HyperFRAME Research

Regarded as a luminary at the intersection of technology and business transformation, Steven Dickens is the CEO and Principal Analyst at HyperFRAME Research.
Ranked consistently among the Top 10 Analysts by AR Insights and a contributor to Forbes, Steven's expert perspectives are sought after by tier one media outlets such as The Wall Street Journal and CNBC, and he is a regular on TV networks including the Schwab Network and Bloomberg.