AI In Your Transaction - IBM Has You Covered
Telum II and Spyre are architected to deliver secure, low-latency generative AI inferencing directly within the IBM Z transactional environment.
Key Highlights
- The Telum II processor features a 5.5GHz core clock, 40% more on-chip cache than its predecessor, and an integrated AI accelerator delivering 24 TOPS.
- The Spyre Accelerator is a 75-watt PCIe card with 32 cores and 128GB of LPDDR5 memory, designed to handle large language models.
- This ensemble architecture aims to scale generative AI and agentic AI use cases while maintaining data residency and security on the mainframe.
- IBM is directly addressing the need for high-volume, low-latency AI inference for mission-critical, transactional workloads like real-time fraud detection.
- The company’s focus is on bridging AI capabilities across the hybrid cloud spectrum, anchored by the trusted IBM Z platform.
The News
IBM announced the upcoming general availability of the IBM Spyre Accelerator, designed to pair with the enhanced Telum II processor on the IBM z17 and LinuxONE 5 infrastructure. This integrated hardware stack aims to accelerate AI inferencing, including large language models, directly within the secure and resilient mainframe environment. The combined technology enables organizations to apply generative and agentic AI to high-volume transactional data where data gravity resides. This announcement reinforces IBM’s commitment to providing a purpose-built platform for hybrid cloud and enterprise AI. Find out more in IBM’s press release.
Analyst Take
I spoke with Tina Tarquinio at IBM TechXchange, and she summed up the announcement perfectly when she said, “Spyre brings generative AI to the mainframe at scale with resiliency and low latency”. This announcement is fundamentally about the persistence of data gravity and the necessity of trusted execution in enterprise computing. For decades, the mainframe has been the secure repository for the world’s most sensitive transactional data, particularly in financial services, insurance, and government. IBM is now systematically applying that core strength to the newest and most demanding workload: artificial intelligence. We are observing that while the broader market focuses on training massive models on hyperscale GPU clusters, IBM is making a calculated move to own the secure, low-latency inference layer for mission-critical applications.
The introduction of the Spyre Accelerator alongside the Telum II processor, which debuted with the z17, represents a cohesive, two-pronged strategy. The Telum II handles the high-frequency, in-transaction inference needs; think real-time credit card fraud scoring. It is fast, consistent, and intimately tied to the transactional integrity of the core system. The Spyre Accelerator, conversely, is the scale-out solution for larger, more complex models, particularly those involved in generative and agentic AI. This division of labor acknowledges the practical reality of enterprise AI, which is that one size of silicon does not fit all. You need specialized hardware to maintain the high quality of service that customers expect from a mainframe.
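To make that division of labor concrete, here is a minimal routing sketch. The dispatcher, model names, and thresholds are all hypothetical illustrations of the in-transaction versus scale-out split, not part of any IBM software stack.

```python
# Purely illustrative: the request shape, thresholds, and target names are
# invented for this sketch and are not IBM APIs or product identifiers.
from dataclasses import dataclass

@dataclass
class InferenceRequest:
    model_name: str
    params_millions: int       # rough model size
    latency_budget_ms: float   # how long the caller can wait

def route(req: InferenceRequest) -> str:
    """Small, latency-critical models stay on the integrated Telum II
    accelerator; large generative models go to the Spyre scale-out pool."""
    if req.latency_budget_ms <= 10 and req.params_millions <= 100:
        return "telum-ii-integrated-accelerator"
    return "spyre-scale-out-pool"

print(route(InferenceRequest("fraud-scoring", 5, 2.0)))         # integrated accelerator
print(route(InferenceRequest("compliance-llm", 8_000, 500.0)))  # Spyre pool
```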
The mainframe environment, by its very nature, demands a different set of priorities than the open, distributed cloud. Security, reliability, and guaranteed low latency are table stakes. The integrated security features, including quantum-safe cryptography enhancements in Telum II, are crucial selling points for regulated industries. My perspective is that IBM is not trying to compete with NVIDIA on raw teraflops for general training workloads. Instead, the company is aiming to provide the most secure and most efficient platform for running AI where the data lives. This focus on co-locating the data, the application, and the AI model is what delivers genuine business value in high-volume environments, reducing data movement costs and security risk simultaneously.
We are seeing a clear shift toward "agentic AI," where AI agents automate complex workflows rather than just providing insights. IBM’s announcement aligns directly with this trend by developing purpose-built agents for mainframe IT operations and application development, powered by this new hardware. This is a smart move. Automating mainframe management addresses the pervasive industry challenge of a diminishing skillset for these specialized systems. The hardware is designed to directly support this operational automation, simplifying incident response and accelerating system upgrades. The underlying architecture is now explicitly geared to function as a highly secure, high-throughput AI inference engine and orchestration layer, embedding intelligence into the very fabric of enterprise transactions.
What Was Announced
The IBM Telum II processor, designed on Samsung’s 5nm high-performance process node, aims to deliver significant generational improvements for the IBM Z architecture. It features eight high-performance cores running at a fixed frequency of 5.5GHz. The core design is paired with an impressive 40% increase in on-chip cache capacity over its predecessor, with the virtual L3 cache growing to 360MB and the virtual L4 cache reaching 2.88GB per processor drawer. This expanded cache hierarchy is designed to reduce memory latency, keeping the most relevant transactional data immediately accessible to the cores.
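As a quick consistency check on those cache figures (my own arithmetic, not an IBM-published breakdown), the drawer-level virtual L4 works out to exactly eight chips' worth of virtual L3:

```python
# Back-of-the-envelope check on the published cache figures; the eight-chip
# count is inferred from the arithmetic, not quoted from IBM.
virtual_l3_per_chip_mb = 360
virtual_l4_per_drawer_gb = 2.88

chips_implied = (virtual_l4_per_drawer_gb * 1000) / virtual_l3_per_chip_mb
print(chips_implied)  # 8.0 -> the virtual L4 spans eight chips' worth of virtual L3
```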
Crucially, the Telum II integrates a second-generation on-chip AI accelerator core. This component is engineered to provide a fourfold increase in compute capability per chip over the previous generation and aims to deliver 24 Trillion Operations per Second (TOPS). This integrated accelerator is designed for low-latency, high-throughput in-transaction AI inferencing, which is perfect for use cases like real-time anomaly detection. The processor also includes a new coherently attached Data Processing Unit (DPU) specialized for I/O acceleration, designed to manage complex I/O protocols and streamline data flow across the system, particularly when connecting to external accelerators.
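To illustrate what in-transaction inferencing means in practice, here is a minimal sketch of the pattern: the fraud score is computed synchronously inside the payment path, under a strict latency budget. The model class, field names, threshold, and fallback behavior are all invented for illustration and do not reflect IBM's AI toolkit APIs.

```python
# Illustrative only: FraudModel and its scoring rule are stand-ins for a model
# served on the on-chip accelerator; nothing here is an IBM interface.
from dataclasses import dataclass
import time

@dataclass
class Transaction:
    account_id: str
    amount: float
    merchant_category: int

class FraudModel:
    def score(self, tx: Transaction) -> float:
        # A real model returns a learned risk probability; this toy rule only
        # exists so the example runs end to end.
        return 0.9 if tx.amount > 10_000 else 0.05

def process_payment(tx: Transaction, model: FraudModel, budget_ms: float = 5.0) -> str:
    """Score the transaction inline, before commit, within a latency budget."""
    start = time.perf_counter()
    risk = model.score(tx)                       # low-latency, co-located inference
    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms > budget_ms:
        return "commit-unscored"                 # never hold the transaction hostage
    return "decline" if risk > 0.5 else "commit"

print(process_payment(Transaction("A1", 14_500.0, 5967), FraudModel()))  # decline
```

The point of the pattern is that the score arrives inside the transaction's own latency budget, which is exactly what an off-platform round trip to a remote inference service struggles to guarantee.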
The IBM Spyre Accelerator is introduced as a complementary, scale-out solution delivered on a 75-watt PCIe Gen 5 card. This standalone system-on-a-chip is designed specifically to handle large, complex AI models, including large language models (LLMs) for generative AI use cases, which often exceed the capacity of the integrated core. Spyre contains 32 AI-optimized processing cores, each with 2MB of scratchpad memory, and is paired with 128GB of LPDDR5 memory. The card is architected to support int4, int8, fp8, and fp16 data types, enabling high throughput and reduced latency for various AI applications. A single IBM Z or LinuxONE system can be configured to cluster up to 48 Spyre cards, providing substantial, scalable AI acceleration. This ensemble method, combining the Telum II's integrated accelerator with the scalable Spyre cards, is designed to support a broader set of AI models and use cases, allowing clients to run AI on their most sensitive data without migrating it off-platform.
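Multiplying out the published per-card figures gives a rough sense of the cluster ceiling. These are my own back-of-the-envelope aggregates derived from the numbers above, not IBM-stated system specifications:

```python
# Simple multiplications of the per-card figures quoted above; aggregate
# behavior of a real 48-card cluster will depend on the I/O and software stack.
cards_max           = 48    # Spyre cards per IBM Z / LinuxONE system
cores_per_card      = 32
mem_per_card_gb     = 128   # LPDDR5
scratch_per_core_mb = 2
watts_per_card      = 75

print(f"AI cores per system: {cards_max * cores_per_card}")               # 1536
print(f"Accelerator memory:  {cards_max * mem_per_card_gb} GB")           # 6144 GB
print(f"Scratchpad per card: {cores_per_card * scratch_per_core_mb} MB")  # 64 MB
print(f"Nominal card power:  {cards_max * watts_per_card} W")             # 3600 W
```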
Looking Ahead
Based on what I am observing, the core differentiator for IBM is the concept of "transactional gravity." In the world’s largest banks and enterprises, the data cannot simply be moved to a public cloud for AI processing due to regulatory hurdles, security constraints, and network latency issues. The strategic placement of the Telum II and Spyre is a direct acknowledgment of this reality. My perspective is that by optimizing the hardware to run AI models—both traditional and generative—directly adjacent to the mission-critical core, IBM has created a high-margin niche that hyperscalers find difficult to penetrate fully, especially for real-time transactional workloads.
When you look at the market as a whole, the announcement today positions IBM as a dedicated architect of hybrid AI governance. Hyperscalers like AWS and Azure offer a toolkit for building AI anywhere, relying heavily on commodity hardware paired with sophisticated software stacks. IBM, conversely, is offering an integrated, specialized ecosystem where the hardware, operating system, and AI software (watsonx) are intrinsically tied to provide unparalleled trust and control. The key trend that I am going to be looking out for is how effectively IBM can leverage the Spyre Accelerator to attract developers to deploy private, on-prem generative AI solutions for transactional use cases.
If IBM can successfully prove that a complex LLM for regulatory compliance or customer service can run more securely and consistently on a z17/LinuxONE system than in a distributed cloud, the platform's strategic value increases exponentially. HyperFRAME will be tracking how effectively the company migrates key generative AI software stacks to exploit the Spyre silicon in future quarters. Going forward, I am also going to be closely monitoring how the DPU in the Telum II performs in real-world benchmarks, as its efficiency in managing the I/O for clustered Spyre accelerators is foundational to the overall scalability claim.
Steven Dickens | CEO HyperFRAME Research
Regarded as a luminary at the intersection of technology and business transformation, Steven Dickens is the CEO and Principal Analyst at HyperFRAME Research.
Consistently ranked among the Top 10 Analysts by AR Insights and a contributor to Forbes, Steven offers expert perspectives that are sought after by tier-one media outlets such as The Wall Street Journal and CNBC, and he is a regular on TV networks including the Schwab Network and Bloomberg.