Is Nutanix Bringing Storage Into the AI Execution Path?
KV cache persistence, RDMA-enabled access, and disaggregated storage integrations position Nutanix storage closer to GPU-driven inference and distributed AI environments.
04/09/2026
Key Highlights
- Nutanix outlined an architecture for KV cache offload using Nutanix Files, extending GPU memory through persistent storage.
- Integration of NFS over RDMA and NVIDIA GPUDirect Storage enables direct data movement between storage and GPUs.
- Nutanix is building on Nutanix Unified Storage 5.3 to support AI pipelines and large-scale data environments.
- Distinct roles for file and object storage align infrastructure to inference and data pipeline requirements.
- Integration with NetApp ONTAP extends Nutanix into disaggregated storage architectures, with support for environments such as Cisco FlexPod and Lenovo ThinkSystem.
The News
At Nutanix .NEXT, the company introduced storage updates that extend the platform into AI-driven use cases and disaggregated deployment models. Nutanix announced a strategic integration with NetApp ONTAP, enabling data-in-place approaches and greater flexibility in how storage is deployed and managed, alongside expanded support for partner infrastructure including Cisco FlexPod and Lenovo ThinkSystem environments. The company also outlined how its storage architecture is evolving to support AI inference workflows and large-scale data environments, building on its existing unified storage foundation. For more details, read the main Nutanix .NEXT press release.
Analyst Take
Nutanix continues to extend its platform beyond its origins in hyperconverged infrastructure, positioning itself as a control layer across enterprise environments that now include virtual machines, containers, and AI workloads. The company is focused on simplifying how mixed environments are managed at scale, where traditional and cloud-native applications coexist under a consistent governance model.
Storage is moving into the execution path. The architecture for KV cache offload using Nutanix Files, combined with NFS over RDMA and NVIDIA GPUDirect Storage, places storage directly in the inference path. By persisting context outside of HBM, Nutanix extends effective GPU memory, supporting faster endpoint readiness and higher concurrency. With this approach, storage can participate in execution while maintaining a consistent management model across environments.
Nutanix is assigning distinct roles to file and object storage within AI environments. File storage is aligned to latency-sensitive workflows such as inference, where proximity to compute and efficient data movement are required. Object storage is positioned as the scale-out layer for AI pipelines and datasets, building on enhancements introduced in Nutanix Unified Storage (NUS) 5.3 and a roadmap that includes RDMA acceleration for higher-throughput workloads. This separation helps simplify design choices by aligning storage architecture to workload behavior.
Data movement is a central design point. By integrating RDMA-based access and GPUDirect Storage, Nutanix is optimizing how data flows between storage and GPUs. Performance is defined by the efficiency of that path, particularly as inference workloads depend on rapid access to distributed data. This aligns with Nutanix’s broader focus on GPU-intensive workloads, including bare-metal Kubernetes deployments, where minimizing latency across the data path is increasingly important.
Nutanix is also supporting disaggregated infrastructure models. Integrations with NetApp ONTAP and environments such as Cisco FlexPod show that storage does not need to be native to the platform. External systems retain control of data services, while Nutanix provides orchestration and visibility. This also reflects a shift toward supporting brownfield environments, where existing storage investments remain part of the architecture.
These developments point to a model where storage, compute, and orchestration are governed through a common control layer. Storage is moving beyond capacity and protection and is being integrated into how workloads execute, particularly in AI environments where performance, context persistence, and multi-tenant behavior must be managed together. The emphasis on removing complexity while expanding customer choice defines how Nutanix is approaching this transition.
What Was Announced
Nutanix extended its storage architecture to support AI workloads, building on the existing foundation of Nutanix Unified Storage 5.3. The announcements focus on how storage participates in inference workflows, data pipelines, and disaggregated environments.
For inference workloads, Nutanix outlined an architecture for KV cache offload using Nutanix Files. This approach uses NFS over RDMA and NVIDIA GPUDirect Storage to enable direct data movement between GPU memory and persistent file storage. By extending GPU memory into external storage, this model supports larger context windows, reduces cold start times, and improves resource utilization in inference environments.
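To make the mechanism concrete, the sketch below illustrates the warm-start pattern that KV cache offload enables: a session's key/value tensors are serialized to a file path and reloaded later instead of recomputing prefill. All names, paths, and data structures here are hypothetical, Nutanix has not published an API for this feature; in a real deployment the cache root would be an NFS-over-RDMA mount of Nutanix Files rather than a local temp directory.

```python
# Hypothetical sketch: persisting LLM KV cache entries to file storage so a
# later inference request can "warm start" instead of recomputing prefill.
import pickle
import tempfile
from pathlib import Path

# In a Nutanix Files deployment this would be an NFS mount (ideally NFS over
# RDMA with GPUDirect Storage); a temp dir stands in for illustration.
CACHE_ROOT = Path(tempfile.mkdtemp())

def offload_kv(session_id, kv_cache):
    """Persist a session's KV tensors (here: plain lists) outside GPU HBM."""
    path = CACHE_ROOT / f"{session_id}.kv"
    with path.open("wb") as f:
        pickle.dump(kv_cache, f)
    return path

def restore_kv(session_id):
    """Reload persisted context; None means a cold start (full prefill)."""
    path = CACHE_ROOT / f"{session_id}.kv"
    if not path.exists():
        return None
    with path.open("rb") as f:
        return pickle.load(f)

# Toy usage: two "layers" of K/V values for a chat session.
kv = {"layer0": {"k": [0.1, 0.2], "v": [0.3, 0.4]},
      "layer1": {"k": [0.5, 0.6], "v": [0.7, 0.8]}}
offload_kv("chat-42", kv)      # e.g. when the session goes idle
warm = restore_kv("chat-42")   # a later request resumes without re-prefill
assert warm == kv
```

The design point the announcement stresses is the transport: with GPUDirect Storage, the serialize/deserialize hop through host memory is bypassed, which is what makes file storage fast enough to sit in this path.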
Nutanix highlighted enhancements to its object storage architecture, building on NUS 5.3 as a foundation for AI data environments. The release expands Smart Tiering to enable data movement to public cloud object storage, including Google Cloud and OVHCloud S3, and introduces multitenant object scaling and quotas to support large AI data lakes. Nutanix is positioning object storage as a higher-performance data tier for AI pipelines, with a roadmap that includes RDMA acceleration for S3-compatible storage later in 2026 to increase throughput for training datasets and data-intensive workloads.
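As a rough illustration of what a Smart Tiering-style policy does, the sketch below marks objects that have gone cold for movement to a public cloud tier. This is invented illustrative logic, not Nutanix's actual policy engine or configuration surface; thresholds, field names, and tier labels are assumptions.

```python
# Hypothetical sketch of tiering-policy logic: objects colder than a
# threshold are selected for movement to public cloud object storage.
from dataclasses import dataclass

@dataclass
class ObjectMeta:
    key: str
    days_since_access: int
    tier: str = "local"  # "local" (on-cluster) or "cloud" (S3-compatible)

def apply_tiering(objects, cold_after_days=30):
    """Mutate tiers in place and return the keys moved to the cloud tier."""
    moved = []
    for obj in objects:
        if obj.tier == "local" and obj.days_since_access >= cold_after_days:
            obj.tier = "cloud"     # in practice: copy to the S3 endpoint,
            moved.append(obj.key)  # then keep a local stub/reference
    return moved

inventory = [
    ObjectMeta("training/shard-0001.parquet", days_since_access=90),
    ObjectMeta("inference/prompts.jsonl", days_since_access=2),
]
cold = apply_tiering(inventory)
assert cold == ["training/shard-0001.parquet"]
```

The interesting operational question, and one the announcement leaves open, is how such policies interact with the planned RDMA acceleration: tiered-out data presumably loses the low-latency path until it is recalled.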
Nutanix also announced integration with NetApp ONTAP, enabling data-in-place approaches for virtual machine environments and allowing customers to use existing storage systems within Nutanix-managed infrastructure. Support for validated environments such as Cisco FlexPod and Lenovo ThinkSystem extends this model across partner infrastructure, combining external compute, networking, and storage systems under Nutanix orchestration. These integrations enable disaggregated deployment models where storage, compute, and management layers can be independently scaled and maintained. The NetApp integration is expected to be available later in 2026.
Nutanix .NEXT 2026: Transforming the AI Data Path Through GPU Optimization and In-Place Infrastructure Modernization
By persisting KV cache on Nutanix Files via RDMA and GPUDirect, Nutanix is creating a warm start mechanism for LLMs. This technical integration enables organizations to maintain long-running AI conversations and complex agentic workflows without the massive re-computation costs typically associated with GPU memory flushing, significantly improving the efficiency of persistent AI sessions.
The strategic separation of file and object roles, positioning Nutanix Files for low-latency inference and NUS 5.3 for high-throughput data lakes, functions as a GPU multiplier. This architectural approach turns storage into a virtual HBM (High Bandwidth Memory) extension, which enables smaller, more cost-effective GPU clusters to handle significantly larger context windows and more expansive datasets than previously possible.
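The "virtual HBM" framing is easiest to appreciate with a back-of-the-envelope KV cache sizing calculation. The model parameters below are assumptions for illustration (a 7B-class transformer without grouped-query attention), not figures from the announcement.

```python
# Back-of-envelope KV cache sizing for an assumed 7B-class transformer.
# Per token, each layer stores one key and one value vector per KV head.
n_layers    = 32   # assumed
n_kv_heads  = 32   # assumed (no grouped-query attention)
head_dim    = 128  # assumed
dtype_bytes = 2    # fp16

# Factor of 2 covers both the K and the V tensor.
bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

ctx = 32768  # a long-context session
cache_gib = bytes_per_token * ctx / 2**30
print(f"{bytes_per_token} bytes/token -> {cache_gib:.1f} GiB at {ctx} tokens")
# -> 524288 bytes/token -> 16.0 GiB at 32768 tokens
```

At half a megabyte per token, a handful of long-context sessions exhausts the HBM left over after model weights, which is why offloading idle sessions' caches to external storage can let a smaller GPU cluster serve larger context windows.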
In our view, Nutanix is neutralizing the refactoring trap by integrating with established systems such as NetApp ONTAP and Cisco FlexPod. This allows enterprises to modernize their AI control plane while their data remains in place, removing the time and risk barriers associated with data migration that frequently stall AI initiatives in brownfield data centers.
Looking Ahead
Nutanix is aligning its storage architecture with the requirements of AI-driven systems, where data locality, movement, and persistence directly influence performance. The introduction of KV cache persistence and RDMA-enabled access integrates storage more closely with compute workflows, particularly in inference scenarios, while simplifying how context is retained and reused across distributed inference clusters.
Nutanix is already appearing more frequently in VMware migration discussions, where customers are evaluating alternatives that provide continuity for existing workloads alongside a path to cloud-native and AI-driven services. The evolution of object storage and metadata scaling supports large-scale data pipelines, while planned RDMA capabilities point to continued improvements in throughput for training and data-intensive workflows. Integrations with external platforms such as NetApp ONTAP and validated architectures such as Cisco FlexPod reinforce a model that prioritizes optionality, allowing customers to extend existing infrastructure investments without disrupting established data services.
These enhancements and partnerships broaden Nutanix’s appeal, positioning the platform as both a migration destination and a foundation for new workloads. In our opinion, it will be important to watch how Nutanix translates this momentum into measurable performance benchmarks, tenant-level isolation, and tighter integration with GPU and AI software stacks. These factors will determine whether storage can consistently function as part of the execution path while sustaining predictable performance and resource utilization at scale.
Don Gentile | Analyst-in-Residence -- Storage & Data Resiliency
Don Gentile brings three decades of experience turning complex enterprise technologies into clear, differentiated narratives that drive competitive relevance and market leadership. He has helped shape iconic infrastructure platforms including IBM z16 and z17 mainframes, HPE ProLiant servers, and HPE GreenLake — guiding strategies that connect technology innovation with customer needs and fast-moving market dynamics.
His current focus spans flash storage, storage area networking, hyperconverged infrastructure (HCI), software-defined storage (SDS), hybrid cloud storage, Ceph/open source, cyber resiliency, and emerging models for integrating AI workloads across storage and compute. By applying deep knowledge of infrastructure technologies with proven skills in positioning, content strategy, and thought leadership, Don helps vendors sharpen their story, differentiate their offerings, and achieve stronger competitive standing across business, media, and technical audiences.
Ron Westfall | VP and Practice Leader for Infrastructure and Networking
Ron Westfall is a prominent analyst figure in technology and business transformation. Recognized as a Top 20 Analyst by AR Insights and a TechTarget contributor, his insights are featured in major media such as CNBC, Schwab Network, and NMG Media.
His expertise covers transformative fields such as Hybrid Cloud, AI Networking, Security Infrastructure, Edge Cloud Computing, Wireline/Wireless Connectivity, and 5G-IoT. Ron bridges the gap between C-suite strategic goals and the practical needs of end users and partners, driving technology ROI for leading organizations.