Research Notes

NVIDIA Releases Nemotron 3: A New Family of Open Models


NVIDIA debuted the Nemotron 3 family of open models, available in Nano, Super, and Ultra sizes, providing efficiency advances, improved accuracy, and a hybrid Mixture-of-Experts architecture that can deliver throughput breakthroughs for building agentic AI applications.

18/12/2025

Key Highlights

  • NVIDIA launched the Nemotron 3 family (Nano, Super, Ultra) of open MoE models and supporting libraries to promote transparent and efficient multi-agent AI development.

  • The models use a hybrid latent MoE architecture to address major challenges such as high inference costs and load balancing in complex, large-scale agentic systems.

  • Nemotron 3 directly addresses architectural hurdles by optimizing memory footprint and deployment complexity through its tiered sizes and integrated accelerated computing platform.

  • Functionally, the models can improve context consistency through high throughput and strengthen trust and transparency by being fully open source, enabling alignment and scrutiny.

  • NVIDIA mitigates the high barrier of training complexity by releasing comprehensive open-source data and libraries (like NeMo Gym/RL), enabling wider adoption beyond hyperscale organizations.

The News

NVIDIA announced the NVIDIA Nemotron 3 family of open models, data and libraries designed to power transparent, efficient and specialized agentic AI development across industries. Nemotron 3 Nano is available on Hugging Face and through inference service providers including Baseten, DeepInfra, Fireworks, FriendliAI, OpenRouter and Together AI.

Nemotron is offered on enterprise AI and data infrastructure platforms, including Couchbase, DataRobot, H2O.ai, JFrog, Lambda and UiPath.

For customers on public clouds, Nemotron 3 Nano will be available on AWS via Amazon Bedrock (serverless), with support coming soon on Google Cloud, CoreWeave, Crusoe, Microsoft Foundry, Nebius, Nscale and Yotta. Nemotron 3 Nano is also available as an NVIDIA NIM microservice for secure, scalable deployment anywhere on NVIDIA-accelerated infrastructure with maximum privacy and control. Nemotron 3 Super and Ultra are expected to be available in the first half of 2026. For more information, read the NVIDIA press release.

Analyst Take

NVIDIA has introduced the Nemotron 3 family of open models (available in Nano, Super, and Ultra sizes), alongside supporting data and libraries, specifically engineered to drive transparent, efficient, and specialized agentic AI development across industries. These models use a hybrid latent Mixture-of-Experts (MoE) architecture to provide the performance and openness necessary for building reliable multi-agent systems at scale, addressing challenges such as high inference costs and communication overhead.

By supporting sovereign AI efforts and being adopted by industry stalwarts such as Accenture, Oracle, and Siemens, Nemotron 3 enables developers to optimize costs and accelerate innovation by strategically balancing frontier-level models with efficient, customizable open models for agentic workflows.

Identifying and Meeting MoE Challenges

In my view, the challenges inherent to multi-agent systems and highly complex MoE architectures generally fall into two categories: architectural/deployment hurdles and behavioral/reliability issues. Architecturally, managing these systems at scale can introduce significant technical complexity. A primary challenge is load balancing and MoE routing: effectively distributing the computational workload across the MoE's expert sub-networks is crucial, and inefficient or uneven routing of input tokens can lead directly to underutilized hardware or processing bottlenecks.
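
The routing and load-balancing problem described above can be sketched in a few lines. Below is a generic top-k router with a Switch-Transformer-style auxiliary balance loss, written in NumPy purely for illustration; it is not Nemotron's actual router, and the function names, the choice of k=2, and the expert count are assumptions of the sketch.

```python
import numpy as np

def top_k_routing(logits, k=2):
    """Select the top-k experts per token and softmax-normalize their gates.

    logits: (num_tokens, num_experts) router scores.
    Returns expert indices (num_tokens, k) and gate weights (num_tokens, k).
    """
    top_idx = np.argsort(logits, axis=-1)[:, -k:]                 # top-k experts
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    gates = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)                    # sums to 1 per token
    return top_idx, gates

def load_balance_loss(logits, top_idx, num_experts):
    """Switch-Transformer-style auxiliary loss: penalizes routers that send a
    large fraction of tokens (and probability mass) to a few experts."""
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # Fraction of tokens dispatched to each expert.
    dispatch = np.array([np.mean(np.any(top_idx == e, axis=-1))
                         for e in range(num_experts)])
    mean_prob = probs.mean(axis=0)                                # mean router prob per expert
    return num_experts * float(np.sum(dispatch * mean_prob))

rng = np.random.default_rng(0)
logits = rng.normal(size=(64, 8))            # 64 tokens, 8 experts
idx, gates = top_k_routing(logits, k=2)
loss = load_balance_loss(logits, idx, num_experts=8)
print(idx.shape, gates.shape, round(loss, 3))
```

With near-uniform random logits the loss sits near its balanced value; a router that collapses onto one expert drives it toward the maximum, which is what training penalizes.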

Furthermore, the system complexity and infrastructure required to manage a hybrid latent MoE model within a multi-agent framework significantly increase the difficulty of deployment, demanding specialized orchestration tools and deep expertise to ensure high availability and performance. Finally, despite their computational efficiency, the sheer number of expert parameters means the memory footprint (VRAM requirement) needed to store the model remains a substantial obstacle, particularly for resource-constrained deployments on standard hardware or edge devices.

From a functional standpoint, ensuring reliable and trustworthy operation is difficult. Context drift is a major concern in multi-agent systems, as agents may struggle to maintain a coherent and consistent state of knowledge, potentially leading to unreliable or contradictory outputs. This problem is compounded by issues of trust and transparency (interpretability): the inherent complexity of the MoE routing mechanism and inter-agent collaboration makes understanding why a system reached a specific conclusion extremely challenging.

This lack of transparency erodes trust, especially in mission-critical applications. Training complexity itself is another significant barrier, requiring massive proprietary datasets and highly sophisticated techniques, such as reinforcement learning, to ensure that each expert specializes appropriately without detrimental overlap, a hurdle historically cleared only by organizations with hyperscale resources.

I see the NVIDIA Nemotron 3 family of solutions directly addressing these architectural and behavioral challenges through a combination of model architecture, supporting software, and an open framework focused on efficiency, transparency, and deployment scale.

Addressing Architectural and Deployment Hurdles

NVIDIA's Nemotron 3 family tackles architectural difficulties by optimizing the model's core design and providing a complete deployment ecosystem. To resolve issues with load balancing and MoE routing, Nemotron 3 delivers a hybrid latent MoE architecture. This refined internal design enhances token routing efficiency compared to standard MoE models, thereby ensuring better load distribution and reducing the processing bottlenecks that commonly lead to hardware underutilization.

For managing system complexity and infrastructure, the models are provided with a full set of data and libraries integrated within NVIDIA's accelerated computing platform. This cohesive, pre-optimized ecosystem can simplify the orchestration needed for large MoE and multi-agent systems, reducing the reliance on specialized, manual tooling and extensive in-house expertise.

Furthermore, the problem of the large memory footprint (VRAM requirement) is mitigated by offering models in Nano, Super, and Ultra sizes. The Nano size specifically targets resource-constrained deployments, making the MoE architecture accessible for edge devices and standard hardware where memory efficiency is crucial, while the MoE structure inherently improves overall efficiency and throughput.

To help ensure reliable and trustworthy multi-agent operation, Nemotron 3 focuses on boosting performance and providing tooling for transparency. The platform combats context drift and supports consistency by delivering high efficiency and throughput; for example, Nemotron 3 Nano delivers four times higher throughput than its predecessor. This rapid, reliable inference capability helps meet the intense, real-time communication demands of multiple agents, minimizing latency and communication overhead.

Addressing trust and transparency, the models are released as open source, giving developers full visibility into the system. This openness enables the creation of transparent agentic AI systems that can be built to align with specific organizational values and regulatory requirements, helping build trust in mission-critical applications. To overcome the barrier of training complexity, NVIDIA provides a comprehensive collection of state-of-the-art open models, training datasets, and reinforcement learning environments and libraries. This provision of resources can lower the hurdle for adoption, enabling startups and enterprises without massive hyperscale resources to build, iterate on, and deploy specialized AI agents.

Nemotron 3: Efficient MoE Models and Open Tools for Multi-Agent AI

The Nemotron 3 family of MoE models is designed to advance multi-agent AI by offering a range of sizes optimized for both efficiency and high accuracy. The family includes three models. Nemotron 3 Nano (30 billion total parameters, 3 billion active) is the most compute-cost-efficient option, built for tasks such as software debugging, summarization, and AI assistant workflows at low inference costs. Its hybrid MoE architecture and 1-million-token context window deliver up to 4x higher token throughput and a 60% reduction in reasoning-token generation compared to its predecessor.

For more demanding applications, Nemotron 3 Super (100 billion total parameters, 10 billion active) is a high-accuracy reasoning model well-suited for applications requiring many collaborating agents. The largest, Nemotron 3 Ultra (500 billion total parameters, 50 billion active), serves as an advanced reasoning engine for complex AI workflows demanding deep research and strategic planning.
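
The efficiency argument behind these sparse sizes can be made concrete with back-of-envelope arithmetic: per-token compute scales with the *active* parameters, while weight memory scales with the *total* parameters. The ~2 FLOPs-per-active-parameter rule and the 8-bit weight assumption below are common illustrative conventions, not published Nemotron figures; real deployments also need memory for KV cache and activations.

```python
# Back-of-envelope MoE cost sketch using the Nemotron 3 sizes stated above.
# Compute per token scales with active parameters (~2 FLOPs/param, a rough
# dense-forward rule); weight memory scales with total parameters.
models = {
    "Nano":  {"total_b": 30,  "active_b": 3},
    "Super": {"total_b": 100, "active_b": 10},
    "Ultra": {"total_b": 500, "active_b": 50},
}

BYTES_PER_PARAM = 1  # assume 8-bit weights for the sketch
for name, m in models.items():
    flops_per_token = 2 * m["active_b"] * 1e9
    weight_mem_gb = m["total_b"] * 1e9 * BYTES_PER_PARAM / 1e9
    print(f"{name}: ~{flops_per_token / 1e9:.0f} GFLOPs/token, "
          f"~{weight_mem_gb:.0f} GB of weights at 8-bit")
```

The point of the exercise: Nano pays the compute bill of a ~3B dense model per token while still requiring a 30B model's worth of weight storage, which is exactly the trade-off the tiered Nano/Super/Ultra sizing is meant to manage.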

To accelerate the development and customization of specialized AI agents, NVIDIA has also released extensive open tools and data. This includes three trillion tokens of new Nemotron datasets for pretraining, post-training, and reinforcement learning, which can provide the coding and multistep workflow examples necessary to create domain-specialized agents.

Furthermore, the company has made the NeMo Gym and NeMo RL open-source libraries available for creating training environments, along with NeMo Evaluator for validating model safety and performance. Nemotron 3 Super and Ultra achieve efficiency gains by using the ultra-efficient 4-bit NVFP4 training format on the NVIDIA Blackwell architecture, reducing memory requirements and speeding up training without compromising accuracy. This entire ecosystem can enable developers to select the right-sized model and integrate specialized tools, supporting a scaling from dozens to hundreds of agents with faster, more accurate long-horizon reasoning.

I see that the market prospects for Multi-Agent AI (MAI) are encouraging, driven by the urgent enterprise demand for complex automation, operational efficiency, and cross-functional workflow management. This shift is highly advantageous for NVIDIA's competitive prospects, as MAI is critically dependent on their core infrastructure: the vast, collaborative nature of multi-agent systems necessitates massive parallel processing power for training and, crucially, for efficient, low-latency inference, which is the domain of NVIDIA's dominant data center GPUs (e.g., H200, Blackwell).

By releasing the Nemotron 3 family of open MoE models and the accompanying NeMo RL and Gym software ecosystem, NVIDIA can help ensure that the next generation of agentic AI is developed, trained, and deployed directly on the company's platform, deepening its software prominence and diversifying its revenue stream beyond hardware sales into the foundational AI model layer.

Looking Ahead

I believe that the sustained, transformative advancement of AI is strategically built upon the principles of open innovation, cultivating collaborative development and widespread access to foundational research and models. With the introduction of Nemotron, NVIDIA is actively converting advanced AI capabilities into an accessible, open platform, providing developers with the transparency and efficiency required to construct sophisticated agentic systems at scale.

I expect that NVIDIA can bolster its competitiveness by actively promoting the adoption of the Nemotron 3 family as the de facto foundation model for agentic AI development, emphasizing its hybrid MoE architecture's efficiency and cost benefits over competing closed models. It should strategically expand the open-source NeMo ecosystem (including NeMo RL and Gym) with more comprehensive tools and pre-integrated environments, making it faster and easier for developers to deploy specialized, high-performance multi-agent systems directly onto NVIDIA GPUs. By continuing to leverage its hardware dominance while openly providing optimized software and models, NVIDIA can solidify its platform prominence and ecosystem influence across the entire AI development stack, from research to deployment.

Author Information

Ron Westfall | VP and Practice Leader for Infrastructure and Networking

Ron Westfall is a prominent analyst figure in technology and business transformation. Recognized as a Top 20 Analyst by AR Insights and a Tech Target contributor, his insights are featured in major media such as CNBC, Schwab Network, and NMG Media.

His expertise covers transformative fields such as Hybrid Cloud, AI Networking, Security Infrastructure, Edge Cloud Computing, Wireline/Wireless Connectivity, and 5G-IoT. Ron bridges the gap between C-suite strategic goals and the practical needs of end users and partners, driving technology ROI for leading organizations.