Research Notes

SIGGRAPH: NVIDIA Sharpens Physical AI Proposition

Powered by new NVIDIA RTX PRO Servers and NVIDIA DGX Cloud, Omniverse libraries and physical AI models enable developers to create physically accurate digital twins, reconstruct the real world in simulation, and generate the synthetic data needed to train physical AI models.

Key Highlights:

  • NVIDIA has introduced new Omniverse libraries and Cosmos foundation models to accelerate robotics development, enabling the creation of physically accurate digital twins and synthetic data for AI training.
  • New software development kits and frameworks, including NVIDIA Isaac Sim 5.0 and NVIDIA Isaac Lab 2.2, are now available to make it easier for developers to build and test industrial AI and robotics applications.
  • NVIDIA Cosmos Reason, a new open-source Vision Language Model (VLM), enables robots to think more like humans by using prior knowledge and common sense to interpret and interact with the real world.
  • To support the growing demand for expertise in robotics and simulation, NVIDIA has launched a new OpenUSD Curriculum and Certification and is collaborating with companies to integrate new robot training frameworks.
  • NVIDIA is providing new AI infrastructure, including RTX PRO Servers and DGX Cloud on Microsoft Azure, to help developers manage complex robotics workloads and accelerate their projects.

The News

NVIDIA announced new NVIDIA Omniverse libraries and NVIDIA Cosmos world foundation models (WFMs) that accelerate the development and deployment of robotics solutions. For more information, read the NVIDIA press release.

Analyst Take

NVIDIA has launched new NVIDIA Omniverse libraries and NVIDIA Cosmos world foundation models to speed up the creation and implementation of robotics solutions. These new libraries and models, powered by NVIDIA RTX PRO Servers and NVIDIA DGX Cloud, give developers the ability to create physically accurate digital twins, simulate real-world environments, generate synthetic data for training AI models, and build AI agents that can understand the physical world.

NVIDIA has released new Omniverse software development kits (SDKs) and libraries to support the creation and deployment of industrial AI and robotics simulation applications. The new SDKs allow data to be exchanged between the MuJoCo format (MJCF) and Universal Scene Description (OpenUSD), enabling more than 250,000 robot learning developers to simulate robots across platforms. Additionally, the new Omniverse NuRec libraries and AI models introduce Omniverse RTX ray-traced 3D Gaussian splatting, a rendering technique that uses sensor data to capture, reconstruct, and simulate the real world in 3D.
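To make the MJCF-to-OpenUSD exchange concrete, the toy sketch below maps MuJoCo's nested `<body>` tree onto USD-style prim paths. This illustrates only the structural correspondence between the two formats; the real Omniverse/Isaac Sim importer handles far more (joints, assets, physics schemas), and all names here are hypothetical.

```python
# Toy sketch: map an MJCF <body> tree onto USD-style prim paths.
# Structural correspondence only; the actual Omniverse/Isaac Sim
# MJCF importer is far more involved. All names are hypothetical.
import xml.etree.ElementTree as ET

MJCF = """
<mujoco model="arm">
  <worldbody>
    <body name="base">
      <body name="link1">
        <body name="link2"/>
      </body>
    </body>
  </worldbody>
</mujoco>
"""

def mjcf_to_prim_paths(mjcf_xml: str) -> list[str]:
    """Walk nested MJCF bodies and emit USD-like prim paths."""
    root = ET.fromstring(mjcf_xml)
    paths = []

    def walk(body, prefix):
        path = f"{prefix}/{body.get('name', 'unnamed')}"
        paths.append(path)
        for child in body.findall("body"):
            walk(child, path)

    for top in root.find("worldbody").findall("body"):
        walk(top, "/World")
    return paths

print(mjcf_to_prim_paths(MJCF))
# ['/World/base', '/World/base/link1', '/World/base/link1/link2']
```

The key point is that both formats describe a kinematic tree, which is why asset exchange between them is tractable: each nested MJCF body becomes a child prim in the USD stage hierarchy.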

In addition to these libraries, NVIDIA's open-source robot simulation and learning frameworks, NVIDIA Isaac Sim 5.0 and NVIDIA Isaac Lab 2.2, are now available on GitHub. Isaac Sim 5.0 now features NuRec neural rendering and new OpenUSD-based robot and sensor schemas, which help developers more accurately translate their simulations into real-world applications.

The NuRec rendering from Omniverse has now been integrated into CARLA, a popular open-source simulator used by more than 150,000 developers, particularly in the autonomous vehicle (AV) sector. Foretellix, a leader in AV toolchains, is also integrating NuRec, along with Omniverse Sensor RTX and Cosmos Transfer, to create more accurate and scalable synthetic data for its scenarios. Additionally, Voxel51 is using NuRec in its FiftyOne data engine, which is used by companies like Ford and Porsche, to simplify data preparation for reconstructions.

Several companies are adopting Omniverse libraries, Isaac Sim, and Isaac Lab to accelerate their robotics development, including Boston Dynamics, Figure AI, Hexagon, RAI Institute, Lightwheel, and Skild AI. Amazon Devices & Services is also leveraging these tools to power a new manufacturing solution.

Cosmos: Transforming World Generation for Robotics

The Cosmos WFMs have been downloaded over 2 million times and are used by developers to create a wide variety of data for training robots at scale. By using text, image, and video prompts, these models can generate the diverse data needed to teach robots in a cost-effective way.

From my viewpoint, the new models recently announced at SIGGRAPH offer significant improvements in the speed, accuracy, language support, and control of synthetic data generation. A new model, Cosmos Transfer-2, can simplify the prompting process and accelerate the creation of photorealistic synthetic data from 3D simulation scenes or spatial inputs. A distilled version of Cosmos Transfer also drastically reduces the time it takes to run the model, allowing developers to use it on NVIDIA RTX PRO Servers at unprecedented speeds. Companies such as Lightwheel, Moon Surgical, and Skild AI are already using Cosmos Transfer to speed up their physical AI training by simulating diverse conditions on a massive scale.

As a result, I find that NVIDIA is better positioned to capitalize on the market for WFMs, whose expansion is being fueled by adoption across industries including robotics, autonomous vehicles, and enterprise automation. While there are few market projections for WFMs alone, their growth can be understood by looking at the broader foundation model market. That market, for closed-source models, is expected to increase by $39.56 billion between 2024 and 2029, a compound annual growth rate of 40.7%.
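Taking the cited figures at face value, a quick back-of-envelope calculation shows what a $39.56 billion increase at a 40.7% CAGR over 2024-2029 implies about the size of the base market. This is illustrative arithmetic only, not an additional forecast.

```python
# Back-of-envelope check on the cited foundation model market figures:
# a $39.56B increase between 2024 and 2029 at a 40.7% CAGR implies
# a base-year market of increase / ((1 + r)^n - 1).
cagr = 0.407
years = 5          # 2024 -> 2029
increase_bn = 39.56

growth_factor = (1 + cagr) ** years          # total multiple over the period
implied_base_bn = increase_bn / (growth_factor - 1)
implied_2029_bn = implied_base_bn * growth_factor

print(f"Implied 2024 base: ${implied_base_bn:.2f}B")
print(f"Implied 2029 size: ${implied_2029_bn:.2f}B")
```

In other words, the cited figures are consistent with a market that more than quintuples over the five-year window from a base of roughly $9 billion.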

This growth is driven by a demand for productivity and automation in businesses and by advancements in multimodal capabilities. I discern that WFMs are a specialized category and are likely to be a major part of this expansion because they can process different data types - such as text, images, and video - which helps reduce the need for physical testing in areas like autonomous driving and medical diagnostics.

Several trends are accelerating the growth of WFMs, including the rise of multimodal AI and the use of model distillation to create smaller, more efficient, and cost-effective models. North America is expected to lead this market due to early adoption and significant investment, while Europe is investing heavily in building its own AI infrastructure. However, challenges such as high infrastructure costs, data privacy issues, and the rapid pace of model obsolescence could slow this growth. Despite these obstacles, the overall foundation model market - and by extension, the WFM market - is projected to expand steadily through 2030, with applications in content generation, code generation, and customer support.

Cosmos Reason: A New Era for AI World Understanding

Since the release of OpenAI's CLIP model, vision language models (VLMs) have dramatically changed how computer vision tasks like object recognition are performed. However, these models have struggled with more complex challenges, such as handling multistep tasks, dealing with ambiguity, or adapting to new situations. They often lack the ability to reason and make decisions in the same way that humans do.

To address these limitations, NVIDIA has introduced Cosmos Reason, a new open and customizable VLM with 7 billion parameters designed specifically for physical AI and robotics. Cosmos Reason allows robots and vision AI agents to think more like humans, using prior knowledge, an understanding of physics, and common sense to interpret and interact with the real world. This powerful new model can be used for several robotics and physical AI applications, including automating the curation and annotation of large, diverse training datasets and serving as the brain for robot vision language action (VLA) models to plan and make methodical decisions. It allows robots to understand environments and break down complex commands into manageable tasks, even in unfamiliar settings.

The capabilities of Cosmos Reason are already being adopted across various industries. NVIDIA's own robotics and DRIVE teams are using it for data curation, filtering, and annotation, while Uber is using it to caption and annotate autonomous vehicle training data. Magna is integrating Cosmos Reason into its City Delivery platform to help its autonomous vehicles quickly adapt to new cities, adding a deeper understanding of the world to the vehicles' long-term planning. Other companies, such as VAST Data, Milestone Systems, and Linker Vision, are using Cosmos Reason to automate traffic monitoring, improve safety, and enhance visual inspections in both urban and industrial environments.

From my perspective, VLMs are now better positioned to transform computer vision tasks such as object recognition by integrating visual and textual data into a unified framework, enabling more robust and context-aware processing. Unlike traditional computer vision models that rely primarily on image-based training for specific tasks, VLMs can use Cosmos Reason to leverage vast datasets of paired images and text, such as those found on the internet, to learn rich, multimodal representations.


As such, evolving VLMs can understand objects in context, capturing not just their appearance but also their semantic relationships, attributes, and real-world significance as described in text. For example, a VLM can recognize a red apple not just by its shape and color but also by understanding the concept of apple from textual descriptions, making it more adept at handling variations in lighting, angles, or occlusions that might challenge traditional models.


Additionally, VLMs have shifted the paradigm from task-specific models to general-purpose, zero-shot learning systems, reducing the need for extensive labeled datasets and retraining. By pretraining on diverse image-text pairs, VLMs such as CLIP or DALL-E can generalize across a wide range of visual tasks without fine-tuning, performing object recognition by aligning image features with text embeddings. This flexibility enables them to handle novel objects or categories not explicitly trained on, simply by interpreting descriptive prompts. 
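The zero-shot mechanism described above can be sketched in a few lines: an image embedding is scored against text-prompt embeddings in a shared space via cosine similarity, and the best-aligned prompt wins. The vectors below are hand-made toy stand-ins, not outputs of CLIP or any real model.

```python
# Toy sketch of CLIP-style zero-shot classification: an image embedding
# is compared against text-prompt embeddings in a shared space, and the
# best-aligned prompt is taken as the label. The vectors are hand-made
# stand-ins, not real encoder outputs.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_classify(image_emb, prompt_embs):
    """Return the prompt whose embedding best aligns with the image."""
    scores = {p: cosine(image_emb, e) for p, e in prompt_embs.items()}
    return max(scores, key=scores.get), scores

# Hypothetical shared embedding space (3-D for readability).
prompts = {
    "a photo of a red apple": np.array([0.9, 0.1, 0.0]),
    "a photo of a banana":    np.array([0.1, 0.9, 0.0]),
    "a photo of a robot arm": np.array([0.0, 0.1, 0.9]),
}
image = np.array([0.8, 0.2, 0.1])   # pretend image-encoder output

label, scores = zero_shot_classify(image, prompts)
print(label)   # the apple prompt aligns best with this image vector
```

Because recognition reduces to comparing embeddings, adding a new category requires only writing a new prompt - no retraining - which is exactly the flexibility the paragraph above describes.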

Empowering the Developer Community

To assist robotics and physical AI developers in adopting 3D and simulation technologies, NVIDIA has made two key announcements: First, it launched the OpenUSD Curriculum and Certification to meet the growing need for expertise in Universal Scene Description (USD). This initiative has the support of several major companies, including Adobe, Amazon Robotics, Autodesk, Pixar, Siemens, and others.

Second, NVIDIA is collaborating with Lightwheel to integrate new robot training and evaluation frameworks into NVIDIA Isaac Lab. This open-source collaboration adds parallel reinforcement learning training, benchmarks, and simulation-ready assets for robot manipulation and movement.

I find that NVIDIA Isaac Lab is integral to advancing physical AI ecosystem objectives because it provides an open-source, modular framework that accelerates robotics development through high-fidelity, GPU-accelerated simulation environments built on NVIDIA Omniverse and OpenUSD. It enables developers to create, test, and train AI-driven robots in virtual settings that closely mimic real-world physics, reducing the sim-to-real gap for applications like autonomous navigation and manipulation. 

With features like customizable workflows, integration with tools such as Cosmos WFMs for synthetic data generation, and support for scalable reinforcement learning, Isaac Lab 2.2 empowers over 250,000 developers to iterate rapidly, optimize robot policies, and deploy robust solutions across industries such as manufacturing and logistics, all while leveraging NVIDIA’s RTX hardware for real-time, photorealistic simulations.
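The scalable reinforcement learning mentioned above rests on a simple idea: step thousands of environment instances as one batched array operation rather than a Python loop. The sketch below illustrates that pattern with toy one-dimensional dynamics; it is not the Isaac Lab API, and all names are hypothetical.

```python
# Minimal sketch of the batched-simulation idea behind parallel RL
# training: N environment instances advance in one vectorized step,
# so policy rollouts scale with array width instead of a Python loop.
# Toy 1-D dynamics only; not the Isaac Lab API.
import numpy as np

N_ENVS = 4096
rng = np.random.default_rng(0)

positions = np.zeros(N_ENVS)               # one scalar state per env
targets = rng.uniform(-1.0, 1.0, N_ENVS)   # per-env goal

def step(positions, actions):
    """Advance all environments at once; reward is negative distance."""
    positions = positions + 0.1 * actions
    rewards = -np.abs(targets - positions)
    return positions, rewards

# A trivial "policy": move toward the target.
actions = np.sign(targets - positions)
positions, rewards = step(positions, actions)
print(positions.shape, rewards.shape)   # (4096,) (4096,)
```

GPU-accelerated simulators apply the same principle with full rigid-body physics, which is what lets a single workstation collect experience from thousands of simulated robots in parallel.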

New NVIDIA AI Infrastructure for Robotics

To enable developers to fully use its advanced technologies and software libraries, NVIDIA has introduced new AI infrastructure built for the most demanding workloads. The NVIDIA RTX PRO Blackwell Servers provide a single architecture capable of handling every aspect of robot development, from training and synthetic data generation to robot learning and simulation. 

Additionally, NVIDIA DGX Cloud is now available on the Microsoft Azure Marketplace. This offers Omniverse developers a fully managed platform that simplifies the process of streaming OpenUSD- and NVIDIA RTX-based applications from the cloud, which significantly reduces the need for infrastructure management. Industry leaders such as Accenture and Hexagon are among the first to adopt this new platform.

I see that the market prospects for AI infrastructure supporting robotics are robust, driven by the increasing integration of AI in robotics for applications across industries like manufacturing, healthcare, logistics, and agriculture. AI infrastructure, encompassing high-performance computing, cloud platforms, and specialized hardware like GPUs and TPUs, enables robots to process vast datasets, enhance real-time decision-making, and improve autonomous capabilities through machine learning and computer vision. 

The global robotics market is projected to grow significantly, with estimates suggesting a CAGR of over 25% through 2030 (according to MarketsandMarkets), fueled by advancements in AI-driven automation. Investments in 5G, edge computing, and scalable AI platforms are further accelerating this growth by enabling low-latency, data-intensive robotic operations. However, challenges like high initial costs and interoperability issues may temper short-term adoption, while long-term demand for intelligent, adaptive robots ensures strong market potential.

NVIDIA Delivers Competitive Advantages

From my viewpoint, NVIDIA gains a significant competitive advantage by providing a comprehensive, end-to-end ecosystem for robotics development, rather than just isolated components. The launch of new Omniverse libraries and Cosmos WFMs strengthens this ecosystem by offering a sim-first approach that can dramatically reduce the cost and time of training and deploying robots. By providing physically accurate simulation environments, tools for generating massive amounts of synthetic data, and a new VLM (Cosmos Reason) that gives robots human-like reasoning capabilities, NVIDIA positions itself as the foundational platform for physical AI. 

This creates a powerful flywheel effect: as more developers and companies, from Boston Dynamics to Amazon, adopt NVIDIA's full stack - including its GPUs, Omniverse software, and DGX Cloud infrastructure - it further entrenches the company's dominance and makes it difficult for competitors to replicate. This strategy can establish NVIDIA not just as a chip provider, but as the de facto operating system for the future of robotics.

Looking Ahead

Overall, I believe that the worlds of computer graphics and AI are merging to fundamentally change robotics. By bringing together AI's reasoning capabilities with scalable, physically accurate simulation, NVIDIA is empowering developers to create the next generation of robots and autonomous vehicles. These innovations are set to transform industries worth trillions of dollars.

NVIDIA is set to play an integral role in the merging of computer graphics and AI, which is poised to fundamentally transform the robotics market, presenting immense prospects for improved business outcomes and growth. This convergence is set to drive significant market expansion across various sectors. Key industries such as manufacturing, logistics, and healthcare are leading the adoption, leveraging AI-powered robots for tasks such as automated quality inspection, warehouse sorting, and surgical assistance. The technology is also fueling the development of new applications, from humanoid robots that can assist in human-centric workspaces to autonomous vehicles and smart city infrastructure. 

Author Information

Ron Westfall | Analyst In Residence

Ron Westfall is a prominent analyst in technology and business transformation. Recognized as a Top 20 Analyst by AR Insights and a Tech Target contributor, his insights are featured in major media outlets such as CNBC, Schwab Network, and NMG Media.

His expertise covers transformative fields such as Hybrid Cloud, AI Networking, Security Infrastructure, Edge Cloud Computing, Wireline/Wireless Connectivity, and 5G-IoT. Ron bridges the gap between C-suite strategic goals and the practical needs of end users and partners, driving technology ROI for leading organizations.