Nvidia Unveils Nemotron 3 Ultra at Computex as the Most Powerful American Open-Weight AI Model to Date

Nvidia CEO Jensen Huang took the stage at the Computex 2026 trade show in Taipei on Sunday, wearing his signature leather jacket, to introduce the company’s latest and most ambitious contribution to the open-source artificial intelligence ecosystem: Nemotron 3 Ultra. Positioned as the flagship of the Nemotron 3 family, the new model is the largest open-weight AI system the American semiconductor giant has produced. An independent pre-release evaluation ranks it as the highest-scoring open-weight model currently built in the United States, but its debut comes with a caveat: despite a significant leap in reasoning and speed, it still trails the leading open-weight offerings from Chinese laboratories on the same intelligence benchmark.

The release of Nemotron 3 Ultra is a cornerstone of Nvidia’s broader strategy to democratize high-performance AI while simultaneously driving demand for its data center hardware. The model features a massive 550 billion total parameters, yet it utilizes a sophisticated Mixture-of-Experts (MoE) architecture that allows it to run on only 55 billion active parameters at any single moment. This design choice is intended to balance the immense knowledge capacity of a large-scale model with the operational efficiency required for enterprise-grade deployment.

The Evolution of the Nemotron Family and the Computex Keynote

The unveiling of Nemotron 3 Ultra marks a significant milestone in a journey that began in November 2023 with the release of the first Nemotron-branded model. Since then, Nvidia has steadily iterated on its software stack, moving from early experimental weights to the sophisticated third-generation architecture announced in late 2025. The Nemotron 3 lineage is structured into three distinct tiers: Nano, designed for lightweight and on-device tasks; Super, a mid-range model optimized for general enterprise applications; and Ultra, the new flagship designed for complex reasoning, long-context analysis, and autonomous agentic workflows.

During his keynote, Huang emphasized that the development of Nemotron 3 Ultra was not merely an exercise in scaling up parameters, but an overhaul of the underlying architecture. The model uses a hybrid framework that integrates Mamba-2 layers with standard Transformer attention mechanisms and MoE routing. This hybrid approach is engineered to address the "quadratic bottleneck" of traditional Transformers, where the computational cost of attention grows with the square of the input length. By incorporating Mamba-2, Nvidia has enabled Nemotron 3 Ultra to support a 1-million-token context window, allowing the model to "remember" and process the equivalent of several thick novels or a large software codebase in a single session.
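The scaling difference can be illustrated with a rough cost model. The sketch below is a simplification that ignores constant factors, hidden dimensions, and hardware effects, and its function names are hypothetical. It only compares how the number of pairwise attention interactions (n squared) and the number of steps in a linear-time sequence scan (n) grow as the context gets longer.

```python
def attention_pair_count(n_tokens: int) -> int:
    """Token-to-token interactions in full self-attention: n * n."""
    return n_tokens * n_tokens


def linear_scan_steps(n_tokens: int) -> int:
    """Steps for a linear-time sequence scan (state-space style): n."""
    return n_tokens


def growth_factor(cost_fn, n_small: int, n_large: int) -> float:
    """How many times more work cost_fn needs when the input grows."""
    return cost_fn(n_large) / cost_fn(n_small)


# Illustrative context lengths, ending at the 1-million-token window.
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9,} tokens: "
          f"attention pairs {attention_pair_count(n):.1e}, "
          f"scan steps {linear_scan_steps(n):.1e}")
```

Under this toy model, a tenfold longer input costs a hundred times more for attention but only ten times more for the linear scan, which is the motivation for mixing the two layer types.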

Technical Architecture: Mixture-of-Experts and Multi-Token Prediction

The efficiency of Nemotron 3 Ultra is largely attributed to its Mixture-of-Experts (MoE) design. In a traditional "dense" model, every parameter is activated for every query, which requires immense computational power and increases latency. In contrast, the MoE system functions similarly to a highly specialized medical facility. When a query is submitted, the model’s "router" identifies which "specialists"—sub-networks within the 550 billion parameters—are best suited to handle the specific task. Only those 55 billion relevant parameters are activated.
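The routing idea can be sketched in a few lines. This is a generic top-k Mixture-of-Experts illustration, not Nvidia's implementation: the number of experts, the value of k, the router scores, and all function names are assumptions made for the example. The only figures taken from the article are the 550 billion total and 55 billion active parameters.

```python
import math


def softmax(scores):
    """Turn raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k best-scoring experts."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the chosen experts
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_scores, k):
    """Weighted sum of the selected experts' outputs; the rest never run."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


# Hypothetical toy setup: 4 experts, each just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
router_scores = [0.0, 2.0, 1.0, -1.0]  # made-up scores for one token

print(route_top_k(router_scores, k=2))   # experts 1 and 2 are selected
print(moe_forward(10.0, experts, router_scores, k=2))

# Share of parameters active per the article's figures.
active_fraction = 55e9 / 550e9
print(f"active fraction: {active_fraction:.0%}")
```

With the article's numbers, about one-tenth of the model's parameters take part in any given computation, which is where the efficiency claim comes from.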

This architectural efficiency allows Nvidia to claim inference speeds that are five times faster than comparable open-weight models, with operational costs approximately 30% lower than previous industry standards. Beyond MoE, the Ultra model introduces Multi-Token Prediction (MTP). While standard AI models predict the next word or "token" one by one, MTP allows Nemotron 3 Ultra to predict several future tokens simultaneously. This parallelization significantly accelerates the generation of text and code, a critical factor for real-time applications and autonomous agents that must perform multi-step planning.
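A back-of-the-envelope sketch shows why predicting several tokens per step reduces the number of sequential decoding steps. It is an idealized model under stated assumptions: every predicted token is kept, and the tokens-per-step values are hypothetical, since the article does not say how many tokens Nemotron 3 Ultra predicts at once. Real-world gains are typically smaller than this upper bound.

```python
import math


def decoding_steps(n_tokens: int, tokens_per_step: int) -> int:
    """Sequential steps needed to emit n_tokens when each step
    yields tokens_per_step tokens (idealized: none are discarded)."""
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be at least 1")
    return math.ceil(n_tokens / tokens_per_step)


response_length = 1_000  # hypothetical response size in tokens
for k in (1, 2, 4):      # k = 1 is ordinary next-token decoding
    print(f"{k} token(s) per step -> {decoding_steps(response_length, k)} steps")
```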

Furthermore, the model underwent extensive post-training using reinforcement learning from human feedback (RLHF) and interactive environment training. This process was designed to shift the model from being a simple conversationalist to a functional agent capable of executing complex instructions across different software environments.

Benchmarking Intelligence: A Transpacific Competition

To validate the capabilities of Nemotron 3 Ultra, Nvidia partnered with the independent evaluator Artificial Analysis for a comprehensive pre-release assessment. The model was tested on the Intelligence Index, a composite benchmark that aggregates ten different evaluations covering mathematical reasoning, coding proficiency, general knowledge, and agentic performance.

Nemotron 3 Ultra achieved a score of 48 on the Intelligence Index, firmly establishing it as the top-performing open-weight model produced in the United States. For comparison, Google’s Gemma 4 31B scored 39, while Nvidia’s own Nemotron 3 Super (released in March 2026) sits at 36. OpenAI’s gpt-oss-120b, another major American open-weight entry, trails further behind with a score of 33. The 12-point jump from Nemotron 3 Super to Ultra represents a generational leap in reasoning capability within just a few months of development.

However, the data also highlights the growing dominance of Chinese AI labs in the open-source arena. Moonshot AI’s Kimi K2.6, released in April 2026, currently holds a score of 54 on the same index, outperforming Nemotron 3 Ultra by six points. Another Chinese model, DeepSeek V4 Pro, also maintains a lead in intelligence metrics. These Chinese models are currently ranked among the top five AI systems globally, sitting just a few points behind the proprietary, closed-source flagship models from Anthropic, Google, and OpenAI, which are currently tied at a score of 57.
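The gaps quoted above follow directly from the reported Intelligence Index scores. The short sketch below recomputes them relative to Nemotron 3 Ultra, using only the scores stated in this article. DeepSeek V4 Pro is left out because no score is given for it, and the function name is a hypothetical helper.

```python
# Intelligence Index scores as reported in this article.
scores = {
    "Closed-source flagships (tied)": 57,
    "Kimi K2.6": 54,
    "Nemotron 3 Ultra": 48,
    "Gemma 4 31B": 39,
    "Nemotron 3 Super": 36,
    "gpt-oss-120b": 33,
}


def gaps_relative_to(reference: str, table: dict) -> dict:
    """Score difference of every entry versus the reference model."""
    base = table[reference]
    return {name: score - base for name, score in table.items()}


gaps = gaps_relative_to("Nemotron 3 Ultra", scores)
for name, gap in gaps.items():
    print(f"{name:<32} {gap:+d}")
```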

Speed as a Competitive Differentiator

While Nemotron 3 Ultra may currently trail the top Chinese models in raw intelligence scores, it appears to have secured a significant lead in throughput and latency. In tests conducted on DeepInfra endpoints, the model achieved an output speed of over 300 tokens per second. In the current market, Chinese models of similar intelligence classes, such as Kimi K2.6 and DeepSeek V4 Pro, typically serve between 50 and 100 tokens per second through their commercial APIs.

This speed gap is more than a technical vanity metric; it has profound implications for the burgeoning field of AI agents. An autonomous agent tasked with researching a topic, writing code, and then debugging that code may require thousands of tokens of generation across dozens of internal "reasoning steps." If each step is processed at 50 tokens per second, the cumulative delay can render the agent impractical for enterprise use. At 300 tokens per second, the same workflow needs roughly one-sixth of the generation time, potentially giving Nvidia a practical edge in the "Agentic AI" era even if its model’s "IQ" is slightly lower than its competitors’.
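The effect of throughput on an agent's wall-clock time is simple arithmetic. The sketch below assumes a hypothetical workload of 24 reasoning steps at 500 generated tokens each. Those two numbers are illustrative, not from Nvidia or Artificial Analysis. The model also ignores time to first token, tool calls, and network latency, so it isolates generation time only.

```python
def generation_seconds(total_tokens: int, tokens_per_second: float) -> float:
    """Time spent purely on token generation."""
    return total_tokens / tokens_per_second


def agent_run_seconds(steps: int, tokens_per_step: int,
                      tokens_per_second: float) -> float:
    """Total generation time for a multi-step agent run."""
    return generation_seconds(steps * tokens_per_step, tokens_per_second)


STEPS = 24             # hypothetical number of reasoning steps
TOKENS_PER_STEP = 500  # hypothetical tokens generated per step

for tps in (50, 100, 300):
    seconds = agent_run_seconds(STEPS, TOKENS_PER_STEP, tps)
    print(f"{tps:>3} tokens/s -> {seconds:.0f} s")
# 50 tokens/s -> 240 s, 100 tokens/s -> 120 s, 300 tokens/s -> 40 s
```

For this assumed workload, moving from 50 to 300 tokens per second cuts generation time from four minutes to 40 seconds.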

The Geopolitical Context: Nvidia’s $26 Billion Open-Source Gambit

The release of Nemotron 3 Ultra is part of a broader, high-stakes geopolitical and economic strategy. Historically, leading American AI companies like OpenAI and Google have moved away from open-source models, opting to keep their most advanced systems behind proprietary APIs to maintain a competitive moat and manage safety risks. This shift created a vacuum in the open-source ecosystem that Chinese firms have aggressively filled.

Industry data shows a dramatic shift in the global AI landscape: Chinese open-source models accounted for only 1.2% of global usage in late 2024, but that figure skyrocketed to 30% by the end of 2025. This trend poses a strategic risk to U.S. influence in AI standards and development. Nvidia, whose business model relies on selling the hardware that powers these models, has a vested interest in ensuring the open-source ecosystem remains vibrant and dominated by models optimized for its own DGX Cloud and its H100 and B200 GPUs.

To counter the rise of Chinese influence, Nvidia has publicly committed to a five-year, $26 billion plan dedicated to the development of open-weight AI. Nemotron 3 Ultra is the most powerful fruit of that investment to date. Furthermore, Nvidia has formed the "Nemotron Coalition," a strategic alliance of eight prominent AI labs, including Mistral AI and Perplexity. This group is co-developing the next generation of models on Nvidia’s infrastructure, ensuring a steady pipeline of high-quality open models that are "Nvidia-native."

Deployment and Future Outlook

Nvidia has announced that Nemotron 3 Ultra will be officially available starting June 4, 2026. While the model’s 550 billion parameters technically require data-center-scale hardware to run at full capacity, Nvidia is making the model accessible through its own API catalog and various cloud service providers. This allows developers to integrate the model into their applications without the need to own the underlying H100 or B200 clusters.

The company is already looking toward the future, confirming that development has begun on Nemotron 4. This next-generation model is expected to further integrate the collaborative efforts of the Nemotron Coalition and push the boundaries of multimodal reasoning and long-term planning.

The debut of Nemotron 3 Ultra at Computex serves as a clear signal of Nvidia’s intent. By providing the weights and training recipes for a model of this scale, Nvidia is not just selling chips; it is attempting to anchor the global open-source community to American-made, Nvidia-optimized software. While the intelligence gap with China remains a hurdle, the combination of massive context windows, industry-leading speed, and a multi-billion-dollar development roadmap suggests that the battle for open-source AI supremacy is only just beginning. For enterprise users, the arrival of Ultra provides a powerful new tool for building autonomous systems that can handle the complexity of modern business logic at a fraction of the cost of proprietary alternatives.

Layla Zulfa
