Tesla’s boast that its future AI5 chip would run inference ten times cheaper than Nvidia’s Blackwell architecture was short-lived: Nvidia has announced its next-generation Rubin AI computing platform, which promises a similar ten-fold reduction in cost per token. The new architecture is aimed squarely at countering China’s strategy of running AI inference at far lower cost than the current Blackwell generation allows.
As rumors had indicated, the Nvidia Rubin platform is built around six processing subsystems designed to work in concert: the Vera CPU, the new Rubin GPU, the NVLink 6 Switch, the ConnectX-9 SuperNIC, the BlueField-4 DPU, and the Spectrum-6 Ethernet Switch. The chips are manufactured on advanced TSMC process nodes and bring interconnect improvements aimed at sharply reducing token costs and training times. This “co-design” across the six new chips allows models to be trained with a quarter of the GPUs required on the current Blackwell platform, cutting the cost per token tenfold.
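As a rough illustration of how a quarter of the GPUs can translate into a tenfold drop in cost per token, the sketch below decomposes the claim into fewer chips times higher per-chip throughput. All numbers here (cluster size, hourly GPU cost, throughput multiplier) are illustrative assumptions, not Nvidia figures:

```python
# Back-of-envelope sketch of the cost-per-token claim.
# Assumed decomposition: 10x cheaper = 4x fewer GPUs x 2.5x more
# tokens/hour per GPU. None of these values are official figures.

def cost_per_token(num_gpus, cost_per_gpu_hour, tokens_per_hour):
    """Dollars to produce one token on a cluster."""
    return (num_gpus * cost_per_gpu_hour) / tokens_per_hour

# Hypothetical Blackwell-era baseline: 1,000 GPUs serving 1B tokens/hour.
blackwell = cost_per_token(num_gpus=1000, cost_per_gpu_hour=4.0,
                           tokens_per_hour=1e9)

# Rubin sketch: a quarter of the GPUs, each assumed to sustain
# 2.5x the token throughput, serving the same workload.
rubin = cost_per_token(num_gpus=250, cost_per_gpu_hour=4.0,
                       tokens_per_hour=2.5e9)

print(f"Blackwell: ${blackwell:.2e}/token")  # $4.00e-06/token
print(f"Rubin:     ${rubin:.2e}/token")      # $4.00e-07/token
print(f"Reduction: {blackwell / rubin:.0f}x")  # 10x
```

The point of the decomposition is that neither factor alone reaches 10x; the claimed saving only follows if the co-designed interconnects deliver both fewer chips and higher per-chip utilization at once.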
Elon Musk has praised the Nvidia Rubin platform, describing it as a “rocket engine for artificial intelligence” that will enable large-scale deployment of edge models, and noting that the promised cost reduction matches what Tesla is promising for its upcoming AI5 computer, which will not enter mass production until next year. With this move, Nvidia is finally addressing the operating cost of models in addition to raw performance, especially amid competition with China, which prides itself on the low token prices it achieves through open-source models such as DeepSeek running on mid-range GPU series such as the Huawei 910C.
Perhaps the most interesting part of the Rubin platform is the new Nvidia Vera CPU, engineered for data movement and agentic reasoning across accelerated systems, with full support for confidential computing. It can be paired with an Nvidia GPU or operate as a standalone processor for analytics, cloud services, orchestration, storage, and high-performance computing (HPC) workloads, with full Arm-architecture compatibility. The processor features 88 dedicated cores and 1.2 TB/s of LPDDR5X memory bandwidth at very low power consumption, while NVLink-C2C integration enables coherent memory access between CPU and GPU, part of the optimizations that make the Rubin platform dramatically more efficient than its Blackwell-based predecessor.
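For a sense of scale, the two Vera figures quoted above (88 cores, 1.2 TB/s aggregate bandwidth) can be combined into a per-core bandwidth estimate. This is simple arithmetic on the stated specs, not an Nvidia-published metric:

```python
# Per-core bandwidth implied by the quoted Vera CPU figures.
# 88 cores sharing 1.2 TB/s of LPDDR5X bandwidth; the per-core
# number below is derived, not an official specification.

CORES = 88
MEM_BW_GBS = 1200.0  # 1.2 TB/s expressed in GB/s

per_core_gbs = MEM_BW_GBS / CORES
print(f"~{per_core_gbs:.1f} GB/s of memory bandwidth per core")
# ~13.6 GB/s of memory bandwidth per core
```

That figure is generous by server-CPU standards, which fits the chip’s stated role as a data-movement engine feeding an attached GPU rather than a pure compute part.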








