When One Model Reasons and Simulates
NVIDIA's Cosmos 3 bets on collapsing the physical AI model stack (VLM understanding, video world simulation, and robot action generation) into a single Mixture-of-Transformers architecture in which the reasoning and diffusion paths share joint attention. The key question is whether that coupling actually outperforms specialist models, or whether the main gain is the convenience of a single model.
