Drop the Encoder: Meta's Tuna-2 Goes Straight to Pixels
Meta AI's Tuna-2 paper shows that a 7B unified multimodal model trained end-to-end on raw pixel patches, with no pretrained vision encoder, matches or beats its CLIP-based sibling at scale, particularly on fine-grained perception tasks. The result challenges an assumption that has held in multimodal modeling for years: that a pretrained vision tower is a prerequisite for strong multimodal performance.
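The architectural change is easy to sketch: instead of a frozen CLIP tower producing image features, the model learns a single projection from raw pixel patches into its own token stream, trained jointly with everything else. A minimal PyTorch sketch of that idea follows; the class name, dimensions, and patch size are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn

class PixelPatchEmbed(nn.Module):
    """Embed raw image patches directly into the LM's token stream.

    No pretrained vision encoder: a single learned projection maps
    non-overlapping pixel patches to the model's hidden size.
    (Hypothetical sketch; names and sizes are assumptions.)
    """
    def __init__(self, patch_size: int = 16, in_channels: int = 3,
                 d_model: int = 4096):
        super().__init__()
        # A strided conv is equivalent to patchify + linear projection.
        self.proj = nn.Conv2d(in_channels, d_model,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (batch, 3, H, W) -> patch tokens: (batch, N, d_model)
        x = self.proj(images)                # (B, d_model, H/p, W/p)
        return x.flatten(2).transpose(1, 2)  # (B, N, d_model)

# Patch tokens would be concatenated with text tokens so that, under
# end-to-end training, perception gradients flow into the projection
# rather than stopping at a frozen encoder boundary.
embed = PixelPatchEmbed()
tokens = embed(torch.randn(1, 3, 224, 224))  # shape: (1, 196, 4096)
```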
Read more →
