Models · AI Beat

10 Jul 2026 · AI Beat Desk

Tencent's Hy3: Apache-Licensed and Punching Above Its Weight

Tencent released Hy3 on July 6 under Apache 2.0 — a 295B MoE model with 21B active parameters that scores 90.4 on GPQA Diamond and 78.0 on SWE-Bench Verified, matching or exceeding models two to five times its active-parameter count. It's available for free on OpenRouter through July 21 and on Hugging Face in both full FP16 and FP8 quantized forms.

30 Jun 2026 · AI Beat Desk

Meituan's Trillion-Parameter Model and the Chip Independence Question

Meituan open-sourced LongCat-2.0 today: a 1.6-trillion-parameter MoE with a 1M-token context window, trained entirely on domestic Huawei Ascend ASICs. It is the first plausible demonstration that frontier-scale pre-training is achievable without NVIDIA hardware, arriving in the same week that US export restrictions on Anthropic's top models remained in partial force.

15 Jun 2026 · AI Beat Desk

The Weights Don't Lie

Rio de Janeiro's municipal AI company IplanRIO released Rio-3.5-Open-397B with claims of frontier performance, but an analysis of the open weights showed it is a simple 0.6/0.4 element-wise merge of Nex-N2_pro and Qwen3.5-397B-A17B. The model even introduces itself as Nex when the system prompt is removed. The episode illustrates the double-edged nature of open weights: the same transparency that enables community adoption also makes misrepresentation unusually easy to catch.
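The summary above does not say how the analysis was performed. As a minimal sketch of one way a two-parent linear merge could be checked, assume each model's weights are available as flat lists of floats for the same tensor. The function name and toy values below are hypothetical, not taken from the original analysis. The idea is to fit a single coefficient alpha such that merged ~= alpha * A + (1 - alpha) * B, then check that the leftover residual is close to zero.

```python
# Hypothetical illustration: estimate the mixing coefficient of a suspected
# two-parent element-wise merge from flattened weight values.

def estimate_merge_coefficient(merged, parent_a, parent_b):
    """Least-squares fit of alpha in merged ~= alpha * a + (1 - alpha) * b.

    Returns (alpha, max_abs_residual).
    """
    # Rearranged: merged - b = alpha * (a - b), so
    # alpha = <merged - b, a - b> / <a - b, a - b>.
    diff = [a - b for a, b in zip(parent_a, parent_b)]
    target = [m - b for m, b in zip(merged, parent_b)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        raise ValueError("parents are identical; alpha is not identifiable")
    alpha = sum(t * d for t, d in zip(target, diff)) / denom
    # Largest element-wise deviation from the fitted linear blend.
    residual = max(abs(t - alpha * d) for t, d in zip(target, diff))
    return alpha, residual


# Toy data standing in for one flattened tensor from each model (assumed values).
parent_a = [0.10, -0.25, 0.40, 0.05]
parent_b = [-0.30, 0.15, 0.20, -0.10]
merged = [0.6 * a + 0.4 * b for a, b in zip(parent_a, parent_b)]

alpha, residual = estimate_merge_coefficient(merged, parent_a, parent_b)
```

On real checkpoints the same fit would be repeated per tensor; a coefficient that stays constant across tensors with near-zero residual is what a plain weighted merge would be expected to produce.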

07 Jun 2026 · AI Beat Desk

When One Model Reasons and Simulates

NVIDIA's Cosmos 3 bets on collapsing the physical AI model stack — VLM understanding, video world simulation, and robot action generation — into a single Mixture-of-Transformers architecture where reasoning and diffusion paths share joint attention. The key question is whether that coupling actually beats specialist models, or whether this is mainly a convenience story.

05 Jun 2026 · AI Beat Desk

Magenta RealTime 2 Is Actually an Instrument Now

Google's Magenta RealTime 2 cuts live music generation control latency from ~3 seconds to ~200ms by shifting from chunk-based to frame-level causal processing. It runs locally on Apple Silicon MacBooks as open weights, and the latency reduction is the difference between a studio tool and something a musician can actually play.

04 Jun 2026 · AI Beat Desk

Gemma 4 12B Goes Encoder-Free

Google DeepMind's Gemma 4 12B discards the conventional encoder-stack approach to multimodal models, feeding raw pixel patches and audio waveforms directly into the LLM backbone through lightweight linear projections. The result fits in 16 GB of RAM, accepts native audio, and fine-tunes as a single unified model.

03 Jun 2026 · AI Beat Desk

Microsoft Stops Outsourcing Intelligence

Microsoft shipped two frontier models at Build 2026 — MAI-Thinking-1 and MAI-Code-1-Flash — built entirely without OpenAI data or distillation. The technical choices are interesting; the strategic signal is clearer: Microsoft is no longer content to be a reseller.

02 Jun 2026 · AI Beat Desk

MiniMax M3 and the Cost of Long Context

MiniMax M3 launches with a sparse attention mechanism that cuts per-token compute at 1M tokens to one-twentieth of its predecessor's. The architecture is genuinely interesting; the benchmarks require scrutiny; the license is almost certainly not what the word "open-weight" implies.

01 Jun 2026 · AI Beat Desk

Image Generation at 1 Bit

PrismML's Bonsai Image 4B applies 1-bit and ternary quantization to a FLUX.2 Klein diffusion transformer, compressing it 8.3× to 0.93 GB — small enough to generate images on an iPhone in under 10 seconds. It's the first demonstration that extreme quantization techniques developed for language models transfer cleanly to diffusion architectures.

31 May 2026 · AI Beat Desk

OpenRouter's $113M Bet on Multi-Model Infrastructure

OpenRouter raised $113M in a Series B led by CapitalG, with participation from NVIDIA, Databricks, Snowflake, ServiceNow, and MongoDB. The platform grew from 5 trillion to 25 trillion weekly tokens in six months. The round signals that model routing — the layer that sits between applications and the expanding zoo of frontier models — is now considered infrastructure worth owning.

30 May 2026 · AI Beat Desk

Liquid AI's LFM2.5: When Half Your Layers Aren't Attention

Liquid AI ships LFM2.5-8B-A1B, a hybrid model trained on 38T tokens in which 18 of 24 layers are gated convolution blocks rather than attention. It reaches 253 tokens/second on an M5 Max CPU with under 6 GB of memory.

04 May 2026 · AI Beat Desk

Tracing the Model's Family Tree

Cisco released the Model Provenance Kit on May 1 — an open-source Python toolkit that fingerprints AI models using metadata, tokenizer similarity, and weight-level identity signals, then runs in compare or scan mode to verify lineage and detect shared ancestry. It's the first serious tooling aimed at the model-weight surface of AI supply chain security, a layer that package audits don't reach.

28 Apr 2026 · AI Beat Desk

The Model That Stopped at 1930

Alec Radford, Nick Levine, and David Duvenaud release Talkie: a 13B model trained on 260 billion tokens of pre-1931 English text, with no knowledge of digital computers — yet it can write basic Python from in-context examples alone. The project is less about building a useful model and more about what happens when you take contamination completely off the table.

21 Apr 2026 · AI Beat Desk

Open Weights at One Trillion

Moonshot AI ships Kimi K2.6, a 1T-parameter open-source MoE with a 256K context window and swarm support, and simultaneously releases a test suite to verify that inference providers are actually running it correctly. The same day, Alibaba closes off Qwen3.6-Max. Two labs, one problem: how do you preserve model quality when someone else runs the weights?

12 Apr 2026 · AI Beat Desk

The Moat Is the System, Not the Model

AISLE tested Anthropic's Mythos cybersecurity showcase cases against eight open-weight models from 3.6B to 120B parameters. All eight reproduced the FreeBSD NFS exploit. A 5.1B model traced the OpenBSD integer overflow chain. Smaller open models beat frontier-lab models on false-positive detection. Capability in this domain doesn't scale smoothly: the system architecture matters more than raw model size.