Microsoft Stops Outsourcing Intelligence

For years the question about Microsoft’s AI strategy was where OpenAI ended and Microsoft began. The answer at Build 2026 is sharper than it has been: MAI-Thinking-1 and MAI-Code-1-Flash are frontier models built from scratch by Microsoft, trained on commercially licensed data, with no distillation from any third-party model — OpenAI included.

Microsoft is careful to frame this as an enterprise differentiator: customers who need clean data provenance for compliance or IP reasons can now use Microsoft models and know exactly what they trained on. That is a real selling point in regulated industries where “we used someone else’s proprietary training mix” is a legal exposure. But framing aside, what they’ve shipped is two capable models that sit outside the OpenAI supply chain.

MAI-Thinking-1

MAI-Thinking-1 is a reasoning model with a sparse Mixture of Experts architecture: around 35 billion active parameters out of approximately one trillion total. The 256K token context window is wide enough to process full codebases or long legal documents in a single pass. Benchmarks show 97.0% on AIME 2025 and 94.5% on AIME 2026 — solidly in the range of current frontier reasoning models. Microsoft also claims it matches Claude Opus 4.6 on SWE-Bench Pro and was preferred over Claude Sonnet 4.6 in blind side-by-side evaluations run by their testing partner Surge.

The AIME numbers are self-reported and haven’t been independently reproduced yet, so the usual caveat applies. The MoE architecture at this scale — 35B active from ~1T total — is a design that lets Microsoft run a large model while spending compute only on the active slice per token. That’s the same basic approach as Mixtral, but at a scale where the tradeoffs (routing overhead, expert load balancing, cross-expert communication) are harder to get right. The fact that it’s shipping rather than being announced as research is the meaningful signal.

The model is in private preview through Microsoft Foundry, supports function calling and the Chat Completions API, and is positioned for enterprise customers with complex multi-step reasoning requirements.

MAI-Code-1-Flash

The more technically unusual of the two is MAI-Code-1-Flash. Most coding models are trained on code data and then evaluated against coding benchmarks. MAI-Code-1-Flash was trained directly with GitHub Copilot’s production harnesses — the same scaffolding that Copilot uses in real developer workflows, including multi-turn interactions, repository context, tool calls, and the specific failure modes that come up in live use rather than benchmarks.

The practical effect is a model that learns what “being good at Copilot” actually means rather than what maximizes SWE-Bench scores. It scored 51.2% on SWE-Bench Pro versus 35.2% for Claude Haiku 4.5, a 16-point gap — but that’s almost a secondary point. The more important claim is that it uses up to 60% fewer tokens to solve the same problems through what Microsoft calls adaptive thinking: short responses for simple requests, more extended reasoning for complex ones, without requiring the user to choose a mode.

Adaptive solution length is a real issue for coding assistants. Models that reason extensively for every completion waste latency and context; models that skip reasoning on hard problems produce worse results. Getting the routing right at inference time rather than at model selection time is the better place to solve it, and the production-harness training approach is a plausible way to learn when a problem is worth thinking about.

MAI-Code-1-Flash is rolling out now to GitHub Copilot Free, Pro, Pro+, and Max tiers in VS Code, as both a selectable model and an option in the auto-routing picker.

What This Actually Means

Neither model is open-weight. MAI-Thinking-1 is private preview via API; MAI-Code-1-Flash is accessible only through Copilot. But the fact that Microsoft built these at all changes the relationship dynamic with OpenAI significantly. When your primary AI supplier is also a competitor for developer tools and enterprise contracts, having your own models in the portfolio is leverage, not just diversification.

The other thing worth watching is the data provenance angle. As AI-generated training data becomes harder to distinguish from human-written data, “we know exactly where our training data came from and it’s all commercially licensed” is a statement that most model providers can’t make cleanly. Microsoft is betting that this matters enough to enterprise buyers to be worth leading with. The models will have to prove themselves in production, but the infrastructure argument for why to use them is clearer than it’s been for any Microsoft-first AI product.