Microsoft Says 'Why Choose One?'—Pairs GPT and Claude to Create AI Research Beast That Beats Everyone

Microsoft has decided the AI arms race doesn't have to be zero-sum. The company dropped two new features for Copilot's Researcher tool, Critique and Council, that put OpenAI's GPT and Anthropic's Claude to work on the same task. Together. On the DRACO benchmark, the combo scores 57.4 points, beating the next best result by nearly 14%. Claude Opus 4.6 alone? A respectable 42.7. But two heads, or two models, are apparently better than one.

Here's how Critique works: Microsoft split the research workflow in two. GPT handles the first phase—planning, searching, pulling sources, writing the initial draft. Then Claude steps in as the strict editor, reviewing for factual accuracy, citation quality, and whether the answer actually answered the question. Only after that review does the final report reach you. The roles can eventually flip, but for now GPT drafts and Claude critiques.
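
For the curious, the draft-then-review flow can be sketched in a few lines of Python. This is a rough illustration, not Microsoft's implementation: the `Model` type, the `critique_pipeline` function, and the prompt wording are all hypothetical, and the sketch assumes the drafting model also applies the reviewer's notes, which the announcement does not spell out.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in: any function that takes a prompt and returns text.
Model = Callable[[str], str]


@dataclass
class Report:
    draft: str   # first-pass report from the drafting model
    review: str  # reviewer's notes on the draft
    final: str   # report after the review has been applied


def critique_pipeline(question: str, drafter: Model, reviewer: Model) -> Report:
    """Sequential flow: one model drafts, a second model reviews."""
    # Phase 1: the drafter plans, searches, and writes the initial report.
    draft = drafter(f"Research and write a report answering: {question}")

    # Phase 2: the reviewer checks accuracy, citations, and relevance.
    review = reviewer(
        "Check this draft for factual accuracy, citation quality, and "
        f"whether it answers the question.\nQuestion: {question}\nDraft: {draft}"
    )

    # Assumption: the drafter revises its own work using the review.
    final = drafter(
        f"Revise the draft using the review.\nDraft: {draft}\nReview notes: {review}"
    )
    return Report(draft, review, final)
```

Swapping the `drafter` and `reviewer` arguments is all it would take to flip the roles in this sketch.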

The problem Critique aims to fix is simple: every AI research tool today works the same way. One model does everything with no one checking its work. Hallucinations slip in. Citations go sideways. Claims get invented. Critique puts a second set of eyes on everything.

Council takes a different approach. Instead of sequential collaboration, GPT and Claude work simultaneously and spit out full reports side by side. A third "judge" model then reads both, writes a summary explaining where they agreed, where they diverged, and what unique angles each caught that the other missed. In Critique, the models collaborate. In Council, they compete. You pick which vibe you want.
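
The parallel-then-judge pattern looks something like the following sketch. Again, this is a guess at the shape rather than Microsoft's actual code: `council`, `CouncilResult`, and the judge prompt are hypothetical names, and each model is assumed to be a plain function from prompt to text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, NamedTuple

# Hypothetical stand-in: any function that takes a prompt and returns text.
Model = Callable[[str], str]


class CouncilResult(NamedTuple):
    report_a: str    # full report from the first model
    report_b: str    # full report from the second model
    comparison: str  # judge's summary of agreement and divergence


def council(question: str, model_a: Model, model_b: Model, judge: Model) -> CouncilResult:
    """Parallel flow: two models answer independently, a third compares."""
    # Both models work on the same question at the same time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(model_a, question)
        future_b = pool.submit(model_b, question)
        report_a = future_a.result()
        report_b = future_b.result()

    # The judge sees both reports and summarizes how they relate.
    comparison = judge(
        "Summarize where these reports agree, where they diverge, and what "
        f"each caught that the other missed.\nReport A: {report_a}\nReport B: {report_b}"
    )
    return CouncilResult(report_a, report_b, comparison)
```

Unlike the Critique flow, neither model sees the other's output here; only the judge reads both.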

Both features are live now for users enrolled in Microsoft's Frontier program—the early-access lane for Copilot's newest toys. You'll need a Microsoft 365 Copilot license ($30/user/month) and Frontier access to try them.

The broader message? Microsoft's multibillion-dollar partnership with OpenAI hasn't blinded it to the obvious: no single model stays on top forever. The real value might be the orchestration layer: the ability to route tasks to whichever combination works best.

Elsewhere in tech: Alibaba just dropped Qwen 3.5 Omni, its omnimodal AI that processes text, images, audio, and video across 36 languages. Meanwhile, an experimental AI app on Bluesky got blocked 125,000 times in two days, making it the second most-blocked account on the platform. And Xiaomi (yes, the "cheap phone brand") shipped roughly 170 million phones in 2025, making it the world's third-largest smartphone maker, with its SU7 Ultra now holding the Nürburgring record for fastest mass-produced EV.
