
Meta's New Muse Spark Drops—And It's Actually Good, But Gemini's Still the Boss
Meta launched Muse Spark on Wednesday, marking the first model built by Meta Superintelligence Labs—the team assembled nine months ago under Chief AI Officer Alexandr Wang after Meta's $14 billion Scale AI acquisition. It's live now at meta.ai and in the Meta AI app, with a rollout to Facebook, Instagram, and WhatsApp coming in the next few weeks. That's right, Zuck is back in the AI race, and this time he's not just copying—he's actually trying to compete. Bold strategy, Cotton.
This isn't just another chatbot upgrade or a new version of Llama. Muse Spark is natively multimodal—it processes images, text, and voice from the ground up, rather than bolting vision onto an existing text model. It comes with visual chain-of-thought, tool-use support, and something Meta is calling "Contemplating mode": a setup that runs multiple AI agents in parallel to tackle harder problems. That's Meta's answer to the extended thinking modes from Google's Gemini Deep Think and OpenAI's GPT Pro. Basically, instead of one AI thinking hard, they're running a whole committee meeting inside the model. Democracy, but for neural networks.
"Muse Spark is the first step on our scaling ladder and the first product of a ground-up overhaul of our AI efforts," Meta wrote in an official announcement. "To support further scaling, we are making strategic investments across the entire stack—from research and model training to infrastructure, including the Hyperion data center." Translation: we're spending metric tons of compute and hoping the benchmarks justify it. Classic move.
The company worked with more than 1,000 physicians to curate training data for Muse Spark's medical reasoning. The results on HealthBench Hard—an open-ended health queries benchmark—are striking: Muse Spark scored 42.8, compared to 40.1 for GPT 5.4 and just 20.6 for Gemini 3.1 Pro. That's not a marginal difference. Meta actually managed to outscore both OpenAI and Google on a medical benchmark—someone at a health AI startup just felt a disturbance in the Force.
On agentic search (DeepSearchQA), Muse Spark also leads with 74.8, beating Gemini (69.7) and GPT 5.4 (73.6). On CharXiv Reasoning—figure understanding from scientific papers—it scored 86.4, the highest across the models in the comparison. For anyone keeping score at home: yes, Meta is winning at reading scientific papers. The researchers who've spent months deciphering DenseCap outputs are finally vindicated.
But good isn't the same as great. The overall benchmark picture shows Gemini 3.1 Pro still running ahead on most categories. The gap is most visible on ARC AGI 2, the abstract reasoning puzzle benchmark: Gemini scored 76.5 to Muse Spark's 42.5. On coding (LiveCodeBench Pro), Gemini's 82.9 outpaces Meta's 80.0. On MMMU Pro—multimodal understanding—Gemini scored 83.9 versus 80.4. Meta's own blog acknowledges current performance gaps in long-horizon agentic systems and coding workflows. So yeah, it's winning some battles but still losing the war. The king remains untouched, for now.
There's also a notable strategic shift baked into this launch. Muse Spark is a closed model—its architecture and weights won't be made public. That's a sharp departure from Llama, which built Meta's reputation in open AI circles. After Llama 4's underwhelming reception earlier this year, Meta appears to have decided the next chapter needs to be written differently. The company says it hopes to open-source future versions of Muse, but for now the code stays inside Meta. Oh, how the turntables—Meta, the self-proclaimed open-source champion, just dropped a closed model. The irony is palpable. Llama 4 really did Meta dirty.
The tech giant's stock climbed nearly 9% on Wednesday following the announcement and finished the trading day up 6.5% at $612.42. Markets loved it, because nothing says "AI dominance" like a closed model and a prayer. The shareholders are eating this up like it's free money. Which, technically, it might be.
"Contemplating mode" uses parallel agent orchestration to push the model's ceiling higher. In that configuration, Muse Spark hit 58% on Humanity's Last Exam and 38% on FrontierScience Research—territory that makes it competitive with the most capable versions of Gemini and GPT, rather than their standard releases. When you let the AI actually think in parallel instead of sequentially, it starts playing in the big leagues. Revolutionary concept, we know.
Meta is also rolling out a shopping assistant that compares products and links directly to purchases, and plans to bring Muse Spark to Facebook, Instagram, and WhatsApp in the coming weeks—the same distribution playbook it has run since Llama 3, putting the model in front of more than 3.5 billion users. A private API preview is opening to select developers. Because nothing says "cutting-edge AI" like integrating it into your aunt's Facebook feed to argue about politics. The reach is insane, the use cases are questionable, but hey—scale wins in the end, right?
The model was built in nine months under the internal codename Avocado, with Meta claiming that its new pretraining stack can reach the same capability level as Llama 4 Maverick with less than a tenth of the compute. Muse Spark is described internally as a "small and fast" first step in the Muse family, and a more capable version is already in development. Nine months, 10x efficiency gains, and they're calling it Avocado. The naming department is clearly on something. But hey, if the next iteration delivers, we'll all be eating our words along with that metaphorical avocado toast.