MiniMax launches M3 flagship model as the $33.8B company pursues STAR Market IPO

MiniMax released its M3 flagship model on Monday, claiming top-tier coding performance that surpasses GPT-5.5 on the SWE-Bench Pro benchmark, as the Chinese AI startup pursues a secondary listing on Shanghai's STAR Market after its Hong Kong shares surged 409% above their January IPO price.

The model scores ahead of OpenAI's GPT-5.5 and Google's Gemini 3.1 Pro on SWE-Bench Pro, a benchmark measuring real-world software engineering tasks, and trails only Anthropic's Claude Opus 4.7, MiniMax said in a statement. On Claw-Eval, an end-to-end evaluation for autonomous agents, M3 achieved the highest score among all tested models.

"The M3 is the only open-source model that simultaneously delivers frontier coding ability, 1 million-token context windows, and native multimodal processing," MiniMax said in its announcement.

Three tech trees, one model

M3 introduces MiniMax Sparse Attention (MSA), a new attention architecture designed to reduce the quadratic computational cost of long-context processing. The mechanism works in two stages: a lightweight Index Attention stage selects the top-k most relevant key-value (KV) blocks using block max pooling, and full attention is then computed over only those selected blocks. At 1 million tokens, M3's per-token computation is one-twentieth that of its predecessor, with prefilling 9.7 times faster and decoding 15.6 times faster, according to MiniMax.
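
The two-stage idea can be illustrated with a toy, single-query sketch in Python. It is not MiniMax's implementation, whose details have not yet been published: the block size, the number of selected blocks (`top_k`), and the use of the first `index_dim` coordinates as a cheap stand-in for the Index Attention stage are all illustrative assumptions, and every function name here is hypothetical. Blocks are scored by max pooling over cheap per-key scores, and exact softmax attention then runs over only the keys inside the chosen blocks.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Exact scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for j, vj in enumerate(v):
            out[j] += (w / total) * vj
    return out


def select_blocks(q: Vector, keys: List[Vector], block_size: int,
                  top_k: int, index_dim: int) -> List[int]:
    """Stage 1 (index): score each KV block cheaply and keep the top_k blocks."""
    q_idx = q[:index_dim]  # hypothetical stand-in for a learned low-dim index projection
    block_scores = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        # Block max pooling: a block scores as high as its best-matching key.
        block_scores.append(max(dot(q_idx, k[:index_dim]) for k in block))
    ranked = sorted(range(len(block_scores)), key=lambda b: block_scores[b], reverse=True)
    return sorted(ranked[:top_k])  # keep selected blocks in sequence order


def sparse_attention(q: Vector, keys: List[Vector], values: List[Vector],
                     block_size: int = 4, top_k: int = 2,
                     index_dim: Optional[int] = None) -> Vector:
    """Stage 2: exact attention restricted to tokens inside the selected blocks."""
    if index_dim is None:
        index_dim = len(q)
    chosen = select_blocks(q, keys, block_size, top_k, index_dim)
    idx = [i for b in chosen
           for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    return softmax_attention(q, [keys[i] for i in idx], [values[i] for i in idx])


# Usage: one query over 8 keys split into 4 blocks of 2.
q = [1.0, 0.0]
keys = [[0.0, 1.0]] * 4 + [[5.0, 0.0], [0.0, 1.0], [0.0, 1.0], [2.0, 0.0]]
values = [[float(i), 1.0] for i in range(8)]
print(select_blocks(q, keys, block_size=2, top_k=2, index_dim=2))  # prints [2, 3]
print(sparse_attention(q, keys, values, block_size=2, top_k=2))
```

Because the second stage reads only `top_k * block_size` keys rather than the whole context, the cost of its exact attention no longer grows with sequence length; the remaining per-token scan happens in the cheaper index stage. When `top_k` covers every block, the result matches dense attention exactly.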

The company said M3 was trained on interleaved text-image data from the first pre-training step, with the data pipeline rebuilt to handle up to 100 trillion tokens. MiniMax open-sourced the M2.5 and M2.7 models earlier this year and said M3's weights and technical report will follow within 10 days.

To demonstrate the model's combined capabilities, MiniMax tasked M3 with independently reproducing an ICLR 2025 Outstanding Paper Award-winning paper on learning dynamics during fine-tuning. The model ran for roughly 12 hours without human intervention, producing 18 commits and 23 experimental charts. It successfully replicated the paper's core experiments, including the squeezing effect observed in DPO training and the effectiveness of the proposed Extend mitigation method.

In a separate test, M3 optimized an FP8 matrix multiplication kernel for Nvidia's Hopper architecture, starting from a non-functional Triton skeleton. Over 24 hours, the model made 147 benchmark submissions and 1,959 tool calls, raising the kernel's utilization of Hopper's peak FP8 throughput from 7.6% to 71.3%, a 9.4-fold speedup. Most competing models stopped improving within 30 submissions; M3's best result came at submission 145.
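
The 9.4-fold figure follows directly from the two utilization numbers. A minimal sketch, assuming the kernel runs the same matrix sizes on the same GPU before and after optimization (the benchmark's shapes were not disclosed). Under that assumption runtime scales inversely with achieved throughput, so the speedup is simply the ratio of utilizations; the function name is hypothetical.

```python
def speedup_from_utilization(before_pct: float, after_pct: float) -> float:
    """Assuming a fixed workload on fixed hardware, runtime is inversely
    proportional to achieved throughput, so the utilization ratio is the speedup."""
    return after_pct / before_pct


ratio = speedup_from_utilization(7.6, 71.3)
print(f"{ratio:.1f}x")  # prints 9.4x
```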

IPO momentum and financial context

The model launch comes days after MiniMax filed for pre-listing counseling with the Shanghai bureau of the China Securities Regulatory Commission (CSRC) on May 29, starting its A-share IPO process with CITIC Securities as advisor. The company went public in Hong Kong in January at HK$165 per share, raising about $619 million. Its stock closed at HK$840 on May 29, valuing the company at HK$263.45 billion, or roughly $33.8 billion.
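
The headline figures can be cross-checked from these numbers. A short Python sketch, assuming an exchange rate of roughly HK$7.8 per US dollar (the Hong Kong dollar is pegged within a 7.75 to 7.85 band, but the exact rate behind the conversion is not stated), reproduces both the 409% gain over the IPO price and the roughly $33.8 billion valuation.

```python
IPO_PRICE_HKD = 165.0
CLOSE_MAY29_HKD = 840.0
MARKET_CAP_HKD_BN = 263.45
HKD_PER_USD = 7.8  # assumption: approximate rate within the 7.75-7.85 peg band


def pct_gain(start: float, end: float) -> float:
    """Percentage change from start to end."""
    return (end - start) / start * 100.0


gain = pct_gain(IPO_PRICE_HKD, CLOSE_MAY29_HKD)
cap_usd_bn = MARKET_CAP_HKD_BN / HKD_PER_USD
print(f"{gain:.0f}% gain, about ${cap_usd_bn:.1f} billion")  # prints 409% gain, about $33.8 billion
```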

MiniMax's annualized recurring revenue exceeded $300 million as of late May, more than doubling in two months, according to business metrics disclosed on May 28. The company reported 2025 revenue of $79 million with a gross margin of 25.4% and an adjusted net loss of $250 million. It counts more than 1 million enterprise and developer customers and roughly 300 million global users.

The Shanghai listing would give MiniMax access to deeper domestic capital markets at a time when Beijing has signaled it wants its AI champions funded at home. The company joins peers including Zhipu and Moonshot in pursuing public listings as China's AI sector races to convert technical credibility into market capital.

MiniMax shares, up more than 400% from their IPO price, trade at a significant premium to most global AI peers. The company will join the Hang Seng Tech Index on June 8. Whether M3's benchmark performance can sustain that valuation, and whether the STAR Market listing proceeds on similar terms, will depend on the model's ability to convert technical wins into enterprise revenue at scale.

This article is for informational purposes only and does not constitute investment advice.