Microsoft Launches Three New MAI Models on Azure Foundry for Speech, Voice, and Image Generation
Microsoft unveils three MAI models on Azure Foundry: industry-leading speech recognition in 25 languages, ultra-fast voice synthesis, and top-3 image generation.
REDMOND, WA – April 2, 2026 – Microsoft today announced the public preview of three new foundational AI models developed in-house: MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2. Available on the Azure Foundry platform, these models represent a significant step in Microsoft's 'AI self-sufficiency' strategy, directly challenging OpenAI and Google in critical enterprise AI modalities.
MAI-Transcribe-1: Industry-Leading Speech Recognition in 25 Languages
MAI-Transcribe-1 achieves the lowest average Word Error Rate (WER) of 3.8% on the FLEURS benchmark across the 25 most-used languages in Microsoft products. This surpasses OpenAI's Whisper-large-v3 in all 25 languages and Google's Gemini 3.1 Flash in 22 of them. The model uses a transformer-based text decoder with a bi-directional audio encoder and is already being integrated into Copilot's Voice mode and Microsoft Teams. It operates at approximately 50% lower cost than competing solutions.
MAI-Voice-1: 60 Seconds of Natural Speech in Under One Second
The high-fidelity text-to-speech model can produce 60 seconds of natural, expressive audio in under one second on a single GPU. Priced at $22 per one million characters, it provides an accessible option for developers and enterprises looking to incorporate advanced voice capabilities.
MAI-Image-2: Top-3 on Arena.ai Leaderboard
The upgraded image generation model has secured a top-three position on the Arena.ai leaderboard. It delivers generation times at least twice as fast as its predecessor and is being rolled out across Bing and PowerPoint. Pricing is set at $5 per million text input tokens and $33 per million image output tokens. WPP, one of the world's largest advertising companies, is an early enterprise partner leveraging MAI-Image-2 at scale.
The Path to 'AI Self-Sufficiency'
Mustafa Suleyman, who leads Microsoft's superintelligence team, explained that a renegotiated contract with OpenAI in September 2025 gave Microsoft the freedom to independently pursue its own superintelligence. While the partnership with OpenAI is expected to last until at least 2032, the message is clear: Microsoft is building the capability to provide state-of-the-art models independently. The company has confirmed plans to develop frontier-level large language models to compete directly with GPT and beyond.
Tools Mentioned in This Article
AI Newsletter
Get the latest AI tools and news delivered daily
Related Articles
OpenAI Acquires Tech Talk Show TBPN in First-Ever Media Acquisition by an AI Company
OpenAI acquires popular tech talk show TBPN, marking the first media acquisition by an AI company and sparking industry debate.
Microsoft Challenges OpenAI and Google with Trio of New In-House AI Models
Microsoft unveils three proprietary 'MAI' AI models, directly competing with OpenAI and Google in enterprise AI strategy.