
Microsoft Launches Three New MAI Models on Azure Foundry for Speech, Voice, and Image Generation

Microsoft unveils three MAI models on Azure Foundry: industry-leading speech recognition in 25 languages, ultra-fast voice synthesis, and top-3 image generation.


REDMOND, WA – April 2, 2026 – Microsoft today announced the public preview of three new foundational AI models developed in-house: MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2. Available on the Azure Foundry platform, these models represent a significant step in Microsoft's 'AI self-sufficiency' strategy, directly challenging OpenAI and Google in critical enterprise AI modalities.


MAI-Transcribe-1: Industry-Leading Speech Recognition in 25 Languages


MAI-Transcribe-1 achieves the lowest average Word Error Rate (WER), 3.8%, on the FLEURS benchmark across the 25 most-used languages in Microsoft products. It outperforms OpenAI's Whisper-large-v3 in all 25 languages and Google's Gemini 3.1 Flash in 22 of them. The model pairs a bi-directional audio encoder with a transformer-based text decoder and is already being integrated into Copilot's Voice mode and Microsoft Teams. It operates at approximately 50% lower cost than competing solutions.
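
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below illustrates that standard definition only. It is not Microsoft's evaluation code, the function name `word_error_rate` is a hypothetical helper, and real benchmark runs usually normalize text (casing, punctuation) before scoring, which is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count.

    Illustrative only: splits on whitespace and applies no text
    normalization, which real benchmark pipelines typically do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a 3.8% WER corresponds to roughly four word errors per hundred reference words. How the 25 per-language scores are averaged is not stated in the announcement.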


MAI-Voice-1: 60 Seconds of Natural Speech in Under One Second


MAI-Voice-1, a high-fidelity text-to-speech model, can generate 60 seconds of natural, expressive audio in under one second on a single GPU. At $22 per one million characters, it gives developers and enterprises an accessible way to add advanced voice capabilities.
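
To put the speed and price figures in concrete terms, here is a back-of-the-envelope sketch. The rate constant comes from the price quoted above. The helper names and the 50,000-character script length are hypothetical, and actual billing rules (rounding, minimums, tiers) are not described in the announcement.

```python
# Published preview price quoted in the article.
VOICE_USD_PER_MILLION_CHARS = 22.0


def voice_cost_usd(num_chars: int) -> float:
    """Estimated synthesis cost for a script of num_chars characters."""
    return num_chars / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def real_time_factor(audio_seconds: float, compute_seconds: float) -> float:
    """Seconds of audio produced per second of compute."""
    return audio_seconds / compute_seconds


# Hypothetical 50,000-character script.
print(f"${voice_cost_usd(50_000):.2f}")

# 60 seconds of audio in one second of compute is a lower bound,
# since the article says generation takes *under* one second.
print(real_time_factor(60, 1.0))
```

In other words, the quoted figures imply a script of that length would cost on the order of a dollar, and that the model runs at least 60 times faster than real time on a single GPU.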


MAI-Image-2: Top-3 on Arena.ai Leaderboard


MAI-Image-2, the upgraded image generation model, has secured a top-three position on the Arena.ai leaderboard. It generates images at least twice as fast as its predecessor and is being rolled out across Bing and PowerPoint. Pricing is set at $5 per million text input tokens and $33 per million image output tokens. WPP, one of the world's largest advertising companies, is an early enterprise partner using MAI-Image-2 at scale.
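
Because MAI-Image-2 is billed on two separate token meters, the cost of a request is the sum of an input part and an output part. The sketch below shows that arithmetic using the two quoted rates. The function name is hypothetical, and the token counts in the example are placeholders, since the announcement does not say how many output tokens a generated image consumes.

```python
# Published preview prices quoted in the article.
TEXT_INPUT_USD_PER_MILLION_TOKENS = 5.0
IMAGE_OUTPUT_USD_PER_MILLION_TOKENS = 33.0


def image_request_cost_usd(text_input_tokens: int, image_output_tokens: int) -> float:
    """Estimated cost of one request: prompt tokens plus image output tokens."""
    input_cost = text_input_tokens / 1_000_000 * TEXT_INPUT_USD_PER_MILLION_TOKENS
    output_cost = image_output_tokens / 1_000_000 * IMAGE_OUTPUT_USD_PER_MILLION_TOKENS
    return input_cost + output_cost


# Placeholder token counts, chosen only to illustrate the calculation.
prompt_tokens = 200
output_tokens = 4_000
print(f"${image_request_cost_usd(prompt_tokens, output_tokens):.4f}")
```

With any realistic split, the output side dominates: image output tokens cost 6.6 times as much per token as text input tokens.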


The Path to 'AI Self-Sufficiency'


Mustafa Suleyman, who leads Microsoft's superintelligence team, explained that a contract renegotiated with OpenAI in September 2025 gave Microsoft the freedom to pursue superintelligence independently. While the partnership with OpenAI is expected to last until at least 2032, the message is clear: Microsoft is building the capability to provide state-of-the-art models on its own. The company has confirmed plans to develop frontier-level large language models to compete directly with GPT and other frontier models.
