What models has MAI released?

MAI has released three foundational models: MAI-Transcribe-1 (a multilingual speech-to-text model supporting 25 languages), MAI-Voice-1 (a next-generation voice model for natural, expressive speech), and MAI-Image-2 (an image generation model). There's also MAI-1-Preview, their first end-to-end foundational model, which is not currently available.

What makes MAI-Transcribe-1 special?

MAI-Transcribe-1 is described as the most accurate transcription model in the world across its 25 supported languages. It was built for challenging recording conditions and reliably handles background noise, low-quality audio, and overlapping speech, which suits production use cases such as voice agents, meeting transcription, and call center analytics.

Where are MAI models available?

MAI models are available on Microsoft Foundry. The models can also be accessed through the MAI Playground at playground.microsoft.ai/chat.

What consumer products does MAI work on?

MAI's major consumer AI products include Copilot, Bing, GroupMe, Edge, and MSN. The division also has teams working on Data, Security, Privacy, Monetization, Health, Responsible AI, Commerce, and Microsoft Advertising.

How does MAI's strategy fit with Microsoft's OpenAI partnership?

MAI represents Microsoft's move to establish independence from its OpenAI partnership and own its AI stack. The company now offers OpenAI models through Azure OpenAI Service alongside its own MAI foundational models, giving enterprise customers more control over AI tools, especially around licensing, data privacy, and customization.

What is 'Humanist Superintelligence'?

Humanist Superintelligence is MAI's vision for advanced AI designed to stay controllable, aligned, and firmly in service to humanity. The goal is not to outpace human capability but to amplify it, expanding what people can imagine and achieve. The approach keeps humans in control, builds alignment into the architecture, stress-tests safety at every stage, and prioritizes real-world impact.

Who leads MAI and when was it formed?

MAI is led by CEO Mustafa Suleyman, a co-founder of DeepMind (now Google DeepMind). The division was founded in March 2024, and its Superintelligence team was announced in November 2025, a few months before the model releases, making that team a relatively new but rapidly productive group within Microsoft.

MAI

MAI (Microsoft AI) is Microsoft's in-house AI research division. It develops multimodal foundational models for image generation, speech transcription, and voice synthesis, guided by its humanist superintelligence principles. Its image model has ranked as high as #3 on the Arena.ai leaderboard.

https://microsoft.ai/?ref=producthunt

Product Information

Updated: April 10, 2026

What is MAI

Microsoft AI (MAI) is an artificial intelligence research laboratory and division of Microsoft, founded in March 2024 and headquartered in Redmond, Washington. Led by CEO Mustafa Suleyman, a co-founder of DeepMind and Inflection AI, MAI oversees consumer AI products including Copilot, Bing, Edge, and GroupMe. The division was established to give Microsoft greater technological independence from its OpenAI partnership, despite the company's $13 billion investment in OpenAI since 2019. In November 2025, MAI announced the formation of a Superintelligence team with a mission to build 'Humanist Superintelligence': advanced AI systems designed to remain controllable, aligned with human values, and firmly in service to humanity. The division operates with frontier-scale compute infrastructure, including next-generation GB200 clusters, and has quickly established itself as a competitive force in the AI industry.

Key Features of MAI

Microsoft AI (MAI) is Microsoft's in-house AI research division led by Mustafa Suleyman, focused on developing 'Humanist Superintelligence' - advanced AI systems that prioritize human control, safety, and practical applications. The division has released a suite of foundational multimodal AI models including MAI-Transcribe-1 for speech-to-text across 25 languages, MAI-Voice-1 for natural voice generation with custom voice cloning capabilities, and MAI-Image-2 for photorealistic image generation. These models are available through Microsoft Foundry and power consumer products like Copilot, Bing, and Edge. MAI emphasizes competitive pricing (approximately 50% lower GPU costs than alternatives), faster performance (2.5x faster than Azure Fast for transcription), and enterprise-grade safety with rigorous testing and responsible AI practices.

MAI-Transcribe-1: Multilingual Speech Recognition: State-of-the-art speech-to-text transcription across 25 languages with enterprise-grade accuracy, 2.5x faster batch processing than Azure Fast, and optimized for real-world conditions including background noise, low-quality audio, and overlapping speech at approximately 50% lower GPU cost.

MAI-Voice-1: Custom Voice Generation: Next-generation voice synthesis producing natural, expressive speech, with the ability to create custom AI voices from a short audio sample of about 10 seconds. Generates a full minute of audio in under a second on a single GPU and preserves speaker identity across long-form content.

MAI-Image-2: Photorealistic Image Creation: Advanced text-to-image model that has ranked as high as #3 on the Arena.ai leaderboard, built for creatives with natural lighting, accurate skin tones, lived-in environments, and reliable in-image text generation. Offers 2x faster generation times than its predecessor, with enterprise-focused licensing and data privacy.

Humanist Superintelligence Philosophy: AI development approach that puts humans at the center, optimizing for how people actually communicate and training for practical use. Emphasizes keeping AI controllable, aligned, and firmly in service to humanity with rigorous safety testing and red-teaming at every stage.

Microsoft Foundry Integration: Unified platform for deploying and managing MAI models with enterprise-grade security including data encryption, role-based access controls, compliance certifications, built-in guardrails, and governance features for secure AI deployment at scale.

Competitive Pricing and Performance: Models priced aggressively to compete with OpenAI and Google offerings: $0.36 per hour of audio for transcription, $22 per million characters for voice, and, for images, $5 per million text-input tokens plus $33 per million image-output tokens. The pricing is designed to reduce Microsoft's cost of goods sold while delivering competitive performance.
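To make the quoted prices concrete, here is a minimal Python sketch of the arithmetic. The constants are the prices stated in this listing and should be treated as illustrative rather than an official rate card. The function names are hypothetical and do not belong to any Microsoft SDK.

```python
# Prices as quoted in this listing (USD). Illustrative only, not an official rate card.
TRANSCRIBE_USD_PER_AUDIO_HOUR = 0.36
VOICE_USD_PER_MILLION_CHARS = 22.0
IMAGE_USD_PER_MILLION_INPUT_TOKENS = 5.0
IMAGE_USD_PER_MILLION_OUTPUT_TOKENS = 33.0


def transcription_cost(audio_hours: float) -> float:
    """Cost of transcribing the given number of audio hours."""
    return audio_hours * TRANSCRIBE_USD_PER_AUDIO_HOUR


def voice_cost(characters: int) -> float:
    """Cost of synthesizing speech for the given number of characters."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def image_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of image generation: text-input tokens plus image-output tokens."""
    input_part = input_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_INPUT_TOKENS
    output_part = output_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_OUTPUT_TOKENS
    return input_part + output_part


if __name__ == "__main__":
    # Example workload: 1,000 hours of calls and 500,000 characters of speech.
    print(f"Transcription: ${transcription_cost(1000):.2f}")
    print(f"Voice:         ${voice_cost(500_000):.2f}")
```

At these rates, transcription cost scales linearly with audio duration, so a monthly estimate only needs the expected number of audio hours.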

Use Cases of MAI

Global Call Center Analytics: Deploy MAI-Transcribe-1 for real-time transcription of customer service calls across 25 languages, handling noisy phone lines and various accents to enable automated quality monitoring, sentiment analysis, and compliance tracking at 50% lower GPU costs than alternatives.

Voice Agent Development: Build conversational AI agents using MAI-Voice-1 and MAI-Transcribe-1 together to create natural voice experiences that can both listen and speak with precision, enabling customer support bots, virtual assistants, and interactive voice response systems with custom brand voices.

Creative Marketing Content Production: Use MAI-Image-2 for generating photorealistic marketing materials, social media content, product visualizations, and branded communications with accurate text rendering, natural lighting, and diverse representation, reducing post-production time for creative teams.

Meeting and Conference Transcription: Implement MAI-Transcribe-1 for enterprise meeting transcription in conference rooms and virtual settings, reliably handling overlapping speech, background noise, and multiple languages to create searchable records and automated summaries for global teams.

Healthcare Documentation: Apply MAI-Transcribe-1 in medical settings for transcribing doctor-patient consultations, medical procedures, and clinical notes across languages with enterprise-grade accuracy and compliance with healthcare data privacy standards through Microsoft's secure infrastructure.

Podcast and Media Production: Leverage MAI-Voice-1 for creating AI-generated podcast content, audiobook narration, and voice-overs with natural expressiveness and emotional range, while using MAI-Transcribe-1 for accurate transcription and subtitle generation in multiple languages.
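The voice agent use case above pairs a speech-to-text model with a voice model. The following Python sketch shows only the shape of such a listen, reply, speak loop. Every class and method name here is hypothetical: the interfaces are placeholders, not the MAI or Microsoft Foundry API, and the echo classes stand in for real model calls so the example runs on its own.

```python
from typing import Callable, Protocol


class Transcriber(Protocol):
    """Hypothetical speech-to-text interface (placeholder, not a real MAI API)."""

    def transcribe(self, audio: bytes) -> str: ...


class Synthesizer(Protocol):
    """Hypothetical text-to-speech interface (placeholder, not a real MAI API)."""

    def synthesize(self, text: str) -> bytes: ...


class VoiceAgent:
    """Runs one conversational turn: audio in, text reply, audio out."""

    def __init__(self, stt: Transcriber, tts: Synthesizer,
                 reply: Callable[[str], str]) -> None:
        self._stt = stt
        self._tts = tts
        self._reply = reply

    def handle_turn(self, audio: bytes) -> bytes:
        heard = self._stt.transcribe(audio)    # speech -> text
        answer = self._reply(heard)            # text -> text (dialogue logic)
        return self._tts.synthesize(answer)    # text -> speech


class EchoTranscriber:
    """Stand-in that treats the audio bytes as UTF-8 text."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class EchoSynthesizer:
    """Stand-in that returns the reply text as UTF-8 bytes."""

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


if __name__ == "__main__":
    agent = VoiceAgent(EchoTranscriber(), EchoSynthesizer(),
                       lambda heard: f"You said: {heard}")
    print(agent.handle_turn(b"hello"))
```

Keeping the transcriber and synthesizer behind small interfaces means the stand-ins can later be swapped for real service clients without changing the turn logic.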

Pros

Significantly lower costs with approximately 50% GPU cost reduction compared to leading alternatives while maintaining competitive or superior performance

Comprehensive multimodal suite covering speech, voice, and image generation with seamless integration through Microsoft Foundry and existing Microsoft products

Strong emphasis on responsible AI with rigorous red-teaming, enterprise-grade security, compliance certifications, and properly licensed training data reducing legal risks

Exceptional speed performance including 2.5x faster transcription and ability to generate one minute of audio in under a second
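The speed figures above can be read as simple ratios. The Python sketch below assumes that "2.5x faster" means the same job finishes in 1/2.5 of the baseline time, and that "one minute of audio in under a second" means a real-time factor of at least 60. The helper names are hypothetical.

```python
def real_time_factor(audio_seconds: float, wall_clock_seconds: float) -> float:
    """Seconds of audio produced or processed per second of compute time."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    return audio_seconds / wall_clock_seconds


def time_after_speedup(baseline_seconds: float, speedup: float) -> float:
    """Elapsed time when the same job runs `speedup` times faster."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_seconds / speedup


if __name__ == "__main__":
    # 60 s of audio in 1 s of compute is the lower bound implied by "under a second".
    print(real_time_factor(60, 1))
    # A batch job that takes 100 minutes at baseline, run 2.5x faster.
    print(time_after_speedup(100 * 60, 2.5) / 60, "minutes")
```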

Cons

MAI-Image-2 currently ranks #5 on Arena.ai leaderboard (previously #3), behind competitors like Google's Nano Banana 2 and OpenAI's GPT-Image 1.5, indicating performance gaps

Limited model availability with MAI-1-Preview not yet publicly accessible and some models requiring approval processes for access through Foundry

Potential strategy confusion for developers with Microsoft offering OpenAI models, MAI models, and various other AI capabilities across product lines without clear guidance on which to use

Relatively new effort (the Superintelligence team was formed in November 2025) with models that are only months old, meaning they are less battle-tested in production than established alternatives from OpenAI and Google

How to Use MAI

1. Access MAI Models through Microsoft Platforms: MAI models are available through multiple Microsoft platforms: Microsoft Foundry (for developers and enterprise), MAI Playground (for testing and experimentation), Copilot, Bing Image Creator, Microsoft Teams, and other Microsoft products.

2. Using MAI-Image-2 for Image Generation: Access MAI-Image-2 through Copilot or Bing Image Creator. In Bing Image Creator, you can choose between MAI-Image-2, DALL-E 3, or GPT-4o. Enter your text prompt describing the image you want (e.g., 'A glacier wall towering like a cathedral interior, deep blue ice with light refracting through layers'). The model excels at photorealistic imagery with natural lighting, accurate skin tones, and lived-in environments. Images generate at least 2x faster than previous systems.

3. Using MAI-Transcribe-1 for Speech-to-Text: Access MAI-Transcribe-1 through Microsoft Foundry, Azure Speech, or MAI Playground. Upload an audio file (up to 10 MB in the Playground) or record audio directly. The model supports 25 languages and delivers accurate transcription even in noisy, real-world environments. It processes batch transcription 2.5x faster than the Azure Fast offering. Pricing is $0.36 per hour of audio.

4. Using MAI-Voice-1 for Voice Generation: Access MAI-Voice-1 through Microsoft Foundry. The model can generate 60 seconds of audio in under a second on a single GPU. To create a custom voice, provide a short audio sample of about 10 seconds. The model produces natural, expressive speech with emotional range and preserves speaker identity across long-form content. Pricing starts at $22 per million characters.

5. Developer Access via Microsoft Foundry: For API access and production use, sign up for Microsoft Foundry. Fill out the access form if you don't have Foundry access yet. Once approved, you can integrate MAI models into your applications with built-in guardrails, governance, and enterprise-grade controls. Pricing: MAI-Image-2 costs $5 per million tokens (text input) and $33 per million tokens (image output).

6. Testing Models in MAI Playground: Visit playground.microsoft.ai to experiment with MAI models without requiring full Foundry access. Test MAI-Transcribe-1 by recording or uploading audio files. Try MAI-Image-2 with various text prompts. Provide feedback on model performance to help improve future versions.

7. Using MAI Models in Microsoft Products: MAI-Transcribe-1 is integrated into Copilot's Voice mode and Microsoft Teams for conversation transcripts. MAI-Image-2 is rolling out in Bing, PowerPoint, and Copilot. MAI-Image-1 is available in Bing Image Creator and can be used in Story Mode for Audio Expressions. Simply use these products normally and the MAI models power the AI features behind the scenes.

8. Enterprise and Production Deployment: For enterprise use cases like call center analytics, meeting transcription, voice agents, content creation, or image generation at scale, contact Microsoft for Foundry access. Deploy models in the cloud or on-premises depending on your needs. Leverage built-in safety features, compliance tools, and governance controls for responsible AI deployment.
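Step 3 above mentions a 10 MB upload limit in the Playground. A small local check before uploading can be sketched in Python as follows. This assumes "10 MB" means 10,000,000 bytes, the stricter of the two common readings, so a file that passes here also passes under the 10 MiB reading. The function and constant names are hypothetical, and the code does not call any Microsoft service.

```python
import os

# Assumption: "10 MB" is read as 10,000,000 bytes (stricter than 10 MiB).
PLAYGROUND_MAX_UPLOAD_BYTES = 10_000_000


def within_playground_limit(size_bytes: int) -> bool:
    """Return True if a file of this size fits under the assumed upload limit."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return size_bytes <= PLAYGROUND_MAX_UPLOAD_BYTES


def file_within_playground_limit(path: str) -> bool:
    """Check an audio file on disk against the assumed upload limit."""
    return within_playground_limit(os.path.getsize(path))


if __name__ == "__main__":
    for size in (1_024, 10_000_000, 25_000_000):
        verdict = "ok" if within_playground_limit(size) else "too large"
        print(f"{size:>12,d} bytes: {verdict}")
```

Files above the limit would need to be shortened or compressed first, or sent through one of the other access routes listed in the steps above.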

MAI FAQ

MAI is Microsoft's AI division, formed under Mustafa Suleyman, a co-founder of DeepMind (now Google DeepMind). Its mission is to build 'Humanist Superintelligence': AI systems that are both highly capable and deeply safe, with humanity at the center of every decision. MAI aims to create practical superintelligence that addresses real problems while remaining under human control.