
Mercury
Mercury is the first commercial-scale diffusion-based large language model (dLLM) that can generate text up to 10x faster than traditional LLMs while maintaining high-quality output.
https://www.inceptionlabs.ai/

Product Information
Updated: Feb 28, 2026
What is Mercury
Mercury is a groundbreaking AI model developed by Inception Labs that represents a fundamental shift from traditional autoregressive language models to diffusion-based text generation. Launched in February 2025, Mercury and its code-specialized version, Mercury Coder, are available through Amazon Bedrock Marketplace and Amazon SageMaker JumpStart. The model family was created by a team of researchers from Stanford, UCLA, and Cornell who pioneered foundational work on diffusion models. Mercury is designed to handle tasks including code generation, reasoning, and real-time voice applications.
Key Features of Mercury
Mercury is a diffusion-based Large Language Model (dLLM) that fundamentally changes how language models generate text. Unlike traditional autoregressive models, which produce tokens one at a time, Mercury generates multiple tokens in parallel, achieving speeds of over 1,000 tokens per second on standard NVIDIA GPUs while maintaining high-quality outputs. It offers enterprise-grade capabilities, including a 128K-token context window, tool-calling support, and compatibility with major cloud platforms such as AWS Bedrock and Azure AI Foundry.
Parallel Token Generation: Uses diffusion-based architecture to generate multiple tokens simultaneously instead of sequential generation, enabling 5-10x faster processing than traditional LLMs
Cloud Platform Integration: Available through major cloud providers including AWS Bedrock and Azure AI Foundry with enterprise-grade reliability and 99.5%+ uptime
API Compatibility: Maintains OpenAI API compatibility and supports standard prompting methods (zero-shot, few-shot, CoT), making it a drop-in replacement for existing LLM workflows (see the sketch after this list)
Advanced Reasoning Capabilities: Features multi-step refinement process that catches errors and improves coherence during text generation, particularly strong in coding and mathematical reasoning tasks
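Because Mercury keeps OpenAI API compatibility, switching an existing workflow is mostly a matter of pointing the client at a different base URL. A minimal sketch in Python, assuming the https://api.inceptionlabs.ai/v1 endpoint and the 'mercury-2' model name quoted in the usage steps below:

# Minimal drop-in sketch: reuse the standard OpenAI client (v1+) with
# Inception's OpenAI-compatible endpoint. Endpoint and model name are
# taken from the usage steps below; substitute your own API key.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_INCEPTION_API_KEY",            # from the API Keys dashboard
    base_url="https://api.inceptionlabs.ai/v1",  # OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="mercury-2",
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
)
print(response.choices[0].message.content)

Because only the key and base URL change, existing zero-shot, few-shot, and chain-of-thought prompts carry over unchanged.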
Use Cases of Mercury
Code Development: Powers real-time code completion, intelligent tab suggestions, and rapid code edits in development environments with ultra-low latency
Enterprise Search: Enables instant data retrieval and summarization across large organizational knowledge bases with minimal latency
Real-time Voice Applications: Supports responsive voice-powered workflows including customer support, translation services, and interactive voice agents
Automated Workflows: Handles complex routing, analytics, and decision processes in enterprise environments with ultra-responsive AI capabilities
Pros
Significantly faster processing speed (1000+ tokens per second)
Lower inference costs compared to traditional LLMs
Drop-in compatibility with existing LLM workflows
Cons
Limited track record as a new technology
Currently focused primarily on coding and enterprise applications
Requires specific GPU hardware for optimal performance
How to Use Mercury
Create an account: Visit platform.inceptionlabs.ai and create an Inception Platform account or sign in if you already have one
Get API key: Go to API Keys section in your account dashboard and create a new API key. New API keys come with 10 million free tokens
Choose deployment method: You can access Mercury through direct API integration, Amazon Bedrock Marketplace, Amazon SageMaker JumpStart, or Azure AI Foundry depending on your needs
Make API calls: Use the API key to make calls to Mercury API endpoints. The API is OpenAI-compatible and can be accessed through REST calls or existing OpenAI client libraries
Basic API usage example: Make a POST request to https://api.inceptionlabs.ai/v1/chat/completions with your API key in the Authorization header and a JSON payload containing the model (e.g. 'mercury-2') and messages; a request sketch covering this and the configuration step appears after this list
Configure settings: Optionally set parameters like max_tokens and enable streaming/diffusion visualization by setting the diffusing parameter to true
Integrate with tools: Mercury can be integrated with popular tools and frameworks including LangChain, AISuite, and LiteLLM for more complex applications (a LiteLLM sketch appears below)
Monitor usage: Track your token usage through the platform dashboard. Input tokens cost $0.25 per 1M tokens and output tokens cost $0.75 per 1M tokens; a quick cost calculation is sketched below
Get support: For issues or questions, contact [email protected] or join their Discord channel. Enterprise customers can reach out to [email protected]
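For the basic API usage and configuration steps above, here is a Python sketch of the raw REST call using the requests library. The max_tokens and diffusing fields come from the configuration step; treat the exact payload shape as an assumption to verify against Inception's current API reference.

# Raw REST sketch of the chat completions call described above.
# Field names ("diffusing", model "mercury-2") are taken from this guide
# and may differ from the live API reference.
import requests

API_KEY = "YOUR_INCEPTION_API_KEY"

resp = requests.post(
    "https://api.inceptionlabs.ai/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "mercury-2",
        "messages": [{"role": "user", "content": "Summarize diffusion LLMs in two sentences."}],
        "max_tokens": 256,    # optional cap on output length
        "diffusing": False,   # set True to stream intermediate diffusion steps
                              # (the response then arrives as a stream, not one JSON body)
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])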
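For the integrations step, one hedged possibility is routing Mercury through LiteLLM's generic OpenAI-compatible provider (the "openai/" model prefix); this routing is an assumption based on LiteLLM's custom-endpoint pattern, not an Inception-documented integration:

# LiteLLM sketch: treat Mercury as a generic OpenAI-compatible backend.
# The "openai/" prefix and api_base routing are assumptions, not
# Inception-specific support.
from litellm import completion

response = completion(
    model="openai/mercury-2",
    api_base="https://api.inceptionlabs.ai/v1",
    api_key="YOUR_INCEPTION_API_KEY",
    messages=[{"role": "user", "content": "Explain tool calling in one paragraph."}],
)
print(response.choices[0].message.content)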
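Finally, a quick back-of-envelope check on the listed pricing; the token counts are made-up illustration values, not real usage data.

# Cost estimate at the listed rates: $0.25 per 1M input tokens,
# $0.75 per 1M output tokens. Workload numbers are hypothetical.
INPUT_RATE = 0.25 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.75 / 1_000_000  # dollars per output token

input_tokens = 40_000_000       # hypothetical monthly prompt volume
output_tokens = 12_000_000      # hypothetical monthly completion volume

cost = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
print(f"Estimated monthly cost: ${cost:.2f}")  # -> Estimated monthly cost: $19.00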
Mercury FAQs
What is Mercury?
Mercury is the first commercially available diffusion-based Large Language Model (dLLM), launched by Inception Labs in February 2025. It uses a breakthrough diffusion-based approach to language generation instead of traditional autoregressive generation.