Mercury is the first commercial-scale diffusion-based large language model (dLLM), generating text up to 10x faster than traditional LLMs while maintaining high-quality output.
https://www.inceptionlabs.ai/
Mercury

Product Information

Updated: Feb 28, 2026

What is Mercury

Mercury is a groundbreaking AI model developed by Inception Labs that represents a fundamental shift from traditional autoregressive language models to diffusion-based text generation. Launched in February 2025, Mercury and its code-specialized version, Mercury Coder, are available through Amazon Bedrock Marketplace and Amazon SageMaker JumpStart. The model family was created by a team of researchers from Stanford, UCLA, and Cornell who pioneered foundational work on diffusion models. Mercury is designed to handle a variety of tasks, including code generation, reasoning, and real-time voice applications.

Key Features of Mercury

Mercury fundamentally changes how language models generate text. Unlike traditional autoregressive models, which generate tokens one at a time, Mercury generates multiple tokens in parallel, achieving speeds of over 1,000 tokens per second on standard NVIDIA GPUs while maintaining high-quality output. It offers enterprise-grade capabilities, including a 128K-token context window, tool-calling support, and availability on major cloud platforms such as AWS Bedrock and Azure AI Foundry.
Parallel Token Generation: Uses diffusion-based architecture to generate multiple tokens simultaneously instead of sequential generation, enabling 5-10x faster processing than traditional LLMs
Cloud Platform Integration: Available through major cloud providers including AWS Bedrock and Azure AI Foundry with enterprise-grade reliability and 99.5%+ uptime
API Compatibility: Maintains OpenAI API compatibility and supports standard prompting methods (zero-shot, few-shot, CoT), making it a drop-in replacement for existing LLM workflows
Advanced Reasoning Capabilities: Features multi-step refinement process that catches errors and improves coherence during text generation, particularly strong in coding and mathematical reasoning tasks
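The speed advantage of parallel generation can be illustrated with a toy count of model passes. This is not Mercury's actual algorithm, just a sketch of the idea: a sequential decoder needs one forward pass per token, while a diffusion-style decoder refines all tokens together over a fixed number of denoising steps (the step count of 16 here is an arbitrary illustrative value).

```python
# Toy illustration (NOT Mercury's actual algorithm): why refining all
# tokens in parallel needs far fewer model passes than decoding them
# one at a time.

def autoregressive_passes(num_tokens: int) -> int:
    """A sequential decoder runs one forward pass per generated token."""
    return num_tokens

def diffusion_passes(num_tokens: int, refinement_steps: int = 16) -> int:
    """A diffusion-style decoder refines ALL tokens at once over a fixed
    number of denoising steps, independent of sequence length."""
    return refinement_steps

tokens = 256
speedup = autoregressive_passes(tokens) / diffusion_passes(tokens)
print(f"{tokens} tokens: {autoregressive_passes(tokens)} sequential passes "
      f"vs {diffusion_passes(tokens)} refinement steps (~{speedup:.0f}x fewer)")
```

Because the number of refinement steps stays fixed as the sequence grows, the gap widens with longer outputs, which is where the 5-10x end-to-end speedups come from.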

Use Cases of Mercury

Code Development: Powers real-time code completion, intelligent tab suggestions, and rapid code edits in development environments with ultra-low latency
Enterprise Search: Enables instant data retrieval and summarization across large organizational knowledge bases with minimal latency
Real-time Voice Applications: Supports responsive voice-powered workflows including customer support, translation services, and interactive voice agents
Automated Workflows: Handles complex routing, analytics, and decision processes in enterprise environments with ultra-responsive AI capabilities

Pros

Significantly faster processing speed (1000+ tokens per second)
Lower inference costs compared to traditional LLMs
Drop-in compatibility with existing LLM workflows

Cons

Limited track record as a new technology
Currently focused primarily on coding and enterprise applications
Requires specific GPU hardware for optimal performance

How to Use Mercury

Create an account: Visit platform.inceptionlabs.ai and create an Inception Platform account or sign in if you already have one
Get API key: Go to API Keys section in your account dashboard and create a new API key. New API keys come with 10 million free tokens
Choose deployment method: You can access Mercury through direct API integration, Amazon Bedrock Marketplace, Amazon SageMaker JumpStart, or Azure AI Foundry depending on your needs
Make API calls: Use the API key to make calls to Mercury API endpoints. The API is OpenAI-compatible and can be accessed through REST calls or existing OpenAI client libraries
Basic API usage example: Make a POST request to https://api.inceptionlabs.ai/v1/chat/completions with your API key in the Authorization header and JSON payload containing model (e.g. 'mercury-2') and messages
Configure settings: Optionally set parameters like max_tokens and enable streaming/diffusion visualization by setting the diffusing parameter to true
Integrate with tools: Mercury can be integrated with popular tools and frameworks including LangChain, AISuite, and LiteLLM for more complex applications
Monitor usage: Track your token usage through the platform dashboard. Input tokens cost $0.25 per 1M tokens and output tokens cost $0.75 per 1M tokens
Get support: For issues or questions, contact [email protected] or join their Discord channel. Enterprise customers can reach out to [email protected]
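The steps above can be sketched in Python using only the standard library. The endpoint URL and Authorization header come from step 6, the `diffusing` and `max_tokens` parameters from step 7, and the per-token rates from step 9; the model name `mercury-2` follows the example given above. The sketch builds the request without sending it (substitute a real API key to actually call the API):

```python
# Minimal sketch of a Mercury chat-completions call, built with the
# standard library only. The request is constructed but not sent.
import json
import urllib.request

API_URL = "https://api.inceptionlabs.ai/v1/chat/completions"

def build_request(api_key: str, prompt: str, model: str = "mercury-2",
                  max_tokens: int = 256, diffusing: bool = False):
    """Construct the POST request described in the steps above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "diffusing": diffusing,  # optional diffusion-visualization flag
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost at the published rates: $0.25/1M input, $0.75/1M output."""
    return (input_tokens / 1e6) * 0.25 + (output_tokens / 1e6) * 0.75

req = build_request("YOUR_API_KEY", "Write a haiku about diffusion models.")
print(req.full_url)
print(estimate_cost_usd(1_000_000, 1_000_000))  # $1.00 for 1M in + 1M out
```

To send the request, pass it to `urllib.request.urlopen(req)` and parse the JSON response; the OpenAI-compatible schema means existing OpenAI client libraries work as well by pointing them at the same base URL.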

Mercury FAQs

What is Mercury?
Mercury is the first commercially available diffusion-based Large Language Model (dLLM), launched by Inception Labs in February 2025. It uses a breakthrough diffusion-based approach to language generation instead of traditional autoregressive generation.

Latest AI Tools Similar to Mercury

Foundry
Contact for Pricing | AI Code Generator | Game Tools
Foundry is a versatile platform that exists in multiple forms - as a smart contract development toolchain, a virtual tabletop gaming software, and a traditional metal casting facility - each offering specialized features for their respective domains.
PythonConvert.com
PythonConvert.com is a free web-based tool that provides AI-powered code translation between Python and other programming languages as well as Python type conversion capabilities.
Softgen
Softgen.ai is an AI-powered full-stack project generator platform that enables users to transform their ideas into functional web applications without writing code.
Micro SaaS Ideas
Micro SaaS Ideas are small-scale, niche-focused software solutions that target specific problems or markets, offering entrepreneurs a way to build profitable businesses with minimal resources and complexity.