
MulmoChat
MulmoChat is an open-source multimodal AI chat interface that seamlessly integrates voice chat, image generation, and web browsing capabilities, allowing users to interact naturally through conversation while experiencing rich visual and interactive content.
https://github.com/receptron/MulmoChat?ref=producthunt

Product Information
Updated: Mar 31, 2026
What is MulmoChat
MulmoChat is a research prototype developed by former Microsoft engineer Satoshi Nakajima that rethinks the traditional chat interface. Unlike conventional text-based chat applications, it explores a multimodal AI chat experience that unifies the GUI (Graphical User Interface) and the NLUI (Natural Language User Interface). The project is open-source and runs on Windows, macOS, and Linux. It requires OpenAI and Google Gemini API keys to function; keys for other providers are optional.
Key Features of MulmoChat
MulmoChat combines text-based conversation with rich visual and interactive content. It offers voice chat, image generation, and web browsing, so users can hold a natural conversation while dynamic visual content appears directly on a canvas. Text generation is supported across multiple AI providers, including OpenAI, Anthropic, Google Gemini, and Ollama.
Multimodal Interaction: Seamlessly integrates text, voice, images, and interactive elements in a single conversational interface, moving beyond traditional text-only chat experiences
Provider-Agnostic Text Generation: Supports multiple AI providers (OpenAI, Anthropic, Google Gemini, Ollama) through a unified API interface, allowing flexible model selection and integration
Advanced Image Generation: Integrates with ComfyUI for local image generation, supporting advanced models like FLUX with customizable parameters and workflows
Extensible Plugin Architecture: Allows developers to extend functionality through plugins, from TypeScript contracts to Vue views and configurations
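The provider-agnostic idea in the feature list above can be illustrated with a small sketch. This is not MulmoChat's actual API: the names TextProvider, ProviderRegistry, and makeEchoProvider are hypothetical and exist only to show how several backends might sit behind one interface. The echo provider is a stand-in for a real vendor call.

```typescript
// Hypothetical sketch of a provider-agnostic text interface.
// None of these names are taken from the MulmoChat codebase.
type ProviderId = "openai" | "anthropic" | "gemini" | "ollama";

interface TextProvider {
  id: ProviderId;
  // Every backend exposes the same call shape, whatever it does internally.
  generate(prompt: string): Promise<string>;
}

class ProviderRegistry {
  private providers = new Map<ProviderId, TextProvider>();

  register(provider: TextProvider): void {
    this.providers.set(provider.id, provider);
  }

  // Look up a provider by id, failing loudly if it was never registered.
  get(id: ProviderId): TextProvider {
    const provider = this.providers.get(id);
    if (!provider) {
      throw new Error(`No provider registered for "${id}"`);
    }
    return provider;
  }

  list(): ProviderId[] {
    return Array.from(this.providers.keys());
  }
}

// Stand-in provider that echoes the prompt.
// A real implementation would call the vendor's SDK or HTTP API here.
function makeEchoProvider(id: ProviderId): TextProvider {
  return {
    id,
    generate: async (prompt: string) => `[${id}] ${prompt}`,
  };
}
```

Under this design, calling code asks the registry for a provider by id and awaits generate(prompt), so switching models means changing one identifier instead of rewriting the call site.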
Use Cases of MulmoChat
Interactive Education: Teachers can create immersive learning experiences combining verbal explanations with real-time visual aids and interactive elements
Design Collaboration: Designers can discuss concepts while generating and manipulating images in real-time, streamlining the creative process
Virtual Tourism: Travel agencies can provide interactive virtual tours combining map features, image generation, and natural conversation
Pros
Highly flexible with support for multiple AI providers
Rich multimodal interaction capabilities
Open-source and extensible architecture
Cons
Requires multiple API keys for full functionality
Complex setup with various dependencies
Research prototype status may indicate limited production readiness
How to Use MulmoChat
Install Dependencies: Run 'yarn install' to install all required dependencies for MulmoChat
Configure Environment Variables: Create a .env file and add your API keys. OPENAI_API_KEY and GEMINI_API_KEY are mandatory; optional keys include GOOGLE_MAP_API_KEY, EXA_API_KEY, ANTHROPIC_API_KEY, OLLAMA_BASE_URL, COMFYUI_BASE_URL, COMFYUI_DEFAULT_MODEL, and COMFYUI_TIMEOUT_MS
Start Development Server: Run 'yarn dev' to start the development server
Allow Microphone Access: Open the app in your browser and allow microphone access when prompted
Start Voice Chat: Click the 'Start Voice Chat' button in the interface to begin interacting with the AI
Set Up ComfyUI Integration (optional): For local image generation, 1) install ComfyUI Desktop, 2) launch the ComfyUI Desktop server, 3) download a compatible model such as flux1-schnell-fp8.safetensors, and 4) set the ComfyUI environment variables (COMFYUI_BASE_URL, COMFYUI_DEFAULT_MODEL, COMFYUI_TIMEOUT_MS) if needed
Begin Multimodal Interaction: Start conversing with the AI through voice or text. The system can generate images, display maps, and provide interactive visual content based on your conversation
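Before starting the development server, it can help to confirm that the environment variables from the setup steps above are in place. The following is a minimal sketch, not part of MulmoChat: the variable names come from the steps above, while the checkEnv helper and its report shape are hypothetical.

```typescript
// Variable names are taken from the setup steps; the helper itself is hypothetical.
const REQUIRED_KEYS = ["OPENAI_API_KEY", "GEMINI_API_KEY"] as const;
const OPTIONAL_KEYS = [
  "GOOGLE_MAP_API_KEY",
  "EXA_API_KEY",
  "ANTHROPIC_API_KEY",
  "OLLAMA_BASE_URL",
  "COMFYUI_BASE_URL",
  "COMFYUI_DEFAULT_MODEL",
  "COMFYUI_TIMEOUT_MS",
] as const;

type Env = Record<string, string | undefined>;

interface EnvReport {
  missingRequired: string[];
  missingOptional: string[];
}

// A key counts as set only if it is present and not blank.
function isSet(env: Env, key: string): boolean {
  const value = env[key];
  return value !== undefined && value.trim() !== "";
}

// Report which mandatory and optional keys are absent from the given environment.
function checkEnv(env: Env): EnvReport {
  return {
    missingRequired: REQUIRED_KEYS.filter((key) => !isSet(env, key)),
    missingOptional: OPTIONAL_KEYS.filter((key) => !isSet(env, key)),
  };
}
```

In a Node.js script, checkEnv(process.env) could run after the .env file is loaded; a non-empty missingRequired list means the app is not ready to start, while missingOptional only indicates features (maps, search, alternative providers, ComfyUI) that may be unavailable.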
MulmoChat FAQs
MulmoChat is a research prototype that explores a new paradigm for multimodal AI chat experiences. Unlike traditional text-based chat interfaces, it lets users hold a natural conversation while rich visual and interactive content appears directly on a canvas.