
Ollama v0.7
Ollama v0.7 introduces a new engine with first-class multimodal AI support, enabling advanced vision models such as Llama 4, Gemma 3, Qwen 2.5 VL, and Mistral Small 3.1 to run locally with improved reliability and memory management.
https://ollama.com/blog/multimodal-models?ref=aipure

Product Information
Updated: Aug 16, 2025
Ollama v0.7 Monthly Traffic Trends
Ollama v0.7 reached 5.0M visits in July, a 14.1% increase, driven by a new release that introduced a major graphical desktop application for Windows and macOS, significantly improving accessibility and expanding the user base beyond command-line users.
What is Ollama v0.7
Ollama v0.7 represents a significant evolution in local large language model deployment, moving beyond its previous dependency on llama.cpp to introduce a new dedicated engine for multimodal AI capabilities. This version focuses on making multimodal models first-class citizens, allowing users to run sophisticated vision-language models locally without requiring cloud services. The system supports various model sizes, from 7B parameters suitable for 8GB RAM machines up to larger 33B models requiring 32GB RAM, making advanced AI accessible for different hardware configurations.
Key Features of Ollama v0.7
Ollama v0.7 introduces a groundbreaking new engine that brings first-class support for multimodal AI models, enabling local execution of advanced vision-language models like Meta Llama 4, Google Gemma 3, Qwen 2.5 VL, and Mistral Small 3.1. The update features improved memory management, model modularity, and enhanced accuracy for processing images and text together, while maintaining Ollama's signature ease of use for running large language models locally.
New Multimodal Engine: Self-contained model architecture that allows each model to implement its own projection layer and handle multimodal inputs independently, improving reliability and simplifying model integration
Advanced Memory Management: Intelligent image caching system and optimized KV cache with hardware-specific configurations to maximize memory efficiency and performance
Enhanced Accuracy Processing: Improved handling of large images and tokens with proper metadata management and attention mechanisms specific to each model's training architecture
Multiple Model Support: Integration of various vision-language models including Llama 4, Gemma 3, Qwen 2.5 VL, and Mistral Small 3.1, each with their own specialized capabilities
Use Cases of Ollama v0.7
Document Analysis: Processing and extracting information from documents, including character recognition and translation of multilingual text in images
Visual Q&A: Enabling natural language interactions about images, including detailed descriptions and answering specific questions about visual content
Location-Based Analysis: Analyzing and providing information about locations, landmarks, and geographical features in images, including distance calculations and travel recommendations
Multi-Image Comparison: Analyzing relationships and patterns across multiple images simultaneously, identifying common elements and differences (a minimal code sketch follows this list)
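To make the multi-image use case concrete, here is a minimal sketch using the official Ollama Python library (installed with 'pip install ollama'). It assumes a local Ollama server is running and a vision model such as qwen2.5vl has already been pulled; the image file names are placeholders.

    # Minimal multi-image comparison sketch using the official Ollama Python library.
    # Assumes `ollama serve` is running locally and `ollama pull qwen2.5vl` has been done.
    import ollama

    response = ollama.chat(
        model="qwen2.5vl",  # any pulled vision model (e.g. gemma3, llama4:scout) should also work
        messages=[
            {
                "role": "user",
                "content": "Compare these two images and describe what they have in common and how they differ.",
                # Placeholder paths; the library also accepts raw bytes or base64 strings.
                "images": ["photo_a.jpg", "photo_b.jpg"],
            }
        ],
    )
    print(response["message"]["content"])

Follow-up questions about the same images can be asked by appending the model's reply and a new user message to the messages list before calling chat again.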
Pros
Local execution of advanced multimodal models without cloud dependency
Improved reliability and accuracy in model processing
Flexible support for multiple model architectures
Efficient memory management and hardware optimization
Cons
Requires significant hardware resources for larger models
Native Windows support is more recent than macOS/Linux (older versions required WSL2)
Some features still in experimental phase
How to Use Ollama v0.7
Install Ollama: Install Ollama on your system (macOS, Linux, and Windows are supported). Make sure you have sufficient RAM - at least 8GB for 7B models, 16GB for 13B models, and 32GB for 33B models.
Start Ollama Service: Run the 'ollama serve' command to start the Ollama service. For faster downloads, you can optionally start it with: OLLAMA_EXPERIMENT=client2 ollama serve
Pull Model: Download your desired multimodal model using 'ollama pull <model_name>'. Available models include llama4:scout, gemma3, qwen2.5vl, mistral-small3.1, llava, bakllava, and more vision models.
Run Model: Start the model using 'ollama run <model_name>'. For example: 'ollama run llama4:scout' or 'ollama run gemma3'
Input Images: You can input images by providing the image file path after your text prompt. Multiple images can be added in a single prompt or through follow-up questions. WebP images are also supported.
Interact with Model: Ask questions about the images, request analysis, or have follow-up conversations. The model will process both text and images to provide relevant responses.
Optional: Use API/Libraries: You can also interact with Ollama through its REST API or the official Python/JavaScript libraries for programmatic access; a minimal API sketch is included after these steps. The multimodal capabilities work across the CLI, API, and libraries.
Optional: Use Web UI: For a more user-friendly interface, you can use various community-built Web UIs and clients that support Ollama's multimodal features.
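As an illustration of the API route mentioned in the steps above, the following sketch sends a base64-encoded image to Ollama's local REST endpoint. It assumes the server is running on the default port 11434 and that the gemma3 model has already been pulled; the image path is a placeholder.

    # Minimal visual Q&A sketch against Ollama's local REST API (default port 11434).
    # Assumes `ollama serve` is running and `ollama pull gemma3` has been done.
    import base64
    import json
    import urllib.request

    # "photo.jpg" is a placeholder path for any local image file.
    with open("photo.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    payload = {
        "model": "gemma3",
        "prompt": "What is shown in this image?",
        "images": [image_b64],   # images are passed as base64-encoded strings
        "stream": False,         # return a single JSON object instead of a stream
    }

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])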
Ollama v0.7 FAQs
What's new for multimodal models in Ollama v0.7?
Ollama now supports multimodal models through a new engine that handles vision capabilities. It supports models such as Meta Llama 4, Google Gemma 3, Qwen 2.5 VL, and Mistral Small 3.1. The update includes image analysis, multiple-image handling, document scanning, and character recognition.
Analytics of Ollama v0.7 Website
Ollama v0.7 Traffic & Rankings
Monthly Visits: 5.1M
Global Rank: #10017
Category Rank: #249
Traffic Trends: Apr 2025-Jul 2025
Ollama v0.7 User Insights
Avg. Visit Duration: 00:03:49
Pages Per Visit: 4.75
User Bounce Rate: 37.22%
Top Regions of Ollama v0.7
CN: 25.51%
US: 15.76%
IN: 7.43%
DE: 4.04%
RU: 3.28%
Others: 43.98%