
Kuzco
Kuzco is a Swift package that enables local Large Language Model (LLM) inference directly in iOS and macOS apps. Built on llama.cpp, it runs entirely on-device with no network dependency, making it well suited to privacy-focused AI integration.
https://github.com/jaredcassoutt/Kuzco

Product Information
Updated: Aug 19, 2025
What is Kuzco
Kuzco is a versatile Swift package designed to bring local Large Language Model capabilities to iOS, macOS, and Mac Catalyst applications. Built as a wrapper around the battle-tested llama.cpp engine, it serves as a bridge between Apple's development ecosystem and advanced AI functionality. The package supports multiple popular LLM architectures including LLaMA, Mistral, Phi, Gemma, Qwen, and others, making it a comprehensive solution for developers looking to implement AI features in their applications without relying on cloud services.
Key Features of Kuzco
Kuzco brings on-device LLM inference to iOS, macOS, and Mac Catalyst applications. Built on llama.cpp, it executes models locally with zero network dependency, supports multiple LLM architectures, offers customizable configurations, and streams responses through modern Swift concurrency.
On-Device LLM Processing: Runs AI models locally without internet connectivity using llama.cpp, supporting various architectures like LLaMA, Mistral, Phi, Gemma, and Qwen
Advanced Configuration Options: Provides fine-tuning capabilities for context length, batch size, GPU layers, and CPU threads to optimize performance for different devices
Modern Swift Integration: Features async/await native support with streaming responses and comprehensive error handling for seamless integration into Swift applications
Automatic Architecture Detection: Smart detection of model architectures from filenames with fallback support for better compatibility and ease of use
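The configuration knobs mentioned above (context length, batch size, GPU layers, CPU threads) correspond to the InstanceSettings type described later in this listing. The sketch below is illustrative only: the memberwise initializer form and exact parameter labels are assumptions inferred from the property names in this listing, not confirmed API.

```swift
import Kuzco

// Illustrative sketch, assuming InstanceSettings exposes these
// properties via a memberwise-style initializer as named in this listing.
let settings = InstanceSettings(
    contextLength: 4096,     // tokens of conversation context the model keeps
    batchSize: 512,          // prompt tokens processed per evaluation batch
    gpuOffloadLayers: 32,    // transformer layers offloaded to the GPU
    cpuThreads: 6            // worker threads used for CPU inference
)
```

Smaller context lengths and fewer GPU layers reduce memory pressure on older devices, at the cost of shorter conversations and slower generation.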
Use Cases of Kuzco
Private AI Chatbots: Build chat applications that process user conversations entirely on-device, ensuring user privacy and offline functionality
Enterprise Data Analysis: Process sensitive business data locally using AI models without exposing information to external servers
Mobile AI Applications: Create iOS apps with AI capabilities that work reliably regardless of internet connectivity
Educational Tools: Develop learning applications that can provide AI-powered tutoring and feedback while maintaining student privacy
Pros
Complete privacy with on-device processing
No network dependency required
Performance tuning for Apple devices (GPU offload, thread and batch control)
Comprehensive developer-friendly API
Cons
Requires sufficient device resources to run models
Limited to Apple platforms (iOS, macOS, Mac Catalyst)
May have slower performance compared to cloud-based solutions
How to Use Kuzco
Install Kuzco via Swift Package Manager: Add the package URL https://github.com/jaredcassoutt/Kuzco.git to your project and choose "Up to Next Major" from version 1.0.0
Import and Initialize: Add "import Kuzco" to your Swift file and get the shared instance: let kuzco = Kuzco.shared
Create a Model Profile: Describe your model with its ID and on-disk path: let profile = ModelProfile(id: "my-model", sourcePath: "/path/to/your/model.gguf")
Load the Model: Request a model instance: let (instance, loadStream) = await kuzco.instance(for: profile)
Monitor Loading Progress: Track progress through loadStream and wait for the .ready stage before proceeding
Create Conversation Turns: Build the dialogue: let turns = [Turn(role: .user, text: userMessage)]
Generate Response: Call predict() with your desired settings: let stream = try await instance.predict(turns: turns, systemPrompt: "You are a helpful assistant.")
Process the Response: Iterate over the streamed tokens: for try await (content, isComplete, _) in stream { print(content) }
Optional: Configure Advanced Settings: Fine-tune performance with InstanceSettings (contextLength, batchSize, gpuOffloadLayers, cpuThreads) and sampling with PredictionConfig (temperature, topK, topP, repeatPenalty, maxTokens) if needed
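Putting the steps above together, a minimal end-to-end flow might look like the following. It assumes the API surface exactly as the steps describe it (Kuzco.shared, ModelProfile, instance(for:), Turn, predict(turns:systemPrompt:), and a load stream with a .ready stage); treat it as a sketch under those assumptions rather than copy-paste-ready code.

```swift
import Kuzco

func chat(modelPath: String, userMessage: String) async throws {
    // Shared entry point for the package.
    let kuzco = Kuzco.shared

    // Describe the on-disk GGUF model.
    let profile = ModelProfile(id: "my-model", sourcePath: modelPath)

    // Load the model; loadStream reports loading stages.
    let (instance, loadStream) = await kuzco.instance(for: profile)
    for await stage in loadStream {
        if stage == .ready { break }   // proceed once the model is ready
    }

    // Build the conversation and request a streamed prediction.
    let turns = [Turn(role: .user, text: userMessage)]
    let stream = try await instance.predict(
        turns: turns,
        systemPrompt: "You are a helpful assistant."
    )

    // Print tokens as they arrive until the stream signals completion.
    for try await (content, isComplete, _) in stream {
        print(content, terminator: "")
        if isComplete { break }
    }
}
```

Because both loading and prediction are async streams, this function slots naturally into a SwiftUI Task or any other structured-concurrency context.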
Kuzco FAQs
What is Kuzco?
Kuzco is a Swift package that runs Large Language Models (LLMs) directly in iOS, macOS, and Mac Catalyst apps. Built on top of llama.cpp, it enables on-device AI with no network dependency, ensuring privacy and speed.