
Augmentoolkit 3.0
Augmentoolkit 3.0 is a refined and battle-tested open-source tool that creates domain-expert datasets to train custom LLMs with your own data, featuring an intuitive interface, offline capability, and automatic dataset generation and training processes.
https://github.com/e-p-armstrong/augmentoolkit?ref=producthunt

Product Information
Updated: Jun 19, 2025
What is Augmentoolkit 3.0
Augmentoolkit 3.0 is a major update to an open-source tool for building domain-expert AI models trained on your own data and knowledge domains. The MIT-licensed tool has been refined through over a year of professional application and experimentation. Users upload documents and, with minimal technical expertise, generate training datasets and train custom AI models that understand a specific subject in depth, whether that is technical documentation, research papers, or fictional universes.
Key Features of Augmentoolkit 3.0
Augmentoolkit 3.0 is a data generation and LLM training platform for creating domain-expert AI models from custom documents and texts. It offers an improved interface, an automated training process, and the option to run either locally or via APIs. It generates diverse domain data and automatically balances it with generic data, so specialized AI models can be created without extensive technical expertise.
Intuitive Interface: Features a graphical user interface as a first-class citizen, allowing users to generate datasets by simply uploading documents and pressing buttons
Flexible Deployment Options: Can run either locally on consumer hardware or via APIs like Deepinfra, with automatic resume capability for interrupted processes
Automated Training Pipeline: Automatically handles the entire process from data generation to model training, including downloading and preparing models for inference
Discord Bot Creation: Includes functionality to convert custom-built models into Discord bots for sharing with friends or a community
Use Cases of Augmentoolkit 3.0
Professional Research Integration: Researchers can create AI models that understand and can discuss the latest papers and developments in their specific field
Corporate Knowledge Management: Companies can develop AI assistants that understand internal documentation and procedures to help employees access information efficiently
Creative Content Development: Writers and creators can generate specialized AI models that understand specific fictional universes or writing styles for creative projects
Data Classification Projects: ML professionals can create classification datasets from large unlabeled text collections without human annotators
Pros
Cost-effective solution for creating custom AI models
Requires minimal technical expertise to use
Supports both local and API-based operation
Cons
Small datasets may require additional optimization steps for effective training
Local data generation can be slow on consumer hardware
Some new features are still in an experimental/beta phase
How to Use Augmentoolkit 3.0
Install Prerequisites: Ensure you have Python 3.10 or 3.11 installed on your system. Other versions are not supported.
Clone Repository: Run 'git clone https://github.com/e-p-armstrong/augmentoolkit.git' and 'cd augmentoolkit'
Set Up Environment: Run the setup script for your OS: on macOS, 'bash macos.sh' (or 'bash local_macos.sh' for local generation); on Linux, 'bash linux.sh'; on Windows, './windows.bat'
Prepare Input Data: Place your source documents (.txt or .md files like books, manuals, instructions etc.) in the designated input folder
Configure Settings: Adjust the config.yaml file with appropriate settings for your use case. Key settings include input/output paths and model parameters.
Generate Dataset: Use either the graphical interface (recommended) or run the processing.py script to generate your training dataset. The interface will guide you through the process.
Monitor Progress: The tool will automatically resume if interrupted. Monitor progress through the interface or console output.
Train Model: Once dataset generation is complete, the tool can automatically begin model training if configured to do so (controlled by the do_train setting in the config)
Deploy Model: After training, you can serve your model locally or deploy it as a Discord bot using Augmentoolkit's built-in server features
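Before running the setup script, it can help to confirm the two prerequisites from the steps above: a supported Python version (3.10 or 3.11) and at least one .txt or .md source document. The following is a minimal sketch of such a preflight check, not part of Augmentoolkit itself. The function names and the folder name "inputs" are hypothetical placeholders; use whatever input folder your installation designates.

```python
import sys
from pathlib import Path

# Versions named in the "Install Prerequisites" step.
SUPPORTED_VERSIONS = {(3, 10), (3, 11)}
# File types named in the "Prepare Input Data" step.
INPUT_SUFFIXES = {".txt", ".md"}


def python_version_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version is 3.10 or 3.11."""
    return (version_info[0], version_info[1]) in SUPPORTED_VERSIONS


def is_supported_document(path: Path) -> bool:
    """Return True if the file extension is .txt or .md (case-insensitive)."""
    return path.suffix.lower() in INPUT_SUFFIXES


def find_input_documents(input_dir: str) -> list:
    """List supported documents under input_dir, searching subfolders too.

    Returns an empty list if the folder does not exist.
    """
    root = Path(input_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file() and is_supported_document(p))


if __name__ == "__main__":
    # "inputs" is an assumed folder name used only for illustration.
    documents = find_input_documents("inputs")
    print("Python version supported:", python_version_supported())
    print("Source documents found:", len(documents))
```

If the version check reports False, install Python 3.10 or 3.11 before continuing. If no documents are found, confirm the folder path and file extensions.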
Augmentoolkit 3.0 FAQs
Augmentoolkit 3.0 is an open-source tool that creates domain-expert datasets to update an AI's knowledge, making it an expert in specific areas. It has been refined through over a year of professional application and allows users to upload documents and create fully trained custom LLMs with just a button press.