Fish Speech Howto

Fish Speech is an open-source, multilingual text-to-speech model capable of generating high-quality, natural-sounding speech in Chinese, Japanese, and English with customizable voices and emotions.

How to Use Fish Speech

Create virtual environment: Create a Python 3.10 virtual environment with conda: conda create -n fish-speech python=3.10
Activate environment: Activate it so that the installs below go into this environment: conda activate fish-speech
Install dependencies: Install PyTorch and its companion packages: pip3 install torch torchvision torchaudio
Install Fish Speech: From the root of the Fish Speech source directory, install the package in editable mode: pip3 install -e .
Download models: Download the model checkpoints from Hugging Face: huggingface-cli download fishaudio/fish-speech-1.2-sft --local-dir checkpoints/fish-speech-1.2-sft
Run inference: Generate speech tokens from your text: python tools/llama/generate.py --text "Your text here" --checkpoint-path "checkpoints/fish-speech-1.2-sft"
Decode audio: Decode the generated tokens to audio using VQGAN: python tools/vqgan/inference.py -i "codes_0.npy" --checkpoint-path "checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth"
Start web UI (optional): Launch the web interface: python -m tools.webui --llama-checkpoint-path "checkpoints/fish-speech-1.2-sft" --decoder-checkpoint-path "checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth"
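
The "Run inference" and "Decode audio" steps can be chained from one small Python script. The following is a minimal sketch, not part of Fish Speech itself: the helper names (build_generate_command, build_decode_command, synthesize) and the dry_run switch are hypothetical, and only the script paths and flags shown in the steps above are used. It assumes the script is run from the Fish Speech source directory with the environment active, and that the inference step writes codes_0.npy to the working directory, as the decode step implies.

```python
import shlex
import subprocess
import sys
from pathlib import Path

# Paths taken from the steps above; adjust them if your checkpoints live elsewhere.
CHECKPOINT_DIR = Path("checkpoints/fish-speech-1.2-sft")
DECODER_CHECKPOINT = CHECKPOINT_DIR / "firefly-gan-vq-fsq-4x1024-42hz-generator.pth"


def build_generate_command(text, checkpoint_dir=CHECKPOINT_DIR):
    """Build the 'Run inference' command (text -> tokens)."""
    return [
        sys.executable,  # the interpreter of the currently active environment
        "tools/llama/generate.py",
        "--text", text,
        "--checkpoint-path", str(checkpoint_dir),
    ]


def build_decode_command(codes_path="codes_0.npy", decoder_checkpoint=DECODER_CHECKPOINT):
    """Build the 'Decode audio' command (tokens -> audio)."""
    return [
        sys.executable,
        "tools/vqgan/inference.py",
        "-i", str(codes_path),
        "--checkpoint-path", str(decoder_checkpoint),
    ]


def synthesize(text, dry_run=True):
    """Print both commands in order; execute them only when dry_run is False."""
    commands = [build_generate_command(text), build_decode_command()]
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # check=True stops the pipeline if a step exits with an error.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical convention: pass --run to execute instead of only printing.
    synthesize("Your text here", dry_run="--run" not in sys.argv[1:])
```

By default the script only prints the two commands, which makes it easy to check the paths before starting a long-running job. Passing arguments as a list instead of a single shell string avoids quoting problems when the text contains quotes or spaces.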

Fish Speech FAQs

Fish Speech is an open-source text-to-speech (TTS) model developed by Fish Audio. It is trained on 150,000 hours of multilingual audio data and can generate high-quality speech in Chinese, Japanese, and English.

Fish Speech Monthly Traffic Trends

Fish Speech saw an 11.6% increase in visits, reaching 391,972 visits. The Fish Speech 1.4 launch in September, which introduced expanded training data, multilingual support, and instant voice cloning, likely contributed to this growth.


Latest AI Tools Similar to Fish Speech

MicVoice.Ai
MicVoice.Ai is an all-in-one AI voice generator platform that transforms written text into high-quality, natural-sounding speech with over 5000 realistic AI voices supporting 17+ languages.
Narrai
Narrai is an AI-powered mobile app that instantly creates voice narration and background music for short videos by automatically generating relevant scripts and offering multiple narrator personas.
Vagent
Vagent is a lightweight voice interface that enables users to interact with custom AI agents through voice commands, providing a natural and intuitive way to control automations with support for 60+ languages.
F5 TTS
F5-TTS is a state-of-the-art, non-autoregressive text-to-speech system that uses Flow Matching and Diffusion Transformer techniques to generate highly natural and expressive speech with zero-shot voice cloning capabilities.