Audio & Music AI Tools

Generate music, voiceovers, sound effects, and process audio with AI-powered tools.

14 Blocks

21 Models

35 Total

Pay as you go, starting from 1 credit per operation

Kokoro 82M

Jaaari (via Replicate)

budget

Lightweight 82M parameter text-to-speech model with natural voices.

tts, lightweight, fast

Speech 02 HD

MiniMax (via Replicate)

premium

High-definition text-to-speech with premium voice quality.

tts, hd, professional

Lyria 2

Google (via Replicate)

premium

Google's advanced music generation model for creating original compositions.

google, music, ai-composer

Music 1.5

MiniMax (via Replicate)

standard

Generate royalty-free music tracks from text descriptions.

music, royalty-free, minimax

Maya TTS

Fal.ai

standard

State-of-the-art speech model for expressive voice generation.

tts, expressive, natural

Chatterbox TTS

Fal.ai

budget

Text-to-speech for memes, videos, games, and AI agents.

tts, fun, games

MiniMax Music V2

MiniMax (via Fal.ai)

standard

Generate original music tracks from text descriptions.

music, generation, minimax

Beatoven Music

Beatoven (via Fal.ai)

standard

Generate royalty-free instrumental music for any project.

music, royalty-free, instrumental

Beatoven SFX

Beatoven (via Fal.ai)

standard

Generate sound effects for videos, games, and multimedia.

sfx, sound-effects, audio

Whisper

OpenAI (via Fal.ai)

standard

OpenAI Whisper large v3 for accurate speech transcription and translation. Supports 99+ languages.

transcription, openai, multilingual

Wizper

Fal.ai

standard

Fal.ai's optimized build of Whisper v3, with the same accuracy and 2x faster performance.

transcription, fast, optimized

XTTS-v2

Coqui (via Replicate)

standard

Clone any voice with just 6 seconds of audio. Supports 17 languages.

voice-cloning, xtts, coqui

OpenVoice

MyShell (via Replicate)

standard

Instant voice cloning with fine-grained control over style, emotion, and accent.

voice-cloning, openvoice, myshell

Parler TTS

Replicate

standard

Describe the voice you want in text. Generate speech matching your description.

tts, parler, described-voices

Demucs

Meta (via Replicate)

standard

Separate music into stems: vocals, drums, bass, and other instruments.

demucs, meta, stems

Demucs 6-Stem

Meta (via Replicate)

standard

Six-stem variant of Demucs that separates vocals, drums, bass, guitar, piano, and other.

demucs, 6-stem, piano

ElevenLabs Multilingual V2

ElevenLabs

premium

High-quality multilingual text-to-speech supporting 29 languages with emotional range.

tts, elevenlabs, multilingual

ElevenLabs V3

ElevenLabs

premium

Latest ElevenLabs model with best-in-class voice quality and expressiveness.

tts, elevenlabs, v3

ElevenLabs Turbo V2.5

ElevenLabs

standard

Fast text-to-speech optimized for low latency with good quality.

tts, elevenlabs, turbo

ElevenLabs Flash V2.5

ElevenLabs

budget

Fastest ElevenLabs model for ultra-low latency text-to-speech.

tts, elevenlabs, flash

Speech 02 Turbo

MiniMax (via Replicate)

standard

Fast text-to-speech variant of Speech 02 optimized for speed.

tts, minimax, turbo

Ready to boost your audio & music workflow?

Join thousands of professionals using Promptha to automate tasks, generate content, and work smarter with AI.
