Claude Code Plugins

Community-maintained marketplace

Feedback

Converts any text to speech audio files fully autonomously, without user intervention. Use this skill whenever a user asks to: \"convert text to speech\", \"generate audio narration\", \"create a voiceover\", \"make a TTS audio file\", \"add narration to a video\", \"generate MP3/WAV from text\", \"create audio for a presentation\", or anything involving producing spoken audio from written text. Also trigger when user needs audio narration for product videos, demos, slideshows, or automated workflows. Always use this skill before generating any video narration, even if narration seems like a minor part of the task.

Install Skill

Shared

Installs to .agents/skills, used by Codex, Amp, Warp, Cursor, OpenCode, and more.

CodexAmp
Warp
CursorOpenCode
Cline
Gemini CLI
GitHub Copilot
Personal

Available across projects.

$npx skills-installer add @biggora/claude-plugins-registry/text-to-speech --client shared
Project

Writes to .agents/skills.

$npx skills-installer add @biggora/claude-plugins-registry/text-to-speech -p --client shared
Note: Review the skill instructions before using it.

SKILL.md

name text-to-speech
description Converts any text to speech audio files fully autonomously, without user intervention. Use this skill whenever a user asks to: "convert text to speech", "generate audio narration", "create a voiceover", "make a TTS audio file", "add narration to a video", "generate MP3/WAV from text", "create audio for a presentation", or anything involving producing spoken audio from written text. Also trigger when user needs audio narration for product videos, demos, slideshows, or automated workflows. Always use this skill before generating any video narration, even if narration seems like a minor part of the task.

Text-to-Speech (TTS) Skill

Fully autonomous audio generation pipeline. No user intervention required.

What this skill produces

  • MP3 / WAV audio files from any text input
  • Multilingual narration (131+ languages via espeak-ng)
  • Video-ready voiceovers for product demos and presentations
  • Batch narration for automated workflows

Engine Selection Guide

Confirmed Working Engines (ordered by quality)

Engine Quality Speed Languages Notes
pyttsx3 + espeak-ng ★★★☆ Fast 131+ PRIMARY — always available
espeak-ng CLI ★★★☆ Fast 131+ Direct CLI, same backend
flite ★★☆☆ Very fast EN only Lightweight fallback
Kokoro ONNX ★★★★★ Medium EN, ZH, JA, KO, FR, ES, HI, PT, IT, BR High-quality neural TTS — use if model files available
gTTS ★★★★☆ Fast 40+ Google Neural — needs internet
edge-tts ★★★★★ Fast 100+ Microsoft Neural — needs internet

⚠️ Environment constraint: This agent has no internet access. Use pyttsx3/espeak-ng or Kokoro (offline). For production with internet access, prefer edge-tts or gTTS.


Quick Start

Standard Narration (always works)

import pyttsx3
import subprocess

def generate_tts(text: str, output_mp3: str, lang: str = "en", rate: int = 145, voice_id: str = None):
    """Generate TTS audio. lang = 'en', 'ru', 'de', 'fr', 'es', 'zh', etc."""
    engine = pyttsx3.init()
    engine.setProperty('rate', rate)      # 100–200, 145 = natural
    engine.setProperty('volume', 1.0)
    
    # Select voice by language
    if voice_id:
        engine.setProperty('voice', voice_id)
    else:
        voices = engine.getProperty('voices')
        for v in voices:
            if lang == 'en' and 'en-gb' in v.id.lower():
                engine.setProperty('voice', v.id)
                break
            elif lang != 'en' and lang in v.id.lower() and 'lv' not in v.id:
                engine.setProperty('voice', v.id)
                break
    
    wav_path = output_mp3.replace('.mp3', '.wav')
    engine.save_to_file(text, wav_path)
    engine.runAndWait()
    
    # Convert WAV → MP3 with audio enhancement
    subprocess.run([
        'ffmpeg', '-i', wav_path,
        '-af', 'aresample=44100,equalizer=f=3000:t=o:w=1:g=3,equalizer=f=200:t=o:w=1:g=-2',
        '-c:a', 'libmp3lame', '-b:a', '192k',
        output_mp3, '-y', '-loglevel', 'quiet'
    ], check=True)
    
    return output_mp3

# Example usage
generate_tts("Welcome to our product demo.", "/tmp/narration.mp3", lang="en")
generate_tts("Добро пожаловать в демонстрацию продукта.", "/tmp/narration_ru.mp3", lang="ru")

Installation (run once per session if needed)

pip install pyttsx3 --break-system-packages -q
apt-get install -y espeak-ng -q

# Verify
python3 -c "import pyttsx3; e=pyttsx3.init(); print('OK:', len(e.getProperty('voices')), 'voices')"

Engine Details

Read the appropriate reference file for the engine you're using:

  • references/pyttsx3-espeak.md — Primary engine: full API, voice selection, SSML-like control, quality tips
  • references/espeak-cli.md — Direct espeak-ng CLI usage, flags, phoneme control
  • references/kokoro-onnx.md — High-quality neural TTS (offline, needs model download)
  • references/online-engines.md — gTTS, edge-tts, OpenAI TTS (when internet available)

Multi-scene Narration (for videos)

For video narration with multiple scenes, generate per-scene audio then concatenate:

import pyttsx3, subprocess, os

scenes = [
    {"text": "Welcome to our AI-powered platform.", "duration_hint": 3},
    {"text": "Our system automatically detects and fixes issues.", "duration_hint": 5},
    {"text": "Get started today with a free trial.", "duration_hint": 3},
]

def scenes_to_audio(scenes: list, output_path: str, lang: str = "en") -> str:
    """Generate concatenated narration from scene list."""
    wav_files = []
    engine = pyttsx3.init()
    engine.setProperty('rate', 145)
    
    for i, scene in enumerate(scenes):
        wav = f"/tmp/scene_{i}.wav"
        engine.save_to_file(scene["text"], wav)
        engine.runAndWait()
        wav_files.append(wav)
    
    # Build concat list for ffmpeg
    concat_txt = "/tmp/concat_list.txt"
    with open(concat_txt, 'w') as f:
        for wav in wav_files:
            f.write(f"file '{wav}'\n")
    
    subprocess.run([
        'ffmpeg', '-f', 'concat', '-safe', '0', '-i', concat_txt,
        '-c:a', 'libmp3lame', '-b:a', '192k',
        output_path, '-y', '-loglevel', 'quiet'
    ], check=True)
    
    return output_path

scenes_to_audio(scenes, "/tmp/full_narration.mp3")

Language Reference (top languages)

Code Language espeak-ng voice ID
en English (GB) gmw/en-gb-scotland
en-us English (US) gmw/en-us
ru Russian zle/ru
de German gmw/de
fr French roa/fr
es Spanish roa/es
zh Chinese (Mandarin) sit/cmn
ar Arabic sem/ar
ja Japanese jpn/ja
pt Portuguese roa/pt

Full list: espeak-ng --voices


FFmpeg Audio Post-processing

# WAV → MP3 (standard)
ffmpeg -i input.wav -c:a libmp3lame -b:a 192k output.mp3 -y

# WAV → MP3 with EQ enhancement (clearer speech)
ffmpeg -i input.wav \
  -af "aresample=44100,equalizer=f=3000:t=o:w=1:g=3,equalizer=f=200:t=o:w=1:g=-2" \
  -c:a libmp3lame -b:a 192k output.mp3 -y

# Adjust speech speed without pitch change (0.85 = slower, 1.15 = faster)
ffmpeg -i input.wav -af "atempo=0.90" output_slow.wav -y

# Add silence padding (0.5s before, 0.5s after)
ffmpeg -i input.wav -af "adelay=500|500,apad=pad_dur=0.5" output_padded.wav -y

# Normalize audio volume
ffmpeg -i input.wav -af "loudnorm=I=-16:TP=-1.5:LRA=11" output_norm.wav -y

Common Pitfalls

Problem Solution
pyttsx3 hangs / no audio Run apt-get install -y espeak-ng first
Russian text sounds robotic Use rate=130, engine.setProperty('voice', 'zle/ru')
Audio too quiet Add -af "volume=2.0" in ffmpeg or set engine.setProperty('volume', 1.0)
gTTS / edge-tts timeout No internet in this environment — use pyttsx3
Kokoro needs model files Download from HuggingFace when internet is available; see references/kokoro-onnx.md
Audio/video sync off in video Use ffprobe to get exact audio duration; see screen-recording skill
Characters not spoken (symbols) Pre-process text: strip *, #, >, `

Text Pre-processing

Always clean text before TTS to avoid robotic artifacts:

import re

def clean_for_tts(text: str) -> str:
    """Remove markdown and symbols that confuse TTS engines."""
    text = re.sub(r'#{1,6}\s*', '', text)        # headers
    text = re.sub(r'\*{1,2}(.+?)\*{1,2}', r'\1', text)  # bold/italic
    text = re.sub(r'`{1,3}[^`]*`{1,3}', '', text)        # code blocks
    text = re.sub(r'\[(.+?)\]\(.+?\)', r'\1', text)       # links → link text
    text = re.sub(r'[|>]', '', text)              # table/quote chars
    text = re.sub(r'\s+', ' ', text).strip()
    return text

Integration with Screen Recording Skill

When used inside the screen-recording skill, replace the basic pyttsx3 call with this skill's generate_tts() function for better audio quality and language support. The audio pipeline is identical — just swap the TTS step.