| name | elevenlabs |
| description | AI-powered audio generation using ElevenLabs API - text-to-speech with lifelike voices, sound effects generation, and music creation from text descriptions. Generate natural-sounding speech in 32 languages, create custom sound effects for games and videos, and compose royalty-free music tracks. Use this skill when the user requests: - Voice generation or text-to-speech conversion - Audio narration for content (videos, audiobooks, podcasts) - Sound effects for games, videos, or applications - Music generation from text descriptions - Multi-speaker dialogue or conversation audio - Voice cloning or custom voice creation - Audio streaming for real-time applications Capabilities: Text-to-speech (32 languages, 100+ voices), sound effects generation, music composition, voice cloning, real-time audio streaming Python SDK: elevenlabs (pip install elevenlabs) |
| allowed-tools | Bash, Read, Write, AskUserQuestion |
ElevenLabs Audio Generation
Purpose
This skill enables AI-powered audio generation through ElevenLabs API. Create lifelike text-to-speech in 32 languages, generate custom sound effects for games and videos, and compose royalty-free music from text descriptions. Support for 100+ professional voices, custom voice cloning, real-time streaming, and multi-speaker dialogue.
When to Use
This skill should be invoked when the user asks to:
- Generate speech from text ("convert this to speech", "create audio narration...")
- Create voiceovers for videos, presentations, or content
- Generate audio in specific voices or languages
- Create sound effects ("generate footstep sounds", "create explosion audio...")
- Compose music from descriptions ("generate upbeat background music...")
- Build multi-speaker dialogue or conversations
- Clone voices from audio samples
- Stream audio in real-time applications
- Create audiobooks, podcasts, or audio content
Available Capabilities
1. Text-to-Speech (Voice Generation)
Models:
- Eleven Multilingual v2 (eleven_multilingual_v2) - Highest quality, 29 languages
- Eleven Flash v2.5 (eleven_flash_v2_5) - Ultra-low 75ms latency, 32 languages, 50% cheaper
- Eleven Turbo v2.5 (eleven_turbo_v2_5) - Balanced quality and latency
Features:
- 100+ premade professional voices
- Custom voice cloning from audio samples
- Multi-speaker dialogue generation
- Real-time audio streaming
- 32 language support
- Emotional and natural intonation
- Voice settings customization (stability, similarity, style)
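The stability, similarity, and style settings above can be tuned per request. A minimal sketch, assuming the installed SDK exposes a VoiceSettings type and a voice_settings parameter on convert (verify against your SDK version):
import os
from elevenlabs import VoiceSettings
from elevenlabs.client import ElevenLabs
client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])
# Lower stability allows more expressive delivery; higher similarity stays
# closer to the original voice timbre. All values are in the 0.0-1.0 range.
audio = client.text_to_speech.convert(
    text="Tuning the delivery of this sentence.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
    voice_settings=VoiceSettings(
        stability=0.4,
        similarity_boost=0.8,
        style=0.2,
        use_speaker_boost=True
    )
)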
Output Formats:
- MP3 (various bitrates: 32kbps to 192kbps)
- PCM (8kHz to 48kHz)
- Opus, µ-law, A-law
2. Sound Effects Generation
Model:
- Eleven Text-to-Sound v2 (eleven_text_to_sound_v2)
Features:
- Generate sound effects from text descriptions
- Customizable duration
- Looping support for seamless audio
- Prompt influence control
- High-quality audio for games, videos, UI/UX
Use Cases:
- Game audio (footsteps, explosions, ambient)
- Video production sounds
- UI/UX sound design
- Nature sounds (rain, wind, waves)
- Mechanical sounds (doors, engines, machines)
- Fantasy/sci-fi effects
3. Music Generation
Features:
- Text-to-music composition
- Vocal and instrumental tracks
- Multiple genres and styles
- Customizable track duration
- Composition plans (structured music blueprints)
- Royalty-free generated music
Parameters:
- Text prompts describing desired music
- Duration control (milliseconds)
- Genre, style, mood specifications
- Section-level composition control
Requirements:
- Paid ElevenLabs account (music API not available on free tier)
Content Policy:
- No copyrighted material (artist names, band names, trademarks)
- Returns suggestions for restricted prompts
Instructions
Step 1: Understand the Request
Analyze the user's request to determine:
- Task Type: Text-to-speech, sound effects, or music generation
- Content: What text/description to convert
- Voice/Sound: Specific voice, language, or sound characteristics
- Format: Output format requirements (MP3, streaming, etc.)
- Duration: Length requirements (for sound effects or music)
- Use Case: Narration, video, game, podcast, etc.
Step 2: Select Appropriate Model/Capability
For Text-to-Speech:
- High quality needed → eleven_multilingual_v2
- Low latency/real-time → eleven_flash_v2_5
- Balanced → eleven_turbo_v2_5
For Sound Effects:
- Use the eleven_text_to_sound_v2 model
- Consider duration and looping needs
For Music:
- Ensure user has paid account
- Determine track length and style
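A small helper can encode the model choice above; pick_tts_model and its mapping are illustrative, not part of the SDK:
def pick_tts_model(use_case: str) -> str:
    """Return a model_id based on the latency/quality trade-off documented above."""
    mapping = {
        "narration": "eleven_multilingual_v2",  # highest quality
        "realtime": "eleven_flash_v2_5",        # lowest latency
        "general": "eleven_turbo_v2_5"          # balanced
    }
    return mapping.get(use_case, "eleven_multilingual_v2")
model_id = pick_tts_model("realtime")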
Step 3: Set Up API Authentication
import os
from elevenlabs.client import ElevenLabs
# Initialize client with API key
client = ElevenLabs(api_key=os.environ.get("ELEVENLABS_API_KEY"))
Set the API key as an environment variable:
export ELEVENLABS_API_KEY="your-api-key-here"
Step 4: Implement Based on Task Type
Text-to-Speech Implementation
Basic Speech Generation:
import os
from pathlib import Path
from elevenlabs.client import ElevenLabs
client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])
# Generate speech
audio = client.text_to_speech.convert(
    text="Your text content here",
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # Default voice (George)
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128"
)
# Save to file
output_path = Path("speech_output.mp3")
with output_path.open("wb") as f:
    for chunk in audio:
        f.write(chunk)
print(f"Audio saved to: {output_path}")
Streaming Speech (Real-time):
import os
from elevenlabs import stream
from elevenlabs.client import ElevenLabs
client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])
# Stream audio in real-time
audio_stream = client.text_to_speech.convert_as_stream(
    text="This will be streamed as it generates",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_flash_v2_5",  # Low latency model for streaming
    output_format="mp3_44100_128"
)
# Stream to speakers
stream(audio_stream)
Multi-Speaker Dialogue:
# Generate conversation with multiple voices
speakers = [
    {
        "voice_id": "JBFqnCBsd6RMkjVDRZzb",  # Speaker 1 (George)
        "text": "Hello, how are you today?"
    },
    {
        "voice_id": "21m00Tcm4TlvDq8ikWAM",  # Speaker 2 (Rachel)
        "text": "I'm doing great, thanks for asking!"
    }
]
# Generate each speaker's audio and combine
from pydub import AudioSegment
combined = AudioSegment.empty()
for speaker in speakers:
    audio = client.text_to_speech.convert(
        text=speaker["text"],
        voice_id=speaker["voice_id"],
        model_id="eleven_multilingual_v2"
    )
    # Save temp file
    temp_path = Path(f"temp_{speaker['voice_id']}.mp3")
    with temp_path.open("wb") as f:
        for chunk in audio:
            f.write(chunk)
    # Add to combined audio
    segment = AudioSegment.from_mp3(str(temp_path))
    combined += segment
    temp_path.unlink()  # Clean up
# Export final dialogue
combined.export("dialogue.mp3", format="mp3")
List Available Voices:
# Get all available voices
voices = client.voices.get_all()
print("Available voices:")
for voice in voices.voices:
    print(f"- {voice.name} (ID: {voice.voice_id})")
    print(f"  Labels: {voice.labels}")
    print(f"  Description: {voice.description}")
Common Voice IDs:
- JBFqnCBsd6RMkjVDRZzb - George (male, English, middle-aged)
- 21m00Tcm4TlvDq8ikWAM - Rachel (female, English, young)
- AZnzlk1XvdvUeBnXmlld - Domi (female, English, young)
- EXAVITQu4vr4xnSDxMaL - Bella (female, English, young)
- ErXwobaYiN019PkySvjV - Antoni (male, English, young)
- MF3mGyEYCl7XYWbV9V6O - Elli (female, English, young)
- TxGEqnHWrfWFTfGW9XjX - Josh (male, English, young)
Sound Effects Implementation
Basic Sound Effect Generation:
import os
from pathlib import Path
from elevenlabs.client import ElevenLabs
client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])
# Generate sound effect
audio = client.text_to_sound_effects.convert(
    text="footsteps on wooden floor, slow paced walking",
    duration_seconds=5.0,
    prompt_influence=0.5  # How closely to follow the prompt (0.0-1.0)
)
# Save to file
output_path = Path("footsteps.mp3")
with output_path.open("wb") as f:
    for chunk in audio:
        f.write(chunk)
print(f"Sound effect saved to: {output_path}")
Looping Sound Effect:
# Generate seamlessly looping audio
audio = client.text_to_sound_effects.convert(
    text="gentle rain falling on leaves, ambient nature sound",
    duration_seconds=10.0,
    prompt_influence=0.5
    # Note: a loop parameter may be available in newer API versions
)
output_path = Path("rain_loop.mp3")
with output_path.open("wb") as f:
    for chunk in audio:
        f.write(chunk)
Multiple Sound Effects:
# Generate various sound effects for a game
sound_effects = [
    {
        "name": "explosion",
        "description": "large explosion, debris falling, action movie style",
        "duration": 3.0
    },
    {
        "name": "door_open",
        "description": "creaky wooden door slowly opening, horror atmosphere",
        "duration": 2.0
    },
    {
        "name": "ui_click",
        "description": "soft button click, UI feedback sound, pleasant tone",
        "duration": 0.5
    }
]
for sfx in sound_effects:
    audio = client.text_to_sound_effects.convert(
        text=sfx["description"],
        duration_seconds=sfx["duration"]
    )
    output_path = Path(f"{sfx['name']}.mp3")
    with output_path.open("wb") as f:
        for chunk in audio:
            f.write(chunk)
    print(f"Generated: {output_path}")
Music Generation Implementation
Basic Music Composition:
import os
from pathlib import Path
from elevenlabs.client import ElevenLabs
client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])
# Generate music from prompt
prompt = """Upbeat indie pop song with acoustic guitar, light drums, and cheerful
melody. Modern and energetic feel, perfect for background music in a lifestyle video.
Instrumental only, no vocals."""
try:
    audio = client.music_generation.compose(
        prompt=prompt,
        music_length_ms=30000  # 30 seconds
    )
    # Save music file
    output_path = Path("background_music.mp3")
    with output_path.open("wb") as f:
        for chunk in audio:
            f.write(chunk)
    print(f"Music saved to: {output_path}")
except Exception as e:
    if "paid" in str(e).lower() or "subscription" in str(e).lower():
        print("Error: Music generation requires a paid ElevenLabs account")
    else:
        print(f"Error: {e}")
Music with Composition Plan:
# Create structured composition plan first
composition_plan = client.music_generation.composition_plan.create(
    prompt="""Electronic dance music track with energetic build-up, drop section,
and chill outro. Progressive house style.""",
    music_length_ms=60000  # 60 seconds
)
# Generate music from plan (allows for more control)
audio = client.music_generation.compose(
    composition_plan=composition_plan
)
output_path = Path("edm_track.mp3")
with output_path.open("wb") as f:
    for chunk in audio:
        f.write(chunk)
Genre-Specific Music:
# Generate music for different genres/moods
music_prompts = {
    "cinematic": """Epic cinematic orchestral music with dramatic strings, powerful
brass, and heroic theme. Perfect for movie trailer, inspiring and grand.""",
    "lo-fi": """Chill lo-fi hip hop beats with jazz piano, vinyl crackle, and mellow
drums. Relaxing study music atmosphere, instrumental.""",
    "ambient": """Ambient soundscape with ethereal pads, subtle textures, and peaceful
atmosphere. Meditative and calming, perfect for relaxation.""",
    "game_menu": """Mysterious fantasy game menu music with harp, soft strings, and
magical atmosphere. Medieval RPG feel, looping background music."""
}
for name, prompt in music_prompts.items():
    try:
        audio = client.music_generation.compose(
            prompt=prompt,
            music_length_ms=20000  # 20 seconds
        )
        output_path = Path(f"music_{name}.mp3")
        with output_path.open("wb") as f:
            for chunk in audio:
                f.write(chunk)
        print(f"Generated: {output_path}")
    except Exception as e:
        print(f"Error generating {name}: {e}")
Step 5: Handle Output and Errors
Save Audio Files:
from pathlib import Path

def save_audio(audio_generator, filename):
    """Save audio generator to file"""
    output_path = Path(filename)
    with output_path.open("wb") as f:
        for chunk in audio_generator:
            f.write(chunk)
    print(f"Saved: {output_path.absolute()}")
    return output_path
Error Handling:
import os
from elevenlabs.client import ElevenLabs

def check_api_key():
    """Verify API key is set"""
    if not os.environ.get("ELEVENLABS_API_KEY"):
        raise ValueError(
            "ELEVENLABS_API_KEY not set. "
            "Please set environment variable: export ELEVENLABS_API_KEY='your-key'"
        )

def handle_elevenlabs_request(func, *args, **kwargs):
    """Wrapper for error handling"""
    try:
        return func(*args, **kwargs)
    except Exception as e:
        error_msg = str(e).lower()
        if "api key" in error_msg or "authentication" in error_msg:
            print("Error: Invalid or missing API key")
            print("Set your API key: export ELEVENLABS_API_KEY='your-key'")
        elif "quota" in error_msg or "limit" in error_msg:
            print("Error: API quota exceeded")
            print("Check your usage at https://elevenlabs.io/app/usage")
        elif "paid" in error_msg or "subscription" in error_msg:
            print("Error: This feature requires a paid subscription")
        elif "bad_prompt" in error_msg:
            print("Error: Prompt contains restricted content")
            print("Avoid copyrighted material (artist names, brands)")
        else:
            print(f"Error: {e}")
        raise
Step 6: Provide Output to User
- Report what was generated
- Show file path where audio was saved
- Provide playback options if appropriate
- Offer refinements (different voice, longer duration, etc.)
- Display metadata (duration, format, model used)
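One way to report duration and format back to the user is to inspect the saved file with pydub (already listed under Requirements). The report_audio helper below is an illustrative sketch, not part of the ElevenLabs SDK:
from pathlib import Path
from pydub import AudioSegment  # requires ffmpeg
def report_audio(path, model_id):
    """Print basic metadata for a generated audio file."""
    segment = AudioSegment.from_file(path)
    print(f"File: {Path(path).absolute()}")
    print(f"Duration: {segment.duration_seconds:.1f}s")
    print(f"Channels: {segment.channels}, sample rate: {segment.frame_rate} Hz")
    print(f"Model: {model_id}")
report_audio("speech_output.mp3", "eleven_multilingual_v2")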
Requirements
API Key:
- ElevenLabs API key (get from https://elevenlabs.io/app/settings/api-keys)
- Set as the ELEVENLABS_API_KEY environment variable
Python Packages:
pip install elevenlabs pydub python-dotenv
System:
- Python 3.8+
- Internet connection for API access
- Audio playback library (optional, for playing generated audio)
- ffmpeg (required by pydub for audio processing)
Account Requirements:
- Free tier: Text-to-speech and sound effects
- Paid tier: Music generation, higher quotas
Best Practices
Text-to-Speech
Choose Appropriate Model:
- High quality narration → eleven_multilingual_v2
- Real-time/streaming → eleven_flash_v2_5
- Balanced use cases → eleven_turbo_v2_5
Select Right Voice:
- Match voice to content (age, gender, accent)
- Use voices.get_all() to explore options
- Consider voice labels and descriptions
Optimize for Use Case:
- Long content: Use standard conversion, not streaming
- Real-time apps: Use Flash model with streaming
- Dialogue: Generate separate audio per speaker
Format Selection:
- Web/mobile: MP3 (good quality, small size)
- High quality: Use higher bitrate (128kbps+)
- Phone systems: µ-law or A-law format
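A sketch of format selection, assuming a client initialized as in Step 3. The format identifiers follow the codec_samplerate_bitrate pattern used elsewhere in this document; confirm the exact values supported by your tier against the API reference:
# Illustrative format choices; verify identifiers and tier availability.
FORMATS = {
    "web": "mp3_44100_128",   # good quality, small size
    "hifi": "mp3_44100_192",  # higher bitrate (may require a paid tier)
    "phone": "ulaw_8000"      # telephony systems
}
audio = client.text_to_speech.convert(
    text="Format selection example.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
    output_format=FORMATS["web"]
)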
Sound Effects Generation
Be Descriptive:
- Include context: "footsteps on gravel, slow walking pace"
- Specify mood: "creepy door creak, horror atmosphere"
- Add technical details: "deep bass explosion, action movie"
Duration Control:
- Short sounds: 0.5-2 seconds (UI clicks, impacts)
- Medium sounds: 2-5 seconds (footsteps, doors)
- Ambient loops: 5-10+ seconds (rain, wind, environments)
Prompt Influence:
- High (0.7-1.0): Follow prompt closely, more literal
- Medium (0.4-0.6): Balanced creativity and adherence
- Low (0.0-0.3): More creative interpretation
Iteration:
- Generate multiple variations
- Adjust descriptions based on results
- Combine multiple effects if needed
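A minimal iteration sketch that sweeps prompt_influence across a few variations, reusing the save_audio helper from Step 5 (the prompt and count are illustrative):
# Generate a few variations of the same effect and keep the best one.
prompt = "creaky wooden door slowly opening, horror atmosphere"
for i in range(3):
    audio = client.text_to_sound_effects.convert(
        text=prompt,
        duration_seconds=2.0,
        prompt_influence=0.3 + 0.2 * i  # sweep from loose to literal
    )
    save_audio(audio, f"door_variation_{i + 1}.mp3")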
Music Generation
Detailed Prompts:
- Specify genre, instruments, mood, tempo
- Mention structure (intro, build-up, drop, outro)
- Include use case context (game menu, video background)
Avoid Copyrighted References:
- Don't mention artist names, band names, songs
- Use generic style descriptions instead
- Focus on characteristics, not examples
Duration Planning:
- Short clips: 10-30 seconds (loops, backgrounds)
- Full tracks: 60-120 seconds (complete songs)
- Consider export time (longer = more processing)
Composition Plans:
- Use for complex multi-section tracks
- Better control over structure
- Allows section-level customization
General Best Practices
API Key Security:
- Store in environment variables, never in code
- Use .env files for local development
- Rotate keys periodically
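A minimal sketch of loading the key from a local .env file with python-dotenv (listed under Requirements); never commit the .env file to version control:
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
load_dotenv()  # reads ELEVENLABS_API_KEY from .env into the environment
client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])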
Error Handling:
- Always wrap API calls in try/except
- Check for quota limits
- Provide helpful error messages
Cost Optimization:
- Use Flash model when quality difference is minimal
- Cache/reuse generated audio when possible
- Monitor usage via dashboard
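A minimal caching sketch, assuming a client initialized as in Step 3; cached_tts and the cache layout are illustrative, not an SDK feature:
import hashlib
from pathlib import Path
CACHE_DIR = Path("audio_cache")
CACHE_DIR.mkdir(exist_ok=True)
def cached_tts(text, voice_id, model_id):
    """Reuse previously generated audio for identical requests."""
    key = hashlib.sha256(f"{text}|{voice_id}|{model_id}".encode()).hexdigest()[:16]
    path = CACHE_DIR / f"{key}.mp3"
    if path.exists():
        return path  # cache hit: no API call, no character usage
    audio = client.text_to_speech.convert(
        text=text, voice_id=voice_id, model_id=model_id
    )
    with path.open("wb") as f:
        for chunk in audio:
            f.write(chunk)
    return path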
File Management:
- Use descriptive filenames
- Organize by type (speech, sfx, music)
- Clean up temporary files
Testing:
- Test with short durations first
- Verify output quality before long generations
- Check different voices/settings
Examples
Example 1: Audiobook Narration
User request: "Convert this chapter to audiobook format"
Expected behavior:
- Select appropriate voice (e.g., narrative voice like George)
- Use a high-quality model (eleven_multilingual_v2)
- Generate speech from chapter text
- Save as MP3 with high bitrate
- Report duration and file location
audio = client.text_to_speech.convert(
    text=chapter_text,
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # George
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128"
)
save_audio(audio, "chapter_1.mp3")
Example 2: Video Game Sound Effects
User request: "Generate sound effects for a fantasy RPG game"
Expected behavior:
- Create multiple sound effects with descriptions
- Set appropriate durations for each
- Save with descriptive names
- Organize in game audio folder
sfx_list = [
    ("sword_swing", "sword whooshing through air, fantasy combat", 1.0),
    ("potion_drink", "drinking magical potion, gulp sound, RPG game", 0.8),
    ("spell_cast", "magical spell casting, ethereal whoosh, fantasy magic", 1.5),
    ("footsteps_stone", "footsteps on stone dungeon floor, echoing", 2.0)
]
for name, description, duration in sfx_list:
    audio = client.text_to_sound_effects.convert(
        text=description,
        duration_seconds=duration
    )
    save_audio(audio, f"sfx_{name}.mp3")
Example 3: Podcast Intro with Music
User request: "Create a podcast intro with voice and background music"
Expected behavior:
- Generate intro speech
- Generate background music
- Note that mixing would need external tools (pydub)
- Provide both audio files
# Generate intro speech
intro_text = "Welcome to the Tech Talk podcast, where we discuss the latest in technology and innovation."
speech = client.text_to_speech.convert(
    text=intro_text,
    voice_id="TxGEqnHWrfWFTfGW9XjX",  # Josh (energetic)
    model_id="eleven_flash_v2_5"
)
save_audio(speech, "podcast_intro_voice.mp3")
# Generate background music (requires paid account)
music = client.music_generation.compose(
    prompt="Upbeat tech podcast intro music, electronic beats, modern and energetic",
    music_length_ms=10000  # 10 seconds
)
save_audio(music, "podcast_intro_music.mp3")
print("Use audio editing software to mix voice and music")
Example 4: Multilingual Content
User request: "Create welcome messages in English, Spanish, and French"
Expected behavior:
- Generate speech in each language
- Use multilingual model
- Select appropriate voices for each language
- Save with language-specific filenames
messages = {
    "english": ("Hello and welcome!", "JBFqnCBsd6RMkjVDRZzb"),
    "spanish": ("¡Hola y bienvenido!", "ThT5KcBeYPX3keUQqHPh"),  # Spanish voice
    "french": ("Bonjour et bienvenue!", "XB0fDUnXU5powFXDhCwa")  # French voice
}
for lang, (text, voice_id) in messages.items():
    audio = client.text_to_speech.convert(
        text=text,
        voice_id=voice_id,
        model_id="eleven_multilingual_v2"
    )
    save_audio(audio, f"welcome_{lang}.mp3")
Example 5: Real-time Voice Streaming
User request: "Stream this news article as audio"
Expected behavior:
- Use Flash model for low latency
- Stream audio as it generates
- Provide real-time playback or save incrementally
from elevenlabs import stream

audio_stream = client.text_to_speech.convert_as_stream(
    text=news_article_text,
    voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel
    model_id="eleven_flash_v2_5",
    output_format="mp3_44100_128"
)
# Stream to speakers in real-time
stream(audio_stream)
Limitations
Music Generation:
- Requires paid subscription
- No copyrighted material allowed
- Processing time increases with duration
API Quotas:
- Character limits per month (tier-dependent)
- Rate limits on requests
- Different limits for free vs paid tiers
Voice Cloning:
- Not covered in Tier 1 implementation
- Requires voice samples and additional setup
Audio Quality:
- Output format affects quality and file size
- Higher quality formats may require paid tier
- Streaming has slightly lower quality than standard
Language Support:
- 32 languages supported but quality varies
- Some voices are language-specific
- Multilingual model recommended for non-English
Sound Effects:
- Limited to description-based generation
- No editing of generated effects via API
- Duration limitations (typically under 22 seconds)
Content Policy:
- No harmful or copyrighted content
- Music generation rejects artist/band names
- Strict content moderation on all endpoints
Related Skills
- image-generation - For visual content creation
- python-plotting - For visualizing audio data
- scientific-writing - For generating narration text
- python-best-practices - For writing clean audio processing code
Additional Resources
- ElevenLabs Documentation: https://elevenlabs.io/docs
- Python SDK: https://github.com/elevenlabs/elevenlabs-python
- API Reference: https://elevenlabs.io/docs/api-reference/introduction
- Voice Library: https://elevenlabs.io/voice-library
- Pricing: https://elevenlabs.io/pricing
- Usage Dashboard: https://elevenlabs.io/app/usage