name	text-to-speech
description	Converts text to speech audio using the OpenAI TTS API. Use when users request audio versions of text or want responses read aloud.

Text-to-Speech Skill

CRITICAL: Voice Message Reply Rules

When a user sends you a voice message, follow these rules:

ALWAYS use --voice-message flag - Required for Telegram waveform display
Generate TTS in the SAME LANGUAGE the user spoke - If they spoke English, generate English audio
Output ONLY the file path - No text commentary alongside the voice reply

Exception: If the user explicitly asks for a text response (e.g., "respond in text", "don't send voice"), respond with text instead.

Correct Example (user sent voice in English):

telclaude tts "Hello! How can I help you today?" --voice-message

Then output ONLY:

/media/outbox/voice/1234567890-abc123.ogg

WRONG - Do NOT do this:

Hello! Here is the audio you requested:
/media/outbox/tts/1234567890-abc123.mp3

This is wrong because it (1) adds text alongside the voice reply, (2) omits the --voice-message flag, (3) produces mp3 instead of ogg, and (4) writes to the tts directory instead of the voice directory.
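
The rules above can be checked mechanically before a reply is sent. The following is a minimal sketch, not part of telclaude: is_valid_voice_reply is a hypothetical helper that accepts a reply only when it is a single token with no spaces or line breaks, sits under a voice/ directory, and ends in .ogg.

```shell
# Hypothetical helper (not part of telclaude).
# Succeeds only when the reply is a bare path under voice/ ending in .ogg.
NL='
'
is_valid_voice_reply() {
  # Any space or newline means commentary was added beside the path.
  case "$1" in
    *" "*|*"$NL"*) return 1 ;;
  esac
  # Require the voice/ directory and the .ogg extension.
  case "$1" in
    */voice/*.ogg) return 0 ;;
    *) return 1 ;;
  esac
}

is_valid_voice_reply "/media/outbox/voice/1234567890-abc123.ogg" && echo "ok"
is_valid_voice_reply "/media/outbox/tts/1234567890-abc123.mp3" || echo "rejected"
```

The check looks only at the shape of the reply; it does not confirm that the file exists or that the audio is in the right language.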

When to Use

Use this skill when users:

Ask to "read aloud", "speak", or "say" something
Request audio versions of text content
Want voice messages or audio responses
Ask for text to be converted to speech
Send a voice message (respond in voice - see CRITICAL rules above)

How to Generate Speech

Voice Messages (Telegram waveform display)

For conversational voice replies, use --voice-message to get proper Telegram voice message formatting:

telclaude tts "Your response here" --voice-message

This outputs OGG/Opus format that displays as a voice message with waveform in Telegram.

Audio Files (music player display)

For regular audio files (longer content, podcast-style), omit --voice-message:

telclaude tts "Your text to convert to speech here"

Options

--voice-message: Output as Telegram voice message (OGG/Opus with waveform display)
--voice: Voice to use (alloy, echo, fable, onyx, nova, shimmer). Default: alloy
- alloy: Neutral, balanced voice
- echo: Deeper, more resonant voice
- fable: Expressive, storytelling voice
- onyx: Deep, authoritative voice
- nova: Warm, conversational voice
- shimmer: Soft, gentle voice
--speed: Speech speed from 0.25 to 4.0. Default: 1.0
--model: Quality model (tts-1, tts-1-hd). Default: tts-1
- tts-1: Standard quality, faster
- tts-1-hd: Higher quality, slightly slower
--format: Audio format (mp3, opus, aac, flac, wav). Default: mp3 (ignored with --voice-message)
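
The documented value ranges can be checked before the CLI is called. This is a sketch under assumptions: is_valid_voice and is_valid_speed are hypothetical helpers, and whether telclaude performs its own validation is not stated here.

```shell
# Hypothetical pre-flight checks (not part of telclaude) for the documented
# --voice and --speed values.
is_valid_voice() {
  case "$1" in
    alloy|echo|fable|onyx|nova|shimmer) return 0 ;;
    *) return 1 ;;
  esac
}

is_valid_speed() {
  # Accept plain decimals only, then compare numerically with awk.
  case "$1" in
    ''|.|*[!0-9.]*|*.*.*) return 1 ;;
  esac
  awk -v s="$1" 'BEGIN { exit !(s >= 0.25 && s <= 4.0) }'
}

# Example use before invoking the CLI:
#   is_valid_voice "$voice" && is_valid_speed "$speed" &&
#     telclaude tts "$text" --voice "$voice" --speed "$speed"
```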

Examples

# Voice message reply (when user sent a voice message)
telclaude tts "Sure, I can help you with that!" --voice-message

# Voice message with specific voice
telclaude tts "Here's what I found..." --voice-message --voice nova

# Regular audio file
telclaude tts "Hello! Here is your summary."

# High quality audio file
telclaude tts "Important announcement" --voice onyx --model tts-1-hd --speed 0.9

Response Format

The telclaude tts command outputs metadata (file path, size, format, voice, duration). You only need to include the file path in your response - the relay handles sending it to Telegram.

Voice message replies (responding to incoming voice)

Output ONLY the file path - no commentary:

/media/outbox/voice/1234567890-abc123.ogg

That's it. No "I've generated..." or "Here's your audio...". The relay sends just the voice message, like a human would.

Audio files or text+audio responses

If the user requested an audio FILE (not a voice reply), or you need to include text context, you may add text alongside the path:

Here's the summary as audio:
/media/outbox/tts/1234567890-abc123.mp3

Key points:

Voice messages: .../voice/*.ogg - waveform display, path only
Audio files: .../tts/*.mp3 - music player display, text OK
The relay automatically detects paths and sends the media
Paths live under TELCLAUDE_MEDIA_OUTBOX_DIR (default .telclaude-media in native mode; /media/outbox in Docker)

Best Practices

Match the medium: If the user sends voice, respond with voice
Choose an appropriate voice: Match the voice to the content type (e.g., fable for stories, onyx for announcements)
Keep text reasonable: Maximum 4096 characters per request
Consider speed: Use a slower speed (0.8-0.9) for important content and a faster one (1.2-1.5) for casual updates
Use HD sparingly: tts-1-hd costs twice as much as tts-1; reserve it for important or long-form content

Limitations

Maximum 4096 characters per request (longer text is truncated)
Audio files are stored temporarily and cleaned up after 24 hours
Requires OPENAI_API_KEY to be configured
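
Because longer text is truncated, it can help to check the length first and split long input into several requests. The sketch below assumes mostly ASCII text; fits_tts_limit and split_for_tts are hypothetical helpers, not telclaude features, and long.txt in the comment is a placeholder file name.

```shell
# Hypothetical helpers (not part of telclaude) for the 4096-character limit.
TTS_MAX_CHARS=4096

# Succeeds when the text fits in one request.
# Note: ${#1} counts characters in bash under a UTF-8 locale; some shells
# count bytes instead, so treat the result as approximate for non-ASCII text.
fits_tts_limit() {
  [ "${#1}" -le "$TTS_MAX_CHARS" ]
}

# Splits stdin into lines of at most TTS_MAX_CHARS columns, preferring to
# break at spaces (-s). Each output line can be sent as its own request.
split_for_tts() {
  fold -s -w "$TTS_MAX_CHARS"
}

# Example use (skips empty lines so no empty request is made):
#   split_for_tts < long.txt | while IFS= read -r chunk; do
#     [ -n "$chunk" ] && telclaude tts "$chunk"
#   done
```

Splitting this way yields one audio file per chunk; joining them into a single file is outside the scope of this sketch.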

Cost Awareness

OpenAI TTS pricing (per 1000 characters):

tts-1: $0.015/1K chars
tts-1-hd: $0.030/1K chars

Example: A 500-word response (~2500 chars) costs about $0.04 with tts-1 (2.5 x $0.015 = $0.0375) and $0.075 with tts-1-hd.
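
The arithmetic behind this example can be reproduced with a small helper. This is a sketch only: estimate_tts_cost is a hypothetical function, and the rates are the ones listed above, which may change.

```shell
# Hypothetical helper (not part of telclaude): estimates cost in USD from
# the per-1K-character rates listed in this section.
estimate_tts_cost() {
  # $1 = character count, $2 = model (tts-1 or tts-1-hd)
  case "$2" in
    tts-1)    rate=0.015 ;;
    tts-1-hd) rate=0.030 ;;
    *) echo "unknown model: $2" >&2; return 1 ;;
  esac
  awk -v n="$1" -v r="$rate" 'BEGIN { printf "%.4f\n", n / 1000 * r }'
}

estimate_tts_cost 2500 tts-1      # prints 0.0375
estimate_tts_cost 2500 tts-1-hd   # prints 0.0750
```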
