| Field | Value |
|---|---|
| name | podcast |
| description | Creates audio podcasts from text using browser text-to-speech. Use when the user mentions a podcast, audio conversation, dialogue, spoken content, voice narration, audio book, or text-to-speech generation. Supports multiple speakers with automatic language detection. Zero cost, no API keys, works in the browser. |
| allowed-tools | Read, Write |
Podcast Generator
Generates podcast-style audio that plays directly in the browser. Zero cost, no API keys needed.
Workflow Decision Tree
User provides formatted dialogue
→ Use the existing dialogue as-is if its quality is good; refine the structure only if the flow needs improvement
User provides article, list, or text content
→ Create dialogue from content (see "Dialogue Creation Process")
User provides topic only
→ Request source material before proceeding
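The decision tree is applied by reading the input, but the routing can be sketched as a small function. This is a hypothetical helper, not part of the skill: the `<speakerN>` check follows the script format described under Technical Reference, and the 40-word cutoff for "topic only" is an arbitrary assumption.

```javascript
// Hypothetical routing helper for the workflow decision tree.
// TOPIC_ONLY_MAX_WORDS is an illustrative assumption, not a documented value.
const TOPIC_ONLY_MAX_WORDS = 40;

function chooseWorkflow(input) {
  const text = input.trim();
  // Already formatted dialogue: lines start with <speaker1>, <speaker2>, ...
  if (/^<speaker\d+>/m.test(text)) return "use-dialogue";
  const wordCount = text.split(/\s+/).filter(Boolean).length;
  // Very short input is treated as a bare topic: ask for source material.
  if (wordCount <= TOPIC_ONLY_MAX_WORDS) return "request-source";
  // Anything longer is an article, list, or other text to turn into dialogue.
  return "create-dialogue";
}

console.log(chooseWorkflow("<speaker1>Welcome.\n<speaker2>Thank you.")); // → use-dialogue
console.log(chooseWorkflow("The history of the printing press")); // → request-source
```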
Dialogue Creation Process
Follow this two-phase workflow when creating podcast dialogue from content:
Phase 1: Analyze Source Content
- Read source material completely
- Detect language from content (en, de, fr, es, it, etc.)
- Identify key information: facts, dates, names, numbers, details
- Organize by theme: chronology, category, or logical grouping
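In the skill, the language is detected by the model reading the content. Purely as an illustration, a stopword-count heuristic for the listed codes could look like the sketch below; `detectLanguage` is a hypothetical name, and the stopword lists are small hand-picked samples (an assumption), far less robust than a real detector.

```javascript
// Hypothetical stopword-based language guess for en, de, fr, es, it.
// The lists are small illustrative samples, not a tuned model.
const STOPWORDS = {
  en: ["the", "and", "is", "of", "to", "in", "that", "with"],
  de: ["der", "die", "und", "ist", "nicht", "mit", "das", "ein"],
  fr: ["le", "la", "et", "est", "les", "des", "une", "avec"],
  es: ["el", "la", "y", "es", "los", "las", "una", "con"],
  it: ["il", "la", "e", "è", "gli", "che", "una", "con"],
};

function detectLanguage(text, fallback = "en") {
  // Split into letter-only words (Unicode-aware, so "è" counts as a letter).
  const words = text.toLowerCase().match(/\p{L}+/gu) ?? [];
  let best = fallback;
  let bestScore = 0;
  for (const [lang, list] of Object.entries(STOPWORDS)) {
    const set = new Set(list);
    const score = words.filter((w) => set.has(w)).length;
    // Strictly greater: ties keep the earlier language (or the fallback).
    if (score > bestScore) {
      best = lang;
      bestScore = score;
    }
  }
  return best;
}
```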
Phase 2: Create Dialogue
Structure conversation:
- Host (Speaker 1): ~20% - questions and transitions
- Expert (Speaker 2): ~80% - factual responses from source
- Length: as many exchanges as needed to cover all content (typically 10-50+ lines)
Apply TTS formatting: Read reference/tts-formatting.md for complete rules
Generate JSX file from template (see "Implementation Steps")
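One way to check a draft against the ~20% Host / ~80% Expert split is to count words per speaker. This is a sketch that assumes the `<speakerN>` line format from Technical Reference; `speakerShares` is a hypothetical helper, and word count is only a proxy for speaking time.

```javascript
// Hypothetical helper: each speaker's share of spoken words in a script.
// Keys are speaker numbers as strings ("1" = Host, "2" = Expert).
function speakerShares(script) {
  const counts = {};
  let total = 0;
  for (const line of script.split("\n")) {
    const match = line.match(/^<speaker(\d+)>(.*)$/);
    if (!match) continue; // ignore untagged lines
    const words = match[2].trim().split(/\s+/).filter(Boolean).length;
    counts[match[1]] = (counts[match[1]] ?? 0) + words;
    total += words;
  }
  const shares = {};
  for (const [speaker, n] of Object.entries(counts)) {
    shares[speaker] = total ? n / total : 0;
  }
  return shares;
}
```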
Information Accuracy
Convert text to audio without adding content: use only facts from the source material.
Expert responses:
- Use only names, dates, numbers, details explicitly stated in source
- Never invent examples, context, or explanations not in source
- Never add interpretations, opinions, or evaluations
- Never explain WHY something happened unless source explains it
Host phrases:
- Use neutral transitions: "I see", "Tell me more", "Can you elaborate?"
- Reference previous statements: "You mentioned X - how does that connect to Y?" (when source shows connection)
- Never add new facts, context, or interpretations
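Part of the accuracy rule can be checked mechanically: every number in the dialogue should also appear in the source. The sketch below uses a hypothetical helper, `unsupportedNumbers`, that is not part of the skill. It only compares digit sequences, so it would need adapting if the TTS formatting rules spell numbers out as words, and names and other details still need a manual review.

```javascript
// Hypothetical check: list numbers in the dialogue that never occur in the source.
// Only digit sequences are compared (e.g. "1998", "3.5", "1,200").
function extractNumbers(text) {
  return new Set(text.match(/\d+(?:[.,]\d+)*/g) ?? []);
}

function unsupportedNumbers(source, script) {
  const known = extractNumbers(source);
  // Remove <speakerN> tags so the speaker digits are not counted.
  const spoken = script.replace(/<speaker\d+>/g, "");
  return [...extractNumbers(spoken)].filter((n) => !known.has(n));
}
```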
Dialogue Format Guidelines
Host (Speaker 1) - ~20% of content
- Introduce topic with opening question
- Ask transition questions between topics
- Reference Expert's previous statements: "You mentioned X - can you elaborate?"
- Use conversational acknowledgments: "I see", "Tell me more"
- Never introduce facts not in source
Expert (Speaker 2) - ~80% of content
- Provide comprehensive factual responses from source material
- Include specific details: names, dates, numbers, locations
- Organize information logically by theme, chronology, or category
- Structure facts narratively using only source material
- Never repeat information already stated
- Never add context or examples not in source
Natural Conversation Techniques
Use these patterns without adding information:
- Vary question styles: "What happened next?" / "Can you explain that further?" / "Tell me about..."
- Ask follow-up questions based on Expert's previous response
- Let the Expert elaborate when the source provides multiple details about a topic
- Use clear transitions between sections: "Moving to the next category...", "In the European context..."
Avoid
- Personal opinions: "I think...", "That's crazy..."
- Value judgments: "amazing", "fascinating", "interesting"
- Humor, irony, jokes
- Rapid back-and-forth after every sentence
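A draft can be screened for the phrases in the "Avoid" list with a simple pattern check. This is a hypothetical lint, not part of the skill: the patterns cover only the examples given above, a neutral use of a word like "interesting" would also be flagged, and humor, irony, and pacing still need a human read.

```javascript
// Hypothetical lint for the "Avoid" list (examples above only).
const AVOID_PATTERNS = [
  /\bI think\b/i,
  /\bthat['’]s crazy\b/i,
  /\b(amazing|fascinating|interesting)\b/i,
];

function findAvoidedPhrases(script) {
  const hits = [];
  script.split("\n").forEach((line, index) => {
    for (const pattern of AVOID_PATTERNS) {
      const match = line.match(pattern);
      // Report the 1-based line number and the matched phrase.
      if (match) hits.push({ line: index + 1, phrase: match[0] });
    }
  });
  return hits;
}
```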
Implementation Steps
When the user requests a podcast:
- Analyze the source content and create the dialogue following the format above
- Detect the language from the content
- Read the template from `assets/podcast-template.jsx`
- Replace these values in the template:
  - `PODCAST_SCRIPT` - your generated dialogue
  - `PODCAST_TITLE` - a descriptive title derived from the content
  - `PODCAST_LANGUAGE` - the detected language code
- Save as a JSX file: use the Write tool to save the modified template as a `.jsx` file. The file renders as an interactive podcast player.
- Recommend the Microsoft Edge browser for the best voice quality (250+ Natural voices vs. Chrome's 19)
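The placeholder replacement can be done in a single pass. This sketch assumes each placeholder appears in `assets/podcast-template.jsx` as a bare token; how the template quotes the script (and therefore which characters need escaping) is not stated here, so check the template before relying on it. `fillTemplate` is a hypothetical helper name.

```javascript
// Hypothetical helper: substitute PODCAST_SCRIPT, PODCAST_TITLE and
// PODCAST_LANGUAGE in one regex pass. A replacer function inserts "$"
// sequences in the dialogue literally, and a single pass means that
// placeholder-like text inside a value is never substituted again.
function fillTemplate(template, values) {
  return template.replace(/\bPODCAST_(?:SCRIPT|TITLE|LANGUAGE)\b/g, (token) => {
    if (!(token in values)) throw new Error(`Missing value for ${token}`);
    return values[token];
  });
}

const template = 'const TITLE = "PODCAST_TITLE";\nconst LANG = "PODCAST_LANGUAGE";';
console.log(fillTemplate(template, {
  PODCAST_SCRIPT: "<speaker1>Welcome.",
  PODCAST_TITLE: "The Printing Press",
  PODCAST_LANGUAGE: "en",
}));
```

A plain `replaceAll` with string replacements would also work, but it interprets patterns such as `$&` in the replacement text, which the function replacer avoids.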
Technical Reference
Script Format
<speaker1>Host's question or statement.
<speaker2>Expert's response with factual information.
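A sketch of parsing this format into structured lines, assuming one tagged line per utterance and skipping untagged or blank lines (the template's own parser may behave differently). `parseScript` is a hypothetical name.

```javascript
// Hypothetical parser: "<speakerN>text" lines -> { speaker: N, text }.
function parseScript(script) {
  return script
    .split("\n")
    .map((line) => line.trim().match(/^<speaker(\d+)>\s*(.+)$/))
    .filter(Boolean) // drop blank or untagged lines
    .map(([, speaker, text]) => ({ speaker: Number(speaker), text: text.trim() }));
}
```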
Voice Configuration (automatic)
- Speaker 1 (Host): Pitch 1.05, Rate 0.95
- Speaker 2 (Expert): Pitch 0.88, Rate 0.93
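The pitch and rate values above map onto the `pitch`, `rate`, and `lang` properties of the Web Speech API's `SpeechSynthesisUtterance`. The sketch below applies them to any utterance-like object so it can run outside a browser; `applyVoiceSettings` is a hypothetical name, and falling back to Speaker 1's settings for other speakers is an assumption.

```javascript
// Pitch/rate values from the Voice Configuration list.
const VOICE_SETTINGS = {
  1: { pitch: 1.05, rate: 0.95 }, // Host
  2: { pitch: 0.88, rate: 0.93 }, // Expert
};

// Hypothetical helper; the fallback for speakers 3+ is an assumption.
function applyVoiceSettings(utterance, speaker, lang) {
  const settings = VOICE_SETTINGS[speaker] ?? VOICE_SETTINGS[1];
  utterance.pitch = settings.pitch;
  utterance.rate = settings.rate;
  utterance.lang = lang;
  return utterance;
}

// In a browser:
// const u = applyVoiceSettings(new SpeechSynthesisUtterance(text), 2, "en");
// speechSynthesis.speak(u);
```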
Platform-Aware Voice Selection
Platform detection:
- Automatically detects iOS, Android, Desktop Edge, or other desktop browsers
- Selects best available voices based on platform
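How the template detects the platform is not described here; a common approach is user-agent matching, sketched below as a hypothetical `detectPlatform` helper. User-agent sniffing is approximate: for example, iPadOS can report a desktop Safari user agent by default.

```javascript
// Hypothetical user-agent based detection for the four platform categories.
// Mobile checks come first so Edge on iOS/Android is not treated as desktop Edge.
function detectPlatform(userAgent) {
  if (/iPhone|iPad|iPod/.test(userAgent)) return "ios";
  if (/Android/.test(userAgent)) return "android";
  if (/Edg\//.test(userAgent)) return "desktop-edge"; // desktop Edge reports "Edg/"
  return "desktop";
}

// In a browser: detectPlatform(navigator.userAgent)
```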
Desktop Edge:
- Priority: Microsoft Neural/Natural voices (Katja, Conrad, Aria, Guy, etc.)
- 250+ high-quality voices available
Desktop Chrome:
- Priority: Google voices (Google UK English Female, Google Deutsch, etc.)
- ~19 voices available (lower quality than Edge)
- Fallback: local system voices
iOS (Safari/Mobile):
- Priority: Native Siri voices (Samantha, Anna, Daniel, etc.)
- Best quality on iOS devices
Android (Chrome/Mobile):
- Priority: Google TTS voices (Google Deutsch, Google UK English Female, etc.)
- Wavenet voices preferred when available
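The per-platform priorities can be expressed as ordered name patterns applied to the list returned by `speechSynthesis.getVoices()`. In the sketch below, `rankVoices` and the exact patterns are assumptions derived from the voice names listed above; voices are plain objects shaped like `SpeechSynthesisVoice` (`name`, `lang`).

```javascript
// Hypothetical priority patterns per platform (earlier pattern = preferred).
const PRIORITY = {
  "desktop-edge": [/Microsoft.*(Natural|Neural)/i],
  desktop: [/^Google/],
  ios: [/Samantha|Anna|Daniel/],
  android: [/Wavenet/i, /Google/i],
};

function rankVoices(voices, platform, lang) {
  const patterns = PRIORITY[platform] ?? [];
  // Score = index of the first matching pattern; unmatched voices
  // (e.g. local system voices) sort last as the fallback.
  const score = (voice) => {
    const i = patterns.findIndex((p) => p.test(voice.name));
    return i === -1 ? patterns.length : i;
  };
  return voices
    .filter((v) => v.lang.toLowerCase().startsWith(lang.toLowerCase()))
    .sort((a, b) => score(a) - score(b)); // stable sort keeps original order on ties
}
```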
Voice assignment:
- Automatically assigns different voices to Speaker 1 and Speaker 2
- Uses modulo distribution for 3+ speakers
- Ensures distinct voices even with limited availability
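Modulo distribution means speaker *i* gets voice *i* mod *n* from the ranked list. A minimal sketch, with `assignVoices` as a hypothetical name: with two or more voices, Speakers 1 and 2 always differ; with a single voice all speakers share it, and only the per-speaker pitch and rate settings tell them apart.

```javascript
// Hypothetical modulo assignment of ranked voices to speakers.
function assignVoices(speakerIds, rankedVoices) {
  const assignment = new Map();
  if (rankedVoices.length === 0) return assignment; // leave the browser default
  speakerIds.forEach((id, i) => {
    assignment.set(id, rankedVoices[i % rankedVoices.length]);
  });
  return assignment;
}
```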
Player Features
- Play/Pause/Resume with full playback control
- Stop to reset to beginning
- Click any transcript line to resume from there
- Progress bar shows current position
- Auto-scroll follows current line
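The player behavior can be sketched as a loop over parsed lines that supports starting from any line and being cancelled. This is a hypothetical sketch, not the template's code: `speak(line)` is an injected function assumed to return a Promise that resolves when the line finishes (in a browser it could wrap a `SpeechSynthesisUtterance` and its `end` event), and the 350 ms pause comes from Technical Constraints.

```javascript
// Hypothetical playback loop: play-from-line, stop, progress, speaker pauses.
const PAUSE_BETWEEN_SPEAKERS_MS = 350;

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function createPlayer(lines, speak, onLine = () => {}) {
  let runId = 0; // bumping this cancels any loop that is still running

  async function playFrom(startIndex = 0) {
    const myRun = ++runId;
    for (let i = startIndex; i < lines.length; i++) {
      if (myRun !== runId) return; // stopped, or restarted from another line
      onLine(i, (i + 1) / lines.length); // highlight, auto-scroll, progress bar
      await speak(lines[i]);
      const next = lines[i + 1];
      if (next && next.speaker !== lines[i].speaker) {
        await sleep(PAUSE_BETWEEN_SPEAKERS_MS);
      }
    }
  }

  return {
    playFrom, // clicking transcript line i would call playFrom(i)
    stop() {
      runId++; // in a browser, also call speechSynthesis.cancel()
    },
  };
}
```

Pause and resume map onto `speechSynthesis.pause()` and `speechSynthesis.resume()` in the browser and are left out of the sketch.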
Technical Constraints
- Keep each sentence under 14 seconds of speech (Chrome limitation)
- 350ms pause between speakers
- Microsoft Edge browser provides 250+ high-quality Natural voices (best option)
- Chrome provides only 19 lower-quality voices with utterance bugs
- Firefox has very limited voice support
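Sentence duration can only be estimated before playback. The sketch below assumes a baseline of 150 words per minute at rate 1.0 (an assumption; real voices vary) and scales by the speaker's rate; `estimateSeconds` and `longSentences` are hypothetical names.

```javascript
// Hypothetical length check for the 14-second sentence limit.
const MAX_SECONDS = 14;
const BASE_WORDS_PER_SECOND = 150 / 60; // assumed speaking speed at rate 1.0

function estimateSeconds(sentence, rate = 1) {
  const words = sentence.trim().split(/\s+/).filter(Boolean).length;
  // A lower rate (e.g. 0.93 for the Expert) means slower speech, so more seconds.
  return words / (BASE_WORDS_PER_SECOND * rate);
}

function longSentences(text, rate = 1) {
  // Split after ".", "!" or "?" followed by whitespace.
  return text
    .split(/(?<=[.!?])\s+/)
    .filter((sentence) => estimateSeconds(sentence, rate) >= MAX_SECONDS);
}
```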
Quality Requirements
- Factual Accuracy: Expert responses use only source facts
- Natural Flow: No rapid back-and-forth and no value judgments
- TTS Compliance: All text must play without pronunciation errors
- Zero Hallucination: No invented examples or context
- Complete Coverage: Include all important facts from source
- No Duplicates: Each fact appears exactly once
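Verbatim repeats can be caught automatically by normalizing each sentence and tracking which ones have been seen. This hypothetical `duplicateSentences` helper catches only repeats that match after ignoring case, punctuation, and whitespace; the same fact reworded still needs a manual check.

```javascript
// Hypothetical duplicate check over a <speakerN> script.
function duplicateSentences(script) {
  const seen = new Set();
  const duplicates = [];
  const text = script.replace(/<speaker\d+>/g, " "); // drop speaker tags
  for (const sentence of text.split(/(?<=[.!?])\s+/)) {
    // Normalize: lowercase, strip punctuation, collapse whitespace.
    const key = sentence
      .toLowerCase()
      .replace(/[^\p{L}\p{N}\s]/gu, "")
      .replace(/\s+/g, " ")
      .trim();
    if (!key) continue;
    if (seen.has(key)) duplicates.push(sentence.trim());
    else seen.add(key);
  }
  return duplicates;
}
```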