name	Voice Synthesis
tier	3
load_policy	task-specific
description	Generate voice audio using Google Cloud TTS with enhancement
version	1.0.0
parent_skill	production-operations

Voice Synthesis Skill

The Voice Is the Heart of the Journey

This skill converts SSML scripts to voice audio and applies psychoacoustic enhancement.

Purpose

Generate high-quality, hypnotic voice audio from SSML scripts using Google Cloud Text-to-Speech.

Production Voice Standard

Always use: en-US-Neural2-H (bright female)

Parameter	Value
Voice ID	`en-US-Neural2-H`
Speaking Rate	0.88x (applied by TTS engine)
Pitch	0 semitones (base)
Enhancement	Always enabled
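
To make the table concrete, here is a minimal sketch of how these settings map onto a Cloud Text-to-Speech `text:synthesize` request body. How `generate_voice.py` actually calls the API is not shown in this skill, so the function name `build_synthesis_request` and the constants are hypothetical; only the JSON field names follow the public REST API.

```python
import json

# Values from the Production Voice Standard table.
PRODUCTION_VOICE = "en-US-Neural2-H"
SPEAKING_RATE = 0.88      # applied by the TTS engine, not in the SSML
PITCH_SEMITONES = 0.0


def build_synthesis_request(ssml: str) -> dict:
    """Build a text:synthesize request body (hypothetical helper)."""
    return {
        "input": {"ssml": ssml},
        "voice": {"languageCode": "en-US", "name": PRODUCTION_VOICE},
        "audioConfig": {
            "audioEncoding": "MP3",
            "speakingRate": SPEAKING_RATE,
            "pitch": PITCH_SEMITONES,
        },
    }


if __name__ == "__main__":
    body = build_synthesis_request("<speak>Take a slow, deep breath.</speak>")
    print(json.dumps(body, indent=2))
```

Keeping the rate in the request rather than in the SSML is consistent with the troubleshooting advice to leave `rate="1.0"` in the script.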

Canonical Command

```
python3 scripts/core/generate_voice.py \
    sessions/{session}/working_files/script_voice_clean.ssml \
    sessions/{session}/output
```

This automatically:

Uses production voice (en-US-Neural2-H)
Applies 0.88x speaking rate baseline
Generates both raw and enhanced output
Outputs voice.mp3 and voice_enhanced.mp3

Voice Options Reference

Female Voices (Recommended for Hypnosis)

Voice ID	Character	Best For
`en-US-Neural2-H`	Bright, clear	Production standard
`en-US-Neural2-E`	Deep, resonant	Darker themes, shadow work
`en-US-Neural2-C`	Soft, gentle	Very gentle sessions
`en-US-Neural2-F`	Clear, articulate	Educational content
`en-US-Neural2-G`	Warm, approachable	Confidence, empowerment

Male Voices

Voice ID	Character	Best For
`en-US-Neural2-D`	Deep, authoritative	Guided pathworkings
`en-US-Neural2-I`	Warm, compassionate	Healing journeys
`en-US-Neural2-J`	Rich, mature	Wisdom, elder guidance

Note: en-US-Neural2-A is MALE, not female.

Output Files

File	Purpose	Use For
`voice.mp3`	Raw TTS output	Never use directly
`voice_enhanced.mp3`	Production voice	Always use this
`voice_enhanced.wav`	Lossless for mixing	Audio mixing input

Voice Enhancement

The generate_voice.py script applies these enhancements:

Enhancement	Effect
Tape Warmth	Analog saturation (25% drive)
De-essing	Sibilance reduction (4-8 kHz)
Room Tone	Gentle reverb (4% wet)
EQ Shaping	Presence boost, rumble cut

Chunking System

Large scripts are automatically chunked:

Detection: Script exceeds API byte limit
Splitting: At natural break points (breaks of 3s or longer, such as <break time="3s"/>)
Generation: Each chunk processed separately
Concatenation: Final output seamlessly joined
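
The four steps above can be sketched roughly as follows. This is an illustration, not the logic in `generate_voice.py`: the 5000-byte limit, the regex, and the function names are assumptions, and the input is taken to be the script body without its outer `<speak>` wrapper.

```python
import re
from typing import List

# Assumption: 5000 bytes per request; adjust to the limit the script enforces.
MAX_CHUNK_BYTES = 5000
MIN_SPLIT_SECONDS = 3.0
BREAK_TAG = re.compile(r'<break\s+time="(\d+(?:\.\d+)?)(ms|s)"\s*/>')
WRAPPER_BYTES = len("<speak></speak>".encode("utf-8"))


def _break_seconds(match: "re.Match") -> float:
    value, unit = float(match.group(1)), match.group(2)
    return value / 1000.0 if unit == "ms" else value


def split_at_long_breaks(body: str) -> List[str]:
    """Cut the body after every break of 3s or longer."""
    segments, start = [], 0
    for m in BREAK_TAG.finditer(body):
        if _break_seconds(m) >= MIN_SPLIT_SECONDS:
            segments.append(body[start:m.end()])
            start = m.end()
    if body[start:].strip():
        segments.append(body[start:])
    return segments


def chunk_ssml(body: str, max_bytes: int = MAX_CHUNK_BYTES) -> List[str]:
    """Greedily pack segments into <speak> chunks that fit max_bytes."""
    chunks, current = [], ""
    for seg in split_at_long_breaks(body):
        if len(seg.encode("utf-8")) + WRAPPER_BYTES > max_bytes:
            # Break points too far apart: no valid split exists.
            raise ValueError("segment exceeds byte limit; add a long break")
        candidate = current + seg
        if current and len(candidate.encode("utf-8")) + WRAPPER_BYTES > max_bytes:
            chunks.append(f"<speak>{current}</speak>")
            current = seg
        else:
            current = candidate
    if current:
        chunks.append(f"<speak>{current}</speak>")
    return chunks
```

Sizes are measured in UTF-8 bytes rather than characters because the limit is a byte limit. The `ValueError` branch corresponds to the "Chunking errors" row in Troubleshooting.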

Duration Verification

After generation, verify that the duration matches the target:

```
# Check duration (ffprobe prints it in seconds; 25:00 is 1500)
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 \
    sessions/{session}/output/voice_enhanced.mp3
```

Target Duration	Expected	Acceptable Range
25 minutes	25:00	23:00 - 27:00
30 minutes	30:00	28:00 - 32:00
45 minutes	45:00	42:00 - 48:00

If duration is off:

Adjust <break> durations in SSML
Add/remove content as needed
Regenerate voice
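
Because `ffprobe` reports seconds while the table uses minutes, a small helper can do the comparison. This is a sketch only: the function names are hypothetical, and the ranges are copied from the table above.

```python
# Acceptable ranges in minutes, keyed by target duration (from the table).
ACCEPTABLE_RANGES_MIN = {25: (23, 27), 30: (28, 32), 45: (42, 48)}


def duration_status(duration_seconds: float, target_minutes: int) -> str:
    """Return 'ok', 'too short', or 'too long' for a measured duration."""
    low, high = ACCEPTABLE_RANGES_MIN[target_minutes]
    minutes = duration_seconds / 60.0
    if minutes < low:
        return "too short"
    if minutes > high:
        return "too long"
    return "ok"


def format_mmss(duration_seconds: float) -> str:
    """Format seconds as M:SS for comparison with the table."""
    total = int(round(duration_seconds))
    return f"{total // 60}:{total % 60:02d}"


if __name__ == "__main__":
    measured = 1500.0  # example value, as ffprobe would print it
    print(format_mmss(measured), duration_status(measured, 25))
```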

Prerequisites

Before running voice synthesis:

Environment:
```
source venv/bin/activate
```

Google Cloud Auth:

```
echo $GOOGLE_APPLICATION_CREDENTIALS
# Should show path to credentials JSON
```

SSML Validation:

```
python3 scripts/utilities/validate_ssml.py sessions/{session}/working_files/script_voice_clean.ssml
```

SFX Stripped:

```
grep -c "\[SFX:" sessions/{session}/working_files/script_voice_clean.ssml
# Should print 0 (no SFX markers remain)
```
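
The credential and SFX checks could also be combined into one preflight function, sketched below. The name `preflight_problems` is hypothetical, and the sketch does not replace `validate_ssml.py`, whose checks are not described here.

```python
import os
import re
from pathlib import Path
from typing import List, Mapping, Optional

SFX_MARKER = re.compile(r"\[SFX:")


def preflight_problems(
    ssml_text: str, env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    env = os.environ if env is None else env
    problems = []

    # Google Cloud auth: the variable must point at an existing file.
    cred_path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if not cred_path:
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
    elif not Path(cred_path).is_file():
        problems.append(f"credentials file not found: {cred_path}")

    # SFX markers would be read aloud by the TTS engine.
    sfx_count = len(SFX_MARKER.findall(ssml_text))
    if sfx_count:
        problems.append(f"{sfx_count} [SFX: marker(s) still in script")

    return problems
```

Unlike `grep -c`, which counts matching lines, this counts individual markers; either way the expected result for a clean script is zero.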

Troubleshooting

Issue	Cause	Solution
"Authentication failed"	Missing credentials	Check `GOOGLE_APPLICATION_CREDENTIALS`
Robotic sound	Slow speaking rate set in the SSML	Keep `rate="1.0"` in the SSML and use breaks for pacing
TTS reads "[SFX:..."	SFX markers not stripped	Use `script_voice_clean.ssml`
Chunking errors	Break points too far apart	Add `<break time="3s"/>` every few paragraphs
Duration too short	Not enough content	Add more script content
Duration too long	Too much content	Trim or reduce break times

Integration with Pipeline

Before (dependencies):

SSML script validated
SFX markers stripped

After (next steps):

Audio mixing with binaural and SFX
Hypnotic post-processing

Quality Checklist

Before proceeding to mixing:

voice_enhanced.mp3 exists
Duration within acceptable range
No clipping (peak below 0 dBFS)
No artifacts or glitches
Pacing sounds natural
All words clearly articulated
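
The clipping item can be checked mechanically. One option, sketched here as an assumption rather than as part of the existing pipeline, is to run ffmpeg's `volumedetect` filter and parse the `max_volume` line it writes to stderr. The function names are hypothetical.

```python
import re
import subprocess

MAX_VOLUME = re.compile(r"max_volume:\s*(-?\d+(?:\.\d+)?) dB")


def peak_db(ffmpeg_stderr: str) -> float:
    """Extract the max_volume value (in dB) from volumedetect output."""
    match = MAX_VOLUME.search(ffmpeg_stderr)
    if match is None:
        raise ValueError("no max_volume line found in ffmpeg output")
    return float(match.group(1))


def has_headroom(ffmpeg_stderr: str) -> bool:
    """True when the peak is strictly below 0 dB."""
    return peak_db(ffmpeg_stderr) < 0.0


def measure(path: str) -> str:
    """Run volumedetect on a file and return ffmpeg's stderr text."""
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-i", path,
         "-af", "volumedetect", "-f", "null", "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stderr
```

A typical call would be `has_headroom(measure("sessions/{session}/output/voice_enhanced.mp3"))` with the session name filled in. The remaining checklist items (artifacts, pacing, articulation) still need a listening pass.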

Related Resources

Skill: tier3-production/ssml-generation/ (input)
Skill: tier3-production/audio-mixing/ (next step)
Serena Memory: audio_production_methodology
Script: scripts/core/generate_voice.py
