| name | ai-talking-head |
| description | Specialized skill for AI talking head and lip-sync video generation. Use when you need presenter videos, UGC-style content, or lip-synced avatars. Triggers on: talking head, presenter video, lip sync, UGC video. Outputs professional talking head videos. |
AI Talking Head
Generate talking head videos, presenter content, and lip-synced videos.
Use this skill when: You need a person (real or AI) talking to camera. Route here from: ai-creative-workflow, ai-creative-strategist, or direct requests.
Why This Skill Exists
The problem: Talking head videos are among the most persuasive content formats, but:
- Recording yourself is time-consuming and requires confidence
- Professional presenters are expensive ($500-5000+ per video)
- UGC creators charge $100-500 per post and may not match your brand
- Iterating on scripts means re-filming everything
- Scaling personalized video is nearly impossible manually
The solution: AI talking heads that:
- Generate professional presenter videos in minutes
- Let you iterate on scripts without re-recording
- Create unlimited variants for A/B testing
- Maintain consistent brand presenter identity
- Scale personalized outreach cost-effectively
The game-changer: Combining avatar generation + lip-sync lets you:
- Create a consistent "brand spokesperson"
- Update any script without re-filming
- Test multiple presenter styles quickly
- Produce video content at 10x the speed
Presenter Style Exploration (Before Generation)
Critical insight from ai-creative-strategist: Don't generate with one style and hope it works. Explore genuinely DIFFERENT presenter styles first.
The Style Exploration Process
STEP 1: GENERATE 4-5 DIFFERENT PRESENTER STYLES
This is NOT: the same person in different clothes.
This IS: fundamentally different presenter archetypes, each telling a different story.
[YOUR BRAND] - Style Exploration
Generate presenter concepts for these 5 directions:
1. CORPORATE AUTHORITY
- Demographic: 35-50, professional appearance
- Setting: Modern office, corporate environment
- Wardrobe: Business professional, suit/blazer
- Energy: Confident, measured, authoritative
- Vibe: "Trust the expert"
2. RELATABLE FRIEND
- Demographic: 25-40, approachable look
- Setting: Home office, kitchen, casual space
- Wardrobe: Smart casual, comfortable
- Energy: Warm, conversational, genuine
- Vibe: "Let me share what worked for me"
3. ENERGETIC CREATOR
- Demographic: 22-35, creator aesthetic
- Setting: Ring light setup, content studio
- Wardrobe: Trendy casual, branded
- Energy: High, dynamic, enthusiastic
- Vibe: "You HAVE to try this"
4. EXPERT EDUCATOR
- Demographic: 30-55, credible appearance
- Setting: Study, library, professional backdrop
- Wardrobe: Smart casual, glasses optional
- Energy: Calm, explanatory, helpful
- Vibe: "Let me explain how this works"
5. LIFESTYLE ASPIRATIONAL
- Demographic: 28-45, aspirational look
- Setting: Beautiful home, travel location, luxury
- Wardrobe: Elevated casual, tasteful
- Energy: Relaxed confidence, success aura
- Vibe: "This is what my life looks like"
STEP 2: IDENTIFY WINNER
After generating the style exploration:
REVIEW each presenter style:
Which presenter:
- Best matches brand voice?
- Would audience trust most?
- Fits the content type?
- Has right energy level?
- Would work across multiple videos?
WINNER: [Selected style]
BECAUSE: [Why this style wins for this brand/use case]
STEP 3: EXTRACT PRESENTER PRINCIPLES
Once the winner is identified:
WINNING STYLE EXTRACTION
Demographics:
- Age range: [X-X]
- Gender: [if specific]
- Ethnicity: [if specific]
- Overall look: [descriptors]
Environment:
- Primary setting: [where they present from]
- Background elements: [what's visible]
- Lighting style: [natural/studio/mixed]
Wardrobe:
- Style: [formal/casual/etc.]
- Colors: [palette]
- Accessories: [if any]
Delivery:
- Energy level: [1-10]
- Speaking pace: [slow/medium/fast]
- Hand gestures: [minimal/moderate/expressive]
- Eye contact: [direct to camera always]
Audio:
- Voice tone: [warm/authoritative/energetic]
- Pacing: [conversational/punchy/measured]
STEP 4: APPLY ACROSS CONTENT
Use the extracted principles so that:
- All future videos stay consistent
- The same presenter builds brand recognition
- Scripts vary; the presenter does not
Presenter Archetype Deep Dives
Corporate Authority
When to use: B2B, financial services, healthcare, enterprise SaaS, professional services
Visual Formula:
[Man/Woman] in [30s-50s], [silver/dark hair], wearing [tailored blazer/suit],
in [modern glass office/conference room with city view], [warm professional lighting],
[confident composed expression], [seated at desk OR standing with slight lean],
[direct eye contact with camera], [subtle hand gestures], corporate executive style
Setting Options:
- Corner office with city view
- Modern conference room
- Executive desk with minimal decor
- Standing at presentation screen
- Seated in designer chair
Wardrobe Options:
- Tailored navy blazer over white shirt
- Grey suit, no tie (modern)
- Classic suit with subtle tie
- Blazer over turtleneck (thought leader)
- Professional dress (solid colors)
Energy Markers:
- Measured pace
- Deliberate movements
- Confident pauses
- Minimal but purposeful gestures
- Assured vocal tone
Relatable Friend (UGC Style)
When to use: DTC brands, consumer products, wellness, beauty, lifestyle
Visual Formula:
[Friendly man/woman], aged [25-40], wearing [casual but put-together outfit],
in [bright modern apartment/kitchen/home office], [natural window light],
[genuine warm smile], [relaxed comfortable posture], [talking to camera like
a friend], [natural hand movements], authentic UGC creator style
Setting Options:
- Bright kitchen counter
- Cozy living room couch
- Home office with plants
- Bedroom getting-ready setup
- Outdoor patio/balcony
Wardrobe Options:
- Cozy sweater/cardigan
- Simple t-shirt
- Casual button-down
- Loungewear (if brand appropriate)
- Athleisure
Energy Markers:
- Conversational rhythm
- Natural pauses ("honestly?", "okay so...")
- Expressive facial reactions
- Genuine enthusiasm without over-selling
- Relatable body language
UGC Script Patterns:
DISCOVERY: "Okay so I found this [product] and I'm obsessed..."
REVIEW: "So I've been using [product] for [time] and here's my honest take..."
COMPARISON: "I used to use [old product] but then I tried [new product]..."
TRANSFORMATION: "Before [product] I was [problem]. Now? [result]."
Energetic Creator
When to use: Gen-Z products, entertainment, gaming, trendy DTC, social apps
Visual Formula:
[Young energetic creator] in [22-35], [colorful trendy outfit], in [content
studio with ring light/neon lights], [bright dynamic lighting], [animated
expressions], [lots of movement and gestures], [high energy delivery],
[fast-paced enthusiastic style], YouTube/TikTok creator aesthetic
Setting Options:
- Ring light setup visible
- LED/neon accent lighting
- Streaming/gaming setup
- Colorful backdrop
- Outdoor action setting
Wardrobe Options:
- Graphic tees
- Bold colors
- Branded merch
- Trendy streetwear
- Statement accessories
Energy Markers:
- Fast-paced delivery
- Big expressions
- Lots of hand movement
- Pattern interrupts
- Enthusiasm at 10
Creator Script Patterns:
HOOK: "STOP scrolling. This is important."
REVEAL: "I literally just discovered [thing] and I'm freaking out."
CHALLENGE: "I bet you can't guess what [product] does."
REACTION: "[reaction to trying product]... WAIT what?!"
Expert Educator
When to use: Online courses, professional services, B2B explainers, tutorials
Visual Formula:
[Knowledgeable expert], aged [30-55], [smart casual or academic style],
in [home study/office with books/whiteboard], [balanced lighting],
[thoughtful composed expression], [explaining with purposeful gestures],
[patient instructive tone], educator/thought leader style
Setting Options:
- Study with bookshelves
- Office with credentials visible
- Whiteboard/screen behind
- Standing at presentation
- Desk with relevant props
Wardrobe Options:
- Button-down shirt
- Blazer over casual shirt
- Sweater over collared shirt
- Glasses (authority signal)
- Minimal accessories
Energy Markers:
- Patient pace
- Teaching rhythm
- Logical structure
- Illustrative gestures
- "Here's what matters" moments
Lifestyle Aspirational
When to use: Luxury brands, high-ticket services, aspirational DTC, travel, real estate
Visual Formula:
[Elegant successful person] in [30s-50s], [elevated casual attire],
in [beautiful interior/scenic location], [golden hour OR designer lighting],
[relaxed confident demeanor], [speaking with quiet confidence], [minimal
but graceful movement], aspirational lifestyle aesthetic
Setting Options:
- Designer living room
- Travel location (balcony view)
- Luxury car interior
- High-end restaurant/hotel
- Yacht/beach/resort
Wardrobe Options:
- Designer casual
- Linen/natural fabrics
- Neutral luxury palette
- Subtle jewelry/watch
- Effortlessly elegant
Energy Markers:
- Relaxed confidence
- No rushing
- "I have time" energy
- Subtle smile
- Quiet success vibes
Video Model Roster (Quality Winners)
Generate presenter videos with all three models and present the outputs for selection:
| Model | Owner | Speed | Strengths |
|---|---|---|---|
| Sora 2 | openai | ~80s | Excellent general quality, good faces |
| Veo 3.1 | google | ~130s | Native audio generation, natural movement |
| Kling v2.5 Turbo Pro | kwaivgi | ~155s | Best for people/motion, most realistic |
Strategy: Run the same prompt through all three models → the user picks the best output.
Model Selection Guide
FOR MAXIMUM REALISM (people quality):
→ Kling v2.5 Turbo Pro (best faces, most natural movement)
FOR SPEED + QUALITY BALANCE:
→ Sora 2 (fastest, still good quality)
FOR BUILT-IN AUDIO:
→ Veo 3.1 (generates audio with video)
FOR UGC AUTHENTICITY:
→ Kling v2.5 (handles casual movements well)
FOR CORPORATE/FORMAL:
→ Sora 2 or Kling v2.5 (cleaner, more controlled)
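The selection guide above boils down to a lookup table. A minimal Python sketch, assuming hypothetical goal keys such as `max_realism` (they are labels for this example, not part of any API). The model choices are copied from the guide, and with no stated goal it falls back to running all three, as the roster strategy recommends.

```python
from __future__ import annotations

# Hypothetical goal keys; the model choices mirror the selection guide above.
ALL_MODELS: tuple[str, ...] = ("Sora 2", "Veo 3.1", "Kling v2.5 Turbo Pro")

MODEL_GUIDE: dict[str, tuple[str, ...]] = {
    "max_realism": ("Kling v2.5 Turbo Pro",),
    "speed_quality_balance": ("Sora 2",),
    "built_in_audio": ("Veo 3.1",),
    "ugc_authenticity": ("Kling v2.5 Turbo Pro",),
    "corporate_formal": ("Sora 2", "Kling v2.5 Turbo Pro"),
}


def pick_models(goal: str | None = None) -> tuple[str, ...]:
    """Return the model(s) to run for a goal; with no goal, run all three."""
    if goal is None:
        return ALL_MODELS
    if goal not in MODEL_GUIDE:
        raise ValueError(f"unknown goal {goal!r}; expected one of {sorted(MODEL_GUIDE)}")
    return MODEL_GUIDE[goal]


print(pick_models("corporate_formal"))  # ('Sora 2', 'Kling v2.5 Turbo Pro')
```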
Lip-Sync Model
For adding speech to existing videos:
| Model | Use | Cost | Speed | Quality |
|---|---|---|---|---|
| Kling Lip-Sync | Add voiceover to any video | ~$0.20 | ~1min | Excellent |
When to use Lip-Sync:
- You have a great presenter video but need a different script
- Client wants to change messaging after video generation
- Creating personalized versions of same base video
- Adding voiceover to product demo videos
- Dubbing content for different languages
Use Cases Deep Dive
1. Lip-Sync Overlay
Best for: Adding voiceover to existing video, dubbing, personalization
Input Requirements:
- Video with visible face (front-facing works best)
- Audio file (MP3, WAV) OR text script
Workflow:
```json
{
  "model_owner": "kwaivgi",
  "model_name": "kling-lip-sync",
  "Prefer": "wait",
  "input": {
    "video": "https://... (source video URL)",
    "audio": "https://... (audio file URL)"
  }
}
```
Or with text (uses built-in TTS):
```json
{
  "model_owner": "kwaivgi",
  "model_name": "kling-lip-sync",
  "input": {
    "video": "https://... (source video URL)",
    "text": "Script text to speak"
  }
}
```
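Both variants share one rule: the call carries a video plus exactly one speech source. A small Python builder that enforces this, producing a dict in the same shape as the JSON examples above. The field names are copied from those examples; the function name and the `wait` flag are hypothetical, and the meaning of `"Prefer": "wait"` (block until the result is ready) is an assumption.

```python
from __future__ import annotations

from typing import Any


def build_lip_sync_request(
    video_url: str,
    audio_url: str | None = None,
    text: str | None = None,
    wait: bool = True,
) -> dict[str, Any]:
    """Assemble a Kling Lip-Sync call shaped like the JSON examples above.

    Exactly one speech source is allowed: audio_url (pre-recorded VO) or
    text (built-in TTS).
    """
    if (audio_url is None) == (text is None):
        raise ValueError("provide exactly one of audio_url or text")
    if text is not None and not text.strip():
        raise ValueError("text must not be blank")

    inputs: dict[str, Any] = {"video": video_url}
    if audio_url is not None:
        inputs["audio"] = audio_url
    else:
        inputs["text"] = text

    request: dict[str, Any] = {
        "model_owner": "kwaivgi",
        "model_name": "kling-lip-sync",
        "input": inputs,
    }
    if wait:
        # Mirrors the "Prefer": "wait" field in the first example (assumed:
        # block until the video is ready instead of polling).
        request["Prefer"] = "wait"
    return request
```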
Quality Tips:
- Source video should show the face at least 70% of the time
- Forward-facing shots work better than profiles
- Avoid videos with heavy face movement/turning
- Audio should be clear without background noise
- Script pacing should match natural speech
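The 70% face-visibility tip can be checked before spending a lip-sync run. A sketch assuming you already have one boolean per sampled frame from some face detector (which detector, and how often to sample, are left open here). The function names are hypothetical.

```python
from __future__ import annotations

from collections.abc import Sequence


def face_visible_ratio(face_flags: Sequence[bool]) -> float:
    """Fraction of sampled frames in which a face was detected."""
    if not face_flags:
        raise ValueError("need at least one sampled frame")
    return sum(face_flags) / len(face_flags)


def usable_for_lip_sync(face_flags: Sequence[bool], min_ratio: float = 0.7) -> bool:
    """Apply the 'face visible at least 70% of the time' rule of thumb."""
    return face_visible_ratio(face_flags) >= min_ratio
```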
2. AI Presenter Generation
Best for: Creating presenter content from scratch, brand spokesperson
Multi-Model Workflow:
Sora 2:
```json
{
  "model_owner": "openai",
  "model_name": "sora-2",
  "input": {
    "prompt": "[presenter prompt]",
    "aspect_ratio": "16:9",
    "duration": 5
  }
}
```
Veo 3.1 (with native audio):
```json
{
  "model_owner": "google",
  "model_name": "veo-3.1",
  "input": {
    "prompt": "[presenter prompt]",
    "aspect_ratio": "16:9",
    "generate_audio": true
  }
}
```
Kling v2.5:
```json
{
  "model_owner": "kwaivgi",
  "model_name": "kling-v2.5-turbo-pro",
  "input": {
    "prompt": "[presenter prompt]",
    "aspect_ratio": "16:9",
    "duration": 5
  }
}
```
Then add lip-sync if a specific script is needed:
```json
{
  "model_owner": "kwaivgi",
  "model_name": "kling-lip-sync",
  "input": {
    "video": "[generated video URL]",
    "text": "[script text]"
  }
}
```
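Because the three generation calls above differ only in owner, model name, and one or two extra inputs, they can be built from a single prompt in one place. A Python sketch that reproduces those payloads. The per-model extras (`duration` for Sora 2 and Kling, `generate_audio` for Veo 3.1) are copied from the examples, and the helper name is hypothetical.

```python
from __future__ import annotations

from typing import Any

# (owner, model name, model-specific inputs) copied from the three calls above.
PRESENTER_MODELS: tuple[tuple[str, str, dict[str, Any]], ...] = (
    ("openai", "sora-2", {"duration": 5}),
    ("google", "veo-3.1", {"generate_audio": True}),
    ("kwaivgi", "kling-v2.5-turbo-pro", {"duration": 5}),
)


def build_presenter_requests(prompt: str, aspect_ratio: str = "16:9") -> list[dict[str, Any]]:
    """Return one request per model, all sharing the same prompt and aspect ratio."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    requests = []
    for owner, name, extra in PRESENTER_MODELS:
        # Copy `extra` into a fresh dict so requests never share mutable state.
        inputs = {"prompt": prompt, "aspect_ratio": aspect_ratio, **extra}
        requests.append({"model_owner": owner, "model_name": name, "input": inputs})
    return requests
```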
3. UGC-Style Content
Best for: Authentic testimonials, product reviews, social proof
The UGC Formula:
[Relatable person] + [Casual setting] + [Natural lighting] +
[Authentic delivery] + [Genuine reaction] = Believable UGC
Prompt Template:
Friendly [demographic] sitting in [casual setting], natural window light,
holding/showing [product], genuine excited expression, talking directly to
camera like filming a selfie video, authentic UGC testimonial style, casual
comfortable body language, 5 seconds
UGC Authenticity Markers:
- Slightly imperfect framing
- Natural lighting (not studio)
- Casual wardrobe
- Real reactions, not posed
- Personal space as backdrop
- Eye contact with camera
4. Personal Brand Series
Best for: Thought leaders, course creators, coaches, consultants
Consistency Formula:
ESTABLISH ONCE, USE FOREVER:
- Same presenter appearance
- Same setting/background
- Same wardrobe style
- Same energy level
- Same lighting setup
Only change: Script and specific content
Series Prompt Template:
[Consistent presenter description - use same each time], [same setting],
[same lighting], [same wardrobe style], [same energy], discussing [new topic],
[consistent delivery style], 5 seconds
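"Establish once, use forever" maps naturally onto an immutable spec object. Only the topic is supplied per video, so the presenter, setting, lighting, wardrobe, energy, and delivery text cannot drift between episodes. A sketch using a frozen Python dataclass; the class and method names are hypothetical, and the field order follows the series template above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresenterSpec:
    """Presenter description fixed once for a whole series (frozen = read-only)."""

    presenter: str
    setting: str
    lighting: str
    wardrobe: str
    energy: str
    delivery: str

    def series_prompt(self, topic: str, seconds: int = 5) -> str:
        # Everything except `topic` comes from the frozen spec.
        return (
            f"{self.presenter}, {self.setting}, {self.lighting}, {self.wardrobe}, "
            f"{self.energy}, discussing {topic}, {self.delivery}, {seconds} seconds"
        )
```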
Script Mastery
Duration Calculation
| Word Count | Duration | Use Case |
|---|---|---|
| ~12 words | 5 seconds | Social hook |
| ~25 words | 10 seconds | Instagram Reel |
| ~37 words | 15 seconds | TikTok optimal |
| ~50 words | 20 seconds | Short testimonial |
| ~75 words | 30 seconds | Product explainer |
| ~150 words | 60 seconds | Full testimonial |
Rule: ~150 words per minute at natural conversational pace
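The rule works in both directions: words to seconds for checking a draft, and seconds to words for budgeting one. A minimal Python sketch at 150 words per minute. It counts whitespace-separated words and rounds budgets down so the read never runs long; the function names are hypothetical.

```python
WORDS_PER_MINUTE = 150  # conversational rule of thumb from the text above


def estimate_seconds(script: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Approximate spoken length of a script, counting whitespace-separated words."""
    return len(script.split()) * 60 / wpm


def word_budget(seconds: float, wpm: int = WORDS_PER_MINUTE) -> int:
    """Maximum words that fit in `seconds`, rounded down."""
    return int(seconds * wpm / 60)


print(word_budget(15))  # 37
print(estimate_seconds("Okay so I found this and I'm obsessed"))  # 3.2
```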
Script Structures
HOOK-VALUE-CTA (15-30 seconds):
Hook (0-3 sec): [Attention-grabber - question, statement, or pattern interrupt]
Value (3-20 sec): [Main message, benefit, or story]
CTA (20-30 sec): [Clear next step]
PROBLEM-AGITATE-SOLVE (30-60 seconds):
Problem (0-10 sec): [Name the pain point]
Agitate (10-30 sec): [Make them feel it]
Solve (30-60 sec): [Present the solution + CTA]
BEFORE-AFTER (15-30 seconds):
Before (0-10 sec): [Life before product/solution]
After (10-25 sec): [Transformation/result]
CTA (25-30 sec): [How to get same result]
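Each structure is a list of timed segments, so the same words-per-minute rule gives a word budget per segment. A Python sketch assuming 150 wpm and contiguous segments that start at 0 s. The names are hypothetical, and the example timings are the 30-second HOOK-VALUE-CTA split above.

```python
from __future__ import annotations

from typing import NamedTuple


class Segment(NamedTuple):
    name: str
    start_s: float
    end_s: float


def segment_word_budgets(segments: list[Segment], wpm: int = 150) -> dict[str, int]:
    """Convert timed script segments into per-segment word budgets (rounded down)."""
    budgets: dict[str, int] = {}
    expected_start = 0.0
    for seg in segments:
        # Segments must be contiguous, start at 0 s, and have positive length.
        if seg.start_s != expected_start or seg.end_s <= seg.start_s:
            raise ValueError(f"bad timing for segment {seg.name!r}")
        budgets[seg.name] = int((seg.end_s - seg.start_s) * wpm / 60)
        expected_start = seg.end_s
    return budgets


HOOK_VALUE_CTA_30S = [
    Segment("hook", 0, 3),
    Segment("value", 3, 20),
    Segment("cta", 20, 30),
]

print(segment_word_budgets(HOOK_VALUE_CTA_30S))  # {'hook': 7, 'value': 42, 'cta': 25}
```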
Tone Templates
Professional/Corporate:
"[Name] here with [Company]. Today I want to share how [product/insight]
can help you [achieve outcome]. Here's what you need to know..."
Casual/UGC:
"Okay so I've been using [product] for [time] and honestly? I'm obsessed.
Here's why [specific benefit]. If you [problem], you need this."
Expert/Educational:
"One thing I see people get wrong about [topic] is [misconception].
Here's what actually works: [insight]. Let me show you..."
Energetic/Sales:
"Stop what you're doing. [Product] just changed everything. I'm serious -
[result] in [timeframe]. You HAVE to try this."
Aspirational:
"[Casual opening]. I wanted to share something that's completely transformed
[area of life]. [Product] gave me [result]. Here's how it works..."
Platform-Specific Optimization
TikTok/Reels (9:16)
Specs:
- Aspect Ratio: 9:16 (vertical)
- Duration: 15-30 seconds optimal
- Safe Zone: Keep face and text within the central 60% of the frame
Style Adjustments:
→ Higher energy delivery
→ Faster pacing
→ Hook in first 1-2 seconds
→ Pattern interrupts
→ Jump cuts acceptable
→ Casual/authentic feel
Prompt Modifier:
...[base prompt], filmed vertically like TikTok/Reels content,
energetic creator style, direct eye contact with camera
YouTube (16:9)
Specs:
- Aspect Ratio: 16:9 (landscape)
- Duration: 30-120 seconds
- Safe Zone: Standard title-safe margins
Style Adjustments:
→ More measured pacing
→ Can be longer form
→ More professional setups accepted
→ Room for B-roll integration
→ Intro/outro structure
Prompt Modifier:
...[base prompt], widescreen YouTube style, professional yet engaging,
room for graphics/lower thirds
LinkedIn (1:1 or 16:9)
Specs:
- Aspect Ratio: 1:1 (square) or 16:9
- Duration: 30-60 seconds optimal
- Tone: Professional but personal
Style Adjustments:
→ Professional appearance
→ Business-appropriate setting
→ Thought leadership tone
→ Value-first messaging
→ Credibility signals
Prompt Modifier:
...[base prompt], professional LinkedIn style, credible expert appearance,
business casual in modern office environment
Instagram Stories (9:16)
Specs:
- Aspect Ratio: 9:16
- Duration: 15 seconds max per segment
- Ephemeral feel
Style Adjustments:
→ Casual, in-the-moment feel
→ Can be "rougher" quality
→ Direct audience address
→ Personal/behind-scenes vibe
→ Clear single message per story
Ads (Various)
Facebook/Instagram Ads:
- 1:1, 4:5, or 9:16
- 15-30 seconds optimal
- Hook in 0-3 seconds
- Clear CTA
YouTube Ads:
- 16:9
- 15-30 seconds (skippable) or 6 seconds (bumper)
- Brand visible throughout
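The platform specs above can double as a pre-flight check on a request: does the aspect ratio match, and does the script's estimated length land in the recommended window? A Python sketch. The platform keys are hypothetical, the ranges are copied from the sections above (they are "optimal" windows, not hard platform limits), and length is estimated at 150 words per minute.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformSpec:
    aspect_ratios: tuple[str, ...]
    min_s: float
    max_s: float


# Ranges copied from the platform notes above (recommended windows, not hard limits).
PLATFORMS: dict[str, PlatformSpec] = {
    "tiktok_reels": PlatformSpec(("9:16",), 15, 30),
    "youtube": PlatformSpec(("16:9",), 30, 120),
    "linkedin": PlatformSpec(("1:1", "16:9"), 30, 60),
    "instagram_stories": PlatformSpec(("9:16",), 0, 15),
    "meta_ads": PlatformSpec(("1:1", "4:5", "9:16"), 15, 30),
}


def check_fit(platform: str, aspect_ratio: str, script: str, wpm: int = 150) -> list[str]:
    """Return a list of problems; an empty list means the request fits the platform."""
    spec = PLATFORMS[platform]
    problems: list[str] = []
    if aspect_ratio not in spec.aspect_ratios:
        problems.append(f"{aspect_ratio} not in {spec.aspect_ratios}")
    seconds = len(script.split()) * 60 / wpm
    if not spec.min_s <= seconds <= spec.max_s:
        problems.append(f"~{seconds:.0f}s outside {spec.min_s}-{spec.max_s}s")
    return problems
```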
Audio & Voice Considerations
When Using Veo 3.1 Native Audio
Strengths:
- Generates synchronized audio with video
- Natural ambient sounds
- Speech that matches lip movement
- Good for establishing scenes
Limitations:
- Less control over specific script
- Audio quality varies
- May need post-processing
When Adding Lip-Sync
Best Practices:
- Use high-quality audio recording
- Match energy level to video presenter
- Pace script to natural speaking rhythm
- Allow for breath pauses
- Keep sentences short (easier sync)
Voice-Over Tips
If recording your own VO for lip-sync:
□ Record in quiet environment
□ Use consistent distance from mic
□ Match energy to presenter style
□ Natural pauses between sentences
□ Clear enunciation
□ Export as MP3 or WAV
If using TTS (text input):
□ Use punctuation for natural pauses
□ Write phonetically for tricky words
□ Keep sentences conversational length
□ Test different phrasings
□ Consider adding "..." for pauses
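"Keep sentences short" is easy to check automatically before sending text to TTS or lip-sync. A Python sketch that splits on `.`, `!` or `?` followed by whitespace and flags long sentences. The 20-word threshold is an assumption to tune per presenter style, not a figure from the models.

```python
import re

# Split after ., ! or ? followed by whitespace; "..." pauses also end a chunk.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def long_sentences(script: str, max_words: int = 20) -> list[str]:
    """Return sentences longer than max_words (assumed threshold)."""
    sentences = [s.strip() for s in _SENTENCE_END.split(script.strip()) if s.strip()]
    return [s for s in sentences if len(s.split()) > max_words]
```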
Execution Workflow
Step 1: Clarify Requirements
Before generating:
□ What's the use case? (UGC, corporate, educational, etc.)
□ What platform? (TikTok, YouTube, LinkedIn, ads)
□ What aspect ratio? (9:16, 16:9, 1:1)
□ What duration? (and word count)
□ What presenter style? (see archetypes)
□ What's the script/message?
□ Need lip-sync to specific audio?
Step 2: Style Selection
If not predefined:
□ Generate style exploration with 4-5 different presenter styles
□ Present options to user
□ Extract principles from winner
□ Document for consistency
Step 3: Construct Prompt
Use this formula:
[PRESENTER DESCRIPTION] + [SETTING] + [LIGHTING] +
[EXPRESSION/ENERGY] + [ACTION] + [STYLE MODIFIER] + [DURATION]
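The formula is an ordered join of seven parts. Making each part a named field turns omissions such as a missing lighting description (a common cause of bad lighting, per the issues table) into an error instead of a weak prompt. A Python sketch; the class and field names are hypothetical, and the order follows the formula above.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class PromptParts:
    presenter: str
    setting: str
    lighting: str
    expression_energy: str
    action: str
    style_modifier: str
    duration_s: int = 5

    def render(self) -> str:
        """Join the text parts in formula order, then append the duration."""
        names = [f.name for f in fields(self) if f.name != "duration_s"]
        blank = [n for n in names if not getattr(self, n).strip()]
        if blank:
            raise ValueError(f"missing prompt parts: {blank}")
        body = ", ".join(getattr(self, n).strip() for n in names)
        return f"{body}, {self.duration_s} seconds"
```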
Step 4: Multi-Model Generation
Run same prompt through:
1. Sora 2 (~80s)
2. Veo 3.1 (~130s)
3. Kling v2.5 (~155s)
Present all three to user for selection.
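Running the three models concurrently rather than one after another means the wait is roughly the slowest model (~155s) instead of the sum (~80 + 130 + 155 = ~365s). A Python sketch using `concurrent.futures`. The `generate(model_name, prompt) -> video_url` callable is a hypothetical stand-in for whatever client call your environment provides. A failure in one model is recorded rather than discarding the others.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

MODELS = ("sora-2", "veo-3.1", "kling-v2.5-turbo-pro")


def generate_all(
    prompt: str,
    generate: Callable[[str, str], str],
    models: tuple[str, ...] = MODELS,
) -> dict[str, dict[str, Any]]:
    """Run one prompt through every model at the same time."""

    def run(model: str) -> tuple[str, dict[str, Any]]:
        started = time.monotonic()
        try:
            url, error = generate(model, prompt), None
        except Exception as exc:  # one failed model should not hide the others
            url, error = None, str(exc)
        return model, {"url": url, "error": error, "seconds": time.monotonic() - started}

    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        return dict(pool.map(run, models))
```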
Step 5: Add Lip-Sync (If Needed)
If a specific script must be delivered:
1. User approves video from Step 4
2. Run through Kling Lip-Sync
3. Input: selected video + audio/text
4. Output: synced talking head
Step 6: Deliver & Iterate
```markdown
## Talking Head Video Options
**Style:** [Archetype used]
**Platform:** [Target platform]
**Duration:** [X seconds]
### Option 1: Sora 2
[video URL]
Notes: [quality assessment]
### Option 2: Veo 3.1 (with audio)
[video URL]
Notes: [quality assessment]
### Option 3: Kling v2.5
[video URL]
Notes: [quality assessment]
**Select preferred video for lip-sync or final delivery.**
```
Quality Checklist
Technical Quality
- Face clearly visible throughout
- No uncanny valley artifacts
- Consistent appearance (no morphing)
- Smooth natural movement
- Appropriate resolution for platform
Presenter Quality
- Matches intended archetype
- Expression appropriate for message
- Energy level fits content type
- Wardrobe matches brand/context
- Setting supports message
Lip-Sync Quality (if applicable)
- Mouth movement matches audio
- Natural speech rhythm
- No obvious desync
- Head movement doesn't break sync
- Audio quality clear
Content Quality
- Script delivered clearly
- Pacing appropriate for platform
- Hook captures attention
- Message comes through
- CTA clear (if applicable)
Common Issues & Solutions
| Issue | Cause | Solution |
|---|---|---|
| Uncanny valley feel | Model limitations | Use Kling v2.5 for most realistic faces |
| Face morphing mid-video | Long duration | Keep videos shorter (5-10 sec), extend with cuts |
| Lip-sync drift | Audio/video mismatch | Use shorter scripts, clear enunciation |
| Wrong energy level | Prompt too vague | Be explicit about energy: "calm" vs "enthusiastic" |
| Generic stock presenter | No specific direction | Add detailed demographic and style descriptors |
| Setting doesn't match | Prompt conflict | Prioritize setting description, remove conflicts |
| Awkward hand movement | Unspecified gestures | Add gesture direction or specify "minimal movement" |
| Bad lighting | Missing lighting prompt | Always include lighting: "warm natural light" |
| Doesn't look like brand | No style consistency | Create and use presenter spec document |
| Audio quality poor | TTS limitations | Use recorded audio instead of text input |
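"Keep videos shorter (5-10 sec), extend with cuts" implies splitting a longer script into clip-sized pieces. A Python sketch that greedily packs whole sentences into clips of at most 10 seconds at 150 words per minute, one generation or lip-sync pass per clip. A single sentence longer than the budget still gets its own over-long clip, so such sentences are better rewritten. The function name is hypothetical.

```python
from __future__ import annotations

import re

_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_into_clips(script: str, max_seconds: float = 10, wpm: int = 150) -> list[str]:
    """Greedily pack whole sentences into clips of at most max_seconds of speech."""
    max_words = int(max_seconds * wpm / 60)
    sentences = [s for s in _SENTENCE_END.split(script.strip()) if s]
    clips: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new clip if this sentence would push the current one over budget.
        if current and count + n > max_words:
            clips.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        clips.append(" ".join(current))
    return clips
```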
Output Format
Style Exploration Output
```markdown
## Presenter Style Exploration
**Brand/Project:** [Name]
**Use Case:** [What videos will be used for]
### Style 1: Corporate Authority
[video URL or generation]
- Demographic: [specifics]
- Setting: [description]
- Energy: [level]
### Style 2: Relatable Friend
[video URL or generation]
- Demographic: [specifics]
- Setting: [description]
- Energy: [level]
[...continue for all 5 styles...]
**Recommendation:** Style [X] best fits because [reasons]
**Feedback needed:** Which direction resonates?
```
Generated Video Output
```markdown
## Talking Head Video Generated
**Style:** [Archetype]
**Platform:** [Target]
**Duration:** [X seconds]
### Model Outputs:
**Sora 2:** [URL]
**Veo 3.1:** [URL] (includes audio)
**Kling v2.5:** [URL]
**Prompt Used:**
> [full prompt for reference]
**Next Steps:**
- [ ] Select preferred video
- [ ] Add lip-sync to specific script (if needed)
- [ ] Request variation
- [ ] Approve for use
```
Lip-Sync Output
```markdown
## Lip-Sync Video Delivered
**Source Video:** [URL]
**Script:** "[excerpt...]"
**Duration:** [X seconds]
**Final Video:** [URL]
**Quality Check:**
- ✓ Sync accuracy
- ✓ Natural rhythm
- ✓ Audio clarity
- ✓ Expression match
**Options:**
- [ ] Approve and use
- [ ] Adjust script and resync
- [ ] Try different source video
```
Pipeline Integration
TALKING HEAD PIPELINE
```
┌─────────────────────────────────────────┐
│ Request arrives (direct or routed)      │
│ → Clarify: platform, duration, style    │
│ → Determine: generation vs lip-sync     │
└─────────────────────────────────────────┘
                     │
          ┌──────────┴──────────┐
          ▼                     ▼
┌──────────────────┐   ┌──────────────────┐
│ Style Undefined  │   │ Style Defined    │
│ → Run style      │   │ → Skip to        │
│   exploration    │   │   generation     │
└──────────────────┘   └──────────────────┘
          │                     │
          └──────────┬──────────┘
                     ▼
┌─────────────────────────────────────────┐
│ ai-talking-head (THIS SKILL)            │
│ → Multi-model generation                │
│ → Present options                       │
│ → Add lip-sync if needed                │
│ → Quality check                         │
└─────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────┐
│ Delivery                                │
│ → Platform-optimized output             │
│ → Ready for ads/social/content          │
└─────────────────────────────────────────┘
```
Handoff Protocols
Receiving from ai-creative-workflow
Receive:
```yaml
use_case: "talking head" | "UGC" | "presenter" | "lip-sync"
platform: "[target platform]"
aspect_ratio: "[ratio]"
duration: "[seconds]"
style: "[archetype or custom]"
script: "[text]"
audio_url: "[if lip-sync with audio]"
video_url: "[if lip-sync to existing]"
```
Returning to Workflow
Return:
```yaml
status: "complete" | "needs_selection" | "needs_iteration"
deliverables:
  - video_url: "[URL]"
    model: "[which model]"
    has_audio: true | false
    duration: "[seconds]"
feedback_needed: "[any questions]"
```
Receiving Video from ai-product-video
Receive for lip-sync:
```yaml
video_url: "[product video URL]"
aspect_ratio: "[ratio]"
script: "[voiceover text]"
audio_url: "[optional, if pre-recorded]"
```
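An incoming handoff can be validated before any generation starts, so a lip-sync request without a source video is bounced back immediately rather than failing mid-pipeline. A Python sketch mirroring the "Receive" fields above. The class name and validation rules are assumptions drawn from the field comments: lip-sync to an existing video needs `video_url`, plus either a script or an `audio_url`.

```python
from __future__ import annotations

from dataclasses import dataclass

USE_CASES = {"talking head", "UGC", "presenter", "lip-sync"}


@dataclass
class HandoffRequest:
    use_case: str
    platform: str
    aspect_ratio: str
    duration: str
    style: str
    script: str
    audio_url: str | None = None
    video_url: str | None = None

    def problems(self) -> list[str]:
        """Return validation problems; an empty list means the request can proceed."""
        issues: list[str] = []
        if self.use_case not in USE_CASES:
            issues.append(f"unknown use_case {self.use_case!r}")
        if self.use_case == "lip-sync":
            if not self.video_url:
                issues.append("lip-sync needs video_url (the existing video)")
            if not (self.script.strip() or self.audio_url):
                issues.append("lip-sync needs a script or audio_url")
        return issues
```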
Tips from Experience
What Works
- Consistency beats variety — Same presenter across videos builds recognition
- Kling v2.5 for faces — Most realistic human generation
- Shorter is safer — 5-10 second clips avoid quality degradation
- Explicit energy levels — "calm and measured" vs "enthusiastic and dynamic"
- Multi-model approach — Always generate with 2-3 models, let user pick
- Lip-sync extends value — One good video can become many scripts
What Doesn't Work
- Vague presenter description — "A person talking" = generic results
- Long continuous takes — Quality degrades after 10-15 seconds
- Ignoring setting — Presenter without context looks artificial
- Skipping style exploration — First idea rarely best for brand
- Mismatched energy — Corporate script + UGC style = awkward
- Complex movements — Walking + talking + gesturing = artifacts
The 80/20
80% of talking head success comes from:
- Clear presenter archetype selection
- Matching energy to platform
- Short, punchy scripts
- Using Kling v2.5 for realism
Get these four right, and you'll get good results.
Quick Reference
| Task | Model | Process |
|---|---|---|
| Generate presenter video | All 3 models | Multi-model, user picks |
| Add speech to existing video | Kling Lip-Sync | Direct, ~1min |
| Presenter + specific script | Generate → Lip-Sync | Two-step |
| Video with built-in audio | Veo 3.1 | Single generation |
| Most realistic face | Kling v2.5 | Single or multi-model |
| Fastest generation | Sora 2 | Single generation |
| UGC style | Kling v2.5 | Handles casual movement best |