Claude Code Plugins

Community-maintained marketplace

Install Skill

1. Download skill
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Verify the skill by reading through its instructions before using it.

SKILL.md

name: ai-talking-head
description: Specialized skill for AI talking head and lip-sync video generation. Use when you need presenter videos, UGC-style content, or lip-synced avatars. Triggers on: talking head, presenter video, lip sync, UGC video. Outputs professional talking head videos.

AI Talking Head

Generate talking head videos, presenter content, and lip-synced videos.

Use this skill when: You need a person (real or AI) talking to camera.

Route here from: ai-creative-workflow, ai-creative-strategist, or direct requests.


Why This Skill Exists

The problem: Talking head videos are among the most persuasive content formats, but:

  1. Recording yourself is time-consuming and requires confidence
  2. Professional presenters are expensive ($500-5000+ per video)
  3. UGC creators charge $100-500 per post and may not match your brand
  4. Iterating on scripts means re-filming everything
  5. Scaling personalized video is nearly impossible manually

The solution: AI talking heads that:

  • Generate professional presenter videos in minutes
  • Let you iterate on scripts without re-recording
  • Create unlimited variants for A/B testing
  • Maintain consistent brand presenter identity
  • Scale personalized outreach cost-effectively

The game-changer: Combining avatar generation + lip-sync lets you:

  • Create a consistent "brand spokesperson"
  • Update any script without re-filming
  • Test multiple presenter styles quickly
  • Produce video content at 10x the speed

Presenter Style Exploration (Before Generation)

Critical insight from ai-creative-strategist: Don't generate with one style and hope it works. Explore genuinely DIFFERENT presenter styles first.

The Style Exploration Process

STEP 1: GENERATE 4-5 DIFFERENT PRESENTER STYLES

This is NOT: Same person with different clothes
This IS: Fundamentally different presenter archetypes that each tell a different story

[YOUR BRAND] - Style Exploration

Generate presenter concepts for these 5 directions:

1. CORPORATE AUTHORITY
   - Demographic: 35-50, professional appearance
   - Setting: Modern office, corporate environment
   - Wardrobe: Business professional, suit/blazer
   - Energy: Confident, measured, authoritative
   - Vibe: "Trust the expert"

2. RELATABLE FRIEND
   - Demographic: 25-40, approachable look
   - Setting: Home office, kitchen, casual space
   - Wardrobe: Smart casual, comfortable
   - Energy: Warm, conversational, genuine
   - Vibe: "Let me share what worked for me"

3. ENERGETIC CREATOR
   - Demographic: 22-35, creator aesthetic
   - Setting: Ring light setup, content studio
   - Wardrobe: Trendy casual, branded
   - Energy: High, dynamic, enthusiastic
   - Vibe: "You HAVE to try this"

4. EXPERT EDUCATOR
   - Demographic: 30-55, credible appearance
   - Setting: Study, library, professional backdrop
   - Wardrobe: Smart casual, glasses optional
   - Energy: Calm, explanatory, helpful
   - Vibe: "Let me explain how this works"

5. LIFESTYLE ASPIRATIONAL
   - Demographic: 28-45, aspirational look
   - Setting: Beautiful home, travel location, luxury
   - Wardrobe: Elevated casual, tasteful
   - Energy: Relaxed confidence, success aura
   - Vibe: "This is what my life looks like"

STEP 2: IDENTIFY WINNER

After generating style exploration:

REVIEW each presenter style:

Which presenter:
- Best matches brand voice?
- Would audience trust most?
- Fits the content type?
- Has right energy level?
- Would work across multiple videos?

WINNER: [Selected style]
BECAUSE: [Why this style wins for this brand/use case]

STEP 3: EXTRACT PRESENTER PRINCIPLES

Once winner identified:

WINNING STYLE EXTRACTION

Demographics:
- Age range: [X-X]
- Gender: [if specific]
- Ethnicity: [if specific]
- Overall look: [descriptors]

Environment:
- Primary setting: [where they present from]
- Background elements: [what's visible]
- Lighting style: [natural/studio/mixed]

Wardrobe:
- Style: [formal/casual/etc.]
- Colors: [palette]
- Accessories: [if any]

Delivery:
- Energy level: [1-10]
- Speaking pace: [slow/medium/fast]
- Hand gestures: [minimal/moderate/expressive]
- Eye contact: [direct to camera always]

Audio:
- Voice tone: [warm/authoritative/energetic]
- Pacing: [conversational/punchy/measured]

STEP 4: APPLY ACROSS CONTENT

Apply the extracted principles to all future videos:

  • Keep every video consistent with the extracted spec
  • Same presenter = brand recognition
  • Vary the script, not the presenter

Presenter Archetype Deep Dives

Corporate Authority

When to use: B2B, financial services, healthcare, enterprise SaaS, professional services

Visual Formula:

[Man/Woman] in [30s-50s], [silver/dark hair], wearing [tailored blazer/suit],
in [modern glass office/conference room with city view], [warm professional lighting],
[confident composed expression], [seated at desk OR standing with slight lean],
[direct eye contact with camera], [subtle hand gestures], corporate executive style

Setting Options:

  • Corner office with city view
  • Modern conference room
  • Executive desk with minimal decor
  • Standing at presentation screen
  • Seated in designer chair

Wardrobe Options:

  • Tailored navy blazer over white shirt
  • Grey suit, no tie (modern)
  • Classic suit with subtle tie
  • Blazer over turtleneck (thought leader)
  • Professional dress (solid colors)

Energy Markers:

  • Measured pace
  • Deliberate movements
  • Confident pauses
  • Minimal but purposeful gestures
  • Assured vocal tone

Relatable Friend (UGC Style)

When to use: DTC brands, consumer products, wellness, beauty, lifestyle

Visual Formula:

[Friendly man/woman] in [25-40], wearing [casual but put-together outfit],
in [bright modern apartment/kitchen/home office], [natural window light],
[genuine warm smile], [relaxed comfortable posture], [talking to camera like
a friend], [natural hand movements], authentic UGC creator style

Setting Options:

  • Bright kitchen counter
  • Cozy living room couch
  • Home office with plants
  • Bedroom getting-ready setup
  • Outdoor patio/balcony

Wardrobe Options:

  • Cozy sweater/cardigan
  • Simple t-shirt
  • Casual button-down
  • Loungewear (if brand appropriate)
  • Athleisure

Energy Markers:

  • Conversational rhythm
  • Natural pauses ("honestly?", "okay so...")
  • Expressive facial reactions
  • Genuine enthusiasm without over-selling
  • Relatable body language

UGC Script Patterns:

DISCOVERY: "Okay so I found this [product] and I'm obsessed..."
REVIEW: "So I've been using [product] for [time] and here's my honest take..."
COMPARISON: "I used to use [old product] but then I tried [new product]..."
TRANSFORMATION: "Before [product] I was [problem]. Now? [result]."

Energetic Creator

When to use: Gen-Z products, entertainment, gaming, trendy DTC, social apps

Visual Formula:

[Young energetic creator] in [22-35], [colorful trendy outfit], in [content
studio with ring light/neon lights], [bright dynamic lighting], [animated
expressions], [lots of movement and gestures], [high energy delivery],
[fast-paced enthusiastic style], YouTube/TikTok creator aesthetic

Setting Options:

  • Ring light setup visible
  • LED/neon accent lighting
  • Streaming/gaming setup
  • Colorful backdrop
  • Outdoor action setting

Wardrobe Options:

  • Graphic tees
  • Bold colors
  • Branded merch
  • Trendy streetwear
  • Statement accessories

Energy Markers:

  • Fast-paced delivery
  • Big expressions
  • Lots of hand movement
  • Pattern interrupts
  • Enthusiasm at 10

Creator Script Patterns:

HOOK: "STOP scrolling. This is important."
REVEAL: "I literally just discovered [thing] and I'm freaking out."
CHALLENGE: "I bet you can't guess what [product] does."
REACTION: "[reaction to trying product]... WAIT what?!"

Expert Educator

When to use: Online courses, professional services, B2B explainers, tutorials

Visual Formula:

[Knowledgeable expert] in [30-55], [smart casual or academic style],
in [home study/office with books/whiteboard], [balanced lighting],
[thoughtful composed expression], [explaining with purposeful gestures],
[patient instructive tone], educator/thought leader style

Setting Options:

  • Study with bookshelves
  • Office with credentials visible
  • Whiteboard/screen behind
  • Standing at presentation
  • Desk with relevant props

Wardrobe Options:

  • Button-down shirt
  • Blazer over casual shirt
  • Sweater over collared shirt
  • Glasses (authority signal)
  • Minimal accessories

Energy Markers:

  • Patient pace
  • Teaching rhythm
  • Logical structure
  • Illustrative gestures
  • "Here's what matters" moments

Lifestyle Aspirational

When to use: Luxury brands, high-ticket services, aspirational DTC, travel, real estate

Visual Formula:

[Elegant successful person] in [30s-50s], [elevated casual attire],
in [beautiful interior/scenic location], [golden hour OR designer lighting],
[relaxed confident demeanor], [speaking with quiet confidence], [minimal
but graceful movement], aspirational lifestyle aesthetic

Setting Options:

  • Designer living room
  • Travel location (balcony view)
  • Luxury car interior
  • High-end restaurant/hotel
  • Yacht/beach/resort

Wardrobe Options:

  • Designer casual
  • Linen/natural fabrics
  • Neutral luxury palette
  • Subtle jewelry/watch
  • Effortlessly elegant

Energy Markers:

  • Relaxed confidence
  • No rushing
  • "I have time" energy
  • Subtle smile
  • Quiet success vibes

Video Model Roster (Quality Winners)

Generate presenter videos with ALL THREE models, then present the outputs for selection:

| Model | Owner | Speed | Strengths |
|---|---|---|---|
| Sora 2 | openai | ~80s | Excellent general quality, good faces |
| Veo 3.1 | google | ~130s | Native audio generation, natural movement |
| Kling v2.5 Turbo Pro | kwaivgi | ~155s | Best for people/motion, most realistic |

Strategy: Run same prompt through all 3 models → User picks best output.

Model Selection Guide

FOR MAXIMUM REALISM (people quality):
    → Kling v2.5 Turbo Pro (best faces, most natural movement)

FOR SPEED + QUALITY BALANCE:
    → Sora 2 (fastest, still good quality)

FOR BUILT-IN AUDIO:
    → Veo 3.1 (generates audio with video)

FOR UGC AUTHENTICITY:
    → Kling v2.5 (handles casual movements well)

FOR CORPORATE/FORMAL:
    → Sora 2 or Kling v2.5 (cleaner, more controlled)
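
The selection guide above can be written as a small lookup. This is only a sketch: the priority keys and the `pick_models` helper are hypothetical names chosen for illustration, while the model names come from the roster.

```python
# Hypothetical lookup mirroring the selection guide above.
# Keys are illustrative labels, not part of any API.
MODEL_GUIDE = {
    "realism": ["Kling v2.5 Turbo Pro"],
    "speed": ["Sora 2"],
    "built_in_audio": ["Veo 3.1"],
    "ugc": ["Kling v2.5 Turbo Pro"],
    "corporate": ["Sora 2", "Kling v2.5 Turbo Pro"],
}

ALL_MODELS = ["Sora 2", "Veo 3.1", "Kling v2.5 Turbo Pro"]


def pick_models(priority=None):
    """Return the models to try first for a priority.

    With no priority (or an unknown one), fall back to the default
    strategy of running all three models.
    """
    return list(MODEL_GUIDE.get(priority, ALL_MODELS))


print(pick_models("built_in_audio"))  # ['Veo 3.1']
print(pick_models())                  # all three models
```

Falling back to all three models keeps the default multi-model strategy intact when no single priority dominates.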

Lip-Sync Model

For adding speech to existing videos:

| Model | Use | Cost | Speed | Quality |
|---|---|---|---|---|
| Kling Lip-Sync | Add voiceover to any video | ~$0.20 | ~1min | Excellent |

When to use Lip-Sync:

  • You have a great presenter video but need a different script
  • The client wants to change messaging after video generation
  • You are creating personalized versions of the same base video
  • You are adding voiceover to product demo videos
  • You are dubbing content into different languages

Use Cases Deep Dive

1. Lip-Sync Overlay

Best for: Adding voiceover to existing video, dubbing, personalization

Input Requirements:

  • Video with visible face (front-facing works best)
  • Audio file (MP3, WAV) OR text script

Workflow:

{
  "model_owner": "kwaivgi",
  "model_name": "kling-lip-sync",
  "Prefer": "wait",
  "input": {
    "video": "https://... (source video URL)",
    "audio": "https://... (audio file URL)"
  }
}

Or with text (uses built-in TTS):

{
  "model_owner": "kwaivgi",
  "model_name": "kling-lip-sync",
  "input": {
    "video": "https://... (source video URL)",
    "text": "Script text to speak"
  }
}

Quality Tips:

  • Source video should have face visible 70%+ of time
  • Forward-facing shots work better than profiles
  • Avoid videos with heavy face movement/turning
  • Audio should be clear without background noise
  • Script pacing should match natural speech
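
A minimal sketch of choosing between the audio and text variants of the lip-sync request. The `lip_sync_payload` helper and the example.com URL are hypothetical; the field names are copied from the JSON above, and the `"Prefer": "wait"` entry is left out because how it is sent depends on the calling tool.

```python
import json


def lip_sync_payload(video_url, audio_url=None, text=None):
    """Build a lip-sync request body shaped like the JSON above.

    Exactly one of audio_url or text must be given: recorded audio,
    or a script for the built-in TTS.
    """
    if (audio_url is None) == (text is None):
        raise ValueError("provide exactly one of audio_url or text")
    inputs = {"video": video_url}
    if audio_url is not None:
        inputs["audio"] = audio_url
    else:
        inputs["text"] = text
    return {
        "model_owner": "kwaivgi",
        "model_name": "kling-lip-sync",
        "input": inputs,
    }


# Placeholder URL for illustration only.
payload = lip_sync_payload("https://example.com/presenter.mp4", text="Hi there.")
print(json.dumps(payload, indent=2))
```

Rejecting requests that carry both or neither input makes it explicit which voice source the sync will use.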

2. AI Presenter Generation

Best for: Creating presenter content from scratch, brand spokesperson

Multi-Model Workflow:

// Sora 2
{
  "model_owner": "openai",
  "model_name": "sora-2",
  "input": {
    "prompt": "[presenter prompt]",
    "aspect_ratio": "16:9",
    "duration": 5
  }
}

// Veo 3.1 (with native audio)
{
  "model_owner": "google",
  "model_name": "veo-3.1",
  "input": {
    "prompt": "[presenter prompt]",
    "aspect_ratio": "16:9",
    "generate_audio": true
  }
}

// Kling v2.5
{
  "model_owner": "kwaivgi",
  "model_name": "kling-v2.5-turbo-pro",
  "input": {
    "prompt": "[presenter prompt]",
    "aspect_ratio": "16:9",
    "duration": 5
  }
}

Then add lip-sync if specific script needed:

{
  "model_owner": "kwaivgi",
  "model_name": "kling-lip-sync",
  "input": {
    "video": "[generated video URL]",
    "text": "[script text]"
  }
}
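
The three generation payloads differ only in owner, model name, and one extra input field, so they can be built from a single prompt. The sketch below assumes nothing beyond the fields shown above; `generation_payloads` is a hypothetical helper name.

```python
# (owner, model name, extra input fields) taken from the payloads above.
MODELS = [
    ("openai", "sora-2", {"duration": 5}),
    ("google", "veo-3.1", {"generate_audio": True}),
    ("kwaivgi", "kling-v2.5-turbo-pro", {"duration": 5}),
]


def generation_payloads(prompt, aspect_ratio="16:9"):
    """Return one request body per model, all sharing the same prompt."""
    payloads = []
    for owner, name, extras in MODELS:
        inputs = {"prompt": prompt, "aspect_ratio": aspect_ratio, **extras}
        payloads.append(
            {"model_owner": owner, "model_name": name, "input": inputs}
        )
    return payloads


for body in generation_payloads("Friendly presenter in a bright kitchen"):
    print(body["model_name"], body["input"])
```

Building all three bodies from one prompt string guarantees the models are compared on identical input.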

3. UGC-Style Content

Best for: Authentic testimonials, product reviews, social proof

The UGC Formula:

[Relatable person] + [Casual setting] + [Natural lighting] +
[Authentic delivery] + [Genuine reaction] = Believable UGC

Prompt Template:

Friendly [demographic] sitting in [casual setting], natural window light,
holding/showing [product], genuine excited expression, talking directly to
camera like filming a selfie video, authentic UGC testimonial style, casual
comfortable body language, 5 seconds

UGC Authenticity Markers:

  • Slightly imperfect framing
  • Natural lighting (not studio)
  • Casual wardrobe
  • Real reactions, not posed
  • Personal space as backdrop
  • Eye contact with camera

4. Personal Brand Series

Best for: Thought leaders, course creators, coaches, consultants

Consistency Formula:

ESTABLISH ONCE, USE FOREVER:
- Same presenter appearance
- Same setting/background
- Same wardrobe style
- Same energy level
- Same lighting setup

Only change: Script and specific content

Series Prompt Template:

[Consistent presenter description - use same each time], [same setting],
[same lighting], [same wardrobe style], [same energy], discussing [new topic],
[consistent delivery style], 5 seconds

Script Mastery

Duration Calculation

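| Word Count | Duration | Use Case |
|---|---|---|
| 15 words | ~5 seconds | Social hook |
| 30 words | ~10 seconds | Instagram Reel |
| 45 words | ~15 seconds | TikTok optimal |
| 60 words | ~20 seconds | Short testimonial |
| 90 words | ~30 seconds | Product explainer |
| 150 words | ~60 seconds | Full testimonial |

Rule: ~150 words per minute (2.5 words per second) at a natural conversational pace. The 5-30 second rows assume a brisker ~3 words per second, so trim a few words for unhurried delivery.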
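
The 150 words-per-minute rule can be turned into a quick estimator. A minimal sketch, assuming words are counted by splitting on whitespace; both function names are hypothetical.

```python
WORDS_PER_MINUTE = 150  # the rule of thumb stated above


def estimate_seconds(script, wpm=WORDS_PER_MINUTE):
    """Estimate spoken duration from a whitespace word count."""
    words = len(script.split())
    return words * 60 / wpm


def max_words(seconds, wpm=WORDS_PER_MINUTE):
    """Largest whole word count that fits in the given duration."""
    return int(seconds * wpm / 60)


script = "Okay so I found this and I'm obsessed. Here's why you need it."
print(round(estimate_seconds(script), 1))  # 5.2 (13 words)
print(max_words(60))                       # 150
```

Pass a higher `wpm` for fast-paced creator delivery or a lower one for measured corporate delivery.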

Script Structures

HOOK-VALUE-CTA (15-30 seconds):

Hook (0-3 sec): [Attention-grabber - question, statement, or pattern interrupt]
Value (3-20 sec): [Main message, benefit, or story]
CTA (20-30 sec): [Clear next step]

PROBLEM-AGITATE-SOLVE (30-60 seconds):

Problem (0-10 sec): [Name the pain point]
Agitate (10-30 sec): [Make them feel it]
Solve (30-60 sec): [Present the solution + CTA]

BEFORE-AFTER (15-30 seconds):

Before (0-10 sec): [Life before product/solution]
After (10-25 sec): [Transformation/result]
CTA (25-30 sec): [How to get same result]
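
Each structure's time windows translate directly into word budgets per segment. The sketch below assumes the 150 words-per-minute pace from the duration rule and uses the HOOK-VALUE-CTA windows; `word_budget` is a hypothetical helper.

```python
WPM = 150  # conversational pace assumed throughout this section

# Segment windows in seconds, copied from the HOOK-VALUE-CTA structure.
HOOK_VALUE_CTA = [("Hook", 0, 3), ("Value", 3, 20), ("CTA", 20, 30)]


def word_budget(structure, wpm=WPM):
    """Map each segment name to the words that fit in its time window."""
    return {
        name: int((end - start) * wpm / 60)
        for name, start, end in structure
    }


print(word_budget(HOOK_VALUE_CTA))
# {'Hook': 7, 'Value': 42, 'CTA': 25}
```

The same function works for the other two structures by swapping in their segment windows.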

Tone Templates

Professional/Corporate:

"[Name] here with [Company]. Today I want to share how [product/insight]
can help you [achieve outcome]. Here's what you need to know..."

Casual/UGC:

"Okay so I've been using [product] for [time] and honestly? I'm obsessed.
Here's why [specific benefit]. If you [problem], you need this."

Expert/Educational:

"One thing I see people get wrong about [topic] is [misconception].
Here's what actually works: [insight]. Let me show you..."

Energetic/Sales:

"Stop what you're doing. [Product] just changed everything. I'm serious -
[result] in [timeframe]. You HAVE to try this."

Aspirational:

"[Casual opening]. I wanted to share something that's completely transformed
[area of life]. [Product] gave me [result]. Here's how it works..."

Platform-Specific Optimization

TikTok/Reels (9:16)

Specs:

  • Aspect Ratio: 9:16 (vertical)
  • Duration: 15-30 seconds optimal
  • Safe Zone: Keep face/text center 60%

Style Adjustments:

→ Higher energy delivery
→ Faster pacing
→ Hook in first 1-2 seconds
→ Pattern interrupts
→ Jump cuts acceptable
→ Casual/authentic feel

Prompt Modifier:

...[base prompt], filmed vertically like TikTok/Reels content,
energetic creator style, direct eye contact with camera

YouTube (16:9)

Specs:

  • Aspect Ratio: 16:9 (landscape)
  • Duration: 30-120 seconds
  • Safe Zone: Standard letterbox

Style Adjustments:

→ More measured pacing
→ Can be longer form
→ More professional setups accepted
→ Room for B-roll integration
→ Intro/outro structure

Prompt Modifier:

...[base prompt], widescreen YouTube style, professional yet engaging,
room for graphics/lower thirds

LinkedIn (1:1 or 16:9)

Specs:

  • Aspect Ratio: 1:1 (square) or 16:9
  • Duration: 30-60 seconds optimal
  • Tone: Professional but personal

Style Adjustments:

→ Professional appearance
→ Business-appropriate setting
→ Thought leadership tone
→ Value-first messaging
→ Credibility signals

Prompt Modifier:

...[base prompt], professional LinkedIn style, credible expert appearance,
business casual in modern office environment

Instagram Stories (9:16)

Specs:

  • Aspect Ratio: 9:16
  • Duration: 15 seconds max per segment
  • Ephemeral feel

Style Adjustments:

→ Casual, in-the-moment feel
→ Can be "rougher" quality
→ Direct audience address
→ Personal/behind-scenes vibe
→ Clear single message per story

Ads (Various)

Facebook/Instagram Ads:

  • 1:1, 4:5, or 9:16
  • 15-30 second optimal
  • Hook in 0-3 seconds
  • Clear CTA

YouTube Ads:

  • 16:9
  • 15-30 second (skippable) or 6 second (bumper)
  • Brand visible throughout
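
The organic platform specs above can be collected into one table and checked before generation. This is a sketch: the dictionary keys and `check_fit` are hypothetical names, and the duration ranges are the recommended values listed above, not hard platform limits.

```python
# Specs copied from the platform sections above; durations in seconds.
# The dictionary keys are illustrative labels.
PLATFORM_SPECS = {
    "tiktok_reels": {"ratios": ["9:16"], "min_s": 15, "max_s": 30},
    "youtube": {"ratios": ["16:9"], "min_s": 30, "max_s": 120},
    "linkedin": {"ratios": ["1:1", "16:9"], "min_s": 30, "max_s": 60},
    "instagram_stories": {"ratios": ["9:16"], "min_s": 0, "max_s": 15},
}


def check_fit(platform, aspect_ratio, seconds):
    """Return a list of problems; an empty list means the video fits."""
    spec = PLATFORM_SPECS[platform]
    problems = []
    if aspect_ratio not in spec["ratios"]:
        problems.append(f"aspect ratio {aspect_ratio} not in {spec['ratios']}")
    if not spec["min_s"] <= seconds <= spec["max_s"]:
        problems.append(
            f"duration {seconds}s outside {spec['min_s']}-{spec['max_s']}s"
        )
    return problems


print(check_fit("tiktok_reels", "9:16", 20))  # []
print(check_fit("youtube", "9:16", 10))
```

Returning a list of problems rather than a boolean lets the caller report every mismatch at once.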

Audio & Voice Considerations

When Using Veo 3.1 Native Audio

Strengths:

  • Generates synchronized audio with video
  • Natural ambient sounds
  • Speech that matches lip movement
  • Good for establishing scenes

Limitations:

  • Less control over specific script
  • Audio quality varies
  • May need post-processing

When Adding Lip-Sync

Best Practices:

  • Use high-quality audio recording
  • Match energy level to video presenter
  • Pace script to natural speaking rhythm
  • Allow for breath pauses
  • Keep sentences short (easier sync)

Voice-Over Tips

If recording your own VO for lip-sync:

□ Record in quiet environment
□ Use consistent distance from mic
□ Match energy to presenter style
□ Natural pauses between sentences
□ Clear enunciation
□ Export as MP3 or WAV

If using TTS (text input):

□ Use punctuation for natural pauses
□ Write phonetically for tricky words
□ Keep sentences conversational length
□ Test different phrasings
□ Consider adding "..." for pauses

Execution Workflow

Step 1: Clarify Requirements

Before generating:

□ What's the use case? (UGC, corporate, educational, etc.)
□ What platform? (TikTok, YouTube, LinkedIn, ads)
□ What aspect ratio? (9:16, 16:9, 1:1)
□ What duration? (and word count)
□ What presenter style? (see archetypes)
□ What's the script/message?
□ Need lip-sync to specific audio?

Step 2: Style Selection

If not predefined:

□ Generate style exploration with 4-5 different presenter styles
□ Present options to user
□ Extract principles from winner
□ Document for consistency

Step 3: Construct Prompt

Use this formula:

[PRESENTER DESCRIPTION] + [SETTING] + [LIGHTING] +
[EXPRESSION/ENERGY] + [ACTION] + [STYLE MODIFIER] + [DURATION]
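
The formula amounts to joining the components in a fixed order. A minimal sketch, assuming comma-separated phrasing as in the archetype templates; `build_prompt` and the sample values are illustrative.

```python
def build_prompt(presenter, setting, lighting, expression, action,
                 style, seconds=5):
    """Join the formula's components into one comma-separated prompt."""
    parts = [presenter, setting, lighting, expression, action, style,
             f"{seconds} seconds"]
    # Drop empty components so a missing piece does not leave ", ,".
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    presenter="Friendly woman in her 30s wearing a cozy sweater",
    setting="sitting at a bright kitchen counter",
    lighting="natural window light",
    expression="genuine warm smile",
    action="talking directly to camera",
    style="authentic UGC creator style",
)
print(prompt)
```

Keeping the order fixed makes prompts comparable across runs, which helps when only one component changes between variants.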

Step 4: Multi-Model Generation

Run same prompt through:
1. Sora 2 (~80s)
2. Veo 3.1 (~130s)
3. Kling v2.5 (~155s)

Present all three to user for selection.

Step 5: Add Lip-Sync (If Needed)

If specific script delivery required:

1. User approves video from Step 4
2. Run through Kling Lip-Sync
3. Input: selected video + audio/text
4. Output: synced talking head

Step 6: Deliver & Iterate

## Talking Head Video Options

**Style:** [Archetype used]
**Platform:** [Target platform]
**Duration:** [X seconds]

### Option 1: Sora 2
[video URL]
Notes: [quality assessment]

### Option 2: Veo 3.1 (with audio)
[video URL]
Notes: [quality assessment]

### Option 3: Kling v2.5
[video URL]
Notes: [quality assessment]

**Select preferred video for lip-sync or final delivery.**

Quality Checklist

Technical Quality

  • Face clearly visible throughout
  • No uncanny valley artifacts
  • Consistent appearance (no morphing)
  • Smooth natural movement
  • Appropriate resolution for platform

Presenter Quality

  • Matches intended archetype
  • Expression appropriate for message
  • Energy level fits content type
  • Wardrobe matches brand/context
  • Setting supports message

Lip-Sync Quality (if applicable)

  • Mouth movement matches audio
  • Natural speech rhythm
  • No obvious desync
  • Head movement doesn't break sync
  • Audio quality clear

Content Quality

  • Script delivered clearly
  • Pacing appropriate for platform
  • Hook captures attention
  • Message comes through
  • CTA clear (if applicable)

Common Issues & Solutions

| Issue | Cause | Solution |
|---|---|---|
| Uncanny valley feel | Model limitations | Use Kling v2.5 for most realistic faces |
| Face morphing mid-video | Long duration | Keep videos shorter (5-10 sec), extend with cuts |
| Lip-sync drift | Audio/video mismatch | Use shorter scripts, clear enunciation |
| Wrong energy level | Prompt too vague | Be explicit about energy: "calm" vs "enthusiastic" |
| Generic stock presenter | No specific direction | Add detailed demographic and style descriptors |
| Setting doesn't match | Prompt conflict | Prioritize setting description, remove conflicts |
| Awkward hand movement | Unspecified gestures | Add gesture direction or specify "minimal movement" |
| Bad lighting | Missing lighting prompt | Always include lighting: "warm natural light" |
| Doesn't look like brand | No style consistency | Create and use presenter spec document |
| Audio quality poor | TTS limitations | Use recorded audio instead of text input |

Output Format

Style Exploration Output

## Presenter Style Exploration

**Brand/Project:** [Name]
**Use Case:** [What videos will be used for]

### Style 1: Corporate Authority
[video URL or generation]
- Demographic: [specifics]
- Setting: [description]
- Energy: [level]

### Style 2: Relatable Friend
[video URL or generation]
- Demographic: [specifics]
- Setting: [description]
- Energy: [level]

[...continue for all 5 styles...]

**Recommendation:** Style [X] best fits because [reasons]
**Feedback needed:** Which direction resonates?

Generated Video Output

## Talking Head Video Generated

**Style:** [Archetype]
**Platform:** [Target]
**Duration:** [X seconds]

### Model Outputs:

**Sora 2:** [URL]
**Veo 3.1:** [URL] (includes audio)
**Kling v2.5:** [URL]

**Prompt Used:**
> [full prompt for reference]

**Next Steps:**
- [ ] Select preferred video
- [ ] Add lip-sync to specific script (if needed)
- [ ] Request variation
- [ ] Approve for use

Lip-Sync Output

## Lip-Sync Video Delivered

**Source Video:** [URL]
**Script:** "[excerpt...]"
**Duration:** [X seconds]

**Final Video:** [URL]

**Quality Check:**
- ✓ Sync accuracy
- ✓ Natural rhythm
- ✓ Audio clarity
- ✓ Expression match

**Options:**
- [ ] Approve and use
- [ ] Adjust script and resync
- [ ] Try different source video

Pipeline Integration

TALKING HEAD PIPELINE

┌─────────────────────────────────────────┐
│  Request arrives (direct or routed)     │
│  → Clarify: platform, duration, style   │
│  → Determine: generation vs lip-sync    │
└─────────────────────────────────────────┘
                    │
        ┌───────────┴───────────┐
        ▼                       ▼
┌──────────────────┐   ┌──────────────────┐
│  Style Undefined │   │  Style Defined   │
│  → Run style     │   │  → Skip to       │
│    exploration   │   │    generation    │
└──────────────────┘   └──────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  ai-talking-head (THIS SKILL)           │
│  → Multi-model generation               │
│  → Present options                      │
│  → Add lip-sync if needed               │
│  → Quality check                        │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  Delivery                               │
│  → Platform-optimized output            │
│  → Ready for ads/social/content         │
└─────────────────────────────────────────┘
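
The branching in the diagram can be sketched as a function that lists the stages for a request. The stage labels and the `plan_steps` signature are hypothetical; treating an existing video as a lip-sync-only request is an assumption based on the "generation vs lip-sync" decision in the first box.

```python
def plan_steps(style_defined, video_url=None, needs_script=False):
    """List pipeline stages for a request, following the diagram above.

    video_url: an existing video to lip-sync (skips generation).
    needs_script: whether a specific script must be spoken.
    """
    if video_url is not None:
        # Lip-sync-only request: no style work or generation needed.
        return ["lip_sync", "quality_check", "deliver"]
    steps = []
    if not style_defined:
        steps.append("style_exploration")
    steps += ["multi_model_generation", "present_options"]
    if needs_script:
        steps.append("lip_sync")
    steps += ["quality_check", "deliver"]
    return steps


print(plan_steps(style_defined=False, needs_script=True))
```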

Handoff Protocols

Receiving from ai-creative-workflow

Receive:
  use_case: "talking head" | "UGC" | "presenter" | "lip-sync"
  platform: "[target platform]"
  aspect_ratio: "[ratio]"
  duration: "[seconds]"
  style: "[archetype or custom]"
  script: "[text]"
  audio_url: "[if lip-sync with audio]"
  video_url: "[if lip-sync to existing]"

Returning to Workflow

Return:
  status: "complete" | "needs_selection" | "needs_iteration"
  deliverables:
    - video_url: "[URL]"
      model: "[which model]"
      has_audio: true | false
      duration: "[seconds]"
  feedback_needed: "[any questions]"
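
A sketch of validating the incoming handoff and assembling the return payload as plain dictionaries. The allowed values come from the two blocks above; the function names are hypothetical, and the rule that a lip-sync request needs a video plus either audio or a script is an assumption inferred from the field comments.

```python
ALLOWED_USE_CASES = {"talking head", "UGC", "presenter", "lip-sync"}
ALLOWED_STATUSES = {"complete", "needs_selection", "needs_iteration"}


def validate_request(req):
    """Return a list of problems with an incoming handoff dict."""
    problems = []
    if req.get("use_case") not in ALLOWED_USE_CASES:
        problems.append("unknown use_case")
    if req.get("use_case") == "lip-sync":
        # Assumed requirement: a source video plus audio or a script.
        if not req.get("video_url"):
            problems.append("lip-sync needs video_url")
        if not (req.get("audio_url") or req.get("script")):
            problems.append("lip-sync needs audio_url or script")
    return problems


def build_response(status, deliverables, feedback_needed=""):
    """Assemble the return payload in the shape shown above."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {status}")
    return {
        "status": status,
        "deliverables": deliverables,
        "feedback_needed": feedback_needed,
    }


print(validate_request({"use_case": "lip-sync", "script": "Hello"}))
# ['lip-sync needs video_url']
```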

Receiving Video from ai-product-video

Receive for lip-sync:
  video_url: "[product video URL]"
  aspect_ratio: "[ratio]"
  script: "[voiceover text]"
  audio_url: "[optional, if pre-recorded]"

Tips from Experience

What Works

  1. Consistency beats variety — Same presenter across videos builds recognition
  2. Kling v2.5 for faces — Most realistic human generation
  3. Shorter is safer — 5-10 second clips avoid quality degradation
  4. Explicit energy levels — "calm and measured" vs "enthusiastic and dynamic"
  5. Multi-model approach — Always generate with 2-3 models, let user pick
  6. Lip-sync extends value — One good video can become many scripts

What Doesn't Work

  1. Vague presenter description — "A person talking" = generic results
  2. Long continuous takes — Quality degrades after 10-15 seconds
  3. Ignoring setting — Presenter without context looks artificial
  4. Skipping style exploration — First idea rarely best for brand
  5. Mismatched energy — Corporate script + UGC style = awkward
  6. Complex movements — Walking + talking + gesturing = artifacts

The 80/20

80% of talking head success comes from:

  1. Clear presenter archetype selection
  2. Matching energy to platform
  3. Short, punchy scripts
  4. Using Kling v2.5 for realism

Get these four right, and you'll get good results.


Quick Reference

| Task | Model | Process |
|---|---|---|
| Generate presenter video | All 3 models | Multi-model, user picks |
| Add speech to existing video | Kling Lip-Sync | Direct, ~1min |
| Presenter + specific script | Generate → Lip-Sync | Two-step |
| Video with built-in audio | Veo 3.1 | Single generation |
| Most realistic face | Kling v2.5 | Single or multi-model |
| Fastest generation | Sora 2 | Single generation |
| UGC style | Kling v2.5 | Handles casual movement best |