| name | arch-v |
| description | Video production workflow orchestrator for Veo 3. Guides users through creating professional video prompts via two paths - direct text-to-video OR image-to-video pipeline (Imagen 3/4 → Veo 3). Validates prompt completeness, checks conflicts, ensures all mandatory components present. Integrates camera-movements, great-prompt-anatomy, short-prompt-guide, long-prompt-guide, and imagine skills. |
ARCH-V: Video Production Orchestrator
Professional video prompt creation for Veo 3 with two production paths.
Overview
ARCH-V guides you through creating production-ready video prompts by:
- Determining optimal workflow path
- Loading appropriate reference skills
- Validating mandatory components
- Checking for conflicts
- Delivering final validated prompts
Two Production Paths
Path 1: Text-to-Video (Direct Veo 3)
When to use: You have clear vision for entire video including motion, audio, and can describe it in text.
Output: Single Veo 3 prompt ready for text-to-video generation
Skills used:
- great-prompt-anatomy (8 mandatory components)
- camera-movements (standardized vocabulary)
- short-prompt-guide OR long-prompt-guide (based on complexity)
Path 2: Image-to-Video (Imagen → Veo 3)
When to use: You want precise control over starting visual, or have specific still image composition in mind before adding motion.
Output: Two prompts
- Imagen 3/4 prompt for still image generation
- Veo 3 image-to-video prompt for animation/motion
Skills used:
- Stage A (Imagen): imagine skill + great-prompt-anatomy (visual components)
- Stage B (Veo 3): camera-movements + great-prompt-anatomy (motion components)
Workflow Process
Stage 0: Path Determination
ARCH-V asks:
Which production path?
1. Text-to-Video (Direct Veo 3)
- Describe entire video in single prompt
- Faster workflow
- Less control over initial composition
2. Image-to-Video (Imagen → Veo 3 pipeline)
- Create perfect still image first
- Then add motion and animation
- Maximum control over visual composition
- Two-step process
User chooses path → ARCH-V loads appropriate skills and guides accordingly
Path 1 Workflow: Text-to-Video
Stage 1: Prompt Type Determination
ARCH-V asks:
What type of video prompt?
SHORT PROMPT (for):
- Filler shots, B-roll
- Atmospheric scenes
- Quick establishing shots
- <3 sentences to describe
LONG PROMPT (for):
- Dialogue scenes
- Character continuity
- Multi-beat sequences (>3 beats)
- Complex choreography
User chooses → ARCH-V loads short-prompt-guide OR long-prompt-guide
Stage 2: Mandatory Components Check
ARCH-V validates all 8 components from great-prompt-anatomy:
Checklist:
-
- Subject (who/what in shot)
-
- Setting (where/when)
-
- Action (what's happening)
-
- Style/Genre (aesthetic)
-
- Camera/Composition (shot size, angle, movement)
-
- Lighting/Mood (light sources, emotional tone)
-
- Audio (dialogue, ambience, music)
-
- Constraints (prohibitions, exact requirements)
If camera movement mentioned: Load camera-movements for vocabulary validation
Stage 3: Validation & Conflict Checking
ARCH-V checks for:
Time/Weather Conflicts:
- ❌ "Golden hour" with "midnight"
- ❌ "Harsh noon sun" with "soft evening light"
- ✅ Consistent time of day throughout
Camera Movement Conflicts:
- ❌ "Dolly in while arc left" (multiple movements per beat)
- ✅ One movement per beat from standardized vocabulary
Spatial Coherence (if long prompt):
- ✅ FG/MG/BG structure defined
- ✅ Color anchors consistent (3-5 colors repeated)
- ✅ Continuity rules explicit
Stage 4: Output
If all valid:
✅ PROMPT READY
[Final Veo 3 text-to-video prompt displayed]
Ready to use in Veo 3!
If validation fails:
⚠️ PROMPT-LOCKED
Missing/Conflicting:
- [specific issue 1]
- [specific issue 2]
Suggested fixes:
- [actionable fix 1]
- [actionable fix 2]
Would you like me to help resolve these?
Path 2 Workflow: Image-to-Video
Stage A: Imagen Prompt Creation
ARCH-V loads: imagine skill for Imagen 3/4 structure
Focus on VISUAL components from great-prompt-anatomy:
Stage A Checklist:
-
- Subject (detailed visual description)
-
- Setting (environment, placement)
-
- Style/Genre (photographic or artistic style)
-
- Camera/Composition (shot size, angle - STATIC, no movement)
-
- Lighting/Mood (light sources, color palette)
-
- Constraints (visual prohibitions)
NOT included in Stage A:
- ❌ Action (no motion in still image)
- ❌ Audio (images have no sound)
- ❌ Camera movements (static composition)
Imagine-specific guidance:
- Subject-Context-Style framework
- Technical photography specs (lens, lighting quality)
- Art style references (Science SARU by default)
- Natural language verbose descriptions
- Token limit: 480 tokens
Stage A Output:
✅ IMAGEN PROMPT READY
[Detailed Imagen 3/4 prompt for still image]
Generate this image in Imagen 3/4 first.
Once you have the image, proceed to Stage B.
Stage B: Veo 3 Motion Prompt Creation
ARCH-V asks:
You now have your still image. Let's add motion!
What type of motion complexity?
SHORT MOTION (for):
- Simple camera movement
- Atmospheric animation
- Single motion element
LONG MOTION (for):
- Complex choreography
- Multiple action beats
- Character animation with timing
User chooses → ARCH-V loads appropriate prompt guide
Stage B Checklist (MOTION components):
-
- Action (what motion/animation occurs)
-
- Camera/Composition (movement from static image)
-
- Audio (sound design for video)
Additional from Stage A (maintained):
- ✅ Subject (already defined in image)
- ✅ Setting (already defined in image)
- ✅ Style (must match image aesthetic)
- ✅ Lighting (must match image lighting)
Camera movements: Load camera-movements for vocabulary
Validation checks:
- Motion must be achievable from static image
- Camera movement must respect image composition
- Action must fit subject/setting from Stage A
- Audio must match visual style
Stage B Output:
✅ VEO 3 IMAGE-TO-VIDEO PROMPT READY
[Veo 3 prompt referencing your Imagen-generated image]
Use this prompt in Veo 3 with your generated image!
Integration Pattern
Skills Cross-Reference
great-prompt-anatomy (8 components):
- Used in ALL workflows
- Path 1: All 8 components
- Path 2A (Imagen): Components 1,2,4,5,6,8 (visual only)
- Path 2B (Veo 3 motion): Components 3,5,7 + reference to image
camera-movements:
- Path 1: For camera movement specification
- Path 2A: NOT used (static image)
- Path 2B: CRITICAL (motion from static)
short/long-prompt-guide:
- Path 1: Guide entire prompt structure
- Path 2B: Guide motion/animation structure
imagine:
- Path 1: NOT used
- Path 2A: PRIMARY guide for Imagen structure
Workflow Decision Tree
User: "I want to create a video"
↓
ARCH-V: "Text-to-Video or Image-to-Video?"
↓
Path 1: Text-to-Video
↓
"Short or Long prompt?"
↓
Load: great-prompt-anatomy + camera-movements + [short/long]-guide
↓
Validate 8 components + conflicts
↓
Output: Veo 3 prompt
Path 2: Image-to-Video
↓
STAGE A: "Create still image"
↓
Load: imagine + great-prompt-anatomy (visual components)
↓
Validate visual components
↓
Output: Imagen prompt
↓
User generates image
↓
STAGE B: "Add motion to image"
↓
"Short or Long motion?"
↓
Load: camera-movements + great-prompt-anatomy (motion components)
↓
Validate motion feasibility + conflicts
↓
Output: Veo 3 image-to-video prompt
Validation Rules
Universal Validations (All Paths)
Mandatory Component Presence:
- All required components for chosen path present
- No empty placeholders
- Specific details provided (not vague)
Camera Movement:
- ONE movement per beat/timestamp
- Uses standardized vocabulary from camera-movements skill
- Movement appropriate for shot type
Style Consistency:
- Style/aesthetic maintained throughout
- Color palette specified (3-5 color anchors)
- Lighting approach consistent
Path 1 Specific Validations
Time/Weather Continuity:
- Single time of day throughout (unless intentional transition)
- Weather consistent (no sudden rain → sun)
- Lighting quality matches time/weather
Audio Appropriateness:
- Dialogue formatted correctly (
Character: "Text") - Ambient sounds match environment
- Music style fits mood
Path 2 Specific Validations
Stage A (Imagen) Validations:
- No motion words (running, flying, moving) - image is static
- No audio descriptions - images silent
- Technical photography specs appropriate
- Token count under 480
Stage B (Veo 3 motion) Validations:
- Motion achievable from static starting point
- Camera movement respects image composition
- Action fits subject capabilities
- Audio matches visual aesthetic from Stage A
Error Messages & Fixes
Common Issues
Issue: Missing Components
⚠️ PROMPT-LOCKED
Missing mandatory components:
- Audio not specified
- Lighting/Mood undefined
Fix: Add audio description (dialogue/ambience/music)
Fix: Specify light sources and mood
Issue: Camera Movement Conflict
⚠️ PROMPT-LOCKED
Conflict detected:
- "0-4s: Dolly in while panning left"
Fix: Choose ONE movement per beat
- Option A: "0-4s: Dolly in"
- Option B: "0-4s: Pan left"
Issue: Time/Weather Conflict
⚠️ PROMPT-LOCKED
Conflict detected:
- Setting: "Golden hour sunset"
- Lighting: "Harsh midday sun"
Fix: Make lighting consistent with time
- "Golden hour: warm, low-angle sun, soft shadows"
Issue: Path 2A - Motion in Static Image
⚠️ IMAGEN PROMPT LOCKED
Error: Motion detected in static image prompt
- "person running across field"
Fix: Describe static composition
- "person mid-stride in running pose"
Then add motion in Stage B (Veo 3)
Example Workflows
Example 1: Path 1 Short Prompt
User: "I want a coffee shop morning scene"
ARCH-V: Path 1 (Text-to-Video) → Short Prompt
Guides through:
- Subject: Barista pouring latte art
- Setting: Morning café, 8:30 AM
- Action: Pours latte, heart shape forms
- Style: Cinematic B-roll, warm grade
- Camera: Overhead, slow tilt down
- Lighting: Natural window light, warm
- Audio: Espresso machine, milk steaming
- Constraints: (none specified)
Validates: ✅ All 8 components present
Output:
Format & style: Cinematic B-roll, warm color grade
Barista pours latte art, heart shape forming in foam
Morning café, 8:30 AM, soft window light
Overhead, slow tilt down
Natural window key, warm practical fills
Espresso machine hiss, milk steaming, gentle café chatter
Example 2: Path 1 Long Prompt
User: "Romantic rain scene with couple under umbrella"
ARCH-V: Path 1 (Text-to-Video) → Long Prompt (has dialogue)
Loads: long-prompt-guide Production Brief framework
Guides through all 11 blocks, validates:
- FG/MG/BG structure ✅
- Color anchors (5 colors) ✅
- Camera beats (3 beats, ONE movement each) ✅
- Dialogue format ✅
- Continuity rules ✅
Output: Complete Production Brief (see long-prompt-guide examples)
Example 3: Path 2 Image-to-Video
User: "I want precise control over Jakarta street scene, then animate it"
ARCH-V: Path 2 (Image-to-Video)
STAGE A - Imagen:
Subject: Young street vendor, simplified Science SARU style
Setting: Dense Jakarta street, golden hour
Style: Science SARU animation aesthetic
Camera: Wide establishing, low angle (STATIC)
Lighting: Golden hour warm, dramatic shadows
Constraints: No motion, frozen moment
[Validates: No motion words, visual only]
✅ IMAGEN PROMPT:
"Science SARU animation style aesthetic. Young Indonesian street
vendor character with simplified design, elastic limbs, large
expressive eyes, worn earth tone clothing. Dense layered Jakarta
street scene at golden hour, geometric shophouses, vibrant warm
palette transitioning to orange-pink sunset gradient, dramatic
long shadows, wet reflective surfaces. Wide establishing shot,
low angle, dramatic composition, flat color blocks with gradients,
painterly watercolor texture, melancholic atmosphere."
User generates image in Imagen 3/4
STAGE B - Veo 3 Motion:
[Image uploaded to Veo 3]
Action: Vendor turns head slowly toward camera, slight smile
Camera: Slow dolly in from wide to medium close-up
Audio: Street ambience, distant traffic, vendor humming melody
[Validates: Motion achievable, camera respects composition]
✅ VEO 3 IMAGE-TO-VIDEO PROMPT:
"From this still image: Vendor turns head slowly toward camera,
slight smile forming. Slow dolly in from wide establishing to
medium close-up over 8 seconds. Jakarta street ambience with
distant traffic, motorbike sounds, vendor humming traditional
Indonesian melody. Maintain golden hour lighting, warm atmosphere,
and Science SARU aesthetic from image."
Tips for Using ARCH-V
Choose the right path:
- Path 1 faster for straightforward videos
- Path 2 better for complex visual compositions or when you need perfect still first
Be specific early:
- Detailed subject descriptions help ARCH-V guide you better
- Vague inputs → more back-and-forth questions
Trust the validation:
- PROMPT-LOCKED means real conflicts exist
- Suggested fixes based on proven patterns
- Resolve conflicts before generating
Use skill references:
- ARCH-V will load appropriate skills automatically
- You can reference them directly for inspiration
- Cross-skill integration is intentional
Iterate progressively:
- Start with minimum viable (4 blocks for long prompts)
- Add detail as needed
- ARCH-V guides what's optional vs mandatory
Technical Notes
Token Efficiency:
- ARCH-V itself: ~2,000 tokens
- Loads reference skills on-demand only
- Total system: ~8,600 tokens vs ~15,000 monolithic
- 43% token savings with full capability
Skill Integration:
- All 5 reference skills work together seamlessly
- Cross-references validated automatically
- No duplication between skills
Progressive Disclosure:
- ARCH-V SKILL.md loaded when user asks for video prompts
- Reference skills loaded based on path/choices
- Optimal token usage throughout workflow
Quick Start
For beginners: Let ARCH-V guide you with questions
For experienced users: Specify path and prompt type upfront:
- "Text-to-video short prompt for product shot"
- "Image-to-video, create Imagen prompt first for portrait"
For complex projects: Use Path 2 with long motion prompts for maximum control
ARCH-V adapts to your workflow preference!