| name | image-generator |
| description | Visual generation converges toward accepting the first output ("looks good enough") and following technical specifications rigidly. This produces generic aesthetics and leaves Gemini 3's reasoning capabilities unused. This skill provides a multi-turn reasoning-partnership methodology with professional quality standards. |
Image Generator Skill
Context & Problem
Visual generation converges toward accepting the first output ("looks good enough") and following technical specifications rigidly. This produces generic aesthetics and leaves Gemini 3's reasoning capabilities unused.
This skill provides a multi-turn reasoning-partnership methodology with professional quality standards.
Core Principles
- Reasoning mode over prediction mode - Creative briefs (Story/Intent/Metaphor) activate reasoning; technical specs don't
- Multi-turn partnership - Teach Gemini your standards through principle-based feedback
- Professional quality gates - Explicit pass/fail criteria (99% spelling, not "check spelling")
- Autonomous agency - Batch mode without permission-asking between visuals
Dimensional Guidance
Input: Professional Creative Briefs (NOT Technical Specs)
Receive briefs in this format from visual-asset-workflow v5.0:
## The Story
[Narrative about what's visualized]
## Emotional Intent
[What it should FEEL like]
## Visual Metaphor
[Universal concept for instant comprehension]
## Subject / Composition / Action / Location / Style / Camera / Lighting
[Official Gemini 3 prompt structure]
## Color Semantics
Blue (#2563eb) = Authority (teaches governance)
Green (#10b981) = Execution (teaches work)
## Typography Hierarchy
Largest: Key insight (information importance drives sizing)
Medium: Supporting components
Smallest: Context
## Pedagogical Reasoning
[Why these choices serve teaching]
DO NOT convert briefs back into pixel specs; use them AS-IS to activate reasoning.
Principle: Creative briefs activate Gemini's reasoning about HOW to achieve intent visually
Workflow: Browser-Based Generation (Playwright MCP)
CRITICAL: Use gemini.google.com ONLY (NOT Google AI Studio, NOT other image generators)
Initialize once:
- Navigate to https://gemini.google.com/ (Playwright MCP)
- The user signs in to their Google account manually in the browser (never store or type credentials from this file), skipping any optional prompts that appear. The session persists.
- Click "Tools" in Gemini chat.
- Select "🍌 Create Image" tool (Nano Banana Pro).
For EACH visual:
- Type creative brief directly into Gemini chat textbox (use condensed format in batch mode - see Token Conservation below)
- Press Enter to submit
- Wait 30-35 seconds for generation
- Open the image, then right-click it and choose "Open image in new tab"
- Download the image from the new tab.
- Wait 3-5 seconds for download completion
- Verify quality IMMEDIATELY (6 gates below)
- If gates fail: Continue in same chat with principle-based feedback (max 3 iterations)
- If gates pass: Copy from `./.playwright-mcp/Gemini-Generated-Image-*.png` to `./static/img/chapter-{NN}/{filename}.png`
- Embed in lesson (Step 8.5 below)
- Start NEW CHAT for next visual (prevents context contamination)
Principle: New chat per visual prevents cross-contamination; immediate verification catches issues early; immediate embedding prevents orphans
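The copy step can be scripted. The sketch below is a minimal Python version that assumes Playwright saves downloads into `.playwright-mcp/` using the `Gemini-Generated-Image-*.png` pattern above, that the most recently modified match is the image that just passed the gates, and that `{NN}` is a zero-padded chapter number. `copy_generated_image` is a hypothetical helper, not part of Playwright MCP or any other tool.

```python
import shutil
from pathlib import Path


def copy_generated_image(download_dir: Path, static_root: Path,
                         chapter: int, filename: str) -> Path:
    """Copy the newest Gemini download to static/img/chapter-{NN}/{filename}.png.

    Hypothetical helper: assumes the newest matching file is the one that
    just passed all quality gates.
    """
    candidates = sorted(download_dir.glob("Gemini-Generated-Image-*.png"),
                        key=lambda p: p.stat().st_mtime)
    if not candidates:
        raise FileNotFoundError(f"no generated images in {download_dir}")
    # Assumption: {NN} is the chapter number zero-padded to two digits.
    target_dir = static_root / "img" / f"chapter-{chapter:02d}"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{filename}.png"
    shutil.copy2(candidates[-1], target)  # copy2 also preserves timestamps
    return target
```

For example, `copy_generated_image(Path(".playwright-mcp"), Path("static"), 3, "agent-layers")` would write `static/img/chapter-03/agent-layers.png` (the filename here is illustrative).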
Quality Gates: 6-Gate Professional Standard
ALL six must pass before the image is copied to its destination:
Gate 1: Spelling Accuracy (99% standard)
- ✅ Company names correct (Y Combinator, not Y-Combinator)
- ✅ Technical terms correct (Kubernetes not Kubernete)
- ✅ Zero typos in visible text
- ❌ FAIL if even ONE spelling error → Iterate
Gate 2: Layout Precision
- ✅ Proportions match prompt (2×2 grid, not 3×1)
- ✅ Alignment clean (no misaligned elements)
- ✅ Spacing consistent (equal gaps)
- ✅ Hierarchy clear (largest = most important)
- ❌ FAIL if layout deviates → Iterate
Gate 3: Color Accuracy
- ✅ Brand colors match spec (#001f3f not #002050)
- ✅ Semantic colors correct (blue=authority, green=execution)
- ✅ Contrast meets WCAG AA 4.5:1 for normal text (accessibility)
- ❌ FAIL if colors significantly off → Iterate
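Gate 3's contrast requirement can be checked numerically rather than by eye. The sketch below implements the WCAG 2.x relative-luminance and contrast-ratio formulas in Python. It assumes you already know the foreground and background hex colors (for example, from the brief's color semantics); it does not extract colors from the generated image itself.

```python
def _channel(c: int) -> float:
    """Linearize one sRGB channel (0-255) per the WCAG 2.x luminance formula."""
    s = c / 255
    return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """(L1 + 0.05) / (L2 + 0.05), with L1 the lighter color; ranges 1 to 21."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)),
                             reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_wcag_aa(fg: str, bg: str, threshold: float = 4.5) -> bool:
    # 4.5:1 is the AA threshold for normal-size text.
    return contrast_ratio(fg, bg) >= threshold
```

By this formula, `#2563eb` text on a white background clears 4.5:1, while `#10b981` on white does not, so green text is worth checking against its actual background before passing Gate 3.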
Gate 4: Typography Hierarchy
- ✅ Largest text = key concept (not decoration)
- ✅ Font sizes create clear hierarchy
- ✅ Readability: A2 min 24px, B1 min 18px, C2 min 14px
- ❌ FAIL if typography doesn't teach through sizing → Iterate
Gate 5: Teaching Effectiveness (<5 sec concept grasp)
- ✅ Target proficiency can grasp concept in <5 seconds
- ✅ Visual adds clarity (not just decoration)
- ✅ Cognitive load reduced vs reading text
- ❌ FAIL if confusing or generic → Iterate
Gate 6: Uniqueness Validation (NEW)
- ✅ Visual comparison: Does NOT match any existing image in same chapter
- ✅ Prompt alignment: Matches creative brief's intent (graph ≠ timeline)
- ✅ Filename verification: `{filename}.prompt.md` exists and the visual matches it
- ❌ FAIL if duplicate detected → Regenerate with NEW CHAT
- ❌ FAIL if mismatched brief → Regenerate with clarified prompt
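Part of Gate 6 can be automated. This minimal sketch assumes the `{filename}.prompt.md` file sits next to the image in the chapter folder (the workflow does not say where it lives) and flags only byte-identical duplicates via SHA-256; visually similar but non-identical images still require the manual comparison described above. `check_uniqueness` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_uniqueness(candidate: Path, chapter_dir: Path) -> list[str]:
    """Return Gate 6 problems for `candidate`; an empty list means pass.

    Only exact duplicates are detected; near-duplicates need visual review.
    """
    problems = []
    # Assumption: the prompt file lives beside the image.
    prompt_file = candidate.with_name(candidate.stem + ".prompt.md")
    if not prompt_file.exists():
        problems.append(f"missing {prompt_file.name}")
    digest = sha256_of(candidate)
    for other in sorted(chapter_dir.glob("*.png")):
        if other.resolve() != candidate.resolve() and sha256_of(other) == digest:
            problems.append(f"duplicate of {other.name}")
    return problems
```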
Decision:
- ALL 6 gates PASS → Copy to destination (production-ready)
- ANY gate FAILS → Iterate with principle-based feedback (max 3 tries)
Principle: Explicit criteria prevent "good enough" false positives; uniqueness check prevents duplicate rework
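The decision rule reduces to a small function. This sketch assumes each gate yields a boolean and that `iteration` counts attempts starting at 1, so a third failed attempt defers the visual rather than blocking the batch. It does not model Gate 6's special case (regenerate in a new chat); a fuller version would route that separately.

```python
from enum import Enum

GATES = ("spelling", "layout", "color", "typography", "teaching", "uniqueness")
MAX_ITERATIONS = 3


class Decision(Enum):
    PASS = "copy to destination"
    ITERATE = "iterate with principle-based feedback"
    DEFER = "document issue and defer"


def decide(results: dict[str, bool], iteration: int) -> Decision:
    """Apply the all-gates-pass rule with a hard cap of three attempts."""
    missing = [g for g in GATES if g not in results]
    if missing:
        # Every gate must be evaluated; skipping one is not a pass.
        raise ValueError(f"gates not evaluated: {missing}")
    if all(results[g] for g in GATES):
        return Decision.PASS
    return Decision.ITERATE if iteration < MAX_ITERATIONS else Decision.DEFER
```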
Token Conservation Mode
When: Batch mode with >8 visuals OR continuation session
Condensation strategy:
- ✅ KEEP: Story, Emotional Intent, Visual Metaphor, Key Insight
- ✅ KEEP: Color semantics (#2563eb codes), Pedagogical reasoning
- ⚠️ CONDENSE: Long examples → Short labels
- ⚠️ CONDENSE: Verbose descriptions → Bullet points
- ❌ NEVER REMOVE: The narrative elements that activate reasoning
Example condensation:
ORIGINAL (250 tokens):
"Top Layer shows the Coordinator at center top with label 'Orchestrator'
featuring a conductor icon, with the role description 'Strategic oversight,
contract validation', rendered in Gold color (#fbbf24) as a Large hexagon..."
CONDENSED (80 tokens):
"Top Layer - Coordinator: Center top: 'Orchestrator' (conductor icon),
Role: 'Strategic oversight, contract validation', Gold (#fbbf24), Large hexagon."
Success metric: 60-70% token reduction without lowering the first-attempt success rate
Principle: Efficiency without sacrificing reasoning activation
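A rough way to apply and audit this mode is sketched below. It assumes whitespace-separated words as a crude stand-in for model tokens (real tokenizer counts will differ) and checks that the three narrative headings from the brief template survive condensation. All function names are hypothetical.

```python
REQUIRED_SECTIONS = ("## The Story", "## Emotional Intent", "## Visual Metaphor")


def use_conservation_mode(num_visuals: int, is_continuation: bool) -> bool:
    """Condense briefs for batches of more than 8 visuals or continuation sessions."""
    return num_visuals > 8 or is_continuation


def approx_tokens(text: str) -> int:
    # Crude proxy: word count, not a real tokenizer.
    return len(text.split())


def token_reduction(original: str, condensed: str) -> float:
    """Fractional reduction; 0.65 means roughly 65% fewer tokens."""
    return 1 - approx_tokens(condensed) / approx_tokens(original)


def missing_narrative_sections(condensed: str) -> list[str]:
    """Narrative headings that must never be removed but are absent."""
    return [s for s in REQUIRED_SECTIONS if s not in condensed]
```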
Immediate Embedding Workflow (Step 8.5)
After copying the image to its destination, and BEFORE starting the next visual:
Determine lesson file:
- Read the creative brief's `Chapter` and `Lesson` metadata
- Target: `book-source/docs/[chapter]/[lesson-file].md`
Find insertion point:
- Search for concept explanation section related to this visual
- Insert after concept explanation, before practice/exercise
- Follow pedagogical insertion criteria (after learning, before doing)
Insert the image reference: a markdown image link pointing to the copied file.
Verify no code block interruption:
- Grep for triple backticks around insertion
- If inside code block → Find next break point
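The code-block check can be done by counting fence lines. This sketch assumes the lesson uses triple-backtick fences only (not `~~~`) and that the caller has already chosen a candidate line index after the concept explanation. The image reference in the usage example is hypothetical, since the exact embed syntax depends on the site generator.

```python
def inside_code_fence(lines: list[str], index: int) -> bool:
    """True if inserting before lines[index] would land inside a ``` block."""
    fences = sum(1 for line in lines[:index] if line.lstrip().startswith("```"))
    return fences % 2 == 1


def insert_image_reference(lines: list[str], index: int, reference: str) -> list[str]:
    """Return a copy of `lines` with `reference` inserted before lines[index].

    If that position is inside a fenced code block, move forward to the first
    line after the closing fence (the "next break point").
    """
    while index < len(lines) and inside_code_fence(lines, index):
        index += 1
    # Blank lines keep the image as its own markdown paragraph.
    return lines[:index] + ["", reference, ""] + lines[index:]
```

For example, asking to insert `![Orchestrator layers](/img/chapter-03/agent-layers.png)` in the middle of a code sample places it just after that sample's closing fence instead.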
Why this matters: Completes the work immediately; prevents orphan images
Result: Each visual is generated → validated → placed → verified before moving to next
Principle: Immediate embedding prevents disconnect between generation and integration
Multi-Turn Reasoning Partnership (Three Roles)
Avoid: Accepting first output without evaluation
Prefer: Teaching Gemini your standards through iteration
Iteration Pattern:
Turn 1: Initial Generation
- Paste creative brief, generate
Turn 2: Principle-Based Feedback (if gates fail)
Gate 4 FAILED: Typography hierarchy incorrect
The largest text is "$100K" (supporting detail) but should be "$3T"
(key insight students must grasp).
Pedagogical reasoning: Information importance drives sizing. $3T is
the insight, not the starting value. Visual weight teaches what matters.
Increase '$3T' to dominant size (largest element). Reduce '$100K' to
supporting size. This teaches magnitude through visual hierarchy.
Turn 3: Validation
- Re-check all 6 gates
- If pass → Copy to destination and embed
- If fail after 3 tries → Document issue, DEFER (don't block batch)
Principle: You teach Gemini (principle-based feedback), Gemini teaches you (reveals understanding), Co-evolve toward quality
Why it matters: Gemini learns your pedagogical standards across iterations
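Turn 2 feedback follows a fixed shape: failed gate, observed problem, pedagogical reasoning, concrete action. A small template makes that shape hard to skip. `GateFeedback` is a hypothetical helper; rejecting empty reasoning is one way to enforce "principle-based, not vague" feedback.

```python
from dataclasses import dataclass


@dataclass
class GateFeedback:
    gate_number: int
    gate_name: str
    problem: str
    reasoning: str
    action: str

    def __post_init__(self) -> None:
        # Feedback without a principle is just "make it better"; reject it.
        if not self.reasoning.strip():
            raise ValueError("principle-based feedback needs pedagogical reasoning")

    def render(self) -> str:
        """Format the message in the Turn 2 pattern shown above."""
        return (f"Gate {self.gate_number} FAILED: {self.gate_name}\n"
                f"{self.problem}\n"
                f"Pedagogical reasoning: {self.reasoning}\n"
                f"{self.action}")
```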
Batch Mode: Autonomous Agency
Avoid: Permission-asking between visuals
Prefer: Autonomous batch execution
When invoked with: "generate all visuals" or "batch generate"
Execute WITHOUT STOPPING:
For EACH visual in approved list:
A. NEW CHAT (context isolation)
B. Generate image (paste condensed creative brief)
C. Verify quality (6 gates including uniqueness)
D. Iterate if needed (max 3 tries)
E. Copy to destination when all gates pass (part/chapter directory)
F. Embed in lesson immediately (no orphans)
G. Log progress ("✅ Generated N/M")
H. Move IMMEDIATELY to the next visual (NO STOPPING)
NEVER ask:
- "Would you like me to continue?"
- "Generate just high-priority batch?"
- "Pause here and review?"
Only report summary at END:
BATCH COMPLETE
Total: 18 visuals
✅ Generated: 16 (2K, avg 2-3 iterations)
⚠️ Deferred: 2 (quality issues after 3 tries)
Time: ~45 min
Location: book-source/static/img/chapter-{NN}/
Principle: Autonomous execution without interruption = efficient batch processing
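The batch loop can be sketched as follows. `attempt` and `embed` are hypothetical callbacks standing in for the browser steps (new chat, generate, verify all six gates) and the Step 8.5 embedding. The loop never pauses for confirmation, and a visual that fails three attempts is deferred instead of halting the run.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BatchResult:
    generated: list[str] = field(default_factory=list)
    deferred: list[str] = field(default_factory=list)
    iterations: dict[str, int] = field(default_factory=dict)


def run_batch(visuals: list[str],
              attempt: Callable[[str, int], bool],
              embed: Callable[[str], None],
              max_iterations: int = 3,
              log: Callable[[str], None] = print) -> BatchResult:
    """Process every visual without stopping; defer instead of blocking."""
    result = BatchResult()
    total = len(visuals)
    for n, name in enumerate(visuals, start=1):
        # Hypothetical contract: `attempt` opens a NEW CHAT on iteration 1,
        # continues the same chat afterwards, and returns True only when
        # all six gates pass.
        for iteration in range(1, max_iterations + 1):
            if attempt(name, iteration):
                result.generated.append(name)
                result.iterations[name] = iteration
                embed(name)  # Step 8.5: embed before moving on
                log(f"✅ Generated {n}/{total}")
                break
        else:  # no attempt passed within the limit
            result.deferred.append(name)
            log(f"⚠️ Deferred {n}/{total}: {name}")
    return result
```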
Proficiency-Complexity Guardrails
From visual-asset-workflow, enforce during generation:
A2 Beginner Limits:
- Max 5-7 elements (overwhelming = failure)
- <5 sec grasp requirement
- Static only (no interactive)
- Max 2×2 grids
- Clear hierarchy
B1 Intermediate:
- Max 7-10 elements
- <10 sec grasp
- Interactive Tier 1 OK
- Max 3×3 grids
C2 Professional:
- No artificial limits
- Dense OK (professionals skim)
- Full interactive architecture
Validation during generation: "Does this visual's complexity match proficiency from creative brief?"
Principle: Complexity mismatch = pedagogical failure
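The countable parts of these guardrails, together with the Gate 4 minimum font sizes, can be checked from a structured description of the planned visual. This sketch assumes you can count elements, know the grid shape as (rows, columns), and know the smallest font size. It reads "Max 2×2 grids" as both dimensions capped at 2, does not distinguish interactive tiers, and leaves the grasp-time limits to human judgment because they cannot be measured in code. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Limits:
    max_elements: Optional[int]          # None = no artificial limit
    max_grid: Optional[Tuple[int, int]]  # (rows, cols); None = no limit
    interactive_ok: bool                 # tiers are not modeled here
    min_font_px: int                     # Gate 4 readability minimum


LIMITS = {
    "A2": Limits(max_elements=7, max_grid=(2, 2), interactive_ok=False, min_font_px=24),
    "B1": Limits(max_elements=10, max_grid=(3, 3), interactive_ok=True, min_font_px=18),
    "C2": Limits(max_elements=None, max_grid=None, interactive_ok=True, min_font_px=14),
}


def check_complexity(level: str, elements: int, grid: Tuple[int, int],
                     interactive: bool, smallest_font_px: int) -> list[str]:
    """Return guardrail violations for the given proficiency level."""
    lim = LIMITS[level]
    problems = []
    if lim.max_elements is not None and elements > lim.max_elements:
        problems.append(f"{elements} elements exceeds {lim.max_elements}")
    if lim.max_grid is not None and (grid[0] > lim.max_grid[0] or grid[1] > lim.max_grid[1]):
        problems.append(f"{grid[0]}x{grid[1]} grid exceeds {lim.max_grid[0]}x{lim.max_grid[1]}")
    if interactive and not lim.interactive_ok:
        problems.append("interactive elements not allowed")
    if smallest_font_px < lim.min_font_px:
        problems.append(f"{smallest_font_px}px text below {lim.min_font_px}px minimum")
    return problems
```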
Post-Generation Reflection (After Batch)
AFTER completing batch, analyze systematically:
Success patterns:
- Success rate: {X/Y} production-ready, {N} deferred
- Average iterations: {N} per visual
- Quality gate performance:
- Gate 1 (Spelling): {N} catches
- Gate 2 (Layout): {N} catches
- Gate 3 (Color): {N} catches
- Gate 4 (Typography): {N} catches
- Gate 5 (Teaching): {N} catches
- Gate 6 (Uniqueness): {N} catches (duplicates prevented)
Failure analysis:
- Deferred visuals root causes (layout? spelling? concept?)
- Pattern or random (same issue or isolated?)
- Guardrail gaps (preventable with better principles?)
Improvement opportunities:
- Planning effectiveness (conflicts caught early by visual-asset-workflow?)
- Guardrail sufficiency (Principles 9-12 adequate or gaps?)
- Constitutional compliance (Principle 3 Factual Accuracy, Principle 7 Minimal Content?)
- Next chapter improvements (specific, actionable, pattern-based)
Output: history/visual-assets/reflections/chapter-{NN}-reflection.md
Principle: Systematic reflection → Continuous improvement
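The success-pattern numbers can be computed from a per-visual log kept during the batch. This sketch assumes a hypothetical `VisualRecord` that lists one gate number per failed check (so a gate that failed twice counts as two catches) and uses the >85% production-ready target from the success indicators.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class VisualRecord:
    name: str
    production_ready: bool
    iterations: int
    failed_gates: list[int] = field(default_factory=list)  # gates 1-6, one entry per failure


def reflection_stats(records: list[VisualRecord]) -> dict:
    """Summarize a batch for the post-generation reflection document."""
    total = len(records)
    ready = sum(r.production_ready for r in records)
    catches = Counter(g for r in records for g in r.failed_gates)
    return {
        "success": f"{ready}/{total}",
        "deferred": total - ready,
        "avg_iterations": sum(r.iterations for r in records) / total if total else 0.0,
        "gate_catches": {g: catches.get(g, 0) for g in range(1, 7)},
        "meets_target": total > 0 and ready / total > 0.85,
    }
```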
Session Interruption & Continuation Protocol
If session ends mid-batch (token limit, context overflow):
Create checkpoint file: history/visual-assets/checkpoints/chapter-{N}-checkpoint.md
## Batch Status: Chapter {N}
**Date:** 2025-11-24
**Status:** INTERRUPTED at {X}/{Y} images
### Completed:
- ✅ Image 1: {filename} (2 iterations, embedded in lesson-01.md)
- ✅ Image 2: {filename} (1 iteration, embedded in lesson-02.md)
...
### Remaining:
- ⏳ Image 8: {filename} (not started)
- ⏳ Image 9: {filename} (not started)
...
### Quality Stats (so far):
- Success rate: {X/Y} production-ready
- Avg iterations: {N}
- Gate failures: Gate 1: {n}, Gate 2: {n}...
### Continuation Instructions:
1. Read this checkpoint
2. Start at Image {next}
3. Continue autonomous batch mode
4. Update checkpoint after each image
On continuation:
- Read checkpoint file first
- Resume from last completed image
- Maintain same quality standards
- Update checkpoint incrementally
Principle: Seamless recovery from interruptions maintains momentum
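Writing and reading the checkpoint can be kept mechanical. This sketch renders only the status, Completed, and Remaining parts of the template above (quality stats and continuation instructions are omitted for brevity) and parses the first ⏳ entry to find where to resume. `ImageStatus`, `render_checkpoint`, and `next_image` are hypothetical names; the caller supplies the date.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageStatus:
    filename: str
    done: bool = False
    iterations: int = 0
    lesson: str = ""


def render_checkpoint(chapter: int, date: str, images: list[ImageStatus]) -> str:
    """Render the status, Completed, and Remaining sections of the checkpoint."""
    done = sum(img.done for img in images)
    lines = [f"## Batch Status: Chapter {chapter}",
             f"**Date:** {date}",
             f"**Status:** INTERRUPTED at {done}/{len(images)} images",
             "### Completed:"]
    for n, img in enumerate(images, start=1):
        if img.done:
            unit = "iteration" if img.iterations == 1 else "iterations"
            lines.append(f"- ✅ Image {n}: {img.filename} "
                         f"({img.iterations} {unit}, embedded in {img.lesson})")
    lines.append("### Remaining:")
    for n, img in enumerate(images, start=1):
        if not img.done:
            lines.append(f"- ⏳ Image {n}: {img.filename} (not started)")
    return "\n".join(lines) + "\n"


def next_image(checkpoint_text: str) -> Optional[int]:
    """Number of the first remaining image, or None if the batch finished."""
    match = re.search(r"^- ⏳ Image (\d+):", checkpoint_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None
```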
Anti-Patterns
Never:
- Accept first output without 6-gate verification (quality standard violation)
- Ask permission between visuals in batch mode (breaks autonomous agency)
- Convert creative briefs to pixel specs (defeats reasoning activation)
- Generate visuals without creative brief from visual-asset-workflow (missing context)
- Save to a flat `visuals/` directory (use part/chapter organization)
- Skip uniqueness validation (Gate 6 prevents duplicate rework)
Even if it seems reasonable:
- Don't skip Gate 1 because "spelling looks okay" (99% standard requires verification)
- Don't generate 2 images then ask "continue?" (autonomous means autonomous)
- Don't add pixel specifications to creative brief (removes Gemini's judgment)
- Don't skip embedding step "to save time" (creates orphan images requiring later work)
Creative Variance
You tend to accept visuals after 1 iteration even with minor issues. Push for quality:
- Gate failures require iteration (not "close enough")
- Principle-based feedback teaches standards (not vague "make better")
- The 3-iteration limit is a maximum, not a target (aim for 1-2)
- Deferred visuals are OK (don't compromise quality)
Professional content creators iterate. You should too.
Success Indicators
You'll know this skill is working when:
- ✅ All 6 quality gates verified before copying to destination (including uniqueness validation)
- ✅ Autonomous batch completion without permission-asking (no interruptions)
- ✅ Principle-based feedback given on iterations (teaching Gemini standards)
- ✅ Creative briefs used AS-IS with token conservation in batch mode
- ✅ Images organized by part/chapter (not flat directory)
- ✅ Images embedded immediately after generation (no orphans)
- ✅ Reflection document created after batch (systematic learning)
- ✅ Checkpoint files created on interruption (seamless continuation)
- ✅ Success rate >85% production-ready (deferred <15%)
- ✅ Zero duplicate images requiring rework
Result: Professional-quality visuals with distinctive aesthetics, generated autonomously with systematic quality control, embedded immediately, and recoverable from interruptions.