| name | summarize-transcripts |
| description | Generate AI summaries for downloaded YouTube transcripts. Use when you want to add summaries to existing transcript files, batch process transcripts for summarization, summarize a channel's videos, generate frontmatter summaries, or when the user mentions summarize, AI summaries, OpenRouter, or transcript metadata. |
Summarize Transcripts
Why? Transcript files without summaries are difficult to scan. This skill adds ~500-word AI-generated summaries to transcript frontmatter, enabling quick content discovery and organization.
Quick Start
# Summarize a specific folder
transcript-summarize OpenAI
# Summarize ALL folders
transcript-summarize --all
# Preview what would be summarized
transcript-summarize --all --dry-run
Workflow
1. Verification & Mode Selection
Check for OpenRouter API key securely (do not print it):
# Check if .env contains the key definition (silently)
if [ -f .env ] && grep -q "^[[:space:]]*OPENROUTER_API_KEY=" .env; then echo "✅ Key configured in .env"; else echo "❌ Key missing"; fi
Decision:
- If key exists: Proceed to Step 2A (Automated Batch Mode). This is preferred for speed and volume.
- If key is MISSING: Proceed to Step 2B (Agentic Fallback Mode). You will summarize the files manually.
2A. Automated Batch Mode (With API Key)
Use the CLI tool to process folders efficiently.
1. Load Environment & Run:
export $(grep -v '^#' .env | xargs) && transcript-summarize <FOLDER_NAME>
2. Verify & Interpret Output:
Check the summary statistics at the end of the command output:
Summarization Complete!
Success: 0
Skipped: 15 <-- This means files were already summarized (Idempotent)
Errors: 0
Total: 15
Decision Logic:
- Success > 0: Work was done. Task success.
- Skipped == Total: All files are already up to date. Task success. Do NOT retry or look for "missing" files.
- Errors > 0: Check the error logs (401/403/429).
2B. Agentic Fallback Mode (No API Key)
If the user has no API key, YOU are the summarizer.
Constraints:
- Process small batches (1-5 files) to manage your context window.
- DO NOT use the
transcript-summarizecommand (it will fail). - You must read, summarize, and update the files using your tools.
Workflow:
List Files:
ls data/<FOLDER>/transcripts/*.mdNotify User (Polite Fallback):
- Inform the user: "I see the API key is missing. For future reference, you can set this up following the instructions in
README.mdto enable faster automated summarization. For now, I will proceed with manual summarization of this batch."
- Inform the user: "I see the API key is missing. For future reference, you can set this up following the instructions in
Process Loop (Iterate through files):
Read the transcript file:
view_file <path>Generate Summary (Internal Monologue):
- Target ~500 words.
- Format: Single continuous paragraph (no line breaks, no bullets).
- Style: Neutral, informative, dense. No "This video is about..." intro.
Update File:
- Insert the summary into the frontmatter
summary:field. - Use
replace_file_contentto add the summary block.
Example Code Edit:
summary: | Here is the generated summary text... ---- Insert the summary into the frontmatter
3. Ask to Continue: After processing a batch, ask the user if they want you to continue with the next batch.
2. Identify Target Folders
Determine scope based on user request:
| User Request | Command |
|---|---|
| Specific channel | transcript-summarize <FOLDER_NAME> |
| All channels | transcript-summarize --all |
| Preview only | transcript-summarize --all --dry-run |
[!TIP] Always run
--dry-runfirst when processing many folders. This shows exactly how many transcripts need summaries.
3. Execute Summarization
Run the command with environment variables loaded:
# Load .env and run summarization
export $(grep -v '^#' .env | xargs) && transcript-summarize <FOLDER_NAME>
# Example: Summarize the a16z folder
export $(grep -v '^#' .env | xargs) && transcript-summarize a16z
Expected output:
Processing folder: a16z
Found 42 transcripts, 38 need summaries
Summarizing: how_to_build_a_startup.txt (1/38)
[========================================] 38/38 complete
Summary: Processed 38 files, 0 errors
4. Verify Results
Check that summaries were added:
# View a summarized transcript's frontmatter
head -30 data/a16z/how_to_build_a_startup.txt
Success criteria:
summary:field exists in frontmatter- Summary is ~500 words (configurable via
--max-words) - No API errors in output
Command Reference
transcript-summarize [FOLDER] [OPTIONS]
| Option | Description | Default |
|---|---|---|
FOLDER |
Specific folder in data/ to process |
(Required unless --all) |
--all |
Process ALL folders in data/ |
False |
--dry-run |
Show what would happen without changes | False |
--force |
Re-summarize files that already have summaries | False |
--delay |
Seconds between API requests (min: 4s) | 4.0 |
--model |
OpenRouter model to use | xiaomi/mimo-v2-flash:free |
--max-words |
Target summary length | 500 |
[!TIP] The default model
xiaomi/mimo-v2-flash:freeis free and high-quality. No paid account needed, just an OpenRouter API key.
Examples
Summarize a single channel:
export $(grep -v '^#' .env | xargs) && transcript-summarize OpenAI
# Processes only data/OpenAI/transcripts/*.md files
Dry run to preview work:
export $(grep -v '^#' .env | xargs) && transcript-summarize --all --dry-run
# Output: "Would process 156 files across 12 folders"
Force re-summarize with custom model:
export $(grep -v '^#' .env | xargs) && transcript-summarize a16z --force --model moonshotai/kimi-k2:free
# Overwrites existing summaries with fresh ones
Slower rate for unstable connections:
export $(grep -v '^#' .env | xargs) && transcript-summarize LexFridman --delay 10
# 10 second delay between API calls
Common Mistakes
| Mistake | Why It's Wrong | Correct Approach |
|---|---|---|
Running without --dry-run first |
May process hundreds of files unexpectedly | Always preview with --dry-run when using --all |
Using --force without reason |
Wastes API calls on already-summarized files | Only use --force when changing models or max-words |
Setting --delay below 4s |
Will trigger rate limits | Keep delay at 4s minimum |
| Running from wrong directory | Command won't find data/ folder |
Always run from project root (where pyproject.toml is) |
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
OPENROUTER_API_KEY not set |
.env file exists but variables not exported to shell |
Use `export $(grep -v '^#' .env |
Rate limited (429) |
Too many requests too fast | Script auto-retries with backoff. If persistent, increase --delay to 8-10s |
No folders found in data/ |
Running from wrong directory or empty data folder | cd to project root; verify data/ contains channel folders |
Transcript too short (skipped) |
Transcript under 100 words | Expected behavior; very short transcripts skip summarization |
Model not available |
Model ID typo or model deprecated | Check OpenRouter docs for current model IDs |
| Summary in wrong language | Model defaulted to source language | Most free models default to English; use a multilingual model if needed |
Quality Checklist
Before considering summarization complete:
- Ran
--dry-runfirst to preview scope - Verified API key is configured
- Command completed without errors
- Spot-checked 2-3 summaries for quality
- Summary length is appropriate (~500 words)
[!WARNING] If summaries appear truncated or low-quality, try a different model. Quality varies by model and transcript content.