---
name: consolidate-transcripts
description: Consolidate transcripts from a channel into a single file, sorted by date (newest first), up to 800K tokens. Use when preparing transcripts for LLM context or bulk analysis.
---
# Consolidate Transcripts
Why? LLMs have context limits. This skill merges multiple transcripts into a single file with accurate token counting, so you can feed an entire channel's content to Claude or GPT without exceeding limits.
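Conceptually, the merge is a greedy fill against a token budget. A minimal sketch, assuming pre-sorted input and substituting a whitespace tokenizer for tiktoken (the real script's counts will differ):

```python
def consolidate(transcripts, limit=800_000, count_tokens=lambda t: len(t.split())):
    """Greedily include transcripts until the token budget is hit.

    Assumes transcripts are already sorted newest first. count_tokens
    here is a whitespace stand-in; the actual script counts tokens with
    tiktoken's cl100k_base encoding.
    """
    included, total = [], 0
    for text in transcripts:
        n = count_tokens(text)
        if total + n > limit:
            break  # stop before exceeding the limit
        included.append(text)
        total += n
    return included, total
```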
## Quick Start

```bash
python scripts/consolidate_transcripts.py <channel_name>
```

Output: `data/<channel_name>/<channel_name>-consolidated.md`
## Workflow

### 1. Identify the Channel

List available channels:

```bash
ls data/
```
### 2. Choose Token Limit
| Use Case | Recommended Limit | Flag |
|---|---|---|
| Claude (200K context) | 150000 | `--limit 150000` |
| GPT-4 Turbo (128K) | 100000 | `--limit 100000` |
| Full archive (Claude Pro) | 800000 | (default) |
| Quick sample | 50000 | `--limit 50000` |
> [!TIP]
> The default 800K limit leaves ~200K tokens for prompts and responses when using Claude's 1M context.
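The recommended limits follow a simple rule: model context window minus headroom for prompts and the model's response. A sketch of that arithmetic (the 50K reserve matches the Claude row above; it is a heuristic, not a script parameter):

```python
def pick_limit(context_window: int, reserve: int = 50_000) -> int:
    """Leave headroom for prompts and the model's response."""
    return max(context_window - reserve, 0)

pick_limit(200_000)  # → 150000, the recommended limit for a 200K context
```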
### 3. Run Consolidation

```bash
python scripts/consolidate_transcripts.py <channel_name> [--limit TOKENS] [--verbose]
```

Examples:

```bash
# Default (800K tokens)
python scripts/consolidate_transcripts.py library-of-minds

# Custom limit for GPT-4
python scripts/consolidate_transcripts.py aws-reinvent-2025 --limit 100000

# Verbose output showing all included files
python scripts/consolidate_transcripts.py dwarkesh-patel --verbose
```
### 4. Verify Output

Check that the consolidated file was created:

```bash
ls -la data/<channel_name>/*-consolidated.md
```
## Parameters

| Option | Description | Default |
|---|---|---|
| `channel_name` | Folder name in `data/` | Required |
| `--limit`, `-l` | Maximum tokens to include | 800000 |
| `--verbose`, `-v` | Show detailed file list | False |
## Output Format
The consolidated file includes:
- Header — Generation metadata, total transcripts, token/word counts
- Table of Contents — Dates, titles, tokens, words per transcript
- Transcripts — Full text with title, date, author, source URL
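The three parts above map onto a single string-assembly pass. A rough sketch (the field names and layout here are assumptions for illustration; the script's actual format may differ):

```python
from datetime import datetime, timezone

def build_consolidated(channel, entries):
    """Assemble header, table of contents, and full transcripts.

    entries: dicts with title, date, author, url, text, tokens, words.
    """
    total_tokens = sum(e["tokens"] for e in entries)
    total_words = sum(e["words"] for e in entries)
    lines = [
        f"# Consolidated Transcripts: {channel}",
        f"Generated: {datetime.now(timezone.utc):%Y-%m-%d} | "
        f"Transcripts: {len(entries)} | Tokens: {total_tokens} | Words: {total_words}",
        "",
        "## Table of Contents",
    ]
    # One TOC line per transcript: date, title, token/word counts
    for e in entries:
        lines.append(f"- {e['date']} | {e['title']} | {e['tokens']} tokens, {e['words']} words")
    # Full text of each transcript with its metadata
    for e in entries:
        lines += ["", f"## {e['title']}",
                  f"{e['date']} | {e['author']} | {e['url']}",
                  "", e["text"]]
    return "\n".join(lines)
```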
## Troubleshooting

| Problem | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: tiktoken` | tiktoken not installed | `pip install tiktoken` |
| No transcripts found | Empty transcripts folder | Run transcript-download first |
| `FileNotFoundError` | Channel doesn't exist | Check `ls data/` for valid names |
| Output file is small | Few transcripts available | Use `--verbose` to see what was included |
| Token count seems wrong | Old tiktoken version | `pip install --upgrade tiktoken` |
## Common Mistakes

- Wrong channel name — Use the folder name exactly as shown in `ls data/`, not the YouTube channel name.
- Forgetting to download transcripts first — Consolidation requires transcripts to exist. Run `/download-transcripts` first.
- Using too high a limit — If you exceed your LLM's context, you'll get truncation errors. Use the limit guide above.
- Expecting real-time updates — Re-run consolidation after downloading new transcripts.
## Reference

- Transcripts sorted newest first (descending by date)
- Files without dates in the filename are placed last
- Token counting uses the `cl100k_base` encoding (GPT-4/Claude compatible)
- Consolidated files are gitignored (not committed)
- Re-running overwrites the previous consolidated file
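The first two ordering rules can be sketched with a sort key that parses a date out of the filename (the `YYYY-MM-DD` filename pattern is an assumption about the naming scheme):

```python
import re

def sort_key(filename: str):
    """Sort newest first; files without a date in the name go last."""
    m = re.search(r"(\d{4}-\d{2}-\d{2})", filename)
    # (0, negated date parts) puts dated files first, newest at the top;
    # (1, name) pushes undated files to the end, alphabetically.
    return (0, [-int(p) for p in m.group(1).split("-")]) if m else (1, filename)

files = sorted(["intro.md", "2024-05-01-talk.md", "2025-01-15-keynote.md"], key=sort_key)
# → ["2025-01-15-keynote.md", "2024-05-01-talk.md", "intro.md"]
```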