name	youtube-cache
description	YouTube video cache operations with Qdrant. Auto-checks cache when YouTube URLs detected. Provides semantic search, verification, ingestion, and archive access. Shows metadata by default, transcript on request.

YouTube Cache Operations

Activation triggers:

YouTube URLs in conversation (youtube.com/watch?v=..., youtu.be/...)
Mentions: "youtube", "video", "cache", "qdrant", "transcript"
Working with compose/cli/ or compose/services/ directories

Token efficiency:

Upfront: ~50 tokens (just YAML description)
On activation: ~2-3k tokens (this full document)
Compare to MCP: 5-10k tokens loaded immediately

Core Principles

Auto-check YouTube URLs - When user pastes URL, immediately verify cache status
Metadata-first display - Show title, summary, tags by default (NOT full transcript)
Archive-aware - Check both Qdrant cache and archive for historical data
Leverage existing scripts - All operations use tested compose/cli/* commands

Quick Commands

Auto-Check (Automatic on URL Detection)

When user pastes a YouTube URL, automatically run:

uv run python compose/cli/verify_video.py VIDEO_ID

Display format (metadata only):

Video ID: {video_id}
URL: {url}
Title: {title}
Summary: {summary[:100]}...
Subject: {subject_matter[:3]}
Style: {content_style}
Transcript: {length:,} characters

If not cached:

[INFO] Video not found in cache.

To ingest:
uv run python compose/cli/ingest_video.py "URL"

Search Videos (Semantic)

For natural language queries:

uv run python compose/cli/search_videos.py "query text" --limit 10

Examples:

# Find videos about MCP
uv run python compose/cli/search_videos.py "MCP protocol" --limit 5

# Python tutorials
uv run python compose/cli/search_videos.py "python async programming"

# Custom collection
uv run python compose/cli/search_videos.py "agents" --collection test_basic

Output: Relevance scores + metadata (no transcript unless requested)

List All Cached Videos

# Default collection (cached_content)
uv run python compose/cli/list_videos.py

# Custom collection
uv run python compose/cli/list_videos.py my_collection

# Limit results
uv run python compose/cli/list_videos.py cached_content 20

Ingest New Video

Single video (recommended):

uv run python compose/cli/ingest_video.py "https://youtube.com/watch?v=VIDEO_ID"

Batch/REPL mode:

make ingest
# Or: uv run python compose/cli/ingest_youtube.py

What happens:

Fetch transcript (via Webshare proxy if configured)
Archive transcript immediately → projects/data/archive/youtube/YYYY-MM/
Generate metadata with Claude Haiku (title, summary, tags)
Archive LLM output
Cache in Qdrant with embeddings

Features:

Dry run: --dry-run flag
Custom collection: Add collection name as second arg
Archive-first: All expensive data saved before processing

Delete from Cache

# Delete with confirmation prompt (safe)
uv run python compose/cli/delete_video.py VIDEO_ID

# Custom collection
uv run python compose/cli/delete_video.py VIDEO_ID --collection my_collection

# Skip confirmation (use with caution!)
uv run python compose/cli/delete_video.py VIDEO_ID --yes

Note: Only deletes from Qdrant cache, NOT from archive. Always prompts for confirmation unless --yes flag is used.

Advanced Operations

Search by Reference/Entity

Filter videos mentioning specific people, companies, or concepts:

uv run python compose/cli/search_by_reference.py "entity_name"

# Examples
uv run python compose/cli/search_by_reference.py "Anthropic"
uv run python compose/cli/search_by_reference.py "MCP"

Archive Access

Location: projects/data/archive/youtube/YYYY-MM/VIDEO_ID.json

Structure:

{
  "video_id": "...",
  "url": "...",
  "fetched_at": "...",
  "raw_transcript": "...",
  "llm_outputs": [...],
  "processing_history": [...]
}

Reading archive:

from compose.services.archive import create_local_archive_reader

archive = create_local_archive_reader()
video_data = archive.get("VIDEO_ID")

Reingest from archive:

uv run python compose/cli/reingest_from_archive.py VIDEO_ID

Batch Processing

Fetch channel videos to CSV:

uv run python compose/cli/fetch_channel_videos.py "@channel_handle"
uv run python compose/cli/fetch_channel_videos.py "@channel_handle" output.csv

Requires: YOUTUBE_API_KEY in .env

Then ingest CSV:

make ingest
# Select CSV file when prompted

Collection Management

Default collection: cached_content

Custom collections: Pass as argument to most scripts

Check what's in a collection:

uv run python compose/cli/list_videos.py my_collection

No auto-discovery: Must specify collection name explicitly.

Usage Examples

Example 1: User Pastes YouTube URL

User: "Check this out: https://www.youtube.com/watch?v=1_z3h2r93OY"

Auto-action:

uv run python compose/cli/verify_video.py 1_z3h2r93OY

Response:

Video ID: 1_z3h2r93OY
URL: https://www.youtube.com/watch?v=1_z3h2r93OY
Title: MCP token waste and the solution: skills and code execution
Summary: A breakdown of MCP's token inefficiency and context-rot, and how Anthropic's Claude Skills...
Subject: MCP protocol and token inefficiency, context window and context rot, real-time discovery...
Style: demonstration
Transcript: 11,460 characters

[FOUND] Video is cached in Qdrant.

Example 2: Search for Related Content

User: "Do we have other videos about MCP servers?"

Action:

uv run python compose/cli/search_videos.py "MCP servers" --limit 5

Response:

Found 5 results:

1. OIKTsVjTVJE (relevance: 0.575)
   Title: MCP server implementation guide
   Summary: Building MCP servers with practical examples...

2. D92aDGVFcRE (relevance: 0.488)
   Title: Seven MCP architecture failure modes
   ...

Example 3: Ingest New Video

User: "Can you ingest https://youtube.com/watch?v=ABC123 into the cache?"

Action:

uv run python compose/cli/ingest_video.py "https://youtube.com/watch?v=ABC123"

Response:

[1/5] Checking cache...
[2/5] Fetching transcript...
[3/5] Archiving transcript...
[4/5] Generating metadata...
[5/5] Caching in Qdrant...

SUCCESS! Video cached.

Example 4: Video Not Cached

User: "https://youtube.com/watch?v=XYZ789"

Auto-check result:

[INFO] Video XYZ789 not found in cache.

To ingest this video:
uv run python compose/cli/ingest_video.py "https://youtube.com/watch?v=XYZ789"

Would you like me to ingest it now?

Metadata Structure

Full metadata format (from LLM tagging):

{
    "video_id": str,
    "url": str,
    "transcript": str,           # Full transcript text
    "transcript_length": int,    # Character count
    "metadata": {
        "title": str,
        "summary": str,          # 1-2 sentence overview
        "subject_matter": list[str],  # Main topics (3-6 items)
        "content_style": str,    # tutorial | demonstration | lecture | discussion
        "difficulty": str,       # beginner | intermediate | advanced
        "entities": {
            "named_things": list[str],
            "people": list[str],
            "companies": list[str]
        },
        "techniques_or_concepts": list[str],
        "tools_or_materials": list[str],
        "key_points": list[str],
        "references": list[{
            "type": str,
            "name": str,
            "url": str | None,
            "description": str
        }]
    }
}

Important Patterns

1. Archive-First Philosophy

ALWAYS archive before processing:

Fetch transcript → Archive immediately
Generate LLM output → Archive immediately
Process/embed/cache → Derived data (can be rebuilt)

Why: Protects against data loss, enables reprocessing, tracks costs.

2. Transcript Display Rules

Default: Show metadata only (title, summary, subject, style, length)

Never show full transcript unless explicitly requested:

Transcripts are 10k-50k+ characters
Floods context window
User can request: "show me the transcript for VIDEO_ID"

If user requests transcript:

uv run python compose/cli/verify_video.py VIDEO_ID
# Then show full transcript field from output

3. URL Format Support

All formats supported:

https://www.youtube.com/watch?v=VIDEO_ID
https://youtu.be/VIDEO_ID
https://youtube.com/watch?v=VIDEO_ID
Just VIDEO_ID for verify/delete operations

4. Collection Defaults

Default collection: cached_content
Override: Pass --collection NAME or as positional arg
No validation: Script will create collection if doesn't exist

5. Error Handling

Video not found:

[INFO] Video not found in cache.

→ Suggest ingestion command

Transcript unavailable:

[ERROR] Could not fetch transcript (disabled, private, etc.)

→ Cannot ingest, inform user

Already cached:

[INFO] Video already cached, skipping.

→ Use verify to show metadata

Environment Requirements

Required in .env (root directory):

ANTHROPIC_API_KEY=sk-ant-...     # For metadata generation
OPENAI_API_KEY=sk-proj-...       # For embeddings (sentence-transformers)

Optional:

WEBSHARE_PROXY_USERNAME=...      # For YouTube rate limit bypass
WEBSHARE_PROXY_PASSWORD=...
YOUTUBE_API_KEY=...              # For fetch_channel_videos.py

Data paths (auto-created):

Qdrant: projects/data/qdrant/
Archive: projects/data/archive/youtube/YYYY-MM/

Service Layer Reference

If direct Python imports needed:

# Cache operations
from compose.services.cache import create_qdrant_cache

cache = create_qdrant_cache(collection_name="cached_content")
results = cache.search("query", limit=10)
exists = cache.exists("youtube:video:VIDEO_ID")
cache.close()

# Archive operations
from compose.services.archive import create_local_archive_reader

archive = create_local_archive_reader()
video = archive.get("VIDEO_ID")

# YouTube operations
from compose.services.youtube import extract_video_id, get_transcript

video_id = extract_video_id("https://youtube.com/watch?v=...")
transcript = await get_transcript(video_id)

Script Design Philosophy

From compose/cli/README.md:

Scripts should:

Build on services - Use compose.services.* for business logic
CLI-focused - User-friendly command-line interfaces
Self-contained - Run directly or via make targets
Well-documented - Clear help text and examples
Error handling - Graceful exits and helpful messages

This skill wraps these scripts to provide Claude Code with the same capabilities in a token-efficient, context-aware manner.

Future Enhancements (Phase 2)

Not implemented yet, but potential additions:

Direct service imports (skip bash wrapper for speed)
Batch operation helpers (process multiple URLs)
Collection discovery and auto-switching
Cost tracking from archive metadata
Duplicate detection before ingestion
Smart excerpt showing (relevant transcript sections based on query)

Design philosophy: Start simple, add features only when real pain points emerge.

Token efficiency achieved: This skill provides full YouTube cache access with ~50 tokens upfront, loading full documentation only when needed. Compare to MCP server approach which would load 5-10k tokens immediately for similar functionality.

youtube-cache

Install Skill

SKILL.md

YouTube Cache Operations

Core Principles

Quick Commands

Auto-Check (Automatic on URL Detection)

Search Videos (Semantic)

List All Cached Videos

Ingest New Video

Delete from Cache

Advanced Operations

Search by Reference/Entity

Archive Access

Batch Processing

Collection Management

Usage Examples

Example 1: User Pastes YouTube URL

Example 2: Search for Related Content

Example 3: Ingest New Video

Example 4: Video Not Cached

Metadata Structure

Important Patterns

1. Archive-First Philosophy

2. Transcript Display Rules

3. URL Format Support

4. Collection Defaults

5. Error Handling

Environment Requirements

Service Layer Reference

Script Design Philosophy

Future Enhancements (Phase 2)