---
name: sqldown
description: Bidirectional markdown ↔ SQLite conversion with column limit protection and smart section extraction. Import with Python, query with sqlite3.
---
# SQLDown Skill

## Core Concept

**Load with `sqldown`. Query with `sqlite3`. Dump when needed.**

This skill handles bidirectional markdown ↔ SQLite conversion with intelligent column limit protection. For queries, use `sqlite3` directly - it's already perfect.
## When to Use
- Need to query across many markdown files efficiently
- Want SQL-powered filtering, aggregation, sorting
- Working with structured markdown (YAML frontmatter + H2 sections)
- Need context-efficient progressive disclosure
## Commands

### Import: Create or Update Tables

Use `sqldown load` to load markdown files into a SQLite database.

**Command:**

```bash
sqldown load PATH [OPTIONS]
```
**What it does:**

- Scans markdown files recursively
- Parses YAML frontmatter → database columns
- Extracts H2 sections → `section_*` columns
- Creates schema dynamically based on discovered fields
- Upserts into SQLite (idempotent - safe to run multiple times)
- Respects `.gitignore` patterns automatically (previewed in the sketch below)
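If the directory is a git repository, you can preview roughly which markdown files `load` will pick up. This is only an approximation - sqldown matches `.gitignore` patterns with the `pathspec` library rather than calling git:

```bash
# Tracked plus untracked-but-not-ignored markdown files, per .gitignore
cd ~/tasks && git ls-files --cached --others --exclude-standard -- '*.md'
```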
**Options:**

- `-d, --db PATH` - Database file (default: `sqldown.db`)
- `-t, --table NAME` - Table name (default: `docs`)
- `-p, --pattern GLOB` - File pattern (default: `**/*.md`)
- `--max-columns N` - Maximum allowed columns (default: 1800; SQLite limit: 2000)
- `--top-sections N` - Extract only the top N most common sections (default: 20; 0 = all)
- `-w, --watch` - Watch for file changes and auto-update
- `-v, --verbose` - Show detailed progress
**Examples:**

```bash
# Load markdown files into SQLite
sqldown load ~/tasks

# Specify database and table
sqldown load ~/tasks -d cache.db -t tasks

# Load notes
sqldown load ~/notes -d cache.db -t notes

# Load with specific pattern
sqldown load ~/.claude/skills -d cache.db -t skills -p "*/SKILL.md"

# Watch mode: auto-update on file changes
sqldown load ~/tasks -w

# Column limit protection - extract only top 10 sections
sqldown load ~/tasks --top-sections 10

# Extract all sections (may hit the 2000-column limit with diverse docs)
sqldown load ~/tasks --top-sections 0

# Check column breakdown with verbose output
sqldown load ~/tasks -v
# Output shows: Base columns: 7, Frontmatter: 89, Sections: 20, Total: 116
```
### Query: Use sqlite3 Directly

For ALL queries, use the `sqlite3` command directly:
```bash
# List available tables
sqlite3 cache.db ".tables"

# Show table schema
sqlite3 cache.db ".schema tasks"

# Query
sqlite3 cache.db "SELECT title, status FROM tasks WHERE status='active'"

# Aggregate
sqlite3 cache.db "SELECT status, COUNT(*) FROM tasks GROUP BY status"

# Complex queries
sqlite3 cache.db "
SELECT project, COUNT(*) as count,
       SUM(CASE WHEN status='active' THEN 1 ELSE 0 END) as active
FROM tasks
GROUP BY project
ORDER BY count DESC
"
```
### Dynamic Schema

**Core fields (always present):**

```sql
_id           TEXT PRIMARY KEY  -- SHA1 of file path
_path         TEXT              -- Relative path
_sections     TEXT              -- JSON array of H2 names
title         TEXT              -- H1 heading
body          TEXT              -- Full content
lead          TEXT              -- First paragraph
file_modified FLOAT             -- Timestamp
```

**Dynamic fields (auto-generated):**

- YAML frontmatter: `status`, `project`, `priority`, `tags`, etc.
- H2 sections: `section_objective`, `section_implementation_plan`, etc.

Example: 87 tasks with varied structure → 181 columns generated automatically.
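Because the schema is generated from your documents, it helps to enumerate the columns programmatically. SQLite exposes table metadata through `pragma_table_info`, available as a table-valued function since SQLite 3.16:

```bash
# List the generated section_* columns
sqlite3 cache.db "SELECT name FROM pragma_table_info('tasks') WHERE name LIKE 'section_%'"

# Total column count (useful when watching the 2000-column limit)
sqlite3 cache.db "SELECT COUNT(*) FROM pragma_table_info('tasks')"
```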
## Column Limit Protection

SQLite has a hard limit of 2000 columns per table. The `--top-sections` flag prevents hitting this limit.
**How it works:**

- Analyzes all documents to count section frequency (roughly the shell pipeline sketched below)
- Extracts only the N most common sections as columns
- All other sections remain in the `body` field
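A rough shell equivalent of the frequency analysis, assuming H2 headings start with `## ` (illustration only - sqldown does this counting internally during load):

```bash
# Count how often each H2 heading appears across all markdown files
grep -rh --include='*.md' '^## ' ~/tasks | sort | uniq -c | sort -rn | head -20
```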
**Real-world example:**

- 5,225 tasks with diverse sections = 6,694 unique columns (exceeds the limit!)
- With `--top-sections 20` (the default) = 116 columns ✅
**Top extracted sections (from Mike's tasks):**

`overview`, `usage`, `objective`, `notes`, `next_steps`, `troubleshooting`, `installation`, `configuration`, `requirements`, `testing`, etc.
**When to adjust:**

- `--top-sections 10` - Fewer columns for very diverse collections
- `--top-sections 50` - More columns if you need more queryable sections
- `--top-sections 0` - Extract all (only for homogeneous collections)
**What about rare sections?**

- Still in the `body` field - nothing is lost
- Use FTS5 or `LIKE '%text%'` to search across all content
- Only the top N become directly queryable columns; the `_sections` JSON array still names every section (see the query below)
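To see which section names exist beyond the extracted columns, you can expand the `_sections` JSON array with SQLite's `json_each` (requires a SQLite build with the JSON functions, which recent builds include by default):

```bash
# Rank every H2 section name by frequency, including rare ones
sqlite3 cache.db "
SELECT value AS section, COUNT(*) AS n
FROM tasks, json_each(tasks._sections)
GROUP BY value
ORDER BY n DESC
"
```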
## Common Query Patterns
```bash
# Find active tasks
sqlite3 cache.db "SELECT title FROM tasks WHERE status='active'"

# Recent updates
sqlite3 cache.db "SELECT title, updated FROM tasks ORDER BY updated DESC LIMIT 10"

# Count by status
sqlite3 cache.db "SELECT status, COUNT(*) FROM tasks GROUP BY status"

# Search content
sqlite3 cache.db "SELECT title, _path FROM tasks WHERE body LIKE '%SQLite%'"

# High priority items
sqlite3 cache.db "SELECT title FROM tasks WHERE priority='high' AND status!='completed'"

# Project summary
sqlite3 cache.db "
SELECT project,
       COUNT(*) as total,
       SUM(CASE WHEN status='completed' THEN 1 ELSE 0 END) as done
FROM tasks
GROUP BY project
"

# Find related documents
sqlite3 cache.db "
SELECT title FROM tasks
WHERE section_related_tasks LIKE '%AG-22%'
"
```
## Progressive Disclosure Pattern

**Query metadata first (fast, context-efficient):**

```bash
sqlite3 cache.db "SELECT title, _path, status FROM tasks WHERE priority='high'"
```

**Read full markdown only when needed (slower, more context):**

```bash
# After finding relevant tasks, read the actual files
cat ~/tasks/AG-22_feat_add-configuration/README.md
```

This keeps context usage low while still finding what you need.
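The two steps combine naturally in a small shell loop (assuming `_path` is stored relative to the directory you loaded, `~/tasks` here):

```bash
# Preview the first 20 lines of each high-priority task
sqlite3 cache.db "SELECT _path FROM tasks WHERE priority='high'" |
while read -r path; do
  echo "=== $path ==="
  head -20 ~/tasks/"$path"
done
```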
## Multiple Tables Strategy

Keep different document types in separate tables:

```bash
# Load each type
sqldown load ~/tasks -d cache.db -t tasks
sqldown load ~/notes -d cache.db -t notes
sqldown load ~/.claude/skills -d cache.db -t skills

# Query across them
sqlite3 cache.db "
SELECT 'task' as type, title FROM tasks WHERE body LIKE '%cache%'
UNION ALL
SELECT 'note' as type, title FROM notes WHERE body LIKE '%cache%'
"
```
## Refresh Strategy

**One-time load (manual refresh):**

Load is idempotent - run it again after file changes:

```bash
sqldown load ~/tasks -d cache.db -t tasks -v
```
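If you prefer scheduled refreshes over watch mode, an ordinary cron entry works because the load is idempotent (an illustrative schedule, not a requirement of the tool):

```bash
# crontab -e: refresh the cache every 15 minutes
*/15 * * * * sqldown load ~/tasks -d ~/cache.db -t tasks
```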
**Watch mode (automatic refresh):**

Use `--watch` to automatically update when files change:

```bash
# Starts watching - runs until Ctrl-C
sqldown load ~/tasks -d cache.db -t tasks -w

# Output shows real-time updates:
# [2025-01-15 10:23:45] Updated: AG-22_feat_add-configuration/README.md
# [2025-01-15 10:24:12] Added: AG-31_feat_new-feature/README.md
# [2025-01-15 10:25:03] Deleted: AG-19_feat_old-feature/README.md
```
Watch mode is ideal for development workflows where you want the cache to stay in sync as you edit files.
## Why sqlite3 Instead of Python Wrappers?

**sqlite3 gives you:**

- Full SQL power (no wrapper limitations)
- Standard tool (no custom command syntax to learn)
- Better output formats (`.mode csv`, `.mode json`, `.mode column` - shown below)
- Interactive shell with history and tab completion
- Better performance (no Python startup overhead)
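For example, switching output formats takes a single dot-command (`.mode json` requires sqlite3 3.33+):

```bash
# JSON output for downstream tooling
sqlite3 cache.db ".mode json" "SELECT title, status FROM tasks LIMIT 3"

# Aligned, human-readable output with headers
sqlite3 cache.db ".headers on" ".mode column" "SELECT title, status FROM tasks LIMIT 5"
```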
**sqldown load gives you:**
- Markdown + YAML frontmatter parsing
- Dynamic schema generation
- H2 section extraction
- Automatic .gitignore filtering
- Watch mode for auto-updates
This division of responsibility keeps tools simple and powerful.
## Workflow Guidelines

**Start with schema inspection:**

```bash
sqlite3 cache.db ".schema tasks"
```

This shows what columns are available (critical for dynamic schemas).

**Use simple queries first:**

```bash
sqlite3 cache.db "SELECT title, status FROM tasks LIMIT 5"
```

**Build up complexity:**

```bash
sqlite3 cache.db "SELECT status, COUNT(*) FROM tasks GROUP BY status"
```

**Read full files last:** Only after identifying relevant documents via SQL.
**Trust .gitignore filtering:** By default, `sqldown load` respects `.gitignore` automatically.
## Requirements

**Prerequisites:**

- Python 3.10+ (includes the `sqlite3` module in the standard library)
- `sqlite3` CLI (built in on macOS 10.4+ and most Linux distributions)
**Installation:**

```bash
# Install from PyPI
pip install sqldown

# Or use uv for faster installation
uv pip install sqldown
```
## Limitations

- Best for <100K documents
- SQLite column limit: 2000 columns max (sqldown detects and reports this)
- No built-in full-text search (though SQLite FTS5 can be added by hand - see the sketch below)
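If `LIKE` scans become too slow, one way to add full-text search yourself is a standalone FTS5 table over the generated data. This is a manual sketch - sqldown does not maintain the index, so rebuild it after each load:

```bash
sqlite3 cache.db "
CREATE VIRTUAL TABLE IF NOT EXISTS tasks_fts USING fts5(title, body);
DELETE FROM tasks_fts;
INSERT INTO tasks_fts (title, body) SELECT title, body FROM tasks;
SELECT title FROM tasks_fts WHERE tasks_fts MATCH 'sqlite';
"
```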
## Additional Commands

### Dump: Export Back to Markdown

```bash
sqldown dump -d DATABASE -o OUTPUT_DIR [OPTIONS]
```
**What it does:**

- Exports database rows back to markdown files
- Reconstructs original markdown structure with frontmatter
- Preserves file paths from the original import
- Skips unchanged files (smart change detection)
- Supports SQL filtering to export subsets
**Options:**

- `-d, --db PATH` - Database file (required)
- `-t, --table NAME` - Table name (default: `docs`)
- `-o, --output PATH` - Output directory (required)
- `-f, --filter WHERE` - SQL WHERE clause to filter rows
- `--force` - Always write files, even if unchanged
- `--dry-run` - Preview what would be exported without writing
- `-v, --verbose` - Show detailed progress
**Examples:**

```bash
# Export all documents
sqldown dump -d cache.db -o ~/restored

# Export only active tasks
sqldown dump -d cache.db -t tasks -o ~/active --filter "status='active'"

# Preview export without writing files
sqldown dump -d cache.db -o ~/export --dry-run
```
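A quick way to sanity-check a dump is a load → dump round trip; since paths are preserved, any differences `diff` reports should be limited to formatting normalization, if present at all:

```bash
sqldown load ~/tasks -d cache.db -t tasks
sqldown dump -d cache.db -t tasks -o /tmp/roundtrip
diff -r ~/tasks /tmp/roundtrip
```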
### Info: Database Statistics

```bash
sqldown info [OPTIONS]
```
**What it does:**

- Shows database statistics and table information
- Lists all tables with document counts
- Displays column breakdown (frontmatter vs. sections)
- Provides schema details for specific tables
**Options:**

- `-d, --db PATH` - Database file (default: `sqldown.db` if it exists)
- `-t, --table NAME` - Show detailed info for a specific table
**Examples:**

```bash
# Show database overview
sqldown info

# Show info for a specific database
sqldown info -d cache.db

# Show detailed table information
sqldown info -d cache.db -t tasks
```
## Technical Details

**Automatic .gitignore support:**

- Reads `.gitignore` from the root directory by default
- Uses the `pathspec` library for gitignore pattern matching
- Filters files before import to avoid unwanted content
**Column limit protection:**

- Uses `--top-sections` to extract only the N most common H2 sections
- Prevents hitting SQLite's 2000-column limit
- Rare sections remain in the `body` field - nothing is lost
## Related Files

- `bin/sqldown` - Main CLI tool
- `README.md` - Human-facing documentation
- `SPECIFICATION.md` - Technical specification