| name | code-explore |
| description | Systematic codebase exploration using tree-sitter AST analysis and repomix semantic search. Use when: (1) User asks to explore/analyze/find/search codebase, (2) Looking for functions/classes/symbols/patterns, (3) Understanding project structure, (4) Tracing dependencies/usage, (5) ANY code analysis task. CRITICAL: NEVER use basic filesystem tools (find, grep, Glob, Read multiple files) when tree-sitter MCP or repomix tools are available. |
| version | 1.1.0 |
Code Exploration with Tree-Sitter & Repomix
Overview
Problem: Using filesystem tools (find, grep, Read) for code exploration is inefficient:
- Wastes context tokens reading full files
- Requires multiple iterations to find patterns
- No semantic understanding of code structure
- Slow and error-prone
Solution: Purpose-built code analysis tools:
- Tree-Sitter MCP: AST-based analysis for precise code understanding
- Repomix: Semantic codebase packaging for AI consumption
- Strategic routing: Right tool for each exploration type
Core Principle: ALWAYS use specialized tools over filesystem commands for code analysis.
When to Use This Skill
Trigger Phrases
Activate this skill when user says:
- "Find all [functions/classes/imports]"
- "Where is [symbol] used?"
- "Show me the structure of [project/file]"
- "What does [file] import?"
- "Search for [pattern/idiom]"
- "Explore the codebase"
- "Analyze [component]"
- "Trace [dependency]"
Red Flags - STOP and Use This Skill
If you catch yourself thinking:
- "Let me search for files with glob..."
- "I'll read each [language] file to find..."
- "Let me grep for this pattern..."
- "I'll use find to locate..."
- "I need to read multiple files to understand..."
ALL of these mean: STOP. Use code-explore skill instead.
Anti-Patterns This Skill Prevents
❌ NEVER do this:
1. Glob("**/*.py")
2. For each file: Read(file)
3. Search manually for pattern
✅ ALWAYS do this instead:
repomix grep "pattern" --context 2
❌ NEVER do this:
1. Read("module.py")
2. Manually parse imports from content
✅ ALWAYS do this instead:
tree-sitter get_dependencies(project, "module.py")
Core Workflow
Phase 1: One-Time Setup
Check if project is registered:
tree-sitter list_projects_tool()
→ Project exists? Skip to Phase 2
→ Not registered? Continue below
Register project with tree-sitter:
tree-sitter register_project_tool(
path="/absolute/path/to/project",
name="project-name",
description="Brief description"
)
Pack codebase with repomix:
repomix pack_codebase(
directory="/absolute/path/to/project",
style="xml",
topFilesLength=15
)
Store the outputId returned by repomix for future searches.
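If you need to drive this one-time setup from a script rather than through the agent's native MCP integration, the sketch below uses the official mcp Python client. The server launch commands (uvx mcp-server-tree-sitter, npx -y repomix --mcp), project path, and project name are assumptions to adjust for your installation; the tool names and arguments mirror the calls above.
```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumed launch commands for the two MCP servers; adjust to your installation.
TREE_SITTER = StdioServerParameters(command="uvx", args=["mcp-server-tree-sitter"])
REPOMIX = StdioServerParameters(command="npx", args=["-y", "repomix", "--mcp"])

PROJECT_PATH = "/absolute/path/to/project"  # placeholder
PROJECT_NAME = "project-name"               # placeholder

async def one_time_setup() -> None:
    # Register the project with the tree-sitter server (skip if already listed).
    async with stdio_client(TREE_SITTER) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            projects = await session.call_tool("list_projects_tool", {})
            print(projects.content)  # check whether PROJECT_NAME is already registered
            await session.call_tool("register_project_tool", {
                "path": PROJECT_PATH,
                "name": PROJECT_NAME,
                "description": "Brief description",
            })

    # Pack the codebase with repomix and keep the outputId for later searches.
    async with stdio_client(REPOMIX) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            packed = await session.call_tool("pack_codebase", {
                "directory": PROJECT_PATH,
                "style": "xml",
                "topFilesLength": 15,
            })
            print(packed.content)  # contains the outputId and pack metrics

asyncio.run(one_time_setup())
```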
Phase 2: Route to Optimal Tool
Use this decision tree to select the right tool:
What type of exploration?
│
├─ Find ALL symbols (functions/classes)?
│ └─ repomix grep "^class |^def |^function "
│ Why: Fast, language-agnostic pattern matching
│
├─ Understand project structure?
│ ├─ tree-sitter analyze_project(project, scan_depth=3)
│ │ Why: Language counts, directory structure, entry points
│ └─ repomix pack (for semantic overview)
│ Why: See file tree + metrics
│
├─ Find symbols IN SPECIFIC FILE?
│ └─ tree-sitter get_symbols(project, file_path, ["functions", "classes", "imports"])
│ Why: Precise AST extraction
│
├─ Trace dependencies/imports?
│ └─ tree-sitter get_dependencies(project, file_path)
│ Why: Direct import analysis
│
├─ Find where symbol is used?
│ └─ tree-sitter find_usage(project, symbol, [file_path], [language])
│ Why: Tracks references across codebase
│
├─ Search for code patterns/idioms?
│ └─ repomix grep_repomix_output(outputId, pattern, contextLines=2)
│ Why: Regex search with context
│
├─ Analyze code complexity?
│ └─ tree-sitter analyze_complexity(project, file_path)
│ Why: Metrics (cyclomatic complexity, nesting depth)
│
└─ Find similar code?
└─ tree-sitter find_similar_code(project, snippet, threshold=0.8)
Why: Structural similarity detection
Phase 3: Validate, Synthesize & Return Results
Step 3.1: Security Validation (for packed outputs)
BEFORE delivering repomix packed output, check for sensitive data:
Repomix has built-in security scanning that detects:
- API keys, tokens, auth credentials
- Connection strings with passwords
- Private keys, certificates
- Environment variable secrets
If sensitive data detected:
- Warn user immediately: "⚠️ Packed output contains potential secrets"
- Suggest .repomixignore patterns: Exclude config files, .env, credentials
- Recommend alternatives: Use includePatterns to focus on source code only
- Never share sensitive output directly
Example warning:
⚠️ Security Check: Found potential secrets in packed output:
- DATABASE_URL with password in config/database.yml
- API_KEY in .env.production
Recommendation: Add to .repomixignore:
config/database.yml
.env*
**/*credentials*
Or use: includePatterns="src/**/*.py,tests/**/*.py"
Step 3.2: Token Awareness
Check packed output size:
- Repomix returns token count in metrics
- If > 100K tokens, suggest optimization:
- compress: true - Tree-sitter compression (70% reduction)
- includePatterns - Focus on relevant files
- Split analysis across multiple sessions
Example token warning:
📊 Token Count: 156,842 tokens (exceeds typical context window)
Recommendations:
1. Enable compression: compress=true (reduces to ~47K tokens)
2. Focus packing: includePatterns="src/**/*.ts,tests/**/*.test.ts"
3. Or analyze in phases: pack src/ first, then tests/ separately
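A minimal sketch of recommendations 1 and 2, assuming the repomix MCP server is started with npx -y repomix --mcp and that pack_codebase accepts the compress and includePatterns options shown in Phase 1; the directory and include patterns are placeholders.
```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumed launch command for the repomix MCP server; adjust to your installation.
REPOMIX = StdioServerParameters(command="npx", args=["-y", "repomix", "--mcp"])

async def repack_within_budget(directory: str) -> None:
    async with stdio_client(REPOMIX) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Re-pack with compression and a narrower include set when the
            # first pack's reported token count exceeds the context budget.
            result = await session.call_tool("pack_codebase", {
                "directory": directory,
                "compress": True,                                    # signatures only, ~70% fewer tokens
                "includePatterns": "src/**/*.ts,tests/**/*.test.ts", # placeholder focus patterns
            })
            print(result.content)  # inspect the new token count and outputId

asyncio.run(repack_within_budget("/absolute/path/to/project"))  # placeholder path
```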
Step 3.3: Synthesize & Format
DON'T:
- Dump raw tool output to user
- Return unformatted grep results
- Show 500-line JSON responses
DO:
- Synthesize findings into clear summary
- Extract relevant code snippets with file:line references
- Provide actionable insights
- Format results as tables/lists for readability
Example Good Output:
Found 8 logging functions across the codebase:
1. log_info() - lib/bash-logger.sh:2952
2. log_warn() - lib/bash-logger.sh:2953
3. log_error() - lib/bash-logger.sh:2954
...
Usage pattern: 207 calls across 6 scripts:
- ntt-backup-chll.sh: 34 calls
- ntt-backup-usb.sh: 28 calls
- floppy-orchestrator: 45 calls
...
Tool Reference
Tree-Sitter MCP Tools
| Tool | Purpose | When to Use |
|---|---|---|
| register_project_tool | One-time project setup | First time analyzing codebase |
| list_projects_tool | List registered projects | Check if project exists |
| analyze_project | Project-wide structure | Understand overall architecture |
| get_symbols | Extract functions/classes/imports | Find symbols in specific file |
| find_usage | Trace symbol references | "Where is X used?" |
| get_dependencies | Parse imports/dependencies | "What does X import?" |
| find_similar_code | Find structural patterns | Detect code duplication |
| analyze_complexity | Code metrics | Complexity analysis |
| run_query | Custom tree-sitter queries | Advanced AST analysis |
Repomix Tools
| Tool | Purpose | When to Use |
|---|---|---|
| pack_codebase | Package into AI-optimized format | Initial codebase overview |
| grep_repomix_output | Regex search packed output | Find patterns/idioms |
| read_repomix_output | Read specific line ranges | View specific sections |
| attach_packed_output | Load existing pack | Resume from previous pack |
See references/tool-comparison.md for detailed capabilities.
Repomix Configuration Best Practices
When to Use .repomixignore
Create .repomixignore in project root to systematically exclude:
# Dependencies
node_modules/
venv/
.venv/
vendor/
# Build artifacts
dist/
build/
*.pyc
__pycache__/
# Configuration with secrets
.env*
config/credentials*
**/*secret*
**/*password*
# Large binary files
*.zip
*.tar.gz
*.pdf
*.png
*.jpg
# IDE files
.vscode/
.idea/
includePatterns vs ignorePatterns
Use includePatterns when: You want to focus on specific source files
includePatterns="src/**/*.ts,tests/**/*.test.ts"
# Only packs TypeScript source and tests
Use ignorePatterns when: You want everything except specific files
ignorePatterns="test/**,*.spec.js"
# Packs everything except tests
Comment Removal Strategy
Remove comments (--remove-comments) when:
- Token budget is tight
- Comments add noise (auto-generated docs, linter rules)
- You need pure code structure
Keep comments when:
- Comments explain complex logic
- Documentation is needed for understanding
- First-time codebase exploration
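When running repomix from the command line rather than through MCP, comment removal is a single flag. The sketch below shells out to the CLI from Python; --remove-comments is the flag referenced above, while the --include, --style, and --output flags and their values are assumptions to verify against repomix --help for your version.
```python
import subprocess

# Hedged sketch: invoke the repomix CLI with comment removal when the token
# budget is tight. Verify flag names against `repomix --help` for your version;
# the include pattern and output path are placeholders.
subprocess.run(
    [
        "npx", "-y", "repomix",
        "--remove-comments",               # strip comments, keep pure code structure
        "--include", "src/**/*.py",        # focus on source files (placeholder pattern)
        "--style", "xml",                  # AI-friendly default format
        "--output", "repomix-output.xml",  # placeholder output path
    ],
    check=True,
)
```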
Output Format Selection
| Format | Best For | Token Efficiency |
|---|---|---|
| XML | AI parsing, structured analysis | Medium |
| Markdown | Human review, documentation | Low (most readable) |
| JSON | Programmatic processing, grep | High (compact) |
| Plain | Simple text search | Highest (minimal formatting) |
Recommendation: Use XML (default) for AI analysis, unless token budget requires JSON/Plain.
Compression Flag Usage
Use compress: true when:
- Codebase > 50K lines
- Token count > 100K
- Need overview without implementation details
Compression extracts:
- Function signatures (not bodies)
- Class definitions (not methods)
- Type definitions
- Import statements
- ~70% token reduction
Skip compression when:
- Need full implementation details
- Debugging specific logic
- Small codebase (< 10K lines)
Concrete Examples
Example 1: Find All Logging Functions
User asks: "Find all the logging functions in ntt"
Wrong approach:
Glob("**/*.sh")
→ Read each file
→ Search for "log" manually
Correct approach:
repomix grep_repomix_output(
outputId="...",
pattern="log_info|log_warn|log_error",
contextLines=0
)
Result: 207 matches in <1 second
Example 2: Understand Project Structure
User asks: "What's the structure of this project?"
Wrong approach:
Bash("find . -type f")
→ Parse output manually
Correct approach:
tree-sitter analyze_project("ntt", scan_depth=3)
Result: Language counts, directory structure, file types
Example 3: Find Where CopyWorker is Used
User asks: "Where is CopyWorker class used?"
Wrong approach:
Grep("CopyWorker", "**/*.py")
→ Read each match manually
Correct approach:
tree-sitter find_usage(
project="ntt",
symbol="CopyWorker",
language="python"
)
Result: All usage locations with file:line references
Example 4: Trace Import Dependencies
User asks: "What does ntt-copier.py import?"
Wrong approach:
Read("bin/ntt-copier.py")
→ Parse import statements manually
Correct approach:
tree-sitter get_dependencies("ntt", "bin/ntt-copier.py")
Result: Complete dependency tree
Advanced Patterns
Combining Tools for Complex Analysis
Pattern 1: Find + Analyze
1. repomix grep "^class " → Find all classes
2. tree-sitter get_symbols → Extract methods for each
3. Synthesize into class hierarchy
Pattern 2: Structure + Dependencies
1. tree-sitter analyze_project → Get file list
2. For key files: get_dependencies → Build dependency graph
3. Visualize as mermaid diagram
Pattern 3: Pattern + Usage
1. repomix grep "pattern" → Find pattern usage
2. tree-sitter find_usage → Trace each occurrence
3. Summarize usage context
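A hedged sketch of Pattern 2, assuming the tree-sitter MCP server launch command used in the earlier sketches and a project already registered as "ntt". The key file paths are hypothetical, and because the exact response format varies by server version, results are printed for manual inspection rather than parsed into a finished mermaid diagram.
```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumed launch command for the tree-sitter MCP server; adjust to your installation.
TREE_SITTER = StdioServerParameters(command="uvx", args=["mcp-server-tree-sitter"])

KEY_FILES = ["bin/ntt-copier.py", "bin/ntt-verifier.py"]  # hypothetical example paths

async def dependency_report(project: str) -> None:
    async with stdio_client(TREE_SITTER) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Step 1: project-wide structure (languages, directories, entry points).
            structure = await session.call_tool(
                "analyze_project", {"project": project, "scan_depth": 3}
            )
            print(structure.content)
            # Step 2: per-file imports for the files that matter.
            for path in KEY_FILES:
                deps = await session.call_tool(
                    "get_dependencies", {"project": project, "file_path": path}
                )
                print(path, "->", deps.content)
            # Step 3: convert the printed imports into a mermaid `graph TD` once
            # you know the response format for your server version.

asyncio.run(dependency_report("ntt"))
```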
Performance Characteristics
Tree-Sitter
Strengths:
- Precise AST-based analysis
- Language-aware parsing
- Fast symbol extraction
- Handles syntax errors gracefully
Limitations:
- Requires language parser support
- One file at a time for some operations
- Setup required (registration)
Repomix
Strengths:
- Entire codebase in single output
- Fast regex search
- No language limitations
- Includes metrics + file tree
Limitations:
- Large codebases = large output (use compress flag)
- Regex only (no AST understanding)
- Need to pack first (one-time cost)
See references/tool-comparison.md for detailed benchmarks.
Integration with Other Skills
Works With
- skills/repomix - Detailed repomix usage patterns
- skills/docs-seeker - Finding external documentation
- skills/systematic-debugging - Root cause analysis
- skills/memory-augmented-dev - Store exploration findings
Common Workflow
1. code-explore → Understand codebase structure
2. memory-augmented-dev → Store key insights
3. systematic-debugging → Apply findings to debug
4. docs-seeker → Find relevant documentation
The Iron Law
NEVER USE FILESYSTEM TOOLS WHEN CODE ANALYSIS TOOLS ARE AVAILABLE
If tree-sitter MCP tools are present, you MUST use them instead of:
- find/Glob/ls for finding code
- grep/Grep for searching patterns
- cat/Read for parsing symbols
- Manual file iteration
Violating this rule wastes tokens and time.
Checklist: Before Using Filesystem Tools
Before using Glob, Grep, or Read for code analysis, ask:
- Is this a code exploration task?
- Are tree-sitter MCP tools available?
- Is repomix packed for this project?
- Would tree-sitter or repomix be more efficient?
If any answer is YES, use this skill instead.
Quick Reference Card
TASK TOOL COMMAND
────────────────────────────────────────────────────────────────────────
Find all functions repomix grep "^def |^function "
Find classes repomix grep "^class "
Project structure tree-sitter analyze_project scan_depth=3
Symbols in file tree-sitter get_symbols ["functions", "classes"]
Where is X used? tree-sitter find_usage symbol="X"
What does X import? tree-sitter get_dependencies file_path="X"
Find pattern repomix grep_repomix_output pattern="..."
Code complexity tree-sitter analyze_complexity file_path="X"
Similar code tree-sitter find_similar_code snippet="..."
Further Reading
- Tool Comparison Matrix - Detailed capability analysis
- Query Pattern Library - Common search patterns
- claudekit-skills/repomix - Official repomix skill