Claude Code Plugins

Community-maintained marketplace

Feedback

Analyzes dbt model dependencies, traces upstream sources and downstream consumers, identifies circular dependencies, and visualizes data lineage. Use when user asks about model dependencies, wants to understand data flow, needs impact analysis, or wants to trace where data comes from or goes to.

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name lineage-analyzer
description Analyzes dbt model dependencies, traces upstream sources and downstream consumers, identifies circular dependencies, and visualizes data lineage. Use when user asks about model dependencies, wants to understand data flow, needs impact analysis, or wants to trace where data comes from or goes to.
allowed-tools Read, Grep, Glob, Bash

dbt Lineage Analyzer

Expert skill for analyzing and tracing dbt model dependencies and data lineage.

When to Use This Skill

Activate when the user mentions:

  • "Show me the lineage for this model"
  • "What models depend on X"
  • "Where does this data come from"
  • "Trace dependencies"
  • "Impact analysis"
  • "Upstream/downstream models"
  • "DAG structure"
  • "Circular dependencies"

Core Capabilities

1. Dependency Discovery

Extract dependencies from dbt models by finding:

  • {{ ref('model_name') }} - model references
  • {{ source('source_name', 'table_name') }} - source references
  • Dependencies in Jinja loops and conditionals

2. Lineage Tracing Methods

Method 1: File Scanning (Primary)

# Find all model files
find models/ -name "*.sql" -type f

# Extract refs from a model
grep -o "ref('[^']*')" model_file.sql | sed "s/ref('\(.*\)')/\1/"

# Extract sources
grep -o "source('[^']*',\s*'[^']*')" model_file.sql

Method 2: Parse manifest.json (If Available)

# Check for dbt manifest
if [ -f "target/manifest.json" ]; then
    # Parse using jq or Python
    cat target/manifest.json | jq '.nodes[] | select(.resource_type == "model")'
fi

3. Build Dependency Graph

For each model, extract:

  1. Model name and path
  2. Direct parent refs
  3. Direct parent sources
  4. Config (materialization, schema)

Create bidirectional mapping:

  • parents[model] = list of dependencies
  • children[model] = list of consumers

4. Recursive Tracing

Upstream Traversal (Sources)

def trace_upstream(model_name, graph, visited=None):
    if visited is None:
        visited = set()
    
    if model_name in visited:
        return []  # Circular dependency
    
    visited.add(model_name)
    upstream = []
    
    for parent in graph['parents'].get(model_name, []):
        upstream.append(parent)
        upstream.extend(trace_upstream(parent, graph, visited))
    
    return upstream

Downstream Traversal (Consumers)

def trace_downstream(model_name, graph, visited=None):
    if visited is None:
        visited = set()
    
    if model_name in visited:
        return []
    
    visited.add(model_name)
    downstream = []
    
    for child in graph['children'].get(model_name, []):
        downstream.append(child)
        downstream.extend(trace_downstream(child, graph, visited))
    
    return downstream

5. Visualization Formats

Tree Format (Compact)

fct_orders
├── int_order_details
│   ├── stg_orders
│   │   └── source.raw.orders
│   └── stg_order_items
│       └── source.raw.order_items
└── dim_customers
    └── stg_customers
        └── source.raw.customers

Detailed Format

Model: fct_orders
Type: model
Path: models/marts/core/fct_orders.sql
Materialization: incremental
Schema: analytics

DIRECT DEPENDENCIES:
  → int_order_details (ephemeral)
  → dim_customers (table)

UPSTREAM LINEAGE (3 levels deep):
  Level 1:
    - int_order_details
    - dim_customers
  Level 2:
    - stg_orders
    - stg_order_items
    - stg_customers
  Level 3:
    - source.raw.orders
    - source.raw.order_items
    - source.raw.customers

DOWNSTREAM CONSUMERS:
  ← rpt_daily_revenue (view)
  ← dashboard_sales_metrics (table)

STATISTICS:
  - Dependency Depth: 3
  - Total Upstream Models: 5
  - Total Sources: 3
  - Direct Children: 2
  - Total Downstream: 2

Mermaid Diagram Format

graph TD
    source_orders[(source: orders)]
    source_customers[(source: customers)]
    stg_orders[stg_orders]
    stg_customers[stg_customers]
    int_order_details[int_order_details]
    fct_orders[fct_orders]
    
    source_orders --> stg_orders
    source_customers --> stg_customers
    stg_orders --> int_order_details
    stg_customers --> int_order_details
    int_order_details --> fct_orders

6. Impact Analysis

When tracing lineage, also identify:

Breaking Change Impact

  • If this model's schema changes, what breaks?
  • Models that depend on specific columns
  • Tests that would fail

Performance Impact

  • Models on the critical path
  • Long-running models blocking others
  • Warehouse resource consumption

Freshness Impact

  • Source freshness checks
  • Incremental load dependencies
  • SLA-critical models

7. Circular Dependency Detection

def find_cycles(graph):
    """Detect circular dependencies using DFS"""
    def visit(node, path, visited, cycles):
        if node in path:
            cycle_start = path.index(node)
            cycles.append(path[cycle_start:] + [node])
            return
        
        if node in visited:
            return
        
        visited.add(node)
        path.append(node)
        
        for child in graph.get(node, []):
            visit(child, path.copy(), visited, cycles)
    
    cycles = []
    visited = set()
    
    for node in graph:
        if node not in visited:
            visit(node, [], visited, cycles)
    
    return cycles

Practical Workflows

Workflow 1: Single Model Lineage

User: "Show lineage for fct_orders"

Steps:
1. Find the model file: models/marts/core/fct_orders.sql
2. Extract refs and sources from the file
3. Build immediate parent list
4. Recursively trace upstream to sources
5. Trace downstream to find consumers
6. Generate visualization
7. Calculate statistics

Workflow 2: Column-Level Lineage

User: "Where does the 'customer_ltv' column come from?"

Steps:
1. Find model containing customer_ltv
2. Parse SELECT statement to see source column
3. If derived, trace back through transformations
4. Identify originating source table and column
5. Map full transformation path

Workflow 3: Impact Analysis

User: "What breaks if I change stg_customers?"

Steps:
1. Find all downstream models referencing stg_customers
2. For each downstream model:
   - Check which columns it uses from stg_customers
   - Identify tests on those columns
   - Mark as impacted
3. Trace to final downstream consumers
4. Generate impact report

Workflow 4: Critical Path Analysis

User: "What's the critical path in my dbt project?"

Steps:
1. Build full dependency graph
2. Identify leaf nodes (final marts/reports)
3. Trace back to sources
4. Calculate depth for each path
5. Identify longest paths
6. Highlight bottleneck models

Implementation Approach

Step 1: Scan Project Structure

# Find all dbt models
find models/ -type f -name "*.sql" > model_list.txt

# Find sources
find models/ -type f -name "*.yml" | xargs grep -l "sources:"

Step 2: Extract Dependencies

For each model file:

import re

def extract_refs(sql_content):
    # Pattern: {{ ref('model_name') }}
    pattern = r"{{\s*ref\(['\"]([^'\"]+)['\"]\)\s*}}"
    return re.findall(pattern, sql_content)

def extract_sources(sql_content):
    # Pattern: {{ source('source', 'table') }}
    pattern = r"{{\s*source\(['\"]([^'\"]+)['\"]\s*,\s*['\"]([^'\"]+)['\"]\)\s*}}"
    return re.findall(pattern, sql_content)

Step 3: Build Graph Structure

graph = {
    'nodes': {},  # model_name: {path, type, config}
    'edges': [],  # (parent, child) tuples
    'sources': {} # source_name.table_name: metadata
}

for model_file in model_files:
    model_name = extract_model_name(model_file)
    refs = extract_refs(read_file(model_file))
    sources = extract_sources(read_file(model_file))
    
    graph['nodes'][model_name] = {
        'path': model_file,
        'type': 'model',
        'parents': refs + sources
    }
    
    for ref in refs:
        graph['edges'].append((ref, model_name))

Step 4: Query and Visualize

def get_lineage(model_name, graph):
    upstream = trace_upstream(model_name, graph)
    downstream = trace_downstream(model_name, graph)
    
    return {
        'model': model_name,
        'upstream': upstream,
        'downstream': downstream,
        'depth': calculate_depth(model_name, graph),
        'stats': calculate_stats(model_name, graph)
    }

Output Examples

Example 1: Simple Lineage

Model: stg_orders
Path: models/staging/ecommerce/stg_orders.sql
Type: View

UPSTREAM:
└── source.raw.orders

DOWNSTREAM:
├── int_order_details (ephemeral)
└── fct_daily_orders (table)

Stats: 1 level deep, 1 source, 2 consumers

Example 2: Complex Lineage with Circular Dependency

⚠️  CIRCULAR DEPENDENCY DETECTED:

model_a → model_b → model_c → model_a

Path: models/intermediate/model_a.sql
      models/intermediate/model_b.sql  
      models/intermediate/model_c.sql

This creates an infinite loop. Please refactor to break the cycle.

Example 3: Critical Path

CRITICAL PATH ANALYSIS

Longest dependency chain (6 levels):
source.raw.events 
  → stg_events (view)
    → int_event_sessions (ephemeral)
      → int_user_sessions (ephemeral)
        → fct_user_activity (table, 45min build)
          → rpt_user_dashboard (view)

Bottleneck: fct_user_activity (45min)
Recommendation: Consider incremental materialization

Troubleshooting

Issue: Can't find model file

  • Check project structure and model naming
  • Look in subdirectories under models/
  • Verify model isn't disabled in dbt_project.yml

Issue: Missing dependencies in output

  • Check for dynamic refs: {{ ref(var('model_name')) }}
  • Look for refs in Jinja loops
  • Check macros that might generate refs

Issue: Manifest not available

  • Fall back to file scanning method
  • Run dbt compile first to generate manifest
  • Parse .sql files directly

Quality Checklist

  • All refs and sources extracted correctly
  • Circular dependencies identified
  • Depth calculations accurate
  • Downstream consumers found
  • Visualization clear and readable
  • Statistics include all relevant metrics
  • Special cases handled (dynamic refs, macros)