name	knowledge-graph-builder
description	Build large-scale knowledge graphs with D3.js visualization, CI/CD pipelines, ETL processes, and query optimization. Supports Neo4j and other graph databases holding millions of nodes, with incremental updates and interactive exploration.

Knowledge Graph Builder Skill

Build and visualize large-scale knowledge graphs (millions of nodes), with ETL pipelines to load them and CI/CD automation to keep them up to date.

What This Skill Provides

Core Tools

build_knowledge_graph.py - Construct graphs from data sources (JSON, CSV, databases)
visualize_graph.py - D3.js interactive visualizations
optimize_graph_queries.py - Query optimization for large graphs
setup_graph_pipeline.py - CI/CD pipeline for graph updates

References

large_scale_graphs.md - Handling 10M+ nodes, partitioning strategies
d3_graph_viz.md - Interactive D3.js visualizations
graph_etl_patterns.md - ETL pipelines for graph construction
best_practices.md - Graph database best practices
troubleshooting.md - Common graph issues

Templates

Graph construction pipeline templates
D3.js visualization components
GitHub Actions workflows for graph updates

When to Use

Perfect For

Building knowledge bases with millions of entities
Network analysis and relationship mapping
D3.js interactive graph visualization
CI/CD for automated graph updates
Graph ETL pipelines
Query optimization for large graphs

Not For

Simple graph schema design (use neo4j-integration)
Small graphs (< 10,000 nodes) - this tooling is over-engineered at that scale
Relational database modeling

Quick Start

# Build graph from data
python scripts/build_knowledge_graph.py \
  --input data.json \
  --output graph.db

# Visualize
python scripts/visualize_graph.py \
  --graph graph.db \
  --output visualization.html

# Setup CI/CD
python scripts/setup_graph_pipeline.py \
  --repo-path . \
  --schedule "0 0 * * *"

Decision Trees

Which visualization approach?

Small (< 1K nodes): Full D3.js force-directed layout
Medium (1K-100K nodes): Clustered view with drill-down
Large (> 100K nodes): Heatmaps and aggregated views
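
The thresholds above can be sketched as a small selector function. This is only an illustration: the function name and the returned labels are hypothetical and are not part of visualize_graph.py.

```python
def choose_visualization(node_count: int) -> str:
    """Map a node count to the approach suggested by the decision tree.

    The returned labels are illustrative names, not options of any real tool.
    """
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    if node_count < 1_000:
        return "force-directed"   # small: render every node with D3.js
    if node_count <= 100_000:
        return "clustered"        # medium: clusters with drill-down
    return "aggregated"           # large: heatmaps, aggregated views


if __name__ == "__main__":
    for n in (500, 50_000, 2_000_000):
        print(n, choose_visualization(n))
```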

Quality Checklist

Graph schema documented
ETL pipeline automated
Incremental updates implemented
Indexes on frequently queried properties
CI/CD pipeline configured
Visualization responsive and interactive

Common Pitfalls

Pitfall: Full Graph Rendering

Solution: Use clustering, viewport-based rendering, or aggregation for large graphs.
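
One way to apply the aggregation option is to collapse nodes into cluster-level "supernodes" before anything reaches the browser. The sketch below assumes a cluster assignment already exists (for example from a community detection step); the function name and input shapes are hypothetical.

```python
from collections import Counter


def aggregate_by_cluster(edges, cluster_of):
    """Collapse node-level edges into weighted cluster-level edges.

    edges:      iterable of (source_id, target_id) pairs
    cluster_of: dict mapping node id -> cluster id (assumed to be precomputed)

    Returns (sizes, weights): nodes per cluster, and the number of
    node-level edges between each unordered pair of distinct clusters.
    """
    sizes = Counter(cluster_of.values())
    weights = Counter()
    for source, target in edges:
        a, b = cluster_of[source], cluster_of[target]
        if a != b:  # edges inside a cluster are hidden in the supernode
            weights[tuple(sorted((a, b)))] += 1
    return dict(sizes), dict(weights)


if __name__ == "__main__":
    edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
    cluster_of = {"a": "x", "b": "x", "c": "y", "d": "y"}
    print(aggregate_by_cluster(edges, cluster_of))
```

The visualization then draws one node per cluster, sized by member count, and one edge per cluster pair, weighted by the edge count, instead of the full graph.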

Pitfall: Slow Queries

Solution: Create indexes on frequently queried properties, profile slow queries, and optimize the Cypher/Gremlin queries the profile flags.
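
As a rough illustration of the first two steps, the helpers below build Cypher statements as strings. They assume Neo4j 4.x or later syntax (CREATE INDEX ... IF NOT EXISTS and the PROFILE prefix); the helper names and the index naming scheme are hypothetical, and Gremlin-based databases manage indexes differently.

```python
import re

# Labels and property names cannot be passed as query parameters in Cypher,
# so validate them before interpolating into the statement text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def create_index_statement(label: str, prop: str) -> str:
    """Build a Cypher statement that indexes one property of one label."""
    for name in (label, prop):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    index_name = f"idx_{label.lower()}_{prop.lower()}"  # illustrative naming scheme
    return (
        f"CREATE INDEX {index_name} IF NOT EXISTS "
        f"FOR (n:{label}) ON (n.{prop})"
    )


def profiled(query: str) -> str:
    """Prefix a Cypher query with PROFILE to get its executed plan."""
    return f"PROFILE {query}"


if __name__ == "__main__":
    print(create_index_statement("Person", "name"))
    print(profiled("MATCH (p:Person {name: $name}) RETURN p"))
```

The resulting strings can be run through whatever client the project already uses; comparing the profiled plan before and after creating the index shows whether the index is actually being used.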

Pro Tips

Tip 1: Incremental Updates

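def update_graph_incrementally(new_data, stored_hashes):
    # stored_hashes: dict of entity id -> hash saved by the previous run
    # (assumed; entity.id and update_node are expected to exist elsewhere)
    for entity in new_data:
        # Only update nodes/edges whose content hash changed
        if stored_hashes.get(entity.id) != entity.hash:
            update_node(entity)
            stored_hashes[entity.id] = entity.hash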
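
The tip relies on each entity carrying a hash of its content. A minimal sketch of how such a hash might be computed and compared is shown here, assuming entities are plain dicts of JSON-serializable properties; content_hash and changed_ids are hypothetical names, not functions of this skill's scripts.

```python
import hashlib
import json


def content_hash(properties: dict) -> str:
    """Return a stable SHA-256 hash of a node's properties.

    Keys are sorted so the same properties always produce the same
    hash, regardless of dict insertion order.
    """
    canonical = json.dumps(properties, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_ids(new_records: dict, stored_hashes: dict) -> list:
    """List ids that are new or whose content hash differs from the stored one."""
    return [
        node_id
        for node_id, props in new_records.items()
        if stored_hashes.get(node_id) != content_hash(props)
    ]


if __name__ == "__main__":
    stored = {"n1": content_hash({"name": "Ada"})}
    incoming = {"n1": {"name": "Ada"}, "n2": {"name": "Bob"}}
    print(changed_ids(incoming, stored))
```

Only the ids returned by changed_ids would need to be written to the graph database; unchanged nodes are skipped entirely.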

Related Skills

neo4j-integration - Schema design and basic queries
data-viz-studio - General data visualization
frontend-component-system - D3.js component patterns
