| name | knowledge-graph-builder |
| description | Build large-scale knowledge graphs with D3.js visualization, CI/CD pipelines, ETL processes, and query optimization. Supports Neo4j and graph databases for millions of nodes with incremental updates and interactive exploration. |
Knowledge Graph Builder Skill
Build and visualize large-scale knowledge graphs with millions of nodes, backed by ETL pipelines and CI/CD automation.
What This Skill Provides
Core Tools
- build_knowledge_graph.py - Construct graphs from data sources (JSON, CSV, databases)
- visualize_graph.py - Generate interactive D3.js visualizations
- optimize_graph_queries.py - Optimize queries for large graphs
- setup_graph_pipeline.py - Set up a CI/CD pipeline for graph updates
References
- large_scale_graphs.md - Handling 10M+ nodes, partitioning strategies
- d3_graph_viz.md - Interactive D3.js visualizations
- graph_etl_patterns.md - ETL pipelines for graph construction
- best_practices.md - Graph database best practices
- troubleshooting.md - Common graph issues
Templates
- Graph construction pipeline templates
- D3.js visualization components
- GitHub Actions workflows for graph updates
When to Use
Perfect For
- Building knowledge bases with millions of entities
- Network analysis and relationship mapping
- D3.js interactive graph visualization
- CI/CD for automated graph updates
- Graph ETL pipelines
- Query optimization for large graphs
Not For
- Simple graph schema design (use neo4j-integration)
- Small graphs (< 10,000 nodes), for which this tooling is over-engineered
- Relational database modeling
Quick Start
# Build graph from data
python scripts/build_knowledge_graph.py \
--input data.json \
--output graph.db
# Visualize
python scripts/visualize_graph.py \
--graph graph.db \
--output visualization.html
# Set up CI/CD (the schedule is a cron expression; this one runs daily at 00:00)
python scripts/setup_graph_pipeline.py \
--repo-path . \
--schedule "0 0 * * *"
Decision Trees
Which visualization approach?
- Small (< 1K nodes): Full D3.js force-directed layout
- Medium (1K-100K nodes): Clustered view with drill-down
- Large (> 100K nodes): Heatmaps and aggregated views
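The thresholds above can be encoded as a small lookup so a pipeline can pick a rendering mode automatically. This is a minimal sketch; the function name `choose_visualization` and the returned labels are illustrative assumptions, not part of the skill's scripts.

```python
def choose_visualization(node_count: int) -> str:
    """Map a node count to one of the rendering approaches listed above.

    The function name and return labels are hypothetical, chosen for
    illustration only.
    """
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    if node_count < 1_000:
        return "force-directed"  # small: render the full graph
    if node_count <= 100_000:
        return "clustered"       # medium: clusters with drill-down
    return "aggregated"          # large: heatmaps / aggregated views
```

For example, `choose_visualization(25_000)` selects the clustered view.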
Quality Checklist
- Graph schema documented
- ETL pipeline automated
- Incremental updates implemented
- Indexes on frequently queried properties
- CI/CD pipeline configured
- Visualization responsive and interactive
Common Pitfalls
Pitfall: Full Graph Rendering
Solution: Rendering every node and edge at once overwhelms the browser on large graphs. Use clustering, viewport-based rendering, or aggregation instead.
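One way to apply the aggregation mentioned above is to collapse nodes into cluster-level "supernodes" before handing the data to the front end. The sketch below assumes a `cluster_of` mapping already exists (from any community-detection or grouping step); the function name and the output field names are illustrative, using the nodes/links shape commonly fed to D3 force layouts.

```python
from collections import Counter


def aggregate_by_cluster(edges, cluster_of):
    """Collapse a node-level edge list into weighted cluster-level links.

    edges:      iterable of (source, target) node ids
    cluster_of: dict mapping node id -> cluster id (assumed to be
                precomputed; how it is produced is out of scope here)
    """
    sizes = Counter(cluster_of.values())  # number of nodes per cluster
    weights = Counter()
    for source, target in edges:
        a, b = cluster_of[source], cluster_of[target]
        if a != b:  # intra-cluster edges are hidden inside a supernode
            weights[tuple(sorted((a, b)))] += 1
    nodes = [{"id": c, "size": n} for c, n in sorted(sizes.items())]
    links = [
        {"source": a, "target": b, "weight": w}
        for (a, b), w in sorted(weights.items())
    ]
    return {"nodes": nodes, "links": links}
```

The result has one entry per cluster rather than per node, so the payload size depends on the number of clusters, not on the size of the underlying graph.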
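Pitfall: Slow Queries
Solution: Create indexes on the properties you filter by, profile slow queries (PROFILE or EXPLAIN in Cypher, the profile() step in Gremlin), and rewrite the Cypher/Gremlin queries that the profile shows to be expensive.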
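As an illustration of the index and profiling advice, the sketch below builds the corresponding Cypher statements as strings. It assumes Neo4j 4.x or later syntax (`CREATE INDEX ... IF NOT EXISTS FOR ... ON ...`); the helper names and the `idx_<label>_<property>` naming scheme are hypothetical. Running the statements against a database is left to whichever driver you use.

```python
import re

# Labels and property names cannot be passed as query parameters in Cypher,
# so this sketch validates them before interpolating them into the statement.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def index_statement(label: str, prop: str) -> str:
    """Return a Cypher statement creating an index on label.prop."""
    for name in (label, prop):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    index_name = f"idx_{label.lower()}_{prop.lower()}"  # hypothetical naming scheme
    return (
        f"CREATE INDEX {index_name} IF NOT EXISTS "
        f"FOR (n:{label}) ON (n.{prop})"
    )


def profiled(query: str) -> str:
    """Prefix a Cypher query with PROFILE to obtain its execution plan."""
    return f"PROFILE {query}"
```

For instance, `index_statement("Person", "name")` produces a statement that can be run once at deploy time, and `profiled("MATCH (p:Person {name: $name}) RETURN p")` can be used to check whether the planner actually uses the index.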
Pro Tips
Tip 1: Incremental Updates
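def update_graph_incrementally(new_data, stored_hashes, update_node):
    """Update only the entities whose content hash changed since the last run."""
    for entity in new_data:
        # stored_hashes maps entity id -> hash recorded at the previous update;
        # the entity.id attribute and update_node callback are assumed here
        if stored_hashes.get(entity.id) != entity.hash:
            update_node(entity)
            stored_hashes[entity.id] = entity.hash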
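The `entity.hash` comparison above presupposes a stable content hash per entity. A minimal sketch of one way to compute it follows, assuming entities are JSON-serializable property dictionaries keyed by id; `content_hash` and `changed_entities` are illustrative names, not functions from the skill's scripts.

```python
import hashlib
import json


def content_hash(properties: dict) -> str:
    """Hash a property dict so that key order does not affect the result."""
    canonical = json.dumps(properties, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_entities(new_data: dict, stored_hashes: dict):
    """Yield (entity_id, properties, digest) for new or modified entities.

    new_data:      dict mapping entity id -> property dict
    stored_hashes: dict mapping entity id -> digest from the previous run
    """
    for entity_id, properties in new_data.items():
        digest = content_hash(properties)
        if stored_hashes.get(entity_id) != digest:
            yield entity_id, properties, digest
```

Serializing with `sort_keys=True` keeps the digest independent of dictionary ordering, so an unchanged entity is skipped even if the source emits its fields in a different order.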
Related Skills
- neo4j-integration - Schema design and basic queries
- data-viz-studio - General data visualization
- frontend-component-system - D3.js component patterns