Claude Code Plugins

Community-maintained marketplace


Universal data lake and lakehouse patterns covering ingestion (dlt, Airbyte), transformation (SQLMesh, dbt), storage formats (Iceberg, Delta, Hudi, Parquet), query engines (ClickHouse, DuckDB, Doris, StarRocks), streaming (Kafka, Flink), orchestration (Dagster, Airflow, Prefect), and visualization (Metabase, Superset, Grafana). Self-hosted and cloud options.

Install Skill

  1. Download skill
  2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section
  3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Verify the skill by reading through its instructions before using it.

SKILL.md

name: data-lake-platform
description: Data lake and lakehouse patterns: ingestion, CDC, Iceberg/Delta/Hudi, Trino/DuckDB, orchestration, and governance. Self-hosted and cloud.

Data Lake Platform

Build and operate production data lakes and lakehouses: ingest, transform, store in open formats, and serve analytics reliably.

When to Use

  • Design data lake/lakehouse architecture
  • Set up ingestion pipelines (batch, incremental, CDC)
  • Build SQL transformation layers (SQLMesh, dbt)
  • Choose table formats and catalogs (Iceberg, Delta, Hudi)
  • Deploy query/serving engines (Trino, ClickHouse, DuckDB)
  • Implement streaming pipelines (Kafka, Flink)
  • Set up orchestration (Dagster, Airflow, Prefect)
  • Add governance, lineage, data quality, and cost controls

Triage Questions

  1. Batch, streaming, or hybrid? What is the freshness SLO?
  2. Append-only vs upserts/deletes (CDC)? Is time travel required?
  3. Primary query pattern: BI dashboards (high concurrency), ad-hoc joins, embedded analytics?
  4. PII/compliance: row/column-level access, retention, audit logging?
  5. Platform constraints: self-hosted vs cloud, preferred engines, team strengths?
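
The second triage question has the largest effect on table design, so a small illustration may help. The sketch below contrasts an append-only log with a keyed table that replays upserts and deletes. `ChangeEvent`, its fields, and both helper functions are hypothetical names introduced here for illustration; real CDC tools emit richer records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One hypothetical CDC record; op is 'insert', 'update', or 'delete'."""
    op: str
    key: int
    row: Optional[dict]  # None for deletes
    lsn: int             # assumed monotonically increasing log position


def apply_append_only(log, events):
    """Append-only: every event is kept and nothing is rewritten."""
    return log + list(events)


def apply_cdc(table, events):
    """Upsert/delete: replay events in log order onto a keyed snapshot."""
    result = dict(table)
    for event in sorted(events, key=lambda e: e.lsn):
        if event.op == "delete":
            result.pop(event.key, None)
        elif event.op in ("insert", "update"):
            result[event.key] = event.row
        else:
            raise ValueError(f"unknown op: {event.op}")
    return result


events = [
    ChangeEvent("insert", 1, {"name": "a"}, lsn=1),
    ChangeEvent("insert", 2, {"name": "b"}, lsn=2),
    ChangeEvent("update", 1, {"name": "a2"}, lsn=3),
    ChangeEvent("delete", 2, None, lsn=4),
]

snapshot = apply_cdc({}, events)          # current state only
history = apply_append_only([], events)   # full change history

print(snapshot)      # {1: {'name': 'a2'}}
print(len(history))  # 4
```

The append-only log keeps all four events, which is one simple way to answer "what did this row look like earlier?"; the snapshot keeps only the latest state and needs a table format or engine that can rewrite rows.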

Default Baseline (Good Starting Point)

  • Storage: object storage + open table format (usually Iceberg)
  • Catalog: REST/Hive/Glue/Nessie/Unity (match your platform)
  • Transforms: SQLMesh or dbt (pick one and standardize)
  • Lake query: Trino (or Spark for heavy compute/ML workloads)
  • Serving (optional): ClickHouse/StarRocks/Doris for low-latency BI
  • Governance: DataHub/OpenMetadata + OpenLineage
  • Orchestration: Dagster/Airflow/Prefect

Workflow

  1. Pick table format + catalog: references/storage-formats.md (use assets/cross-platform/template-schema-evolution.md and assets/cross-platform/template-partitioning-strategy.md)
  2. Design ingestion (batch/incremental/CDC): references/ingestion-patterns.md (use assets/cross-platform/template-ingestion-governance-checklist.md and assets/cross-platform/template-incremental-loading.md)
  3. Design transformations (bronze/silver/gold or data products): references/transformation-patterns.md (use assets/cross-platform/template-data-pipeline.md)
  4. Choose lake query vs serving engines: references/query-engine-patterns.md
  5. Add governance, lineage, and quality gates: references/governance-catalog.md (use assets/cross-platform/template-data-quality-governance.md and assets/cross-platform/template-data-quality.md)
  6. Plan operations + cost controls: references/operational-playbook.md and references/cost-optimization.md (use assets/cross-platform/template-data-quality-backfill-runbook.md and assets/cross-platform/template-cost-optimization.md)
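
As a rough illustration of the incremental ingestion in step 2, the sketch below loads only rows newer than a stored high watermark and upserts them by key. The row shape (`id`, `updated_at`, `value`) and the function name are assumptions made for this example; they do not come from the referenced templates.

```python
from datetime import datetime, timezone


def incremental_load(source_rows, target, watermark):
    """Upsert rows with updated_at > watermark and return the new watermark.

    Rows are keyed by "id", so retrying the same batch leaves the target unchanged.
    """
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]
    for row in new_rows:
        target[row["id"]] = row
    if new_rows:
        watermark = max(r["updated_at"] for r in new_rows)
    return watermark


def ts(day):
    """Helper for readable UTC timestamps in January 2024."""
    return datetime(2024, 1, day, tzinfo=timezone.utc)


EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

source = [
    {"id": 1, "updated_at": ts(1), "value": 10},
    {"id": 2, "updated_at": ts(2), "value": 20},
]
target = {}

# First run: everything is newer than the initial watermark.
wm = incremental_load(source, target, EPOCH)

# Later, row 1 changes upstream; only that row is picked up.
source.append({"id": 1, "updated_at": ts(3), "value": 11})
wm = incremental_load(source, target, wm)

print(wm.isoformat())      # 2024-01-03T00:00:00+00:00
print(target[1]["value"])  # 11
```

A backfill in this model is just a run with the watermark reset to an earlier value. Note that a strict `>` comparison can miss rows that share the watermark timestamp but commit late, so production pipelines often re-read a small overlap window.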

Architecture Patterns

  • Medallion (bronze/silver/gold): references/architecture-patterns.md
  • Data mesh (domain-owned data products): references/architecture-patterns.md
  • Streaming-first (Kappa): references/streaming-patterns.md
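
To make the medallion layering concrete, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in for a lake engine. The table names, columns, and cleaning rules are hypothetical; a real deployment would express the same steps as SQLMesh or dbt models over Iceberg or another table format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Bronze: raw records exactly as delivered, everything stored as text.
CREATE TABLE bronze_orders (raw_id TEXT, amount TEXT, country TEXT);
INSERT INTO bronze_orders VALUES
  ('1', '10.50', 'us'),
  ('1', '10.50', 'us'),   -- duplicate delivery
  ('2', 'oops',  'DE'),   -- unparseable amount
  ('3', '7.25',  'de');

-- Silver: typed, deduplicated, normalized; malformed amounts are filtered out.
CREATE TABLE silver_orders AS
SELECT DISTINCT
  CAST(raw_id AS INTEGER) AS order_id,
  CAST(amount AS REAL)    AS amount,
  UPPER(country)          AS country
FROM bronze_orders
WHERE amount GLOB '[0-9]*.[0-9]*';

-- Gold: business-level aggregate ready for BI.
CREATE TABLE gold_revenue_by_country AS
SELECT country, ROUND(SUM(amount), 2) AS revenue, COUNT(*) AS orders
FROM silver_orders
GROUP BY country;
""")

gold = conn.execute(
    "SELECT country, revenue, orders FROM gold_revenue_by_country ORDER BY country"
).fetchall()
print(gold)  # [('DE', 7.25, 1), ('US', 10.5, 1)]
```

Bronze keeps the duplicate and the bad row so they can be inspected or reprocessed later; only silver and gold apply cleaning rules.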

Quick Start

dlt + ClickHouse

pip install "dlt[clickhouse]"
dlt init rest_api clickhouse
# add ClickHouse credentials to .dlt/secrets.toml, then run the generated script
python rest_api_pipeline.py

SQLMesh + DuckDB

pip install sqlmesh
sqlmesh init duckdb
sqlmesh plan && sqlmesh run

Reliability and Safety

Do

  • Define data contracts and owners up front
  • Add quality gates (freshness, volume, schema, distribution) per tier
  • Make every pipeline idempotent and re-runnable (backfills are normal)
  • Treat access control and audit logging as first-class requirements
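
The four gate types named above (freshness, volume, schema, distribution) can be sketched as plain checks over a batch. The function name, thresholds, and the `amount` and `loaded_at` columns are assumptions for illustration only; dedicated data quality frameworks offer far more than this.

```python
from datetime import datetime, timedelta, timezone


def run_quality_gates(rows, expected_columns, min_rows, max_age,
                      max_null_fraction, now):
    """Return failure messages; an empty list means the batch passes."""
    failures = []

    # Volume gate: too few rows usually means an upstream problem.
    if len(rows) < min_rows:
        failures.append(f"volume: {len(rows)} rows, expected at least {min_rows}")

    # Schema gate: every row must have exactly the expected columns.
    bad = [i for i, r in enumerate(rows) if set(r) != set(expected_columns)]
    if bad:
        failures.append(f"schema: {len(bad)} rows do not match the expected columns")

    # Freshness gate: the newest row must be recent enough.
    newest = max((r["loaded_at"] for r in rows if "loaded_at" in r), default=None)
    if newest is None or now - newest > max_age:
        failures.append("freshness: newest row is missing or older than max_age")

    # Distribution gate: here, simply the share of null amounts.
    if rows:
        nulls = sum(1 for r in rows if r.get("amount") is None)
        if nulls / len(rows) > max_null_fraction:
            failures.append(f"distribution: {nulls}/{len(rows)} null amounts")

    return failures


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
good = [
    {"order_id": 1, "amount": 10.5, "loaded_at": now - timedelta(minutes=5)},
    {"order_id": 2, "amount": 7.25, "loaded_at": now - timedelta(minutes=3)},
]
stale = [
    {"order_id": 3, "amount": None, "loaded_at": now - timedelta(days=2)},
]
limits = dict(
    expected_columns={"order_id", "amount", "loaded_at"},
    min_rows=2,
    max_age=timedelta(hours=1),
    max_null_fraction=0.1,
    now=now,
)

good_failures = run_quality_gates(good, **limits)
stale_failures = run_quality_gates(stale, **limits)
print(good_failures)        # []
print(len(stale_failures))  # 3
```

Returning a list of failures rather than raising on the first one lets an orchestrator report every broken gate for a tier in a single run.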

Avoid

  • Skipping validation to "move fast"
  • Storing PII without access controls
  • Pipelines that can't be re-run safely
  • Manual schema changes without version control
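
One common way to make a pipeline safe to re-run is to overwrite a whole partition instead of appending to it. The sketch below contrasts the two approaches using an in-memory dictionary as a stand-in for a partitioned table; both function names are hypothetical.

```python
def write_partition(table, partition_key, rows):
    """Replace one partition wholesale so re-runs converge to the same state."""
    table[partition_key] = list(rows)


def append_rows(table, partition_key, rows):
    """Naive append, shown for contrast: each re-run duplicates data."""
    table.setdefault(partition_key, []).extend(rows)


batch = [{"order_id": 1}, {"order_id": 2}]
idempotent, naive = {}, {}

# Simulate the original run plus two retries or backfills of the same day.
for _ in range(3):
    write_partition(idempotent, "2024-01-01", batch)
    append_rows(naive, "2024-01-01", batch)

print(len(idempotent["2024-01-01"]))  # 2
print(len(naive["2024-01-01"]))       # 6
```

Keyed upserts (merge on a primary key) are another route to the same property when partitions are too large to rewrite.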

Resources

Resource                                 Purpose
references/architecture-patterns.md      Medallion, data mesh
references/ingestion-patterns.md         dlt vs Airbyte, CDC
references/transformation-patterns.md    SQLMesh vs dbt
references/storage-formats.md            Iceberg vs Delta
references/query-engine-patterns.md      ClickHouse, DuckDB
references/streaming-patterns.md         Kafka, Flink
references/orchestration-patterns.md     Dagster, Airflow
references/bi-visualization-patterns.md  Metabase, Superset
references/cost-optimization.md          Cost levers and maintenance
references/operational-playbook.md       Monitoring and incident response
references/governance-catalog.md         Catalog, lineage, access control
references/data-mesh-patterns.md         Domain ownership, data products, federated governance
references/data-quality-patterns.md      Quality gates, validation frameworks, SLOs, anomaly detection
references/security-access-patterns.md   Row/column security, encryption, audit logging, compliance

Templates

Template                                                          Purpose
assets/cross-platform/template-medallion-architecture.md          Baseline bronze/silver/gold plan
assets/cross-platform/template-data-pipeline.md                   End-to-end pipeline skeleton
assets/cross-platform/template-ingestion-governance-checklist.md  Source onboarding checklist
assets/cross-platform/template-incremental-loading.md             Incremental + backfill plan
assets/cross-platform/template-schema-evolution.md                Schema change rules
assets/cross-platform/template-cost-optimization.md               Cost control checklist
assets/cross-platform/template-data-quality-governance.md         Quality contracts + SLOs
assets/cross-platform/template-data-quality-backfill-runbook.md   Backfill incident/runbook

Related Skills

Skill                  Purpose
ai-mlops               ML deployment
ai-ml-data-science     Feature engineering
data-sql-optimization  OLTP optimization

Fact-Checking

  • Use web search/web fetch to verify current external facts, versions, pricing, deadlines, regulations, or platform behavior before final answers.
  • Prefer primary sources; report source links and dates for volatile information.
  • If web access is unavailable, state the limitation and mark guidance as unverified.