Claude Code Plugins

Community-maintained marketplace


Data warehouse / big data ops skill for designing and operating analytical data platforms. Use for tasks like defining source-of-truth, building ETL/ELT pipelines, dimensional modeling (star schema), data quality checks, partitioning strategies, cost/performance tuning, governance, lineage, and SLA monitoring.

Install Skill

1. Download skill
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section.
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file.

Note: Please review the skill's instructions before using it.

SKILL.md

name: data-warehouse
description: Data warehouse / big data ops skill for designing and operating analytical data platforms. Use for tasks like defining source-of-truth, building ETL/ELT pipelines, dimensional modeling (star schema), data quality checks, partitioning strategies, cost/performance tuning, governance, lineage, and SLA monitoring.

data-warehouse

Use this skill for big data / data warehouse (DW) construction and operations: from data ingestion to metric delivery and governance.

Defaults / assumptions to confirm

  • Warehouse tech: BigQuery/Snowflake/Redshift/Hive/ClickHouse/etc.
  • Orchestration: Airflow/Dagster/Argo/dbt
  • Ingestion: CDC vs batch, streaming (Kafka) vs file
  • Data consumers: BI dashboards, product analytics, ML features
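
If helpful, capture the answers in a small intake record before any design work. A minimal sketch; the stack choices shown (BigQuery, Airflow, Kafka) are placeholder answers, not recommendations:

```python
# Hypothetical intake record: confirm each value with stakeholders first.
warehouse_assumptions = {
    "warehouse_tech": "BigQuery",      # or Snowflake / Redshift / Hive / ClickHouse
    "orchestration": "Airflow",        # or Dagster / Argo / dbt
    "ingestion": {
        "oltp_tables": "CDC",          # change data capture for mutable sources
        "event_streams": "Kafka",      # append-only logs for events
        "flat_files": "daily batch",
    },
    "consumers": ["BI dashboards", "product analytics", "ML features"],
}
```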

Core outputs

  • Warehouse architecture (sources → staging → warehouse → marts)
  • Data model (facts/dimensions) + metric definitions
  • Pipeline plan (DAGs, schedules, dependencies, SLAs)
  • Data quality plan (checks, thresholds, alerts)
  • Cost/performance plan (partitioning, clustering, materialization)
  • Governance plan (access control, PII handling, retention)
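
As one concrete (and purely illustrative) reading of the sources → staging → warehouse → marts flow, here is a hedged naming sketch for an orders domain; the table names are invented examples, not a required convention:

```python
# Hypothetical layer map; names are illustrative only.
LAYERS = {
    "raw":       ["raw.orders", "raw.customers"],              # landed as-is from sources
    "staging":   ["stg_orders", "stg_customers"],              # typed, deduplicated, renamed
    "warehouse": ["fct_orders", "dim_customer", "dim_date"],   # facts + dimensions (star schema)
    "marts":     ["mart_sales_daily"],                         # consumer-facing summary tables
}
```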

Workflow

  1. Understand the business questions
     • What decisions will this warehouse support?
     • Define critical metrics and their definitions (single source of truth).
  2. Source & ingestion design
     • Identify systems of record and their ownership.
     • Choose ingestion: CDC for mutable OLTP tables, append-only logs for events.
     • Define a late-arriving-data strategy and backfill procedures.
  3. Modeling (practical)
     • Prefer a star schema for BI: fact tables + dimension tables.
     • Keep the grain explicit (what does one row represent?).
     • Separate raw/staging layers from curated models; avoid mixing them.
  4. ETL/ELT pipelines (see the pipeline sketch after this list)
     • Define DAGs with clear inputs/outputs and idempotent tasks.
     • Handle retries, partial failures, and reprocessing windows.
     • Provide backfill procedures and a runbook.
  5. Data quality & observability (see the quality-check sketch after this list)
     • Validate freshness, volume, schema drift, null ratios, and logical referential integrity.
     • Add anomaly detection for key metrics.
     • Track pipeline success rate, duration, and SLA misses.
  6. Performance & cost
     • Partition large fact tables by date/time; cluster by common filter/join columns.
     • Materialize expensive queries (summary tables, incremental models).
     • Control scan cost with column pruning and predicate pushdown.
  7. Governance & security (see the masking sketch after this list)
     • PII classification, masking, and retention.
     • RBAC/ABAC for datasets; audit logging.
     • Document lineage and ownership for each table/model.
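
A minimal sketch of the idempotency and backfill points in step 4, written as a date-partitioned incremental load. The run_query helper, table names, and SQL are assumptions for illustration; the orchestrator (Airflow, Dagster, etc.) is assumed to pass in the logical run date:

```python
from datetime import date, timedelta

def load_fct_orders(run_date: date, run_query) -> None:
    """Idempotent load of one date partition: safe to re-run and to backfill."""
    # Delete-then-insert on a single partition makes retries deterministic.
    run_query("DELETE FROM fct_orders WHERE order_date = :d", {"d": run_date})
    run_query(
        """
        INSERT INTO fct_orders (order_id, customer_id, order_date, amount)
        SELECT order_id, customer_id, order_date, amount
        FROM stg_orders
        WHERE order_date = :d
        """,
        {"d": run_date},
    )

def backfill_fct_orders(start: date, end: date, run_query) -> None:
    """Reprocess a window of partitions, e.g. after late-arriving data."""
    day = start
    while day <= end:
        load_fct_orders(day, run_query)
        day += timedelta(days=1)
```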
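
A sketch of the freshness, volume, and null-ratio checks from step 5. The same hypothetical run_query helper is assumed to return a single scalar, the SQL is BigQuery-flavored, and the thresholds are placeholders a real quality plan would tune per table:

```python
def check_fct_orders(run_query) -> list[str]:
    """Return failed-check messages; an empty list means the table passed."""
    failures = []

    # Freshness: the newest partition should be at most 1 day old.
    lag_days = run_query(
        "SELECT DATE_DIFF(CURRENT_DATE, MAX(order_date), DAY) FROM fct_orders")
    if lag_days > 1:
        failures.append(f"freshness: latest partition is {lag_days} days old")

    # Volume: the latest partition should not be empty.
    rows_latest = run_query(
        "SELECT COUNT(*) FROM fct_orders WHERE order_date = CURRENT_DATE - 1")
    if rows_latest == 0:
        failures.append("volume: no rows loaded for the latest partition")

    # Null ratio: customer_id should almost never be null (1% is illustrative).
    null_ratio = run_query(
        "SELECT COUNTIF(customer_id IS NULL) / COUNT(*) FROM fct_orders")
    if null_ratio > 0.01:
        failures.append(f"nulls: customer_id null ratio {null_ratio:.2%} exceeds 1%")

    return failures
```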
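
For the PII masking point in step 7, a sketch of a column classification map driving a masked view. The classification values and the hashing rule are examples of the kind of policy a governance plan would pin down, not a standard:

```python
# Hypothetical column classification for dim_customer.
PII_COLUMNS = {
    "email":   "mask",   # replace with a hash so joins still work
    "phone":   "drop",   # exclude entirely from the shared view
    "country": "keep",   # low-risk attribute, exposed as-is
}

def masked_view_sql(table: str, columns: dict[str, str]) -> str:
    """Build a CREATE VIEW statement for a masked view from a classification map."""
    parts = []
    for col, rule in columns.items():
        if rule == "keep":
            parts.append(col)
        elif rule == "mask":
            parts.append(f"TO_HEX(SHA256({col})) AS {col}")
        # "drop" columns are simply omitted.
    return f"CREATE VIEW {table}_shared AS SELECT {', '.join(parts)} FROM {table}"

print(masked_view_sql("dim_customer", PII_COLUMNS))
```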

Templates

Table spec

  • Name:
  • Grain:
  • Partition/cluster keys:
  • Key columns:
  • Sources:
  • Refresh cadence:
  • Consumers:
  • Quality checks:
  • Owner:
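
A filled-in table spec, shown as a Python dict so it can be versioned next to pipeline code; every value is invented for illustration:

```python
# Illustrative only: a completed table spec for a hypothetical fct_orders table.
fct_orders_spec = {
    "name": "fct_orders",
    "grain": "one row per order line item",
    "partition_cluster_keys": {"partition": "order_date", "cluster": ["customer_id"]},
    "key_columns": ["order_id", "order_line_id", "customer_id", "order_date", "amount"],
    "sources": ["raw.orders (CDC from the orders OLTP database)"],
    "refresh_cadence": "hourly incremental, daily reconciliation",
    "consumers": ["sales BI dashboard", "mart_sales_daily"],
    "quality_checks": ["freshness <= 2h", "amount >= 0", "customer_id not null"],
    "owner": "data-platform team",
}
```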

Metric spec

  • Name:
  • Definition:
  • Numerator/denominator:
  • Filters:
  • Time window:
  • Known caveats:
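
A matching, invented example of the metric spec, showing the level of precision a single source of truth needs:

```python
# Illustrative only: a completed metric spec.
conversion_rate_spec = {
    "name": "order_conversion_rate",
    "definition": "share of sessions that place at least one order",
    "numerator": "sessions with >= 1 completed order",
    "denominator": "all sessions, excluding internal and bot traffic",
    "filters": ["exclude test accounts", "exclude orders cancelled within 5 minutes"],
    "time_window": "calendar day, UTC",
    "known_caveats": ["late-arriving orders can revise yesterday's value for up to 48h"],
}
```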