Claude Code Plugins

Community-maintained marketplace

Feedback

REQUIRED Phase 2 of /ds workflow. Profiles data and creates analysis task breakdown.

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name ds-plan
version 1
description REQUIRED Phase 2 of /ds workflow. Profiles data and creates analysis task breakdown.

Announce: "Using ds-plan (Phase 2) to profile data and create task breakdown."

Contents

Planning (Data Profiling + Task Breakdown)

Profile the data and create an analysis plan based on the spec. Requires .claude/SPEC.md from /ds-brainstorm first.

## The Iron Law of DS Planning

SPEC MUST EXIST BEFORE PLANNING. This is not negotiable.

Before exploring data or creating tasks, you MUST have:

  1. .claude/SPEC.md with objectives and constraints
  2. Clear success criteria
  3. User-approved spec

If .claude/SPEC.md doesn't exist, run /ds-brainstorm first.

Rationalization Table - STOP If You Think:

Excuse Reality Do Instead
"Data looks clean, profiling unnecessary" Your data is never clean PROFILE to discover issues
"I can profile as I go" You'll miss systemic issues PROFILE comprehensively NOW
"Quick .head() is enough" Your head hides tail problems RUN full profiling checklist
"Missing values won't affect my analysis" They always do DOCUMENT and plan handling
"I'll handle data issues during analysis" Your issues will derail your analysis FIX data issues FIRST
"User didn't mention data quality" They assume YOU'LL check QUALITY check is YOUR job
"Profiling takes too long" Your skipping it costs days later INVEST time now

Honesty Framing

Creating an analysis plan without profiling the data is LYING about understanding the data.

You cannot plan analysis steps without knowing:

  • Your data's shape and types
  • Your missing value patterns
  • Your data quality issues
  • Your cleaning requirements

Profiling costs you minutes. Your wrong plan costs hours of rework and incorrect results.

No Pause After Completion

After writing .claude/PLAN.md, IMMEDIATELY invoke:

Skill(skill="workflows:ds-implement")

DO NOT:

  • Ask "should I proceed with implementation?"
  • Summarize the plan
  • Wait for user confirmation (they approved SPEC already)
  • Write status updates

The workflow phases are SEQUENTIAL. Complete plan → immediately start implement.

What Plan Does

DO DON'T
Read .claude/SPEC.md Skip brainstorm phase
Profile data (shape, types, stats) Skip to analysis
Identify data quality issues Ignore missing/duplicate data
Create ordered task list Write final analysis code
Write .claude/PLAN.md Make completion claims

Brainstorm answers: WHAT and WHY Plan answers: HOW and DATA QUALITY

Process

1. Verify Spec Exists

cat .claude/SPEC.md  # verify-spec: read SPEC file to confirm it exists

If missing, stop and run /ds-brainstorm first.

2. Data Profiling

For multiple data sources: Profile in parallel using background Task agents.

Single Data Source (Direct Profiling)

MANDATORY profiling steps:

import pandas as pd

# Basic structure
df.shape                    # (rows, columns)
df.dtypes                   # Column types
df.head(10)                 # Sample data
df.tail(5)                  # End of data

# Summary statistics
df.describe()               # Numeric summaries
df.describe(include='object')  # Categorical summaries
df.info()                   # Memory, non-null counts

# Data quality checks
df.isnull().sum()           # Missing values per column
df.duplicated().sum()       # Duplicate rows
df[col].value_counts()      # Distribution of categories

# For time series
df[date_col].min(), df[date_col].max()  # Date range
df.groupby(date_col).size()              # Records per period

Multiple Data Sources (Parallel Profiling)

**Pattern from oh-my-opencode: Launch ALL profiling agents in a SINGLE message.**

Use run_in_background: true for parallel execution.

When profiling 2+ data sources, launch agents in parallel:

# PARALLEL + BACKGROUND: All Task calls in ONE message

Task(
    subagent_type="general-purpose",
    description="Profile dataset 1",
    run_in_background=true,
    prompt="""
Profile this dataset and return a data quality report.

Dataset: /path/to/dataset1.csv

Required checks:
1. Shape: rows x columns
2. Data types: df.dtypes
3. Missing values: df.isnull().sum()
4. Duplicates: df.duplicated().sum()
5. Summary statistics: df.describe()
6. Unique value counts for categorical columns
7. Date range if time series
8. Memory usage: df.info()

Output format:
- Markdown table with column summary
- List of data quality issues found
- Recommendations for cleaning

Tools denied: Write, Edit, NotebookEdit (read-only profiling)
""")

Task(
    subagent_type="general-purpose",
    description="Profile dataset 2",
    run_in_background=true,
    prompt="""
[Same template for dataset 2]
""")

Task(
    subagent_type="general-purpose",
    description="Profile dataset 3",
    run_in_background=true,
    prompt="""
[Same template for dataset 3]
""")

After launching agents:

  • Continue to other work (don't wait)
  • Check status with /tasks command
  • Collect results with TaskOutput when ready
# Collect profiling results
TaskOutput(task_id="task-abc123", block=true, timeout=30000)
TaskOutput(task_id="task-def456", block=true, timeout=30000)
TaskOutput(task_id="task-ghi789", block=true, timeout=30000)

Benefits:

  • 3x faster profiling for 3 datasets
  • Each agent focused on single source
  • Results consolidated in main chat

3. Identify Data Quality Issues

CRITICAL: Document ALL issues before proceeding:

Check What to Look For
Missing values Null counts, patterns of missingness
Duplicates Exact duplicates, key-based duplicates
Outliers Extreme values, impossible values
Type issues Strings in numeric columns, date parsing
Cardinality Unexpected unique values
Distribution Skewness, unexpected patterns

4. Create Task Breakdown

Break analysis into ordered tasks:

  • Each task should produce visible output
  • Order by data dependencies
  • Include data cleaning tasks FIRST

5. Write Plan Doc

Write to .claude/PLAN.md:

# Analysis Plan: [Analysis Name]

> **For Claude:** REQUIRED SUB-SKILL: Use `Skill(skill="workflows:ds-implement")` to implement this plan with output-first verification.
>
> **Delegation:** Main chat orchestrates, Task agents implement. Use `Skill(skill="workflows:ds-delegate")` for subagent templates.

## Spec Reference
See: .claude/SPEC.md

## Data Profile

### Source 1: [name]
- Location: [path/connection]
- Shape: [rows] x [columns]
- Date range: [start] to [end]
- Key columns: [list]

#### Column Summary
| Column | Type | Non-null | Unique | Notes |
|--------|------|----------|--------|-------|
| col1 | int64 | 100% | 50 | Primary key |
| col2 | object | 95% | 10 | Category |

#### Data Quality Issues
- [ ] Missing: col2 has 5% nulls - [strategy: drop/impute/flag]
- [ ] Duplicates: 100 duplicate rows on [key] - [strategy]
- [ ] Outliers: col3 has values > 1000 - [strategy]

### Source 2: [name]
[Same structure]

## Task Breakdown

### Task 1: Data Cleaning (required first)
- Handle missing values in col2
- Remove duplicates
- Fix data types
- Output: Clean DataFrame, log of rows removed

### Task 2: [Analysis Step]
- Input: Clean DataFrame
- Process: [description]
- Output: [specific output to verify]
- Dependencies: Task 1

### Task 3: [Next Step]
[Same structure]

## Output Verification Plan
For each task, define what output proves completion:
- Task 1: "X rows cleaned, Y rows dropped"
- Task 2: "Visualization showing [pattern]"
- Task 3: "Model accuracy >= 0.8"

## Reproducibility Requirements
- Random seed: [value if needed]
- Package versions: [key packages]
- Data snapshot: [date/version]

Red Flags - STOP If You're About To:

Action Why It's Wrong Do Instead
Skip data profiling Your data issues will break your analysis Always profile first
Ignore missing values You'll corrupt your results Document and plan handling
Start analysis immediately You haven't characterized your data Complete profiling
Assume your data is clean Never assume, you must verify Run quality checks

Output

Complete the plan when:

  • Read and understand .claude/SPEC.md
  • Profile all data sources (shape, types, stats)
  • Document data quality issues
  • Define cleaning strategy for each issue
  • Order tasks by dependency
  • Define output verification criteria
  • Write .claude/PLAN.md
  • Confirm ready for implementation

Phase Complete

REQUIRED SUB-SKILL: After completing plan, IMMEDIATELY invoke:

Skill(skill="workflows:ds-implement")