Claude Code Plugins

Community-maintained marketplace

Feedback

|

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name healthsim-duckdb
description DuckDB-based unified data architecture for HealthSim. Provides persistent storage for canonical entities across all products, enabling cross-product queries and cohort management. Use for loading cohorts, saving cohorts, querying the canonical data model, and token-efficient cohort operations.
status Active
version 1.2

HealthSim DuckDB Skill

Overview

The HealthSim DuckDB skill provides persistent storage and query capabilities for all canonical entities generated across the HealthSim product suite. It serves as the operational data store for cohorts and enables cross-product analytics.

Status: Active
Schema Version: 1.2
Database: healthsim.duckdb (in workspace root)

Trigger Phrases

Cohort Management

  • "Load cohort into DuckDB"
  • "Save cohort to database"
  • "Persist these entities"
  • "Get summary of cohort"
  • "List my cohorts"

Querying

  • "Query the canonical model"
  • "Show patients with claims"
  • "Cross-product query for..."
  • "Query cohort for..."
  • "Show me [entities] where..."

Export/Import

  • "Export cohort to JSON"
  • "Import cohort from JSON"

Database Schema

Schema Version 1.2 Updates

As of version 1.2, all canonical tables include a cohort_id column that links entities directly to their source cohort. This enables efficient cohort-scoped queries without joining through cohort_entities.

Provenance Columns (All Canonical Tables)

Every canonical table includes these columns for provenance tracking:

cohort_id VARCHAR,        -- Links to cohorts.cohort_id (v1.2+)
source_type VARCHAR,        -- 'loaded', 'generated', 'derived'
source_system VARCHAR,      -- 'patientsim', 'membersim', etc.
skill_used VARCHAR,         -- Skill that guided generation
created_at TIMESTAMP,       -- When entity was created

Table Summary (41 Tables)

Layer Tables Description
Core persons, providers, facilities Shared entities
PatientSim patients, encounters, diagnoses, procedures, lab_results, medications, allergies, vitals Clinical/EMR
MemberSim members, accumulators, claims, claim_lines, authorizations Payer/Claims
RxMemberSim rx_members, prescriptions, pharmacy_claims, dur_alerts, pharmacies Pharmacy/PBM
TrialSim studies, sites, treatment_arms, subjects, adverse_events, visit_schedule, actual_visits, disposition_events Clinical Trials
PopulationSim geographic_entities, population_profiles, health_indicators, sdoh_indices, cohort_specifications Demographics/SDOH
NetworkSim networks, network_providers, network_facilities, provider_specialties Provider Networks
State Mgmt cohorts, cohort_entities, cohort_tags Cohort Tracking

Cross-Product Relationships

Person (Core) ─────────────────────────────────────────────────
    │
    ├── Patient (PatientSim) via person_id
    │       └── Encounter → Diagnosis, Procedure, Lab, Medication
    │
    ├── Member (MemberSim) via person_id
    │       └── Claim → ClaimLine
    │
    ├── RxMember (RxMemberSim) via person_id
    │       └── Prescription → PharmacyClaim
    │
    └── Subject (TrialSim) via person_id
            └── AdverseEvent, ActualVisit

Auto-Persist API

The auto-persist pattern provides token-efficient cohort management. Instead of returning all data after persist operations, it returns a compact summary (~500 tokens).

persist()

Persist entities to canonical tables with automatic naming.

from healthsim.state import persist

result = persist(
    entities={'patients': [...], 'encounters': [...]},
    context='diabetes cohort San Diego',  # Used for auto-naming
    tags=['diabetes', 'training']
)

# Returns PersistResult:
# - cohort_id: UUID
# - cohort_name: 'diabetes-cohort-20241227'
# - summary: CohortSummary (~500 tokens)
# - entity_ids: {'patients': [...], 'encounters': [...]}

get_summary()

Load cohort summary without full data (~500 tokens vs potentially 50K+).

from healthsim.state import get_summary

summary = get_summary('diabetes-cohort-20241227')

# Returns CohortSummary:
# - cohort_id, cohort_name
# - entity_counts: {'patients': 100, 'encounters': 250}
# - statistics: {'age_range': [35, 78], 'gender_distribution': {...}}
# - samples: {'patients': [3 sample patients], ...}

query_cohort()

Run SQL queries against cohort data with pagination.

from healthsim.state import query_cohort

results = query_cohort(
    'diabetes-cohort-20241227',
    "SELECT given_name, family_name, birth_date FROM patients WHERE gender = 'F'",
    limit=20,
    offset=0
)

# Returns QueryResult:
# - rows: List of result rows
# - columns: Column names
# - total_count: Total matching rows
# - page_size: Current page size
# - has_more: Whether more pages exist

Security: Only SELECT queries are allowed. Any attempt to use INSERT, UPDATE, DELETE, DROP, etc. will raise a ValueError.

Example Queries

1. Cohort-Scoped Query (v1.2+)

Query data from a specific cohort:

-- Find all female patients in a specific cohort
SELECT given_name, family_name, birth_date, gender
FROM patients
WHERE cohort_id = 'abc123-...'
  AND gender = 'female'
ORDER BY family_name;

2. Cross-Product Patient Journey

SELECT 
    per.given_name || ' ' || per.family_name as patient_name,
    m.plan_name,
    e.encounter_type,
    e.start_datetime,
    c.total_billed,
    c.total_patient_resp
FROM persons per
JOIN members m ON per.person_id = m.person_id
JOIN patients p ON per.person_id = p.person_id
JOIN encounters e ON p.patient_id = e.patient_id
JOIN claims c ON e.encounter_id = c.encounter_id
WHERE per.family_name = 'Thompson'
  AND per.cohort_id = 'abc123-...';

3. Trial Subject with Clinical History

SELECT 
    sub.usubjid,
    sub.arm_code,
    st.protocol_number,
    p.mrn,
    COUNT(DISTINCT e.encounter_id) as prior_encounters
FROM subjects sub
JOIN studies st ON sub.study_id = st.study_id
JOIN patients p ON sub.patient_id = p.patient_id
LEFT JOIN encounters e ON p.patient_id = e.patient_id
WHERE sub.cohort_id = 'abc123-...'
GROUP BY sub.usubjid, sub.arm_code, st.protocol_number, p.mrn;

4. Population Health by Geography

SELECT 
    g.county_name,
    g.state_name,
    h.pct_diabetes,
    h.pct_obesity,
    s.svi_overall
FROM geographic_entities g
JOIN health_indicators h ON g.geo_id = h.geo_id
JOIN sdoh_indices s ON g.geo_id = s.geo_id
WHERE h.pct_diabetes > 12
ORDER BY s.svi_overall DESC
LIMIT 10;

5. Pharmacy Claims with DUR Alerts

SELECT 
    rm.cardholder_id,
    pc.drug_name,
    pc.service_date,
    pc.patient_pay,
    da.dur_type,
    da.message
FROM rx_members rm
JOIN pharmacy_claims pc ON rm.rx_member_id = pc.rx_member_id
LEFT JOIN dur_alerts da ON pc.pharmacy_claim_id = da.pharmacy_claim_id
WHERE da.severity = '1'  -- Severe alerts
  AND rm.cohort_id = 'abc123-...';

6. Entity Statistics for Summary

-- Get entity counts for a cohort
SELECT 
    'patients' as entity_type,
    COUNT(*) as count
FROM patients
WHERE cohort_id = 'abc123-...'
UNION ALL
SELECT 'encounters', COUNT(*) FROM encounters WHERE cohort_id = 'abc123-...'
UNION ALL
SELECT 'claims', COUNT(*) FROM claims WHERE cohort_id = 'abc123-...';

Tools

Cohort Loader (tools/cohort_loader.py)

Load JSON cohort files into DuckDB.

python tools/cohort_loader.py <cohort_path> [--db healthsim.duckdb]

Cohort Saver (tools/cohort_saver.py)

Export cohorts from DuckDB to JSON files.

python tools/cohort_saver.py <cohort_name> [--output ./exports]

MCP Integration

Access via the healthsim-duckdb MCP server:

# Query the database
result = await mcp.query("SELECT * FROM persons LIMIT 10")

Entity Types (38 Total)

Product Entity Types
Core person, provider, facility
PatientSim patient, encounter, diagnosis, procedure, lab_result, medication, allergy, vital
MemberSim member, accumulator, claim, claim_line, authorization
RxMemberSim rx_member, prescription, pharmacy_claim, dur_alert, pharmacy
TrialSim study, site, treatment_arm, subject, adverse_event, visit_schedule, actual_visit, disposition_event
PopulationSim geographic_entity, population_profile, health_indicator, sdoh_index, cohort_specification
NetworkSim network, network_provider, network_facility, provider_specialty

Related Documentation

Validation Rules

Rule Description
person_id required All product entities must link to a person
cohort_id required All canonical entities must have cohort_id (v1.2+)
SSN format XXX-XX-XXXX for universal correlation
NPI format 10-digit with Luhn check digit
FIPS codes 2-digit state, 5-digit county, 11-digit tract
Date formats ISO 8601 (YYYY-MM-DD)
SELECT only Query API only allows SELECT statements