name	dagster-orchestration
description	ALWAYS USE when working with Dagster assets, resources, IO managers, schedules, or sensors. MUST be loaded before creating @asset decorators, ConfigurableResource classes, or orchestration code. Provides SDK research steps and integration patterns.

Dagster Orchestration Development (Research-Driven)

Philosophy

This skill does NOT prescribe specific implementation patterns. Instead, it guides you to:

Research the current Dagster version and API capabilities
Discover existing asset definitions and orchestration patterns in the codebase
Validate your implementations against Dagster SDK documentation
Verify integration with dbt, Iceberg, and other components

Pre-Implementation Research Protocol

Step 1: Verify Runtime Environment

ALWAYS run this first:

python -c "import dagster; print(f'Dagster {dagster.__version__}')"

Critical Questions to Answer:

What version is installed?
Does it support Python 3.9-3.13? (check compatibility)
Are there breaking changes from previous versions?

Step 2: Research SDK State (if unfamiliar)

When to research: If you encounter unfamiliar Dagster features or need to validate patterns

Research queries (use WebSearch):

"Dagster software-defined assets documentation 2025"
"Dagster [feature] Python API 2025" (e.g., "Dagster IO managers Python API 2025")
"Dagster dbt integration best practices 2025"

Official documentation: https://docs.dagster.io

Key API documentation:

Assets: https://docs.dagster.io/api/dagster/assets
Resources: https://docs.dagster.io/api/dagster/resources
IO Managers: https://docs.dagster.io/api/python-api/io-managers
Definitions: https://docs.dagster.io/api/python-api/definitions

Step 3: Discover Existing Patterns

BEFORE creating new assets or resources, search for existing implementations:

# Find existing asset definitions
rg "@asset|@multi_asset|@graph_asset" --type py

# Find resource definitions
rg "ConfigurableResource|ResourceDefinition" --type py

# Find IO manager definitions
rg "IOManager|io_manager_key" --type py

# Find schedules and sensors
rg "@schedule|@sensor|ScheduleDefinition|SensorDefinition" --type py

Key questions:

What asset patterns are already in use?
How are resources configured?
What IO managers exist?
Are there existing dbt integrations?

Step 4: Validate Against Architecture

Check architecture docs for integration requirements:

Read /docs/ for CompiledArtifacts contract requirements
Understand how Dagster consumes CompiledArtifacts
Verify dbt integration patterns (dagster-dbt library)
Check observability requirements (OpenTelemetry, OpenLineage)

Implementation Guidance (Not Prescriptive)

Software-Defined Assets

Core concept: Assets represent logical data units (tables, datasets, ML models)

Research questions:

What data products need to be materialized?
What are the asset dependencies (data lineage)?
How should assets be partitioned (time-based, dimension-based)?
What metadata should be attached to assets?

SDK features to research:

@asset decorator: Basic asset definition
@multi_asset: Assets with shared computation
@graph_asset: Assets composed of ops
Asset dependencies: deps parameter, AssetIn
Asset metadata: metadata parameter, MetadataValue
Partitions: PartitionsDefinition, DailyPartitionsDefinition

Resources and Configuration

Core concept: Resources provide external services to assets (databases, APIs, file systems)

Research questions:

What external services do assets need? (databases, S3, APIs)
How should resources be configured? (environment variables, secrets)
Are resources shared across multiple assets?
What resource lifecycle management is needed?

SDK features to research:

ConfigurableResource: Structured config base class
ResourceDefinition: Raw resource definitions
with_resources(): Binding resources to assets
Resource context: ResourceContext, InitResourceContext

IO Managers

Core concept: IO managers control how asset data is stored and retrieved

Research questions:

Where should asset outputs be stored? (filesystem, database, object storage)
What format should be used? (Parquet, CSV, Iceberg tables)
How should data be partitioned on storage?
Are there performance considerations for large datasets?

SDK features to research:

IOManager interface: handle_output(), load_input()
io_manager_key: Attaching IO managers to assets
IOManagerDefinition: Configuring IO managers
Built-in IO managers: FilesystemIOManager, InMemoryIOManager
Third-party IO managers: dagster-aws, dagster-gcp, dagster-iceberg

Schedules and Sensors

Core concept: Automate asset materialization based on time or events

Research questions:

What assets should run on a schedule? (daily ETL, hourly refresh)
What events should trigger asset materialization? (file uploads, API events)
How should failures be handled?
What notifications are needed?

SDK features to research:

@schedule: Time-based automation
@sensor: Event-based automation
@asset_sensor: Monitor asset materializations
RunRequest: Triggering asset runs
RunConfig: Parameterizing runs

dbt Integration

Core concept: Dagster assets can be created from dbt models

Research questions:

How should dbt models be represented as Dagster assets?
What dbt commands should be run? (compile, run, test, build)
How should dbt artifacts (manifest.json) be loaded?
What metadata from dbt should be propagated?

SDK features to research:

dagster-dbt library: dbt integration utilities
load_assets_from_dbt_project(): Generate assets from dbt
load_assets_from_dbt_manifest(): Generate assets from manifest.json
DbtCliResource: Execute dbt CLI commands
dbt asset metadata: materialize_to_messages(), lineage extraction

Validation Workflow

Before Implementation

✅ Verified Dagster version
✅ Searched for existing asset/resource patterns in codebase
✅ Read architecture docs to understand CompiledArtifacts contract
✅ Identified integration requirements (dbt, Iceberg, observability)
✅ Researched unfamiliar Dagster features

During Implementation

✅ Using @asset decorator for asset definitions
✅ Type hints on ALL functions and parameters
✅ Proper resource binding with with_resources()
✅ IO managers configured for storage requirements
✅ Asset dependencies correctly specified
✅ Metadata attached to assets for observability

After Implementation

✅ Run Dagster development server: dagster dev
✅ Verify assets appear in Dagster UI
✅ Test asset materialization manually
✅ Verify data lineage in UI
✅ Test schedules/sensors trigger correctly
✅ Check OpenTelemetry/OpenLineage integration

Context Injection (For Future Claude Instances)

When this skill is invoked, you should:

Verify runtime state (don't assume):

python -c "import dagster; print(dagster.__version__)"
dagster --version

Discover existing patterns (don't invent):

rg "@asset" --type py
rg "ConfigurableResource" --type py

Research when uncertain (don't guess):
- Use WebSearch for "Dagster [feature] documentation 2025"
- Check official docs: https://docs.dagster.io
Validate against architecture (don't assume requirements):
- Read relevant architecture docs in /docs/
- Understand CompiledArtifacts contract (input to Dagster)
- Check for existing Dagster Definitions
Check dbt integration (if working with dbt assets):
- Verify dagster-dbt is installed
- Check for existing dbt project paths
- Understand dbt manifest.json location

Quick Reference: Common Research Queries

Use these WebSearch queries when encountering specific needs:

Assets: "Dagster software-defined assets examples 2025"
Resources: "Dagster ConfigurableResource best practices 2025"
IO Managers: "Dagster IOManager custom implementation 2025"
dbt integration: "Dagster dbt integration load_assets_from_dbt_project 2025"
Partitions: "Dagster partitions DailyPartitionsDefinition examples 2025"
Schedules: "Dagster schedule decorator cron syntax 2025"
Sensors: "Dagster sensor asset_sensor examples 2025"
Metadata: "Dagster asset metadata MetadataValue 2025"
Observability: "Dagster OpenTelemetry OpenLineage integration 2025"

Integration Points to Research

CompiledArtifacts → Dagster Assets

Key question: How does Dagster consume CompiledArtifacts?

Research areas:

Where are CompiledArtifacts loaded from? (file path, environment variable)
How are dbt models converted to Dagster assets?
How are compute targets (from CompiledArtifacts) mapped to Dagster resources?
What metadata from CompiledArtifacts propagates to asset metadata?

dbt → Dagster Integration

Key question: How are dbt models represented as Dagster assets?

Research areas:

load_assets_from_dbt_manifest() vs load_assets_from_dbt_project()
dbt profile configuration (connection to compute targets)
dbt command execution (DbtCliResource)
dbt test results as Dagster asset checks
dbt metadata extraction (classifications, lineage)

Iceberg → Dagster IO Managers

Key question: How should Iceberg tables be read/written by Dagster?

Research areas:

Custom IOManager for Iceberg tables
PyIceberg integration with Dagster
Polaris catalog integration
Partition mapping (Dagster partitions → Iceberg partitions)

Observability Integration

Key question: How should Dagster emit telemetry and lineage?

Research areas:

OpenTelemetry exporter configuration
OpenLineage integration (dagster-openlineage)
Custom metadata extraction
Trace context propagation

Dagster Development Workflow

Local Development

# Install Dagster
pip install dagster dagster-webserver dagster-dbt

# Verify installation
dagster --version

# Run development server
dagster dev

# Access UI at http://localhost:3000

Testing Assets

# Materialize specific asset
dagster asset materialize --select asset_name

# Materialize all assets
dagster asset materialize

# Run tests
pytest tests/

References

Dagster Documentation: Official documentation
Software-Defined Assets: Core concept guide
dagster-dbt Integration: dbt integration library
API Reference - Assets: Asset decorators and classes
API Reference - Resources: Resource definitions
API Reference - IO Managers: IO manager interfaces

Remember: This skill provides research guidance, NOT prescriptive patterns. Always:

Verify the runtime environment and Dagster version
Discover existing codebase patterns for assets and resources
Research SDK capabilities when needed (use WebSearch liberally)
Validate against actual CompiledArtifacts contract requirements
Test in Dagster UI before considering implementation complete

dagster-orchestration

Install Skill

SKILL.md

Dagster Orchestration Development (Research-Driven)

Philosophy

Pre-Implementation Research Protocol

Step 1: Verify Runtime Environment

Step 2: Research SDK State (if unfamiliar)

Step 3: Discover Existing Patterns

Step 4: Validate Against Architecture

Implementation Guidance (Not Prescriptive)

Software-Defined Assets

Resources and Configuration

IO Managers

Schedules and Sensors

dbt Integration

Validation Workflow

Before Implementation

During Implementation

After Implementation

Context Injection (For Future Claude Instances)

Quick Reference: Common Research Queries

Integration Points to Research

CompiledArtifacts → Dagster Assets

dbt → Dagster Integration

Iceberg → Dagster IO Managers

Observability Integration

Dagster Development Workflow

Local Development

Testing Assets

References