| name | dagu-workflows |
| description | Guide for authoring Dagu workflows including YAML syntax, steps, executors, scheduling, dependencies, and workflow composition |
Dagu Workflow Authoring
This skill activates when creating or modifying Dagu workflow definitions, configuring workflow steps, scheduling, or composing complex workflows.
When to Use This Skill
Activate when:
- Writing Dagu workflow YAML files
- Configuring workflow steps and executors
- Setting up workflow scheduling with cron
- Defining step dependencies and data flow
- Implementing error handling and retries
- Composing hierarchical workflows
- Using environment variables and parameters
Basic Workflow Structure
Minimal Workflow
# hello.yaml
steps:
  - name: hello
    command: echo "Hello from Dagu!"
Complete Workflow Structure
name: my_workflow
description: Description of what this workflow does
# Schedule (optional)
schedule: "0 2 * * *"  # Cron format: daily at 2 AM
# Environment variables
env:
  - KEY: value
  - DB_HOST: localhost
# Parameters
params: ENVIRONMENT=production
# Email notifications (optional)
mailOn:
  failure: true
  success: false
smtp:
  host: smtp.example.com
  port: 587
errorMail:
  from: dagu@example.com
  to: alerts@example.com
# Workflow steps
steps:
  - name: step1
    command: echo "First step"
  - name: step2
    command: echo "Second step"
    depends:
      - step1
Steps
Basic Step
steps:
  - name: greet
    command: echo "Hello, World!"
Step with Script
steps:
  - name: process
    command: |
      echo "Starting processing..."
      ./scripts/process.sh
      echo "Done!"
Step with Working Directory
steps:
  - name: build
    dir: /path/to/project
    command: make build
Step with Environment Variables
steps:
  - name: deploy
    env:
      - ENVIRONMENT: production
      - API_KEY: $API_KEY  # From global env
    command: ./deploy.sh
Executors
Command Executor (Default)
steps:
  - name: shell_command
    command: ./script.sh
Docker Executor
steps:
  - name: run_in_container
    executor:
      type: docker
      config:
        image: alpine:latest
    command: echo "Running in Docker"
  - name: with_volumes
    executor:
      type: docker
      config:
        image: node:18
        volumes:
          - /host/path:/container/path
        env:
          - NODE_ENV=production
    command: npm run build
SSH Executor
steps:
  - name: remote_execution
    executor:
      type: ssh
      config:
        user: deploy
        host: server.example.com
        key: /path/to/ssh/key
    command: ./remote_script.sh
HTTP Executor
steps:
  - name: api_call
    executor:
      type: http
      config:
        method: POST
        url: https://api.example.com/webhook
        headers:
          Content-Type: application/json
          Authorization: Bearer $API_TOKEN
        body: |
          {
            "event": "workflow_complete",
            "timestamp": "{{.timestamp}}"
          }
Mail Executor
steps:
  - name: send_notification
    executor:
      type: mail
      config:
        to: user@example.com
        from: dagu@example.com
        subject: Workflow Complete
        message: |
          The workflow has completed successfully.
          Time: {{.timestamp}}
JQ Executor
steps:
  - name: transform_json
    executor:
      type: jq
      config:
        query: '.users[] | select(.active == true) | .email'
    command: cat users.json
Step Dependencies
Simple Dependencies
steps:
  - name: download
    command: wget https://example.com/data.zip
  - name: extract
    depends:
      - download
    command: unzip data.zip
  - name: process
    depends:
      - extract
    command: ./process.sh
Multiple Dependencies
steps:
  - name: fetch_data
    command: ./fetch.sh
  - name: fetch_config
    command: ./fetch_config.sh
  - name: process
    depends:
      - fetch_data
      - fetch_config
    command: ./process.sh
Parallel Execution
# These run in parallel (no dependencies)
steps:
  - name: task1
    command: ./task1.sh
  - name: task2
    command: ./task2.sh
  - name: task3
    command: ./task3.sh
  # This waits for all of the above to complete
  - name: finalize
    depends:
      - task1
      - task2
      - task3
    command: ./finalize.sh
Conditional Execution
Preconditions
steps:
  - name: deploy_production
    preconditions:
      - condition: "`echo $ENVIRONMENT`"
        expected: "production"
    command: ./deploy.sh
Continue On Failure
steps:
  - name: optional_step
    continueOn:
      failure: true
    command: ./might_fail.sh
  - name: cleanup
    depends:
      - optional_step
    command: ./cleanup.sh  # Runs even if optional_step fails
Error Handling and Retries
Retry Configuration
steps:
  - name: flaky_api_call
    command: curl https://api.example.com/data
    retryPolicy:
      limit: 3
      intervalSec: 10
Exponential Backoff
steps:
  - name: with_backoff
    command: ./external_api.sh
    retryPolicy:
      limit: 5
      intervalSec: 5
      exponentialBackoff: true  # 5s, 10s, 20s, 40s, 80s
Signal on Stop
steps:
  - name: graceful_shutdown
    command: ./long_running_process.sh
    signalOnStop: SIGTERM  # Send SIGTERM instead of SIGKILL
Data Flow
Output Variables
steps:
  - name: generate_id
    command: echo "ID_$(date +%s)"
    output: PROCESS_ID
  - name: use_id
    depends:
      - generate_id
    command: echo "Processing with ID: $PROCESS_ID"
Script Output
steps:
  - name: get_config
    script: |
      #!/bin/bash
      echo "localhost:5432"
    output: DB_CONFIG
  - name: connect
    depends:
      - get_config
    command: ./connect.sh $DB_CONFIG
Scheduling
Cron Schedule
# Daily at 2 AM
schedule: "0 2 * * *"
# Every Monday at 9 AM
schedule: "0 9 * * 1"
# Every 15 minutes
schedule: "*/15 * * * *"
# First day of month at midnight
schedule: "0 0 1 * *"
Start/Stop Times
# Only run during business hours (Mon-Fri, 9 AM to 5 PM)
schedule:
  start: "0 9 * * 1-5"   # start the workflow
  stop: "0 17 * * 1-5"   # stop the workflow
Environment Variables
Global Environment
env:
  - ENVIRONMENT: production
  - LOG_LEVEL: info
  - API_URL: https://api.example.com
steps:
  - name: use_env
    command: echo "Environment: $ENVIRONMENT"
Step-Level Environment
steps:
  - name: with_custom_env
    env:
      - CUSTOM_VAR: value
      - OVERRIDE: step_value
    command: ./script.sh
Environment from File
env:
  - .env  # Load from .env file
steps:
  - name: use_env_file
    command: echo "DB_HOST: $DB_HOST"
Parameters
Defining Parameters
params: ENVIRONMENT=development VERSION=1.0.0
steps:
  - name: deploy
    command: ./deploy.sh $ENVIRONMENT $VERSION
Using Parameters
# Run with default parameters
dagu start workflow.yaml
# Override parameters
dagu start workflow.yaml ENVIRONMENT=production VERSION=2.0.0
Sub-Workflows
Calling Sub-Workflows
# main.yaml
steps:
  - name: run_sub_workflow
    run: sub_workflow.yaml
    params: PARAM=value
  - name: another_sub
    run: workflows/another.yaml
Hierarchical Workflows
# orchestrator.yaml
steps:
  - name: data_ingestion
    run: workflows/ingest.yaml
  - name: data_processing
    depends:
      - data_ingestion
    run: workflows/process.yaml
  - name: data_export
    depends:
      - data_processing
    run: workflows/export.yaml
Handlers
Cleanup Handler
handlerOn:
  exit:
    command: ./cleanup.sh
steps:
  - name: main_task
    command: ./task.sh
Error Handler
handlerOn:
  failure:
    executor:
      type: mail
      config:
        to: alerts@example.com
        subject: "Workflow Failed"
        message: "Workflow {{.Name}} failed at {{.timestamp}}"
steps:
  - name: risky_operation
    command: ./operation.sh
Success Handler
handlerOn:
  success:
    command: ./notify.sh "Workflow completed successfully"
steps:
  - name: task
    command: ./task.sh
Templates and Variables
Built-in Variables
steps:
  - name: use_variables
    command: |
      echo "Workflow: {{.Name}}"
      echo "Step: {{.Step.Name}}"
      echo "Timestamp: {{.timestamp}}"
      echo "Request ID: {{.requestId}}"
Custom Templates
params: USER=alice
steps:
  - name: templated
    command: echo "Hello, {{.Params.USER}}!"
Common Patterns
ETL Pipeline
name: etl_pipeline
description: Extract, Transform, Load data pipeline
schedule: "0 2 * * *"  # Daily at 2 AM
env:
  - DATA_SOURCE: s3://bucket/data
  - TARGET_DB: postgresql://localhost/warehouse
steps:
  - name: extract
    command: ./extract.sh $DATA_SOURCE
    output: EXTRACTED_FILE
  - name: transform
    depends:
      - extract
    command: ./transform.sh $EXTRACTED_FILE
    output: TRANSFORMED_FILE
  - name: load
    depends:
      - transform
    command: ./load.sh $TRANSFORMED_FILE $TARGET_DB
  - name: cleanup
    depends:
      - load
    command: rm -f $EXTRACTED_FILE $TRANSFORMED_FILE
handlerOn:
  failure:
    executor:
      type: mail
      config:
        to: data-team@example.com
        subject: "ETL Pipeline Failed"
Multi-Environment Deployment
name: deploy
description: Deploy application to multiple environments
params: ENVIRONMENT=staging VERSION=latest
steps:
  - name: build
    command: docker build -t app:$VERSION .
  - name: test
    depends:
      - build
    command: docker run app:$VERSION npm test
  - name: deploy_staging
    depends:
      - test
    preconditions:
      - condition: "`echo $ENVIRONMENT`"
        expected: "staging"
    executor:
      type: ssh
      config:
        user: deploy
        host: staging.example.com
    command: ./deploy.sh $VERSION
  - name: deploy_production
    depends:
      - test
    preconditions:
      - condition: "`echo $ENVIRONMENT`"
        expected: "production"
    executor:
      type: ssh
      config:
        user: deploy
        host: prod.example.com
    command: ./deploy.sh $VERSION
Data Backup Workflow
name: database_backup
description: Automated database backup workflow
schedule: "0 3 * * *"  # Daily at 3 AM
env:
  - DB_HOST: localhost
  - DB_NAME: myapp
  - BACKUP_DIR: /backups
  - S3_BUCKET: s3://backups/db
steps:
  - name: create_backup
    command: |
      TIMESTAMP=$(date +%Y%m%d_%H%M%S)
      pg_dump -h $DB_HOST $DB_NAME > $BACKUP_DIR/backup_$TIMESTAMP.sql
      echo "backup_$TIMESTAMP.sql"
    output: BACKUP_FILE
  - name: compress
    depends:
      - create_backup
    command: gzip $BACKUP_DIR/$BACKUP_FILE
  - name: upload_to_s3
    depends:
      - compress
    command: aws s3 cp $BACKUP_DIR/$BACKUP_FILE.gz $S3_BUCKET/
  - name: cleanup_old_backups
    depends:
      - upload_to_s3
    command: |
      find $BACKUP_DIR -name "*.sql.gz" -mtime +30 -delete
      aws s3 ls $S3_BUCKET/ | awk '{print $4}' | head -n -30 | xargs -I {} aws s3 rm $S3_BUCKET/{}
handlerOn:
  failure:
    executor:
      type: mail
      config:
        to: dba@example.com
        subject: "Backup Failed"
  success:
    command: echo "Backup completed at $(date)" >> /var/log/backups.log
Monitoring and Alerts
name: health_check
description: Monitor services and send alerts
schedule: "*/5 * * * *"  # Every 5 minutes
steps:
  - name: check_web_service
    command: curl -f https://app.example.com/health
    retryPolicy:
      limit: 3
      intervalSec: 10
    continueOn:
      failure: true
  - name: check_api_service
    command: curl -f https://api.example.com/health
    retryPolicy:
      limit: 3
      intervalSec: 10
    continueOn:
      failure: true
  - name: check_database
    command: pg_isready -h db.example.com
    continueOn:
      failure: true
handlerOn:
  failure:
    executor:
      type: http
      config:
        method: POST
        url: https://hooks.slack.com/services/YOUR/WEBHOOK/URL
        headers:
          Content-Type: application/json
        body: |
          {
            "text": "⚠️ Service health check failed",
            "attachments": [{
              "color": "danger",
              "fields": [
                {"title": "Workflow", "value": "{{.Name}}", "short": true},
                {"title": "Time", "value": "{{.timestamp}}", "short": true}
              ]
            }]
          }
Best Practices
Workflow Organization
# Good: Clear, descriptive names
name: user_data_sync
description: Synchronize user data from CRM to database

# Good: Logical step names
steps:
  - name: fetch_from_crm
  - name: validate_data
  - name: update_database

# Avoid: Generic names
name: workflow1
steps:
  - name: step1
  - name: step2
Error Handling
# Always define handlers for critical workflows:
# clean up on every exit, notify the team on failure
handlerOn:
  exit:
    command: ./cleanup.sh
  failure:
    executor:
      type: mail
      config:
        to: team@example.com
# Use retries for flaky operations
steps:
  - name: api_call
    command: curl https://api.example.com
    retryPolicy:
      limit: 3
      intervalSec: 5
      exponentialBackoff: true
Environment Management
# Use parameters for environment-specific values
params: ENVIRONMENT=development
# Load environment from files
env:
  - config/$ENVIRONMENT.env
# Override in production
# dagu start workflow.yaml ENVIRONMENT=production
Modular Workflows
# Break complex workflows into sub-workflows
steps:
  - name: data_ingestion
    run: workflows/ingestion.yaml
  - name: data_transformation
    run: workflows/transformation.yaml
    depends:
      - data_ingestion
Key Principles
- Keep workflows focused: One workflow per logical task
- Use dependencies wisely: Parallelize when possible
- Handle errors explicitly: Define failure handlers
- Use retries for flaky operations: Network calls, external APIs
- Parameterize configurations: Make workflows reusable
- Document workflows: Add clear names and descriptions
- Test workflows: Start with small, focused workflows
- Monitor and alert: Use handlers to track workflow health
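Putting It Together
As a rough sketch of these principles in one place, the hypothetical workflow below runs two independent fetch steps in parallel with retries, merges their results once both succeed, stays reusable through a parameter, and alerts on failure. The workflow name, schedule, script paths, and message are placeholder assumptions, not values taken from the sections above.
# nightly_sync.yaml (illustrative sketch; all names and paths are placeholders)
name: nightly_sync
description: Fetch two sources in parallel, merge them, and alert on failure
schedule: "0 1 * * *"        # Daily at 1 AM
params: ENVIRONMENT=staging  # Override when starting the workflow
steps:
  - name: fetch_orders
    command: ./fetch_orders.sh $ENVIRONMENT
    retryPolicy:             # Retry the flaky network call
      limit: 3
      intervalSec: 10
  - name: fetch_customers
    command: ./fetch_customers.sh $ENVIRONMENT
    retryPolicy:
      limit: 3
      intervalSec: 10
  - name: merge
    depends:                 # Waits for both parallel fetches
      - fetch_orders
      - fetch_customers
    command: ./merge.sh $ENVIRONMENT
handlerOn:
  failure:
    command: ./notify.sh "nightly_sync failed"
Because fetch_orders and fetch_customers declare no dependencies, they run concurrently; merge starts only after both complete, and the failure handler fires if any step ultimately fails.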