Claude Code Plugins

Community-maintained marketplace

subquery-patterns-and-union

@rustomax/observe-community-mcp

Use OPAL subquery syntax (@labels) and union operations to combine multiple datasets or time periods. Essential for period-over-period comparisons, multi-dataset analysis, and complex data transformations. Covers @label <- @ syntax, timeshift for temporal shifts, union for combining results, and any_not_null() for collapsing grouped data.

Install Skill

1. Download skill

2. Enable skills in Claude

   Open claude.ai/settings/capabilities and find the "Skills" section

3. Upload to Claude

   Click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reading through its instructions before using it.

SKILL.md

name: subquery-patterns-and-union
description: Use OPAL subquery syntax (@labels) and union operations to combine multiple datasets or time periods. Essential for period-over-period comparisons, multi-dataset analysis, and complex data transformations. Covers @label <- @ syntax, timeshift for temporal shifts, union for combining results, and any_not_null() for collapsing grouped data.

Subquery Patterns and Union Operations

Overview

OPAL subqueries using @label syntax enable powerful multi-dataset and multi-period analysis. This skill covers:

  • Subquery syntax with @label <- @dataset
  • Union operations to combine multiple result sets
  • Timeshift for period-over-period comparisons
  • Best practices for complex data transformations

When to Use This Skill

Use subqueries and union when you need to:

  • Period-over-period comparison: Compare current vs previous hour/day/week (metrics or events)
  • Time-series comparison: Chart trends over time with period-over-period data
  • Complex transformations: Build intermediate results for multi-stage calculations (e.g., SLO tracking, error budgets)

Core Concepts

Subquery Syntax

# Basic pattern
@label <- @dataset_reference {
    # OPAL pipeline
}

# Reference primary input
@current <- @ {
    # Process primary dataset
}

# Reference named dataset (requires dataset_aliases parameter)
@other <- @dataset_name {
    # Process other dataset
}

# Use subquery results
<- @label {
    # Continue processing
}

Key Points:

  • @ alone references the primary input dataset
  • @dataset_name references a named dataset via aliases
  • @label creates a reusable intermediate result
  • <- @label continues the pipeline from that subquery
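
Putting these points together, a minimal end-to-end query might look like the following sketch. It reuses only fields that appear later in this skill (error, service_name), so treat it as illustrative rather than a copy-paste recipe:

# Label an intermediate result built from the primary input
@errors <- @ {
    filter error = true
}

# Continue the pipeline from that label
<- @errors {
    statsby error_count:count(), group_by(service_name)
    sort desc(error_count)
}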

Union Operation

Union combines multiple result sets with matching columns:

@set_a <- @ {
    # First result set
}

@set_b <- @ {
    # Second result set with same column structure
}

<- @set_a {
    union @set_b
    # Combined results
}

Important:

  • Union requires matching column names
  • Rows from both sources appear in output
  • Use any_not_null() to collapse sparse union results
  • Union happens AFTER aggregation, not before

Timeshift Verb

Timeshift moves row timestamps forward (positive) or backward (negative):

timeshift 1h          # Move 1 hour forward
timeshift -1d         # Move 1 day backward
timeshift 30m         # Move 30 minutes forward

Critical Rule: Apply timeshift BEFORE align when working with metrics!

# CORRECT
timeshift 1h
align rate:sum(m("metric"))

# WRONG
align rate:sum(m("metric"))
timeshift 1h  # Too late! Align already processed time buckets

Pattern 1: Period-Over-Period Comparison (Metrics)

Use Case: Compare current metrics to previous period (hour, day, week)

Strategy:

  1. Create @current subquery with current period aggregation
  2. Create @previous subquery with timeshift + same aggregation
  3. Union both, then collapse with any_not_null()
  4. Calculate change and percentage change

Example: Compare Current Hour vs Previous Hour

# Current period (last 1h)
@current <- @ {
    align rate:sum(m("span_call_count_5m"))
    aggregate current_sum:sum(rate), group_by(service_name)
}

# Previous period (1h before that)
@previous <- @ {
    timeshift 1h                        # Shift BEFORE align!
    align rate:sum(m("span_call_count_5m"))
    aggregate prev_sum:sum(rate), group_by(service_name)
}

# Combine both periods
@combined <- @current {
    union @previous
    aggregate current:any_not_null(current_sum),
              previous:any_not_null(prev_sum),
              group_by(service_name)
}

# Calculate changes
<- @combined {
    make_col change:current - previous
    make_col pct_change:if(previous > 0, (change / previous) * 100, 0)
    make_col abs_pct_change:if(pct_change < 0, -pct_change, pct_change)
    filter abs_pct_change > 50          # Show only significant changes
    sort desc(abs_pct_change)
}

Why This Works:

  • timeshift 1h moves the "previous" data timestamps forward by 1 hour
  • When combined with current data, both align to same time buckets
  • any_not_null() picks the non-null value from each period
  • Result: side-by-side comparison in same row

Sample Output:

service_name        current  previous  change    pct_change
frontend-service    45000    15000     30000     200.0
checkout-api        8000     15000     -7000     -46.7
payment-service     500      10000     -9500     -95.0

Day-Over-Day Comparison

@today <- @ {
    align rate:sum(m("span_call_count_5m"))
    aggregate today_sum:sum(rate), group_by(service_name)
}

@yesterday <- @ {
    timeshift 1d                        # 24 hours
    align rate:sum(m("span_call_count_5m"))
    aggregate yesterday_sum:sum(rate), group_by(service_name)
}

@combined <- @today {
    union @yesterday
    aggregate today:any_not_null(today_sum),
              yesterday:any_not_null(yesterday_sum),
              group_by(service_name)
}

<- @combined {
    make_col change:today - yesterday
    make_col pct_change:if(yesterday > 0, (change / yesterday) * 100, 0)
    sort desc(pct_change)
}

Week-Over-Week Comparison

@this_week <- @ {
    align 1h, rate:sum(m("span_call_count_5m"))
    aggregate week_sum:sum(rate), group_by(service_name)
}

@last_week <- @ {
    timeshift 7d                        # One week
    align 1h, rate:sum(m("span_call_count_5m"))
    aggregate last_week_sum:sum(rate), group_by(service_name)
}

@combined <- @this_week {
    union @last_week
    aggregate this_week:any_not_null(week_sum),
              last_week:any_not_null(last_week_sum),
              group_by(service_name)
}

<- @combined {
    make_col growth:this_week - last_week
    make_col growth_pct:if(last_week > 0, (growth / last_week) * 100, 0)
    sort desc(growth_pct)
}

Pattern 2: Period-Over-Period Comparison (Events/Intervals)

Use Case: Compare raw event/span counts across time periods

Strategy: Same union pattern, but use statsby instead of align + aggregate

Example: Error Count This Hour vs Last Hour

@current <- @ {
    filter error = true
    statsby current_errors:count(), group_by(service_name)
}

@previous <- @ {
    timeshift 1h
    filter error = true
    statsby prev_errors:count(), group_by(service_name)
}

@combined <- @current {
    union @previous
    aggregate current:any_not_null(current_errors),
              previous:any_not_null(prev_errors),
              group_by(service_name)
}

<- @combined {
    make_col error_change:current - previous
    make_col pct_change:if(previous > 0, (error_change / previous) * 100, 0)
    filter current > 10                 # Only services with significant errors
    sort desc(current)
}

Key Difference: Use statsby for event datasets, align + aggregate for metrics.
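
For reference, here is the same per-service error total written both ways, reusing the metric name and event fields from the examples in this skill (a sketch, not a prescribed query):

# Event dataset (spans/logs): count raw rows with statsby
filter error = true
statsby error_count:count(), group_by(service_name)

# Metric dataset (pre-aggregated counters): align, then aggregate
align rate:sum(m("span_error_count_5m"))
aggregate total_errors:sum(rate), group_by(service_name)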

Pattern 3: Time-Series Period Comparison

Use Case: Chart current vs previous period trends over time

Example: Current vs Previous Week (Hourly Buckets)

@current <- @ {
    align 1h, rate:sum(m("span_call_count_5m"))
    aggregate current_rate:sum(rate), group_by(service_name)
}

@previous <- @ {
    timeshift 7d
    align 1h, rate:sum(m("span_call_count_5m"))
    aggregate prev_rate:sum(rate), group_by(service_name)
}

@combined <- @current {
    union @previous
    aggregate current:any_not_null(current_rate),
              previous:any_not_null(prev_rate),
              group_by(service_name, _c_bucket)
}

<- @combined {
    make_col change:current - previous
    make_col pct_change:if(previous > 0, (change / previous) * 100, 0)
}

Output: Time-series with both periods aligned by bucket, suitable for line charts showing trends over time.

Note: This pattern returns multiple rows per service (one per time bucket). For summary comparisons, use Pattern 1 instead.

Pattern 4: Building Intermediate Results

Use Case: Complex calculations requiring multiple steps

Example: Calculate Error Budget Consumption

# Step 1: Get total requests and errors
@base <- @ {
    align options(bins: 1), rate:sum(m("span_call_count_5m")),
                             errors:sum(m("span_error_count_5m"))
    aggregate total_requests:sum(rate),
              total_errors:sum(errors),
              group_by(service_name)
}

# Step 2: Calculate SLO metrics
@slo <- @base {
    make_col error_rate:if(total_requests > 0, total_errors / total_requests, 0)
    make_col success_rate:1 - error_rate
    make_col slo_target:0.999           # 99.9% SLO
    make_col error_budget:1 - slo_target
}

# Step 3: Calculate budget consumption
<- @slo {
    make_col budget_consumed:if(error_budget > 0, error_rate / error_budget, 0)
    make_col budget_remaining:1 - budget_consumed
    make_col status:if(budget_consumed > 1, "VIOLATED",
                    if(budget_consumed > 0.8, "WARNING", "HEALTHY"))
    filter total_requests > 1000        # Only services with traffic
    sort desc(budget_consumed)
}

Sample Output:

service_name     total_requests  error_rate  budget_consumed  status
adservice        870             0.0276      27.6             VIOLATED
cartservice      2303            0.0091      9.1              VIOLATED
frontend         15108           0.0016      1.6              VIOLATED
productcatalog   8838            0.0000      0.0              HEALTHY

Why This Works:

  • @base subquery aggregates raw metrics (requests + errors)
  • @slo subquery builds on @base, adding calculated SLO fields
  • Final stage uses @slo results to compute budget status
  • Each stage can reference all columns from previous stages

Understanding any_not_null()

The any_not_null() function is crucial for union patterns:

# After union, you typically have sparse data:
# Row 1: current_sum=100, prev_sum=null
# Row 2: current_sum=null, prev_sum=80

aggregate current:any_not_null(current_sum),
          previous:any_not_null(prev_sum),
          group_by(service_name)

# Result:
# Row 1: current=100, previous=80

How it works:

  • Groups by service_name (or other dimensions)
  • For each group, finds any non-null value across all union rows
  • Collapses sparse union into single row per group

Alternative functions:

  • any() - Picks arbitrary value (may be null)
  • min() / max() - Numeric min/max (only for numbers)
  • any_not_null() - Best for union collapse (picks any non-null)
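
As a rough illustration, assuming the same sparse cnt column produced by the union examples in this skill, the three choices behave differently when a group contains null rows:

# any() may return null, any_not_null() skips nulls, max() applies to numeric columns
aggregate picked:any(cnt),
          picked_non_null:any_not_null(cnt),
          picked_max:max(cnt),
          group_by(service_name)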

Common Patterns Summary

Use Case                      Subqueries Needed  Key Verbs
Period-over-period (metrics)  2+                 timeshift, align, union, any_not_null()
Period-over-period (events)   2+                 timeshift, statsby, union, any_not_null()
Time-series comparison        2+                 timeshift, align, union, any_not_null(), group_by(_c_bucket)
Complex calculations          1-3                make_col, pipeline stages

Note: For A/B comparisons across different filter conditions, use conditional columns with if() statements instead of subqueries. For multi-dataset joins, use lookup or join verbs (see working-with-resources skill).
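
For contrast, a simple A/B split with conditional columns might look like the sketch below. It assumes the same event fields used elsewhere in this skill (service_name, error) and two service names seen in the sample outputs above; no subqueries or union are required:

# Tag each row per condition, then aggregate once
make_col frontend_errors:if(service_name = "frontend" and error = true, 1, 0)
make_col checkout_errors:if(service_name = "checkout-api" and error = true, 1, 0)
statsby frontend_total:sum(frontend_errors),
        checkout_total:sum(checkout_errors)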

Troubleshooting

Issue: "Columns don't match in union"

Cause: Union requires exact column name matches

Solution: Ensure both subqueries produce same column names

# WRONG - column names don't match
@a <- @ { aggregate count_a:count() }
@b <- @ { aggregate count_b:count() }
<- @a { union @b }  # Error!

# CORRECT - same column names
@a <- @ { aggregate cnt:count() }
@b <- @ { aggregate cnt:count() }
<- @a { union @b }  # Works!

Issue: "All nulls after any_not_null()"

Cause: group_by dimensions don't align across union sources

Solution: Verify both subqueries group by same dimensions

# WRONG - different group_by
@a <- @ { aggregate cnt:count(), group_by(service_name) }
@b <- @ { aggregate cnt:count(), group_by(namespace) }
<- @a { union @b; aggregate total:any_not_null(cnt), group_by(service_name) }
# Result: Nulls (no matching groups)

# CORRECT - same group_by
@a <- @ { aggregate cnt:count(), group_by(service_name) }
@b <- @ { aggregate cnt:count(), group_by(service_name) }
<- @a { union @b; aggregate total:any_not_null(cnt), group_by(service_name) }

Issue: "Timeshift has no effect"

Cause: Timeshift applied AFTER align (too late!)

Solution: Always timeshift BEFORE align

# WRONG - timeshift after align
@previous <- @ {
    align rate:sum(m("metric"))
    timeshift 1h                # Too late!
}

# CORRECT - timeshift before align
@previous <- @ {
    timeshift 1h                # First!
    align rate:sum(m("metric"))
}

Issue: "Can't reference @label"

Cause: Trying to use label before it's defined

Solution: Define subquery first, then reference it

# WRONG - @combined used before definition
<- @combined { ... }
@combined <- @ { ... }

# CORRECT - define first
@combined <- @ { ... }
<- @combined { ... }

Performance Considerations

When to Use Subqueries vs Single Query

Use subqueries when:

  • Period-over-period comparison (timeshift required)
  • Building complex intermediate results
  • Readability improves significantly

Avoid subqueries when:

  • Simple A/B comparison (use conditional columns)
  • Single metric aggregation
  • Performance is critical (subqueries add overhead)

Optimization Tips

  1. Filter early: Apply filters in subqueries, not after union

    # GOOD
    @current <- @ {
        filter service_name = "frontend"  # Filter early
        align rate:sum(m("metric"))
    }
    
    # BAD
    @current <- @ {
        align rate:sum(m("metric"))
    }
    <- @current {
        filter service_name = "frontend"  # Filter late (processes all services)
    }
    
  2. Use options(bins: 1) for summaries: Reduces data volume in union

    @current <- @ {
        align options(bins: 1), rate:sum(m("metric"))       # Single time bin per series
        aggregate total:sum(rate), group_by(service_name)   # One row per service
    }
    
  3. Limit union sources: Each union source adds processing cost

Comparison: Subquery Union vs Window Functions

Both can solve period-over-period comparison, but have different tradeoffs:

Aspect        Subquery + Union              Window(lag)
Syntax        More verbose                  More concise
Flexibility   Can compare any time periods  Limited to adjacent rows
Performance   Processes data twice          Single pass
Time buckets  Aligns arbitrary periods      Only sequential buckets
Use case      Day-over-day, week-over-week  Row-to-row change detection

Example: Hour-over-hour rate of change

# Window approach (simpler for sequential buckets)
align 1h, rate:sum(m("span_call_count_5m"))
| make_col prev_rate:window(lag(rate, 1), group_by(service_name))
| make_col change:rate - prev_rate
| make_col pct:if(prev_rate > 0, (change / prev_rate) * 100, 0)

# Union approach (more flexible, can compare any offset)
@current <- @ {
    align 1h, rate:sum(m("span_call_count_5m"))
    aggregate current:sum(rate), group_by(service_name)
}
@previous <- @ {
    timeshift 1h
    align 1h, rate:sum(m("span_call_count_5m"))
    aggregate prev:sum(rate), group_by(service_name)
}
@combined <- @current {
    union @previous
    aggregate current:any_not_null(current), prev:any_not_null(prev), group_by(service_name)
}
<- @combined {
    make_col change:current - prev
    make_col pct:if(prev > 0, (change / prev) * 100, 0)
}

Recommendation:

  • Use window(lag) for simple sequential comparisons (previous hour, previous bucket)
  • Use union + timeshift for arbitrary period comparisons (same hour yesterday, last week)

Related Skills

  • window-functions-deep-dive - Covers window(lag/lead/avg) patterns
  • detecting-anomalies - Uses both union and window patterns
  • aggregating-gauge-metrics - Foundation for metric aggregation
  • time-series-analysis - Time bucketing with timechart

Key Takeaways

  1. Subquery syntax: @label <- @dataset { pipeline }
  2. Union combines: Same schema, different filters/time periods
  3. Timeshift before align: Critical for metric comparisons
  4. any_not_null(): Collapses sparse union results
  5. Not always best: Consider conditional columns for simple A/B tests
  6. Performance matters: Union processes data multiple times

When in doubt about subquery syntax or complex union patterns, use learn_observe_skill("OPAL subquery") for official documentation.