| name | window-functions-deep-dive |
| description | Master OPAL window functions for row-relative calculations, rankings, and moving aggregates. Covers lag(), lead(), row_number(), rank(), dense_rank(), moving averages, first(), and last(). Use when comparing rows to neighbors, ranking within partitions, calculating rate of change, or computing time-based moving windows. CRITICAL - OPAL uses window() function wrapper, NOT SQL OVER clause. |
Window Functions Deep Dive
Overview
Window functions perform calculations across sets of rows related to the current row, without collapsing rows like aggregate does. They enable:
- Row comparisons: Compare current row to previous/next rows (lag/lead)
- Rankings: Assign ranks within partitions (row_number/rank/dense_rank)
- Moving calculations: Rolling averages, sums over time windows
- Boundary values: First/last values in windows or partitions
CRITICAL: OPAL vs SQL Syntax
❌ WRONG - SQL OVER Syntax (Does NOT Work):
SUM(val) OVER (PARTITION BY subject ORDER BY time ROWS UNBOUNDED PRECEDING)
# Error: Unknown function 'over()'
✅ CORRECT - OPAL window() Function:
window(sum(val), group_by(subject), order_by(time))
Key Difference: OPAL uses window() function wrapper, NOT SQL OVER clause. The Observe documentation sometimes shows SQL syntax for reference, but you MUST use the window() syntax in actual OPAL queries.
Window Function Categories
1. Offset Functions (lag/lead)
Access values from previous or next rows relative to current row.
Functions:
lag(column, offset)- Get value from N rows backlead(column, offset)- Get value from N rows ahead
Syntax:
window(lag(column, N), group_by(partition_columns))
window(lead(column, N), group_by(partition_columns))
Key Points:
- Offset defaults to 1 if not specified
- Returns
nullwhen offset goes beyond partition boundary - MUST specify
group_by()for partitioning (even if single partition) - Optional
order_by()for explicit ordering (default: unspecified)
2. Ranking Functions
Assign ranks or row numbers within partitions.
Functions:
row_number()- Sequential unique numbers (1, 2, 3, 4, ...)rank()- Ranks with gaps after ties (1, 2, 2, 4, ...)dense_rank()- Ranks without gaps (1, 2, 2, 3, ...)
Syntax:
window(row_number(), group_by(partition), order_by(sort_col))
window(rank(), group_by(partition), order_by(sort_col))
window(dense_rank(), group_by(partition), order_by(sort_col))
Key Differences:
| Value | row_number() | rank() | dense_rank() |
|---|---|---|---|
| 59 | 1 | 1 | 1 |
| 53 | 2 | 2 | 2 |
| 43 | 3 | 3 | 3 |
| 3 | 4 | 4 | 4 |
| 3 | 5 | 4 | 4 |
| 2 | 6 | 6 | 5 |
When to Use:
row_number(): Need unique IDs, pagination, samplingrank(): Olympic-style ranking (ties share rank, gaps for fairness)dense_rank(): Category ranking (no gaps, continuous numbering)
3. Aggregate Functions with Time Windows
Calculate rolling aggregates over time-based sliding windows.
Functions:
avg(column)- Moving averagesum(column)- Moving summin(column)- Moving minimummax(column)- Moving maximumcount(column)- Moving count
Syntax:
window(aggregate_func(column), group_by(partition), order_by(time_col), frame(back:duration))
Frame Durations:
- Minutes:
5m,15m,30m - Hours:
1h,6h,12h,24h - Days:
1d,7d,30d
IMPORTANT: OPAL only supports time-based frames (back:duration), NOT row-based frames like SQL's ROWS N PRECEDING.
4. Value Functions (first/last)
Retrieve first or last value within partition or window.
Functions:
first(column)- First value (by order_by)last(column)- Last value (by order_by)
Syntax:
# Entire partition
window(first(column), group_by(partition), order_by(sort_col))
window(last(column), group_by(partition), order_by(sort_col))
# Sliding window
window(first(column), group_by(partition), order_by(sort_col), frame(back:duration))
window(last(column), group_by(partition), order_by(sort_col), frame(back:duration))
Pattern 1: Rate of Change Detection (lag/lead)
Use Case: Detect sudden spikes or drops in metrics.
align 5m, rate:sum(m("span_call_count_5m"))
| make_col previous_rate:window(lag(rate, 1), group_by(service_name))
| make_col next_rate:window(lead(rate, 1), group_by(service_name))
| make_col rate_change:rate - previous_rate
| make_col pct_change:if(previous_rate > 0, (rate_change / previous_rate) * 100, 0)
| make_col is_spike:if(pct_change > 100 or pct_change < -50, true, false)
| filter is_spike = true
| topk 20, max(pct_change)
Result:
service_name previous_rate rate pct_change is_spike
frontend 2 50 2400% true # 24x increase!
frontend 50 2 -96% true # 96% drop
cartservice 10 80 700% true # 8x increase
How It Works:
lag(rate, 1)gets previous time bucket's ratelead(rate, 1)gets next time bucket's rate (for forward detection)- Calculate percentage change:
(current - previous) / previous * 100 - Flag spikes: >100% increase or >50% decrease
topksorts by largest changes
When to Use:
- Detect traffic spikes or drops
- Identify service anomalies
- Alert on sudden behavior changes
- Compare bucket-to-bucket changes
Pattern 2: Top-N Per Time Bucket (Ranking)
Use Case: Find top 5 busiest services in each 5-minute window.
align 5m, rate:sum(m("span_call_count_5m"))
| aggregate total:sum(rate), group_by(service_name, _c_bucket)
| make_col rank:window(rank(), group_by(_c_bucket), order_by(desc(total)))
| filter rank <= 5
| sort asc(_c_bucket), asc(rank)
Result:
_c_bucket service_name total rank
5877542 frontend-proxy 59 1
5877542 frontend 53 2
5877542 featureflagservice 43 3
5877542 productcatalogservice 31 4
5877542 cartservice 8 5
5877543 frontend 61 1
5877543 frontend-proxy 55 2
...
How It Works:
- Aggregate to get total per service per bucket
rank()within each_c_bucketpartitionorder_by(desc(total))ranks highest first- Filter to top 5
- Sort for readable output
When to Use:
- Top-N queries within time windows
- Identify busiest/slowest services per period
- Compare rankings across time
- Dashboard "top services" widgets
Variation - Use dense_rank() for No Gaps:
make_col dense_rank:window(dense_rank(), group_by(_c_bucket), order_by(desc(total)))
Use dense_rank() when ties should count as one rank (no gaps).
Pattern 3: Moving Average Baseline
Use Case: Detect anomalies by comparing current value to 30-minute moving average.
align 5m, rate:sum(m("span_call_count_5m"))
| make_col baseline:window(avg(rate), group_by(service_name), order_by(asc(valid_from)), frame(back:30m))
| make_col deviation:rate - baseline
| make_col pct_deviation:if(baseline > 0, (deviation / baseline) * 100, 0)
| make_col is_anomaly:if(pct_deviation > 50 or pct_deviation < -50, true, false)
| filter is_anomaly = true
| topk 20, max(pct_deviation)
Result:
service_name rate baseline deviation pct_deviation is_anomaly
featureflagservice 150 30 120 400% true
frontend 2 26 -24 -92% true
cartservice 80 20 60 300% true
How It Works:
frame(back:30m)creates 30-minute sliding windowavg(rate)calculates average within that window- Compare current
ratetobaseline - Flag deviations >50% above or below baseline
When to Use:
- Anomaly detection with dynamic baseline
- Smooth out noise in metrics
- Detect sustained changes vs temporary blips
- Performance monitoring (latency vs moving average)
Frame Duration Guidance:
back:15m- Very responsive, more false positivesback:30m- Balanced (recommended for most use cases)back:1h- Stable baseline, less sensitiveback:24h- Day-over-day trending
Pattern 4: Detect Return to Baseline
Use Case: Alert when error rate exceeds initial rate by 2x for more than 1 hour.
align 5m, error_rate:sum(m("span_error_count_5m"))
| make_col first_rate:window(first(error_rate), group_by(service_name), order_by(asc(valid_from)), frame(back:1h))
| make_col current_elevated:if(error_rate > first_rate * 2, true, false)
| filter current_elevated = true
| statsby incidents:count(), avg_elevation:avg(error_rate / first_rate), group_by(service_name)
| sort desc(incidents)
Result:
service_name incidents avg_elevation
checkoutservice 15 3.2x # Error rate 3.2x higher than 1h ago
paymentservice 8 2.8x
fraudservice 3 2.1x
How It Works:
first(error_rate)withframe(back:1h)gets rate from 1 hour ago- Compare current to first:
error_rate > first_rate * 2 - Aggregate to count incidents and average elevation
When to Use:
- Detect if metric returned to baseline
- Track sustained elevations
- SLO violations (errors above initial threshold)
- Performance degradation alerts
Pattern 5: Smooth Noisy Metrics
Use Case: Smooth CPU usage with 10-minute moving average for charting.
align 1m, cpu:avg(m("cpu_usage_percent"))
| make_col smooth_cpu:window(avg(cpu), group_by(host), order_by(asc(valid_from)), frame(back:10m))
| pick_col valid_from, valid_to, host, cpu, smooth_cpu
Result:
valid_from host cpu smooth_cpu
1763259000... web-01 85.2 78.5 # Raw vs smoothed
1763259060... web-01 92.1 79.2 # Spike smoothed out
1763259120... web-01 76.4 79.8 # Averages over 10m
How It Works:
- Align to 1-minute buckets (high resolution)
- Calculate 10-minute moving average
- Use
smooth_cpufor charting,cpufor alerts
When to Use:
- Smooth spiky metrics for visualization
- Reduce noise in dashboards
- Calculate trends without losing detail
- Separate signal from noise
Pattern 6: Consecutive Threshold Violations
Use Case: Alert only if latency exceeds SLO for 3 consecutive buckets.
align 5m, p95:percentile(duration_ms, 0.95)
| make_col prev1:window(lag(p95, 1), group_by(service_name))
| make_col prev2:window(lag(p95, 2), group_by(service_name))
| make_col all_high:if(p95 > 500 and prev1 > 500 and prev2 > 500, true, false)
| filter all_high = true
| statsby first_violation:min(valid_from), duration_mins:count() * 5, group_by(service_name)
Result:
service_name first_violation duration_mins
checkoutservice 2025-11-15 10:30:00 45 # Violated for 45 min
paymentservice 2025-11-15 11:15:00 20 # Violated for 20 min
How It Works:
lag(p95, 1)andlag(p95, 2)get two previous buckets- Check if current AND both previous exceed 500ms
- Aggregate to find first violation and total duration
When to Use:
- Reduce alert fatigue from transient spikes
- Require sustained violations before alerting
- SLO monitoring with grace period
- Stability checks (N consecutive failures)
Common Window Function Patterns
Quick Reference
| Pattern | Window Function | Use Case |
|---|---|---|
| Bucket-to-bucket change | lag(col, 1) |
Spike detection, rate of change |
| Forward prediction | lead(col, 1) |
Pre-emptive alerts |
| Top-N per period | rank(), dense_rank() |
Rankings, leaderboards |
| Unique row IDs | row_number() |
Sampling, pagination |
| Moving average | avg() + frame(back:30m) |
Smoothing, baseline |
| Rolling sum | sum() + frame(back:1h) |
Hourly totals |
| Start of period | first() + frame(back:24h) |
Day-over-day comparison |
| End of period | last() + frame(back:24h) |
Final value in window |
window() Syntax Reference
Complete Syntax
window(
expression, # Function to apply
group_by(partition_columns), # Partition into groups
order_by(sort_expression), # Order within partitions
frame(back:duration) # Optional: time-based window
)
Parameters
expression (required):
- Offset:
lag(col, N),lead(col, N) - Ranking:
row_number(),rank(),dense_rank() - Aggregate:
avg(col),sum(col),min(col),max(col),count(col) - Value:
first(col),last(col)
group_by() (required for most functions):
- Partitions data into groups
- Window function operates independently per partition
- Example:
group_by(service_name)- separate windows per service
order_by() (required for ranking, optional for others):
- Defines row order within partitions
- Example:
order_by(asc(valid_from))- chronological order - Example:
order_by(desc(total))- highest first
frame() (optional, only for aggregates and value functions):
- Time-based sliding window:
frame(back:duration) - Examples:
frame(back:5m),frame(back:1h),frame(back:24h) - NOT supported: Row-based frames (
ROWS N PRECEDINGin SQL)
Limitations and Workarounds
❌ Limitation 1: No Row-Based Frames
Problem: Cannot specify "last 3 rows" or "next 5 rows".
SQL Syntax (Does NOT Work):
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
ROWS 5 FOLLOWING
OPAL Reality:
- Only time-based frames:
frame(back:duration) - No row-count-based frames
Workaround:
- Use fixed time window:
frame(back:15m)includes ~3 rows at 5m align - Use multiple
lag()calls:lag(col, 1),lag(col, 2),lag(col, 3)
❌ Limitation 2: No Cumulative Sums
Problem: Cannot create running totals using window functions.
SQL Syntax (Does NOT Work):
SUM(val) OVER (ORDER BY time ROWS UNBOUNDED PRECEDING)
OPAL Behavior:
window(sum(val), order_by(time)) # Returns total, NOT cumulative
Why: Without frame, sums entire partition. With frame(back:duration), sums time window.
Workaround:
- Use subqueries for cumulative calculations
- Process outside OPAL (in application layer)
- Request feature from Observe team
Example - Partial Workaround:
# NOT cumulative, but moving 1-hour sum
window(sum(val), group_by(service), order_by(time), frame(back:1h))
❌ Limitation 3: No OVER Clause
Problem: SQL OVER clause syntax fails in OPAL.
SQL Syntax (Does NOT Work):
SUM(val) OVER (PARTITION BY service ORDER BY time)
LAG(val, 1) OVER (PARTITION BY service)
Error: Unknown function 'over()'
Fix: Always use window() function wrapper.
Correct OPAL:
window(sum(val), group_by(service), order_by(time))
window(lag(val, 1), group_by(service))
Decision Tree: Which Window Function?
┌─────────────────────────────────────┐
│ What do you need to calculate? │
└─────────────────────────────────────┘
│
├─ Compare to previous/next row?
│ └─> Use: lag() or lead()
│ Examples:
│ - Spike detection (current vs previous)
│ - Rate of change (delta between buckets)
│ - Consecutive violations (check last N via multiple lags)
│
├─ Assign rankings or row numbers?
│ └─> Use: row_number(), rank(), or dense_rank()
│ Choose based on tie handling:
│ - Unique IDs → row_number()
│ - Olympic ranking (gaps after ties) → rank()
│ - Continuous numbering (no gaps) → dense_rank()
│
├─ Calculate over time window?
│ └─> Use: avg/sum/min/max + frame(back:duration)
│ Examples:
│ - Moving average baseline
│ - Rolling sum (hourly totals)
│ - Smoothing spiky metrics
│
└─ Get first/last value in window?
└─> Use: first() or last()
- Entire partition: omit frame()
- Sliding window: use frame(back:duration)
Examples:
- Compare to start of day (first in 24h window)
- Detect if returned to baseline
Performance Considerations
Window Function Performance
Fast ✅:
lag()andlead()with single offsetrow_number(),rank(),dense_rank()within small partitions- Time-based frames with reasonable durations (<1h)
Slower ⚠️:
- Large partitions (millions of rows per service)
- Very long time windows (
back:30dover large datasets) - Multiple window functions in single query
Optimization Tips:
- Filter Before Windowing:
# GOOD: Filter first, then window
filter service_name = "frontend"
| make_col prev:window(lag(rate, 1), group_by(service_name))
# SLOW: Window on all services, then filter
make_col prev:window(lag(rate, 1), group_by(service_name))
| filter service_name = "frontend"
- Use Appropriate Frame Duration:
# GOOD: 30m for 5m buckets = ~6 rows
frame(back:30m)
# SLOW: 24h for 5m buckets = ~288 rows
frame(back:24h)
- Limit Output Early:
# GOOD: Filter anomalies first
filter pct_change > 100
| limit 100
# SLOW: Calculate all, limit at end
make_col pct_change:...
| limit 100
Common Mistakes and Fixes
Mistake 1: Using SQL OVER Syntax
❌ WRONG:
make_col prev:lag(rate, 1) over (partition by service_name)
Error: Unknown function 'over()'
✅ FIX:
make_col prev:window(lag(rate, 1), group_by(service_name))
Mistake 2: Forgetting group_by()
❌ WRONG:
make_col prev:window(lag(rate, 1))
Error: Undefined behavior or incorrect partitioning
✅ FIX:
# Partition by service
make_col prev:window(lag(rate, 1), group_by(service_name))
# Single partition (all rows together)
make_col prev:window(lag(rate, 1), group_by())
Mistake 3: Expecting Cumulative Sum
❌ WRONG EXPECTATION:
window(sum(value), order_by(time)) # Expecting running total
Reality: Returns total sum of entire partition for ALL rows.
✅ WORKAROUND:
# Use moving sum instead (not cumulative, but windowed)
window(sum(value), group_by(service), order_by(time), frame(back:1h))
Mistake 4: Using Row-Based Frame Syntax
❌ WRONG:
frame(rows: 3 preceding) # No rows parameter
frame(between: 3 preceding and current) # No SQL syntax
✅ FIX:
frame(back:15m) # Time-based only
Cross-References
Related Skills:
- detecting-anomalies - Uses lag() for rate-of-change detection
- aggregating-gauge-metrics - Metrics aggregation before windowing
- time-series-analysis - Temporal analysis with timechart
- subquery-patterns-and-union - Advanced multi-stage calculations
Common Workflows:
- Spike Detection: aggregating-gauge-metrics → window-functions (lag) → detecting-anomalies
- Top-N Dashboards: aggregating-gauge-metrics → window-functions (rank) → filtering
- Baseline Comparison: aggregating-gauge-metrics → window-functions (frame) → anomaly filtering
- Trend Analysis: time-series-analysis → window-functions (moving avg) → visualization
Examples Summary
Example 1: Rate of Change Detection
Use Case: Find services with >100% request rate increase
align 5m, rate:sum(m("span_call_count_5m"))
| make_col prev:window(lag(rate, 1), group_by(service_name))
| make_col pct_change:if(prev > 0, (rate - prev) / prev * 100, 0)
| filter pct_change > 100
Example 2: Top 10 Services Per Time Bucket
Use Case: Busiest services in each 5-minute window
align 5m, rate:sum(m("span_call_count_5m"))
| make_col rank:window(rank(), group_by(_c_bucket), order_by(desc(rate)))
| filter rank <= 10
Example 3: Anomaly Detection with Moving Baseline
Use Case: Alert when current exceeds 30m average by 50%
align 5m, rate:sum(m("span_call_count_5m"))
| make_col baseline:window(avg(rate), group_by(service_name), order_by(asc(valid_from)), frame(back:30m))
| make_col anomaly:if(rate > baseline * 1.5, true, false)
| filter anomaly = true
Example 4: Smoothed Metric for Dashboards
Use Case: 15-minute moving average for charts
align 1m, latency:percentile(duration_ms, 0.95)
| make_col smooth_p95:window(avg(latency), group_by(service_name), order_by(asc(valid_from)), frame(back:15m))
| pick_col valid_from, valid_to, service_name, smooth_p95
Example 5: Detect Sustained Violations
Use Case: Alert if SLO violated for 3+ consecutive buckets
align 5m, p95:percentile(duration_ms, 0.95)
| make_col prev1:window(lag(p95, 1), group_by(service_name))
| make_col prev2:window(lag(p95, 2), group_by(service_name))
| make_col sustained:if(p95 > 500 and prev1 > 500 and prev2 > 500, true, false)
| filter sustained = true
Quick Syntax Reference Card
| Function | Basic Syntax | With Frame |
|---|---|---|
| lag | window(lag(col, 1), group_by(dim)) |
N/A |
| lead | window(lead(col, 1), group_by(dim)) |
N/A |
| row_number | window(row_number(), group_by(dim), order_by(col)) |
N/A |
| rank | window(rank(), group_by(dim), order_by(col)) |
N/A |
| dense_rank | window(dense_rank(), group_by(dim), order_by(col)) |
N/A |
| avg | window(avg(col), group_by(dim)) |
window(avg(col), group_by(dim), order_by(time), frame(back:30m)) |
| sum | window(sum(col), group_by(dim)) |
window(sum(col), group_by(dim), order_by(time), frame(back:1h)) |
| first | window(first(col), group_by(dim), order_by(time)) |
window(first(col), group_by(dim), order_by(time), frame(back:24h)) |
| last | window(last(col), group_by(dim), order_by(time)) |
window(last(col), group_by(dim), order_by(time), frame(back:24h)) |
Remember: ALWAYS use window() wrapper. NEVER use SQL OVER clause.