| name | dst-join-analysis |
| description | Perform SQL joins and multi-table analysis on DST data in DuckDB. Use when research requires combining multiple tables on common dimensions (time, region). Provides patterns for common DST dimension joins and multi-table comparisons. |
DST Multi-Table Join Analysis
Combine and analyze multiple Danmarks Statistik tables stored in DuckDB.
Common DST Dimensions
DST tables often share these dimensions:
1. Time Dimensions
tid- Time period (year, quarter, month)- Common formats: '2023', '2023K1', '2023M01'
- All tables with time series can join on this
2. Regional Dimensions
område- Geographic area (hele landet, regions, municipalities)region- Region codeskommune- Municipality codes
3. Demographic Dimensions
køn- Gender (M/K/Total)alder- Age groupsherkomst- Origin/ethnicity
Join Patterns
Pattern 1: Time-Series Join
When both tables have time dimension:
-- Example: Join population (FOLK1A) with births (FOD)
SELECT
f.tid AS year,
f.indhold AS population,
b.indhold AS births,
ROUND(b.indhold::FLOAT / f.indhold * 1000, 2) AS birth_rate_per_1000
FROM dst_folk1a f
INNER JOIN dst_fod b ON f.tid = b.tid
WHERE f.område = 'Hele landet'
AND b.område = 'Hele landet'
ORDER BY f.tid;
Use when: Comparing two time-series indicators
Pattern 2: Regional Comparison
When both tables have regional breakdown:
-- Example: Compare population (FOLK1A) across regions
SELECT
t1.område AS region,
t1.tid AS year,
t1.indhold AS population_2020,
t2.indhold AS population_2023,
ROUND((t2.indhold::FLOAT - t1.indhold) / t1.indhold * 100, 2) AS growth_pct
FROM dst_folk1a t1
INNER JOIN dst_folk1a t2
ON t1.område = t2.område
WHERE t1.tid = '2020'
AND t2.tid = '2023'
AND t1.område != 'Hele landet'
ORDER BY growth_pct DESC;
Use when: Comparing regions across same indicator
Pattern 3: Multi-Indicator Analysis
When joining different indicators by time and region:
-- Example: Correlate unemployment with business bankruptcies
SELECT
u.tid AS year,
u.indhold AS unemployment_rate,
b.indhold AS bankruptcies,
ROUND(b.indhold::FLOAT / u.indhold, 2) AS bankruptcies_per_unemployed
FROM dst_unemployment u
INNER JOIN dst_bankruptcies b
ON u.tid = b.tid
AND u.område = b.område
WHERE u.område = 'Hele landet'
ORDER BY u.tid;
Use when: Exploring relationships between different indicators
Pattern 4: Aggregate Join
When one table is at higher granularity:
-- Example: Join total population with regional breakdown
SELECT
total.tid AS year,
total.indhold AS total_population,
region.område AS region,
region.indhold AS region_population,
ROUND(region.indhold::FLOAT / total.indhold * 100, 2) AS pct_of_total
FROM dst_folk1a total
INNER JOIN dst_folk1a region
ON total.tid = region.tid
WHERE total.område = 'Hele landet'
AND region.område != 'Hele landet'
AND region.område LIKE 'Region%'
ORDER BY total.tid, pct_of_total DESC;
Use when: Comparing parts to whole
Data Type Handling
DST data stored as TEXT, convert for calculations:
-- Convert to numeric for math
SELECT
tid,
CAST(indhold AS INTEGER) AS value_int,
CAST(indhold AS FLOAT) AS value_float,
CAST(indhold AS DECIMAL(15,2)) AS value_decimal
FROM dst_table;
Common Calculations
Growth Rate
ROUND((new_value::FLOAT - old_value) / old_value * 100, 2) AS growth_pct
Year-over-Year Change
SELECT
t1.tid AS year,
t1.indhold AS current_value,
LAG(t1.indhold) OVER (ORDER BY t1.tid) AS previous_value,
t1.indhold::FLOAT - LAG(t1.indhold) OVER (ORDER BY t1.tid) AS yoy_change
FROM dst_table t1;
Moving Average
SELECT
tid,
indhold,
AVG(indhold::FLOAT) OVER (
ORDER BY tid
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3y
FROM dst_table;
Correlation (Pearson)
-- DuckDB has built-in correlation
SELECT
CORR(t1.indhold::FLOAT, t2.indhold::FLOAT) AS correlation_coefficient
FROM dst_table1 t1
INNER JOIN dst_table2 t2 ON t1.tid = t2.tid;
Handling Missing Data
DST tables may have:
NULLvalues".."for unavailable data"-"for zero/not applicable
-- Filter out missing data
WHERE indhold IS NOT NULL
AND indhold NOT IN ('..', '-', '')
AND TRY_CAST(indhold AS INTEGER) IS NOT NULL
Query Optimization Tips
- Filter early: Apply WHERE before JOIN when possible
- Use indexes: DuckDB auto-indexes, but column order matters
- Limit rows: Add LIMIT for exploration, remove for final analysis
- Aggregate wisely: Use GROUP BY only when necessary
Example Multi-Table Research Query
-- Research question: "How does population growth correlate with housing starts?"
WITH pop_growth AS (
SELECT
t1.tid AS year,
(t2.indhold::FLOAT - t1.indhold) / t1.indhold * 100 AS pop_growth_pct
FROM dst_folk1a t1
INNER JOIN dst_folk1a t2 ON t1.tid = CAST(CAST(t2.tid AS INTEGER) - 1 AS VARCHAR)
WHERE t1.område = 'Hele landet' AND t2.område = 'Hele landet'
),
housing AS (
SELECT
tid AS year,
indhold::INTEGER AS housing_starts
FROM dst_housing_table
WHERE område = 'Hele landet'
)
SELECT
p.year,
p.pop_growth_pct,
h.housing_starts,
CORR(p.pop_growth_pct, h.housing_starts) OVER () AS correlation
FROM pop_growth p
INNER JOIN housing h ON p.year = h.year
ORDER BY p.year;
Usage Guidelines
When dst-research-analyst invokes you:
- Identify join keys: What dimensions do tables share?
- Choose pattern: Which pattern fits the analysis?
- Handle data types: Convert TEXT to numeric
- Filter missing data: Remove NULL/".."/"-"
- Add calculations: Growth rates, percentages, correlations
- Order results: By time or magnitude
- Return SQL: Provide query for dst-query skill to execute
Remember: You provide the SQL patterns, dst-query executes them.