| name | trino |
| description | Expert assistance for writing, analyzing, and optimizing Trino SQL queries for Treasure Data. Use this skill when users need help with Trino queries, performance optimization, or TD-specific SQL patterns. |
Trino SQL Expert
Expert assistance for writing and optimizing Trino SQL queries for Treasure Data environments.
When to Use This Skill
Use this skill when:
- Writing new Trino SQL queries for TD
- Optimizing existing Trino queries for performance
- Debugging Trino query errors or issues
- Converting queries from other SQL dialects to Trino
- Implementing TD best practices for data processing
Core Principles
1. TD Table Naming Conventions
Always use the TD table format:
SELECT * FROM database_name.table_name
2. Partitioning and Time-based Queries
TD tables are typically partitioned by time. Always include time filters for performance:
SELECT *
FROM database_name.table_name
WHERE TD_TIME_RANGE(time, '2024-01-01', '2024-01-31')
Or use relative time ranges:
WHERE TD_TIME_RANGE(time, TD_TIME_ADD(TD_SCHEDULED_TIME(), '-7d'), TD_SCHEDULED_TIME())
3. Performance Optimization
Use APPROX functions for large datasets:
SELECT
APPROX_DISTINCT(user_id) as unique_users,
APPROX_PERCENTILE(response_time, 0.95) as p95_response
FROM database_name.events
WHERE TD_TIME_RANGE(time, '2024-01-01')
Partition pruning:
-- Good: Filters on partition column
WHERE TD_TIME_RANGE(time, '2024-01-01', '2024-01-02')
-- Avoid: Non-partition column filters without time filter
WHERE event_type = 'click' -- Missing time filter!
Limit data scanned:
-- Use LIMIT for exploratory queries
SELECT * FROM table_name
WHERE TD_TIME_RANGE(time, '2024-01-01')
LIMIT 1000
4. Common TD Functions
TD_INTERVAL - Simplified relative time filtering (Recommended):
-- Current day
WHERE TD_INTERVAL(time, '1d', 'JST')
-- Yesterday
WHERE TD_INTERVAL(time, '-1d', 'JST')
-- Previous week
WHERE TD_INTERVAL(time, '-1w', 'JST')
-- Previous month
WHERE TD_INTERVAL(time, '-1M', 'JST')
-- 2 days ago (offset syntax)
WHERE TD_INTERVAL(time, '-1d/-1d', 'JST')
-- 3 months ago (combined offset)
WHERE TD_INTERVAL(time, '-1M/-2M', 'JST')
Note: TD_INTERVAL simplifies relative time queries and is preferred over combining TD_TIME_RANGE with TD_DATE_TRUNC. Cannot accept TD_SCHEDULED_TIME as first argument, but including TD_SCHEDULED_TIME elsewhere in the query establishes the reference date.
TD_TIME_RANGE - Filter by time partitions (explicit dates):
TD_TIME_RANGE(time, '2024-01-01', '2024-01-31')
TD_TIME_RANGE(time, '2024-01-01') -- Single day
TD_SCHEDULED_TIME() - Get scheduled execution time:
TD_TIME_ADD(TD_SCHEDULED_TIME(), '-1d') -- Yesterday
TD_TIME_STRING - Format timestamps (Recommended):
-- Uses simple format codes instead of full format strings
TD_TIME_STRING(time, 'd!', 'JST') -- Returns: 2018-09-13
TD_TIME_STRING(time, 's!', 'UTC') -- Returns: 2018-09-13 16:45:34
TD_TIME_STRING(time, 'M!', 'JST') -- Returns: 2018-09 (year-month)
TD_TIME_STRING(time, 'h!', 'UTC') -- Returns: 2018-09-13 16 (year-month-day hour)
-- With timezone in output (without ! suffix)
TD_TIME_STRING(time, 'd', 'JST') -- Returns: 2018-09-13 00:00:00+0900
TD_TIME_STRING(time, 's', 'UTC') -- Returns: 2018-09-13 16:45:34+0000
Format codes:
y!= yyyy (year only)q!= yyyy-MM (quarter start)M!= yyyy-MM (month)w!= yyyy-MM-dd (week start)d!= yyyy-MM-dd (day)h!= yyyy-MM-dd HH (hour)m!= yyyy-MM-dd HH:mm (minute)s!= yyyy-MM-dd HH:mm:ss (second)- Without
!includes timezone offset
TD_TIME_FORMAT - Format timestamps (Legacy, use TD_TIME_STRING instead):
TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:ss', 'UTC')
TD_SESSIONIZE - Sessionize events:
SELECT TD_SESSIONIZE(time, 1800, user_id) as session_id
FROM events
5. JOIN Optimization
Put smaller table on the right side:
-- Good
SELECT *
FROM large_table l
JOIN small_table s ON l.id = s.id
-- Consider table size when joining
Use appropriate JOIN types:
-- INNER JOIN for matching records only
-- LEFT JOIN when you need all records from left table
-- Avoid FULL OUTER JOIN when possible (expensive)
6. Data Types and Casting
Be explicit with data types:
CAST(column_name AS BIGINT)
CAST(column_name AS VARCHAR)
CAST(column_name AS DOUBLE)
TRY_CAST(column_name AS BIGINT) -- Returns NULL on failure
7. Window Functions
SELECT
user_id,
event_time,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_time) as event_seq,
LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) as prev_event
FROM events
WHERE TD_TIME_RANGE(time, '2024-01-01')
Common Patterns
User Event Analysis
-- Using TD_INTERVAL for last month
SELECT
TD_TIME_STRING(time, 'd!', 'JST') as date,
event_type,
COUNT(*) as event_count,
APPROX_DISTINCT(user_id) as unique_users
FROM database_name.events
WHERE TD_INTERVAL(time, '-1M', 'JST')
AND event_type IN ('page_view', 'click', 'purchase')
GROUP BY 1, 2
ORDER BY 1, 2
Alternative with explicit date range:
SELECT
TD_TIME_STRING(time, 'd!', 'JST') as date,
event_type,
COUNT(*) as event_count,
APPROX_DISTINCT(user_id) as unique_users
FROM database_name.events
WHERE TD_TIME_RANGE(time, '2024-01-01', '2024-01-31')
AND event_type IN ('page_view', 'click', 'purchase')
GROUP BY 1, 2
ORDER BY 1, 2
Conversion Funnel
WITH events_filtered AS (
SELECT
user_id,
event_type,
time
FROM database_name.events
WHERE TD_TIME_RANGE(time, '2024-01-01', '2024-01-31')
)
SELECT
COUNT(DISTINCT CASE WHEN event_type = 'page_view' THEN user_id END) as step1_users,
COUNT(DISTINCT CASE WHEN event_type = 'add_to_cart' THEN user_id END) as step2_users,
COUNT(DISTINCT CASE WHEN event_type = 'purchase' THEN user_id END) as step3_users
FROM events_filtered
Daily Aggregation
-- Using TD_INTERVAL for yesterday's data
SELECT
TD_TIME_STRING(time, 'd!', 'JST') as date,
COUNT(*) as total_events,
APPROX_DISTINCT(user_id) as daily_active_users,
AVG(session_duration) as avg_session_duration
FROM database_name.events
WHERE TD_INTERVAL(time, '-1d', 'JST')
AND TD_SCHEDULED_TIME() IS NOT NULL -- Establishes reference date for TD_INTERVAL
GROUP BY 1
ORDER BY 1
For rolling 30-day window:
SELECT
TD_TIME_STRING(time, 'd!', 'JST') as date,
COUNT(*) as total_events,
APPROX_DISTINCT(user_id) as daily_active_users,
AVG(session_duration) as avg_session_duration
FROM database_name.events
WHERE TD_TIME_RANGE(time, TD_TIME_ADD(TD_SCHEDULED_TIME(), '-30d'), TD_SCHEDULED_TIME())
GROUP BY 1
ORDER BY 1
Error Handling
Common Errors
"Line X:Y: Column 'time' cannot be resolved"
- Ensure table name is correct
- Check that column exists in table schema
"Query exceeded memory limit"
- Add time filters with TD_TIME_RANGE
- Use APPROX_ functions instead of exact aggregations
- Reduce JOIN complexity or data volume
"Partition not found"
- Verify time range covers existing partitions
- Check TD_TIME_RANGE syntax
Best Practices
- Always include time filters using TD_INTERVAL or TD_TIME_RANGE for partition pruning
- Use TD_INTERVAL for relative dates:
WHERE TD_INTERVAL(time, '-1d', 'JST') - Use TD_TIME_RANGE for explicit dates:
WHERE TD_TIME_RANGE(time, '2024-01-01', '2024-01-31') - Never filter by formatted dates: ❌
WHERE TD_TIME_STRING(time, 'd!', 'JST') = '2024-01-01'
- Use TD_INTERVAL for relative dates:
- Use TD_TIME_STRING for display only, not for filtering
- ✅
SELECT TD_TIME_STRING(time, 'd!', 'JST') as date - ❌
WHERE TD_TIME_STRING(time, 'd!', 'JST') = '2024-01-01'
- ✅
- Use APPROX functions for large-scale aggregations (APPROX_DISTINCT, APPROX_PERCENTILE)
- Limit exploratory queries to reduce costs and scan time
- Test queries on small time ranges before running on full dataset
- Use CTEs (WITH clauses) for complex queries to improve readability
- Add comments explaining business logic
- Consider materialized results for frequently-run queries
Example Workflow
When helping users write Trino queries:
- Understand the requirement - What data do they need?
- Identify tables - Which TD tables contain the data?
- Add time filters - What time range is needed?
- Write base query - Start with simple SELECT
- Add aggregations - Use appropriate functions
- Optimize - Apply performance best practices
- Test - Validate results on small dataset first
Resources
- Trino SQL documentation: https://trino.io/docs/current/
- TD-specific functions: Check internal TD documentation
- Query performance: Use EXPLAIN for query plans