Claude Code Plugins

Community-maintained marketplace

Feedback

Expert assistance for writing, analyzing, and optimizing Trino SQL queries for Treasure Data. Use this skill when users need help with Trino queries, performance optimization, or TD-specific SQL patterns.

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name trino
description Expert assistance for writing, analyzing, and optimizing Trino SQL queries for Treasure Data. Use this skill when users need help with Trino queries, performance optimization, or TD-specific SQL patterns.

Trino SQL Expert

Expert assistance for writing and optimizing Trino SQL queries for Treasure Data environments.

When to Use This Skill

Use this skill when:

  • Writing new Trino SQL queries for TD
  • Optimizing existing Trino queries for performance
  • Debugging Trino query errors or issues
  • Converting queries from other SQL dialects to Trino
  • Implementing TD best practices for data processing

Core Principles

1. TD Table Naming Conventions

Always use the TD table format:

SELECT * FROM database_name.table_name

2. Partitioning and Time-based Queries

TD tables are typically partitioned by time. Always include time filters for performance:

SELECT *
FROM database_name.table_name
WHERE TD_TIME_RANGE(time, '2024-01-01', '2024-01-31')

Or use relative time ranges:

WHERE TD_TIME_RANGE(time, TD_TIME_ADD(TD_SCHEDULED_TIME(), '-7d'), TD_SCHEDULED_TIME())

3. Performance Optimization

Use APPROX functions for large datasets:

SELECT
  APPROX_DISTINCT(user_id) as unique_users,
  APPROX_PERCENTILE(response_time, 0.95) as p95_response
FROM database_name.events
WHERE TD_TIME_RANGE(time, '2024-01-01')

Partition pruning:

-- Good: Filters on partition column
WHERE TD_TIME_RANGE(time, '2024-01-01', '2024-01-02')

-- Avoid: Non-partition column filters without time filter
WHERE event_type = 'click'  -- Missing time filter!

Limit data scanned:

-- Use LIMIT for exploratory queries
SELECT * FROM table_name
WHERE TD_TIME_RANGE(time, '2024-01-01')
LIMIT 1000

4. Common TD Functions

TD_INTERVAL - Simplified relative time filtering (Recommended):

-- Current day
WHERE TD_INTERVAL(time, '1d', 'JST')

-- Yesterday
WHERE TD_INTERVAL(time, '-1d', 'JST')

-- Previous week
WHERE TD_INTERVAL(time, '-1w', 'JST')

-- Previous month
WHERE TD_INTERVAL(time, '-1M', 'JST')

-- 2 days ago (offset syntax)
WHERE TD_INTERVAL(time, '-1d/-1d', 'JST')

-- 3 months ago (combined offset)
WHERE TD_INTERVAL(time, '-1M/-2M', 'JST')

Note: TD_INTERVAL simplifies relative time queries and is preferred over combining TD_TIME_RANGE with TD_DATE_TRUNC. Cannot accept TD_SCHEDULED_TIME as first argument, but including TD_SCHEDULED_TIME elsewhere in the query establishes the reference date.

TD_TIME_RANGE - Filter by time partitions (explicit dates):

TD_TIME_RANGE(time, '2024-01-01', '2024-01-31')
TD_TIME_RANGE(time, '2024-01-01')  -- Single day

TD_SCHEDULED_TIME() - Get scheduled execution time:

TD_TIME_ADD(TD_SCHEDULED_TIME(), '-1d')  -- Yesterday

TD_TIME_STRING - Format timestamps (Recommended):

-- Uses simple format codes instead of full format strings
TD_TIME_STRING(time, 'd!', 'JST')     -- Returns: 2018-09-13
TD_TIME_STRING(time, 's!', 'UTC')     -- Returns: 2018-09-13 16:45:34
TD_TIME_STRING(time, 'M!', 'JST')     -- Returns: 2018-09 (year-month)
TD_TIME_STRING(time, 'h!', 'UTC')     -- Returns: 2018-09-13 16 (year-month-day hour)

-- With timezone in output (without ! suffix)
TD_TIME_STRING(time, 'd', 'JST')      -- Returns: 2018-09-13 00:00:00+0900
TD_TIME_STRING(time, 's', 'UTC')      -- Returns: 2018-09-13 16:45:34+0000

Format codes:

  • y! = yyyy (year only)
  • q! = yyyy-MM (quarter start)
  • M! = yyyy-MM (month)
  • w! = yyyy-MM-dd (week start)
  • d! = yyyy-MM-dd (day)
  • h! = yyyy-MM-dd HH (hour)
  • m! = yyyy-MM-dd HH:mm (minute)
  • s! = yyyy-MM-dd HH:mm:ss (second)
  • Without ! includes timezone offset

TD_TIME_FORMAT - Format timestamps (Legacy, use TD_TIME_STRING instead):

TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:ss', 'UTC')

TD_SESSIONIZE - Sessionize events:

SELECT TD_SESSIONIZE(time, 1800, user_id) as session_id
FROM events

5. JOIN Optimization

Put smaller table on the right side:

-- Good
SELECT *
FROM large_table l
JOIN small_table s ON l.id = s.id

-- Consider table size when joining

Use appropriate JOIN types:

-- INNER JOIN for matching records only
-- LEFT JOIN when you need all records from left table
-- Avoid FULL OUTER JOIN when possible (expensive)

6. Data Types and Casting

Be explicit with data types:

CAST(column_name AS BIGINT)
CAST(column_name AS VARCHAR)
CAST(column_name AS DOUBLE)
TRY_CAST(column_name AS BIGINT)  -- Returns NULL on failure

7. Window Functions

SELECT
  user_id,
  event_time,
  ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_time) as event_seq,
  LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) as prev_event
FROM events
WHERE TD_TIME_RANGE(time, '2024-01-01')

Common Patterns

User Event Analysis

-- Using TD_INTERVAL for last month
SELECT
  TD_TIME_STRING(time, 'd!', 'JST') as date,
  event_type,
  COUNT(*) as event_count,
  APPROX_DISTINCT(user_id) as unique_users
FROM database_name.events
WHERE TD_INTERVAL(time, '-1M', 'JST')
  AND event_type IN ('page_view', 'click', 'purchase')
GROUP BY 1, 2
ORDER BY 1, 2

Alternative with explicit date range:

SELECT
  TD_TIME_STRING(time, 'd!', 'JST') as date,
  event_type,
  COUNT(*) as event_count,
  APPROX_DISTINCT(user_id) as unique_users
FROM database_name.events
WHERE TD_TIME_RANGE(time, '2024-01-01', '2024-01-31')
  AND event_type IN ('page_view', 'click', 'purchase')
GROUP BY 1, 2
ORDER BY 1, 2

Conversion Funnel

WITH events_filtered AS (
  SELECT
    user_id,
    event_type,
    time
  FROM database_name.events
  WHERE TD_TIME_RANGE(time, '2024-01-01', '2024-01-31')
)
SELECT
  COUNT(DISTINCT CASE WHEN event_type = 'page_view' THEN user_id END) as step1_users,
  COUNT(DISTINCT CASE WHEN event_type = 'add_to_cart' THEN user_id END) as step2_users,
  COUNT(DISTINCT CASE WHEN event_type = 'purchase' THEN user_id END) as step3_users
FROM events_filtered

Daily Aggregation

-- Using TD_INTERVAL for yesterday's data
SELECT
  TD_TIME_STRING(time, 'd!', 'JST') as date,
  COUNT(*) as total_events,
  APPROX_DISTINCT(user_id) as daily_active_users,
  AVG(session_duration) as avg_session_duration
FROM database_name.events
WHERE TD_INTERVAL(time, '-1d', 'JST')
  AND TD_SCHEDULED_TIME() IS NOT NULL  -- Establishes reference date for TD_INTERVAL
GROUP BY 1
ORDER BY 1

For rolling 30-day window:

SELECT
  TD_TIME_STRING(time, 'd!', 'JST') as date,
  COUNT(*) as total_events,
  APPROX_DISTINCT(user_id) as daily_active_users,
  AVG(session_duration) as avg_session_duration
FROM database_name.events
WHERE TD_TIME_RANGE(time, TD_TIME_ADD(TD_SCHEDULED_TIME(), '-30d'), TD_SCHEDULED_TIME())
GROUP BY 1
ORDER BY 1

Error Handling

Common Errors

"Line X:Y: Column 'time' cannot be resolved"

  • Ensure table name is correct
  • Check that column exists in table schema

"Query exceeded memory limit"

  • Add time filters with TD_TIME_RANGE
  • Use APPROX_ functions instead of exact aggregations
  • Reduce JOIN complexity or data volume

"Partition not found"

  • Verify time range covers existing partitions
  • Check TD_TIME_RANGE syntax

Best Practices

  1. Always include time filters using TD_INTERVAL or TD_TIME_RANGE for partition pruning
    • Use TD_INTERVAL for relative dates: WHERE TD_INTERVAL(time, '-1d', 'JST')
    • Use TD_TIME_RANGE for explicit dates: WHERE TD_TIME_RANGE(time, '2024-01-01', '2024-01-31')
    • Never filter by formatted dates: ❌ WHERE TD_TIME_STRING(time, 'd!', 'JST') = '2024-01-01'
  2. Use TD_TIME_STRING for display only, not for filtering
    • SELECT TD_TIME_STRING(time, 'd!', 'JST') as date
    • WHERE TD_TIME_STRING(time, 'd!', 'JST') = '2024-01-01'
  3. Use APPROX functions for large-scale aggregations (APPROX_DISTINCT, APPROX_PERCENTILE)
  4. Limit exploratory queries to reduce costs and scan time
  5. Test queries on small time ranges before running on full dataset
  6. Use CTEs (WITH clauses) for complex queries to improve readability
  7. Add comments explaining business logic
  8. Consider materialized results for frequently-run queries

Example Workflow

When helping users write Trino queries:

  1. Understand the requirement - What data do they need?
  2. Identify tables - Which TD tables contain the data?
  3. Add time filters - What time range is needed?
  4. Write base query - Start with simple SELECT
  5. Add aggregations - Use appropriate functions
  6. Optimize - Apply performance best practices
  7. Test - Validate results on small dataset first

Resources

  • Trino SQL documentation: https://trino.io/docs/current/
  • TD-specific functions: Check internal TD documentation
  • Query performance: Use EXPLAIN for query plans