| name | earthquake_data_tsunami-query |
| description | Query and analyze earthquake_data_tsunami.xlsx data using conversational AI. Automatically generated skill for Excel file with 783 rows and 13 columns across 1 sheet(s). |
earthquake_data_tsunami Query Skill
Auto-generated skill for querying earthquake_data_tsunami.xlsx
Dataset Overview
- Original File: earthquake_data_tsunami.xlsx
- File Size: 0.05 MB
- Sheets: 1
- Total Rows: 783
- Total Columns: 13
- Formulas: 0
- Data Format: Parquet (optimized for fast querying)
Available Sheets
earthquake_data_tsunami
- Rows: 783
- Columns: 13
- Key columns: magnitude, cdi, mmi, sig, nst
Query Capabilities
This skill enables natural language querying of the Excel data. You can:
Filtering and Selection
- Filter rows based on conditions
- Select specific columns
- Combine multiple conditions
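For example, a filter-and-select query over this dataset might look like the following sketch (the exact Parquet file name inside the data directory is an assumption; adjust it to the actual layout):

```python
import polars as pl

# Hypothetical file path; the real layout of the Parquet directory may differ
df = pl.scan_parquet("earthquake_data_tsunami_parquet/earthquake_data_tsunami.parquet")

# Combine conditions (strong, shallow earthquakes) and select only location columns
result = (
    df.filter((pl.col("magnitude") >= 6.5) & (pl.col("depth") < 70))
      .select(["magnitude", "depth", "latitude", "longitude"])
      .collect(streaming=True)
)
print(result.head(100))
```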
Aggregations
- Group by categories
- Calculate sums, averages, counts
- Find min/max values
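A sketch of a grouped aggregation on this dataset (same path assumption as above):

```python
import polars as pl

df = pl.scan_parquet("earthquake_data_tsunami_parquet/earthquake_data_tsunami.parquet")  # hypothetical path

# Group by year and compute a count, an average, and a maximum per group
yearly = (
    df.group_by("Year")
      .agg(
          pl.len().alias("quake_count"),                      # events per year
          pl.col("magnitude").mean().alias("avg_magnitude"),
          pl.col("magnitude").max().alias("max_magnitude"),
      )
      .sort("Year")
      .collect(streaming=True)
)
print(yearly)
```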
Analysis
- Compare across groups
- Identify trends and patterns
- Generate insights from the data
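For cross-group comparison, grouping on the tsunami flag gives a quick side-by-side summary (a sketch under the same path assumption):

```python
import polars as pl

df = pl.scan_parquet("earthquake_data_tsunami_parquet/earthquake_data_tsunami.parquet")  # hypothetical path

# Compare tsunami vs. non-tsunami events on a few summary statistics
comparison = (
    df.group_by("tsunami")
      .agg(
          pl.len().alias("events"),
          pl.col("magnitude").mean().alias("avg_magnitude"),
          pl.col("sig").mean().alias("avg_significance"),
      )
      .collect(streaming=True)
)
print(comparison)
```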
Example Queries
"Show me total sales by age group"
"What's the average revenue for customers over 25?"
"Filter rows where status is 'active' and created after 2024-01-01"
"Group by category and calculate sum of revenue"
"Find the top 10 products by sales volume"
"Compare performance across different regions"
Formula Information
This dataset contains no Excel formulas; every value in the Parquet data is a plain, pre-computed value.
Key Formulas
No formulas found. formula_map.json is included for consistency but has no entries; when formulas are present, it documents each one with its cell location and dependencies.
Technical Details
- Storage Format: Parquet (columnar, compressed)
- Query Engine: Polars with streaming support
- Memory Efficiency: Lazy loading, data loaded on-demand
- Performance: ~30x faster than direct Excel queries
- Data Location: `earthquake_data_tsunami_parquet/`
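The Parquet directory can also be scanned directly with Polars' lazy API; a minimal sketch (the glob pattern assumes one Parquet file per sheet inside the directory):

```python
import polars as pl

# Lazy scan: nothing is read from disk until .collect() is called
lf = pl.scan_parquet("earthquake_data_tsunami_parquet/*.parquet")

# Only the columns and rows the query actually needs are materialized
print(lf.select(pl.len()).collect())  # total row count
```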
Instructions for Claude
When a user requests to query this data:
Step 1: Load Required Resources
```python
import polars as pl
from pathlib import Path
import json

# Load data dictionary to understand schema
with open('data_dictionary.json', 'r') as f:
    schema = json.load(f)

# Load formula map if needed (empty for this dataset)
with open('formula_map.json', 'r') as f:
    formulas = json.load(f)
```
Step 2: Use Query Helper
```python
from query_helper import QueryHelper

# Initialize helper
helper = QueryHelper('earthquake_data_tsunami_parquet')

# Load a sheet as a LazyFrame (this file's only sheet is 'earthquake_data_tsunami')
df = helper.load_sheet('earthquake_data_tsunami', lazy=True)

# Execute query with streaming, using columns that exist in this dataset
result = df.filter(
    pl.col("magnitude") > 6.0
).group_by("tsunami").agg([
    pl.len().alias("count"),
    pl.mean("depth").alias("avg_depth")
]).collect(streaming=True)

# Display results (paginated)
print(result.head(100))
```
Step 3: Handle Large Results
- Always use `.head(100)` or `.limit(100)` for initial results
- Offer to show more if the user requests it
- Use streaming mode for queries: `.collect(streaming=True)`
- Paginate large outputs (see the sketch below)
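A minimal pagination pattern using LazyFrame.slice, so only one page is ever materialized (the glob path is an assumption):

```python
import polars as pl

lf = pl.scan_parquet("earthquake_data_tsunami_parquet/*.parquet")  # hypothetical glob

PAGE_SIZE = 100

def get_page(page: int) -> pl.DataFrame:
    """Return one page of results without loading the full table."""
    return lf.slice(page * PAGE_SIZE, PAGE_SIZE).collect(streaming=True)

print(get_page(0))  # rows 0-99
print(get_page(1))  # rows 100-199
```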
Step 4: Reference Documentation
Check `data_dictionary.json` for:
- Column names and data types
- Sample values
- Formula indicators

Check `formula_map.json` for:
- Excel formula definitions
- Cell locations
- Dependencies

Check `sample_data.json` for:
- Representative data examples
- Data patterns and formats
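A sketch of validating requested columns against data_dictionary.json before querying, assuming the `column_details` layout used in the workflow example later in this document:

```python
import json

with open('data_dictionary.json', 'r') as f:
    schema = json.load(f)

# Assumed layout: {sheet_name: {"column_details": [{"name": ...}, ...]}}
valid_columns = {
    col['name']
    for col in schema.get('earthquake_data_tsunami', {}).get('column_details', [])
}

requested = ['magnitude', 'depth', 'tsunamii']  # deliberate typo for illustration
unknown = [c for c in requested if c not in valid_columns]
if unknown:
    print(f"Unknown columns (check data_dictionary.json): {unknown}")
```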
Column Reference
| Sheet | Column | Type | Has Formulas |
|---|---|---|---|
| earthquake_data_tsunami | magnitude | numeric | No |
| earthquake_data_tsunami | cdi | numeric | No |
| earthquake_data_tsunami | mmi | numeric | No |
| earthquake_data_tsunami | sig | numeric | No |
| earthquake_data_tsunami | nst | numeric | No |
| earthquake_data_tsunami | dmin | numeric | No |
| earthquake_data_tsunami | gap | numeric | No |
| earthquake_data_tsunami | depth | numeric | No |
| earthquake_data_tsunami | latitude | numeric | No |
| earthquake_data_tsunami | longitude | numeric | No |
| earthquake_data_tsunami | Year | numeric | No |
| earthquake_data_tsunami | Month | numeric | No |
| earthquake_data_tsunami | tsunami | numeric | No |
For complete column information, see data_dictionary.json.
Data Dictionary Location
All detailed schema information is in data_dictionary.json:
- Column names, types, and sample values
- Formula locations and definitions
- Sheet relationships
- Data statistics
Best Practices
- Always use lazy loading with `pl.scan_parquet()` for large datasets
- Stream results with `.collect(streaming=True)` to avoid memory issues
- Limit initial results to 100 rows and offer pagination
- Check the data dictionary before constructing queries
- Handle nulls gracefully in user-facing outputs (see the sketch below)
- Validate column names against the schema before querying
- Use appropriate aggregations based on data types
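For the null-handling practice above, a sketch (the glob path is an assumption):

```python
import polars as pl

lf = pl.scan_parquet("earthquake_data_tsunami_parquet/*.parquet")  # hypothetical glob

# Drop rows where the key measurement is missing; fill optional fields instead
clean = (
    lf.drop_nulls(subset=["magnitude"])          # require a magnitude
      .with_columns(pl.col("nst").fill_null(0))  # treat missing station counts as 0
)
print(clean.collect(streaming=True).head(100))
```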
Example Complete Workflow
```python
import polars as pl
import json
from query_helper import QueryHelper

# 1. Initialize
helper = QueryHelper('earthquake_data_tsunami_parquet')

# 2. Load schema
with open('data_dictionary.json', 'r') as f:
    schema = json.load(f)

# 3. Check available columns
sheet_info = schema.get('earthquake_data_tsunami', {})
columns = [col['name'] for col in sheet_info.get('column_details', [])]
print(f"Available columns: {', '.join(columns[:10])}")

# 4. Execute query (lazy load, streamed collection)
df = helper.load_sheet('earthquake_data_tsunami', lazy=True)
result = df.filter(
    pl.col('magnitude') > 6.0
).select(['magnitude', 'depth', 'tsunami']).head(100).collect(streaming=True)

# 5. Display formatted results
total_rows = df.select(pl.len()).collect().item()
print(result)
print(f"\nShowing {len(result)} of {total_rows} total rows")
```
Troubleshooting
Q: Column not found error?
A: Check `data_dictionary.json` for exact column names (they are case-sensitive).
Q: Memory issues with large queries?
A: Use `.head()` to limit results and collect with `.collect(streaming=True)`.
Q: Formula not working?
A: Check `formula_map.json`; formulas are pre-computed in the Parquet data (this dataset has none).
Q: Sheet name not found?
A: List available sheets from the schema or use `helper.list_sheets()`.
Note: This is an auto-generated skill. The quality of query results depends on the data quality in the source Excel file.