| name | bigquery |
| description | Comprehensive guide for using BigQuery CLI (bq) to query and inspect tables in Monzo's BigQuery projects, with emphasis on data sensitivity and INFORMATION_SCHEMA queries. |
# BigQuery CLI Skill

This skill provides comprehensive guidance on using the BigQuery CLI (`bq`) to query and inspect data in Monzo's BigQuery projects.
## Core Principles

- Always specify the project explicitly with `--project_id=PROJECT_NAME`
- Always use Standard SQL with `--use_legacy_sql=false`
- Respect data sensitivity: avoid querying actual content from sensitive tables
- Use `INFORMATION_SCHEMA` for metadata queries (schemas, columns, tables)
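To avoid retyping these flags, the principles above can be baked into a small shell helper. This is a sketch: the `bqq` name and the `BQ_PROJECT` variable are conventions invented for this example, not part of `bq` itself.

```bash
# Illustrative wrapper: always sets the project and Standard SQL flags.
# bqq and BQ_PROJECT are hypothetical names, not existing tools.
bqq() {
  bq query \
    --project_id="${BQ_PROJECT:-monzo-analytics}" \
    --use_legacy_sql=false \
    "$@"
}

# Usage:
#   bqq "SELECT COUNT(*) FROM \`monzo-analytics.dims.some_table\`"
```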
## Common Query Patterns

### 1. Check Table Schema (INFORMATION_SCHEMA)

Use this to inspect column names, types, and structure without accessing sensitive data:

```bash
bq query --project_id=monzo-analytics --use_legacy_sql=false \
  "SELECT column_name, data_type, is_nullable
   FROM \`monzo-analytics.DATASET_NAME.INFORMATION_SCHEMA.COLUMNS\`
   WHERE table_name = 'TABLE_NAME'
   ORDER BY ordinal_position"
```
Examples:

```bash
# Check dims dataset table schema
bq query --project_id=monzo-analytics --use_legacy_sql=false \
  "SELECT column_name, data_type FROM \`monzo-analytics.dims.INFORMATION_SCHEMA.COLUMNS\`
   WHERE table_name = 'vulnerable_customer_logs_dim' ORDER BY ordinal_position"

# Check prod dataset table schema
bq query --project_id=monzo-analytics --use_legacy_sql=false \
  "SELECT column_name, data_type FROM \`monzo-analytics.prod.INFORMATION_SCHEMA.COLUMNS\`
   WHERE table_name = 'transactions' ORDER BY ordinal_position"
```
### 2. Count Rows (Safe for Sensitive Tables)

Use `COUNT(*)` to check table size without exposing data:

```bash
bq query --project_id=monzo-analytics --use_legacy_sql=false \
  "SELECT COUNT(*) AS row_count FROM \`monzo-analytics.DATASET.TABLE_NAME\`"
```

Example:

```bash
bq query --project_id=monzo-analytics --use_legacy_sql=false \
  "SELECT COUNT(*) AS row_count FROM \`monzo-analytics.dims.vulnerable_customer_logs_dim\`"
```
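This pattern can be wrapped in a helper that returns just the number. A sketch only: it assumes `--format=csv` output with a single header row, and `safe_count` is a name invented here, not an existing command.

```bash
# Print only the row count for project.dataset.table.
# safe_count is a hypothetical helper, not part of bq.
safe_count() {
  bq query --project_id="$1" --use_legacy_sql=false --format=csv --quiet \
    "SELECT COUNT(*) AS row_count FROM \`$1.$2.$3\`" | tail -n 1
}

# Usage:
#   safe_count monzo-analytics dims vulnerable_customer_logs_dim
```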
### 3. List All Tables in a Dataset

```bash
bq query --project_id=monzo-analytics --use_legacy_sql=false \
  "SELECT table_name, table_type
   FROM \`monzo-analytics.DATASET_NAME.INFORMATION_SCHEMA.TABLES\`
   ORDER BY table_name"
```

Example:

```bash
bq query --project_id=monzo-analytics --use_legacy_sql=false \
  "SELECT table_name FROM \`monzo-analytics.dims.INFORMATION_SCHEMA.TABLES\`
   ORDER BY table_name"
```
### 4. Export Schema to File

Useful for programmatic processing of table schemas (`tail -n +2` skips the CSV header row):

```bash
bq query --project_id=monzo-analytics --use_legacy_sql=false \
  --format=csv --quiet \
  "SELECT column_name FROM \`monzo-analytics.DATASET.INFORMATION_SCHEMA.COLUMNS\`
   WHERE table_name = 'TABLE_NAME' ORDER BY ordinal_position" \
  | tail -n +2 > /tmp/columns.txt
```

Example:

```bash
bq query --project_id=monzo-analytics --use_legacy_sql=false \
  --format=csv --quiet \
  "SELECT column_name FROM \`monzo-analytics.dims.INFORMATION_SCHEMA.COLUMNS\`
   WHERE table_name = 'vulnerable_customer_logs_dim' ORDER BY ordinal_position" \
  | tail -n +2 > /tmp/columns.txt
```
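Once the column names are in a file, one per line, they can be joined into a comma-separated list for building SELECT statements. A sketch; `to_select_list` is a name invented for this example.

```bash
# Join one-column-name-per-line output into a comma-separated SELECT list.
# to_select_list is a hypothetical helper name.
to_select_list() {
  paste -s -d, "$1"
}

# Usage, after the export above:
#   to_select_list /tmp/columns.txt
```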
### 5. Check Table Metadata

Get table creation time, size, and row count. Note that `INFORMATION_SCHEMA.TABLES` does not expose size or row counts; use the `__TABLES__` metadata view instead (its name column is `table_id`, and `creation_time` is in epoch milliseconds):

```bash
bq query --project_id=monzo-analytics --use_legacy_sql=false \
  "SELECT
     table_id,
     TIMESTAMP_MILLIS(creation_time) AS created_at,
     ROUND(size_bytes/1024/1024/1024, 2) AS size_gb,
     row_count
   FROM \`monzo-analytics.DATASET_NAME.__TABLES__\`
   WHERE table_id = 'TABLE_NAME'"
```
### 6. Find Tables by Pattern

Search for tables matching a naming pattern:

```bash
bq query --project_id=monzo-analytics --use_legacy_sql=false \
  "SELECT table_name
   FROM \`monzo-analytics.DATASET_NAME.INFORMATION_SCHEMA.TABLES\`
   WHERE table_name LIKE '%PATTERN%'
   ORDER BY table_name"
```

Example:

```bash
# Find all customer-related tables
bq query --project_id=monzo-analytics --use_legacy_sql=false \
  "SELECT table_name FROM \`monzo-analytics.dims.INFORMATION_SCHEMA.TABLES\`
   WHERE table_name LIKE '%customer%' ORDER BY table_name"
```
### 7. Get Detailed Column Information

Get richer column metadata, including nullability and partitioning flags (column descriptions live in `INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`, not in this view):

```bash
bq query --project_id=monzo-analytics --use_legacy_sql=false \
  "SELECT
     column_name,
     data_type,
     is_nullable,
     is_partitioning_column
   FROM \`monzo-analytics.DATASET.INFORMATION_SCHEMA.COLUMNS\`
   WHERE table_name = 'TABLE_NAME'
   ORDER BY ordinal_position"
```
### 8. Sample Data (Non-Sensitive Tables Only)

⚠️ WARNING: Only use this on non-sensitive tables. Never query actual content from people/staff/PII tables.

```bash
bq query --project_id=monzo-analytics --use_legacy_sql=false \
  "SELECT * FROM \`monzo-analytics.DATASET.TABLE_NAME\` LIMIT 10"
```
## Output Formatting Options

Control how results are displayed:

```bash
# CSV format
--format=csv

# Compact JSON format
--format=json

# Human-readable JSON
--format=prettyjson

# ASCII table format (the default)
--format=pretty

# Quiet mode (no status messages)
--quiet

# Maximum rows to return
--max_rows=100
```
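The JSON format pairs well with a small filter for ad-hoc processing. A sketch using `python3` (`jq` works too, if installed); `json_field` is a name invented for this example.

```bash
# Read bq --format=json output on stdin and print one field per row.
# json_field is a hypothetical helper name.
json_field() {
  python3 -c 'import json, sys
for row in json.load(sys.stdin):
    print(row[sys.argv[1]])' "$1"
}

# Usage:
#   bq query ... --format=json --quiet "SELECT table_name FROM ..." | json_field table_name
```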
## Common Projects and Datasets

### Main Analytics Projects

- `monzo-analytics` - Main analytics warehouse
- `monzo-analytics-v2` - New OOM architecture models
- `monzo-analytics-pii` - PII-containing data (use with caution)
- `sanitized-events-prod` - Sanitised event data
- `raw-analytics-events-prod` - Raw event data

### Common Datasets

- `dims` - Dimension tables
- `prod` - Production tables
- `lending` - Lending-specific tables
- `slurpee` - Slurpee data
## Data Sensitivity Guidelines

### ✅ SAFE Operations (Always Allowed)

- `INFORMATION_SCHEMA` queries - these return only metadata, not actual data
- `COUNT(*)` queries - these return only row counts
- Schema inspection - column names, types, table structure

### ⚠️ RESTRICTED Operations (Use with Caution)

Querying actual content from:

- People/staff data tables
- PII-containing tables
- Customer financial data
- Authentication/security tables

When in doubt:

- Stick to `INFORMATION_SCHEMA` queries
- Use `COUNT(*)` to verify a table exists
- Ask the user before querying actual content
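As a rough pre-flight check, a query string can be screened for metadata-only patterns before it is run. This is a heuristic sketch only, not a substitute for judgment about sensitivity, and `is_metadata_query` is a name invented here.

```bash
# Return success only for queries that look metadata-only.
# Heuristic: matches INFORMATION_SCHEMA or COUNT(*) queries.
is_metadata_query() {
  case "$1" in
    *INFORMATION_SCHEMA*) return 0 ;;
    *"COUNT(*)"*)         return 0 ;;
    *)                    return 1 ;;
  esac
}

# Usage:
#   is_metadata_query "$QUERY" || echo "check with the user before running this"
```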
### 🚫 NEVER Do This

- Query actual rows from `people`, `staff`, or `hibob` tables
- Export PII data to local files
- Query authentication credentials or tokens
- Access customer financial details without explicit permission
## Error Handling

### Common Errors and Solutions

**Error: "Not found: Table"**

```bash
# Solution: check the table exists first
bq query --project_id=monzo-analytics --use_legacy_sql=false \
  "SELECT table_name FROM \`monzo-analytics.DATASET.INFORMATION_SCHEMA.TABLES\`
   WHERE table_name LIKE '%SEARCH_TERM%'"
```

**Error: "Access Denied"**

You may not have permissions for that project or dataset. Try a different project or ask the user about access.

**Error: "Syntax error"**

Ensure you are using Standard SQL (`--use_legacy_sql=false`), and check backtick usage around `project.dataset.table` identifiers.
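These recoveries can be combined: run the query, and on failure print a hint to check the table name. A sketch; `query_or_hint` is a name invented here, and the fallback only prints guidance rather than retrying.

```bash
# Run a query; on failure, suggest the INFORMATION_SCHEMA existence check.
# query_or_hint is a hypothetical helper name.
query_or_hint() {
  # $1 = project, $2 = query
  if ! bq query --project_id="$1" --use_legacy_sql=false "$2"; then
    echo "Query failed; verify the table via $1.DATASET.INFORMATION_SCHEMA.TABLES" >&2
    return 1
  fi
}
```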
## Best Practices

- Always use fully-qualified table names with backticks: `project-id.dataset.table`
- Use `LIMIT` for exploratory queries to avoid large result sets:

  ```sql
  SELECT * FROM `project.dataset.table` LIMIT 10
  ```

- Check row counts before running expensive queries:

  ```bash
  # First check size
  bq query --project_id=monzo-analytics --use_legacy_sql=false \
    "SELECT COUNT(*) FROM \`project.dataset.table\`"
  # Then run the full query if the size is reasonable
  ```

- Use `--dry_run` for cost estimation on expensive queries:

  ```bash
  bq query --dry_run --use_legacy_sql=false "YOUR_QUERY_HERE"
  ```

- Export large results to a file:

  ```bash
  bq query --project_id=monzo-analytics --use_legacy_sql=false \
    --format=csv "YOUR_QUERY" > output.csv
  ```
## Quick Reference Commands

```bash
# Schema check
bq query --project_id=PROJECT --use_legacy_sql=false \
  "SELECT column_name, data_type FROM \`PROJECT.DATASET.INFORMATION_SCHEMA.COLUMNS\`
   WHERE table_name = 'TABLE' ORDER BY ordinal_position"

# Row count
bq query --project_id=PROJECT --use_legacy_sql=false \
  "SELECT COUNT(*) FROM \`PROJECT.DATASET.TABLE\`"

# List tables
bq query --project_id=PROJECT --use_legacy_sql=false \
  "SELECT table_name FROM \`PROJECT.DATASET.INFORMATION_SCHEMA.TABLES\`
   ORDER BY table_name"
```
```bash
# Table metadata (size and row counts come from __TABLES__, not INFORMATION_SCHEMA.TABLES)
bq query --project_id=PROJECT --use_legacy_sql=false \
  "SELECT table_id, row_count, size_bytes
   FROM \`PROJECT.DATASET.__TABLES__\`
   WHERE table_id = 'TABLE'"
```
## When to Use This Skill

Invoke this skill when you need to:

- Query BigQuery tables or datasets
- Inspect table schemas or column types
- Count rows or check table existence
- Export table metadata
- Verify data before running dbt models
- Investigate data issues or table structures
- Find tables by naming patterns
## Integration with dbt Workflow

When working on dbt models in the analytics repository:

- **Before creating import models** - use the BigQuery CLI to inspect source schemas
- **Before running dbt** - verify source tables exist and have the expected structure
- **Debugging dbt failures** - query actual tables to understand data issues
- **Validating generators** - check that column types match between source and generator
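For the schema checks above, one lightweight approach is to diff the actual column list against an expected list kept alongside the dbt model. A sketch only: the file layout and the `diff_columns` name are hypothetical.

```bash
# Compare expected vs actual column lists (one name per line in each file).
# diff_columns is a hypothetical helper; order differences are ignored by sorting.
diff_columns() {
  _a=$(mktemp) && _b=$(mktemp)
  sort "$1" > "$_a"
  sort "$2" > "$_b"
  diff "$_a" "$_b"
  _rc=$?
  rm -f "$_a" "$_b"
  return $_rc
}

# Usage (expected_columns.txt path is illustrative):
#   bq query ... --format=csv --quiet "SELECT column_name FROM ..." | tail -n +2 > /tmp/actual.txt
#   diff_columns models/expected_columns.txt /tmp/actual.txt
```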
Remember: Always respect data sensitivity guidelines and use INFORMATION_SCHEMA when possible.