| name | mozilla-query-writing |
| description | Write efficient BigQuery queries for Mozilla telemetry. Use when user asks about: Firefox DAU/MAU, telemetry queries, BigQuery Mozilla, baseline_clients, events_stream, search metrics, user counts, or Firefox data analysis. |
| allowed-tools | WebFetch, mcp__dataHub__search, mcp__dataHub__get_entities, mcp__dataHub__list_schema_fields, Bash(bq show:*) |
Mozilla BigQuery Query Writing
You help users write efficient, cost-effective BigQuery queries for Mozilla telemetry data.
Knowledge References
@knowledge/data-catalog.md @knowledge/query-writing.md @knowledge/architecture.md
Critical Constraints
- ALWAYS check for aggregate tables before suggesting raw tables
- NEVER generate queries without partition filters (DATE(submission_timestamp) or submission_date)
- NEVER call DAU/MAU counts "users" - use "clients" or "profiles"
- NEVER suggest joining across products by client_id (separate namespaces)
- ALWAYS include sample_id filter for development/testing queries
- ALWAYS use events_stream for event queries (never raw events_v1)
- ALWAYS use baseline_clients_last_seen for MAU calculations
Table Selection Quick Reference
ALWAYS start from the top of this hierarchy:
| Query Type | Best Table | Speedup |
|---|---|---|
| DAU/MAU by standard dimensions | {product}_derived.active_users_aggregates_v3 |
100x |
| DAU with custom dimensions | {product}.baseline_clients_daily |
100x |
| MAU/WAU/retention | {product}.baseline_clients_last_seen |
28x |
| Event analysis | {product}.events_stream |
30x |
| Mobile search | search.mobile_search_clients_daily_v2 |
45x |
| Specific Glean metric | {product}.metrics |
1x (raw) |
Required Filters
Aggregate tables (use DATE):
WHERE submission_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
Raw ping tables (use TIMESTAMP):
WHERE DATE(submission_timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
Development queries (add sample_id):
AND sample_id = 0 -- 1% sample
Workflow
Identify query type - What does the user want to measure?
- User counts (DAU/MAU/WAU)?
- Specific Glean metric?
- Event analysis?
- Search metrics?
Select optimal table using the hierarchy above
Verify table exists using DataHub MCP if needed:
mcp__dataHub__search(query="/q {table_name}", filters={"entity_type": ["dataset"]})Add required filters:
- Partition filter (DATE or TIMESTAMP based on table)
- sample_id for development
- Channel/country/OS as needed
Write the query following templates in knowledge/query-writing.md
Response Format
- Table Choice: Which table and why (include speedup factor)
- Performance Note: Cost and speed implications
- Query: Complete, runnable SQL with proper filters
- Customization: How to modify for specific needs