| name | ga4-bigquery |
| description | Complete guide to GA4 BigQuery export including setup, schema documentation, SQL query patterns, and data analysis. Use when exporting GA4 data to BigQuery, writing SQL queries for GA4 data, analyzing event-level data, working with nested/repeated fields (UNNEST), or building custom reports from raw data. Covers BigQuery linking, events_* tables, SQL patterns, and performance optimization. |
GA4 BigQuery Export and Analysis
Overview
GA4 BigQuery export provides raw, event-level data access for advanced analysis, custom reporting, machine learning, and long-term data warehousing.
When to Use This Skill
Invoke this skill when:
- Exporting GA4 raw data to BigQuery
- Writing SQL queries for GA4 event data
- Analyzing unsampled event-level data
- Working with nested/repeated fields using UNNEST
- Building custom reports beyond GA4 UI limits
- Creating attribution models with raw data
- Performing user journey analysis across all events
- Integrating GA4 data with other data sources
- Building machine learning models on GA4 data
- Analyzing historical data beyond GA4 retention limits
- Optimizing BigQuery query performance
- Working with events_* table schema
- Extracting event parameters from nested structures
Core Capabilities
BigQuery Export Setup
Requirements:
- GA4 property (standard or 360)
- Google Cloud project
- BigQuery API enabled
- Appropriate permissions
Setup Steps:
Create Google Cloud Project:
- Go to console.cloud.google.com
- Create new project or select existing
- Enable BigQuery API
Link GA4 to BigQuery:
- GA4 Admin → Product Links → BigQuery Links
- Click "Link"
- Choose Google Cloud project
- Select dataset location (US, EU, etc.)
- Configure export:
- Daily: Complete export once per day (~9AM property timezone)
- Streaming: Real-time export (360 only)
- Click "Next"
- Confirm setup
Export Options:
- Daily Export: Free for standard GA4, once per day
- Streaming Export: GA4 360 only, near real-time
- Include Advertising IDs: Optional, for Ads integration
Data Availability:
- Daily tables: ~24 hours after day ends
- Intraday tables: ~3 updates per day
- Streaming: Minutes after event collection (360)
BigQuery Table Structure
Table Naming:
project.dataset.events_YYYYMMDD- Daily exportproject.dataset.events_intraday_YYYYMMDD- Intraday (partial day)project.dataset.events_*- Wildcard for all dates
Key Schema Fields:
Event Fields:
event_date: YYYYMMDD format (STRING)event_timestamp: Microseconds since epoch (INTEGER)event_name: Event name (STRING)event_params: Event parameters (RECORD, REPEATED)event_value_in_usd: Event value in USD (FLOAT)
User Fields:
user_id: User ID if set (STRING)user_pseudo_id: Anonymous user ID (STRING)user_properties: User properties (RECORD, REPEATED)user_first_touch_timestamp: First visit timestamp (INTEGER)
Device Fields:
device.category: desktop, mobile, tabletdevice.operating_system: Windows, iOS, Androiddevice.browser: Chrome, Safari, etc.
Geo Fields:
geo.country: Country namegeo.region: State/regiongeo.city: City name
Traffic Source Fields:
traffic_source.source: Source (google, direct)traffic_source.medium: Medium (organic, cpc)traffic_source.name: Campaign name
E-commerce Fields:
ecommerce.transaction_id: Transaction ID (STRING)ecommerce.purchase_revenue_in_usd: Purchase revenue (FLOAT)items: Items array (RECORD, REPEATED)
Basic SQL Query Patterns
Query 1: Event Count by Name
SELECT
event_name,
COUNT(*) as event_count
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
GROUP BY
event_name
ORDER BY
event_count DESC
Query 2: Extract Event Parameters
SELECT
event_date,
event_name,
user_pseudo_id,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') as page_location,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') as page_title
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
AND event_name = 'page_view'
LIMIT 1000
Query 3: Purchase Analysis
SELECT
event_date,
COUNT(DISTINCT user_pseudo_id) as purchasers,
COUNT(DISTINCT ecommerce.transaction_id) as transactions,
SUM(ecommerce.purchase_revenue_in_usd) as total_revenue,
AVG(ecommerce.purchase_revenue_in_usd) as avg_order_value
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
AND event_name = 'purchase'
AND ecommerce.transaction_id IS NOT NULL
GROUP BY
event_date
ORDER BY
event_date
Query 4: UNNEST Items Array
SELECT
event_date,
item.item_name,
item.item_category,
SUM(item.quantity) as total_quantity,
SUM(item.item_revenue_in_usd) as total_revenue
FROM
`project.dataset.events_*`,
UNNEST(items) as item
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
AND event_name = 'purchase'
GROUP BY
event_date,
item.item_name,
item.item_category
ORDER BY
total_revenue DESC
Advanced Query Patterns
User Journey Analysis
WITH user_events AS (
SELECT
user_pseudo_id,
event_timestamp,
event_name,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') as page_location
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX = '20250115'
)
SELECT
user_pseudo_id,
ARRAY_AGG(
STRUCT(event_name, page_location, event_timestamp)
ORDER BY event_timestamp
) as event_sequence
FROM
user_events
GROUP BY
user_pseudo_id
LIMIT 100
Session Attribution
SELECT
event_date,
traffic_source.source,
traffic_source.medium,
traffic_source.name as campaign,
COUNT(DISTINCT user_pseudo_id) as users,
COUNT(DISTINCT CONCAT(user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id'))) as sessions
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
GROUP BY
event_date,
traffic_source.source,
traffic_source.medium,
traffic_source.name
ORDER BY
sessions DESC
Helper Functions
-- Create reusable functions for parameter extraction
CREATE TEMP FUNCTION GetParamString(params ANY TYPE, target_key STRING)
RETURNS STRING
AS (
(SELECT value.string_value FROM UNNEST(params) WHERE key = target_key)
);
CREATE TEMP FUNCTION GetParamInt(params ANY TYPE, target_key STRING)
RETURNS INT64
AS (
(SELECT value.int_value FROM UNNEST(params) WHERE key = target_key)
);
-- Use in query
SELECT
event_date,
GetParamString(event_params, 'page_location') as page_location,
GetParamInt(event_params, 'engagement_time_msec') as engagement_time
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
Query Optimization
Best Practices:
- Use _TABLE_SUFFIX Filtering:
WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
NOT:
WHERE event_date BETWEEN '20250101' AND '20250131'
- Filter on Clustered Columns:
GA4 tables clustered by
event_nameandevent_timestamp:
WHERE event_name IN ('page_view', 'purchase')
- Select Specific Columns:
SELECT event_name, user_pseudo_id, event_timestamp
NOT:
SELECT *
- Limit UNNEST Operations:
-- Good: inline UNNEST
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location')
-- Avoid: full UNNEST in FROM
FROM table, UNNEST(event_params) as param
WHERE param.key = 'page_location'
- Use LIMIT During Development:
LIMIT 1000 -- Test query first
Cost Management
BigQuery Pricing:
- Storage: ~$0.02/GB/month
- Queries: ~$5/TB scanned
- Streaming inserts: ~$0.05/GB (360 only)
Reducing Costs:
- Partition by date using _TABLE_SUFFIX
- Select only needed columns
- Use LIMIT for testing
- Create materialized views for frequent queries
- Set up cost alerts in Google Cloud
Free Tier:
- 10 GB storage free/month
- 1 TB queries free/month
Data Retention
GA4 Export Retention:
- Standard GA4: 2 months or 14 months (Admin setting)
- BigQuery: Unlimited (until manually deleted)
- Set table expiration if needed (optional)
Setting Expiration:
ALTER TABLE `project.dataset.events_20250101`
SET OPTIONS (
expiration_timestamp=TIMESTAMP "2026-01-01 00:00:00 UTC"
)
Common Use Cases
1. Unsampled Reporting:
- GA4 UI may sample large datasets
- BigQuery = full, unsampled data
- Use for accurate reporting
2. Custom Attribution:
- Access full user journey
- Build custom attribution models
- Credit touchpoints as needed
3. Data Integration:
- Join GA4 with CRM data
- Combine with product catalog
- Enrich with external sources
4. Machine Learning:
- Export to ML tools
- Predict churn, LTV, conversions
- Train custom models
5. Long-term Analysis:
- Historical analysis beyond GA4 limits
- Year-over-year comparisons
- Trend analysis
Integration with Other Skills
- ga4-setup - Initial property setup before BigQuery export
- ga4-recommended-events - Event structure in BigQuery tables
- ga4-custom-events - Custom event parameters in BigQuery
- ga4-custom-dimensions - Custom dimensions in event_params
- ga4-reporting - Comparing BigQuery vs GA4 UI reports
- ga4-measurement-protocol - Server-side events in BigQuery
References
- references/bigquery-setup-complete.md - Step-by-step BigQuery linking
- references/schema-reference.md - Complete table schema documentation
- references/sql-patterns.md - Common SQL query patterns and examples
- references/optimization-guide.md - Performance and cost optimization
Quick Reference
Table Names:
- Daily:
events_YYYYMMDD - Intraday:
events_intraday_YYYYMMDD - Wildcard:
events_*
Filter by Date:
WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
Extract Parameter:
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'param_name')
UNNEST Items:
FROM table, UNNEST(items) as item
Costs:
- Storage: $0.02/GB/month
- Queries: $5/TB scanned