| name | gcp-bq-data-export |
| description | Use when exporting BigQuery data to Cloud Storage, extracting tables to CSV, JSON, Avro, or Parquet formats, or using EXPORT DATA statements. Covers bq extract command, format options, compression, and wildcard exports. |
BigQuery Data Export
Use this skill when exporting data from BigQuery to Cloud Storage or local files.
Basic Extract Command
bq extract \
--location=LOCATION \
--destination_format=FORMAT \
--compression=COMPRESSION \
PROJECT:DATASET.TABLE \
gs://bucket/file.ext
Extract to CSV
Basic CSV Export
bq extract \
--destination_format=CSV \
--print_header=true \
dataset.table \
gs://bucket/export.csv
CSV with Options
bq extract \
--destination_format=CSV \
--compression=GZIP \
--field_delimiter=',' \
--print_header=true \
dataset.table \
gs://bucket/export.csv.gz
CSV flags:
- --field_delimiter=',' - Column separator
- --print_header=true/false - Include the header row
Tab-Delimited Export
bq extract \
--destination_format=CSV \
--field_delimiter=$'\t' \
dataset.table \
gs://bucket/export.tsv
Extract to JSON
Newline-Delimited JSON
bq extract \
--destination_format=NEWLINE_DELIMITED_JSON \
--compression=GZIP \
dataset.table \
gs://bucket/export.json.gz
Output format:
{"id": 1, "name": "Alice", "amount": 100.50}
{"id": 2, "name": "Bob", "amount": 250.75}
Extract to Avro
bq extract \
--destination_format=AVRO \
--compression=SNAPPY \
dataset.table \
gs://bucket/export.avro
Benefits:
- Preserves schema
- Efficient binary format
- Fast re-import to BigQuery
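Because Avro files carry their own schema, the export can be loaded straight back into BigQuery without a schema definition. A minimal sketch, assuming dataset.restored_table is a hypothetical destination table and the path matches the export above:
# Re-import the Avro export; the schema is read from the files themselves
bq load \
--source_format=AVRO \
dataset.restored_table \
'gs://bucket/export.avro'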
Extract to Parquet
bq extract \
--destination_format=PARQUET \
dataset.table \
gs://bucket/export.parquet
Benefits:
- Columnar format
- Good compression
- Compatible with many analytics tools
Compression Options
Available compression:
- GZIP - Good compression, slower (CSV, JSON, Parquet)
- SNAPPY - Fast, moderate compression (Avro, Parquet)
- DEFLATE - Similar to GZIP (Avro)
- NONE - No compression (fastest)
Example:
bq extract \
--destination_format=CSV \
--compression=GZIP \
dataset.table \
gs://bucket/export.csv.gz
Large Table Exports
Using Wildcards (>1 GB)
BigQuery limitation: 1 GB per file
Solution: Use wildcard in destination
bq extract \
--destination_format=CSV \
dataset.large_table \
'gs://bucket/export-*.csv'
Output:
gs://bucket/export-000000000000.csv
gs://bucket/export-000000000001.csv
gs://bucket/export-000000000002.csv
...
Shard Pattern
# Create sharded exports
bq extract \
--destination_format=AVRO \
dataset.large_table \
'gs://bucket/shard/data-*.avro'
Note: The number of output files depends on the size of the exported data; it is not directly configurable. You can count the shards after the export, as shown below.
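A quick way to verify how many shards an export produced is to list the destination pattern with gsutil (the bucket and prefix here match the example above):
# List the exported shards, then count them
gsutil ls 'gs://bucket/shard/data-*.avro'
gsutil ls 'gs://bucket/shard/data-*.avro' | wc -l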
Export Specific Partitions
Single Partition
# Export 2024-01-15 partition only
bq extract \
--destination_format=CSV \
dataset.partitioned_table\$20240115 \
gs://bucket/export_20240115.csv
Date Range (use WHERE in EXPORT DATA)
The bq extract command cannot filter rows. To export a date range, use an EXPORT DATA statement with a WHERE clause (see the EXPORT DATA section below); a minimal sketch follows.
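A sketch of a date-range export run from the CLI, assuming the partitioned table has a DATE column named date and that gs://bucket/range/ is a writable destination (both are placeholders):
# Run EXPORT DATA with a date-range filter through bq query
bq query --use_legacy_sql=false "
EXPORT DATA OPTIONS(
  uri='gs://bucket/range/export-*.csv',
  format='CSV',
  header=true
) AS
SELECT *
FROM \`project.dataset.partitioned_table\`
WHERE date BETWEEN '2024-01-01' AND '2024-01-31'"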
EXPORT DATA SQL Statement
Basic EXPORT DATA
EXPORT DATA OPTIONS(
uri='gs://bucket/export-*.csv',
format='CSV',
overwrite=true,
header=true,
field_delimiter=','
) AS
SELECT * FROM `project.dataset.table`
WHERE date >= '2024-01-01';
Export Query Results
EXPORT DATA OPTIONS(
uri='gs://bucket/aggregated-*.parquet',
format='PARQUET',
overwrite=true
) AS
SELECT
customer_id,
DATE(order_timestamp) as order_date,
SUM(amount) as total_amount,
COUNT(*) as order_count
FROM `project.dataset.orders`
WHERE DATE(order_timestamp) >= '2024-01-01'
GROUP BY customer_id, order_date;
Format Options
CSV:
EXPORT DATA OPTIONS(
uri='gs://bucket/*.csv',
format='CSV',
header=true,
field_delimiter=',',
compression='GZIP'
) AS SELECT ...;
JSON:
EXPORT DATA OPTIONS(
uri='gs://bucket/*.json',
format='JSON',
compression='GZIP'
) AS SELECT ...;
Avro:
EXPORT DATA OPTIONS(
uri='gs://bucket/*.avro',
format='AVRO',
compression='SNAPPY'
) AS SELECT ...;
Parquet:
EXPORT DATA OPTIONS(
uri='gs://bucket/*.parquet',
format='PARQUET'
) AS SELECT ...;
Export to Local Files (Not Recommended)
Small Results via Query
# For small datasets only
bq query \
--format=csv \
--max_rows=10000 \
--use_legacy_sql=false \
'SELECT * FROM `project.dataset.table` LIMIT 10000' \
> local_export.csv
Limitation: suitable only for small result sets. For anything larger, export to Cloud Storage first (a sketch follows) and copy the files down if you need them locally.
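An alternative for larger results: extract to GCS, then copy the files to a local directory with gsutil. The bucket, prefix, and local_exports/ directory below are placeholders:
# Extract to GCS first (sharded, compressed), then pull the files down
bq extract \
--destination_format=CSV \
--compression=GZIP \
dataset.table \
'gs://bucket/tmp/export-*.csv.gz'
mkdir -p local_exports
gsutil -m cp 'gs://bucket/tmp/export-*.csv.gz' local_exports/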
Scheduled Exports (Automation)
Using Cloud Scheduler + EXPORT DATA
# Create scheduled query
bq mk --transfer_config \
--target_dataset=dataset \
--display_name='Daily Export' \
--schedule='every 24 hours' \
--params='{"query":"EXPORT DATA OPTIONS(uri='\''gs://bucket/daily-*.csv'\'', format='\''CSV'\'') AS SELECT * FROM dataset.table WHERE date = CURRENT_DATE()"}' \
--data_source=scheduled_query
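To confirm the scheduled query was created, list the transfer configurations. The --transfer_location value below assumes the us multi-region; adjust it to your dataset's location:
# List Data Transfer Service configs, including scheduled queries
bq ls --transfer_config --transfer_location=us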
Using Cloud Composer (Airflow)
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
# Submits a BigQuery extract job that writes the table to GCS as sharded CSV
export_task = BigQueryInsertJobOperator(
    task_id='export_to_gcs',
    configuration={
        'extract': {
            'sourceTable': {
                'projectId': 'project',
                'datasetId': 'dataset',
                'tableId': 'table',
            },
            'destinationUris': ['gs://bucket/export-*.csv'],
            'destinationFormat': 'CSV',
        }
    },
)
Monitoring Exports
Check Extract Jobs
bq ls --jobs --max_results=10
Job Details
bq show -j JOB_ID
Failed Exports
SELECT
job_id,
user_email,
error_result.message as error_message,
creation_time
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
job_type = 'EXTRACT'
AND state = 'DONE'
AND error_result IS NOT NULL
AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
ORDER BY creation_time DESC;
Export Best Practices
Format Selection
CSV:
- ✅ Human-readable
- ✅ Universal compatibility
- ❌ Larger file size
- ❌ No schema preservation
JSON:
- ✅ Human-readable
- ✅ Preserves nested structures
- ❌ Larger file size
Avro:
- ✅ Preserves schema
- ✅ Efficient binary format
- ✅ Fast BigQuery re-import
- ❌ Not human-readable
Parquet:
- ✅ Columnar format
- ✅ Good compression
- ✅ Analytics tool compatible
- ❌ Not human-readable
Compression Recommendations
- For long-term storage: GZIP (best compression)
- For processing pipelines: SNAPPY (fast)
- For network transfer: GZIP (smaller size)
- For speed: NONE (no compression overhead)
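To sanity-check the trade-off on your own data, compare the total size of exports written with different compression settings. The two prefixes below are hypothetical:
# Summarize total bytes under each export prefix
gsutil du -s -h gs://bucket/export-gzip
gsutil du -s -h gs://bucket/export-none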
Wildcards for Large Exports
Always use wildcards for:
- Tables >500 MB
- Unknown data size
- Distributed processing
Example:
bq extract dataset.large_table 'gs://bucket/export-*.avro'
Cost Considerations
Export Costs
- bq extract jobs: free (they run in a shared slot pool)
- EXPORT DATA: billed as a query for the bytes scanned by the SELECT
- GCS storage: standard Cloud Storage pricing for the exported files
- Network egress: free when the bucket is in the same region as the dataset
Optimization
Reduce costs:
- Export only needed columns (use EXPORT DATA with SELECT)
- Filter rows before export (WHERE clause)
- Use compression (smaller files)
- Export to GCS in same region as BigQuery
Example - filtered export:
EXPORT DATA OPTIONS(
uri='gs://bucket/*.parquet',
format='PARQUET'
) AS
SELECT customer_id, order_date, amount -- Only needed columns
FROM `project.dataset.orders`
WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) -- Last 30 days only
AND amount > 0; -- Filter out zero amounts
Common Patterns
Daily Export
#!/bin/bash
DATE=$(date +%Y%m%d)
bq extract \
--destination_format=CSV \
--compression=GZIP \
dataset.table\$$DATE \
gs://bucket/exports/daily_export_$DATE.csv.gz
Incremental Export
-- Create temp table with new data
CREATE TEMP TABLE new_data AS
SELECT * FROM `project.dataset.table`
WHERE updated_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
-- Export only new data
EXPORT DATA OPTIONS(
uri='gs://bucket/incremental/data-*.parquet',
format='PARQUET'
) AS
SELECT * FROM new_data;
Export with Transformation
EXPORT DATA OPTIONS(
uri='gs://bucket/transformed-*.csv',
format='CSV'
) AS
SELECT
customer_id,
UPPER(customer_name) as customer_name,
ROUND(amount, 2) as amount,
FORMAT_DATE('%Y-%m-%d', order_date) as order_date
FROM `project.dataset.orders`
WHERE order_date >= '2024-01-01';
Troubleshooting
"Permission denied"
Problem: The principal running the export job has no write access to the GCS bucket
Solution: Grant that user or service account the Storage Object Creator role (roles/storage.objectCreator) on the bucket, as sketched below
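A sketch of the grant with gsutil, assuming SA_EMAIL is the service account (or user) that submits the extract job and gs://bucket is the destination bucket:
# Allow the exporting identity to create objects in the destination bucket
gsutil iam ch serviceAccount:SA_EMAIL:roles/storage.objectCreator gs://bucket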
"Table too large"
Problem: Export exceeds 1GB without wildcard
Solution: Use wildcard pattern gs://bucket/export-*.csv
"Invalid URI"
Problem: Incorrect GCS path format
Solution: Use gs://bucket/path/file format, not https://
"Quota exceeded"
Problem: Too many extract jobs submitted in a short period
Solution: Batch exports together or request a quota increase
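To see how close you are to the limit, count recent extract jobs from the jobs metadata. The region-us qualifier mirrors the monitoring query above; adjust it for your region:
# Count extract jobs from the last 24 hours
bq query --use_legacy_sql=false "
SELECT COUNT(*) AS extract_jobs
FROM \`region-us\`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'EXTRACT'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)"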
Quick Reference
Format recommendations:
- Re-import to BigQuery → Avro
- Analytics tools → Parquet
- Data exchange → CSV
- API consumption → JSON
Compression guide:
- Best ratio → GZIP
- Fastest → SNAPPY or NONE
- Balance → SNAPPY
Size limits:
- 1 GB per file (use wildcards)
- 10 TB per extract job
- 50,000 URIs per export
Syntax patterns:
# Single file
gs://bucket/file.csv
# Wildcard (recommended)
'gs://bucket/prefix-*.csv'
# Sharded with path
'gs://bucket/path/to/shard-*.parquet'