| name | carbon.data.qa |
| description | Answer analytical questions about carbon accounting data using internal datasets, APIs, and emission factor calculations. |
# carbon.data.qa
## Purpose
This skill enables Claude to answer factual, analytical questions about carbon accounting data by querying Carbon ACX's internal datasets (CSV files in the `data/` directory), derived artifacts, and the local API when it is running. It encodes domain knowledge about:
- Carbon accounting terminology and units (tCO2e, kWh, pkm, etc.)
- Emission factor structures and relationships
- Activity-to-emissions calculations
- Temporal data queries (Q1 2024, monthly totals, etc.)
- Layer, sector, and profile hierarchies
## When to Use
**Trigger Patterns:**
- User asks about emissions data: "What were total CO2 emissions for Q1 2024?"
- Queries about specific activities: "What's the emission factor for streaming video?"
- Comparative questions: "Compare emissions from cloud storage vs local storage"
- Data exploration: "Show me all activities in the professional services layer"
- Unit conversions: "Convert 500 kWh to tCO2e"
- Source/provenance queries: "Where does the video streaming data come from?"
**Do NOT Use When:**
- User wants to generate reports (use `carbon.report.gen` instead)
- User wants to write code (use `acx.code.assistant` instead)
- Questions about repo structure or development setup
- Non-carbon-accounting questions
## Allowed Tools
- `read_file` - Read CSV data files, JSON artifacts, schemas
- `python` - Process data, perform calculations, query APIs
- `grep` - Search for specific activities or emission factors
- `bash` - Run simple data queries via command line (read-only)
**Access Level:** 1 (Local Execution - read-only, no file writes, no external network)
**Tool Rationale:**
- `read_file`: Required to access canonical CSV data in the `data/` directory
- `python`: Needed for parsing CSVs and JSON artifacts, and for performing unit conversions and emission calculations
- `grep`: Efficient searching through data files for specific patterns
- `bash`: Helpful for quick file inspection and data exploration
**Explicitly Denied:**
- `write_file`, `edit_file` - This is a read-only analytical skill
- `web_fetch` with external URLs - Only internal localhost API endpoints are allowed
## Expected I/O
**Input:**
- Type: Natural language question (string)
- Format: Free-form query about carbon data
- Constraints: Must relate to carbon accounting, emissions, or activities in the dataset
- Examples:
- "What is the emission factor for coffee?"
- "Total emissions from video streaming in 2024"
- "List all military operations activities"
- "What units are used for grid intensity?"
**Output:**
- Type: Structured answer with data, units, and citations
- Format: Markdown with tables, bullet lists, and inline values
- Requirements:
  - MUST include units (tCO2e, kWh, etc.) with all numeric answers
  - MUST cite data sources - reference `source_id` from `data/sources.csv`
  - MUST include a timestamp - data vintage or "as of" date
  - Handle ambiguity by asking clarifying questions
- Example:

  **Emission Factor for HD Video Streaming:**
  - Activity: `MEDIA.STREAM.HD.HOUR` (HD video streaming per hour)
  - Emission Factor: 0.055 kgCO2e/hour
  - Unit: kgCO2e per hour of streaming
  - Source: [SOURCE_ID_123] - "Streaming Energy Report 2023"
  - Vintage: 2023
  - Notes: Includes device playback + network delivery
**Validation:**
- Every numeric value has explicit units
- Sources are referenced by `source_id`
- "Unknown" or "Data not available" for missing data (never guess)
- Calculations show methodology
## Dependencies
**Required:**
- Access to the Carbon ACX data directory (`data/`)
- Python 3.11+ with pandas and PyYAML
- Understanding of the data schema (see `reference/data_schema.md`)
- Carbon accounting units glossary (see `reference/units_glossary.md`)
**Data Files:**
- `data/activities.csv` - Activity catalog
- `data/emission_factors.csv` - Emission factors
- `data/layers.csv` - Layer definitions
- `data/sectors.csv` - Sector taxonomy
- `data/units.csv` - Unit definitions and conversions
- `data/sources.csv` - Data provenance
- `data/profiles.csv` - Activity profiles
- `calc/outputs/` - Derived artifacts (if available)
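A factual lookup typically joins three of these files on their ID columns. A minimal pandas sketch, using hypothetical inline rows in place of the real CSVs (the column names here are assumptions for illustration; the actual schema is documented in `reference/data_schema.md`):

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the real files; in practice
# these would be pd.read_csv("data/activities.csv"), etc.
activities = pd.read_csv(io.StringIO(
    "activity_id,name\n"
    "FOOD.COFFEE.CUP.HOT,12 oz hot coffee cup\n"
))
factors = pd.read_csv(io.StringIO(
    "activity_id,value,unit,source_id\n"
    "FOOD.COFFEE.CUP.HOT,0.021,kgCO2e/cup,SRC_FOOD_2023\n"
))
sources = pd.read_csv(io.StringIO(
    "source_id,title,vintage\n"
    "SRC_FOOD_2023,Coffee supply chain LCA,2023\n"
))

# Join each activity to its emission factor and provenance record.
catalog = (
    activities
    .merge(factors, on="activity_id", how="left")
    .merge(sources, on="source_id", how="left")
)
print(catalog[["activity_id", "value", "unit", "source_id", "vintage"]])
```

The left joins keep activities whose factors are still marked "to be added", which the skill must then report as missing rather than guess.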
**Optional:**
- Local API at `http://localhost:8787/api` (when the Worker is running)
- Derived JSON manifests in `dist/artifacts/`
## Examples
### Example 1: Basic Emission Factor Query
User: "What's the emission factor for a cup of coffee?"
**Claude Process:**
- Recognize this is a carbon data query → invoke the `carbon.data.qa` skill
- Search `data/activities.csv` for coffee-related activities
- Find the matching emission factor in `data/emission_factors.csv`
- Look up source details in `data/sources.csv`
- Format the answer with units and citation
**Output:**
**Coffee Emission Factor:**
- Activity: `FOOD.COFFEE.CUP.HOT` (12 oz hot coffee cup)
- Emission Factor: 0.021 kgCO2e/cup
- Source: [SRC_FOOD_2023] Coffee supply chain LCA
- Vintage: 2023
- Scope: Includes bean production, roasting, brewing, waste
### Example 2: Aggregation Query
User: "Show me total emissions from all social media activities"
**Claude Process:**
- Query `data/activities.csv` for activities with `category=social`
- Load the corresponding emission factors
- Aggregate by activity type
- Present as table with sources
**Output:**
**Social Media Activity Emissions (per hour):**
| Activity | Emission Factor | Source |
|----------|----------------|--------|
| YouTube streaming | 0.046 kgCO2e/hr | SRC_MEDIA_2024 |
| Facebook browsing | 0.038 kgCO2e/hr | SRC_SOCIAL_2024 |
| Instagram scrolling | 0.041 kgCO2e/hr | SRC_SOCIAL_2024 |
| TikTok watching | 0.043 kgCO2e/hr | SRC_SOCIAL_2024 |
*All values include device use + network transfer*
*Data vintage: 2024*
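A minimal sketch of that aggregation, again with hypothetical columns (`category`, `value`) standing in for the real schema in `reference/data_schema.md`:

```python
import io

import pandas as pd

# Hypothetical stand-in for activities already joined with their
# emission factors -- real column names may differ.
df = pd.read_csv(io.StringIO(
    "activity,category,value,unit,source_id\n"
    "YouTube streaming,social,0.046,kgCO2e/hr,SRC_MEDIA_2024\n"
    "Facebook browsing,social,0.038,kgCO2e/hr,SRC_SOCIAL_2024\n"
    "Instagram scrolling,social,0.041,kgCO2e/hr,SRC_SOCIAL_2024\n"
    "TikTok watching,social,0.043,kgCO2e/hr,SRC_SOCIAL_2024\n"
))

# Filter to the social category, then render a Markdown table with sources.
social = df[df["category"] == "social"]
print("| Activity | Emission Factor | Source |")
print("|----------|----------------|--------|")
for _, row in social.iterrows():
    print(f"| {row.activity} | {row.value} {row.unit} | {row.source_id} |")
```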
### Example 3: Ambiguous Query Handling
User: "What are the emissions for streaming?"
**Claude Process:**
- Recognize the ambiguity - multiple streaming types exist
- Ask a clarifying question before searching
**Output:**
I found multiple streaming activities in the dataset:
1. **Video streaming:**
- HD video (1080p)
- UHD/4K video
- SD video (mobile)
2. **Audio streaming:**
- Standard quality music
3. **Platform-specific:**
- YouTube
- Social media video (TikTok, Instagram, etc.)
Which type of streaming would you like emission data for?
### Example 4: Unit Conversion Query
User: "How much CO2 does 500 kWh of Ontario grid electricity produce?"
**Claude Process:**
- Look up the Ontario grid emission factor in `data/emission_factors.csv`
- Find the activity `ENERGY.CA-ON.GRID.KWH`
- Perform the calculation: 500 kWh × emission factor
- Show calculation with units
**Output:**
**Ontario Grid Electricity Emissions:**
- Activity: `ENERGY.CA-ON.GRID.KWH`
- Consumption: 500 kWh
- Emission Factor: 0.025 kgCO2e/kWh (Ontario IESO grid average)
- **Total Emissions: 12.5 kgCO2e** (0.0125 tCO2e)
*Calculation: 500 kWh × 0.025 kgCO2e/kWh = 12.5 kgCO2e*
*Source: [SRC_GRID_ON_2024]*
*Vintage: 2024 annual average*
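The arithmetic behind this answer is a single multiply plus a kg→t rescale; a sketch using the example values above (0.025 kgCO2e/kWh is the factor quoted in this example, not a guaranteed dataset value):

```python
# Ontario grid example (activity ENERGY.CA-ON.GRID.KWH).
consumption_kwh = 500.0
factor_kg_per_kwh = 0.025  # kgCO2e/kWh, the example value above

# Carry units through the calculation: kWh x kgCO2e/kWh = kgCO2e.
emissions_kg = consumption_kwh * factor_kg_per_kwh
emissions_t = emissions_kg / 1000.0  # 1 tCO2e = 1000 kgCO2e

print(f"Total Emissions: {emissions_kg} kgCO2e ({emissions_t} tCO2e)")
```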
### Example 5: Edge Case - Missing Data
User: "What's the emission factor for blockchain mining?"
**Claude Process:**
- Search `data/activities.csv` for blockchain/crypto activities
- No matches found
- Return a graceful "not found" response
**Output:**
**Data Not Available**
I could not find emission factors for blockchain or cryptocurrency mining in the current Carbon ACX dataset (v1.2).
The dataset currently focuses on:
- Professional services and food
- Digital infrastructure (streaming, cloud, AI)
- Industrial operations
- Transportation and logistics
- Military and defense operations
- Earth system feedbacks
You might be interested in related activities:
- Cloud server operations (`ONLINE.DC.CLOUD.SERVER.HOUR`)
- Data center rack usage (`ONLINE.DC.COLOCATION.RACK.MONTH`)
Would you like information on any of these instead?
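The not-found branch reduces to an empty-result check before answering; a sketch with a hypothetical `name` column and two illustrative rows:

```python
import io

import pandas as pd

# Hypothetical slice of data/activities.csv -- illustrative rows only.
activities = pd.read_csv(io.StringIO(
    "activity_id,name\n"
    "ONLINE.DC.CLOUD.SERVER.HOUR,Cloud server operations\n"
    "ONLINE.DC.COLOCATION.RACK.MONTH,Data center rack usage\n"
))

query = "blockchain"
matches = activities[activities["name"].str.contains(query, case=False)]

if matches.empty:
    # Never guess: state the gap explicitly and offer related activities.
    print(f"Data Not Available: no activities match '{query}'.")
    print("Related activities:", ", ".join(activities["activity_id"]))
```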
## Limitations
**Known Edge Cases:**
- Cannot answer questions requiring data not in the CSV files
- Temporal queries limited to vintage years present in dataset
- Cannot perform predictive modeling or forecasting
- Regional data limited to what's explicitly coded (e.g., Ontario grid)
- Some activities have emission factors marked as "to be added"
**Performance Constraints:**
- Large aggregations across all activities may take 5-10 seconds
- Complex cross-layer queries require multiple file reads
- Derived artifacts may not always be up-to-date with source CSVs
**Security Boundaries:**
- Read-only access to data files
- No external API calls (except localhost Worker API)
- Cannot modify source data
- Cannot access files outside the `data/` or `calc/outputs/` directories
**Scope Limitations:**
- Answers based solely on Carbon ACX dataset - no external knowledge
- Does not perform lifecycle assessments beyond what's in emission factors
- Does not provide regulatory compliance advice
- Does not make emission reduction recommendations (analytical only)
## Validation Criteria
**Success Metrics:**
- ✅ All numeric answers include explicit units (kgCO2e, tCO2e, etc.)
- ✅ Every emission factor cites a `source_id` or notes when the source is missing
- ✅ Data vintage/timestamp included in responses
- ✅ Ambiguous queries prompt for clarification before answering
- ✅ Missing data returns graceful "not found" rather than guessing
- ✅ Calculations show methodology (formula with units)
- ✅ Responses match data files exactly (no hallucination)
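The units rule can be checked mechanically before an answer is emitted. A rough sketch of such a check (`missing_units` is a hypothetical helper, and the unit vocabulary below is illustrative; the full list lives in `data/units.csv`):

```python
import re

# Illustrative unit vocabulary -- not the real, exhaustive list.
UNIT = r"(?:kgCO2e|tCO2e|kWh|pkm)(?:/[A-Za-z]+)?"
# A compliant numeric token is immediately followed by a recognized unit.
NUMBER_WITH_UNIT = re.compile(rf"\b\d+(?:\.\d+)?\s*{UNIT}")
BARE_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")

def missing_units(answer: str) -> bool:
    """True if the answer contains a number not covered by a unit match."""
    covered = [m.span() for m in NUMBER_WITH_UNIT.finditer(answer)]
    for m in BARE_NUMBER.finditer(answer):
        if not any(start <= m.start() < end for start, end in covered):
            return True
    return False

print(missing_units("Total: 12.5 kgCO2e"))  # False -- unit present
print(missing_units("Total: 12.5"))         # True  -- bare number, reject
```

A real implementation would also exempt non-quantity numbers such as vintage years, which this sketch would flag.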
**Failure Modes:**
- ❌ Returns emission values without units → REJECT
- ❌ Makes up data not in CSV files → REJECT
- ❌ Provides answers without source attribution → WARN
- ❌ Performs calculations with wrong units → REJECT
- ❌ Answers ambiguous questions without clarification → WARN
**Recovery:**
- If uncertain about data interpretation: Ask user for clarification
- If data missing: Explicitly state "Data not available" and suggest alternatives
- If calculation complex: Show step-by-step methodology
- If source missing: Note "Source not specified in dataset"
## Related Skills
**Dependencies:**
- None - this is a foundational skill
**Composes With:**
- `carbon.report.gen` - Use this skill to gather data, then generate reports
- `acx.code.assistant` - This skill informs what data structures exist for code generation
**Alternative Skills:**
- For report generation: `carbon.report.gen`
- For code generation: `acx.code.assistant`
- For schema validation: `schema.linter`
## Maintenance
**Owner:** ACX Team
**Review Cycle:** Monthly (align with dataset releases)
**Last Updated:** 2025-10-18
**Version:** 1.0.0
**Maintenance Notes:**
- Update when new CSV files are added to `data/`
- Review when the emission factor schema changes
- Validate examples against the current dataset version
- Keep `reference/data_schema.md` synchronized with the actual schema