---
name: cur-data
description: Knowledge about AWS Cost and Usage Report data structure, column formats, and analysis patterns
---
# AWS CUR Data Skill

## CUR File Formats

The project supports three CUR file formats:
- CSV: Plain text, largest file size
- CSV.GZ: Gzip-compressed CSV, smaller than plain CSV
- Parquet: Columnar format, fastest and smallest (recommended)
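
All three formats load directly with pandas. A minimal sketch (file names are placeholders; reading Parquet requires `pyarrow` or `fastparquet`):

```python
import pandas as pd

# Placeholder file paths for an actual CUR export.
df_csv = pd.read_csv("cur_report.csv")
df_gz = pd.read_csv("cur_report.csv.gz")            # compression inferred from extension
df_parquet = pd.read_parquet("cur_report.parquet")  # needs pyarrow or fastparquet
```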
## Column Name Variants

AWS CUR has two naming conventions. The data processor handles both:

| Canonical Name | Old Format | New Format |
|---|---|---|
| cost | lineItem/UnblendedCost | line_item_unblended_cost |
| account_id | lineItem/UsageAccountId | line_item_usage_account_id |
| service | product/ProductName | product_product_name |
| date | lineItem/UsageStartDate | line_item_usage_start_date |
| region | product/Region | product_region |
| line_item_type | lineItem/LineItemType | line_item_line_item_type |
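
One way both conventions can be mapped to the canonical names (an illustrative sketch; `COLUMN_MAP` and `normalize_columns` are hypothetical names, not the processor's actual API):

```python
# Both CUR naming conventions mapped to the canonical names used in this skill.
COLUMN_MAP = {
    # old format
    "lineItem/UnblendedCost": "cost",
    "lineItem/UsageAccountId": "account_id",
    "product/ProductName": "service",
    "lineItem/UsageStartDate": "date",
    "product/Region": "region",
    "lineItem/LineItemType": "line_item_type",
    # new format
    "line_item_unblended_cost": "cost",
    "line_item_usage_account_id": "account_id",
    "product_product_name": "service",
    "line_item_usage_start_date": "date",
    "product_region": "region",
    "line_item_line_item_type": "line_item_type",
}

def normalize_columns(df):
    """Rename whichever naming variant is present to the canonical column names."""
    return df.rename(columns={c: COLUMN_MAP[c] for c in df.columns if c in COLUMN_MAP})
```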
## Key Cost Columns

```python
# Unblended cost - actual cost before discounts
line_item_unblended_cost

# Blended cost - averaged across the organization
line_item_blended_cost

# Net cost - after discounts are applied
line_item_net_unblended_cost

# Usage amount in the service's pricing units
line_item_usage_amount
```
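
For example, comparing unblended to net unblended cost shows the total discount impact (a sketch; assumes a loaded CUR DataFrame where the net cost column is present, which requires discounts to be enabled on the report):

```python
total_unblended = df["line_item_unblended_cost"].sum()
total_net = df["line_item_net_unblended_cost"].sum()
print(f"Total savings from discounts: ${total_unblended - total_net:,.2f}")
```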
## Line Item Types

```python
LINE_ITEM_TYPES = {
    'Usage': 'Normal usage charges',
    'Tax': 'Tax charges',
    'Fee': 'AWS fees',
    'Refund': 'Refunds/credits',
    'Credit': 'Applied credits',
    'RIFee': 'Reserved Instance fees',
    'DiscountedUsage': 'Usage covered by a Reserved Instance',
    'SavingsPlanCoveredUsage': 'Usage covered by a Savings Plan',
    'SavingsPlanNegation': 'Offsetting entry for SP-covered usage',
    'SavingsPlanUpfrontFee': 'SP upfront payment',
    'SavingsPlanRecurringFee': 'SP monthly fee',
    'BundledDiscount': 'Free tier/bundled discount',
    'EdpDiscount': 'Enterprise Discount Program discount',
}
```
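
A quick way to see how the bill splits across these types (assumes the canonical `line_item_type` and `cost` columns from the mapping above):

```python
# Cost contribution of each line item type, largest first.
by_type = df.groupby("line_item_type")["cost"].sum().sort_values(ascending=False)
print(by_type)
```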
## Discount Analysis

To identify discounts and credits:

```python
discount_types = ['Credit', 'Refund', 'EdpDiscount', 'BundledDiscount']
discounts = df[df['line_item_type'].isin(discount_types)]
```
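
Discount and credit line items generally carry negative costs, so summing them yields the total bill reduction. For example:

```python
# Discount rows are typically negative; the sum is the total reduction.
discount_by_service = discounts.groupby("service")["cost"].sum().sort_values()
print(f"Total discounts/credits: ${discounts['cost'].sum():,.2f}")
```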
## Savings Plan Analysis

Key columns for savings plans:

```python
savings_plan_columns = [
    'savings_plan_savings_plan_arn',
    'savings_plan_savings_plan_rate',
    'savings_plan_used_commitment',
    'savings_plan_total_commitment_to_date',
]
```
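
One possible utilization sketch built on those columns (illustrative only; the exact commitment semantics depend on report granularity, so treat this as a starting point rather than a definitive calculation):

```python
# Rows with a Savings Plan ARN are SP-related line items.
sp_rows = df[df["savings_plan_savings_plan_arn"].notna()]

# Used vs. total commitment per Savings Plan.
by_plan = sp_rows.groupby("savings_plan_savings_plan_arn").agg(
    used=("savings_plan_used_commitment", "sum"),
    committed=("savings_plan_total_commitment_to_date", "sum"),
)
by_plan["utilization"] = by_plan["used"] / by_plan["committed"]
```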
## Common Aggregations

```python
# Cost by service
df.groupby('service').agg({'cost': 'sum'}).sort_values('cost', ascending=False)

# Cost by account and service
df.groupby(['account_id', 'service']).agg({'cost': 'sum'})

# Daily trends (assumes the date column is parsed to datetime)
df.groupby(df['date'].dt.date).agg({'cost': 'sum'})

# Monthly summary
df.groupby(df['date'].dt.to_period('M')).agg({'cost': 'sum'})
```
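
Building on the monthly summary, a month-over-month change is a common follow-up:

```python
# Month-over-month percentage change in total cost.
monthly = df.groupby(df['date'].dt.to_period('M'))['cost'].sum()
mom_pct = monthly.pct_change() * 100
```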
## Anomaly Detection

The project uses z-score based detection:

```python
# Daily total cost series (date column parsed to datetime)
daily_costs = df.groupby(df['date'].dt.date)['cost'].sum()

mean = daily_costs.mean()
std = daily_costs.std()
z_scores = (daily_costs - mean) / std
anomalies = daily_costs[abs(z_scores) > 2]  # beyond 2 standard deviations
```
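
Wrapped as a reusable helper with a configurable threshold (the function name and signature here are illustrative, not the project's actual API):

```python
import pandas as pd

def detect_cost_anomalies(daily_costs: pd.Series, threshold: float = 2.0) -> pd.Series:
    """Return the days whose cost deviates more than `threshold` std devs from the mean."""
    z_scores = (daily_costs - daily_costs.mean()) / daily_costs.std()
    return daily_costs[z_scores.abs() > threshold]
```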
## Mock Data Reference
Test fixtures provide 6 months of data:
- Production (111111111111): 87% of costs, steady growth
- Development (210987654321): 13% of costs, spiky (load testing)
- Services: EC2, RDS, S3, CloudFront, DynamoDB, Lambda
- Regions: us-east-1, us-west-2, eu-west-1, ap-northeast-1, etc.
- Total: ~$6.2M over 182 days