| name | mimir-prometheus-troubleshoot |
| description | Help craft efficient Mimir/Prometheus queries, troubleshoot metric issues, avoid high-cardinality problems, and recommend best practices for aggregation, recording rules, and performance. |
Mimir + Prometheus Troubleshooting & Query-Builder Skill
What this Skill does
Use this skill whenever a user needs help with:
- PromQL queries
- Metric debugging
- Missing data / gaps
- Cardinality optimization
- Aggregation strategy
- Recording rules
Best Practices
Low-cardinality label selection
Use labels such as:
job,instance,service,cluster,namespace,env
Avoid:
user_id,session_id,request_id, raw UUIDs
Always narrow time ranges
Prefer "5m", "15m", "1h".
Use correct aggregations
rate()for counterssum by (...)for groupinghistogram_quantile()for latency
Suggest recording rules if query is heavy
Example Queries
| User Request | PromQL |
|---|---|
| "Error rate for payments in prod" | sum by (job) (rate(http_requests_total{job="payments", env="prod", status=~"5.."}[5m])) |
| "Latency p95 for frontend" | histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket{app="frontend"}[5m]))) |
When to Suggest Loki or Tempo
For:
- request IDs
- root-cause event-level debugging
- full request paths
→ Recommend Tempo + Loki correlations.
Limitations
- Skill does not run PromQL