| name | azure-troubleshoot |
| description | Troubleshoot Azure using tool-first access, falling back to Azure CLI when necessary. Focus on Virtual Machines, AKS, Azure Container Registry, Storage Accounts, and Log Analytics. |
Azure Troubleshooting Skill
General Guidance
Always use tool-based queries first to fetch logs, metrics, and diagnostic data.
Only fall back to Azure CLI for deeper or unsupported inspection.
Investigations should:
- Use Log Analytics Kusto queries with proper scoping
- Use Activity Logs to identify failures
- Use metrics when diagnosing performance issues
- Provide minimal, targeted remediation advice
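The tool-first rule above can be sketched as a small wrapper. This is a minimal illustration, not a prescribed implementation: `ToolUnavailable`, `fetch_diagnostics`, and both callables are hypothetical names introduced here, standing in for whatever tool and CLI wrappers are actually available.

```python
from typing import Callable, Optional, Tuple


class ToolUnavailable(Exception):
    """Hypothetical error raised by a tool wrapper that cannot serve a request."""


def fetch_diagnostics(
    tool_query: Callable[[], Optional[str]],
    cli_fallback: Callable[[], str],
) -> Tuple[str, str]:
    """Return (source, data), preferring the tool-based query."""
    try:
        data = tool_query()
    except ToolUnavailable:
        data = None
    if data is not None:
        return ("tool", data)
    # Reached only when the tool raised ToolUnavailable or returned None.
    return ("cli", cli_fallback())
```

Returning the source alongside the data keeps it visible in the final report whether a finding came from a tool query or from the CLI fallback.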
Core Services Covered
Virtual Machines
Common issues:
- Boot failures
- OS/disk failures
- NIC/IP misconfiguration
Investigations:
- Inspect boot diagnostics logs
- Query `Heartbeat` table for VM status
- Check Activity Logs for failed start/stop operations
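A scoped Heartbeat check might look like the sketch below, which only builds the KQL text and leaves execution to whichever query tool is in use. The function name `heartbeat_query`, the default lookback, and the choice to summarize by `Computer` are assumptions made for illustration.

```python
import re

_LOOKBACK = re.compile(r"\d+[dhm]")  # KQL timespan literals such as 30m, 1h, 7d


def heartbeat_query(resource_id: str, lookback: str = "1h") -> str:
    """Build a KQL query that returns the latest heartbeat for one VM."""
    if not _LOOKBACK.fullmatch(lookback):
        raise ValueError(f"unsupported lookback: {lookback!r}")
    if '"' in resource_id:
        raise ValueError("resource_id must not contain double quotes")
    return "\n".join([
        "Heartbeat",
        f"| where TimeGenerated > ago({lookback})",
        # =~ compares case-insensitively, which is safer for resource IDs.
        f'| where _ResourceId =~ "{resource_id}"',
        "| summarize LastHeartbeat = max(TimeGenerated) by Computer, _ResourceId",
    ])
```

An empty result for a VM that should be running suggests the agent has stopped reporting, which is a cue to inspect boot diagnostics next.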
AKS
Common issues:
- Pod scheduling failures
- Node pressure
- Image pull errors (ACR auth)
- Container crashes
Investigations:
- Query `KubeEvents`
- Query `KubePodInventory`
- Inspect `ContainerLog`
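As one way to start the AKS investigation, the sketch below builds a KQL query for recent warning events in `KubeEvents`. The helper name `kube_warning_events_query`, the row limit, and the projected columns are illustrative assumptions; the query is returned as text rather than executed.

```python
import re

_LOOKBACK = re.compile(r"\d+[dhm]")  # KQL timespan literals such as 30m, 1h, 7d


def kube_warning_events_query(
    cluster_resource_id: str, lookback: str = "1h", limit: int = 50
) -> str:
    """Build a KQL query listing recent Warning events for one AKS cluster."""
    if not _LOOKBACK.fullmatch(lookback):
        raise ValueError(f"unsupported lookback: {lookback!r}")
    if '"' in cluster_resource_id:
        raise ValueError("cluster_resource_id must not contain double quotes")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return "\n".join([
        "KubeEvents",
        f"| where TimeGenerated > ago({lookback})",
        f'| where _ResourceId =~ "{cluster_resource_id}"',
        # Warning events cover scheduling, image pull, and crash-related reasons.
        '| where KubeEventType == "Warning"',
        "| project TimeGenerated, Namespace, Name, Reason, Message",
        "| order by TimeGenerated desc",
        f"| take {limit}",
    ])
```

The `Reason` and `Message` columns usually indicate which of the common issues applies, after which `KubePodInventory` and `ContainerLog` can be queried for the affected pods.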
Azure Container Registry (ACR)
Common issues:
- Permission denied (RBAC)
- Token expiration
Investigations:
- Query Activity Logs for `push/write` or `pull/read` failures
- Check repository event logs
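The Activity Log check above could be expressed as a query against the `AzureActivity` table, assuming Activity Logs are exported to the Log Analytics workspace. The sketch below is hedged accordingly: `activity_failures_query` is a hypothetical helper, and matching on the status values "Failure" and "Failed" is an assumption about how failures are recorded in a given workspace.

```python
import re
from typing import Sequence

_LOOKBACK = re.compile(r"\d+[dhm]")  # KQL timespan literals such as 30m, 1h, 7d


def activity_failures_query(
    resource_id: str, operation_fragments: Sequence[str], lookback: str = "24h"
) -> str:
    """Build a KQL query for failed Activity Log operations on one resource."""
    if not _LOOKBACK.fullmatch(lookback):
        raise ValueError(f"unsupported lookback: {lookback!r}")
    if not operation_fragments:
        raise ValueError("at least one operation fragment is required")
    if any('"' in s for s in [resource_id, *operation_fragments]):
        raise ValueError("inputs must not contain double quotes")
    op_filter = " or ".join(
        f'OperationNameValue contains "{frag}"' for frag in operation_fragments
    )
    return "\n".join([
        "AzureActivity",
        f"| where TimeGenerated > ago({lookback})",
        f'| where _ResourceId =~ "{resource_id}"',
        f"| where {op_filter}",
        # Assumption: failures are recorded with one of these status values.
        '| where ActivityStatusValue in~ ("Failure", "Failed")',
        "| project TimeGenerated, OperationNameValue, ActivityStatusValue, Caller",
        "| order by TimeGenerated desc",
    ])
```

For a registry, the fragments would be `["push/write", "pull/read"]`; the same helper could serve the VM start/stop check with different fragments.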
Storage Accounts
Common issues:
- Firewall-restricted access
- SAS token expiration
- Object not found
Investigations:
- Query `StorageBlobLogs`
- Validate configuration and permissions
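A sketch of a `StorageBlobLogs` query for the issues listed above follows. It assumes that firewall blocks and expired SAS tokens surface as HTTP 403 and missing objects as HTTP 404, and that blob log rows carry a `_ResourceId` beginning with the storage account's resource ID; the helper name and grouping columns are illustrative.

```python
import re

_LOOKBACK = re.compile(r"\d+[dhm]")  # KQL timespan literals such as 30m, 1h, 7d

# Assumed mapping from the common issues to HTTP status codes.
FAILURE_STATUS_CODES = ("403", "404")


def blob_failures_query(account_resource_id: str, lookback: str = "1h") -> str:
    """Build a KQL query summarizing 403/404 blob requests for one account."""
    if not _LOOKBACK.fullmatch(lookback):
        raise ValueError(f"unsupported lookback: {lookback!r}")
    if '"' in account_resource_id:
        raise ValueError("account_resource_id must not contain double quotes")
    codes = ", ".join(f'"{code}"' for code in FAILURE_STATUS_CODES)
    return "\n".join([
        "StorageBlobLogs",
        f"| where TimeGenerated > ago({lookback})",
        # Prefix match, since blob rows may be scoped to a sub-resource.
        f'| where _ResourceId startswith "{account_resource_id}"',
        f"| where tostring(StatusCode) in ({codes})",
        "| summarize Failures = count() by StatusCode, StatusText, OperationName, CallerIpAddress",
        "| order by Failures desc",
    ])
```

Grouping by `StatusText` and `CallerIpAddress` helps separate a firewall rule rejecting one network from a token that has expired for every caller.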
Log Analytics
Best practices:
- Always filter by `_ResourceId`
- Narrow time range
- Query only the tables relevant to the service
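These practices can be checked mechanically before a query is run. The sketch below is a deliberately crude text heuristic, not a KQL parser: `scoping_problems` is a hypothetical helper that only looks for a `_ResourceId` reference and a `TimeGenerated` comparison.

```python
import re
from typing import List


def scoping_problems(query: str) -> List[str]:
    """Return a list of scoping issues found in a KQL query string."""
    problems = []
    if not re.search(r"\b_ResourceId\b", query):
        problems.append("no _ResourceId filter")
    # Accepts forms such as 'TimeGenerated > ago(1h)' or 'TimeGenerated between (...)'.
    if not re.search(r"\bTimeGenerated\s*(>=?|between\b)", query):
        problems.append("no TimeGenerated range")
    return problems
```

An empty list means both filters appear somewhere in the text; it does not prove that the filters are correct or selective.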
Workflow
- Identify target service
- Query Log Analytics with scoped KQL
- Query Activity Logs
- Review metrics
- Interpret patterns
- Recommend targeted fixes
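The workflow can be sketched as an ordered plan keyed by service. The table names come from the sections above; the dictionary keys, the function name `investigation_plan`, and the decision to skip the Log Analytics step when no table is named (as for ACR) are assumptions made for illustration.

```python
from typing import Dict, List

# Tables named in this skill; ACR has none listed, so it relies on Activity Logs.
SERVICE_TABLES: Dict[str, List[str]] = {
    "vm": ["Heartbeat"],
    "aks": ["KubeEvents", "KubePodInventory", "ContainerLog"],
    "acr": [],
    "storage": ["StorageBlobLogs"],
}


def investigation_plan(service: str) -> List[str]:
    """Return the ordered investigation steps for one service."""
    key = service.lower()
    if key not in SERVICE_TABLES:
        raise ValueError(f"unknown service: {service!r}")
    steps = [f"Identify target service: {key}"]
    tables = SERVICE_TABLES[key]
    if tables:
        steps.append("Query Log Analytics with scoped KQL: " + ", ".join(tables))
    steps += [
        "Query Activity Logs",
        "Review metrics",
        "Interpret patterns",
        "Recommend targeted fixes",
    ]
    return steps
```

Keeping the table list per service in one place makes it harder to drift into querying tables that are unrelated to the resource under investigation.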