
SKILL.md

---
name: check-logs
description: Query and analyze logs using Grafana Loki for the Kagenti platform, search for errors, and investigate issues
---

Check Logs Skill

This skill helps you query and analyze logs from the Kagenti platform using Loki via Grafana.

When to Use

  • User asks "show me logs for X"
  • Investigating errors or failures
  • After deployments to check for issues
  • Debugging pod crashes or restarts
  • Analyzing application behavior

What This Skill Does

  1. Query Logs: Search logs by namespace, pod, container, or log level
  2. Error Detection: Find errors and warnings in logs
  3. Log Aggregation: View logs across multiple pods
  4. Time-based Queries: Query logs for specific time ranges
  5. Log Patterns: Detect common issues from log patterns

Examples

Query Logs in Grafana UI

Access Grafana: https://grafana.localtest.me:9443
Navigate: Explore → Select Loki datasource

Log Dashboard: https://grafana.localtest.me:9443/d/loki-logs/loki-logs

Query Examples in Grafana Explore:

# All logs from observability namespace
{kubernetes_namespace_name="observability"}

# Logs from specific pod
{kubernetes_pod_name=~"prometheus.*"}

# Logs with errors
{kubernetes_namespace_name="observability"} |= "error"

# Logs from last 5 minutes with level=error
{kubernetes_namespace_name="observability"} | json | level="error"

# Count errors per namespace
sum by (kubernetes_namespace_name) (count_over_time({kubernetes_namespace_name=~".+"} |= "error" [5m]))

Query Logs via CLI (Promtail/Loki)

# Query Loki for recent errors in observability namespace
# (date -v-5M is BSD/macOS syntax; on GNU/Linux use: date -u -d '5 minutes ago' +%s)
kubectl exec -n observability deployment/grafana -- \
  curl -s -G 'http://loki.observability.svc:3100/loki/api/v1/query_range' \
  --data-urlencode 'query={kubernetes_namespace_name="observability"} |= "error"' \
  --data-urlencode 'limit=100' \
  --data-urlencode 'start='$(date -u -v-5M +%s)000000000 \
  --data-urlencode 'end='$(date -u +%s)000000000 | python3 -m json.tool

Check Logs for Specific Pod

# Get logs for a specific pod using kubectl
kubectl logs -n observability deployment/prometheus --tail=100

# Get logs from previous container (if crashed)
kubectl logs -n observability pod/prometheus-xxx --previous

# Follow logs in real-time
kubectl logs -n observability deployment/grafana -f --tail=20

# Get logs from specific container in pod
kubectl logs -n observability pod/alertmanager-xxx -c alertmanager --tail=50

Search for Errors Across Platform

# Get recent error logs from all namespaces
# (kubectl logs needs a pod name or selector, so iterate over the pods in each namespace)
for ns in observability keycloak oauth2-proxy istio-system kiali-system; do
  echo "=== Errors in $ns ==="
  for pod in $(kubectl get pods -n "$ns" -o name); do
    kubectl logs -n "$ns" "$pod" --all-containers=true --tail=50 2>&1 | grep -iE "error|fatal|exception" | head -5
  done
  echo
done

Check Logs for Failed Pods

# Find pods with issues and check their logs
kubectl get pods -A | grep -E "Error|CrashLoop|ImagePull" | while read ns pod rest; do
  echo "=== Logs for $pod in $ns ==="
  kubectl logs -n $ns $pod --tail=30 --previous 2>/dev/null || kubectl logs -n $ns $pod --tail=30
  echo
done

Query Log Volume by Namespace

# In Grafana Explore (Loki datasource)
sum by (kubernetes_namespace_name) (
  rate({kubernetes_namespace_name=~".+"}[5m])
)

Search for Specific Error Pattern

# Find connection errors
{kubernetes_namespace_name="observability"} |~ "connection (refused|timeout|reset)"

# Find authentication failures
{kubernetes_namespace_name=~"keycloak|oauth2-proxy"} |~ "auth.*fail|unauthorized|forbidden"

# Find OOM kills
{kubernetes_namespace_name=~".+"} |~ "OOM|out of memory|oom.*kill"

Log Levels and Filtering

Standard Log Levels

  • error: Critical errors requiring attention
  • warn/warning: Warnings that may indicate issues
  • info: Informational messages
  • debug: Detailed debugging information
  • trace: Very detailed trace information

Filter by Log Level

# Only errors
{kubernetes_namespace_name="observability"} | json | level="error"

# Errors and warnings
{kubernetes_namespace_name="observability"} | json | level=~"error|warn"

# Everything except debug
{kubernetes_namespace_name="observability"} | json | level!="debug"

Common Log Queries for Platform Components

Prometheus Logs

kubectl logs -n observability deployment/prometheus --tail=100

# Check for scrape errors
kubectl logs -n observability deployment/prometheus | grep -i "scrape\|error"

Grafana Logs

kubectl logs -n observability deployment/grafana --tail=100

# Check for datasource errors
kubectl logs -n observability deployment/grafana | grep -i "datasource\|error"

Keycloak Logs

kubectl logs -n keycloak statefulset/keycloak --tail=100

# Check for authentication errors
kubectl logs -n keycloak statefulset/keycloak | grep -i "auth\|login\|error"

Istio Proxy (Sidecar) Logs

# Check sidecar logs for a specific pod
POD=$(kubectl get pod -n observability -l app=alertmanager -o jsonpath='{.items[0].metadata.name}')
kubectl logs -n observability $POD -c istio-proxy --tail=50

AlertManager Logs

kubectl logs -n observability deployment/alertmanager -c alertmanager --tail=100

# Check for notification errors
kubectl logs -n observability deployment/alertmanager -c alertmanager | grep -i "notif\|error\|fail"

Log Analysis Patterns

Detect Crash Loops

# Find pods restarting frequently (RESTARTS is column 5 of `kubectl get pods -A` output)
kubectl get pods -A | awk 'NR > 1 && $5+0 > 5'

# Check logs before crash
kubectl logs -n <namespace> <pod-name> --previous | tail -50

Find HTTP Errors

{kubernetes_namespace_name=~".+"} |~ "HTTP.*[45]\\d{2}"

Find Timeout Errors

{kubernetes_namespace_name=~".+"} |~ "timeout|timed out|deadline exceeded"

Find Database Connection Issues

{kubernetes_namespace_name=~".+"} |~ "database.*error|connection.*refused|SQL.*error"

Troubleshooting with Logs

Issue: Service Not Starting

  1. Check pod events: kubectl describe pod <pod-name> -n <namespace>
  2. Check container logs: kubectl logs <pod-name> -n <namespace>
  3. Check init container logs: kubectl logs <pod-name> -n <namespace> -c <init-container> (a combined sketch follows below)
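
A minimal triage sketch chaining these three checks (NS and POD are placeholder values to fill in):

# Triage a pod that won't start; NS and POD are placeholders
NS=observability
POD=my-pod-xxx
kubectl describe pod "$POD" -n "$NS" | sed -n '/Events:/,$p'   # events section only
kubectl logs "$POD" -n "$NS" --tail=50                         # main container logs
# Init container logs, if the pod defines any
for c in $(kubectl get pod "$POD" -n "$NS" -o jsonpath='{.spec.initContainers[*].name}'); do
  echo "=== init container: $c ==="
  kubectl logs "$POD" -n "$NS" -c "$c" --tail=50
done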

Issue: High Error Rate

  1. Query error logs: {kubernetes_namespace_name="X"} |= "error" over your selected time range (a [5m] range selector is only valid inside aggregations such as count_over_time)
  2. Group by component: sum by (kubernetes_pod_name) (count_over_time({...} |= "error" [5m])) (see the topk example below)
  3. Identify patterns in the error messages
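
As a concrete illustration of step 2 (the observability namespace here is just an example), topk surfaces the noisiest pods first:

# Top 5 pods by error count over the last 5 minutes (example namespace)
topk(5, sum by (kubernetes_pod_name) (
  count_over_time({kubernetes_namespace_name="observability"} |= "error" [5m])
))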

Issue: Performance Degradation

  1. Check for warnings: {kubernetes_namespace_name="X"} |= "warn"
  2. Look for timeout messages
  3. Check for resource exhaustion messages (one combined query is sketched below)
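
One way to combine these checks into a single query (a sketch; adjust the namespace and patterns to your workload):

# Warnings, timeouts, and resource-exhaustion signals in one pass (example namespace)
{kubernetes_namespace_name="observability"} |~ "(?i)warn|timeout|deadline exceeded|out of memory|throttl"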

Grafana Loki Dashboard Features

Loki Logs Dashboard: https://grafana.localtest.me:9443/d/loki-logs/loki-logs

Features:

  • Namespace filter: Select specific namespace
  • Pod filter: Filter by pod name
  • Log level: Filter by error/warn/info/debug
  • Time range: Select time window
  • Log volume graphs: See log rate over time
  • Log table: Browse actual log lines

Panels:

  1. Log Volume by Level: See errors vs warnings over time (example query below)
  2. Log Volume by Namespace: Compare activity across namespaces
  3. Logs per Second: Current log ingestion rate
  4. Log Lines: Actual log content with search
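
For reference, a panel like "Log Volume by Level" is typically backed by a query along these lines (a sketch assuming JSON-structured logs with a level field; the actual dashboard query may differ):

# Log rate per level over the dashboard's auto interval
sum by (level) (count_over_time({kubernetes_namespace_name=~".+"} | json | level=~".+" [$__interval]))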

Pro Tips

  1. Use time ranges: Always specify time range to limit data
  2. Filter early: Add namespace/pod filters before log level filters (more efficient)
  3. Use regex carefully: Complex regex can be slow on large log volumes
  4. Check both current and previous: For crashed pods, use --previous
  5. Tail first: Use --tail=N to limit output, then increase if needed
