SKILL.md

name: Prow Job Analyze Resource
description: Analyze Kubernetes resource lifecycle in Prow CI job artifacts by parsing audit logs and pod logs from GCS, generating interactive HTML reports with timelines

Prow Job Analyze Resource

This skill analyzes the lifecycle of Kubernetes resources during Prow CI job execution by downloading and parsing artifacts from Google Cloud Storage.

When to Use This Skill

Use this skill when the user wants to:

  • Debug Prow CI test failures by tracking resource state changes
  • Understand when and how a Kubernetes resource was created, modified, or deleted during a test
  • Analyze resource lifecycle across audit logs and pod logs from ephemeral test clusters
  • Generate interactive HTML reports showing resource events over time
  • Search for specific resources (pods, deployments, configmaps, etc.) in Prow job artifacts

Prerequisites

Before starting, verify these prerequisites:

  1. gcloud CLI Installation

    • Verify the gcloud CLI is installed and available on PATH (for example, gcloud --version)

  2. gcloud Authentication (Optional)

    • The test-platform-results bucket is publicly accessible
    • No authentication is required for read access
    • Skip authentication checks

Input Format

The user will provide:

  1. Prow job URL - gcsweb URL containing test-platform-results/

    • Example: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/30393/pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn/1978913325970362368/
    • URL may or may not have trailing slash
  2. Resource specifications - Comma-delimited list in format [namespace:][kind/]name

    • Supports regex patterns for matching multiple resources
    • Examples:
      • pod/etcd-0 - pod named etcd-0 in any namespace
      • openshift-etcd:pod/etcd-0 - pod in specific namespace
      • etcd-0 - any resource named etcd-0 (no kind filter)
      • pod/etcd-0,configmap/cluster-config - multiple resources
      • resource-name-1|resource-name-2 - multiple resources using regex OR
      • e2e-test-project-api-.* - all resources matching the pattern

Implementation Steps

Step 1: Parse and Validate URL

  1. Extract bucket path

    • Find test-platform-results/ in URL
    • Extract everything after it as the GCS bucket relative path
    • If not found, error: "URL must contain 'test-platform-results/'"
  2. Extract build_id

    • Search for pattern /(\d{10,})/ in the bucket path
    • build_id must be at least 10 consecutive decimal digits
    • Handle URLs with or without trailing slash
    • If not found, error: "Could not find build ID (10+ digits) in URL"
  3. Extract prowjob name

    • Find the path segment immediately preceding build_id
    • Example: In .../pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn/1978913325970362368/
    • Prowjob name: pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn
  4. Construct GCS paths

    • Bucket: test-platform-results
    • Base GCS path: gs://test-platform-results/{bucket-path}/
    • Ensure path ends with /
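
For reference, the URL parsing described in Step 1 can be sketched in a few lines of Python; the helper name parse_gcsweb_url is illustrative and not part of the skill's scripts:

```python
import re


def parse_gcsweb_url(url: str) -> dict:
    """Parse a gcsweb URL into its bucket path, build_id, prowjob name, and GCS base path."""
    marker = "test-platform-results/"
    idx = url.find(marker)
    if idx == -1:
        raise ValueError("URL must contain 'test-platform-results/'")
    bucket_path = url[idx + len(marker):].strip("/")

    # build_id: at least 10 consecutive decimal digits as a path segment
    m = re.search(r"/(\d{10,})(?:/|$)", "/" + bucket_path)
    if not m:
        raise ValueError("Could not find build ID (10+ digits) in URL")
    build_id = m.group(1)

    # prowjob name: the path segment immediately preceding build_id
    segments = bucket_path.split("/")
    prowjob_name = segments[segments.index(build_id) - 1]

    return {
        "bucket_path": bucket_path,
        "build_id": build_id,
        "prowjob_name": prowjob_name,
        "gcs_base": f"gs://test-platform-results/{bucket_path}/",
    }
```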

Step 2: Parse Resource Specifications

For each comma-delimited resource spec:

  1. Parse format [namespace:][kind/]name

    • Split on : to get namespace (optional)
    • Split remaining on / to get kind (optional) and name (required)
    • Store as structured data: {namespace, kind, name}
  2. Validate

    • name is required
    • namespace and kind are optional
    • Examples:
      • pod/etcd-0 → {kind: "pod", name: "etcd-0"}
      • openshift-etcd:pod/etcd-0 → {namespace: "openshift-etcd", kind: "pod", name: "etcd-0"}
      • etcd-0 → {name: "etcd-0"}
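
A minimal sketch of this spec parsing, with the helper name parse_resource_spec chosen for illustration only:

```python
def parse_resource_spec(spec: str) -> dict:
    """Parse a single "[namespace:][kind/]name" spec into {namespace, kind, name}."""
    namespace = kind = None
    rest = spec.strip()
    if ":" in rest:
        namespace, rest = rest.split(":", 1)
    if "/" in rest:
        kind, rest = rest.split("/", 1)
    if not rest:
        raise ValueError(f"Resource name is required in spec: {spec!r}")
    return {"namespace": namespace, "kind": kind, "name": rest}


# The full user input is comma-delimited:
# specs = [parse_resource_spec(s) for s in "pod/etcd-0,configmap/cluster-config".split(",")]
```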

Step 3: Create Working Directory

  1. Check for existing artifacts first

    • Check if .work/prow-job-analyze-resource/{build_id}/logs/ directory exists and has content
    • If it exists with content:
      • Use AskUserQuestion tool to ask:
        • Question: "Artifacts already exist for build {build_id}. Would you like to use the existing download or re-download?"
        • Options:
          • "Use existing" - Skip to artifact parsing step (Step 6)
          • "Re-download" - Continue to clean and re-download
      • If user chooses "Re-download":
        • Remove all existing content: rm -rf .work/prow-job-analyze-resource/{build_id}/logs/
        • Also remove tmp directory: rm -rf .work/prow-job-analyze-resource/{build_id}/tmp/
        • This ensures clean state before downloading new content
      • If user chooses "Use existing":
        • Skip directly to Step 6 (Parse Audit Logs)
        • Still need to download prowjob.json if it doesn't exist
  2. Create directory structure

    mkdir -p .work/prow-job-analyze-resource/{build_id}/logs
    mkdir -p .work/prow-job-analyze-resource/{build_id}/tmp
    
    • Use .work/prow-job-analyze-resource/ as the base directory (already in .gitignore)
    • Use build_id as subdirectory name
    • Create logs/ subdirectory for all downloads
    • Create tmp/ subdirectory for temporary files (intermediate JSON, etc.)
    • Working directory: .work/prow-job-analyze-resource/{build_id}/
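
The cache check and cleanup above amount to a directory test plus removal; a sketch (helper names are illustrative):

```python
import shutil
from pathlib import Path


def artifacts_cached(build_id: str) -> bool:
    """Return True when a previous download already exists for this build."""
    logs_dir = Path(f".work/prow-job-analyze-resource/{build_id}/logs")
    return logs_dir.is_dir() and any(logs_dir.iterdir())


def clean_workdir(build_id: str) -> None:
    """Remove previously downloaded logs and temporary files before a re-download."""
    base = Path(f".work/prow-job-analyze-resource/{build_id}")
    shutil.rmtree(base / "logs", ignore_errors=True)
    shutil.rmtree(base / "tmp", ignore_errors=True)
```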

Step 4: Download and Validate prowjob.json

  1. Download prowjob.json

    gcloud storage cp gs://test-platform-results/{bucket-path}/prowjob.json .work/prow-job-analyze-resource/{build_id}/logs/prowjob.json --no-user-output-enabled
    
  2. Parse and validate

    • Read .work/prow-job-analyze-resource/{build_id}/logs/prowjob.json
    • Search for pattern: --target=([a-zA-Z0-9-]+)
    • If not found:
      • Display: "This is not a ci-operator job. The prowjob cannot be analyzed by this skill."
      • Explain: ci-operator jobs have a --target argument specifying the test target
      • Exit skill
  3. Extract target name

    • Capture the target value (e.g., e2e-aws-ovn)
    • Store for constructing gather-extra path
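
The validation and target extraction reduce to a single regex over the downloaded file. A sketch, assuming prowjob.json is already on disk:

```python
import re
from pathlib import Path


def extract_target(prowjob_json_path: str) -> str:
    """Extract the ci-operator --target value from a downloaded prowjob.json."""
    text = Path(prowjob_json_path).read_text()
    m = re.search(r"--target=([a-zA-Z0-9-]+)", text)
    if not m:
        raise ValueError("This is not a ci-operator job. No --target found in prowjob.json.")
    return m.group(1)  # e.g. "e2e-aws-ovn"
```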

Step 5: Download Audit Logs and Pod Logs

  1. Construct gather-extra paths

    • GCS path: gs://test-platform-results/{bucket-path}/artifacts/{target}/gather-extra/
    • Local path: .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/
  2. Download audit logs

    mkdir -p .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/audit_logs
    gcloud storage cp -r gs://test-platform-results/{bucket-path}/artifacts/{target}/gather-extra/artifacts/audit_logs/ .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/audit_logs/ --no-user-output-enabled
    
    • Create directory first to avoid gcloud errors
    • Use --no-user-output-enabled to suppress progress output
    • If directory not found, warn: "No audit logs found. Job may not have completed or audit logging may be disabled."
  3. Download pod logs

    mkdir -p .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/pods
    gcloud storage cp -r gs://test-platform-results/{bucket-path}/artifacts/{target}/gather-extra/artifacts/pods/ .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/pods/ --no-user-output-enabled
    
    • Create directory first to avoid gcloud errors
    • Use --no-user-output-enabled to suppress progress output
    • If directory not found, warn: "No pod logs found."

Step 6: Parse Audit Logs and Pod Logs

IMPORTANT: Use the provided Python script parse_all_logs.py from the skill directory to parse both audit logs and pod logs efficiently.

Usage:

python3 plugins/prow-job/skills/prow-job-analyze-resource/parse_all_logs.py <resource_pattern> \
  .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/audit_logs \
  .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/pods \
  > .work/prow-job-analyze-resource/{build_id}/tmp/all_entries.json

Resource Pattern Parameter:

  • The <resource_pattern> parameter supports regex patterns
  • Use | (pipe) to search for multiple resources: resource1|resource2|resource3
  • Use .* for wildcards: e2e-test-project-.*
  • Simple substring matching still works: my-namespace
  • Examples:
    • Single resource: e2e-test-project-api-pkjxf
    • Multiple resources: e2e-test-project-api-pkjxf|e2e-test-project-api-7zdxx
    • Pattern matching: e2e-test-project-api-.*

Note: The script outputs status messages to stderr which will display as progress. The JSON output to stdout is clean and ready to use.

What the script does:

  1. Find all log files

    • Audit logs: .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/audit_logs/**/*.log
    • Pod logs: .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/pods/**/*.log
  2. Parse audit log files (JSONL format)

    • Read file line by line
    • Each line is a JSON object (JSONL format)
    • Parse JSON into object e
  3. Extract fields from each audit log entry

    • e.verb - action (get, list, create, update, patch, delete, watch)
    • e.user.username - user making request
    • e.responseStatus.code - HTTP response code (integer)
    • e.objectRef.namespace - namespace (if namespaced)
    • e.objectRef.resource - lowercase plural kind (e.g., "pods", "configmaps")
    • e.objectRef.name - resource name
    • e.requestReceivedTimestamp - ISO 8601 timestamp
  4. Filter matches for each resource spec

    • Uses regex matching on e.objectRef.namespace and e.objectRef.name
    • Pattern matches if found in either namespace or name field
    • Supports all regex features:
      • Pipe operator: resource1|resource2 matches either resource
      • Wildcards: e2e-test-.* matches all resources starting with e2e-test-
      • Character classes: [abc] matches a, b, or c
    • Simple substring matching still works for patterns without regex special chars
    • Performance optimization: plain strings use fast substring search
  5. For each audit log match, capture

    • Source: "audit"
    • Filename: Full path to .log file
    • Line number: Line number in file (1-indexed)
    • Level: Based on e.responseStatus.code
      • 200-299: "info"
      • 400-499: "warn"
      • 500-599: "error"
    • Timestamp: Parse e.requestReceivedTimestamp to datetime
    • Content: Full JSON line (for expandable details)
    • Summary: Generate formatted summary
      • Format: {verb} {resource}/{name} in {namespace} by {username} → HTTP {code}
      • Example: create pod/etcd-0 in openshift-etcd by system:serviceaccount:kube-system:deployment-controller → HTTP 201
  6. Parse pod log files (plain text format)

    • Read file line by line
    • Each line is plain text (not JSON)
    • Search for resource pattern in line content
  7. For each pod log match, capture

    • Source: "pod"
    • Filename: Full path to .log file
    • Line number: Line number in file (1-indexed)
    • Level: Detect from glog format or default to "info"
      • Glog format: E0910 11:43:41.153414 ... (E=error, W=warn, I=info, F=fatal→error)
      • Non-glog format: default to "info"
    • Timestamp: Extract from start of line if present (format: YYYY-MM-DDTHH:MM:SS.mmmmmmZ)
    • Content: Full log line
    • Summary: First 200 characters of line (after timestamp if present)
  8. Combine and sort all entries

    • Merge audit log entries and pod log entries
    • Sort all entries chronologically by timestamp
    • Entries without timestamps are placed at the end
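
To make the matching, level mapping, and summary formatting concrete, here is an illustrative sketch. The plain-pattern fast path is an assumption about the optimization noted in item 4; parse_all_logs.py may implement it differently:

```python
import re

_REGEX_CHARS = set(".^$*+?{}[]|()\\")


def matches_resource(namespace: str, name: str, pattern: str) -> bool:
    """Return True if the pattern matches either the namespace or the name field."""
    fields = [namespace or "", name or ""]
    if not (_REGEX_CHARS & set(pattern)):
        # Plain pattern: fall back to fast substring search.
        return any(pattern in f for f in fields)
    regex = re.compile(pattern)
    return any(regex.search(f) for f in fields)


def level_from_status(code: int) -> str:
    """Map an audit entry's responseStatus.code to a display level."""
    if 200 <= code < 300:
        return "info"
    if 400 <= code < 500:
        return "warn"
    if 500 <= code < 600:
        return "error"
    return "info"


def audit_summary(e: dict) -> str:
    """Format the one-line summary described in item 5."""
    ref = e.get("objectRef", {})
    return (f"{e.get('verb')} {ref.get('resource')}/{ref.get('name')} "
            f"in {ref.get('namespace')} by {e.get('user', {}).get('username')} "
            f"→ HTTP {e.get('responseStatus', {}).get('code')}")
```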

Step 7: Generate HTML Report

IMPORTANT: Use the provided Python script generate_html_report.py from the skill directory.

Usage:

python3 plugins/prow-job/skills/prow-job-analyze-resource/generate_html_report.py \
  .work/prow-job-analyze-resource/{build_id}/tmp/all_entries.json \
  "{prowjob_name}" \
  "{build_id}" \
  "{target}" \
  "{resource_pattern}" \
  "{gcsweb_url}"

Resource Pattern Parameter:

  • The {resource_pattern} should be the same pattern used in the parse script
  • For single resources: e2e-test-project-api-pkjxf
  • For multiple resources: e2e-test-project-api-pkjxf|e2e-test-project-api-7zdxx
  • The script will parse the pattern to display the searched resources in the HTML header

Output: The script generates .work/prow-job-analyze-resource/{build_id}/{first_resource_name}.html

What the script does:

  1. Determine report filename

    • Format: .work/prow-job-analyze-resource/{build_id}/{resource_name}.html
    • Uses the primary resource name for the filename
  2. Sort all entries by timestamp

    • Loads audit log entries from JSON
    • Sort chronologically (ascending)
    • Entries without timestamps go at the end
  3. Calculate timeline bounds

    • min_time: Earliest timestamp found
    • max_time: Latest timestamp found
    • Time range: max_time - min_time
  4. Generate HTML structure

    Header Section:

    <div class="header">
      <h1>Prow Job Resource Lifecycle Analysis</h1>
      <div class="metadata">
        <p><strong>Prow Job:</strong> {prowjob-name}</p>
        <p><strong>Build ID:</strong> {build_id}</p>
        <p><strong>gcsweb URL:</strong> <a href="{original-url}">{original-url}</a></p>
        <p><strong>Target:</strong> {target}</p>
        <p><strong>Resources:</strong> {resource-list}</p>
        <p><strong>Total Entries:</strong> {count}</p>
        <p><strong>Time Range:</strong> {min_time} to {max_time}</p>
      </div>
    </div>
    

    Interactive Timeline:

    <div class="timeline-container">
      <svg id="timeline" width="100%" height="100">
        <!-- For each entry, render colored vertical line -->
        <line x1="{position}%" y1="0" x2="{position}%" y2="100"
              stroke="{color}" stroke-width="2"
              class="timeline-event" data-entry-id="{entry-id}"
              title="{summary}">
        </line>
      </svg>
    </div>
    
    • Position: Calculate percentage based on timestamp between min_time and max_time
    • Color: white/lightgray (info), yellow (warn), red (error)
    • Clickable: Jump to corresponding entry
    • Tooltip on hover: Show summary

    Log Entries Section:

    <div class="entries">
      <div class="filters">
        <!-- Filter controls: by level, by resource, by time range -->
      </div>
    
      <div class="entry" id="entry-{index}">
        <div class="entry-header">
          <span class="timestamp">{formatted-timestamp}</span>
          <span class="level badge-{level}">{level}</span>
          <span class="source">{filename}:{line-number}</span>
        </div>
        <div class="entry-summary">{summary}</div>
        <details class="entry-details">
          <summary>Show full content</summary>
          <pre><code>{content}</code></pre>
        </details>
      </div>
    </div>
    

    CSS Styling:

    • Modern, clean design with good contrast
    • Responsive layout
    • Badge colors: info=gray, warn=yellow, error=red
    • Monospace font for log content
    • Syntax highlighting for JSON (in audit logs)

    JavaScript Interactivity:

    // Timeline click handler
    document.querySelectorAll('.timeline-event').forEach(el => {
      el.addEventListener('click', () => {
        const entryId = el.dataset.entryId;
        document.getElementById(entryId).scrollIntoView({behavior: 'smooth'});
      });
    });
    
    // Filter controls
    // Expand/collapse details
    // Search within entries
    
  5. Write HTML to file

    • Script automatically writes to .work/prow-job-analyze-resource/{build_id}/{resource_name}.html
    • Includes proper HTML5 structure
    • All CSS and JavaScript are inline for portability
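
An illustrative sketch of the timeline position and color mapping described above; generate_html_report.py owns the actual implementation:

```python
from datetime import datetime

# Colors per level, matching the timeline description above.
LEVEL_COLORS = {"info": "lightgray", "warn": "yellow", "error": "red"}


def timeline_position(ts: datetime, min_time: datetime, max_time: datetime) -> float:
    """Horizontal position of an event as a percentage of the overall time range."""
    span = (max_time - min_time).total_seconds()
    if span <= 0:
        return 0.0
    return 100.0 * (ts - min_time).total_seconds() / span
```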

Step 8: Present Results to User

  1. Display summary

    Resource Lifecycle Analysis Complete
    
    Prow Job: {prowjob-name}
    Build ID: {build_id}
    Target: {target}
    
    Resources Analyzed:
    - {resource-spec-1}
    - {resource-spec-2}
    ...
    
    Artifacts downloaded to: .work/prow-job-analyze-resource/{build_id}/logs/
    
    Results:
    - Audit log entries: {audit-count}
    - Pod log entries: {pod-count}
    - Total entries: {total-count}
    - Time range: {min_time} to {max_time}
    
    Report generated: .work/prow-job-analyze-resource/{build_id}/{resource_name}.html
    
    Open in browser to view interactive timeline and detailed entries.
    
  2. Open report in browser

    • Detect platform and automatically open the HTML report in the default browser
    • Linux: xdg-open .work/prow-job-analyze-resource/{build_id}/{resource_name}.html
    • macOS: open .work/prow-job-analyze-resource/{build_id}/{resource_name}.html
    • Windows: start .work/prow-job-analyze-resource/{build_id}/{resource_name}.html
    • On Linux (most common for this environment), use xdg-open
  3. Offer next steps

    • Ask if user wants to search for additional resources in the same job
    • Ask if user wants to analyze a different Prow job
    • Explain that artifacts are cached in .work/prow-job-analyze-resource/{build_id}/ for faster subsequent searches
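
The platform detection in item 2 might look like this sketch (the helper name open_report is hypothetical):

```python
import platform
import subprocess


def open_report(path: str) -> None:
    """Open the generated HTML report with the platform's default handler."""
    system = platform.system()
    if system == "Darwin":  # macOS
        subprocess.run(["open", path], check=False)
    elif system == "Windows":
        subprocess.run(["cmd", "/c", "start", "", path], check=False)
    else:  # Linux and other Unix-likes
        subprocess.run(["xdg-open", path], check=False)
```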

Error Handling

Handle these error scenarios gracefully:

  1. Invalid URL format

    • Error: "URL must contain 'test-platform-results/' substring"
    • Provide example of valid URL
  2. Build ID not found

    • Error: "Could not find build ID (10+ decimal digits) in URL path"
    • Explain requirement and show URL parsing
  3. gcloud not installed

    • Instruct the user to install the Google Cloud CLI before re-running the skill
  4. gcloud not authenticated

    • Usually not applicable - the test-platform-results bucket is publicly readable
    • If gcloud still reports an authentication error, detect with: gcloud auth list
    • Instruct: "Please run: gcloud auth login"
  5. No access to bucket

    • Error from gcloud storage commands
    • Explain: "You need read access to the test-platform-results GCS bucket"
    • Suggest checking project access
  6. prowjob.json not found

    • Suggest verifying URL and checking if job completed
    • Provide gcsweb URL for manual verification
  7. Not a ci-operator job

    • Error: "This is not a ci-operator job. No --target found in prowjob.json."
    • Explain: Only ci-operator jobs can be analyzed by this skill
  8. gather-extra not found

    • Warn: "gather-extra directory not found for target {target}"
    • Suggest: Job may not have completed or target name is incorrect
  9. No matches found

    • Display: "No log entries found matching the specified resources"
    • Suggest:
      • Check resource names for typos
      • Try searching without kind or namespace filters
      • Verify resources existed during this job execution
  10. Timestamp parsing failures

    • Warn about unparseable timestamps
    • Fall back to line order for sorting
    • Still include entries in report

Performance Considerations

  1. Avoid re-downloading

    • Check if .work/prow-job-analyze-resource/{build_id}/logs/ already has content
    • Ask user before re-downloading
  2. Efficient downloads

    • Use gcloud storage cp -r for recursive downloads
    • Use --no-user-output-enabled to suppress verbose output
    • Create target directories with mkdir -p before downloading to avoid gcloud errors
  3. Memory efficiency

    • The parse_all_logs.py script processes log files incrementally (line by line)
    • Don't load entire files into memory
    • Script outputs to JSON for efficient HTML generation
  4. Content length limits

    • The HTML generator trims JSON content to ~2000 chars in display
    • Full content is available in expandable details sections
  5. Progress indicators

    • Show "Downloading audit logs..." before gcloud commands
    • Show "Parsing audit logs..." before running parse script
    • Show "Generating HTML report..." before running report generator

Examples

Example 1: Search for a namespace/project

User: "Analyze e2e-test-project-api-p28m in this Prow job: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-okd-scos-4.20-e2e-aws-ovn-techpreview/1964725888612306944"

Output:
- Downloads artifacts to: .work/prow-job-analyze-resource/1964725888612306944/logs/
- Finds actual resource name: e2e-test-project-api-p28mx (namespace)
- Parses 382 audit log entries
- Finds 86 pod log mentions
- Creates: .work/prow-job-analyze-resource/1964725888612306944/e2e-test-project-api-p28mx.html
- Shows timeline from creation (18:11:02) to deletion (18:17:32)

Example 2: Search for a pod

User: "Analyze pod/etcd-0 in this Prow job: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/30393/pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn/1978913325970362368/"

Output:
- Creates: .work/prow-job-analyze-resource/1978913325970362368/etcd-0.html
- Shows timeline of all pod/etcd-0 events across namespaces

Example 3: Search by name only

User: "Find all resources named cluster-version-operator in job {url}"

Output:
- Searches without kind filter
- Finds deployments, pods, services, etc. all named cluster-version-operator
- Creates: .work/prow-job-analyze-resource/{build_id}/cluster-version-operator.html

Example 4: Search for multiple resources using regex

User: "Analyze e2e-test-project-api-pkjxf and e2e-test-project-api-7zdxx in job {url}"

Output:
- Uses regex pattern: `e2e-test-project-api-pkjxf|e2e-test-project-api-7zdxx`
- Finds all events for both namespaces in a single pass
- Parses 1,047 total entries (501 for first namespace, 546 for second)
- Passes the same pattern to generate_html_report.py
- HTML displays: "Resources: e2e-test-project-api-7zdxx, e2e-test-project-api-pkjxf"
- Creates: .work/prow-job-analyze-resource/{build_id}/e2e-test-project-api-pkjxf.html
- Timeline shows interleaved events from both namespaces chronologically

Tips

  • Always verify gcloud prerequisites before starting (gcloud CLI must be installed)
  • Authentication is NOT required - the bucket is publicly accessible
  • Use .work/prow-job-analyze-resource/{build_id}/ directory structure for organization
  • All work files are in .work/ which is already in .gitignore
  • The Python scripts handle all parsing and HTML generation - use them!
  • Cache artifacts in .work/prow-job-analyze-resource/{build_id}/ to speed up subsequent searches
  • The parse script supports regex patterns for flexible matching:
    • Use resource1|resource2 to search for multiple resources in a single pass
    • Use .* wildcards to match resource name patterns
    • Simple substring matching still works for basic searches
  • The resource name provided by the user may not exactly match the actual resource name in logs
    • Example: User asks for e2e-test-project-api-p28m but actual resource is e2e-test-project-api-p28mx
    • Use regex patterns like e2e-test-project-api-p28m.* to find partial matches
  • For namespaces/projects, search for the resource name - it will match both namespace and project resources
  • Provide helpful error messages with actionable solutions

Important Notes

  1. Resource Name Matching:

    • The parse script uses regex pattern matching for maximum flexibility
    • Supports pipe operator (|) to search for multiple resources: resource1|resource2
    • Supports wildcards (.*) for pattern matching: e2e-test-.*
    • Simple substrings still work for basic searches
    • May match multiple related resources (e.g., namespace, project, rolebindings in that namespace)
    • Report all matches - this provides complete lifecycle context
  2. Namespace vs Project:

    • In OpenShift, a project is essentially a namespace with additional metadata
    • Searching for a namespace will find both namespace and project resources
    • The audit logs contain events for both resource types
  3. Target Extraction:

    • Must extract the --target argument from prowjob.json
    • This is critical for finding the correct gather-extra path
    • Non-ci-operator jobs cannot be analyzed (they don't have --target)
  4. Working with Scripts:

    • All scripts are in plugins/prow-job/skills/prow-job-analyze-resource/
    • parse_all_logs.py - Parses audit logs and pod logs, outputs JSON
      • Detects glog severity levels (E=error, W=warn, I=info, F=fatal)
      • Supports regex patterns for resource matching
    • generate_html_report.py - Generates interactive HTML report from JSON
    • Scripts output status messages to stderr for progress display. JSON output to stdout is clean.
  5. Pod Log Glog Format Support:

    • The parser automatically detects and parses glog format logs
    • Glog format: E0910 11:43:41.153414 ...
      • E = severity (E/F → error, W → warn, I → info)
      • 0910 = month/day (MMDD)
      • 11:43:41.153414 = time with microseconds
    • Timestamp parsing: Extracts timestamp and infers year (2025)
    • Severity mapping allows filtering by level in HTML report
    • Non-glog logs default to info level
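
A sketch of glog prefix parsing along these lines; the default year follows the note above, and parse_all_logs.py may differ in detail:

```python
import re
from datetime import datetime

# Example glog line: "E0910 11:43:41.153414 ..." - severity letter, MMDD, time with microseconds.
GLOG_RE = re.compile(r"^([EWIF])(\d{2})(\d{2}) +(\d{2}:\d{2}:\d{2}\.\d+)")
SEVERITY = {"E": "error", "F": "error", "W": "warn", "I": "info"}


def parse_glog_prefix(line: str, year: int = 2025):
    """Return (level, timestamp) for a glog-format line, or ("info", None) otherwise."""
    m = GLOG_RE.match(line)
    if not m:
        return "info", None
    sev, month, day, clock = m.groups()
    ts = datetime.strptime(f"{year}-{month}-{day} {clock}", "%Y-%m-%d %H:%M:%S.%f")
    return SEVERITY[sev], ts
```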