SKILL.md

name: cross-repo-coordination
description: Coordinate changes across project-beta repositories when updating runner configurations. Ensures workflow labels match runner scale set names. Use when changing runnerScaleSetName or deploying new runner pools.
allowed-tools: Bash, Read, Grep, Glob

Cross-Repository Workflow Coordination Skill

Overview

GitHub Actions workflows in the project-beta ecosystem use self-hosted runners. When runner configurations change, ALL repositories using those runners need coordinated updates.

Architecture

matchpoint-github-runners-helm
├── Defines runnerScaleSetName: "arc-beta-runners"
└── ArgoCD deploys runners with this label

project-beta-frontend
project-beta-api           } Must use: runs-on: arc-beta-runners
project-beta

Critical Rule: Workflow runs-on: MUST EXACTLY match Helm runnerScaleSetName
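
A minimal sketch of checking this rule locally. It assumes the Helm values file contains a line of the form runnerScaleSetName: "..." and that workflows use the scalar runs-on: <label> form. The function names are hypothetical and not part of any repository mentioned here.

```bash
# Sketch: compare a Helm values file against a workflows directory.
# Assumption: one `runnerScaleSetName:` line, scalar `runs-on: <label>` entries.

# Print the runnerScaleSetName value from a values file, quotes stripped.
scale_set_name() {
  sed -n 's/^[[:space:]]*runnerScaleSetName:[[:space:]]*//p' "$1" \
    | head -n 1 | tr -d "\"'" | sed 's/[[:space:]]*$//'
}

# Print every runs-on line that is neither ubuntu-latest nor the expected
# label (a trailing "# comment" is tolerated).
# Exit status 0 means no mismatch was found.
mismatched_runs_on() {
  local values_file=$1 workflows_dir=$2 expected
  expected=$(scale_set_name "$values_file")
  ! grep -rn 'runs-on:' "$workflows_dir" \
    | grep -v 'ubuntu-latest' \
    | grep -Ev "runs-on:[[:space:]]*${expected}[[:space:]]*(#.*)?\$"
}
```

Example call (paths are placeholders): mismatched_runs_on path/to/values.yaml /path/to/project-beta-api/.github/workflows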

The Coordination Problem

Issue #121 Example

Change: Update runnerScaleSetName from arc-runners to arc-beta-runners

Impact:

matchpoint-github-runners-helm
  ✅ runnerScaleSetName: "arc-beta-runners"

project-beta-frontend (15 workflows)
  ❌ runs-on: arc-runners  # OLD label - jobs stuck!

project-beta-api (13 workflows)
  ❌ runs-on: arc-runners  # OLD label - jobs stuck!

project-beta (3 workflows)
  ❌ runs-on: arc-runners  # OLD label - jobs stuck!

Result: All CI jobs stay stuck in the "queued" state until the workflows are updated.

Affected Repositories

| Repository | Workflows | Runner Labels | Priority |
| --- | --- | --- | --- |
| project-beta-frontend | 15 files | arc-beta-runners | P0 - Blocks deploys |
| project-beta-api | 13 files | arc-beta-runners | P0 - Blocks deploys |
| project-beta | 3 files | arc-beta-runners | P0 - Blocks infra |

Coordination Workflow

Phase 1: Planning

Before changing runnerScaleSetName, audit all repositories:

# Search for current runner label usage
for repo in project-beta-frontend project-beta-api project-beta; do
  echo "=== $repo ==="
  cd "/path/to/$repo" || continue   # skip repos that are not checked out
  grep -r "runs-on:" .github/workflows/ | grep -v "ubuntu-latest" | sort -u
done

Output example:

=== project-beta-frontend ===
.github/workflows/ci.yaml:    runs-on: arc-runners
.github/workflows/deploy.yaml:    runs-on: arc-runners
...

=== project-beta-api ===
.github/workflows/test.yaml:    runs-on: arc-runners
...

Document the changes needed:

  • Count of files per repository
  • Specific workflow files affected
  • Any workflows using different labels
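
The three items above could be collected with a small helper such as the following. This is a sketch only: the function names are hypothetical, and it assumes scalar runs-on: <label> entries (array and expression forms are counted as-is, not parsed).

```bash
# Sketch: summarize runner label usage for one workflows directory.

# Print "<count> <label>" for each distinct runs-on value, most used first.
label_usage() {
  local workflows_dir=$1
  { grep -rh 'runs-on:' "$workflows_dir" 2>/dev/null || true; } \
    | sed -e 's/^[[:space:]]*runs-on:[[:space:]]*//' \
          -e 's/[[:space:]]*#.*$//' \
          -e 's/[[:space:]]*$//' \
    | sort | uniq -c | sort -rn \
    | sed 's/^[[:space:]]*//'
}

# Print the number of workflow files that reference exactly this label.
files_using_label() {
  local workflows_dir=$1 label=$2
  { grep -rlE "runs-on:[[:space:]]*${label}([[:space:]]|\$)" "$workflows_dir" 2>/dev/null || true; } \
    | wc -l | tr -d '[:space:]'
}
```

Running label_usage on each repository's .github/workflows directory also surfaces any workflows on unexpected labels, since every distinct value is listed.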

Phase 2: Create Migration Plan

Option A: Dual Runner Pools (Zero Downtime)

Deploy BOTH old and new runner pools during transition:

# matchpoint-github-runners-helm/argocd/applicationset-runners.yaml
generators:
- list:
    elements:
    - name: arc-runners           # OLD - for existing workflows
      valuesFile: examples/runners-values-old.yaml
    - name: arc-beta-runners      # NEW - for updated workflows
      valuesFile: examples/runners-values-new.yaml

Timeline:

  1. Deploy both runner pools
  2. Update workflows in all repos (can be done gradually)
  3. Remove old runner pool after all workflows migrated

Pros:

  • Zero downtime
  • Safe rollback (revert workflow changes)
  • Can update repos independently

Cons:

  • 2x runner costs during migration
  • Need to track which repos migrated
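
Tracking which repos have migrated can be reduced to a per-repo status check. The sketch below is an illustration under the assumption that workflows use the scalar runs-on: <label> form; migration_status is a hypothetical helper name.

```bash
# Sketch: classify one workflows directory during a dual-pool migration.
# Prints one of: migrated, pending, mixed, unused.
migration_status() {
  local workflows_dir=$1 old_label=$2 new_label=$3
  local old_count new_count
  old_count=$({ grep -rlE "runs-on:[[:space:]]*${old_label}([[:space:]]|\$)" "$workflows_dir" 2>/dev/null || true; } | wc -l)
  new_count=$({ grep -rlE "runs-on:[[:space:]]*${new_label}([[:space:]]|\$)" "$workflows_dir" 2>/dev/null || true; } | wc -l)

  if [ "$old_count" -eq 0 ] && [ "$new_count" -gt 0 ]; then
    echo "migrated"   # only the new label is referenced
  elif [ "$old_count" -gt 0 ] && [ "$new_count" -eq 0 ]; then
    echo "pending"    # only the old label is referenced
  elif [ "$old_count" -gt 0 ] && [ "$new_count" -gt 0 ]; then
    echo "mixed"      # partially migrated
  else
    echo "unused"     # neither label is referenced
  fi
}
```

The old pool is safe to remove only once every repository reports migrated (or unused).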

Option B: Coordinated Single Cutover

Update runner AND all workflows simultaneously:

  1. Prepare PRs in ALL repositories (don't merge)
  2. Merge runner config change
  3. Wait for ArgoCD sync (~3 min)
  4. Merge ALL workflow PRs quickly
  5. Monitor for stuck jobs

Pros:

  • No extra runner costs
  • Clean cutover

Cons:

  • ~3-5 minute CI outage
  • Requires coordination across repos
  • Risky if issues arise
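
For the "wait for ArgoCD sync" step of the cutover, a generic polling helper avoids merging workflow PRs on a fixed timer. This is a sketch; wait_until is a hypothetical name, and the check command passed to it is left to the reader (for example, a wrapper around the runner status query shown in Phase 4).

```bash
# Sketch: run a check command repeatedly until it succeeds or time runs out.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if the timeout elapses first.
wait_until() {
  local timeout=$1 interval=$2 deadline
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
}
```

With a 5-minute timeout and a 15-second interval, a call would look like: wait_until 300 15 <your-check-command>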

Recommended: Option A for production, Option B for dev/test

Phase 3: Update Workflows

For each repository, create a PR that updates ALL workflow files:

#!/bin/bash
# Script: update-runner-labels.sh
# Usage: ./update-runner-labels.sh <repo-name>

set -euo pipefail

OLD_LABEL="arc-runners"
NEW_LABEL="arc-beta-runners"
REPO=$1

cd "/path/to/$REPO"

# Update every workflow file (.yml and .yaml) that references the old label.
# Note: "sed -i" without a backup suffix is GNU sed syntax.
find .github/workflows -type f \( -name "*.yml" -o -name "*.yaml" \) -print0 |
  while IFS= read -r -d '' workflow; do
    if grep -q "runs-on: $OLD_LABEL" "$workflow"; then
      echo "Updating: $workflow"
      sed -i "s/runs-on: $OLD_LABEL/runs-on: $NEW_LABEL/g" "$workflow"
    fi
  done

# Create PR
git checkout -b fix/update-runner-label-to-$NEW_LABEL
git add .github/workflows/
git commit -m "ci: Update runner label from $OLD_LABEL to $NEW_LABEL

Aligns with runner configuration change in matchpoint-github-runners-helm.

Refs: matchpoint-ai/matchpoint-github-runners-helm#121"

git push -u origin fix/update-runner-label-to-$NEW_LABEL

gh pr create \
  --title "ci: Update runner label from $OLD_LABEL to $NEW_LABEL" \
  --body "Updates all workflows to use the new runner label \`$NEW_LABEL\`.

## Context
matchpoint-github-runners-helm changed \`runnerScaleSetName\` to \`$NEW_LABEL\`.

## Changes
- Updates all \`.github/workflows/*.yaml\` files
- Changes \`runs-on: $OLD_LABEL\` → \`runs-on: $NEW_LABEL\`

## Testing
- [ ] Verify workflows use correct runner label
- [ ] Confirm CI jobs execute (not stuck in queue)

Related: matchpoint-ai/matchpoint-github-runners-helm#121"

Usage:

./update-runner-labels.sh project-beta-frontend
./update-runner-labels.sh project-beta-api
./update-runner-labels.sh project-beta
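
Before running the update script against a repository, the substitution can be previewed without touching any file. The helper below is a sketch with a hypothetical name; it applies the same runs-on: OLD to runs-on: NEW replacement but only prints the resulting lines.

```bash
# Sketch: show the lines a label replacement would produce, changing nothing.
# Output format: "<file>: <line after substitution>"
preview_label_change() {
  local workflows_dir=$1 old_label=$2 new_label=$3 file line
  find "$workflows_dir" -type f \( -name '*.yml' -o -name '*.yaml' \) | sort |
  while IFS= read -r file; do
    # "sed -n" plus the "p" flag prints only lines where a substitution happened.
    sed -n "s/runs-on: ${old_label}/runs-on: ${new_label}/gp" "$file" |
    while IFS= read -r line; do
      printf '%s: %s\n' "$file" "$line"
    done
  done
}
```

An empty result means the repository has nothing to update for that label.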

Phase 4: Verification

After merging workflow updates:

# Check that runners are picking up jobs
gh run list --repo Matchpoint-AI/project-beta-frontend --limit 5

# Verify no jobs stuck in queue
gh run list --repo Matchpoint-AI/project-beta-frontend --status queued

# Check runner status
gh api /orgs/Matchpoint-AI/actions/runners --jq '.runners[] | {name, status, busy, labels: [.labels[].name]}'

Success criteria:

  • ✅ No jobs stuck in "queued" for > 2 minutes
  • ✅ Jobs transition to "in_progress" quickly
  • ✅ Runners show "busy: true" when jobs running
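
The first criterion is a simple age comparison. A sketch, assuming timestamps have already been converted to Unix epoch seconds (is_stuck is a hypothetical helper; the 120-second default mirrors the 2-minute criterion above):

```bash
# Sketch: decide whether a queued run counts as stuck.
# Arguments: queued-at and current time as epoch seconds, optional threshold.
# Returns 0 (stuck) when the run has been queued longer than the threshold.
is_stuck() {
  local queued_at=$1 now=$2 threshold=${3:-120}
  [ $(( now - queued_at )) -gt "$threshold" ]
}
```

With GNU date, a createdAt value from gh can be converted using date -u -d "$created_at" +%s before calling is_stuck.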

Common Scenarios

Scenario 1: Adding New Runner Pool

Example: Add dedicated runners for frontend with GPU support

Steps:

  1. Add runner pool in matchpoint-github-runners-helm:

    # argocd/applicationset-runners.yaml
    - name: arc-frontend-gpu
      valuesFile: examples/frontend-gpu-values.yaml
    
  2. Update ONLY affected workflows in project-beta-frontend:

    # .github/workflows/e2e-visual-tests.yaml
    jobs:
      visual-tests:
        runs-on: arc-frontend-gpu  # NEW pool
    
  3. Keep other workflows on existing pool:

    # .github/workflows/ci.yaml
    jobs:
      test:
        runs-on: arc-beta-runners  # Existing pool
    

Impact: Only workflows explicitly updated use new pool

Scenario 2: Removing Runner Pool

Example: Deprecate arc-runners in favor of arc-beta-runners

Steps:

  1. Ensure NO workflows reference old label:

    for repo in project-beta-frontend project-beta-api project-beta; do
      cd /path/to/$repo
      grep -r "runs-on: arc-runners" .github/workflows/ && echo "❌ Found old label in $repo"
    done
    
  2. Remove runner pool from matchpoint-github-runners-helm:

    # argocd/applicationset-runners.yaml
    # Remove the arc-runners entry
    
  3. Verify no queued jobs after removal:

    gh run list --status queued --limit 20
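
The loop in step 1 only prints a warning. To block the removal when the old label is still referenced, the check can be turned into a gate with a meaningful exit status. A sketch, with assert_label_unused as a hypothetical helper name and repository paths supplied by the caller:

```bash
# Sketch: succeed only when no repository still references the old label.
# Arguments: old label, then one or more repository checkout paths.
assert_label_unused() {
  local old_label=$1 repo found=0
  shift
  for repo in "$@"; do
    # Match the exact label, so "arc-runners" does not match "arc-runners-gpu".
    if grep -rnE "runs-on:[[:space:]]*${old_label}([[:space:]]|\$)" \
         "$repo/.github/workflows" 2>/dev/null; then
      echo "Found old label in $repo" >&2
      found=1
    fi
  done
  [ "$found" -eq 0 ]
}
```

Example: assert_label_unused arc-runners /path/to/project-beta-frontend /path/to/project-beta-api /path/to/project-beta && echo "Safe to remove pool"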
    

Scenario 3: Emergency Runner Failover

Example: Primary runner pool down, need to switch to backup

Steps:

  1. Deploy backup runner pool (if not already deployed):

    # Quick deploy via ArgoCD
    kubectl apply -f argocd/applications/arc-backup-runners.yaml
    
  2. Bulk update workflows in critical repo:

    # Emergency script
    find .github/workflows \( -name "*.yml" -o -name "*.yaml" \) -exec sed -i 's/runs-on: arc-beta-runners/runs-on: arc-backup-runners/g' {} \;
    git add .github/workflows/
    git commit -m "EMERGENCY: Switch to backup runners"
    git push
    
  3. Monitor job execution:

    watch -n 5 'gh run list --limit 10'
    

Validation Scripts

Pre-Merge Validation

Run before merging runner configuration changes:

#!/bin/bash
# scripts/validate-runner-labels.sh
# Usage: ./scripts/validate-runner-labels.sh <runner-label>

set -euo pipefail

RUNNER_LABEL=${1:?usage: validate-runner-labels.sh <runner-label>}
REPOS=("project-beta-frontend" "project-beta-api" "project-beta")

echo "🔍 Checking if workflows use runner label: $RUNNER_LABEL"

for repo in "${REPOS[@]}"; do
  echo ""
  echo "=== $repo ==="

  if [ ! -d "../$repo" ]; then
    echo "⚠️  Repository not found: ../$repo"
    continue
  fi

  # grep exits 1 when nothing matches; "|| true" keeps set -e/pipefail
  # from aborting the script before the "no workflows" branch can run.
  HITS=$(grep -r "runs-on: $RUNNER_LABEL" "../$repo/.github/workflows/" 2>/dev/null || true)

  if [ -n "$HITS" ]; then
    MATCHES=$(wc -l <<<"$HITS")
    echo "✅ Found $MATCHES workflow jobs using $RUNNER_LABEL"
    head -5 <<<"$HITS"
  else
    echo "❌ No workflows use $RUNNER_LABEL"
  fi
done

Usage:

cd matchpoint-github-runners-helm
./scripts/validate-runner-labels.sh arc-beta-runners
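
A plain grep for runs-on: LABEL misses quoted labels and also matches longer labels that share the prefix. One way to tighten the check is to normalize each line first. The sketch below uses hypothetical helper names and handles only the scalar form; array forms such as runs-on: [a, b] and expressions are deliberately skipped.

```bash
# Sketch: normalize scalar runs-on lines read from stdin.
# Strips indentation, trailing comments, and surrounding quotes,
# then prints just the label. Non-scalar forms produce no output.
normalize_runs_on() {
  sed -n \
    -e 's/[[:space:]]*#.*$//' \
    -e "s/^[[:space:]]*runs-on:[[:space:]]*[\"']\{0,1\}\([^\"'[:space:]]*\)[\"']\{0,1\}[[:space:]]*\$/\1/p"
}

# Print how many runs-on entries under a directory equal the label exactly.
count_exact_label() {
  local workflows_dir=$1 label=$2
  { grep -rh 'runs-on:' "$workflows_dir" 2>/dev/null || true; } \
    | normalize_runs_on \
    | { grep -cxF -- "$label" || true; }
}
```

Example: count_exact_label ../project-beta-api/.github/workflows arc-beta-runners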

Post-Merge Validation

Run after merging workflow updates:

#!/bin/bash
# scripts/verify-ci-not-stuck.sh

set -euo pipefail

REPOS=("Matchpoint-AI/project-beta-frontend" "Matchpoint-AI/project-beta-api" "Matchpoint-AI/project-beta")

echo "🔍 Checking for stuck CI jobs..."

for repo in "${REPOS[@]}"; do
  echo ""
  echo "=== $repo ==="

  # Keep the raw JSON so it can be filtered twice; piping the formatted
  # text back into jq would fail because it is no longer valid JSON.
  QUEUED_JSON=$(gh run list --repo "$repo" --status queued --limit 50 --json databaseId,createdAt,status)
  QUEUED=$(jq -r '.[] | select(.status == "queued") | "\(.databaseId) - queued since \(.createdAt)"' <<<"$QUEUED_JSON")

  if [ -z "$QUEUED" ]; then
    echo "✅ No queued jobs"
  else
    echo "⚠️  Found queued jobs:"
    echo "$QUEUED"

    # Check if any run has been queued for more than 5 minutes (300 s)
    STUCK=$(jq -r '.[] | select(now - (.createdAt | fromdateiso8601) > 300) | .databaseId' <<<"$QUEUED_JSON")
    if [ -n "$STUCK" ]; then
      echo "❌ Jobs stuck for > 5 minutes!"
    fi
  fi
done

Usage:

./scripts/verify-ci-not-stuck.sh

Troubleshooting

Error: Jobs Stuck After Runner Change

Symptom: CI jobs stuck in "queued" after runner label change

Diagnosis:

# Check what label runners have
kubectl get autoscalingrunnerset -A -o jsonpath='{.items[*].spec.runnerScaleSetName}'

# Check what label workflows use
for repo in project-beta-frontend project-beta-api project-beta; do
  echo "=== $repo ==="
  # Subshell keeps the working directory unchanged between iterations
  (cd "../$repo" && grep -h "runs-on:" .github/workflows/* | sort -u)
done

Fix:

# If mismatch found, update workflows
# (replace OLD_LABEL and NEW_LABEL with the actual label names)
cd ../project-beta-frontend
find .github/workflows \( -name "*.yml" -o -name "*.yaml" \) -exec sed -i 's/runs-on: OLD_LABEL/runs-on: NEW_LABEL/g' {} \;
git commit -am "fix: Update runner label to match deployed runners"
git push
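
The comparison in the diagnosis step can be automated as a set difference between the labels workflows request and the labels runners provide. A sketch, assuming both lists have been saved to files with one label per line; labels_without_runner is a hypothetical helper name.

```bash
# Sketch: print labels requested by workflows that no runner provides.
# Arguments: file of workflow labels, file of runner labels (one per line).
labels_without_runner() {
  local workflow_labels=$1 runner_labels=$2 a b
  a=$(mktemp)
  b=$(mktemp)
  # comm requires sorted input; -u also removes duplicates.
  sort -u "$workflow_labels" > "$a"
  sort -u "$runner_labels" > "$b"
  # -23 suppresses lines unique to file 2 and lines common to both,
  # leaving only labels that appear in the workflow list alone.
  comm -23 "$a" "$b"
  rm -f "$a" "$b"
}
```

The kubectl jsonpath output above is space-separated on a single line, so piping it through tr ' ' '\n' yields the one-label-per-line format this helper expects. Any label it prints (other than GitHub-hosted ones such as ubuntu-latest) will leave jobs queued.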

Error: Some Repos Updated, Others Not

Symptom: CI works in some repos but not others

Diagnosis:

# Check each repo's workflows
for repo in project-beta-frontend project-beta-api project-beta; do
  echo "=== $repo ==="
  # Subshell avoids "cd -", which would print the previous directory into the output
  (cd "../$repo" && grep -h "runs-on:" .github/workflows/* | sort -u)
done

Fix: Update the remaining repos with update-runner-labels.sh (see Phase 3).

Error: Runners Deployed But Not Registering

Symptom: Runners deployed but GitHub doesn't show them

Diagnosis:

# Check GitHub runners
gh api /orgs/Matchpoint-AI/actions/runners --jq '.runners[] | {name, labels: [.labels[].name]}'

# Check Kubernetes runners
kubectl get pods -n arc-beta-runners -l app.kubernetes.io/component=runner

Fix: See arc-runner-troubleshooting

Best Practices

  1. Plan multi-repo changes in advance - Don't surprise developers with stuck CI
  2. Use dual runner pools during migration - Eliminates downtime
  3. Communicate changes - Post in team chat before merging
  4. Verify in dev first - Test runner changes in development repo
  5. Monitor after deployment - Watch for queued jobs for 30 minutes post-change
  6. Document runner labels - Keep README updated with current label names
  7. Automate validation - Run validation scripts in CI for runner config changes

Coordination Checklist

Before changing runnerScaleSetName:

  • Audit all repos for workflow label usage
  • Document count of files per repo needing updates
  • Choose migration strategy (dual pool vs cutover)
  • Prepare PRs for all affected repos
  • Communicate change timeline to team
  • Deploy runner config change
  • Wait for ArgoCD sync (verify runners online)
  • Merge workflow PRs
  • Verify CI jobs execute successfully
  • Monitor for stuck jobs (30 minutes)
  • Clean up old runner pool (if dual pool strategy)

Related Issues

  • #121 - releaseName/runnerScaleSetName mismatch causing empty labels
  • #123 - Cross-repo label update coordination
  • #112 - CI jobs stuck investigation
  • project-beta-api#798 - Workflow label update
  • project-beta-frontend#886 - CI blocked by label mismatch
