| name | cicd-github-actions-ops |
| description | Systematic review and debugging of GitHub Actions workflows. Use when reviewing PRs, debugging failed actions, analyzing workflow efficiency, or making decisions about which actions to use. |
GitHub Actions Operations
Guide for systematic review, debugging, and optimization of GitHub Actions across repositories.
When to Use This Skill
- Reviewing open PRs that involve workflow changes
- Debugging failed GitHub Actions runs
- Auditing workflow efficiency and reasonableness
- Making decisions about action selection (reliable vs fancy, self-hosted vs third-party)
- Standardizing workflows across repositories
Review Priorities
When reviewing workflows and actions, follow these priorities in order:
Priority 1: Working (Not Just Passing)
Ensure all GitHub Actions are actually working, not just passing by luck or skipping.
Check for:
- Jobs that pass because they have no assertions
- Conditional steps that always skip (
if: falseeffectively) - Error handling that swallows failures
continue-on-error: truehiding real issues- Empty test suites that "pass"
# Check if a workflow has meaningful steps
gh run view <run-id> --log | grep -E "(Run|Error|Warning|PASS|FAIL)"
Priority 2: Reasonable Workflows
Ensure workflows trigger appropriately and don't waste resources.
Anti-patterns to fix:
| Anti-pattern | Problem | Solution |
|---|---|---|
| Fuzzing on every push | Expensive, slow | Schedule or manual trigger |
| Full rebuild for doc changes | Wasteful | Use path filters |
| No concurrency control | Redundant runs | Add concurrency: |
| Matrix without need | Slow CI | Use matrix only when testing compatibility |
Path filtering template:
on:
push:
paths:
- 'src/**'
- 'Cargo.toml'
- '.github/workflows/ci.yml'
paths-ignore:
- '**.md'
- 'docs/**'
- '.gitignore'
Concurrency template:
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
Priority 3: Passing
All GitHub Actions should pass. Debug failures systematically.
See: debugging.md
Priority 4: Reliable > Fancy
Prefer proven, reliable actions over feature-rich alternatives.
When choosing reliable over fancy:
- Use the reliable action
- Create tracking issue in
arustydev/ghafor review
gh issue create --repo arustydev/gha \
--title "[REVIEW] Evaluate <fancy-action> vs <reliable-action>" \
--body "## Context
Chose \`<reliable-action>\` over \`<fancy-action>\` for <reason>.
## Fancy Action
- **Name:** \`<owner>/<fancy-action>\`
- **Features:** <list features>
- **Concerns:** <why not chosen>
## Reliable Action
- **Name:** \`<owner>/<reliable-action>\`
- **Why chosen:** <stability, maintenance, simplicity>
## Used In
- \`<repo-name>\` - \`<workflow-file>\`
## Review Request
Evaluate if fancy action is worth adopting once:
- [ ] It has more stability/adoption
- [ ] We need its features
- [ ] It's been maintained for 6+ months"
Priority 5: Reliable > Self-Hosted (New Development)
For NEW action development, prefer third-party reliable actions over building in arustydev/gha.
When using third-party over building self-hosted:
- Use the third-party action
- Create tracking issue in
arustydev/ghafor future consideration
gh issue create --repo arustydev/gha \
--title "[CONSIDER] Build alternative to <action>" \
--body "## Context
Using third-party \`<owner>/<action>\` instead of building custom.
## Third-Party Action
- **Name:** \`<owner>/<action>@<version>\`
- **Purpose:** <what it does>
- **Why chosen:** <reliability, features, maintenance>
## Evaluated Alternatives
| Action | Pros | Cons |
|--------|------|------|
| <action1> | ... | ... |
| <action2> | ... | ... |
## Used In
- \`<repo-name>\` - \`<workflow-file>\`
## Future Consideration
Build custom version if:
- [ ] Third-party becomes unmaintained
- [ ] We need custom features not supported
- [ ] Security/audit requirements demand it"
Priority 6: Standardization
Use consistent patterns across all repositories.
Standard workflow patterns:
| Workflow | Trigger | Purpose |
|---|---|---|
ci.yml |
push, pull_request | Build, test, lint |
release.yml |
release published | Publish artifacts |
dependabot.yml |
schedule | Dependency updates |
auto-assign.yml |
issues, PRs opened | Assign to owner |
Systematic Review Workflow
Phase 0: Fork Detection
Before reviewing, check if the repository is a fork:
# Check if repo is a fork
gh repo view --json isFork,parent -q '{fork: .isFork, parent: .parent.nameWithOwner}'
If forked, identify upstream-specific patterns:
| Pattern | Detection | Common Issues |
|---|---|---|
| External deploy target | external_repository: in workflow |
Deploys to upstream's gh-pages |
| Deploy keys | secrets.DEPLOY_KEY |
Secret doesn't exist in fork |
| Hardcoded org | google/timesketch in workflow |
Wrong target org |
| Upstream branches | branches: [main] when fork uses master |
Branch mismatch |
| Upstream composite actions | uses: <upstream>/.github/actions/ |
Action path doesn't exist in fork |
| Hardcoded Docker namespace | docker.*<upstream-org>/ |
Pushes to wrong Docker Hub namespace |
| External registries | hub.infinyon.cloud or similar |
Upstream-specific package registry |
| Upstream secrets | secrets.ORG_* or secrets.DOCKER_* |
Organization secrets not available |
# Comprehensive fork detection
grep -rE "external_repository:|DEPLOY_KEY|\.github/actions/" .github/workflows/
grep -rE "secrets\.(ORG_|DOCKER_|SLACK_|AWS_)" .github/workflows/
grep -rE "https?://[a-z-]+\.[a-z]+\.(cloud|io)/" .github/workflows/ | grep -v github
Fork handling options:
- Disable - Rename to
.yml.disabled(recommended for deploy workflows) - Adapt - Modify to work with your fork
- Remove - Delete if not needed
- Keep - Leave as-is if it will work (rare)
# Disable a workflow
mv .github/workflows/deploy.yml .github/workflows/deploy.yml.disabled
# Find upstream-specific patterns
grep -r "external_repository\|DEPLOY_KEY\|google/" .github/workflows/
Phase 0.5: Complexity Assessment
Before diving into fixes, assess the scope of work:
# Count workflows and total lines
echo "=== Workflow Complexity ==="
ls -1 .github/workflows/*.yml 2>/dev/null | wc -l | xargs echo "Workflow count:"
wc -l .github/workflows/*.yml 2>/dev/null | tail -1 | awk '{print "Total lines:", $1}'
# Count action dependencies
echo "=== Action Dependencies ==="
grep -h "uses:" .github/workflows/*.yml 2>/dev/null | wc -l | xargs echo "Action references:"
grep -h "uses:" .github/workflows/*.yml 2>/dev/null | grep -oE '[^/]+/[^@]+' | sort -u | wc -l | xargs echo "Unique actions:"
# Count job dependencies (complexity indicator)
echo "=== Job Dependencies ==="
grep -c "needs:" .github/workflows/*.yml 2>/dev/null | awk -F: '{sum+=$2} END {print "Total needs: clauses:", sum}'
# Matrix sprawl check
echo "=== Matrix Size ==="
grep -A20 "matrix:" .github/workflows/*.yml 2>/dev/null | grep -E "^\s+-\s" | wc -l | xargs echo "Matrix entries:"
Complexity tiers:
| Tier | Workflows | Lines | Approach |
|---|---|---|---|
| Simple | 1-5 | <500 | Fix all in one PR |
| Medium | 6-10 | 500-1500 | Fix by priority, 1-2 PRs |
| Complex | 11+ | 1500+ | Incremental fixes, multiple PRs |
| Massive | 15+ | 3000+ | Consider disable-first strategy |
If complexity is High/Massive:
- Start with disabling non-essential workflows
- Focus on Priority 2 fixes (concurrency, path filters) first
- Address failures incrementally
- Document known limitations that won't be fixed
Phase 1: Gather Information
# List all open PRs across your repos
gh search prs --author aRustyDev --state open --limit 100
# List failed workflow runs
gh run list --repo <owner>/<repo> --status failure --limit 20
# Get workflow files for a repo
gh api repos/<owner>/<repo>/contents/.github/workflows | jq -r '.[].name'
Phase 2: Categorize Issues
For each PR/failure, categorize:
- Workflow broken - Action itself has bugs
- Workflow inefficient - Runs unnecessarily
- Test failure - Code issue, not workflow
- Permission issue - Token/access problems
- Environment issue - Runner/dependency problems
- Flaky test - Intermittent failures
Phase 3: Fix by Category
| Category | Action |
|---|---|
| Workflow broken | Fix workflow, update action versions |
| Workflow inefficient | Add path filters, concurrency |
| Test failure | Fix code, not workflow |
| Permission issue | Adjust permissions block |
| Environment issue | Pin versions, add setup steps |
| Flaky test | Add retry or fix root cause |
Phase 4: Track Decisions
For every non-trivial decision, create appropriate tracking:
- Chose reliable over fancy → Issue in
arustydev/gha - Chose third-party over self-hosted → Issue in
arustydev/gha - Found bug in action → Issue in action's repo
- Need new action → Issue in
arustydev/gha
Phase 5: Validate Before Committing
Before committing workflow changes, validate them:
# 1. Check YAML syntax and common issues
actionlint .github/workflows/*.yml
# 2. Verify action versions exist
for action in $(grep -h "uses:" .github/workflows/*.yml | grep -oE '[^/]+/[^@]+@v[0-9]+' | sort -u); do
repo=$(echo "$action" | cut -d@ -f1)
version=$(echo "$action" | cut -d@ -f2)
echo -n "$action: "
gh api "repos/$repo/git/refs/tags/$version" --silent && echo "OK" || echo "NOT FOUND"
done
# 3. Check for deprecated actions
grep -r "actions-rs/\|set-output\|save-state" .github/workflows/ && echo "WARNING: Deprecated patterns found"
Common validation failures:
| Error | Cause | Fix |
|---|---|---|
action version not found |
Invalid version (v6 doesn't exist) | Check action-selection.md for valid versions |
set-output is deprecated |
Old output syntax | Use echo "name=value" >> $GITHUB_OUTPUT |
save-state is deprecated |
Old state syntax | Use echo "name=value" >> $GITHUB_STATE |
Phase 6: Partial Fixes and Known Limitations
Not every issue can or should be fully fixed. Know when to stop.
When to accept a partial fix:
| Situation | Action |
|---|---|
| Fixing requires rewriting >50% of workflow | Disable or document limitation |
| Need to create custom actions for fork | Document as future work |
| External service dependencies can't be removed | Disable affected jobs/workflows |
| Upstream architecture tightly coupled | Accept reduced CI coverage |
Documenting known limitations:
When creating a PR with partial fixes, include a "Known Limitations" section:
### Known Limitations
The following issues remain after this fix:
| Issue | Reason | Impact |
|-------|--------|--------|
| `cli_smoke` job fails | Uses upstream's Infinyon Hub | Integration tests don't run |
| Docker builds use wrong namespace | Would require forking build scripts | Images not pushed |
These would require significant refactoring to address.
When to ask the user:
If any of these apply, use AskUserQuestion before proceeding:
- Complete fix requires >2 hours of refactoring
- Fix would change core project behavior
- Multiple equally valid approaches exist
- Fork has diverged significantly from upstream
Incremental progress strategy:
For complex repositories, prefer multiple small PRs:
PR 1: Disable non-essential workflows (quick win)
↓
PR 2: Add concurrency blocks to remaining workflows
↓
PR 3: Fix path filters and triggers
↓
PR 4: Address specific test failures
↓
(Optional) PR 5: Deep refactoring if needed
Each PR should be independently mergeable and improve the situation.
Quick Commands
View failed runs
gh run list --status failure --limit 10
Get logs for failed run
gh run view <run-id> --log-failed
Re-run failed jobs
gh run rerun <run-id> --failed
List PRs needing review
gh pr list --search "is:open draft:false review:required"
Check workflow syntax
actionlint .github/workflows/*.yml
List all workflows in org
for repo in $(gh repo list aRustyDev --limit 100 --json name -q '.[].name'); do
echo "=== $repo ==="
gh api "repos/aRustyDev/$repo/contents/.github/workflows" 2>/dev/null | jq -r '.[].name' || echo "No workflows"
done
See Also
- Reference: debugging.md - Detailed debugging guide
- Reference: action-selection.md - Action selection criteria
- Reference: issue-templates.md - Issue templates for tracking
- Reference: multi-repo.md - Multi-repository batch review