| name | devops-engineering |
| description | Domain specialist for infrastructure, CI/CD, containers, observability, and DevOps operations. Expertise includes CI/CD pipelines, containerization (Docker, Kubernetes), infrastructure as code (Terraform, Ansible), monitoring and observability, container security, release strategies (blue-green, canary), and infrastructure reliability patterns. Use when: infrastructure questions, CI/CD setup, Docker configuration, Kubernetes orchestration, monitoring/alerting, deployment strategies, infrastructure automation. Triggers: "CI/CD", "Docker", "Kubernetes", "K8s", "deployment", "pipeline", "monitoring", "observability", "Terraform", "Ansible", "infrastructure", "container", "containerization", "release", "blue-green", "canary", "rolling", "logging", "metrics", "tracing", "alerting". |
DEVOPS_ENGINEERING
DOMAIN EXPERTISE
- Common Attacks: Container escapes, supply chain attacks, credential exposure, privilege escalation, insecure configurations, poisoned images
- Common Issues: Poor monitoring, lack of alerts, inconsistent environments, manual deployments, no rollback capability, resource exhaustion
- Common Mistakes: Running containers as root, hard-coded secrets, no resource limits, missing health checks, manual production changes, fragile pipelines, untested rollbacks
- Related Patterns: Infrastructure as Code, Immutable Infrastructure, Blue-Green Deployment, Canary Deployment, Cattle vs Pets, 12-Factor App
- Problematic Patterns: SSH into production, manual configuration management, snowflake servers, monolithic deployments, lack of idempotency
- Container Security: Container escapes, image scanning, least privilege, secure base images, secrets management, network policies
- Observability: Metrics (counters, gauges, histograms), logging (structured logs), tracing (distributed tracing), alerting (SLO/SLI based)
- Infrastructure Reliability: SLO (Service Level Objectives), SLI (Service Level Indicators), error budgets, blameless post-mortems, gradual rollouts
MODE DETECTION
- WRITE Mode: Keywords: ["create", "generate", "write", "build", "implement", "add", "new", "deploy", "setup", "configure"]
- REVIEW Mode: Keywords: ["review", "analyze", "audit", "check", "find issues", "security audit", "infrastructure review", "deployment issues", "monitoring problems"]
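The keyword lists above could be applied roughly as follows. This is a minimal sketch, not the skill's actual implementation: the function names, the `"UNKNOWN"` fallback, and the choice to let REVIEW win when both lists match are assumptions, since the source does not specify a tie-break.

```python
import re

# Keyword lists copied from the MODE DETECTION section.
WRITE_KEYWORDS = ["create", "generate", "write", "build", "implement",
                  "add", "new", "deploy", "setup", "configure"]
REVIEW_KEYWORDS = ["review", "analyze", "audit", "check", "find issues",
                   "security audit", "infrastructure review",
                   "deployment issues", "monitoring problems"]


def _mentions(request: str, keyword: str) -> bool:
    """Case-insensitive whole-word (or whole-phrase) match."""
    return re.search(rf"\b{re.escape(keyword)}\b", request, re.IGNORECASE) is not None


def detect_mode(request: str) -> str:
    """Return "REVIEW", "WRITE", or "UNKNOWN" (hypothetical fallback value)."""
    # Assumption: REVIEW takes precedence when both keyword lists match.
    if any(_mentions(request, k) for k in REVIEW_KEYWORDS):
        return "REVIEW"
    if any(_mentions(request, k) for k in WRITE_KEYWORDS):
        return "WRITE"
    return "UNKNOWN"
```

Whole-word matching keeps a short keyword such as "add" from firing on unrelated words like "address".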
LOADING STRATEGY
Write Mode (Progressive)
Load patterns based on infrastructure requirements:
- CI/CD questions -> Load @infrastructure/CI-CD.md
- Container questions -> Load @infrastructure/CONTAINERIZATION.md
- Infrastructure as code -> Load @infrastructure/IAC.md
- Monitoring/observability -> Load @infrastructure/OBSERVABILITY.md
- Container security -> Load @security/CONTAINER-SECURITY.md
- Deployment questions -> Load @release/RELEASE-STRATEGIES.md
Review Mode (Exhaustive)
Load comprehensive checklists:
- IF infrastructure review requested -> Load all infrastructure patterns
- IF security review requested -> Load @security/CONTAINER-SECURITY.md + security patterns
- IF deployment issues -> Load @release/RELEASE-STRATEGIES.md + infrastructure patterns
Progressive Loading (Write Mode)
- IF request mentions "CI", "CD", "pipeline", "build", "deploy" -> READ FILE: @infrastructure/CI-CD.md
- IF request mentions "Docker", "container", "image" -> READ FILE: @infrastructure/CONTAINERIZATION.md
- IF request mentions "Terraform", "Ansible", "IaC", "infrastructure code" -> READ FILE: @infrastructure/IAC.md
- IF request mentions "monitoring", "logging", "metrics", "tracing", "alerting", "observability" -> READ FILE: @infrastructure/OBSERVABILITY.md
- IF request mentions "container security", "image scanning", "container escapes" -> READ FILE: @security/CONTAINER-SECURITY.md
- IF request mentions "deployment", "release", "blue-green", "canary" -> READ FILE: @release/RELEASE-STRATEGIES.md
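The progressive-loading rules above amount to a trigger-to-file lookup. The sketch below shows one way to express them; the names `PROGRESSIVE_RULES` and `files_to_load` are hypothetical, and whole-word matching is an assumption made so that a short trigger such as "CI" does not match inside an unrelated word.

```python
import re

# Trigger keywords -> pattern file, copied from the rules above.
PROGRESSIVE_RULES = [
    (("CI", "CD", "pipeline", "build", "deploy"),
     "@infrastructure/CI-CD.md"),
    (("Docker", "container", "image"),
     "@infrastructure/CONTAINERIZATION.md"),
    (("Terraform", "Ansible", "IaC", "infrastructure code"),
     "@infrastructure/IAC.md"),
    (("monitoring", "logging", "metrics", "tracing", "alerting", "observability"),
     "@infrastructure/OBSERVABILITY.md"),
    (("container security", "image scanning", "container escapes"),
     "@security/CONTAINER-SECURITY.md"),
    (("deployment", "release", "blue-green", "canary"),
     "@release/RELEASE-STRATEGIES.md"),
]


def _mentions(text: str, keyword: str) -> bool:
    """Case-insensitive whole-word (or whole-phrase) match."""
    return re.search(rf"\b{re.escape(keyword)}\b", text, re.IGNORECASE) is not None


def files_to_load(request: str) -> list[str]:
    """Return every pattern file whose triggers appear in the request, in rule order."""
    return [path for triggers, path in PROGRESSIVE_RULES
            if any(_mentions(request, t) for t in triggers)]
```

For example, `files_to_load("Set up a CI/CD pipeline for our Docker image")` selects the CI/CD and containerization files and nothing else.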
Comprehensive Loading (Review Mode)
- IF request mentions "review", "audit", "security" -> READ FILES: @security/CONTAINER-SECURITY.md, @infrastructure/CONTAINERIZATION.md, @infrastructure/CI-CD.md
- IF request mentions "infrastructure issues" -> READ FILES: @infrastructure/CI-CD.md, @infrastructure/CONTAINERIZATION.md, @infrastructure/IAC.md, @infrastructure/OBSERVABILITY.md
- IF request mentions "deployment issues" -> READ FILES: @release/RELEASE-STRATEGIES.md, @infrastructure/CI-CD.md
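Because several review rules can fire at once and their file lists overlap, a loader would probably want to merge them without reading the same file twice. A rough sketch, assuming plain substring matching and first-seen ordering (neither is stated in the source; `REVIEW_RULES` and `review_files` are hypothetical names):

```python
# Review-mode triggers -> files, copied from the rules above.
REVIEW_RULES = [
    (("review", "audit", "security"),
     ["@security/CONTAINER-SECURITY.md",
      "@infrastructure/CONTAINERIZATION.md",
      "@infrastructure/CI-CD.md"]),
    (("infrastructure issues",),
     ["@infrastructure/CI-CD.md",
      "@infrastructure/CONTAINERIZATION.md",
      "@infrastructure/IAC.md",
      "@infrastructure/OBSERVABILITY.md"]),
    (("deployment issues",),
     ["@release/RELEASE-STRATEGIES.md",
      "@infrastructure/CI-CD.md"]),
]


def review_files(request: str) -> list[str]:
    """Union of all matching rules' files, de-duplicated, in first-seen order."""
    text = request.lower()
    selected: list[str] = []
    for triggers, files in REVIEW_RULES:
        if any(trigger in text for trigger in triggers):
            for path in files:
                if path not in selected:
                    selected.append(path)
    return selected
```

A request such as "Audit our deployment issues" matches the first and third rules, so the CI/CD file is listed once even though both rules name it.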
CONTEXT DETECTION
Platform Detection
Container Platforms
- Docker: Dockerfile, docker-compose.yml, docker-compose.yaml, .dockerignore
- Docker Compose: docker-compose.yml, docker-compose.yaml, compose.yml, compose.yaml, docker-compose.override.yml
- Kubernetes (K8s): deployment.yaml, service.yaml, ingress.yaml, configmap.yaml, secret.yaml, statefulset.yaml, daemonset.yaml, namespace.yaml, persistentvolume.yaml, persistentvolumeclaim.yaml, storageclass.yaml, .k8s/ directory
- Kustomize: kustomization.yaml, kustomization.yml, overlays/ directory
- Helm: Chart.yaml, values.yaml, templates/ directory, .helmignore, requirements.yaml, helmfile.yaml
- Docker Swarm: docker-stack.yml, docker-stack.yaml, docker-compose.yml with deploy keys
Cloud Platforms
- AWS CloudFormation: template.yaml, template.json, AWS::CloudFormation::Stack
- AWS SAM: template.yaml with AWSTemplateFormatVersion: '2010-09-09' and Transform: AWS::Serverless-2016-10-31, sam.yaml
- AWS CDK: lib/ with stack definitions, cdk.json, package.json with aws-cdk
- Terraform: *.tf, main.tf, variables.tf, outputs.tf, terraform.tfvars, .terraform.lock.hcl, .terraform/ directory, modules/ directory
- Terraform Cloud/Enterprise: terraform-cloud.tf, workspace files
- Pulumi: Pulumi.yaml, Pulumi.<stack>.yaml, index.ts, main.py, main.go
- Google Cloud Deployment Manager: *.yaml, *.jinja, config.yaml
- Azure Resource Manager (ARM): template.json, azuredeploy.json, azuredeploy.parameters.json
- Azure Bicep: main.bicep, .bicep files, azure.bicepparam
Configuration Management
- Ansible: *.yml, *.yaml, playbooks/, roles/, inventory/, ansible.cfg, group_vars/, host_vars/
- Chef: recipes/, attributes/, resources/, Cheffile, Berksfile, Policyfile.rb
- Puppet: manifests/*.pp, modules/, templates/, hiera.yaml
- SaltStack: *.sls, pillar/, reactor/, salt/
- CFEngine: promises.cf, inputs/
CI/CD Platforms
- GitHub Actions: .github/workflows/*.yml, .github/workflows/*.yaml, .github/actions/
- GitLab CI: .gitlab-ci.yml, .gitlab-ci.yaml
- Jenkins: Jenkinsfile, Jenkinsfile.* (declarative)
- CircleCI: .circleci/config.yml, .circleci/config.yaml
- Travis CI: .travis.yml, .travis.yaml
- Bitbucket Pipelines: bitbucket-pipelines.yml
- Azure Pipelines: azure-pipelines.yml, azure-pipelines.yaml
- Azure DevOps: azure-pipelines-*.yml, build.yml, release.yml
- GitLab Runner: config.toml
- TeamCity: teamcity-settings.kts, .teamcity/
Build Systems & Tools
- Make: Makefile, makefile, GNUmakefile
- CMake: CMakeLists.txt, cmake/ directory
- Gradle: build.gradle, build.gradle.kts, settings.gradle, settings.gradle.kts, gradle.properties, .gradle/ directory
- Maven: pom.xml, mvnw, mvnw.cmd
- Bazel: BUILD, WORKSPACE, BUILD.bazel, MODULE.bazel
- Buck: BUCK, TARGETS
- Nix: default.nix, shell.nix, flake.nix
- Nixpkgs: pkgs/ directory, .nix/
- Webpack: webpack.config.js, webpack.config.ts, webpackfile.js
- Vite: vite.config.js, vite.config.ts
- Rollup: rollup.config.js, rollup.config.ts
- esbuild: esbuild.config.js, esbuild.config.mjs
- Parcel: .parcelrc
- Turborepo: turbo.json
Monitoring & Observability
- Prometheus: prometheus.yml, prometheus.yaml, alerts.yml, rules/*.yml
- Grafana: grafana.ini, provisioning/, dashboards/, datasources/
- Elastic Stack (ELK): elasticsearch.yml, logstash.conf, kibana.yml, filebeat.yml
- Jaeger: jaeger-config.json
- Zipkin: zipkin-config.json
- OpenTelemetry: otel-collector-config.yaml
- Thanos: thanos.yaml, query.yaml, store.yaml
- VictoriaMetrics: victoriametrics.yaml
- Datadog: datadog.yaml, datadog.json
- New Relic: newrelic.yml
Logging & Tracing
- Fluentd: fluent.conf, fluentd.conf
- Fluent Bit: fluent-bit.conf, parsers.conf
- Logstash: logstash.conf, pipelines.yml
- Loki: loki-config.yaml
- Syslog-ng: syslog-ng.conf
- Rsyslog: rsyslog.conf
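The indicator lists in this section can be read as filename patterns to test against a repository's file listing. The sketch below covers only a small subset of the indicators and matches on the final path component; the `INDICATORS` table and `detect_platforms` function are illustrative names, not part of the skill.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A small subset of the indicators listed above (file names only, no directories).
INDICATORS = {
    "Docker": ["Dockerfile", ".dockerignore"],
    "Docker Compose": ["docker-compose.yml", "docker-compose.yaml",
                       "compose.yml", "compose.yaml"],
    "Helm": ["Chart.yaml", ".helmignore"],
    "Kustomize": ["kustomization.yaml", "kustomization.yml"],
    "Terraform": ["*.tf", ".terraform.lock.hcl"],
    "GitLab CI": [".gitlab-ci.yml", ".gitlab-ci.yaml"],
    "Prometheus": ["prometheus.yml", "prometheus.yaml"],
}


def detect_platforms(paths: list[str]) -> list[str]:
    """Return the sorted platforms whose indicator patterns match any file name."""
    names = {PurePosixPath(p).name for p in paths}
    return sorted(
        platform
        for platform, patterns in INDICATORS.items()
        if any(fnmatchcase(name, pattern) for name in names for pattern in patterns)
    )
```

Generic indicators such as `*.yaml` (listed above for Ansible and Deployment Manager) are deliberately left out of this sketch, since on their own they would match almost any repository; a fuller detector would likely combine them with directory or content checks.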
Application Type Detection
Web Servers
- Nginx: nginx.conf, conf.d/, sites-enabled/, sites-available/, .nginx file extension
- Apache HTTPD: httpd.conf, apache2.conf, conf-available/, conf-enabled/
- Caddy: Caddyfile
- Traefik: traefik.yml, traefik.yaml, traefik.toml, Traefik dynamic configuration files
- Envoy: envoy.yaml, envoy.json
Reverse Proxies & Load Balancers
- HAProxy: haproxy.cfg
- Traefik: (see Web Servers section)
- Envoy: (see Web Servers section)
Message Queues & Brokers
- RabbitMQ: rabbitmq.conf, rabbitmq.config, advanced.config
- Apache Kafka: server.properties, consumer.properties, producer.properties
- Apache ActiveMQ: activemq.xml
- Redis: redis.conf, sentinel.conf
- NATS: nats-server.conf
- Apache Pulsar: broker.conf, client.conf
Runtime & Application Detection
- Node.js: package.json, package-lock.json, yarn.lock, pnpm-lock.yaml, tsconfig.json, .nvmrc, .node-version
- Python: requirements.txt, requirements-dev.txt, pyproject.toml, setup.py, setup.cfg, Pipfile, Pipfile.lock, poetry.lock, tox.ini
- PHP: composer.json, composer.lock, phpunit.xml, .php-cs-fixer.php
- Java: pom.xml, build.gradle, settings.gradle, gradle.properties, .gradle/, src/main/java/
- Go: go.mod, go.sum, main.go
- Ruby: Gemfile, Gemfile.lock, Rakefile, config.ru
- Rust: Cargo.toml, Cargo.lock, src/main.rs
- C#/.NET: .csproj, .sln, packages.config, appsettings.json
Cloud Provider Detection
- AWS: aws/ directory, AWS SDK imports (boto3, aws-sdk), AWS CloudFormation templates, AWS SAM templates
- Google Cloud: gcloud/ directory, GCP SDK imports, gcloud-deployment.yaml
- Azure: azure/ directory, Azure SDK imports, ARM templates, Bicep files
- Alibaba Cloud: alibaba-cloud/ directory, Aliyun SDK imports
- DigitalOcean: digitalocean/ directory, doctl/
- Linode: linode/ directory, linode-cli/
- Heroku: Procfile, Heroku-specific config
- Vercel: vercel.json, .vercelignore
- Netlify: netlify.toml
- Cloudflare Workers: wrangler.toml, workers/
Unsupported Platform Fallback
- Detection Failed: If no platform is detected after checking all indicators -> Load generic infrastructure patterns and ask clarifying questions
- Questions to Ask:
  - "What container platform are you using (Docker, Kubernetes, etc.)?"
  - "What CI/CD platform are you using?"
  - "What cloud provider (AWS, GCP, Azure, etc.) or on-premise infrastructure?"
  - "What configuration management tool (Ansible, Chef, Puppet, etc.)?"
- Fallback Strategy: Load generic container/infrastructure patterns and request user confirmation
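The fallback above can be pictured as a small branch on the detection result. This is only an illustrative sketch: the `resolve_context` function and the shape of the dictionary it returns are assumptions, not something the skill defines.

```python
# Clarifying questions copied from the fallback section above.
FALLBACK_QUESTIONS = [
    "What container platform are you using (Docker, Kubernetes, etc.)?",
    "What CI/CD platform are you using?",
    "What cloud provider (AWS, GCP, Azure, etc.) or on-premise infrastructure?",
    "What configuration management tool (Ansible, Chef, Puppet, etc.)?",
]


def resolve_context(detected: list[str]) -> dict:
    """Pick platform-specific or generic patterns based on what was detected."""
    if detected:
        return {"platforms": list(detected), "patterns": "platform-specific", "questions": []}
    # Nothing detected: fall back to generic patterns and ask the user.
    return {"platforms": [], "patterns": "generic", "questions": list(FALLBACK_QUESTIONS)}
```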
WHEN TO USE THIS SKILL
✅ Use when:
- Designing CI/CD pipelines
- Creating Docker containers/images
- Setting up Kubernetes deployments
- Implementing infrastructure as code
- Configuring monitoring, logging, alerting
- Designing deployment strategies (blue-green, canary)
- Container security hardening
- Infrastructure reliability and observability
- Release automation and rollback strategies
❌ Do NOT use when:
- Code-level security issues (use secops-engineering)
- Application architecture (use software-engineering)
- Database design (use database-engineering)
- API design (use api-engineering)
- Performance profiling (use performance-engineering)
EXECUTION PROTOCOL
Phase 1: Clarification
- Detect Mode: WRITE vs REVIEW based on keywords
- Detect Context: Platform (Docker, K8s, Terraform), application type
- Load Patterns: Progressive (write) or Exhaustive (review)
Phase 2: Planning
- Load relevant pattern references
- Implement infrastructure according to best practices
- Apply security hardening
- Consider observability (metrics, logs, traces)
- Provide platform-specific examples
Phase 3: Execution
- Load all checklist references
- Systematically check each category:
  - Infrastructure (CI/CD, containers, IaC)
  - Security (container escapes, least privilege, secrets)
  - Observability (monitoring, logging, alerting)
  - Deployment strategies (rollback, gradual rollout)
- Provide prioritized issues with severity levels
Phase 4: Validation
- Verify infrastructure follows loaded patterns
- Check for security vulnerabilities
- Ensure observability is comprehensive
- Validate rollback mechanisms exist
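The "prioritized issues with severity levels" step in Phase 3 could be as simple as a stable sort over findings. The sketch below assumes the four severity levels that the review report uses (critical, high, medium, low); the `Finding` class and its fields are hypothetical.

```python
from dataclasses import dataclass

# Lower rank sorts first; the levels mirror the review report's sections.
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


@dataclass
class Finding:
    name: str        # short issue title
    component: str   # file or resource where the issue was found
    severity: str    # one of the SEVERITY_ORDER keys


def prioritize(findings: list[Finding]) -> list[Finding]:
    """Order findings from most to least severe, keeping input order within a level."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])
```

Because `sorted` is stable, findings of equal severity stay in the order in which the checklist produced them.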
Write Mode Output
````markdown
## Infrastructure Implementation: [Component]
### Technology Stack
[Platform choice with rationale]
### Configuration
```yaml
# Dockerfile / Kubernetes manifest / Terraform
configuration here
```
### Security Considerations
- [Security measure 1]
- [Security measure 2]
### Observability
- Metrics: [metrics collected]
- Logs: [logging strategy]
- Alerts: [alerting rules]
### Related Patterns
@infrastructure/[specific-pattern].md
````
Review Mode Output
```markdown
## Infrastructure Review Report
### Critical Issues
1. **[Issue Name]**: [Component: file]
   - Severity: CRITICAL
   - Description: [Issue details]
   - Impact: [Potential consequence]
   - Fix: [Recommended action]
   - Reference: @security/CONTAINER-SECURITY.md
### High Priority Issues
[Same format]
### Medium Priority Issues
[Same format]
### Low Priority Issues
[Same format]
### Recommendations
1. [Improvement suggestion]
2. [Improvement suggestion]
```