name: devops-engineering

description: Domain specialist for infrastructure, CI/CD, containers, observability, and DevOps operations. Expertise includes CI/CD pipelines, containerization (Docker, Kubernetes), infrastructure as code (Terraform, Ansible), monitoring and observability, container security, release strategies (blue-green, canary), and infrastructure reliability patterns. Use when: infrastructure questions, CI/CD setup, Docker configuration, Kubernetes orchestration, monitoring/alerting, deployment strategies, infrastructure automation. Triggers: "CI/CD", "Docker", "Kubernetes", "K8s", "deployment", "pipeline", "monitoring", "observability", "Terraform", "Ansible", "infrastructure", "container", "containerization", "release", "blue-green", "canary", "rolling", "logging", "metrics", "tracing", "alerting".

DEVOPS_ENGINEERING

DOMAIN EXPERTISE

Common Attacks: Container escapes, supply chain attacks, credential exposure, privilege escalation, insecure configurations, poisoned images
Common Issues: Poor monitoring, lack of alerts, inconsistent environments, manual deployments, no rollback capability, resource exhaustion
Common Mistakes: Running containers as root, hard-coded secrets, no resource limits, missing health checks, manual production changes, fragile pipelines, untested rollbacks
Related Patterns: Infrastructure as Code, Immutable Infrastructure, Blue-Green Deployment, Canary Deployment, Cattle vs Pets, 12-Factor App
Problematic Patterns: SSH into production, manual configuration management, snowflake servers, monolithic deployments, lack of idempotency
Container Security: Container escapes, image scanning, least privilege, secure base images, secrets management, network policies
Observability: Metrics (counters, gauges, histograms), logging (structured logs), tracing (distributed tracing), alerting (SLO/SLI based)
Infrastructure Reliability: SLO (Service Level Objectives), SLI (Service Level Indicators), error budgets, blameless post-mortems, gradual rollouts
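
The SLO, SLI, and error-budget terms above relate through simple arithmetic. The following is a minimal sketch, assuming a request-based availability SLI; the function names and the 99.9% target in the usage note are illustrative and not defined by this skill.

```python
def error_budget(slo_target: float, total_requests: int) -> float:
    """Number of requests allowed to fail in the window under the SLO."""
    return (1.0 - slo_target) * total_requests


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative means the SLO is breached)."""
    budget = error_budget(slo_target, total_requests)
    if budget == 0:
        # A 100% target leaves no budget: any failure breaches it.
        return 0.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / budget
```

For example, a 99.9% target over 1,000,000 requests allows roughly 1,000 failures; after 250 failures about three quarters of the budget is left, which is one possible signal for whether a gradual rollout may continue.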

MODE DETECTION

WRITE Mode: Keywords: ["create", "generate", "write", "build", "implement", "add", "new", "deploy", "setup", "configure"]
REVIEW Mode: Keywords: ["review", "analyze", "audit", "check", "find issues", "security audit", "infrastructure review", "deployment issues", "monitoring problems"]
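
The keyword lists above could be applied as in the sketch below. It is only an illustration: whole-word matching, the tie-break in favor of REVIEW, and the "UNKNOWN" result are assumptions, since the skill does not say how conflicts or misses are handled.

```python
import re

WRITE_KEYWORDS = ["create", "generate", "write", "build", "implement",
                  "add", "new", "deploy", "setup", "configure"]
REVIEW_KEYWORDS = ["review", "analyze", "audit", "check", "find issues",
                   "security audit", "infrastructure review",
                   "deployment issues", "monitoring problems"]


def count_hits(text: str, keywords: list) -> int:
    """Count keywords that appear in text as whole words or phrases."""
    return sum(
        1 for kw in keywords
        if re.search(r"\b" + re.escape(kw) + r"\b", text)
    )


def detect_mode(request: str) -> str:
    text = request.lower()
    review = count_hits(text, REVIEW_KEYWORDS)
    write = count_hits(text, WRITE_KEYWORDS)
    if review == 0 and write == 0:
        return "UNKNOWN"  # assumption: caller asks the user to clarify
    # Assumption: ties go to REVIEW, the more exhaustive mode.
    return "REVIEW" if review >= write else "WRITE"
```

Whole-word matching keeps "deployment issues" from also counting as the write keyword "deploy".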

LOADING STRATEGY

Write Mode (Progressive)

Load patterns based on infrastructure requirements:

CI/CD questions -> Load @infrastructure/CI-CD.md
Container questions -> Load @infrastructure/CONTAINERIZATION.md
Infrastructure as code -> Load @infrastructure/IAC.md
Monitoring/observability -> Load @infrastructure/OBSERVABILITY.md
Container security -> Load @security/CONTAINER-SECURITY.md
Deployment questions -> Load @release/RELEASE-STRATEGIES.md

Review Mode (Exhaustive)

Load comprehensive checklists:

IF infrastructure review requested -> Load all infrastructure patterns
IF security review requested -> Load @security/CONTAINER-SECURITY.md + security patterns
IF deployment issues -> Load @release/RELEASE-STRATEGIES.md + infrastructure patterns

Progressive Loading (Write Mode)

IF request mentions "CI", "CD", "pipeline", "build", "deploy" -> READ FILE: @infrastructure/CI-CD.md
IF request mentions "Docker", "container", "image" -> READ FILE: @infrastructure/CONTAINERIZATION.md
IF request mentions "Terraform", "Ansible", "IaC", "infrastructure code" -> READ FILE: @infrastructure/IAC.md
IF request mentions "monitoring", "logging", "metrics", "tracing", "alerting", "observability" -> READ FILE: @infrastructure/OBSERVABILITY.md
IF request mentions "container security", "image scanning", "container escapes" -> READ FILE: @security/CONTAINER-SECURITY.md
IF request mentions "deployment", "release", "blue-green", "canary" -> READ FILE: @release/RELEASE-STRATEGIES.md
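
The progressive rules above amount to a trigger-to-file lookup. A minimal sketch follows, assuming case-insensitive whole-word matching (the skill only says "mentions") and a hypothetical helper name `files_to_load`.

```python
import re

# Trigger words and target files copied from the rules above.
PROGRESSIVE_RULES = [
    (("ci", "cd", "pipeline", "build", "deploy"),
     "@infrastructure/CI-CD.md"),
    (("docker", "container", "image"),
     "@infrastructure/CONTAINERIZATION.md"),
    (("terraform", "ansible", "iac", "infrastructure code"),
     "@infrastructure/IAC.md"),
    (("monitoring", "logging", "metrics", "tracing", "alerting", "observability"),
     "@infrastructure/OBSERVABILITY.md"),
    (("container security", "image scanning", "container escapes"),
     "@security/CONTAINER-SECURITY.md"),
    (("deployment", "release", "blue-green", "canary"),
     "@release/RELEASE-STRATEGIES.md"),
]


def mentions(text: str, trigger: str) -> bool:
    """True if trigger occurs in text and is not part of a longer word."""
    return re.search(r"(?<!\w)" + re.escape(trigger) + r"(?!\w)", text) is not None


def files_to_load(request: str) -> list:
    """Return the pattern files whose triggers the request mentions, in rule order."""
    text = request.lower()
    return [
        path for triggers, path in PROGRESSIVE_RULES
        if any(mentions(text, t) for t in triggers)
    ]
```

A request such as "Add tracing to our CI/CD pipeline" would select the CI/CD and observability files and nothing else, which is the point of loading progressively.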

Comprehensive Loading (Review Mode)

IF request mentions "review", "audit", "security" -> READ FILES: @security/CONTAINER-SECURITY.md, @infrastructure/CONTAINERIZATION.md, @infrastructure/CI-CD.md
IF request mentions "infrastructure issues" -> READ FILES: @infrastructure/CI-CD.md, @infrastructure/CONTAINERIZATION.md, @infrastructure/IAC.md, @infrastructure/OBSERVABILITY.md
IF request mentions "deployment issues" -> READ FILES: @release/RELEASE-STRATEGIES.md, @infrastructure/CI-CD.md
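
Because the review rules above overlap (several of them list @infrastructure/CI-CD.md), a loader would likely want to read each file once. The sketch below is one way to do that, assuming simple substring matching; `review_files` is a hypothetical name.

```python
# Triggers and file lists copied from the review rules above.
REVIEW_RULES = [
    (("review", "audit", "security"),
     ["@security/CONTAINER-SECURITY.md",
      "@infrastructure/CONTAINERIZATION.md",
      "@infrastructure/CI-CD.md"]),
    (("infrastructure issues",),
     ["@infrastructure/CI-CD.md",
      "@infrastructure/CONTAINERIZATION.md",
      "@infrastructure/IAC.md",
      "@infrastructure/OBSERVABILITY.md"]),
    (("deployment issues",),
     ["@release/RELEASE-STRATEGIES.md",
      "@infrastructure/CI-CD.md"]),
]


def review_files(request: str) -> list:
    """Union of files from every matching rule, first occurrence kept, no duplicates."""
    text = request.lower()
    ordered = {}  # dict preserves insertion order, so it doubles as an ordered set
    for triggers, paths in REVIEW_RULES:
        if any(trigger in text for trigger in triggers):
            for path in paths:
                ordered.setdefault(path, None)
    return list(ordered)
```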

CONTEXT DETECTION

Platform Detection

Container Platforms

Docker: Dockerfile, docker-compose.yml, docker-compose.yaml, .dockerignore
Docker Compose: docker-compose.yml, docker-compose.yaml, compose.yml, compose.yaml, docker-compose.override.yml
Kubernetes (K8s): deployment.yaml, service.yaml, ingress.yaml, configmap.yaml, secret.yaml, statefulset.yaml, daemonset.yaml, namespace.yaml, persistentvolume.yaml, persistentvolumeclaim.yaml, storageclass.yaml, .k8s/ directory
Kustomize: kustomization.yaml, kustomization.yml, overlays/ directory
Helm: Chart.yaml, values.yaml, templates/ directory, .helmignore, requirements.yaml, helmfile.yaml
Docker Swarm: docker-stack.yml, docker-stack.yaml, docker-compose.yml with deploy keys
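
Detection from marker files can be sketched as a lookup on file names. The example below is an assumption-laden illustration: it uses only the unambiguous markers from the list above and deliberately leaves out generic names such as deployment.yaml or values.yaml, which would need a content check to be reliable.

```python
from pathlib import PurePosixPath

# Subset of the markers listed above; generic file names are omitted on purpose.
CONTAINER_MARKERS = {
    "Docker": {"Dockerfile", ".dockerignore"},
    "Docker Compose": {"docker-compose.yml", "docker-compose.yaml",
                       "compose.yml", "compose.yaml",
                       "docker-compose.override.yml"},
    "Kustomize": {"kustomization.yaml", "kustomization.yml"},
    "Helm": {"Chart.yaml", ".helmignore", "helmfile.yaml"},
    "Docker Swarm": {"docker-stack.yml", "docker-stack.yaml"},
}


def detect_container_platforms(paths: list) -> list:
    """Return the platforms whose marker files appear among the repository paths."""
    names = {PurePosixPath(p).name for p in paths}
    return sorted(
        platform for platform, markers in CONTAINER_MARKERS.items()
        if names & markers
    )
```

Passing a plain list of relative paths (for instance the output of a repository file listing) keeps the function independent of the file system.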

Cloud Platforms

AWS CloudFormation: template.yaml, template.json, AWS::CloudFormation::Stack
AWS SAM: template.yaml with AWSTemplateFormatVersion: '2010-09-09' and Transform: AWS::Serverless-2016-10-31, sam.yaml
AWS CDK: lib/ with stack definitions, cdk.json, package.json with aws-cdk
Terraform: *.tf, main.tf, variables.tf, outputs.tf, terraform.tfvars, .terraform.lock.hcl, .terraform/ directory, modules/ directory
Terraform Cloud/Enterprise: terraform-cloud.tf, workspace files
Pulumi: Pulumi.yaml, Pulumi.*.yaml, index.ts, main.py, main.go
Google Cloud Deployment Manager: *.yaml, *.jinja, config.yaml
Azure Resource Manager (ARM): template.json, azuredeploy.json, azuredeploy.parameters.json
Azure Bicep: main.bicep, .bicep files, azure.bicepparam

Configuration Management

Ansible: *.yml, *.yaml, playbooks/, roles/, inventory/, ansible.cfg, group_vars/, host_vars/
Chef: recipes/, attributes/, resources/, Cheffile, Berksfile, Policyfile.rb
Puppet: manifests/*.pp, modules/, templates/, hiera.yaml
SaltStack: *.sls, pillar/, reactor/, salt/
CFEngine: promises.cf, inputs/

CI/CD Platforms

GitHub Actions: .github/workflows/*.yml, .github/workflows/*.yaml, .github/actions/
GitLab CI: .gitlab-ci.yml, .gitlab-ci.yaml
Jenkins: Jenkinsfile, Jenkinsfile.* (declarative)
CircleCI: .circleci/config.yml, .circleci/config.yaml
Travis CI: .travis.yml, .travis.yaml
Bitbucket Pipelines: bitbucket-pipelines.yml
Azure Pipelines: azure-pipelines.yml, azure-pipelines.yaml
Azure DevOps: azure-pipelines-*.yml, build.yml, release.yml
GitLab Runner: config.toml
TeamCity: teamcity-settings.kts, .teamcity/

Build Systems & Tools

Make: Makefile, makefile, GNUmakefile
CMake: CMakeLists.txt, cmake/ directory
Gradle: build.gradle, build.gradle.kts, settings.gradle, settings.gradle.kts, gradle.properties, .gradle/ directory
Maven: pom.xml, mvnw, mvnw.cmd
Bazel: BUILD, WORKSPACE, BUILD.bazel, MODULE.bazel
Buck: BUCK, TARGETS
Nix: default.nix, shell.nix, flake.nix
Nixpkgs: pkgs/ directory, .nix/
Webpack: webpack.config.js, webpack.config.ts, webpackfile.js
Vite: vite.config.js, vite.config.ts
Rollup: rollup.config.js, rollup.config.ts
esbuild: esbuild.config.js, esbuild.config.mjs
Parcel: .parcelrc
Turborepo: turbo.json

Monitoring & Observability

Prometheus: prometheus.yml, prometheus.yaml, alerts.yml, rules/*.yml
Grafana: grafana.ini, provisioning/, dashboards/, datasources/
Elastic Stack (ELK): elasticsearch.yml, logstash.conf, kibana.yml, filebeat.yml
Jaeger: jaeger-config.json
Zipkin: zipkin-config.json
OpenTelemetry: otel-collector-config.yaml
Thanos: thanos.yaml, query.yaml, store.yaml
VictoriaMetrics: victoriametrics.yaml
Datadog: datadog.yaml, datadog.json
New Relic: newrelic.yml

Logging & Tracing

Fluentd: fluent.conf, fluentd.conf
Fluent Bit: fluent-bit.conf, parsers.conf
Logstash: logstash.conf, pipelines.yml
Loki: loki-config.yaml
Syslog-ng: syslog-ng.conf
Rsyslog: rsyslog.conf

Application Type Detection

Web Servers

Nginx: nginx.conf, conf.d/, sites-enabled/, sites-available/, .nginx file extension
Apache HTTPD: httpd.conf, apache2.conf, conf-available/, conf-enabled/
Caddy: Caddyfile
Traefik: traefik.yml, traefik.yaml, traefik.toml, Traefik dynamic configuration files
Envoy: envoy.yaml, envoy.json

Reverse Proxies & Load Balancers

HAProxy: haproxy.cfg
Traefik: (see Web Servers section)
Envoy: (see Web Servers section)

Message Queues & Brokers

RabbitMQ: rabbitmq.conf, rabbitmq.config, advanced.config
Apache Kafka: server.properties, consumer.properties, producer.properties
Apache ActiveMQ: activemq.xml
Redis: redis.conf, sentinel.conf
NATS: nats-server.conf
Apache Pulsar: broker.conf, client.conf

Runtime & Application Detection

Node.js: package.json, package-lock.json, yarn.lock, pnpm-lock.yaml, tsconfig.json, .nvmrc, .node-version
Python: requirements.txt, requirements-dev.txt, pyproject.toml, setup.py, setup.cfg, Pipfile, Pipfile.lock, poetry.lock, tox.ini
PHP: composer.json, composer.lock, phpunit.xml, .php-cs-fixer.php
Java: pom.xml, build.gradle, settings.gradle, gradle.properties, .gradle/, src/main/java/
Go: go.mod, go.sum, main.go
Ruby: Gemfile, Gemfile.lock, Rakefile, config.ru
Rust: Cargo.toml, Cargo.lock, src/main.rs
C#/.NET: .csproj, .sln, packages.config, appsettings.json

Cloud Provider Detection

AWS: aws/ directory, AWS SDK imports (boto3, aws-sdk), AWS CloudFormation templates, AWS SAM templates
Google Cloud: gcloud/ directory, GCP SDK imports, gcloud-deployment.yaml
Azure: azure/ directory, Azure SDK imports, ARM templates, Bicep files
Alibaba Cloud: alibaba-cloud/ directory, Aliyun SDK imports
DigitalOcean: digitalocean/ directory, doctl/
Linode: linode/ directory, linode-cli/
Heroku: Procfile, Heroku-specific config
Vercel: vercel.json, .vercelignore
Netlify: netlify.toml
Cloudflare Workers: wrangler.toml, workers/

Unsupported Platform Fallback

Detection Failed: If no platform is detected after checking all indicators -> Load generic infrastructure patterns and ask clarifying questions
Questions to Ask:
- "What container platform are you using (Docker, Kubernetes, etc.)?"
- "What CI/CD platform are you using?"
- "What cloud provider (AWS, GCP, Azure, etc.) or on-premise infrastructure?"
- "What configuration management tool (Ansible, Chef, Puppet, etc.)?"
Fallback Strategy: Load generic container/infrastructure patterns and request user confirmation
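
The fallback above can be pictured as a small decision step after detection. This is only a sketch: the function name `resolve_context` and the shape of the returned dictionary are hypothetical.

```python
# Questions copied from the list above.
CLARIFYING_QUESTIONS = [
    "What container platform are you using (Docker, Kubernetes, etc.)?",
    "What CI/CD platform are you using?",
    "What cloud provider (AWS, GCP, Azure, etc.) or on-premise infrastructure?",
    "What configuration management tool (Ansible, Chef, Puppet, etc.)?",
]


def resolve_context(detected_platforms: list) -> dict:
    """Choose platform-specific patterns, or fall back to generic ones plus questions."""
    if detected_platforms:
        return {
            "patterns": "platform-specific",
            "platforms": list(detected_platforms),
            "questions": [],
            "needs_confirmation": False,
        }
    # Nothing detected: load generic patterns and ask the user to confirm.
    return {
        "patterns": "generic",
        "platforms": [],
        "questions": list(CLARIFYING_QUESTIONS),
        "needs_confirmation": True,
    }
```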

WHEN TO USE THIS SKILL

✅ Use when:

Designing CI/CD pipelines
Creating Docker containers/images
Setting up Kubernetes deployments
Implementing infrastructure as code
Configuring monitoring, logging, alerting
Designing deployment strategies (blue-green, canary)
Container security hardening
Infrastructure reliability and observability
Release automation and rollback strategies

❌ Do NOT use when:

Code-level security issues (use secops-engineering)
Application architecture (use software-engineering)
Database design (use database-engineering)
API design (use api-engineering)
Performance profiling (use performance-engineering)

EXECUTION PROTOCOL

Phase 1: Clarification

Detect Mode: WRITE vs REVIEW based on keywords
Detect Context: Platform (Docker, K8s, Terraform), application type
Load Patterns: Progressive (write) or Exhaustive (review)

Phase 2: Planning

Load relevant pattern references
Implement infrastructure according to best practices
Apply security hardening
Consider observability (metrics, logs, traces)
Provide platform-specific examples

Phase 3: Execution

Load all checklist references
Systematically check each category:
- Infrastructure (CI/CD, containers, IaC)
- Security (container escapes, least privilege, secrets)
- Observability (monitoring, logging, alerting)
- Deployment strategies (rollback, gradual rollout)
Provide prioritized issues with severity levels
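
As an illustration of a systematic check that yields prioritized findings, the sketch below inspects one Kubernetes container spec (already parsed into a dictionary) for three of the common mistakes named in this skill. The severity assigned to each finding is an assumption, and only container-level fields are inspected; a pod-level securityContext is not considered.

```python
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def review_container(container: dict) -> list:
    """Return (severity, message) findings for one container spec, most severe first."""
    issues = []

    resources = container.get("resources") or {}
    if not resources.get("limits"):
        issues.append(("MEDIUM", "No resource limits set"))

    if "livenessProbe" not in container and "readinessProbe" not in container:
        issues.append(("MEDIUM", "Missing health checks"))

    security = container.get("securityContext") or {}
    if not security.get("runAsNonRoot"):
        issues.append(("HIGH", "Container may run as root"))

    # sorted() is stable, so findings of equal severity keep their check order.
    return sorted(issues, key=lambda issue: SEVERITY_ORDER[issue[0]])
```

A spec with `securityContext.runAsNonRoot: true`, resource limits, and at least one probe produces an empty list; anything else is reported with the highest severity first.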

Phase 4: Validation

Verify infrastructure follows loaded patterns
Check for security vulnerabilities
Ensure observability is comprehensive
Validate rollback mechanisms exist

Write Mode Output

````markdown
## Infrastructure Implementation: [Component]

### Technology Stack
[Platform choice with rationale]

### Configuration
```yaml
# Dockerfile / Kubernetes manifest / Terraform
configuration here
```

### Security Considerations
- [Security measure 1]
- [Security measure 2]

### Observability
- Metrics: [metrics collected]
- Logs: [logging strategy]
- Alerts: [alerting rules]

### Related Patterns
- @infrastructure/[specific-pattern].md
````

Review Mode Output

```markdown
## Infrastructure Review Report

### Critical Issues
1. **[Issue Name]**: [Component: file]
   - Severity: CRITICAL
   - Description: [Issue details]
   - Impact: [Potential consequence]
   - Fix: [Recommended action]
   - Reference: @security/CONTAINER-SECURITY.md

### High Priority Issues
[Same format]

### Medium Priority Issues
[Same format]

### Low Priority Issues
[Same format]

### Recommendations
1. [Improvement suggestion]
2. [Improvement suggestion]
```