| name | dev-sre |
| description | Gate 2 of the development cycle. VALIDATES that observability was correctly implemented by developers. Does NOT implement observability code - only validates it. |
| trigger | - Gate 2 of development cycle - Gate 0 (Implementation) complete with observability code - Gate 1 (DevOps) setup complete - Service needs observability validation (logging, tracing) |
| skip_when | - No service implementation (documentation only) |
| NOT_skip_when | - "Task says observability not required" → AI cannot self-exempt. ALL services need observability. - "Pure frontend" → If it calls ANY API, backend needs observability. Frontend-only = static HTML. - "MVP doesn't need observability" → MVP without observability = blind MVP. No exceptions. |
| sequence | [object Object] |
| related | [object Object] |
| verification | [object Object] |
| examples | [object Object], [object Object] |
Standards Loading (MANDATORY)
Before ANY SRE validation, you MUST load Ring SRE standards:
See CLAUDE.md and dev-team/docs/standards/sre.md for canonical requirements. This section summarizes the loading process.
MANDATORY ACTION: You MUST use the WebFetch tool NOW:
| Parameter | Value |
|---|---|
| url | https://raw.githubusercontent.com/LerianStudio/ring/main/dev-team/docs/standards/sre.md |
| prompt | "Extract all SRE standards, observability requirements, metric patterns, and health check specifications" |
Execute this WebFetch before proceeding. Do NOT continue until standards are loaded and understood.
If WebFetch fails → STOP and report blocker. Cannot proceed without Ring SRE standards.
Standards Loading Verification
After WebFetch, confirm in your analysis:
| Ring SRE Standards | ✅ Loaded |
| Sections Extracted | Health Endpoints, Logging, Tracing |
CANNOT proceed without successful standards loading.
SRE Validation (Gate 2)
Overview
This skill VALIDATES that observability was correctly implemented by developers:
- Structured logging with trace correlation
- OpenTelemetry tracing instrumentation
CRITICAL: Role Clarification
Developers IMPLEMENT observability. SRE VALIDATES it.
| Who | Responsibility |
|---|---|
| Developers (Gate 0) | IMPLEMENT observability following Ring Standards |
| SRE Agent (Gate 2) | VALIDATE that observability is correctly implemented |
If observability is missing or incorrect:
- SRE reports issues with severity levels
- Issues go back to developers to fix
- SRE re-validates after fixes
Blocker Criteria - STOP and Report
| Decision Type | Examples | Action |
|---|---|---|
| HARD BLOCK | Service lacks JSON structured logs, verification commands not run (no evidence), user says "feature complete, add observability later" | STOP immediately - Return to Gate 0. Cannot proceed. User CANNOT override. |
| CRITICAL | Logs are fmt.Println/echo, not JSON structured | Report CRITICAL severity - Return to Gate 0. Must fix. User CANNOT override. |
Cannot Be Overridden
These requirements are NON-NEGOTIABLE and cannot be waived by:
| Requirement | Cannot Be Waived By | Rationale |
|---|---|---|
| Gate 2 execution | CTO, PM, "MVP" arguments | Observability prevents production blindness |
| Automated verification | "Developer confirms it works" | Evidence required for Gate 2 PASS |
| JSON structured logs | "Plain text is enough" | Minimum viable observability - structured logs required |
| "Complete" includes observability | Deadline pressure | Definition of done is non-negotiable |
If pressured: "Observability is PART of completion, not an addition to it. Gate 2 cannot be skipped regardless of authority or deadline."
Severity Calibration
| Severity | Scenario | Gate 2 Status | Can Proceed? |
|---|---|---|---|
| CRITICAL | Missing ALL observability (no structured logs) | FAIL | ❌ Return to Gate 0 |
| CRITICAL | fmt.Println/echo instead of JSON logs | FAIL | ❌ Return to Gate 0 |
| CRITICAL | Verification commands not run | FAIL | ❌ Cannot mark complete |
| LOW | Dashboard deferred for non-critical service | PARTIAL | ✅ Can proceed with note |
Pressure Resistance
See shared-patterns/shared-pressure-resistance.md for universal pressure scenarios (including Combined Pressure Scenarios).
Gate 2-specific note: Minimum viable observability = structured JSON logs. No exceptions.
Common Rationalizations - REJECTED
See shared-patterns/shared-anti-rationalization.md for universal anti-rationalizations.
Gate 2-specific rationalizations:
| Excuse | Reality |
|---|---|
| "Add tracing later" | Later = never. Retrofitting is 10x harder. |
| "Dashboard can come later" | Dashboard is optional but structured logging is not. |
| "Task says observability not needed" | AI cannot self-exempt. Tasks don't override gates. |
| "Basic fmt.Println logs are enough" | fmt.Println is not structured, not searchable, not alertable. JSON logs required. |
| "Feature complete, observability later" | Feature without observability is NOT complete. Redefine "complete". |
Red Flags - STOP
See shared-patterns/shared-red-flags.md for universal red flags (including Observability section).
If you catch yourself thinking ANY of those patterns, STOP immediately. Return to developers to implement observability.
Component Type Decision Tree
Not all code is a service. Use this tree to determine observability requirements:
Is it runnable code?
├── NO (library/package) → No observability required
│ └── Libraries are consumed by services that have observability
│
└── YES → Does it expose HTTP/gRPC/TCP endpoints?
├── YES (API Service) → FULL OBSERVABILITY REQUIRED
│ └── /health + /ready + structured logs + tracing
│
└── NO → Does it run continuously?
├── YES (Background Worker) → WORKER OBSERVABILITY
│ └── /health + structured logs + tracing
│
└── NO (Script/Job) → SCRIPT OBSERVABILITY
└── Structured logs + exit codes + optional /health
Component Type Requirements
| Type | JSON Logs | Tracing | Exit Codes |
|---|---|---|---|
| API Service | REQUIRED | Recommended | N/A |
| Background Worker | REQUIRED | Optional | N/A |
| CLI Tool | REQUIRED | N/A | REQUIRED |
| One-time Script | REQUIRED | N/A | REQUIRED |
| Library | N/A | N/A | N/A |
Migration Scripts and One-Time Jobs
Migration scripts still need observability, but different kind:
| Requirement | Why | Example |
|---|---|---|
| Structured logs | Track progress, debug failures | {"level":"info","step":"migrate_users","count":1500} |
| Exit codes | Orchestration needs success/failure signal | exit 0 success, exit 1 failure |
| Idempotency logging | Know if re-run is safe | {"already_migrated":true,"skipping":true} |
Anti-Rationalization Table
See shared-patterns/shared-anti-rationalization.md for universal anti-rationalizations.
Gate-Specific Anti-Rationalizations
| Rationalization | Why It's WRONG | Required Action |
|---|---|---|
| "Core functionality works, observability is enhancement" | Observability is PART of definition of done, not addition to it. Feature is NOT complete. | STOP. Return to Gate 0. Gate 2 is REQUIRED. |
| "It's just MVP, add structured logging Monday" | MVP without structured logging = debugging nightmare. "Later" = never. Retrofitting is 10x harder. | STOP. Implement JSON logging before Gate 2. |
| "Tech lead approved skipping Gate 2" | Gates are NON-NEGOTIABLE. Authority cannot waive mandatory gates. | STOP. Inform user gates cannot be waived. |
| "Plain text logs exist, that's enough" | Minimum = JSON structured logs. Plain text = Gate 2 FAIL. | STOP. Implement JSON structured logging. |
| "Developer confirms it works" | Confirmation ≠ Verification. MUST run automated validation commands. | STOP. Run verification checklist. |
| "It's just a script, runs once" | Scripts fail. Logs tell you why. | Add structured logging |
| "Library doesn't need observability" | Correct! Libraries are exempt. | Verify it's truly a library |
| "Exit code 0 is enough" | Exit code + logs = complete picture. | Add structured logs |
| "Migration runs in CI only" | CI failures need debugging too. | Structured logs required |
"Feature Complete" Redefinition Prevention
A feature is NOT complete without observability:
| What "Complete" Means | Includes Observability? |
|---|---|
| "Code works" | ❌ NO - Only partial |
| "Tests pass" | ❌ NO - Only partial |
| "Ready for review" | ❌ NO - Gate 2 before Gate 4 |
| "Gate 2 passed" | ✅ YES - This is complete |
If someone says "feature is complete, just needs structured logging":
- That statement is a contradiction
- Feature is NOT complete
- Gate 2 is PART of completion, not addition to it
- Correct response: "Feature is at Gate 1. Gate 2 (structured logging) required for completion."
Observability is definition of done, not enhancement.
Verification Checklist (MANDATORY)
Before marking Gate 2 complete, verify ALL:
| Check | Command | Expected |
|---|---|---|
| Structured logs | docker-compose logs app | head -1 | jq .level |
Returns log level |
This check MUST pass.
Mandatory Requirements
Gate 2 is NOT OPTIONAL. Services cannot proceed to production without:
| Requirement | Status | Notes |
|---|---|---|
| Structured JSON logs | REQUIRED | With trace_id correlation |
Can be deferred with explicit approval:
- Distributed tracing (for standalone workers only)
Handling Pushback
Response: "Observability is not optional. Without it: no auto-detection of failures, no efficient debugging, no auto-recovery. If time-constrained, reduce FEATURE scope, not observability scope."
Prerequisites
Before starting Gate 2:
- Gate 0 Complete: Code implementation is done
- Gate 1 Complete: DevOps setup (Dockerfile, docker-compose) is done
- Standards:
docs/PROJECT_RULES.md(local project) + Ring SRE Standards via WebFetch (https://raw.githubusercontent.com/LerianStudio/ring/main/dev-team/docs/standards/sre.md)
Step 1: Analyze Observability Implementation
Review Gate 0/1 handoff: Service type (API/Worker/Batch), Language, External dependencies. Check status of: Health endpoints, Structured logging, Tracing, Dashboard/Alerts (optional).
Step 2: Dispatch SRE Agent for Validation
Dispatch: Task(subagent_type: "sre") - VALIDATE observability (not implement). Include service info (type, language, deps) and Gate 0/1 handoff. Agent validates: JSON logging, Tracing. Returns: PASS/FAIL per component, issues by severity.
Steps 3-5: Validate Health, Logging, Tracing
| Component | Validation Commands | Expected |
|---|---|---|
| Logging | docker-compose logs app | head -5 | jq . |
JSON with timestamp/level/message/service |
| Tracing | docker-compose logs app | grep trace_id |
trace_id/span_id present |
Step 6: Prepare Handoff to Gate 3
Gate 2 Handoff contents:
| Section | Content |
|---|---|
| Status | COMPLETE/PARTIAL/NEEDS_FIXES |
| Validated | JSON logging ✓, Tracing (if applicable) ✓ |
| Results | Logging: PASS/FAIL, Tracing: PASS/FAIL/N/A |
| Issues | List by severity (CRITICAL/HIGH/MEDIUM/LOW) or "None" |
| Ready for Testing | Logs structured ✓, No Critical/High ✓ |
Observability by Service Type
| Service Type | Required | Optional |
|---|---|---|
| API Service | Structured logging, Tracing (if calls external services) | — |
| Background Worker | Structured logging | Tracing |
| Batch Job | Structured logging, Exit code handling | — |
Execution Report
Base metrics per shared-patterns/output-execution-report.md:
| Metric | Value |
|---|---|
| Duration | Xm Ys |
| Iterations | N |
| Result | PASS/FAIL/NEEDS_FIXES |
Validation Details
- logging_structured: YES/NO
- tracing_enabled: YES/NO/N/A
Issues Found
- List issues by severity (CRITICAL/HIGH/MEDIUM/LOW) or "None"
Handoff to Next Gate
- SRE validation status (complete/needs_fixes)
- Observability endpoints validated
- Issues for developers to fix (if any)
- Ready for testing: YES/NO