name	homelab-deployment
description	Automated service deployment with validation, templating, and verification - use when deploying new services, updating existing deployments, or troubleshooting deployment issues

Homelab Service Deployment

Overview

A systematic service deployment workflow that guards against common mistakes and keeps deployments consistent and documented.

Philosophy: Deployment should be boring, predictable, and self-documenting.

When to Use

Always use for:

Deploying new services
Updating existing service configurations
Troubleshooting deployment failures
Validating deployment before execution
Rolling back failed deployments

Triggers:

User asks to "deploy <service>"
User mentions service won't start after deployment
User asks "how do I deploy a new service?"
User requests deployment validation

Core Principle

Every deployment follows the same workflow:

Validate prerequisites
Generate configuration from templates
Deploy and verify
Document changes

No ad-hoc deployments. No manual config editing without validation.

Integration with Subagents

This skill integrates with specialized subagents for design decisions, verification, and cleanup:

Before Deployment (Phase 1):

infrastructure-architect - Design network topology, security architecture, deployment pattern selection
Invoked when: User asks "how should I deploy..." or design questions exist
Output: Comprehensive design document with network, security, resource, and integration decisions

After Deployment (Phase 5):

service-validator - Comprehensive 7-level verification with "assume failure" mindset
Invoked automatically: After service starts, before documentation
Output: Structured verification report with confidence score, pass/warn/fail status

After Verification (Phase 5.5 - Optional):

code-simplifier - Refactor configs to maintain pattern compliance, remove bloat
Invoked optionally: After successful verification, for config cleanup
Output: Simplified configs aligned with homelab patterns and ADRs

Workflow with Subagents:

User Request → infrastructure-architect (design)
            ↓
    homelab-deployment (implement)
            ↓
    service-validator (verify)
            ↓
    code-simplifier (cleanup - optional)
            ↓
    Documentation + Git Commit

The Deployment Workflow

Phase 1: Discovery & Planning

Gather information about the service:

Service Identity
- Name (container name, service name)
- Image (registry/image:tag)
- Purpose (media server, database, auth service, etc.)
- Documentation link (official docs)
Resource Requirements
- Memory limits
- CPU shares
- Disk space
- Special hardware (GPU, etc.)
Network Requirements
- Which networks? (Use network-selection-guide.md)
- Does it need reverse proxy access?
- Does it need database access?
- Does it need monitoring?
- Does it expose metrics?
Security Requirements
- Public or authenticated?
- Which middleware? (CrowdSec, rate limiting, Authelia)
- Sensitive data handling
- Secrets management
Storage Requirements
- Configuration files location
- Data storage location
- Database storage (NOCOW needed?)
- Media files (large files)
- Logs
Dependencies
- Database required?
- Cache required? (Redis)
- Other services?
- Network creation needed?

Phase 2: Pre-Deployment Validation

Run checks BEFORE any deployment:

# Execute validation script
./.claude/skills/homelab-deployment/scripts/check-prerequisites.sh \
  --service-name jellyfin \
  --image docker.io/jellyfin/jellyfin:latest \
  --networks systemd-reverse_proxy,systemd-media_services,systemd-monitoring \
  --ports 8096 \
  --config-dir ~/containers/config/jellyfin \
  --data-dir ~/containers/data/jellyfin

# Validation checklist:
# ✓ Image exists in registry
# ✓ Networks exist
# ✓ Ports available (not in use)
# ✓ Config directory created
# ✓ Data directory created with correct permissions
# ✓ Parent directories exist
# ✓ Sufficient disk space
# ✓ No conflicting services
# ✓ SELinux status verified

If validation fails, STOP. Fix issues before proceeding.
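
The individual checks are not shown here. As a rough sketch only, three of them (networks exist, port available, sufficient disk space) could look like the following. The function names are hypothetical and this is not the actual content of check-prerequisites.sh; `port_is_free` reads `ss -tln` output on stdin so it can be exercised without a live system.

```shell
# Hypothetical helpers; not the actual contents of check-prerequisites.sh.

# Succeeds if the named Podman network exists.
network_exists() {
  podman network exists "$1"
}

# Succeeds if nothing is listening on the given TCP port.
# Reads `ss -tln` output on stdin, where column 4 is "address:port".
port_is_free() {
  ! awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Succeeds if the filesystem holding a path has at least the given free space in KiB.
has_free_kib() {
  available=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
  [ "$available" -ge "$2" ]
}

# Example usage (requires a running system):
#   network_exists systemd-reverse_proxy || echo "missing network" >&2
#   ss -tln | port_is_free 8096 || echo "port 8096 in use" >&2
#   has_free_kib ~/containers/data 1048576 || echo "less than 1 GiB free" >&2
```

Each helper returns a plain exit status, so a wrapper script can stop at the first failed check.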

Phase 3: Configuration Generation

Generate configuration from templates:

Select Template Pattern
- Web application → templates/quadlets/web-app.container
- Database → templates/quadlets/database.container
- Monitoring → templates/quadlets/monitoring-service.container
- Background worker → templates/quadlets/background-worker.container

Customize Quadlet

# Copy template
cp .claude/skills/homelab-deployment/templates/quadlets/web-app.container \
   ~/.config/containers/systemd/jellyfin.container

# Substitute values
sed -i "s/{{SERVICE_NAME}}/jellyfin/g" ~/.config/containers/systemd/jellyfin.container
sed -i "s|{{IMAGE}}|docker.io/jellyfin/jellyfin:latest|g" ~/.config/containers/systemd/jellyfin.container
sed -i "s/{{MEMORY_LIMIT}}/4G/g" ~/.config/containers/systemd/jellyfin.container
# ... etc
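
Repeating `sed -i` by hand makes it easy to leave a placeholder behind. One possible way to guard against that, sketched under the assumption that every placeholder has the `{{UPPER_CASE}}` form, is a small wrapper that substitutes all values and then fails if any placeholder remains. `render_template` is a hypothetical helper, not part of the skill's scripts.

```shell
# Hypothetical helper: copy a template and replace {{KEY}} placeholders.
# Usage: render_template <template> <output> KEY=value [KEY=value ...]
render_template() {
  template="$1"
  output="$2"
  shift 2
  cp "$template" "$output" || return 1

  for pair in "$@"; do
    key="${pair%%=*}"
    value="${pair#*=}"
    # Escape characters that are special in a sed replacement: \ & and the | delimiter.
    escaped=$(printf '%s' "$value" | sed 's/[\\&|]/\\&/g')
    sed -i "s|{{${key}}}|${escaped}|g" "$output" || return 1
  done

  # Fail if any placeholder was left unresolved.
  if grep -n '{{[A-Z_]*}}' "$output" >&2; then
    echo "unresolved placeholders in $output" >&2
    return 1
  fi
}

# Example usage:
#   render_template web-app.container ~/.config/containers/systemd/jellyfin.container \
#     SERVICE_NAME=jellyfin IMAGE=docker.io/jellyfin/jellyfin:latest MEMORY_LIMIT=4G
```

Using `|` as the sed delimiter and escaping the value means image references and paths containing `/` need no special handling.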

Validate Quadlet Syntax

# Run validation
./.claude/skills/homelab-deployment/scripts/validate-quadlet.sh \
  ~/.config/containers/systemd/jellyfin.container

# Checks:
# ✓ Valid INI syntax
# ✓ Required fields present
# ✓ Network names use the systemd- prefix
# ✓ Volume paths use :Z SELinux labels
# ✓ Health check defined
# ✓ Resource limits set
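
As an illustration of how some of these checks might be expressed, here is a minimal sketch covering the network prefix, the `:Z` label, and the health check. It is not the actual content of validate-quadlet.sh; `lint_quadlet` is a hypothetical name, and the sketch assumes one `Network=` or `Volume=` value per line.

```shell
# Hypothetical checks; not the actual contents of validate-quadlet.sh.
# Prints one line per problem found and returns non-zero if there were any.
lint_quadlet() {
  file="$1"
  status=0

  # Network= values are expected to start with systemd-.
  if grep '^Network=' "$file" | grep -qv '^Network=systemd-'; then
    echo "Network= without systemd- prefix"
    status=1
  fi

  # Volume= lines are expected to end in options that include Z (e.g. :Z or :ro,Z).
  if grep '^Volume=' "$file" | grep -Eqv ':([^:]*,)?Z(,[^:]*)?$'; then
    echo "Volume= without :Z label"
    status=1
  fi

  # A health check command is expected.
  if ! grep -q '^HealthCmd=' "$file"; then
    echo "HealthCmd= missing"
    status=1
  fi

  return "$status"
}

# Example usage:
#   lint_quadlet ~/.config/containers/systemd/jellyfin.container || echo "fix before deploying" >&2
```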

Generate Traefik Route (if externally accessible)

# Select template based on security tier
# Public → templates/traefik/public-service.yml
# Authenticated → templates/traefik/authenticated-service.yml
# Admin → templates/traefik/admin-service.yml
# API → templates/traefik/api-service.yml

# Customize route
cp .claude/skills/homelab-deployment/templates/traefik/authenticated-service.yml \
   ~/containers/config/traefik/dynamic/jellyfin-router.yml

# Substitute values
sed -i "s/{{SERVICE_NAME}}/jellyfin/g" ~/containers/config/traefik/dynamic/jellyfin-router.yml
sed -i "s/{{HOSTNAME}}/jellyfin.patriark.org/g" ~/containers/config/traefik/dynamic/jellyfin-router.yml
sed -i "s/{{PORT}}/8096/g" ~/containers/config/traefik/dynamic/jellyfin-router.yml

Generate Prometheus Scrape Config (if metrics exposed)

# Add to prometheus.yml
# Template: templates/prometheus/service-scrape-config.yml

Phase 4: Deployment Execution

Deploy the service:

# Reload systemd to recognize new quadlet
systemctl --user daemon-reload

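# Auto-start: Quadlet units are generated, so `systemctl --user enable` does not work on them.
# Make sure the .container file has an [Install] section (e.g. WantedBy=default.target) instead.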

# Start service
systemctl --user start jellyfin.service

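# Wait for healthy state (up to 30 attempts, 2 seconds apart)
healthy=false
for i in {1..30}; do
  if podman healthcheck run jellyfin; then healthy=true; break; fi
  sleep 2
done
# Report a timeout instead of silently continuing
$healthy || echo "jellyfin did not become healthy in time" >&2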

# Traefik route (if added): no reload step
# Traefik watches the dynamic config files, so the new route is picked up without a manual reload

# Restart Prometheus (if scrape config added)
systemctl --user restart prometheus.service

Phase 5: Post-Deployment Verification

Invoke service-validator subagent for comprehensive verification:

The service-validator subagent uses a 7-level verification framework with an "assume failure until proven otherwise" mindset:

Level 1: Service Health (CRITICAL) - Systemd active, container running, health checks passing, no crash loops, clean logs
Level 2: Network Connectivity (HIGH) - On expected networks, internal endpoint accessible, DNS resolution
Level 3: External Routing (HIGH) - Traefik route exists, external URL responds, TLS valid, security headers present
Level 4: Authentication Flow (HIGH) - Authelia redirect working, middleware chain correct
Level 5: Monitoring Integration (MEDIUM) - Prometheus scraping, Loki ingestion, Grafana dashboard
Level 6: Configuration Drift (LOW) - Running config matches quadlet definition
Level 7: Security Posture (CRITICAL) - CrowdSec active, rate limiting, no direct host exposure

Automated verification:

# Claude automatically invokes service-validator subagent
# Which runs: ~/.claude/skills/homelab-deployment/scripts/verify-deployment.sh

# Manual verification (if needed):
~/.claude/skills/homelab-deployment/scripts/verify-deployment.sh \
  jellyfin \
  https://jellyfin.patriark.org \
  true  # expect Authelia auth

Verification outcomes:

VERIFIED (>90% confidence): Proceed to Phase 5.5 (optional simplification), then Phase 6 (documentation)
WARNINGS (70-90% confidence): Review warnings, decide if acceptable, proceed with caution
FAILED (<70% confidence): STOP - Invoke systematic-debugging skill, investigate failures, consider rollback

Never document a failed deployment as complete. Verification must pass before proceeding to Phase 6.
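
The thresholds above can be written down as a small decision function. This is a sketch that assumes the confidence score arrives as an integer percentage; `classify_confidence` is a hypothetical name, not part of service-validator.

```shell
# Hypothetical helper: map an integer confidence score (0-100) to an outcome.
classify_confidence() {
  score="$1"
  if [ "$score" -gt 90 ]; then
    echo VERIFIED      # above 90: continue to Phase 5.5 / Phase 6
  elif [ "$score" -ge 70 ]; then
    echo WARNINGS      # 70 to 90 inclusive: review before continuing
  else
    echo FAILED        # below 70: stop and debug or roll back
  fi
}

# Example usage:
#   [ "$(classify_confidence "$score")" = FAILED ] && echo "stop and investigate" >&2
```

Note that a score of exactly 90 falls into WARNINGS, since VERIFIED requires more than 90.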

Phase 5.5: Code Simplification (Optional)

Invoke code-simplifier subagent to refactor configs:

After successful verification, optionally clean up configurations to maintain pattern compliance:

# Claude may invoke code-simplifier subagent
# Simplifies: Quadlet directives, Traefik routes, environment variables
# Aligns with: Homelab patterns, ADRs, template standards

Simplification examples:

Consolidate duplicate volume mounts
Use systemd variables (%h for home directory)
Deduplicate middleware chains in Traefik
Remove commented-out configuration
Align with pattern templates

Safety:

BTRFS snapshot created before simplification
Service restarted and re-verified after changes
Rollback if re-verification fails

Skip simplification if:

First deployment for this pattern (let it stabilize first)
Security-critical configs (don't simplify Authelia, CrowdSec)
Workarounds for known issues
Config less than 24 hours old
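
The 24-hour rule is easy to check mechanically. A minimal sketch, assuming "age" means time since last modification (the document does not say whether creation or modification time is meant) and that `find` supports `-mmin`; `config_is_fresh` is a hypothetical helper.

```shell
# Hypothetical helper: succeeds if the file was modified less than 24 hours ago.
# Assumption: age is measured from the modification time.
config_is_fresh() {
  [ -n "$(find "$1" -maxdepth 0 -mmin -1440)" ]
}

# Example usage:
#   if config_is_fresh ~/.config/containers/systemd/jellyfin.container; then
#     echo "config is less than 24 hours old, skipping simplification"
#   fi
```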

Phase 6: Documentation

Generate documentation automatically:

Service Guide (docs/10-services/guides/jellyfin.md)
- Service description
- Configuration details
- Network topology
- Management commands
- Troubleshooting
Deployment Journal (docs/10-services/journal/YYYY-MM-DD-jellyfin-deployment.md)
- Deployment timestamp
- Configuration used
- Verification results
- Issues encountered
- Resolution steps
Update CLAUDE.md
- Add service to Common Commands section
- Add to Troubleshooting section if needed
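
As one possible way to start the deployment journal, the sketch below builds the journal path from the naming scheme above and writes empty sections for the listed items. Both function names are hypothetical, and the section layout is an assumption rather than the skill's actual output.

```shell
# Hypothetical helpers for the deployment journal.

# Print the journal path for a service; the date defaults to today.
journal_path() {
  service="$1"
  day="${2:-$(date +%Y-%m-%d)}"
  printf 'docs/10-services/journal/%s-%s-deployment.md\n' "$day" "$service"
}

# Create a journal file with one empty section per documented item and print its path.
write_journal_skeleton() {
  path=$(journal_path "$@")
  mkdir -p "$(dirname "$path")"
  {
    printf '# %s deployment\n\n' "$1"
    for section in "Deployment timestamp" "Configuration used" \
                   "Verification results" "Issues encountered" "Resolution steps"; do
      printf '## %s\n\n' "$section"
    done
  } > "$path"
  printf '%s\n' "$path"
}

# Example usage (run from the repository root):
#   write_journal_skeleton jellyfin
```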

Phase 7: Git Commit

Commit deployment changes:

# Add all deployment artifacts
git add ~/.config/containers/systemd/jellyfin.container
git add ~/containers/config/traefik/dynamic/jellyfin-router.yml
git add ~/containers/config/prometheus/prometheus.yml  # if modified
git add docs/10-services/guides/jellyfin.md
git add docs/10-services/journal/$(date +%Y-%m-%d)-jellyfin-deployment.md

# Commit with structured message
git commit -m "$(cat <<'EOF'
Deploy Jellyfin media server

- Add quadlet configuration (4G memory, systemd networks)
- Configure Traefik route with Authelia authentication
- Add Prometheus scrape target
- Generate service documentation

Configuration:
  Image: docker.io/jellyfin/jellyfin:latest
  Networks: reverse_proxy, media_services, monitoring
  Middleware: CrowdSec → Rate limit → Authelia

Verification: ✓ Service healthy, ✓ External access working
EOF
)"

# Push changes
git push origin main

Rollback Procedure

If deployment fails:

# Stop service
systemctl --user stop jellyfin.service

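# No `systemctl --user disable` step: Quadlet units are generated and cannot be disabled this way.
# Removing the .container file below is what prevents the service from starting again.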

# Remove container if it still exists (Quadlet containers are normally removed on stop)
podman rm --ignore jellyfin

# Remove quadlet
rm ~/.config/containers/systemd/jellyfin.container

# Remove Traefik route
rm ~/containers/config/traefik/dynamic/jellyfin-router.yml

# Reload systemd
systemctl --user daemon-reload

# Document rollback reason

Integration with Other Skills

This skill works with:

systematic-debugging: Use when deployment fails
homelab-intelligence: Verify system health before deployment
git-advanced-workflows: Clean commit history
security-audit (future): Validate security configuration

Templates Reference

Quadlet Template Variables

All templates support these substitutions:

{{SERVICE_NAME}}     - Container/service name
{{IMAGE}}            - Container image (registry/name:tag)
{{MEMORY_LIMIT}}     - Memory limit (e.g., 4G)
{{MEMORY_HIGH}}      - Memory high watermark (e.g., 3G)
{{CPU_SHARES}}       - CPU shares (optional)
{{NICE}}             - Process priority (optional)
{{CONFIG_DIR}}       - Configuration directory path
{{DATA_DIR}}         - Data directory path
{{NETWORKS}}         - Comma-separated network list
{{PORTS}}            - Exposed ports
{{ENVIRONMENT}}      - Environment variables
{{HEALTH_CMD}}       - Health check command

Network Selection Guide

Use this decision tree:

Service needs external access (web UI/API)?
  YES → Add systemd-reverse_proxy
  NO  → Skip

Service needs database access?
  YES → Add systemd-database (if exists) or service-specific network
  NO  → Skip

Service provides/consumes metrics?
  YES → Add systemd-monitoring
  NO  → Skip

Service handles authentication?
  YES → Add systemd-auth_services
  NO  → Skip

Service processes media?
  YES → Add systemd-media_services
  NO  → Skip

Service manages photos?
  YES → Add systemd-photos
  NO  → Skip

IMPORTANT: First network determines default route (internet access)!
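
The decision tree can be sketched as a function that turns yes/no answers into an ordered network list. `select_networks` is a hypothetical helper, and the sketch assumes that systemd-reverse_proxy is the network that should come first whenever it is selected; it always uses systemd-database for database access and does not cover the service-specific network alternative.

```shell
# Hypothetical helper: build a comma-separated network list from yes/no answers.
# Usage: select_networks external=yes database=no metrics=yes auth=no media=yes photos=no
# Networks are emitted in a fixed order, so argument order does not affect the result.
select_networks() {
  answers=" $* "
  networks=""
  for rule in external:systemd-reverse_proxy \
              database:systemd-database \
              metrics:systemd-monitoring \
              auth:systemd-auth_services \
              media:systemd-media_services \
              photos:systemd-photos; do
    question="${rule%%:*}"
    network="${rule#*:}"
    case "$answers" in
      *" $question=yes "*) networks="${networks:+$networks,}$network" ;;
    esac
  done
  printf '%s\n' "$networks"
}
```

The output can be passed to the `--networks` option shown in Phase 2.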

Middleware Selection Guide

Security tiers:

PUBLIC SERVICE (no auth required):
  crowdsec-bouncer@file
  rate-limit-public@file
  security-headers-public@file

AUTHENTICATED SERVICE (standard):
  crowdsec-bouncer@file
  rate-limit@file
  authelia@file
  security-headers@file

ADMIN SERVICE (strict):
  crowdsec-bouncer@file
  admin-whitelist@file
  rate-limit-strict@file
  authelia@file
  security-headers-strict@file

API SERVICE:
  crowdsec-bouncer@file
  rate-limit@file
  cors-headers@file
  authelia@file
  security-headers@file

INTERNAL ONLY:
  internal-only@file
  rate-limit@file
  security-headers@file
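
For scripted route generation, the tiers above can be kept in one lookup so the chain order stays consistent. A minimal sketch; `middlewares_for_tier` and the lowercase tier names are hypothetical.

```shell
# Hypothetical helper: print the middleware chain for a security tier, one per line,
# in the order listed in the guide above.
middlewares_for_tier() {
  case "$1" in
    public)
      set -- crowdsec-bouncer@file rate-limit-public@file security-headers-public@file ;;
    authenticated)
      set -- crowdsec-bouncer@file rate-limit@file authelia@file security-headers@file ;;
    admin)
      set -- crowdsec-bouncer@file admin-whitelist@file rate-limit-strict@file \
             authelia@file security-headers-strict@file ;;
    api)
      set -- crowdsec-bouncer@file rate-limit@file cors-headers@file authelia@file \
             security-headers@file ;;
    internal)
      set -- internal-only@file rate-limit@file security-headers@file ;;
    *)
      echo "unknown tier: $1" >&2
      return 1 ;;
  esac
  printf '%s\n' "$@"
}

# Example usage: print the chain as YAML list items
#   middlewares_for_tier authenticated | sed 's/^/        - /'
```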

Common Patterns

Pattern 1: Web Application with Database

Components:

Database service (PostgreSQL/MySQL/Redis)
Web application service
Traefik route
Prometheus scraping (optional)

Network topology:

Database:     systemd-database (internal only)
Web app:      systemd-reverse_proxy, systemd-database, systemd-monitoring
Traefik:      systemd-reverse_proxy (already configured)
Prometheus:   systemd-monitoring (already configured)

Example: Vaultwarden (password manager)

Pattern 2: Monitoring Service

Components:

Monitoring service (exporter, scraper, etc.)
Prometheus scrape config
Grafana dashboard (optional)

Network topology:

Service:      systemd-monitoring
Prometheus:   systemd-monitoring

Example: Node Exporter, cAdvisor

Pattern 3: Media Processing Service

Components:

Media service
Traefik route with optional auth
Large storage volumes
Optional transcoding (GPU access)

Network topology:

Service:      systemd-reverse_proxy, systemd-media_services, systemd-monitoring

Example: Jellyfin, Plex, Immich

Pattern 4: Authentication Service

Components:

Auth service
Session storage (Redis)
Traefik ForwardAuth configuration
User database

Network topology:

Auth service: systemd-reverse_proxy, systemd-auth_services
Redis:        systemd-auth_services

Example: Authelia, Authentik

Error Handling

Error: "Network not found"

Cause: Network doesn't exist or wrong name

Solution:

# Check existing networks
podman network ls

# Create network if needed
podman network create systemd-<name>

# Fix quadlet network name (must start with systemd-)
sed -i 's/Network=reverse_proxy/Network=systemd-reverse_proxy/' \
  ~/.config/containers/systemd/service.container

Error: "Permission denied" on volume mount

Cause: Missing :Z SELinux label

Solution:

# Fix volume mount in quadlet (anchored to end of line so an existing :Z is not doubled)
sed -i 's|:/config$|:/config:Z|' ~/.config/containers/systemd/service.container
sed -i 's|:/data$|:/data:Z|' ~/.config/containers/systemd/service.container

Error: "Port already in use"

Cause: Another service using the port

Solution:

# Find what's using the port
ss -tulnp | grep <port>

# Change service port OR stop conflicting service

Error: "Service fails health check"

Cause: Health check command incorrect or service not ready

Solution:

# Check service logs
journalctl --user -u service.service -n 50

# Verify health check command
podman inspect service | grep -A 5 Healthcheck

# Test health check manually
podman healthcheck run service

# Increase health check timeout if needed

Error: "Traefik 502 Bad Gateway"

Cause: Service not reachable from Traefik

Solution:

# 1. Verify service running
systemctl --user status service.service

# 2. Check networks match
podman network inspect systemd-reverse_proxy | grep traefik
podman network inspect systemd-reverse_proxy | grep service

# 3. Test from Traefik container
podman exec traefik wget -O- http://service:port/

# 4. Check Traefik logs
podman logs traefik | grep service
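
Step 2 can also be checked by comparing the network lists of the two containers directly. The following is a sketch under the assumption that `podman inspect` exposes attached networks under `.NetworkSettings.Networks`; both function names are hypothetical.

```shell
# Hypothetical helpers for checking that two containers share a network.

# Print the networks a container is attached to, one per line.
container_networks() {
  podman inspect --format \
    '{{range $net, $cfg := .NetworkSettings.Networks}}{{$net}}{{"\n"}}{{end}}' "$1"
}

# Print the lines that appear in both newline-separated lists.
common_lines() {
  printf '%s\n' "$1" | while IFS= read -r item; do
    [ -n "$item" ] || continue
    if printf '%s\n' "$2" | grep -Fxq -- "$item"; then
      printf '%s\n' "$item"
    fi
  done
}

# Example usage (requires running containers named traefik and service):
#   shared=$(common_lines "$(container_networks traefik)" "$(container_networks service)")
#   [ -n "$shared" ] || echo "no shared network between traefik and service" >&2
```

An empty result means Traefik has no network path to the service, which matches the stated cause of the 502.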

Success Criteria

Deployment is complete when:

Service running and healthy
Internal endpoint accessible
External URL accessible (if public)
Authentication working (if required)
Monitoring configured (if applicable)
Documentation generated
Git commit created
No errors in logs

Notes

Always validate before deploying
Use templates, don't create from scratch
Document as you deploy
Test thoroughly before considering complete
Roll back if verification fails

This skill ensures every deployment is systematic, validated, and documented.
