| name | security-controls-design |
| description | Layered security controls at trust boundaries - defense-in-depth, least privilege, fail-secure |
Security Controls Design
Overview
Design security controls as layered defenses at trust boundaries. Core principle: Apply systematic checks at every boundary to ensure no single control failure compromises security.
Key insight: List specific controls after identifying WHERE to apply them (trust boundaries first, then controls).
When to Use
Load this skill when:
- Implementing authentication/authorization systems
- Hardening API endpoints, databases, file storage
- Designing data protection mechanisms
- Securing communication channels
- Protecting sensitive operations
Symptoms you need this:
- "How do I secure this API/database/upload feature?"
- "What controls should I implement?"
- "How do I prevent unauthorized access to X?"
- "How do I harden this system?"
Don't use for:
- Threat modeling (use ordis/security-architect/threat-modeling first)
- Code-level security patterns (use ordis/security-architect/secure-code-patterns)
- Reviewing existing designs (use ordis/security-architect/security-architecture-review)
Core Methodology: Trust Boundaries First
DON'T start with: "What controls should I implement?"
DO start with: "Where are the trust boundaries?"
Step 1: Identify Trust Boundaries
Trust boundaries are points where data/requests cross from less-trusted to more-trusted zones.
Common boundaries:
- Internet → API Gateway
- API Gateway → Application Server
- Application → Database
- Application → File Storage
- Unauthenticated → Authenticated
- User Role → Admin Role
- External Service → Internal Service
Example: File Upload System
Trust Boundaries:
1. User Browser → Upload Endpoint (UNTRUSTED → APP)
2. Upload Endpoint → Virus Scanner (APP → SCANNER)
3. Scanner → Storage (SCANNER → S3)
4. Storage → Display (S3 → USER)
5. Storage → Internal Processing (S3 → APP)
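One way to make the boundary inventory concrete is to record each boundary as data before choosing any controls. The sketch below is illustrative only: the `TrustBoundary` class, its fields, and `boundaries_without_controls` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TrustBoundary:
    """One crossing point between two zones (hypothetical structure)."""
    source: str
    target: str
    controls: tuple = field(default_factory=tuple)


# The file upload boundaries listed above, recorded before controls are chosen.
UPLOAD_BOUNDARIES = [
    TrustBoundary("User Browser", "Upload Endpoint"),
    TrustBoundary("Upload Endpoint", "Virus Scanner"),
    TrustBoundary("Scanner", "Storage"),
    TrustBoundary("Storage", "Display"),
    TrustBoundary("Storage", "Internal Processing"),
]


def boundaries_without_controls(boundaries):
    """Return the boundaries that still have no controls assigned."""
    return [b for b in boundaries if not b.controls]


for b in boundaries_without_controls(UPLOAD_BOUNDARIES):
    print(f"TODO: design controls for {b.source} -> {b.target}")
```

Keeping the list explicit makes it easy to spot a boundary that was identified but never given controls.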
Step 2: Apply Defense-in-Depth at Each Boundary
For EACH boundary, apply multiple control layers. If one fails, others provide backup.
Defense-in-Depth Checklist
Use this checklist at every trust boundary:
Layer 1: Validation (First Line)
- Input validation: Type, format, size, allowed values
- Sanitization: Remove dangerous characters, escape output
- Canonicalization: Resolve to standard form (prevent bypass)
Layer 2: Authentication (Who Are You?)
- Identity verification: Credentials, tokens, certificates
- Multi-factor authentication: For sensitive boundaries
- Session management: Secure tokens, expiration, rotation
Layer 3: Authorization (What Can You Do?)
- Access control checks: RBAC, ABAC, resource-level
- Least privilege enforcement: Grant minimum necessary
- Privilege escalation prevention: No path to higher access
Layer 4: Rate Limiting (Abuse Prevention)
- Request rate limits: Per-IP, per-user, per-endpoint
- Resource quotas: Prevent resource exhaustion
- Anomaly detection: Flag unusual patterns
Layer 5: Audit Logging (Detective)
- Security event logging: Who, what, when, where, outcome
- Tamper-proof logs: Write-only for applications
- Alerting: Automated detection of suspicious activity
Layer 6: Encryption (Confidentiality)
- Data in transit: TLS 1.3, certificate validation
- Data at rest: Encryption for sensitive data
- Key management: Secure storage, rotation, separation
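The checklist can be read as an ordered chain of checks where any failure, including a check that crashes, ends in denial. The following is a minimal sketch under that reading. The request shape (a plain dict), the three stand-in checks, and the 1024-character size limit are all assumptions chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

Request = Dict[str, object]
Check = Callable[[Request], bool]


def validate(req: Request) -> bool:
    # Layer 1: reject oversized or malformed input (limit is illustrative).
    body = req.get("body")
    return isinstance(body, str) and len(body) <= 1024


def authenticate(req: Request) -> bool:
    # Layer 2: stand-in for real credential or token verification.
    return req.get("user") is not None


def authorize(req: Request) -> bool:
    # Layer 3: the caller must hold the scope the operation needs.
    return req.get("required_scope") in req.get("scopes", ())


LAYERS: List[Tuple[str, Check]] = [
    ("validation", validate),
    ("authentication", authenticate),
    ("authorization", authorize),
]


def evaluate(req: Request) -> Tuple[bool, str]:
    """Run every layer in order; deny on the first failure or error."""
    for name, check in LAYERS:
        try:
            passed = check(req)
        except Exception:
            passed = False  # a crashing check counts as a denial (fail-closed)
        if not passed:
            return False, name
    return True, "all layers passed"
```

Returning the name of the failing layer gives the audit log something specific to record for each denial.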
Example Application (API Authentication Boundary):
Internet → API Gateway boundary:
Layer 1 (Validation):
- Validate Authorization header present and well-formed
- Check request size limits (prevent DoS)
- Validate content-type and payload structure
Layer 2 (Authentication):
- Verify JWT signature (RS256, public key validation)
- Check token expiration (exp claim)
- Verify token not revoked (check Redis revocation list)
Layer 3 (Authorization):
- Extract scopes from token
- Verify the scope required by the endpoint is present in the token
- Check resource-level permissions (can user access THIS resource?)
Layer 4 (Rate Limiting):
- Per-token: 1000 requests/minute
- Per-IP: 100 requests/minute (catch token sharing)
- Per-endpoint: Stricter limits on write operations
Layer 5 (Audit Logging):
- Log authentication attempts (success/failure)
- Log authorization decisions (allowed/denied)
- Log resource access (who accessed what)
Layer 6 (Encryption):
- Enforce TLS 1.3 only (reject unencrypted)
- Validate certificate chain
- Store tokens encrypted in session store
If ANY layer fails, others provide defense.
Fail-Secure Patterns
When a control fails, the system should default to a secure state (deny access, close connection, reject request).
Fail-Closed (Secure) vs Fail-Open (Insecure)
| Situation | Fail-Open (❌ BAD) | Fail-Closed (✅ GOOD) |
|---|---|---|
| Auth service down | Allow all requests through | Deny all requests until service recovers |
| Token validation fails | Treat as valid | Reject request |
| Database unreachable | Skip permission check | Deny access |
| Rate limit store unavailable | No rate limiting | Apply strictest default limit |
| Audit log fails to write | Continue operation | Reject operation |
Examples of Fail-Secure Implementation
Example 1: Authentication Service Failure
def authenticate_request(request):
    try:
        token = extract_token(request)
        user = auth_service.validate_token(token)  # External service call
        return user
    except AuthServiceUnavailable:
        # ❌ FAIL-OPEN: return AnonymousUser()  # Let them through
        # ✅ FAIL-CLOSED: raise Unauthorized("Authentication service unavailable")
        raise Unauthorized("Authentication service unavailable")
    except InvalidToken:
        raise Unauthorized("Invalid token")
Example 2: Rate Limiter Failure
def check_rate_limit(user_id):
    key = f"rate:{user_id}"
    try:
        count = redis.incr(key)  # INCR returns the new value; no separate GET needed
        if count == 1:
            redis.expire(key, 60)  # Start a 60-second window on the first request
        if count > LIMIT:
            raise RateLimitExceeded()
    except RedisConnectionError:
        # ❌ FAIL-OPEN: return  # Let request through
        # ✅ FAIL-CLOSED: If Redis is down, apply an aggressive in-memory rate limit
        in_memory_limiter.check(user_id, limit=10)  # Much stricter than normal
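The `in_memory_limiter.check(user_id, limit=10)` call in the example above is not defined in this document. A minimal sketch of what such a fallback could look like is a fixed-window counter held in process memory. The class name, the 60-second window, and the injectable `clock` parameter are assumptions made for illustration and testing.

```python
import time


class RateLimitExceeded(Exception):
    pass


class InMemoryLimiter:
    """Fixed-window counter kept in process memory (not shared across workers)."""

    def __init__(self, window_seconds=60, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock
        self._windows = {}  # key -> (window_start, count)

    def check(self, key, limit):
        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # the previous window expired; start a new one
        count += 1
        self._windows[key] = (start, count)
        if count > limit:
            raise RateLimitExceeded(f"{key} exceeded {limit} requests per window")


in_memory_limiter = InMemoryLimiter()
```

Because the counter lives in one process, a deployment with several workers allows up to the limit per worker, which is one reason to keep the fallback limit much stricter than the normal one.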
Example 3: Database Permission Check
def can_user_access_resource(user_id, resource_id):
    try:
        permission = db.query(
            "SELECT can_read FROM permissions WHERE user_id = ? AND resource_id = ?",
            user_id, resource_id
        )
        return permission.can_read
    except DatabaseConnectionError:
        # ❌ FAIL-OPEN: return True  # Assume they have access
        # ✅ FAIL-CLOSED: return False  # Deny access if can't verify
        logger.error(f"DB unavailable, denying access for user={user_id} resource={resource_id}")
        return False
Example 4: File Type Validation
def validate_file_upload(file):
    # Layer 1: Check extension (case-insensitive)
    if file.extension.lower() not in ALLOWED_EXTENSIONS:
        raise ValidationError("Invalid file type")
    # Layer 2: Check magic bytes
    try:
        magic_bytes = file.read(16)
        content_ok = is_valid_magic_bytes(magic_bytes)
    except Exception as e:
        # ❌ FAIL-OPEN: return True  # Couldn't check, assume valid
        # ✅ FAIL-CLOSED: Could not check, so reject
        raise ValidationError(f"Could not validate file: {e}")
    if not content_ok:
        # ✅ FAIL-CLOSED: Even if the extension passed, magic bytes take precedence
        # (raised outside the try so the mismatch error is not re-wrapped)
        raise ValidationError("File content doesn't match extension")
Principle: When in doubt, deny. It's better to have a false positive (deny a legitimate request) than a false negative (allow a malicious request).
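The four examples share one shape: any unexpected error inside a control becomes a denial. That shape can be sketched once as a decorator. The names `fail_closed`, `AccessDenied`, and `has_permission` are hypothetical and only illustrate the pattern.

```python
import functools
import logging

logger = logging.getLogger("security")


class AccessDenied(Exception):
    pass


def fail_closed(check):
    """Wrap a boolean check so that any error or non-True result is a denial."""
    @functools.wraps(check)
    def wrapper(*args, **kwargs):
        try:
            allowed = check(*args, **kwargs)
        except Exception:
            # Log the failure so it is observable, then deny.
            logger.exception("control %s failed; denying", check.__name__)
            raise AccessDenied(f"{check.__name__} could not be evaluated") from None
        if allowed is not True:
            raise AccessDenied(f"{check.__name__} denied the request")
        return True
    return wrapper


@fail_closed
def has_permission(permissions, user_id, resource_id):
    # A KeyError for an unknown user is turned into a denial by the wrapper.
    return resource_id in permissions[user_id]
```

Requiring the result to be exactly `True` means a check that returns `None` by accident also denies rather than allows.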
Least Privilege Principle
Grant minimum necessary access for each component to perform its function. No more.
Application Method
For each component, ask three questions:
- What does it NEED to do? (functional requirements)
- What's the MINIMUM access to achieve that? (reduce scope)
- What can it NEVER do? (explicit denials)
Example: Database Access Roles
Web Application Role:
-- What it NEEDS: Read customers, write audit logs
GRANT SELECT ON customers TO web_app_user;
GRANT INSERT ON audit_logs TO web_app_user;
-- What's MINIMUM: No DELETE, no UPDATE on audit logs (immutable), no admin tables
REVOKE DELETE ON customers FROM web_app_user;
REVOKE ALL ON admin_users FROM web_app_user;
-- Explicit NEVER: Cannot modify audit logs (tamper-proof)
REVOKE UPDATE, DELETE ON audit_logs FROM web_app_user;
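-- Row-level security: Only active customers
-- (PostgreSQL: policies only take effect once row-level security is enabled on the table)
ALTER TABLE customers ENABLE ROW LEVEL SECURITY;
CREATE POLICY web_app_access ON customers
    FOR SELECT TO web_app_user
    USING (status = 'active');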
Analytics Role:
-- What it NEEDS: Read non-PII customer data for analytics
-- What's MINIMUM: View with PII columns excluded
CREATE VIEW customers_analytics AS
SELECT customer_id, country, subscription_tier, created_at
FROM customers; -- Excludes: name, email, address
GRANT SELECT ON customers_analytics TO analytics_user;
-- What it can NEVER do: Access PII, modify data, see payment info
REVOKE ALL ON customers FROM analytics_user;
REVOKE ALL ON payment_info FROM analytics_user;
ALTER ROLE analytics_user SET default_transaction_read_only = true;
File System Permissions
Application Server:
# What it NEEDS: Read config, write logs, read/write uploads
/etc/app/config/ → Read-only (owner: root, group: app; directory chmod 750, files chmod 640)
/var/log/app/ → Write-only (owner: app; directory chmod 300, files chmod 200, append-only)
/var/uploads/ → Read/write (owner: app, chmod 700)
# What it can NEVER do: Write to config, execute from uploads
/etc/app/config/ → No write permissions
/var/uploads/ → Mount with noexec flag (prevent execution)
API Scopes (OAuth2 Pattern)
# User requests minimal scopes
scopes_requested = ["read:profile", "read:posts"]
# DON'T grant admin scopes by default
# DO grant only what was requested and approved
token = create_token(user, scopes=scopes_requested)
# At each endpoint, verify scope
@require_scope("write:posts")
def create_post(request):
    # This endpoint is inaccessible with read:posts scope
    pass
Principle: Default deny, explicit allow. Start with no access, grant only what's needed.
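The `@require_scope` decorator used above is not defined in this document. A minimal sketch of how it might enforce default deny follows. The `Request` class with a `scopes` attribute and the `Forbidden` exception are assumptions standing in for whatever the web framework provides.

```python
import functools


class Forbidden(Exception):
    pass


def require_scope(required):
    """Deny unless the request carries the required scope (default deny)."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(request, *args, **kwargs):
            # A missing or empty scopes attribute is treated as no access.
            granted = getattr(request, "scopes", None) or ()
            if required not in granted:
                raise Forbidden(f"missing scope: {required}")
            return handler(request, *args, **kwargs)
        return wrapper
    return decorator


class Request:
    """Hypothetical request object carrying scopes from a verified token."""
    def __init__(self, scopes):
        self.scopes = frozenset(scopes)


@require_scope("write:posts")
def create_post(request):
    return "created"
```

A request holding only `read:profile` and `read:posts` raises `Forbidden` here, because nothing is granted unless it appears in the token's scopes.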
Separation of Duties
No single component/person/account should have complete control over a critical operation.
Patterns
Pattern 1: Multi-Signature Approvals
Example: Production Deployments
# Require 2 approvals from different teams
approvals:
  required: 2
  teams:
    - engineering-leads
    - security-team
  # Cannot approve own PR
  prevent_self_approval: true
Pattern 2: Split Responsibilities
Example: Payment Processing
# Component A: Initiates payment (can create, cannot approve)
payment_service.initiate_payment(amount, account)
# Component B: Approves payment (can approve, cannot create)
# Different credentials, different service
approval_service.approve_payment(payment_id)
# Component C: Executes payment (can execute, cannot create/approve)
# Only accepts approved payments
execution_service.execute_payment(approved_payment_id)
No single service can create AND approve AND execute a payment.
Pattern 3: Key Splitting
Example: Encryption Key Management
# Master key split into 3 shares using Shamir Secret Sharing
# Require 2 of 3 shares to reconstruct
shares = split_key(master_key, threshold=2, num_shares=3)
# Distribute to different teams/locations
security_team.store(shares[0])
ops_team.store(shares[1])
compliance_team.store(shares[2])
# Reconstruction requires 2 teams to cooperate
reconstructed = reconstruct_key([shares[0], shares[1]])
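The `split_key` and `reconstruct_key` helpers above are not defined in this document. For intuition only, here is a small sketch of Shamir Secret Sharing over a prime field, treating the key as an integer. The choice of field (the Mersenne prime 2**521 - 1) and the share format (a list of `(x, y)` pairs) are assumptions. Real key management should rely on a vetted library or an HSM rather than hand-written code like this.

```python
import secrets

# 2**521 - 1 is a Mersenne prime, large enough to hold a 256-bit key.
PRIME = 2**521 - 1


def split_key(secret, threshold, num_shares):
    """Split an integer secret into points on a random polynomial mod PRIME."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range for this field")
    if not 1 <= threshold <= num_shares:
        raise ValueError("threshold must be between 1 and num_shares")
    # coefficients[0] is the secret; the remaining coefficients are random.
    coefficients = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coefficients):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct_key(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With `threshold=2`, any two of the three shares recover the key, while a single share on its own reveals nothing about it.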
Pattern 4: Admin Operations Require Approval
Example: Database Admin Actions
# Admin initiates action (creates request, cannot execute)
admin_request = AdminRequest(
    action="DELETE_USER",
    user_id=12345,
    reason="GDPR erasure request",
    requested_by=admin_id
)
# Second admin reviews and approves (cannot initiate)
reviewer.approve(admin_request, reviewer_id=different_admin_id)
# System executes after approval (automated, no single admin control)
if admin_request.is_approved():
    execute_admin_action(admin_request)
Principle: Break critical paths into multiple steps requiring different actors.
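The requirement for different actors only holds if the code enforces it. A minimal sketch of that enforcement for the admin approval pattern follows. The class shape loosely follows the `AdminRequest` example above, but the field names, the `approve` method, and `ApprovalError` are assumptions.

```python
class ApprovalError(Exception):
    pass


class AdminRequest:
    """A critical action that needs a second, different actor to approve it."""

    def __init__(self, action, target_id, reason, requested_by):
        self.action = action
        self.target_id = target_id
        self.reason = reason
        self.requested_by = requested_by
        self.approved_by = None

    def approve(self, reviewer_id):
        # Separation of duties: the requester can never be the approver.
        if reviewer_id == self.requested_by:
            raise ApprovalError("requester cannot approve their own request")
        self.approved_by = reviewer_id

    def is_approved(self):
        return self.approved_by is not None


def execute_admin_action(request):
    # Execution re-checks approval instead of trusting the caller.
    if not request.is_approved():
        raise ApprovalError("request has not been approved")
    return f"{request.action} executed for {request.target_id}"
```

Re-checking approval inside `execute_admin_action` means a caller that skips the approval step is still denied.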
Control Verification Method
For each control you design, ask: "What if this control fails?"
Verification Checklist
For each control:
- What attack does this prevent? (threat it addresses)
- How can this control fail? (failure modes)
- What happens if it fails? (impact)
- What's the next layer of defense? (backup control)
- Is failure logged/detected? (observability)
Example: API Token Validation
Control: Verify JWT signature before processing request
- What attack: Prevents forged tokens, ensures authenticity
- How it can fail:
  - Public key unavailable (service down)
  - Expired token not caught (clock skew)
  - Token revocation list unavailable (Redis down)
  - Signature algorithm downgrade attack (accept HS256 instead of RS256)
- What if it fails:
  - Public key unavailable → Fail-closed (deny all requests)
  - Expired token → Layer 2: Check expiration explicitly
  - Revocation list down → Layer 3: Apply strict rate limits as fallback
  - Algorithm downgrade → Layer 4: Explicitly require RS256, reject others
- Next layer:
  - Authorization checks (even with valid token, check permissions)
  - Rate limiting (limit damage from compromised token)
  - Audit logging (detect unusual access patterns)
- Failure logged: Yes → Log signature validation failures, alert on spike
Outcome: Designed 4 layers of defense against token attacks.
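The "explicitly require RS256, reject others" step can be sketched with the standard library alone. The function below only inspects the JWT header and does not verify the signature, so it would sit in front of real verification (JWT libraries such as PyJWT typically take an explicit list of allowed algorithms for the same purpose). The function and exception names here are hypothetical.

```python
import base64
import json

ALLOWED_ALGORITHMS = frozenset({"RS256"})


class InvalidToken(Exception):
    pass


def _b64url_decode(segment):
    # JWT segments omit base64 padding; restore it before decoding.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def require_pinned_algorithm(token):
    """Reject a JWT whose header 'alg' is not in the allowlist.

    This inspects the header only; signature verification must still follow.
    """
    try:
        header_segment, _payload, _signature = token.split(".")
        header = json.loads(_b64url_decode(header_segment))
        alg = header["alg"]
    except Exception as exc:
        # Any parsing problem is treated as an invalid token (fail-closed).
        raise InvalidToken("malformed token header") from exc
    if not isinstance(alg, str) or alg not in ALLOWED_ALGORITHMS:
        raise InvalidToken(f"algorithm {alg!r} is not allowed")
    return alg
```

Pinning the algorithm on the server side means a token that declares `HS256` or `none` in its header is rejected before any key material is touched.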
Example: File Upload Validation
Control: Check file extension against allowlist
- What attack: Prevents upload of executable files (.exe, .sh)
- How it can fail:
  - Attacker renames malware.exe → malware.jpg
  - Double extension: malware.jpg.exe (passes if the check looks at the first extension rather than the last)
  - Case variation: malware.ExE
- What if it fails: Malicious file stored, potentially executed
- Next layers:
  - Layer 2: Magic byte verification (check file content, not name)
  - Layer 3: Antivirus scanning (detect known malware)
  - Layer 4: File reprocessing (re-encode images, destroying embedded code)
  - Layer 5: noexec mount (storage prevents execution)
  - Layer 6: Separate domain for user content (same-origin policy isolates it from the app; CSP further limits XSS)
- Failure logged: Yes → Log validation failures, rejected files
Outcome: Extension check is Layer 1 of 6. If bypassed, 5 more layers prevent exploitation.
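The first two layers of this example can be sketched together: normalize and check the final extension, then compare the leading bytes against known signatures for that type. The allowlist contents and the names `MAGIC_BYTES` and `validate_upload` are assumptions for illustration. The byte signatures are the standard ones for PNG, JPEG, GIF, and PDF.

```python
import os

# Allowed extension -> accepted leading bytes for that file type.
MAGIC_BYTES = {
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".gif": (b"GIF87a", b"GIF89a"),
    ".pdf": (b"%PDF-",),
}


class ValidationError(Exception):
    pass


def validate_upload(filename, content):
    """Check the final extension case-insensitively, then the leading bytes."""
    # splitext returns only the last extension, so "a.jpg.exe" yields ".exe".
    extension = os.path.splitext(filename)[1].lower()
    signatures = MAGIC_BYTES.get(extension)
    if signatures is None:
        raise ValidationError(f"extension {extension!r} is not allowed")
    # bytes.startswith accepts a tuple of candidate prefixes.
    if not content.startswith(signatures):
        raise ValidationError("file content does not match its extension")
    return extension
```

A renamed executable passes the extension check but fails the signature check, which is the backup behavior the layering is meant to provide. Neither check detects malicious content inside a well-formed file, so the later layers still matter.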
Quick Reference: Control Selection
For every trust boundary, apply this checklist:
| Layer | Control Type | Example |
|---|---|---|
| 1. Validation | Input checking | Size limits, type validation, sanitization |
| 2. Authentication | Identity verification | JWT validation, certificate checks, MFA |
| 3. Authorization | Permission checks | RBAC, resource-level access, least privilege |
| 4. Rate Limiting | Abuse prevention | Per-user limits, anomaly detection, quotas |
| 5. Audit Logging | Detective | Security events, tamper-proof logs, alerting |
| 6. Encryption | Confidentiality | TLS in transit, encryption at rest, key management |
For each control:
- Define fail-secure behavior (what happens if it fails?)
- Apply least privilege (minimum necessary access)
- Verify separation of duties (no single point of complete control)
- Test "what if this fails?" (ensure backup layers exist)
Common Mistakes
❌ Designing Controls Before Identifying Boundaries
Wrong: "I need authentication and authorization and rate limiting"
Right: "Where are my trust boundaries? → Internet→API, API→Database → At each: apply layered controls"
Why: Controls are meaningless without knowing WHERE to apply them.
❌ Single Layer of Defense
Wrong: "Authentication is enough security"
Right: "Authentication + Authorization + Rate Limiting + Audit Logging"
Why: If authentication is bypassed (bug, misconfiguration), other layers provide defense.
❌ Fail-Open Defaults
Wrong:
try:
    user = auth_service.validate(token)
except ServiceUnavailable:
    user = AnonymousUser()  # Let them through
Right:
try:
    user = auth_service.validate(token)
except ServiceUnavailable:
    raise Unauthorized("Auth service unavailable")
Why: Control failure should result in secure state (deny), not insecure state (allow).
❌ Excessive Privileges
Wrong: Grant web application full database access (SELECT, INSERT, UPDATE, DELETE on all tables)
Right: Grant only needed operations per table (SELECT on customers, INSERT-only on audit_logs)
Why: Minimizes damage from compromised application (SQL injection, stolen credentials).
❌ Single Point of Control
Wrong: One admin account can initiate, approve, and execute critical operations
Right: Separate accounts for initiate vs approve, require multi-signature
Why: Prevents single compromised account from complete system control.
❌ No Verification of "What If This Fails?"
Wrong: Design controls, assume they work
Right: For each control, ask "how can this fail?" and design backup layers
Why: Controls fail due to bugs, misconfigurations, attacks. Backup layers provide resilience.
Cross-References
Use BEFORE this skill:
- ordis/security-architect/threat-modeling: Identify threats first, then design controls to address them
Use WITH this skill:
- muna/technical-writer/documentation-structure: Document control architecture as ADR
Use AFTER this skill:
- ordis/security-architect/security-architecture-review: Review controls for completeness
Real-World Impact
Well-designed controls using this methodology:
- Multi-layered API authentication catching token forgery even when signature validation was bypassed (algorithm confusion attack)
- Database access controls limiting SQL injection damage to read-only operations (least privilege prevented data deletion)
- File upload defenses stopping malware despite extension check bypass (magic bytes + antivirus + reprocessing layers)
Key lesson: Systematic application of defense-in-depth at trust boundaries is more effective than ad-hoc control selection.