---
name: code-review-agent
description: Comprehensive security and quality code review agent that checks for OWASP vulnerabilities, GDPR compliance, accessibility standards, and code quality issues.
---
# Code Review Agent

## Purpose

The Code Review Agent performs comprehensive automated code review, analyzing implementations for:

- **Security Vulnerabilities** - OWASP Top 10 compliance
- **Code Quality** - Anti-patterns, optimization opportunities
- **GDPR Compliance** - Data privacy, consent management, user rights
- **Accessibility** - WCAG 2.1 AA standards compliance
## When to Use This Skill

Invoke the code review agent:

- **After Development Stage** - Review Developer A and Developer B implementations
- **Before Arbitration** - Ensure both solutions meet quality/security standards
- **Before Production Deployment** - Final security and compliance check
- **On-Demand Reviews** - Security audit of an existing codebase
## Responsibilities

### 1. Security Analysis (OWASP Top 10)

Detects all OWASP Top 10 (2021) vulnerabilities:

- **A01 - Broken Access Control** - Authorization bypasses, IDOR vulnerabilities
- **A02 - Cryptographic Failures** - Weak encryption, hardcoded secrets
- **A03 - Injection** - SQL, command, XSS, template injection
- **A04 - Insecure Design** - Missing security controls, threat modeling gaps
- **A05 - Security Misconfiguration** - Default credentials, unnecessary features
- **A06 - Vulnerable Components** - Outdated dependencies, known CVEs
- **A07 - Authentication Failures** - Weak passwords, poor session management
- **A08 - Integrity Failures** - Unsigned updates, insecure deserialization
- **A09 - Logging Failures** - Missing audit logs, insufficient monitoring
- **A10 - SSRF** - Unvalidated URL requests
**Example Detection:**

```python
# CRITICAL - SQL Injection detected
# File: database.py:45
cursor.execute(f"SELECT * FROM users WHERE id={user_id}")

# Recommendation: Use parameterized queries
cursor.execute("SELECT * FROM users WHERE id=?", (user_id,))
```
### 2. Code Quality Review

Identifies anti-patterns and optimization issues:

**Anti-Patterns:**

- God Objects (too many responsibilities)
- Spaghetti Code (tangled control flow)
- Magic Numbers/Strings
- Duplicate Code (DRY violations)
- Long Methods (>50 lines)
- Deep Nesting (>3 levels)
- Tight Coupling

**Optimization Issues:**

- Inefficient algorithms (O(n²) where O(n) is possible)
- N+1 database query problems
- Missing caching
- Memory leaks (unclosed resources)
- Blocking I/O in async contexts
**Example Detection:**

```python
# MEDIUM - God Object detected
# File: user_manager.py:1
class UserManager:  # 850 lines, 45 methods
    # Handles: auth, validation, email, logging, billing
    # Recommendation: Split into UserService, AuthService,
    # EmailService, BillingService per SRP
    ...
```
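An N+1 finding from the optimization category follows the same pattern. The sketch below is purely illustrative: `db.query` and the schema are hypothetical placeholders, not part of the agent's output or any real API:

```python
# MEDIUM - N+1 query pattern (illustrative sketch; db.query is hypothetical)
# One round trip per user: N+1 queries for N users
for user in users:
    orders = db.query("SELECT * FROM orders WHERE user_id=?", (user.id,))

# Recommendation: fetch all rows in a single query, then group in memory
placeholders = ",".join("?" * len(users))
rows = db.query(
    f"SELECT * FROM orders WHERE user_id IN ({placeholders})",
    tuple(u.id for u in users),
)
```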
### 3. GDPR Compliance

Ensures data privacy and regulatory compliance:

**Required Implementations:**

- ✅ Data minimization (Article 5)
- ✅ Consent management (Articles 6, 7)
- ✅ Right to access (Article 15)
- ✅ Right to erasure (Article 17)
- ✅ Data portability (Article 20)
- ✅ Privacy by design (Article 25)
- ✅ Breach notification (Articles 33, 34)
- ✅ Data Processing Agreements (Article 28)
**Example Detection:**

```python
# HIGH - GDPR Article 17 violation
# File: user_service.py:120
def delete_user(user_id):
    # TODO: implement
    pass

# Recommendation: Implement complete data deletion across
# all tables, logs, and backups. Confirm deletion to the user.
```
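A compliant implementation might look like the following minimal sketch. The table names, the `db` handle, and the `audit_log` helper are assumptions for illustration, not part of the agent or any specific schema:

```python
# Minimal sketch of an Article 17-compliant deletion
# (db, table names, and audit_log are hypothetical)
def delete_user(user_id):
    # Remove personal data from every table that references the user.
    # Table names come from a fixed tuple, so the f-string is safe here.
    for table in ("users", "profiles", "sessions", "consents"):
        db.execute(f"DELETE FROM {table} WHERE user_id=?", (user_id,))
    # Keep a non-identifying audit record of the erasure itself
    audit_log("user_erased", subject=user_id)
    db.commit()
```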
### 4. Accessibility (WCAG 2.1 AA)

Validates compliance with WCAG 2.1 Level AA:

**Perceivable:**

- Text alternatives (alt attributes)
- Captions for media
- Semantic HTML structure
- Color contrast ≥4.5:1
- Resizable text (up to 200%)

**Operable:**

- Keyboard accessible
- No keyboard traps
- Adjustable timing
- No flashing content
- Skip navigation links

**Understandable:**

- Language specified (lang attribute)
- Predictable navigation
- Input error identification
- Form labels

**Robust:**

- Valid HTML
- ARIA roles/properties
- Status messages
**Example Detection:**

```html
<!-- MEDIUM - WCAG 1.1.1 violation -->
<!-- File: dashboard.html:34 -->
<img src="chart.png">

<!-- Recommendation: Add descriptive alt text -->
<img src="chart.png" alt="Monthly revenue chart showing 15% growth">
```
## Review Process

### 1. Input

- Developer implementation directory (`/tmp/developer-a/` or `/tmp/developer-b/`)
- Task context (title, description)
- ADR for architectural decisions
### 2. Analysis

Uses LLM APIs (OpenAI/Anthropic) to:

- Parse all implementation files (.py, .js, .html, .css, etc.)
- Analyze them against OWASP, GDPR, and WCAG standards
- Detect code quality issues and anti-patterns
- Generate categorized findings with severity levels (see the sketch after this list)
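One plausible shape for this flow, as a minimal sketch assuming the official OpenAI Python client; the prompt wording, file filter, and `build_prompt` helper are illustrative assumptions, not the agent's actual internals:

```python
# Illustrative sketch of the analysis step (not the agent's real internals)
from pathlib import Path
from openai import OpenAI  # assumes the official OpenAI Python client

REVIEWABLE = {".py", ".js", ".html", ".css"}

def build_prompt(implementation_dir: str) -> str:
    # Concatenate reviewable files into one prompt body
    parts = []
    for path in Path(implementation_dir).rglob("*"):
        if path.suffix in REVIEWABLE:
            parts.append(f"--- {path} ---\n{path.read_text()}")
    return (
        "Review the following code against OWASP Top 10, GDPR, and "
        "WCAG 2.1 AA. Return categorized findings as JSON.\n\n"
        + "\n\n".join(parts)
    )

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": build_prompt("/tmp/developer-a/")}],
)
print(response.choices[0].message.content)
```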
### 3. Output

**JSON Report:**

```json
{
  "review_summary": {
    "overall_status": "PASS|NEEDS_IMPROVEMENT|FAIL",
    "total_issues": 15,
    "critical_issues": 0,
    "high_issues": 3,
    "medium_issues": 8,
    "low_issues": 4,
    "score": {
      "code_quality": 85,
      "security": 75,
      "gdpr_compliance": 90,
      "accessibility": 80,
      "overall": 82
    }
  },
  "issues": [
    {
      "category": "SECURITY",
      "subcategory": "A03:2021 - SQL Injection",
      "severity": "HIGH",
      "file": "database.py",
      "line": 45,
      "description": "...",
      "recommendation": "...",
      "owasp_reference": "..."
    }
  ]
}
```
**Markdown Summary:**

- Overall assessment
- Category scores
- Critical/High issues in detail
- Positive findings
- Actionable recommendations
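Downstream stages can gate on the JSON report above. A minimal sketch, assuming the report is written as `review_report.json` in the output directory (the filename is an assumption):

```python
# Sketch: gate a pipeline stage on the review report
# (the review_report.json filename is an assumption)
import json

with open("/tmp/code-reviews/review_report.json") as f:
    report = json.load(f)

summary = report["review_summary"]
if summary["overall_status"] == "FAIL" or summary["critical_issues"] > 0:
    raise SystemExit("Blocking: implementation failed code review")
```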
## Severity Levels

| Severity | Criteria | Examples |
|---|---|---|
| CRITICAL | Security breach risk, GDPR fine risk (up to €20M), accessibility blocker | SQL injection, exposed secrets, missing data deletion |
| HIGH | Significant vulnerability, major compliance gap | Weak encryption, missing consent, inaccessible forms |
| MEDIUM | Code quality issue, minor security concern | God objects, missing CSRF tokens, low contrast |
| LOW | Style/convention, optimization opportunity | Magic numbers, inefficient algorithms |
## Decision Criteria

**PASS** (Implementation acceptable):

- 0 critical issues
- ≤2 high issues
- Overall score ≥80

**NEEDS_IMPROVEMENT** (Can proceed with warnings):

- 0 critical issues
- ≤5 high issues
- Overall score ≥60

**FAIL** (Must fix before proceeding):

- Any critical issues
- >5 high issues
- Overall score <60
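These thresholds translate directly into code. A sketch (the function name is illustrative, not part of the agent's API):

```python
# Sketch: decision criteria as a function (name is illustrative)
def review_status(critical: int, high: int, overall_score: int) -> str:
    # Any FAIL condition is blocking
    if critical > 0 or high > 5 or overall_score < 60:
        return "FAIL"
    # Remaining cases have 0 critical, <=5 high, score >= 60
    if high <= 2 and overall_score >= 80:
        return "PASS"
    return "NEEDS_IMPROVEMENT"
```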
## Integration with Pipeline

### Placement in Pipeline

```text
Development Stage (Developer A + B)
        ↓
📋 Code Review Agent  ← NEW STAGE
        ↓
Validation (TDD checks)
        ↓
Arbitration (Select winner)
        ↓
Integration
```
### Communication

**Receives:**

- Implementation files from developers
- Task context from the orchestrator
- ADR from the architecture agent

**Sends:**

- Review report to the orchestrator
- Issues list to the validation agent
- Pass/Fail status to the arbitration agent
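The exact message format between agents is not specified here; one plausible shape, as a hedged sketch (field names are assumptions, not a documented contract):

```python
# Sketch: one plausible shape for the review agent's outgoing message
# (field names are assumptions, not a documented contract)
from dataclasses import dataclass, field

@dataclass
class ReviewMessage:
    developer: str       # "developer-a" or "developer-b"
    status: str          # PASS | NEEDS_IMPROVEMENT | FAIL
    overall_score: int   # 0-100
    critical_issues: int
    issues: list = field(default_factory=list)  # raw findings for validation
```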
## Usage Examples

### Standalone Usage

```bash
python3 code_review_agent.py \
  --developer developer-a \
  --implementation-dir /tmp/developer-a/ \
  --output-dir /tmp/code-reviews/ \
  --task-title "User Authentication" \
  --task-description "Implement JWT-based auth"
```
### Programmatic Usage

```python
from code_review_agent import CodeReviewAgent

agent = CodeReviewAgent(
    developer_name="developer-a",
    llm_provider="openai"
)

result = agent.review_implementation(
    implementation_dir="/tmp/developer-a/",
    task_title="User Authentication",
    task_description="Implement JWT auth with bcrypt",
    output_dir="/tmp/code-reviews/"
)

print(f"Status: {result['review_status']}")
print(f"Score: {result['overall_score']}/100")
print(f"Critical Issues: {result['critical_issues']}")
```
### Pipeline Integration

```python
# In the pipeline orchestrator
from code_review_agent import CodeReviewAgent

# Review Developer A
review_agent_a = CodeReviewAgent(developer_name="developer-a")
review_a = review_agent_a.review_implementation(
    implementation_dir="/tmp/developer-a/",
    task_title=task_title,
    task_description=task_description,
    output_dir="/tmp/code-reviews/"
)

# Review Developer B
review_agent_b = CodeReviewAgent(developer_name="developer-b")
review_b = review_agent_b.review_implementation(
    implementation_dir="/tmp/developer-b/",
    task_title=task_title,
    task_description=task_description,
    output_dir="/tmp/code-reviews/"
)

# Use reviews in arbitration
if review_a['critical_issues'] > 0:
    # Disqualify Developer A
    winner = "developer-b"
```
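The arbitration stub above handles only one direction. A symmetric version, as a sketch (the score-based tie-break is an assumed policy, not documented arbitration behavior):

```python
# Sketch: symmetric arbitration (score tie-break is an assumed policy)
a_out = review_a['critical_issues'] > 0
b_out = review_b['critical_issues'] > 0

if a_out and not b_out:
    winner = "developer-b"
elif b_out and not a_out:
    winner = "developer-a"
elif not a_out and not b_out:
    # Both clean: prefer the higher overall review score
    if review_a['overall_score'] >= review_b['overall_score']:
        winner = "developer-a"
    else:
        winner = "developer-b"
else:
    winner = None  # both disqualified; escalate to a human
```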
## Configuration

### Environment Variables

```bash
# LLM Provider (default: openai)
ARTEMIS_LLM_PROVIDER=openai

# API Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# Optional: specific model
ARTEMIS_LLM_MODEL=gpt-4o
```
### Supported Models

**OpenAI:**

- gpt-4o (default)
- gpt-4o-mini
- gpt-4-turbo

**Anthropic:**

- claude-sonnet-4-5-20250929 (default)
- claude-3-5-sonnet-20241022
## Cost Considerations

Typical review costs per implementation:

- Prompt tokens: 3,000-5,000 (code + prompt)
- Completion tokens: 2,000-4,000 (review JSON)
- Total: 5,000-9,000 tokens
### Estimated Costs

| Model | Cost per Review | Recommended Use |
|---|---|---|
| GPT-4o | $0.05-$0.12 | Production reviews |
| GPT-4o-mini | $0.005-$0.01 | Development/testing |
| Claude Sonnet 4.5 | $0.10-$0.20 | Thorough security audits |
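As a sanity check on the GPT-4o row, a back-of-envelope calculation from the token estimates above. The per-token prices are assumptions based on published list prices and may have changed, which is why the result lands near the lower end of the table's range:

```python
# Back-of-envelope cost check for GPT-4o (prices are assumptions;
# check current provider pricing before relying on these numbers)
PROMPT_PRICE = 2.50 / 1_000_000       # $ per prompt token (assumed)
COMPLETION_PRICE = 10.00 / 1_000_000  # $ per completion token (assumed)

low = 3_000 * PROMPT_PRICE + 2_000 * COMPLETION_PRICE   # ≈ $0.028
high = 5_000 * PROMPT_PRICE + 4_000 * COMPLETION_PRICE  # ≈ $0.053
print(f"GPT-4o review: ${low:.3f}-${high:.3f}")
```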
## Best Practices

- **Review Early** - Catch issues before arbitration
- **Review Both Developers** - Ensures a fair comparison
- **Monitor Critical Issues** - Auto-reject implementations with critical issues
- **Track Metrics** - Monitor security score trends over time
- **Use in CI/CD** - Run automated reviews on every commit
- **Combine with Static Analysis** - Complement with Bandit, ESLint, or SonarQube (see the sketch after this list)
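For example, Bandit's findings can be collected alongside the LLM review before computing the final status. A minimal sketch, assuming Bandit is installed; how the two sets of findings are merged is left to the caller:

```python
# Sketch: run Bandit alongside the LLM review (assumes Bandit is installed)
import json
import subprocess

# Bandit exits non-zero when it finds issues, so don't use check=True
result = subprocess.run(
    ["bandit", "-r", "/tmp/developer-a/", "-f", "json"],
    capture_output=True, text=True,
)
bandit_report = json.loads(result.stdout)

# Bandit's JSON output includes a "results" list of findings
print(f"Bandit findings: {len(bandit_report['results'])}")
```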
## Limitations

- **Static Analysis Only** - Cannot detect runtime vulnerabilities
- **No Execution** - Cannot find logic errors that only surface at runtime
- **Language Coverage** - Best for Python, JavaScript, HTML/CSS
- **LLM Dependent** - Quality depends on the underlying LLM's capabilities
- **False Positives** - May flag intentional design decisions
## Future Enhancements

- **Custom Rule Sets** - Industry-specific compliance (HIPAA, PCI-DSS)
- **Severity Tuning** - Configurable severity thresholds
- **Auto-Fix Suggestions** - Generate code patches
- **Diff-Based Review** - Review only changed files
- **Integration Tests** - Security-focused integration testing
- **Vulnerability Database** - Check dependencies against CVE databases
**Version:** 1.0.0
**Maintained By:** Artemis Pipeline Team
**Last Updated:** October 22, 2025