| name | forensic-knowledge-mapping |
| description | Use when assessing team resilience, planning for developer departures, calculating bus/truck factor, identifying knowledge silos, or evaluating organizational risk - maps code ownership from git history and identifies single points of failure using research-backed thresholds (>80% ownership = silo) |
Forensic Knowledge Mapping
🎯 When You Use This Skill
State explicitly: "Using forensic-knowledge-mapping pattern"
Then follow these steps:
- Calculate ownership percentages using git commit history
- Identify single-owner files (>80% commits from one person)
- Calculate truck factor (minimum developers to lose before crisis)
- Cite research benchmarks (>9 contributors = 2-3x defects)
- Cross-reference with hotspots (single owner + hotspot = critical risk)
- Provide mitigation strategies for knowledge concentration
Overview
Knowledge mapping analyzes git history to understand who knows what in your codebase. It identifies knowledge silos, calculates the "truck factor" (the minimum number of developers who could leave before the project stalls), and highlights critical single-owner files that create organizational risk.
Core principle: Code ownership concentration creates risk. Files with single owners, especially critical files, represent organizational fragility that can be measured and mitigated.
When to Use
- Assessing team resilience and bus/truck factor
- Planning for developer departures (resignation, leave, promotion)
- Onboarding new team members (where to focus training)
- Identifying knowledge transfer priorities
- Team reorganization planning
- M&A due diligence (technical risk assessment)
- Quarterly team health checks
When NOT to Use
- Very small teams (2-3 developers) - some concentration is inevitable
- Greenfield projects (<6 months old) - patterns are not yet established
- When ownership is concentrated by design (designated experts) - it may be intentional
- Non-critical codebases where losing the knowledge doesn't matter
Core Pattern
⚡ TRUCK FACTOR CALCULATION (USE THIS)
This is the standard formula for calculating organizational risk:
Truck Factor = Minimum number of developers whose loss would
orphan >50% of essential files
Where:
"Orphan" = no remaining knowledgeable developer (<1)
"Essential" = Critical files (combine with hotspot analysis)
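To make the definition concrete, here is a minimal sketch that applies it by brute force to a small, made-up ownership map. The function name `truck_factor_exact`, the sample data, and the choice to treat anyone with more than 10% of a file's commits as "knowledgeable" are assumptions for illustration, not part of the formula above.

```python
from itertools import combinations


def truck_factor_exact(ownership, knowledge_pct=10.0, orphan_share=0.5):
    """Smallest set of developers whose loss orphans > orphan_share of files.

    ownership: {file: {author: percent_of_commits}}
    A file is orphaned when every developer holding more than
    knowledge_pct of its commits has been removed (assumed threshold).
    """
    # Developers who count as knowledgeable for each file
    knowers = {
        path: {author for author, pct in owners.items() if pct > knowledge_pct}
        for path, owners in ownership.items()
    }
    developers = sorted(set().union(*knowers.values())) if knowers else []

    # Try every group of developers, smallest groups first
    for size in range(1, len(developers) + 1):
        for removed in combinations(developers, size):
            gone = set(removed)
            orphaned = sum(1 for known_by in knowers.values() if known_by <= gone)
            if orphaned > orphan_share * len(knowers):
                return size, gone
    return None, set()


# Hypothetical essential files and their commit shares
essential = {
    "auth.py": {"alice": 95.0, "bob": 5.0},
    "payments.py": {"alice": 85.0, "carol": 15.0},
    "api.py": {"bob": 60.0, "carol": 40.0},
    "ui.py": {"carol": 100.0},
}

factor, who = truck_factor_exact(essential)
print(f"Truck factor: {factor} (losing {sorted(who)})")
```

The search is exponential in the number of developers, so it only suits small teams or a pre-filtered list of essential files; larger repositories need a greedy estimate instead.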
Risk Classification (APPLY THESE THRESHOLDS)
| Ownership | Contributors | Risk Level | Action | Cite When |
|---|---|---|---|---|
| Single owner | 1 person >80% commits | 🔴 CRITICAL | Immediate pair/cross-train | "This file has >80% single-owner concentration" |
| Dominant owner | 2 people (60/40 split) | 🟡 HIGH | Add backup knowledge | "Needs backup owner for resilience" |
| Shared | 3+ balanced | 🟢 HEALTHY | Maintain | "Good knowledge distribution" |
| Too diffuse | >9 contributors | 🟡 COORDINATION | May need splitting | "Google: >9 contributors = 2-3x bugs" |
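The table can be turned into a small classifier. The sketch below is one possible reading of it: the function name `classify_ownership`, the order in which the rules are checked, and treating a top share of 60% or more (or fewer than three contributors) as "dominant owner" are all assumptions, since the table does not say how overlapping cases should be resolved.

```python
def classify_ownership(commit_counts):
    """Map {author: commits} for one file to a risk level from the table."""
    total = sum(commit_counts.values())
    if total == 0:
        raise ValueError("file has no commits in the analysis window")

    top_share = max(commit_counts.values()) * 100 / total
    contributors = len(commit_counts)

    if top_share > 80:
        return "CRITICAL"      # single owner
    if contributors > 9:
        return "COORDINATION"  # too diffuse
    if top_share >= 60 or contributors < 3:
        return "HIGH"          # dominant owner (assumed cut-off)
    return "HEALTHY"           # shared


# Hypothetical commit counts per author
print(classify_ownership({"alice": 9, "bob": 1}))
print(classify_ownership({"alice": 6, "bob": 4}))
print(classify_ownership({"alice": 4, "bob": 3, "carol": 3}))
```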
Research Benchmarks (CITE THESE)
Always reference these thresholds when analyzing ownership:
- Truck Factor < 5: High organizational risk for most teams
- >80% ownership: Defined as knowledge silo requiring action
- >9 contributors: 2-3x higher defect rate (Google research on coordination overhead)
- Critical file + single owner: Maximum risk - cite both factors together
Always cite when identifying these patterns to stakeholders.
Quick Reference
Essential Git Commands
| Purpose | Command |
|---|---|
| All contributors | git log --since="12 months ago" --format="%an" | sort | uniq |
| Contributions by author | git log --since="12 months ago" --format="%an" | sort | uniq -c | sort -rn |
| File ownership | git log --since="12 months ago" --follow --format="%an" -- FILE | sort | uniq -c | sort -rn |
| Current file owners | git blame --line-porcelain FILE | grep "^author " | sort | uniq -c | sort -rn |
| Last modified | git log -1 --format="%an %ad" --date=short -- FILE |
Ownership Calculation
Primary Owner: Author with >50% of commits to file
Secondary Owner: Author with >20% of commits
Backup Knowledge: Author with >10% of commits
Ownership %: (author_commits / total_commits) × 100
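As a rough illustration of these tiers, the sketch below takes one author name per commit (the shape of output produced by `git log --format="%an" -- FILE`) and assigns each author a percentage and a role. The function name `ownership_roles` and the extra "minor" label for authors at or below 10% are assumptions added for the example.

```python
from collections import Counter


def ownership_roles(authors):
    """Return {author: (ownership_pct, role)} from one author name per commit."""
    counts = Counter(authors)
    total = sum(counts.values())
    roles = {}
    for author, commits in counts.most_common():
        pct = commits * 100 / total
        if pct > 50:
            role = "primary"
        elif pct > 20:
            role = "secondary"
        elif pct > 10:
            role = "backup"
        else:
            role = "minor"  # assumed label, not defined in the tiers above
        roles[author] = (round(pct, 1), role)
    return roles


# Hypothetical history: 20 commits to one file
history = ["alice"] * 11 + ["bob"] * 5 + ["carol"] * 3 + ["dave"]
for author, (pct, role) in ownership_roles(history).items():
    print(f"{author}: {pct}% ({role})")
```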
Implementation
Basic Knowledge Map
#!/bin/bash
# Generate code ownership map

TIME_PERIOD="12 months ago"
OUTPUT_FILE="knowledge-map.txt"

echo "CODE OWNERSHIP ANALYSIS" > "$OUTPUT_FILE"
echo "======================" >> "$OUTPUT_FILE"
echo "" >> "$OUTPUT_FILE"

# 1. Get all active contributors
# (the count is field 1; the rest of the line is the name, which may contain spaces)
echo "Active Contributors:" >> "$OUTPUT_FILE"
git log --since="$TIME_PERIOD" --format="%an" | \
    sort | uniq -c | sort -rn | \
    awk '{count=$1; $1=""; sub(/^ +/, ""); printf "  %s: %d commits\n", $0, count}' >> "$OUTPUT_FILE"
echo "" >> "$OUTPUT_FILE"

# 2. Identify files and calculate ownership
echo "Analyzing file ownership..." >&2
git log --since="$TIME_PERIOD" --name-only --format="" | \
    sort -u | \
    while IFS= read -r file; do
        if [ -f "$file" ]; then
            # Commit counts per author for this file, highest first
            contributors=$(git log --since="$TIME_PERIOD" --follow --format="%an" -- "$file" | sort | uniq -c | sort -rn)
            total_commits=$(echo "$contributors" | awk '{sum+=$1} END {print sum+0}')

            if [ "$total_commits" -ge 5 ]; then  # Minimum 5 commits for meaningful analysis
                # Calculate ownership percentage of the top contributor
                primary_commits=$(echo "$contributors" | head -1 | awk '{print $1}')
                primary_owner=$(echo "$contributors" | head -1 | sed -E 's/^ *[0-9]+ //')
                primary_pct=$(echo "scale=1; $primary_commits * 100 / $total_commits" | bc)

                # Classify risk
                if [ "$(echo "$primary_pct > 80" | bc)" -eq 1 ]; then
                    risk="🔴 CRITICAL"
                elif [ "$(echo "$primary_pct > 60" | bc)" -eq 1 ]; then
                    risk="🟡 HIGH"
                else
                    risk="🟢 SHARED"
                fi

                echo "$risk $file ($primary_owner: ${primary_pct}%)" >> "$OUTPUT_FILE"
            fi
        fi
    done

# 3. Truck factor input (simplified)
echo "" >> "$OUTPUT_FILE"
echo "TRUCK FACTOR ANALYSIS" >> "$OUTPUT_FILE"
echo "=====================" >> "$OUTPUT_FILE"

# Count files owned (>80%) by each developer, using the CRITICAL lines from step 2.
# This lists the knowledge holders only; the Python script below estimates the truck factor itself.
owned=$(grep -E '^[^ ]+ CRITICAL ' "$OUTPUT_FILE" | \
    sed -E 's/.*\((.*): [0-9.]+%\)$/\1/' | \
    sort | uniq -c | sort -rn | \
    awk '{count=$1; $1=""; sub(/^ +/, ""); printf "  %s: %d single-owner files\n", $0, count}')
if [ -n "$owned" ]; then
    echo "$owned" >> "$OUTPUT_FILE"
else
    echo "  (no single-owner files found)" >> "$OUTPUT_FILE"
fi

echo "" >> "$OUTPUT_FILE"
echo "Analysis complete. See $OUTPUT_FILE for results."
Truck Factor Analysis
#!/usr/bin/env python3
"""
Calculate truck factor - minimum developers whose loss would orphan critical files
"""

import subprocess
from collections import defaultdict


def get_file_ownership(time_period="12 months ago", min_commits=5):
    """Get ownership map for all files: {path: {author: percent_of_commits}}"""
    ownership = {}

    # Get all files with activity (argument lists avoid shell quoting problems)
    files = subprocess.check_output(
        ["git", "log", f"--since={time_period}", "--name-only", "--format="],
        text=True,
    )
    files = [f for f in set(files.splitlines()) if f]

    for file_path in files:
        try:
            # Get contributors for this file, one author name per commit
            contributors = subprocess.check_output(
                ["git", "log", f"--since={time_period}", "--follow",
                 "--format=%an", "--", file_path],
                text=True,
            )
            contrib_counts = defaultdict(int)
            for author in contributors.splitlines():
                if author:
                    contrib_counts[author] += 1

            total = sum(contrib_counts.values())
            if total >= min_commits:
                # Calculate ownership percentages
                ownership[file_path] = {
                    author: (count / total * 100)
                    for author, count in contrib_counts.items()
                }
        except subprocess.CalledProcessError:
            continue

    return ownership


def calculate_truck_factor(ownership, critical_threshold=50):
    """Estimate the minimum developers whose loss would orphan >50% of files.

    Greedy approximation: only primary owners (>50% ownership) are considered,
    and developers are removed in order of how many files they own.
    """
    # Identify primary owners (>50% ownership)
    primary_owners = defaultdict(list)
    for file_path, owners in ownership.items():
        for author, pct in owners.items():
            if pct > 50:
                primary_owners[author].append(file_path)
                break  # Only count primary owner

    # Sort developers by number of files owned
    sorted_devs = sorted(primary_owners.items(), key=lambda x: len(x[1]), reverse=True)

    # Calculate truck factor: minimum devs to cover >50% of files.
    # If the threshold is never crossed, this returns the number of primary owners.
    total_files = len(ownership)
    files_covered = set()
    truck_factor = 0
    for author, files in sorted_devs:
        truck_factor += 1
        files_covered.update(files)
        coverage_pct = len(files_covered) / total_files * 100
        if coverage_pct > critical_threshold:
            break

    return truck_factor, sorted_devs


if __name__ == "__main__":
    print("Analyzing code ownership...")
    ownership = get_file_ownership()
    print(f"\nTotal files analyzed: {len(ownership)}")

    truck_factor, dev_ownership = calculate_truck_factor(ownership)
    print(f"\nTRUCK FACTOR: {truck_factor}")

    print("\nTop Knowledge Holders:")
    for author, files in dev_ownership[:10]:
        print(f"  {author}: {len(files)} files owned")

    # Identify critical risks
    print("\n🔴 CRITICAL RISK FILES (single owner):")
    for file_path, owners in ownership.items():
        primary = max(owners.items(), key=lambda x: x[1])
        if primary[1] > 80:
            print(f"  {file_path}: {primary[0]} ({primary[1]:.1f}%)")
Common Mistakes
Mistake 1: Confusing authorship with current ownership
Problem: Using git blame which shows who last modified each line, not who actively maintains it.
# ❌ BAD: Shows current line authors, not active maintainers
git blame file.js
# ✅ GOOD: Shows commit activity over relevant time period
git log --since="12 months ago" --format="%an" -- file.js | sort | uniq -c
Fix: Always use commit history over a time period (12 months), not line-by-line blame.
Mistake 2: Not cross-referencing with hotspots
Problem: Identifying single-owner files without checking if they're also risky.
Fix: Always suggest: "Let's cross-reference with hotspot analysis (forensic-hotspot-finder). Single owner + hotspot = CRITICAL risk."
Mistake 3: Treating all single ownership as bad
Problem: Flagging every file with one owner without context.
Fix: Single ownership of stable, non-critical files is fine. Focus on critical + single owner combination. State this explicitly.
Mistake 4: Not citing the research
Problem: Saying "too many contributors" without backing it up.
Fix: Always cite: "Google research shows files with >9 contributors have 2-3x higher defect rates due to coordination overhead."
Mistake 5: Forgetting mitigation strategies
Problem: Identifying risks without providing solutions.
Fix: Always include specific mitigation strategies from the "Risk Mitigation Strategies" section. Don't just identify problems.
Real-World Impact
Example: Tech Lead Departure
Context: Tech lead gave 2 months notice, 12-person team
Analysis:
- Owned 23 files (>80% ownership each)
- 8 of these were critical (auth, payments, core API)
- Secondary knowledge existed for only 3 files
- Risk: High probability of project slowdown
Action Taken:
- Immediate pairing on critical 8 files (4 weeks)
- Documentation sprint for complex areas
- Knowledge transfer sessions (2/week for 8 weeks)
- Promoted internal candidate with 20% ownership overlap
Outcome: Transition completed with minimal disruption, truck factor improved from 2 to 4.
Example: Open Source Project Sustainability
Context: Popular OSS project, maintainer burnout concern
Analysis:
- Truck factor: 2 (70% of commits from 2 maintainers)
- 15 critical files with single maintainer
- Onboarding difficulty: 3-6 months to contribution
- Coordination bottleneck: Core module touched by all features
Actions:
- Modularized core to reduce coupling
- Created "good first issues" pathway
- Documented architecture decisions
- Identified and mentored 3 backup maintainers
Outcome: Truck factor increased to 5, new contributor onboarding reduced to 1-2 months.
⚡ After Running Knowledge Mapping (DO THIS)
Immediately suggest these next steps to the user:
1. Cross-reference with hotspots (use forensic-hotspot-finder)
   - CRITICAL RISK = single owner + hotspot file
   - This is the most dangerous combination
   - Prioritize these for immediate knowledge transfer
2. Calculate business cost (use forensic-debt-quantification)
   - Add risk premium for single-owner files
   - Calculate onboarding cost if developer leaves
   - Show ROI for knowledge transfer efforts
3. Check if ownership is worsening (use forensic-complexity-trends)
   - Are files becoming more concentrated?
   - Track quarterly to measure improvement
4. Find coupled ownership (use forensic-change-coupling)
   - Files that change together should have overlapping owners
   - Identify coupling that creates dependencies
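The hotspot cross-reference among the steps above can be sketched as a simple intersection. This example assumes the hotspot analysis yields a plain set of file paths; the actual output format of forensic-hotspot-finder is not specified here, and the function name `critical_risks` and the sample data are hypothetical.

```python
def critical_risks(ownership, hotspots, silo_pct=80.0):
    """Return (file, owner, pct) for hotspot files whose top owner exceeds silo_pct.

    ownership: {file: {author: percent_of_commits}}
    hotspots:  iterable of file paths flagged by a hotspot analysis (assumed format)
    """
    risks = []
    for path in sorted(hotspots):
        owners = ownership.get(path)
        if not owners:
            continue  # hotspot without ownership data in the analysis window
        owner, pct = max(owners.items(), key=lambda item: item[1])
        if pct > silo_pct:
            risks.append((path, owner, pct))
    # Most concentrated files first
    return sorted(risks, key=lambda risk: risk[2], reverse=True)


# Hypothetical inputs
ownership = {
    "auth.py": {"alice": 92.0, "bob": 8.0},
    "api.py": {"bob": 55.0, "carol": 45.0},
    "billing.py": {"alice": 85.0, "carol": 15.0},
    "README.md": {"dave": 100.0},
}
hotspots = {"auth.py", "api.py", "billing.py"}

for path, owner, pct in critical_risks(ownership, hotspots):
    print(f"CRITICAL: {path} ({owner}: {pct:.1f}%)")
```

Note that `README.md` is a single-owner file but is not reported, because it is not a hotspot; this matches the guidance that single ownership of stable, non-critical files is acceptable.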
Example: Complete Knowledge Mapping Workflow
"Using forensic-knowledge-mapping pattern, I've analyzed code ownership:
TRUCK FACTOR: 2 (HIGH RISK for 12-person team)
Critical Findings:
- Alice owns 8 files at >80% (including 3 hotspots) - CRITICAL RISK
- Bob owns 6 files at >80% (including 2 hotspots) - HIGH RISK
- Losing Alice + Bob would orphan 14 critical files
Research shows >80% ownership concentration is a knowledge silo requiring action.
RECOMMENDED NEXT STEPS:
1. Cross-reference with hotspots (forensic-hotspot-finder) - Which are risky?
2. Calculate departure cost (forensic-debt-quantification) - What's at stake?
3. Create knowledge transfer plan for Alice's 3 critical hotspots
Would you like me to identify which of Alice's files are also hotspots?"
Always cross-reference with hotspots - that's where the real organizational risk lies.
Risk Mitigation Strategies
For High-Risk Files (Critical + Single Owner)
Immediate (1-2 weeks):
- Schedule pairing sessions (4-8 hours)
- Document non-obvious decisions
- Add inline comments for complex sections
- Create README for the module
Short-term (1-3 months):
- Rotate code reviews to spread knowledge
- Have secondary owner make next feature change
- Refactor if complexity is the barrier
Long-term:
- Simplify architecture if possible
- Foster collective ownership culture
- Track ownership metrics quarterly
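Quarterly tracking can be as simple as comparing two ownership snapshots. The sketch below assumes each snapshot is a `{file: {author: percent_of_commits}}` map saved from an earlier run; the function name `concentration_changes` and the 5-point change threshold are assumptions chosen for illustration.

```python
def concentration_changes(previous, current, min_delta=5.0):
    """List files whose top-owner share moved by at least min_delta points."""
    changes = []
    for path in sorted(set(previous) & set(current)):
        before = max(previous[path].values())
        after = max(current[path].values())
        delta = after - before
        if abs(delta) >= min_delta:
            trend = "worsening" if delta > 0 else "improving"
            changes.append((path, before, after, trend))
    return changes


# Hypothetical snapshots from two consecutive quarters
q1 = {
    "auth.py": {"alice": 70.0, "bob": 30.0},
    "api.py": {"bob": 90.0, "carol": 10.0},
    "ui.py": {"carol": 50.0, "dave": 50.0},
}
q2 = {
    "auth.py": {"alice": 88.0, "bob": 12.0},
    "api.py": {"bob": 65.0, "carol": 35.0},
    "ui.py": {"carol": 52.0, "dave": 48.0},
}

for path, before, after, trend in concentration_changes(q1, q2):
    print(f"{path}: {before:.0f}% -> {after:.0f}% ({trend})")
```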
For Team Resilience
Goal: Truck Factor ≥ 5 for teams >10 people
Strategies:
- Pair programming rotations
- Knowledge sharing sessions
- Documentation culture
- Avoid "expert zones" (everyone touches everything)
- Rotate on-call responsibilities
Supporting Files
For automated tracking and visualization:
forensic-knowledge-mapping/
├── SKILL.md (this file)
├── ownership-analyzer.py (complete Python script)
├── truck-factor-calculator.sh (bash implementation)
└── visualization/ (optional: ownership matrix charts)
Related Patterns
- Collective Code Ownership: XP practice of shared responsibility
- Knowledge Transfer: Explicit handoff processes
- Documentation as Code: ADRs, inline comments, READMEs
- Pair Programming: Built-in knowledge sharing
- Mob Programming: Maximum knowledge distribution