| name | adding-new-metric |
| description | Guides systematic implementation of new sustainability metrics in OSS Sustain Guard using the plugin-based metric system. Use when adding metric functions to evaluate project health aspects like issue responsiveness, test coverage, or security response time. |
Add New Metric
This skill provides a systematic workflow for adding new sustainability metrics to the OSS Sustain Guard project using the plugin-based metric system.
When to Use
- User wants to add a new metric to evaluate project health
- Implementing metrics from NEW_METRICS_IDEA.md
- Extending analysis capabilities with additional measurements
- Creating custom external metrics via plugins
Critical Principles
- No Duplication: Always check existing metrics to avoid measuring the same thing
- 10-Point Scale: ALL metrics use max_score=10 for consistency and transparency
- Integer Weights: Metric importance is controlled via profile weights (integers ≥1)
- Project Philosophy: Use "observation" language, not "risk" or "critical"
- CHAOSS Alignment: Reference CHAOSS metrics when applicable
- Plugin Architecture: Metrics are discovered via entry points and MetricSpec
Implementation Workflow
1. Verify No Duplication
# Search for similar metrics in the metrics directory
ls oss_sustain_guard/metrics/
grep -rn "def check_" oss_sustain_guard/metrics/
# Check entry points in pyproject.toml
grep -A 30 '\[project.entry-points."oss_sustain_guard.metrics"\]' pyproject.toml
Check: Does any existing metric measure the same aspect?
2. Create Metric Module
Create a new file in oss_sustain_guard/metrics/:
touch oss_sustain_guard/metrics/my_metric.py
Template:
"""My metric description."""
from typing import Any
from oss_sustain_guard.metrics.base import Metric, MetricContext, MetricSpec
def check_my_metric(repo_data: dict[str, Any]) -> Metric:
"""
Evaluates [metric purpose].
[Description of what this measures and why it matters.]
Scoring:
- [Condition]: X/10 ([Label])
- [Condition]: X/10 ([Label])
CHAOSS Aligned: [CHAOSS metric name] (if applicable)
"""
max_score = 10 # ALWAYS use 10 for all metrics
# Extract data from repo_data
data = repo_data.get("fieldName", {})
if not data:
return Metric(
"My Metric Name",
score_on_no_data,
max_score,
"Note: [Reason for default score].",
"None",
)
# Calculate metric
# ...
# Score logic with graduated thresholds (0-10 scale)
if condition_excellent:
score = 10 # Excellent
risk = "None"
message = f"Excellent: [Details]."
elif condition_good:
score = 8 # Good (80%)
risk = "Low"
message = f"Good: [Details]."
elif condition_moderate:
score = 5 # Moderate (50%)
risk = "Medium"
message = f"Moderate: [Details]."
elif condition_needs_attention:
score = 2 # Needs attention (20%)
risk = "High"
message = f"Observe: [Details]. Consider improving."
else:
score = 0 # Critical issue
risk = "Critical"
message = f"Note: [Details]. Immediate attention recommended."
return Metric("My Metric Name", score, max_score, message, risk)
def _check(repo_data: dict[str, Any], _context: MetricContext) -> Metric:
"""Wrapper for metric spec."""
return check_my_metric(repo_data)
def _on_error(error: Exception) -> Metric:
"""Error handler for metric spec."""
return Metric(
"My Metric Name",
0,
10,
f"Note: Analysis incomplete - {error}",
"Medium",
)
# Export MetricSpec for automatic discovery
METRIC = MetricSpec(
name="My Metric Name",
checker=_check,
on_error=_on_error,
)
Key Decisions:
- max_score: ALWAYS 10 for all metrics (consistency)
- Score range: 0-10 (integers or decimals)
- Importance: Controlled by profile weights (integers ≥1)
- Risk levels: "None", "Low", "Medium", "High", "Critical"
- Use supportive language: "Observe", "Consider", "Monitor" rather than "Failed" or "Error"
3. Register Entry Point
Add to pyproject.toml under [project.entry-points."oss_sustain_guard.metrics"]:
[project.entry-points."oss_sustain_guard.metrics"]
# ... existing entries ...
my_metric = "oss_sustain_guard.metrics.my_metric:METRIC"
4. Add to Built-in Registry
Update oss_sustain_guard/metrics/__init__.py:
_BUILTIN_MODULES = [
# ... existing modules ...
"oss_sustain_guard.metrics.my_metric",
]
Why both entry points and built-in registry?
- Entry points: Enable external plugins
- Built-in registry: Fallback for direct imports and faster loading
5. Update ANALYSIS_VERSION
CRITICAL: Before integrating your new metric, increment ANALYSIS_VERSION in cli.py.
# In cli.py, update the version
ANALYSIS_VERSION = "1.2" # Increment from previous version
Why this is required:
- New metrics change the total score calculation
- Old cached data won't include your new metric
- Without version increment, users get inconsistent scores (cache vs. real-time)
- Version mismatch automatically invalidates old cache entries (a sketch of this check follows the list below)
Always increment when:
- Adding/removing metrics
- Changing metric weights in profiles
- Modifying scoring thresholds
- Changing max_score values
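As a rough illustration, version-based invalidation boils down to comparing the stored version against the current ANALYSIS_VERSION; the helper names and cache fields below are hypothetical, not the project's actual cache API.
def _is_cache_entry_current(entry: dict, current_version: str) -> bool:
    """Hypothetical check: only reuse results produced by the same ANALYSIS_VERSION."""
    return entry.get("analysis_version") == current_version

# Hypothetical usage:
# cached = load_cached_result(repo_key)
# if cached is None or not _is_cache_entry_current(cached, ANALYSIS_VERSION):
#     result = run_fresh_analysis(repo_key)  # re-analyze and overwrite the stale entry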
6. Add Metric to Scoring Profiles
Update SCORING_PROFILES in core.py to include your new metric:
SCORING_PROFILES = {
"balanced": {
"name": "Balanced",
"description": "...",
"weights": {
# Existing metrics...
"Contributor Redundancy": 3,
"Security Signals": 2,
# Add your new metric
"My Metric Name": 2, # Assign appropriate weight (1+)
# ...
},
},
# Update all 4 profiles...
}
Weight Guidelines:
- Critical metrics: 3-5 (bus factor, security)
- Important metrics: 2-3 (activity, responsiveness)
- Supporting metrics: 1-2 (documentation, governance)
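To see how weights interact with the 10-point scale, here is an illustrative weighted-average aggregation; the authoritative formula lives in core.py, so treat this as a sketch of the general idea rather than the exact implementation.
def weighted_total(metrics: list[tuple[float, int]]) -> float:
    """Illustrative aggregation of (score, weight) pairs into a 0-100 total.

    Assumes every metric uses max_score=10 and weights come from the active profile.
    """
    earned = sum(score * weight for score, weight in metrics)
    possible = sum(10 * weight for _, weight in metrics)
    return round(earned / possible * 100, 1) if possible else 0.0

# Example: scores 10, 8, 5 with weights 3, 2, 1 -> (30 + 16 + 5) / 60 * 100 = 85.0
print(weighted_total([(10, 3), (8, 2), (5, 1)]))  # 85.0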
7. Test Implementation
# Create test file
touch tests/metrics/test_my_metric.py
# Write tests (see section below)
# Run tests
uv run pytest tests/metrics/test_my_metric.py -v
# Syntax check
python -m py_compile oss_sustain_guard/metrics/my_metric.py
# Run analysis on test project
uv run os4g check fastapi --insecure --no-cache -o detail
# Verify metric appears in output
# Check score is reasonable
# Run all tests
uv run pytest tests/ -x --tb=short
# Lint check
uv run ruff check oss_sustain_guard/metrics/my_metric.py
uv run ruff format oss_sustain_guard/metrics/my_metric.py
8. Write Comprehensive Tests
Create tests/metrics/test_my_metric.py:
"""Tests for my_metric module."""
from oss_sustain_guard.metrics.my_metric import check_my_metric
def test_check_my_metric_excellent():
"""Test metric with excellent conditions."""
mock_data = {"fieldName": {"value": 100}}
result = check_my_metric(mock_data)
assert result.score == 10
assert result.max_score == 10
assert result.risk == "None"
assert "Excellent" in result.message
def test_check_my_metric_good():
"""Test metric with good conditions."""
mock_data = {"fieldName": {"value": 80}}
result = check_my_metric(mock_data)
assert result.score == 8
assert result.max_score == 10
assert result.risk == "Low"
def test_check_my_metric_no_data():
"""Test metric with missing data."""
mock_data = {}
result = check_my_metric(mock_data)
assert result.max_score == 10
assert "Note:" in result.message
9. Update Documentation (if needed)
Consider updating:
- docs/local/NEW_METRICS_IDEA.md: mark the metric as implemented
- README.md: update the metric count
- docs/SCORING_PROFILES_GUIDE.md: if the new metric is significant
Plugin Architecture Details
MetricSpec Structure
class MetricSpec(NamedTuple):
"""Specification for a metric check."""
name: str # Metric display name
checker: Callable[[dict[str, Any], MetricContext], Metric | None] # Main logic
on_error: Callable[[Exception], Metric] | None = None # Error handler
error_log: str | None = None # Error log format
MetricContext
Context provided to metric checkers:
class MetricContext(NamedTuple):
"""Context provided to metric checks."""
owner: str # GitHub owner
name: str # Repository name
repo_url: str # Full GitHub URL
platform: str | None # Platform (e.g., "pypi", "npm")
package_name: str | None # Original package name
Metric Discovery Flow
- Built-in loading: _load_builtin_metric_specs() imports from _BUILTIN_MODULES
- Entry point loading: _load_entrypoint_metric_specs() discovers plugins via importlib.metadata
- Deduplication: built-in metrics take precedence over external metrics with the same name
- Integration: load_metric_specs() returns the combined list to core.py
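The entry-point half of this flow follows the standard importlib.metadata pattern. A simplified sketch (the real loader also handles load failures and logging) looks roughly like this:
from importlib import metadata

def _load_external_specs(builtin_names: set[str]) -> list:
    """Sketch: discover MetricSpec objects registered by installed plugins."""
    specs = []
    for entry_point in metadata.entry_points(group="oss_sustain_guard.metrics"):
        spec = entry_point.load()  # resolves e.g. "my_custom_metric:METRIC"
        if spec.name in builtin_names:
            continue  # built-in metrics win over same-named externals
        specs.append(spec)
    return specs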
External Plugin Example
For external plugins (separate packages):
my_custom_metric/pyproject.toml:
[project]
name = "my-custom-metric"
version = "0.1.0"
dependencies = ["oss-sustain-guard>=0.13.0"]
[project.entry-points."oss_sustain_guard.metrics"]
my_custom = "my_custom_metric:METRIC"
my_custom_metric/__init__.py:
from oss_sustain_guard.metrics.base import Metric, MetricContext, MetricSpec
def check_custom(repo_data, context):
return Metric("Custom Metric", 10, 10, "Custom logic", "None")
METRIC = MetricSpec(name="Custom Metric", checker=check_custom)
Installation:
pip install my-custom-metric
Metrics are automatically discovered and loaded!
Common Calculation Patterns
Time-Based Calculations
from datetime import datetime
# Parse ISO 8601 timestamps (e.g., createdAt/closedAt strings from the GraphQL payload)
created_at = datetime.fromisoformat(created_str.replace("Z", "+00:00"))
completed_at = datetime.fromisoformat(completed_str.replace("Z", "+00:00"))
duration_days = (completed_at - created_at).total_seconds() / 86400
Ratio/Percentage Metrics
ratio = (count_a / total) * 100 if total else 0.0  # guard against division by zero
# Use graduated scoring rather than a binary pass/fail
if ratio < 15:
    score = max_score  # Excellent
elif ratio < 30:
    score = max_score * 0.6  # Acceptable
elif ratio < 50:
    score = max_score * 0.3  # Needs attention
else:
    score = 0  # Observation: high ratio
Median Calculations
# Manual median (equivalent to statistics.median(values))
values.sort()
median = (
    values[len(values) // 2]
    if len(values) % 2 == 1
    else (values[len(values) // 2 - 1] + values[len(values) // 2]) / 2
)
GraphQL Data Access
# Common paths in repo_data
issues = repo_data.get("issues", {}).get("edges", [])
prs = repo_data.get("pullRequests", {}).get("edges", [])
commits = repo_data.get("defaultBranchRef", {}).get("target", {}).get("history", {})
funding = repo_data.get("fundingLinks", [])
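GitHub's GraphQL connections nest each record under a "node" key, so traversal typically looks like the snippet below. The createdAt/closedAt fields are standard GitHub GraphQL issue fields, but whether they appear in repo_data depends on the query the collector runs, so verify against the actual payload.
from datetime import datetime

# Illustrative traversal: collect closed-issue durations in days
closed_durations = []
for edge in repo_data.get("issues", {}).get("edges", []):
    node = edge.get("node", {})
    created, closed = node.get("createdAt"), node.get("closedAt")
    if created and closed:
        opened_at = datetime.fromisoformat(created.replace("Z", "+00:00"))
        closed_at = datetime.fromisoformat(closed.replace("Z", "+00:00"))
        closed_durations.append((closed_at - opened_at).total_seconds() / 86400)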
Weight Budget Guidelines
| Importance | Profile Weight | Use Case |
|---|---|---|
| Critical | 3-5 | Core sustainability (Bus Factor, Activity) |
| High | 2-3 | Important health signals (Funding, Retention) |
| Medium | 1-2 | Supporting metrics (CI, Community Health) |
| Low | 1 | Supplementary observations |
Every metric keeps max_score=10; the weighted total is normalized to 100 points across ~20-25 metrics.
Validation Checklist
- ANALYSIS_VERSION incremented in cli.py
- No duplicate measurement with existing metrics
- max_score is 10 and a weight is assigned in every scoring profile
- Uses supportive "observation" language
- Has graduated scoring (not binary)
- Handles missing data gracefully
- Error handling in integration
- Syntax check passes
- Real-world test shows metric in output
- Unit tests pass
- Lint checks pass
Example: Stale Issue Ratio
For a complete, production-ready implementation example, see examples/stale-issue-ratio.md.
Quick overview:
- Measures: Percentage of issues not updated in 90+ days
- Max Score: 5 points
- Scoring: <15% stale (5pts), 15-30% (3pts), 30-50% (2pts), >50% (1pt)
- Key patterns: Time-based calculation, graduated scoring, graceful error handling
- Real results: fastapi (8.2% stale, 5/5), requests (23.4%, 3/5)
Score Validation with Real Projects
After implementing a new metric, validate scoring behavior with diverse real-world projects.
Validation Script
Create scripts/validate_scoring.py:
#!/usr/bin/env python3
"""
Score validation script for testing new metrics against diverse projects.
Usage:
uv run python scripts/validate_scoring.py
"""
import subprocess
import json
from typing import Any
VALIDATION_PROJECTS = {
"Famous/Mature": {
"requests": "psf/requests",
"react": "facebook/react",
"kubernetes": "kubernetes/kubernetes",
"django": "django/django",
"fastapi": "fastapi/fastapi",
},
"Popular/Active": {
"angular": "angular/angular",
"numpy": "numpy/numpy",
"pandas": "pandas-dev/pandas",
},
"Emerging/Small": {
# Add smaller projects you want to test
},
}
def analyze_project(owner: str, repo: str) -> dict[str, Any]:
"""Run analysis on a project and return results."""
cmd = [
"uv", "run", "os4g", "check",
f"{owner}/{repo}",
"--insecure", "--no-cache", "-o", "json"
]
result = subprocess.run(cmd, capture_output=True, text=True)
if result.returncode != 0:
return {"error": result.stderr}
# Parse JSON output
try:
return json.loads(result.stdout)
except json.JSONDecodeError:
return {"error": "Failed to parse JSON output"}
def main():
print("=" * 80)
print("OSS Sustain Guard - Score Validation Report")
print("=" * 80)
print()
for category, projects in VALIDATION_PROJECTS.items():
print(f"\n## {category}\n")
print(f"{'Project':<25} {'Score':<10} {'Status':<15} {'Key Observations'}")
print("-" * 80)
for name, repo_path in projects.items():
result = analyze_project(*repo_path.split("/"))
if "error" in result:
print(f"{name:<25} {'ERROR':<10} {result['error'][:40]}")
continue
score = result.get("total_score", 0)
status = "✓ Healthy" if score >= 80 else "⚠ Monitor" if score >= 60 else "⚡ Needs attention"
observations = result.get("key_observations", "N/A")[:40]
print(f"{name:<25} {score:<10} {status:<15} {observations}")
print("\n" + "=" * 80)
print("\nValidation complete. Review scores for:")
print(" - Famous projects should score 70-95")
print(" - New metrics should show reasonable distribution")
print(" - No project should score >100")
if __name__ == "__main__":
main()
Quick Validation Command
# Test specific famous projects
uv run os4g check requests react fastapi kubernetes --insecure --no-cache
# Compare before/after metric changes
uv run os4g check requests --insecure --no-cache -o detail > before.txt
# ... make changes ...
uv run os4g check requests --insecure --no-cache -o detail > after.txt
diff before.txt after.txt
Expected Score Ranges
| Category | Expected Score | Examples |
|---|---|---|
| Famous/Mature | 75-95 | requests, kubernetes, react |
| Popular/Active | 65-85 | angular, numpy, pandas |
| Emerging/Small | 45-70 | New projects with activity |
| Problematic | 20-50 | Abandoned or struggling projects |
Validation Checklist
After implementing a new metric:
- Test on 3-5 famous projects (requests, react, kubernetes, etc.)
- Verify scores remain within 0-100
- Check that famous projects score reasonably high (70+)
- Ensure new metric contributes meaningfully to total score
- Review that metric differentiates well between projects
- Confirm no single metric dominates the total score
Troubleshooting
Score calculation issues: Verify all metrics have max_score=10 and check profile weights
Metric not appearing: Check integration in _analyze_repository_data()
Tests fail: Update expected metric names in test files
Data not available: Add proper null checks and default handling
Scores too similar across projects: Adjust scoring thresholds for better differentiation
Famous project scores low: Review metric logic and thresholds