| name | Skill Validator |
| description | Validates that a skill or MCP implementation matches its manifest by running Codex-powered semantic comparisons across descriptions, preconditions, effects, and API surface. |
| version | 0.1.0 |
| category | testing |
| tags | codex, validation, quality |
Skill Validator
Validate that implementations match their manifests, using Codex for semantic comparison
Purpose
Ensures that skill/MCP implementations actually deliver what their manifests promise. Uses Codex to perform semantic analysis comparing descriptions, preconditions, and effects against actual code. Detects drift, missing functionality, and over-promised capabilities.
When to Use
- After updating skill implementations
- During quality audits to verify accuracy
- When manifests feel outdated or incorrect
- To detect description-implementation drift
- Before releasing new versions of resources
Key Capabilities
- Semantic Comparison: Uses Codex to judge whether the code matches its description
- Precondition Validation: Verifies that claimed preconditions are actually checked
- Effect Verification: Confirms the code produces its claimed effects
- API Surface Analysis: Validates that exposed functions match the manifest
- Drift Detection: Identifies when an implementation diverges from its manifest
- Coverage Scoring: Measures how much of the manifest is implemented
Inputs
inputs:
  resource_path: string        # Path to skill/MCP directory
  manifest_path: string        # Path to manifest.yaml (default: resource_path/manifest.yaml)
  implementation_path: string  # Path to code (default: resource_path/index.js)
  strict_mode: boolean         # Fail on warnings (default: false)
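The first three inputs map to positional arguments of the script below, assumed here to be saved as validate.sh (the entry point the CI/CD section invokes). The resource name and the src/main.ts override are illustrative:

# Illustrative invocation; the third argument overrides the default implementation path
bash SKILLS/skill-validator/validate.sh SKILLS/rag-implementer \
  SKILLS/rag-implementer/manifest.yaml \
  SKILLS/rag-implementer/src/main.ts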
Process
Step 1: Load Manifest and Implementation
#!/bin/bash
# Load manifest and implementation
RESOURCE_PATH="$1"
MANIFEST_PATH="${2:-$RESOURCE_PATH/manifest.yaml}"
IMPL_PATH="${3:-$RESOURCE_PATH/index.js}"
if [ ! -f "$MANIFEST_PATH" ]; then
  echo "❌ Manifest not found: $MANIFEST_PATH"
  exit 1
fi
if [ ! -f "$IMPL_PATH" ]; then
  # Try alternative extensions
  if [ -f "$RESOURCE_PATH/index.ts" ]; then
    IMPL_PATH="$RESOURCE_PATH/index.ts"
  elif [ -f "$RESOURCE_PATH/SKILL.md" ]; then
    # Skill might be declarative only
    echo "ℹ️ Declarative skill detected, validating description only"
    IMPL_PATH=""
  else
    echo "⚠️ No implementation file found, validating description only"
    IMPL_PATH=""
  fi
fi
# Read manifest
MANIFEST=$(cat "$MANIFEST_PATH")
# Read implementation (if exists)
if [ -n "$IMPL_PATH" ]; then
  IMPLEMENTATION=$(cat "$IMPL_PATH")
else
  IMPLEMENTATION=""
fi
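Very large implementations can exceed the model's context window. A minimal guard, assuming the first 500 lines carry enough structure to judge:

# Optional guard: truncate oversized implementations before prompting
# (assumption: 500 lines is a workable budget; tune for your model)
if [ -n "$IMPL_PATH" ] && [ "$(wc -l < "$IMPL_PATH")" -gt 500 ]; then
  echo "⚠️ Implementation truncated to 500 lines for analysis"
  IMPLEMENTATION=$(head -n 500 "$IMPL_PATH")
fi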
Step 2: Validate Description Accuracy
# Use Codex to compare description with implementation
codex exec "
Compare this manifest description with the actual implementation:
MANIFEST:
$MANIFEST
IMPLEMENTATION:
$IMPLEMENTATION
Questions:
1. Does the implementation match the description?
2. Are there features described but not implemented?
3. Are there features implemented but not described?
4. Is the description accurate and complete?
Output JSON:
{
  \"description_accurate\": boolean,
  \"missing_features\": [\"feature1\", \"feature2\"],
  \"undocumented_features\": [\"feature3\"],
  \"accuracy_score\": 0.0-1.0,
  \"issues\": [
    {
      \"type\": \"missing_feature\",
      \"severity\": \"high|medium|low\",
      \"description\": \"...\",
      \"suggestion\": \"...\"
    }
  ]
}
" > /tmp/validation-description.json
Step 3: Validate Preconditions
# Check if preconditions are actually enforced in code
PRECONDITIONS=$(python3 -c "
import yaml, json
manifest = yaml.safe_load(open('$MANIFEST_PATH'))
print(json.dumps(manifest.get('preconditions', []), indent=2))
")
codex exec "
Analyze if these preconditions are actually checked in the code:
PRECONDITIONS:
$PRECONDITIONS
IMPLEMENTATION:
$IMPLEMENTATION
For each precondition, determine:
1. Is it checked in the code?
2. Where is it checked (function name, line number)?
3. Does it fail gracefully if not met?
4. Is the error message clear?
Output JSON:
{
  \"preconditions_validated\": [
    {
      \"check\": \"file_exists('package.json')\",
      \"enforced\": boolean,
      \"location\": \"function:line\",
      \"error_handling\": \"good|poor|missing\",
      \"suggestion\": \"...\"
    }
  ],
  \"coverage_score\": 0.0-1.0
}
" > /tmp/validation-preconditions.json
Step 4: Validate Effects
# Check if claimed effects are actually produced
EFFECTS=$(python3 -c "
import yaml, json
manifest = yaml.safe_load(open('$MANIFEST_PATH'))
print(json.dumps(manifest.get('effects', []), indent=2))
")
codex exec "
Verify that this code actually produces the claimed effects:
CLAIMED EFFECTS:
$EFFECTS
IMPLEMENTATION:
$IMPLEMENTATION
For each effect, determine:
1. Is the effect actually produced?
2. Where in the code does it happen?
3. Are there conditions where it might not happen?
4. Are there other effects not listed?
Output JSON:
{
  \"effects_validated\": [
    {
      \"effect\": \"creates_vector_index\",
      \"implemented\": boolean,
      \"location\": \"function:line\",
      \"conditional\": boolean,
      \"confidence\": 0.0-1.0
    }
  ],
  \"missing_effects\": [\"effect1\"],
  \"extra_effects\": [\"effect2\"],
  \"coverage_score\": 0.0-1.0
}
" > /tmp/validation-effects.json
Step 5: Validate API Surface
# For MCPs/tools with defined APIs, validate exports
if [ -n "$IMPLEMENTATION" ]; then
  codex exec "
Analyze the API surface of this implementation:
IMPLEMENTATION:
$IMPLEMENTATION
Questions:
1. What functions/classes are exported?
2. What are their signatures?
3. Are they documented?
4. Do they match what the manifest describes?
Output JSON:
{
  \"exports\": [
    {
      \"name\": \"functionName\",
      \"type\": \"function|class|object\",
      \"signature\": \"(args) => result\",
      \"documented\": boolean
    }
  ],
  \"api_complete\": boolean,
  \"documentation_quality\": \"good|fair|poor\"
}
" > /tmp/validation-api.json
fi
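A cheap static pre-pass gives a ground truth to eyeball the model's answer against, assuming a JavaScript/TypeScript implementation:

# List explicit top-level exports so Codex's reported API surface can be cross-checked
grep -nE '^(export |module\.exports)' "$IMPL_PATH" || echo "ℹ️ No top-level export statements found"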
Step 6: Generate Validation Report
# Combine all validation results
RESOURCE_PATH="$RESOURCE_PATH" python3 <<'PYTHON_SCRIPT'
import json
import os
from datetime import datetime, timezone

# Load validation results
with open('/tmp/validation-description.json') as f:
    desc_validation = json.load(f)
with open('/tmp/validation-preconditions.json') as f:
    precond_validation = json.load(f)
with open('/tmp/validation-effects.json') as f:
    effects_validation = json.load(f)

# API validation only runs when an implementation file exists
try:
    with open('/tmp/validation-api.json') as f:
        api_validation = json.load(f)
except FileNotFoundError:
    api_validation = None

# Overall score is the average of the three component scores
scores = [
    desc_validation.get('accuracy_score', 0),
    precond_validation.get('coverage_score', 0),
    effects_validation.get('coverage_score', 0)
]
overall_score = sum(scores) / len(scores)

# Collect all issues
all_issues = []
all_issues.extend(desc_validation.get('issues', []))
for precond in precond_validation.get('preconditions_validated', []):
    if not precond.get('enforced'):
        all_issues.append({
            'type': 'unenforced_precondition',
            'severity': 'medium',
            'description': f"Precondition not enforced: {precond['check']}",
            'suggestion': precond.get('suggestion', '')
        })
for effect in effects_validation.get('effects_validated', []):
    if not effect.get('implemented'):
        all_issues.append({
            'type': 'unimplemented_effect',
            'severity': 'high',
            'description': f"Effect not implemented: {effect['effect']}",
            'suggestion': 'Implement this effect or remove it from the manifest'
        })

# Generate report (RESOURCE_PATH is passed in via the environment above)
report = {
    'resource': os.environ.get('RESOURCE_PATH', 'unknown'),
    'validated_at': datetime.now(timezone.utc).isoformat().replace('+00:00', 'Z'),
    'overall_score': round(overall_score, 3),
    'scores': {
        'description_accuracy': desc_validation.get('accuracy_score', 0),
        'precondition_coverage': precond_validation.get('coverage_score', 0),
        'effect_coverage': effects_validation.get('coverage_score', 0)
    },
    'validation_results': {
        'description': desc_validation,
        'preconditions': precond_validation,
        'effects': effects_validation,
        'api': api_validation
    },
    'issues': all_issues,
    'issue_count': len(all_issues),
    'passed': overall_score >= 0.8 and not any(i['severity'] == 'high' for i in all_issues)
}

with open('/tmp/validation-report.json', 'w') as f:
    json.dump(report, f, indent=2)

# Print summary
print(f"\nValidation Score: {overall_score:.2f}")
print(f"Issues Found: {len(all_issues)}")
print(f"Status: {'✅ PASSED' if report['passed'] else '❌ FAILED'}")
PYTHON_SCRIPT
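The strict_mode input declared above can be honored with a final gate. A sketch, assuming it arrives as the fourth positional argument:

# Strict mode: fail on any issue, not just high-severity ones
# (assumption: strict_mode is passed as argument 4)
STRICT_MODE="${4:-false}"
ISSUE_COUNT=$(python3 -c "import json; print(json.load(open('/tmp/validation-report.json'))['issue_count'])")
if [ "$STRICT_MODE" = "true" ] && [ "$ISSUE_COUNT" -gt 0 ]; then
  echo "❌ Strict mode: failing on $ISSUE_COUNT issue(s)"
  exit 1
fi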
Validation Criteria
Scoring Rules
// Overall score is the average of the component scores
overallScore = (
  descriptionAccuracy +
  preconditionCoverage +
  effectCoverage
) / 3
// Pass criteria
passed = (
  overallScore >= 0.8 &&
  highSeverityIssues.length === 0
)
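Worked example using the scores from the sample report below, which passes because 0.85 >= 0.8 and no high-severity issues exist:

# (0.90 + 0.80 + 0.85) / 3 = 0.85
python3 -c "print(round((0.90 + 0.80 + 0.85) / 3, 3))"  # 0.85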
Severity Levels
- High: Missing core functionality, unenforced preconditions, unimplemented effects
- Medium: Incomplete features, poor error handling, undocumented exports
- Low: Minor inconsistencies, documentation gaps, style issues
Example Output
{
  "resource": "rag-implementer",
  "validated_at": "2025-10-28T12:00:00Z",
  "overall_score": 0.85,
  "scores": {
    "description_accuracy": 0.90,
    "precondition_coverage": 0.80,
    "effect_coverage": 0.85
  },
  "validation_results": {
    "description": {
      "description_accurate": true,
      "missing_features": [],
      "undocumented_features": ["vector_index_optimization"],
      "accuracy_score": 0.90,
      "issues": [
        {
          "type": "undocumented_feature",
          "severity": "low",
          "description": "Implementation includes vector optimization not mentioned in manifest",
          "suggestion": "Add 'optimizes_vector_queries' to effects"
        }
      ]
    },
    "preconditions": {
      "preconditions_validated": [
        {
          "check": "file_exists('package.json')",
          "enforced": true,
          "location": "validateProject:12",
          "error_handling": "good"
        },
        {
          "check": "env_var_set('OPENAI_API_KEY')",
          "enforced": true,
          "location": "setupEmbeddings:45",
          "error_handling": "good"
        }
      ],
      "coverage_score": 0.80
    },
    "effects": {
      "effects_validated": [
        {
          "effect": "creates_vector_index",
          "implemented": true,
          "location": "createIndex:120",
          "conditional": false,
          "confidence": 0.95
        },
        {
          "effect": "adds_embedding_pipeline",
          "implemented": true,
          "location": "setupPipeline:85",
          "conditional": false,
          "confidence": 0.90
        }
      ],
      "missing_effects": [],
      "extra_effects": ["optimizes_vector_queries"],
      "coverage_score": 0.85
    }
  },
  "issues": [
    {
      "type": "undocumented_feature",
      "severity": "low",
      "description": "Implementation includes vector optimization not mentioned in manifest",
      "suggestion": "Add 'optimizes_vector_queries' to effects"
    }
  ],
  "issue_count": 1,
  "passed": true
}
Integration
With manifest-generator
Validates that generated manifests are accurate by comparing them with the implementation.
With capability-graph-builder
Ensures graph relationships are based on accurate capability descriptions.
CI/CD Pipeline
# .github/workflows/validate-skills.yml
name: Validate Skills
on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Validate all skills
        run: |
          for skill in SKILLS/*/; do
            bash SKILLS/skill-validator/validate.sh "$skill"
          done
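With the runner's default fail-fast shell, the first failing skill stops the loop and masks later failures. A variant run: block that validates everything before exiting nonzero:

# Alternative run: block — report every failing skill, then fail at the end
FAILED=0
for skill in SKILLS/*/; do
  bash SKILLS/skill-validator/validate.sh "$skill" || { echo "❌ $skill failed validation"; FAILED=1; }
done
exit $FAILED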
Success Metrics
- ✅ All skills score >= 0.8
- ✅ No high severity issues in production skills
- ✅ 100% of preconditions enforced
- ✅ 95%+ of effects implemented
- ✅ API surface matches manifest
Related Skills
- manifest-generator: Generates manifests to be validated
- capability-graph-builder: Uses validated manifests
- system-diagnostician: Uses validation results for health checks