name	nwb_validation
version	0.1.0
description	Validates NWB files using NWB Inspector and provides intelligent correction recommendations
category	validation
author	Agentic Neurodata Conversion Team
tags	validation, nwb, nwb-inspector, quality-assurance, dandi-compliance, issue-detection
dependencies	[object Object]
input_format	*.nwb files
output_format	ValidationResult with issues, summary, and corrections
estimated_duration	30 seconds - 2 minutes (depends on file size)
implementation	backend/src/agents/evaluation_agent.py

Skill: NWB Validation

Purpose

Validate NWB (Neurodata Without Borders) files against the official specification and DANDI archive requirements using NWB Inspector. Provide detailed issue reports with intelligent prioritization and correction recommendations. This skill ensures converted files meet quality standards and are ready for sharing/archival.

When to Use This Skill

✅ USE When:

NWB conversion has completed
User requests validation of existing NWB file
Reconversion has completed (need to re-validate)
Quality assessment is needed before DANDI upload
Checking if file meets NWB specification requirements

❌ DON'T USE When:

File is not in NWB format (.nwb extension)
Conversion hasn't started yet
User wants data conversion (not validation)
Format detection is needed (use nwb_conversion skill)

Core Responsibilities

1. NWB Inspector Integration

Runs official NWB validation tool:

Checks compliance with NWB schema
Validates required fields
Detects specification violations
Checks DANDI archive requirements
Identifies best practice violations

Issue Severities (from NWB Inspector):

CRITICAL: Spec violations, file unreadable
ERROR: Required fields missing, serious problems
WARNING: Best practices not followed, potential issues
INFO: Suggestions for improvement

2. Intelligent Issue Analysis

LLM-powered features (when available):

a) Issue Prioritization:

Ranks issues by importance
Groups related issues
Identifies which to fix first
Separates critical from optional

b) User-Friendly Explanations:

Translates technical issues to plain language
Provides context ("why this matters")
Gives specific examples
Links to documentation

c) Correction Recommendations:

Suggests specific fixes
Categorizes: auto-fixable vs needs-user-input
Provides example values
Estimates effort required

3. Validation Status Determination

Three possible outcomes:

PASSED (Green):

No critical, error, or warning issues
File fully compliant with NWB spec
Ready for DANDI upload
Action: Mark complete, celebrate

PASSED_WITH_ISSUES (Yellow):

File is valid but has warnings/info issues
Usable but not perfect
May not meet DANDI standards
Action: Offer to fix or accept as-is

FAILED (Red):

Has critical or error severity issues
File may be unusable or incomplete
Cannot upload to DANDI
Action: Enter correction loop, must fix

4. Correction Context Generation

Prepares data for correction loop:

Extracts fixable issues
Categorizes auto-fix vs needs-input
Generates suggested corrections
Provides structured data for conversation agent

MCP Message Handlers

1. `run_validation`

Purpose: Validate an NWB file and return results

Input:

{
  "target_agent": "evaluation",
  "action": "run_validation",
  "context": {
    "nwb_path": "/tmp/outputs/session.nwb"
  }
}

Workflow:

Check if file exists and is readable
Run NWB Inspector on file
Parse Inspector output
Determine overall status (PASSED/PASSED_WITH_ISSUES/FAILED)
Use LLM to prioritize and explain issues (if available)
Generate correction recommendations (if needed)
Create validation report
Update GlobalState with results
Return structured response

Output (PASSED):

{
  "success": true,
  "result": {
    "overall_status": "PASSED",
    "validation_status": "PASSED",
    "is_valid": true,
    "summary": {
      "total_issues": 0,
      "critical": 0,
      "error": 0,
      "warning": 0,
      "info": 0
    },
    "issues": [],
    "message": "Validation passed! File is fully compliant with NWB specification."
  }
}

Output (PASSED_WITH_ISSUES):

{
  "success": true,
  "result": {
    "overall_status": "PASSED_WITH_ISSUES",
    "is_valid": true,
    "summary": {
      "total_issues": 3,
      "critical": 0,
      "error": 0,
      "warning": 2,
      "info": 1
    },
    "issues": [
      {
        "severity": "WARNING",
        "message": "session_start_time is recommended but missing",
        "location": "NWBFile.session_start_time",
        "check_function": "check_session_start_time"
      },
      {
        "severity": "WARNING",
        "message": "Subject.age uses non-standard format",
        "location": "Subject.age",
        "check_function": "check_age_format"
      },
      {
        "severity": "INFO",
        "message": "Consider adding keywords for discoverability",
        "location": "NWBFile.keywords",
        "check_function": "check_keywords"
      }
    ],
    "prioritized_issues": [
      {
        "issue": {...},
        "priority": "high",
        "explanation": "Session start time is required by DANDI...",
        "correction_suggestion": "Add session_start_time in ISO 8601 format: '2024-03-15T14:30:00'",
        "auto_fixable": false
      }
    ],
    "message": "Validation passed with 3 minor issues. You can fix them or proceed as-is."
  }
}

Output (FAILED):

{
  "success": true,
  "result": {
    "overall_status": "FAILED",
    "is_valid": false,
    "summary": {
      "total_issues": 5,
      "critical": 1,
      "error": 2,
      "warning": 2,
      "info": 0
    },
    "issues": [
      {
        "severity": "CRITICAL",
        "message": "NWBFile.session_description is required but missing",
        "location": "NWBFile.session_description",
        "check_function": "check_required_field"
      },
      {
        "severity": "ERROR",
        "message": "Subject.species must be in binomial nomenclature",
        "location": "Subject.species",
        "check_function": "check_species_format"
      },
      {...}
    ],
    "correction_context": {
      "auto_fixable_count": 1,
      "needs_user_input_count": 4,
      "suggested_corrections": {
        "species": {
          "current_value": "mouse",
          "suggested_value": "Mus musculus",
          "reasoning": "DANDI requires binomial nomenclature"
        }
      },
      "required_user_input": [
        {
          "field": "session_description",
          "prompt": "Please describe what was recorded in this session",
          "example": "Recording of V1 neurons during visual stimulation"
        }
      ]
    },
    "message": "Validation failed with 5 issues. I can help you fix them."
  }
}

2. `analyze_corrections`

Purpose: Generate correction recommendations for failed validation

Input:

{
  "target_agent": "evaluation",
  "action": "analyze_corrections",
  "context": {
    "issues": [...],
    "current_metadata": {...}
  }
}

Output:

{
  "success": true,
  "result": {
    "correction_context": {
      "auto_fixable_count": 2,
      "needs_user_input_count": 3,
      "auto_fixes": {
        "species": "Mus musculus",
        "age": "P90D"
      },
      "user_input_required": [
        {
          "field": "session_description",
          "prompt": "Describe what was recorded",
          "example": "Recording of V1 neurons",
          "why_needed": "Required by NWB spec"
        }
      ],
      "priority_order": ["session_description", "species", "age", ...],
      "estimated_fix_time": "5-10 minutes"
    }
  }
}

Validation Workflow

Step-by-Step Process

1. File Check (1-2 seconds)

- Verify file exists
- Check .nwb extension
- Verify file is readable
- Check file size (warn if >5GB, slow validation)

2. NWB Inspector Execution (30 sec - 2 min)

- Run: nwbinspector inspect {nwb_path}
- Capture JSON output
- Parse validation results
- Extract issues with severities

3. Issue Categorization (1-2 seconds)

- Count by severity: critical, error, warning, info
- Determine overall status:
  - PASSED: 0 critical/error/warning
  - PASSED_WITH_ISSUES: 0 critical/error, >0 warning/info
  - FAILED: >0 critical/error

4. LLM Enhancement (2-5 seconds, if available)

- Prioritize issues by importance
- Generate user-friendly explanations
- Suggest specific corrections
- Identify auto-fixable issues

5. State Update (instant)

- Set state.validation_status
- Set state.overall_status
- Store state.validation_report
- Update state.previous_issues (for no-progress detection)

6. Return Results (instant)

- Package all data into MCPResponse
- Include correction context if failed
- Provide actionable message

Issue Analysis

Auto-Fixable Issues

Type 1: Format Standardization

Issue: "species is 'mouse', should be binomial nomenclature"
Auto-Fix: species = "Mus musculus"
Reasoning: Common name → scientific name mapping

Type 2: Format Conversion

Issue: "age is '90 days', should be ISO 8601 duration"
Auto-Fix: age = "P90D"
Reasoning: Parse duration and convert to ISO 8601

Type 3: Default Values

Issue: "sex is missing"
Auto-Fix: sex = "U" (Unknown)
Reasoning: Use standard unknown value when not provided

Needs User Input

Type 1: Required Descriptive Fields

Issue: "session_description is required but missing"
Cannot Auto-Fix: Requires domain knowledge
Ask User: "Please describe what was recorded in this session"
Example: "Recording of V1 neurons during visual stimulation"

Type 2: Experiment-Specific Values

Issue: "session_start_time is missing"
Cannot Auto-Fix: Only user knows when experiment occurred
Ask User: "When did this recording session start?"
Example: "2024-03-15T14:30:00-05:00"

Type 3: Ambiguous Corrections

Issue: "experimenter format is unclear"
Current: "Smith"
Cannot Auto-Fix: Could be "Dr. Smith", "J. Smith", "Jane Smith"
Ask User: "What's the full name of the experimenter?"

LLM-Enhanced Features

Feature 1: Issue Prioritization

Input: List of 10 issues (mixed severities)

LLM Prompt:

You are an NWB validation expert. Prioritize these issues by:
1. Severity (critical > error > warning > info)
2. DANDI requirements (required > recommended)
3. Fix difficulty (quick > complex)
4. Impact on usability (high > low)

Return ranked list with reasoning.

Output:

[
  {
    "issue": "session_description missing",
    "priority": "critical",
    "rank": 1,
    "reasoning": "Required by NWB spec, blocks DANDI upload, easy to fix"
  },
  {
    "issue": "species format incorrect",
    "priority": "high",
    "rank": 2,
    "reasoning": "DANDI requirement, auto-fixable"
  },
  ...
]

Feature 2: User-Friendly Explanations

Technical Issue:

"check_subject_species failed: species 'mouse' does not match binomial nomenclature pattern"

LLM Translation:

"The species needs to be in scientific format (like 'Mus musculus' instead of 'mouse').
This is required by DANDI so other researchers can precisely identify the organism.

Quick fix: I can change 'mouse' to 'Mus musculus' for you."

Feature 3: Correction Suggestions

LLM Prompt:

Given this validation issue and current metadata, suggest a specific correction:

Issue: Subject.age uses non-standard format
Current value: "90 days"
Expected format: ISO 8601 duration (e.g., "P90D")

What should the corrected value be?

LLM Response:

{
  "corrected_value": "P90D",
  "reasoning": "Converted '90 days' to ISO 8601 duration format",
  "confidence": 0.95,
  "auto_fixable": true
}

ValidationResult Structure

class ValidationResult(BaseModel):
    is_valid: bool
    # True if no CRITICAL/ERROR, False otherwise

    overall_status: str
    # "PASSED" | "PASSED_WITH_ISSUES" | "FAILED"

    summary: Dict[str, int]
    # {"total_issues": 5, "critical": 1, "error": 2, "warning": 1, "info": 1}

    issues: List[Issue]
    # Full list of issues from NWB Inspector

    prioritized_issues: Optional[List[PrioritizedIssue]]
    # LLM-enhanced issues with explanations (if LLM available)

    correction_context: Optional[CorrectionContext]
    # Auto-fixes and user input requirements (if FAILED)

class Issue(BaseModel):
    severity: str  # CRITICAL | ERROR | WARNING | INFO
    message: str  # Technical description
    location: str  # NWBFile.session_description
    check_function: str  # check_required_field

class PrioritizedIssue(BaseModel):
    issue: Issue
    priority: str  # critical | high | medium | low
    rank: int  # 1, 2, 3, ...
    explanation: str  # User-friendly description
    correction_suggestion: str  # Specific fix
    auto_fixable: bool  # Can system fix it?
    estimated_effort: str  # "30 seconds", "2 minutes"

class CorrectionContext(BaseModel):
    auto_fixable_count: int
    needs_user_input_count: int
    auto_fixes: Dict[str, Any]  # field → corrected_value
    user_input_required: List[UserInputRequest]
    priority_order: List[str]  # Ordered list of fields to fix
    estimated_fix_time: str

Integration with Other Skills

Called By:

Orchestration Skill: After conversion completes
Orchestration Skill: After reconversion (correction loop)
API Endpoints: Direct validation requests

Calls:

None: This is a leaf skill (doesn't call other skills)

Provides Data To:

Orchestration Skill: Validation results, correction context
Report Service: Detailed validation reports
GlobalState: Updates validation_status and validation_report

State Management

GlobalState Fields Used:

Write:

validation_status: RUNNING → PASSED / PASSED_IMPROVED / (none for FAILED)
overall_status: "PASSED" / "PASSED_WITH_ISSUES" / "FAILED"
validation_report: Full ValidationResult object
previous_issues: Copy of issues (for no-progress detection)
logs: Detailed activity log

Read:

correction_attempt: To determine if this is PASSED_IMPROVED
output_path: NWB file to validate

Validation Status vs Overall Status

Important Distinction:

validation_status (ValidationStatus enum):
- State machine status: RUNNING, PASSED, PASSED_IMPROVED
- Used internally to track workflow
- Only set when validation completes successfully
- NOT set for FAILED (leaves in RUNNING, awaits correction)
overall_status (string):
- Outcome category: "PASSED", "PASSED_WITH_ISSUES", "FAILED"
- Used for user-facing messages and decision-making
- Always set after validation runs
- Determines next workflow step

Example:

# After validation runs:
if no_issues:
    state.overall_status = "PASSED"
    if correction_attempt > 0:
        state.validation_status = ValidationStatus.PASSED_IMPROVED
    else:
        state.validation_status = ValidationStatus.PASSED

elif has_warnings_only:
    state.overall_status = "PASSED_WITH_ISSUES"
    # Don't set validation_status yet (await user decision)

else:  # has errors/critical
    state.overall_status = "FAILED"
    # Don't set validation_status yet (await correction)

Performance Characteristics

File Size	Validation Time	Issues Detected (typical)
100 MB	30-45 seconds	0-5
1 GB	1-2 minutes	0-10
10 GB	5-10 minutes	0-15
50 GB	20-40 minutes	0-20

Notes:

NWB Inspector is I/O bound (reads entire file)
LLM analysis adds 2-5 seconds (negligible)
Large files (>10GB) may timeout if validation limit not increased
Issue count independent of file size (depends on metadata quality)

Common Validation Issues

Issue 1: Missing Required Fields

Example:

CRITICAL: NWBFile.session_description is required but missing

Fix: Provide session description User Action: "Describe what was recorded"

Issue 2: Species Format

Example:

ERROR: Subject.species 'mouse' does not match binomial nomenclature

Fix: species = "Mus musculus" (auto-fixable) User Action: None (system fixes)

Issue 3: Age Format

Example:

WARNING: Subject.age '90 days' should use ISO 8601 duration

Fix: age = "P90D" (auto-fixable) User Action: None (system fixes)

Issue 4: Session Start Time

Example:

WARNING: session_start_time is recommended but missing

Fix: Needs user input User Action: "When did the experiment start?"

Issue 5: Sex Value

Example:

INFO: Subject.sex is missing

Fix: sex = "U" for Unknown (auto-fixable) User Action: None or provide if known

Testing Checklist

Validates files with no issues (PASSED)
Validates files with warnings only (PASSED_WITH_ISSUES)
Validates files with errors (FAILED)
Handles missing file gracefully
Handles corrupt/unreadable files
LLM prioritization works when available
LLM explanations are helpful
Auto-fix suggestions are correct
Correction context generated properly
State.validation_status set correctly
State.overall_status set correctly
PASSED_IMPROVED set after successful retry
previous_issues stored for no-progress detection
Validation report complete and accurate

Security Considerations

File Access:

Read: NWB files in /tmp/outputs/
No Write: Validation is read-only, never modifies files
No Network: NWB Inspector runs locally

Data Privacy:

Validation results may contain file metadata
Don't log sensitive experiment details
Sanitize file paths in public logs

Resource Limits:

Set timeout for large files (default: 10 minutes)
Limit memory usage (NWB Inspector can be memory-intensive)
Monitor CPU during validation

Version History

v0.1.0 (2025-10-17)

Initial skill documentation
NWB Inspector integration
LLM-powered issue prioritization and explanation
Correction recommendation generation
Three-tier status system (PASSED/PASSED_WITH_ISSUES/FAILED)
Auto-fix vs needs-input categorization
PASSED_IMPROVED status for successful corrections

Planned Features (v0.2.0):

Custom validation rules (beyond NWB Inspector)
Validation caching (skip re-validation if file unchanged)
Incremental validation (only check what changed)
Validation quality score (0-100)
Best practice recommendations (beyond spec compliance)

References

Last Updated: 2025-10-17 Maintainer: Agentic Neurodata Conversion Team License: MIT Support: GitHub Issues

Install Skill

SKILL.md

Skill: NWB Validation

Purpose

When to Use This Skill

✅ USE When:

❌ DON'T USE When:

Core Responsibilities

1. NWB Inspector Integration

2. Intelligent Issue Analysis

3. Validation Status Determination

4. Correction Context Generation

MCP Message Handlers

1. run_validation

2. analyze_corrections

Validation Workflow

Step-by-Step Process

Issue Analysis

Auto-Fixable Issues

Needs User Input

LLM-Enhanced Features

Feature 1: Issue Prioritization

Feature 2: User-Friendly Explanations

Feature 3: Correction Suggestions

ValidationResult Structure

Integration with Other Skills

Called By:

Calls:

Provides Data To:

State Management

GlobalState Fields Used:

Validation Status vs Overall Status

Performance Characteristics

Common Validation Issues

Issue 1: Missing Required Fields

Issue 2: Species Format

Issue 3: Age Format

Issue 4: Session Start Time

Issue 5: Sex Value

Testing Checklist

Security Considerations

File Access:

Data Privacy:

Resource Limits:

Version History

v0.1.0 (2025-10-17)

Planned Features (v0.2.0):

References

1. `run_validation`

2. `analyze_corrections`