| name | html-structure-validate |
| description | Validate HTML5 structure and basic syntax. BLOCKING quality gate - stops pipeline if validation fails. Ensures deterministic output quality. |
HTML Structure Validate Skill
Purpose
This skill is a BLOCKING quality gate that ensures generated HTML meets minimum structural requirements. It is the first deterministic validation of probabilistic AI-generated output.
The skill checks:
- HTML5 compliance - Proper DOCTYPE, tags
- Tag closure - All tags properly closed
- Required elements - Meta tags, stylesheet links
- Well-formedness - Valid structure
If validation fails, the pipeline STOPS and triggers a hook to notify the user.
This enforces the principle: Python validates, ensuring deterministic quality.
What to Do
Load HTML file to validate
- Read 04_page_XX.html generated by AI skill
- Verify file exists and is readable
- Confirm file is text (not binary)
Run validation checks
- Check HTML5 structure compliance
- Verify tag closure
- Validate head section
- Check required CSS link
- Validate page container structure
Generate validation report
- Document all checks performed
- List any errors found
- Note warnings (non-blocking)
- Record informational findings
Save validation report as JSON
- Save to: output/chapter_XX/page_artifacts/page_YY/06_validation_structure.json
- Include timestamp
- Include all check results
Exit with appropriate code
- Return 0 if VALID (continue pipeline)
- Return 1 if INVALID (STOP pipeline, trigger hook)
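The first step above (load the file and confirm it is readable text) could look roughly like the sketch below. This is not the code of validate_html.py; the function name `load_html_text` and the NUL-byte heuristic for detecting binary content are assumptions made for illustration.

```python
from pathlib import Path


def load_html_text(html_file: str) -> str:
    """Return the file's text, or raise ValueError if it cannot be validated.

    Hypothetical helper: the name and the error messages are illustrative.
    """
    path = Path(html_file)
    if not path.is_file():
        raise ValueError(f"not a readable file: {html_file}")
    raw = path.read_bytes()
    # Assumption: a NUL byte is treated as a sign of binary content.
    if b"\x00" in raw:
        raise ValueError(f"file looks binary: {html_file}")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"file is not valid UTF-8 text: {html_file}") from exc
```

A caller would treat a raised `ValueError` as a validation error and return exit code 1.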
Input Parameters
html_file: <str> - Path to 04_page_XX.html
output_dir: <str> - Directory for validation report
strict_mode: <bool> - If true, warnings also fail (default: false)
page_number: <int> - Page number (for reporting)
chapter: <int> - Chapter number (for reporting)
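For illustration only, the parameters above can be grouped in a small typed container. The class name `StructureValidateParams` and the `report_path` helper are hypothetical; only the field names, types, and the default for `strict_mode` come from the list above.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StructureValidateParams:
    """Hypothetical container mirroring the documented input parameters."""

    html_file: str             # path to 04_page_XX.html
    output_dir: str            # directory for the validation report
    page_number: int           # for reporting
    chapter: int               # for reporting
    strict_mode: bool = False  # if true, warnings also fail

    def report_path(self) -> Path:
        # The report file name is taken from the "Save validation report" step.
        return Path(self.output_dir) / "06_validation_structure.json"
```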
Validation Checks
Check 1: DOCTYPE Declaration
Requirement: File must start with proper DOCTYPE
<!DOCTYPE html>
Check:
- File contains <!DOCTYPE html> (case-insensitive)
- DOCTYPE appears before any tags
- DOCTYPE is on first line or near beginning
Error if: Missing or incorrect DOCTYPE
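A minimal sketch of this check, assuming a regex-based approach (the real tool may do this differently). `check_doctype` is a hypothetical name; it looks for a case-insensitive HTML5 DOCTYPE and confirms that it precedes the first element tag.

```python
import re

# Case-insensitive HTML5 DOCTYPE, tolerating extra whitespace.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+html\s*>", re.IGNORECASE)
# First element tag: "<" followed by a letter (skips "<!" declarations and comments).
FIRST_TAG_RE = re.compile(r"<[a-zA-Z]")


def check_doctype(html: str) -> bool:
    """True if an HTML5 DOCTYPE exists and appears before any element tag."""
    doctype = DOCTYPE_RE.search(html)
    if doctype is None:
        return False
    first_tag = FIRST_TAG_RE.search(html)
    return first_tag is None or doctype.start() < first_tag.start()
```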
Check 2: HTML Tags
Requirement: Proper <html> opening and closing tags
<html lang="en">
...
</html>
Checks:
- <html> tag present
- </html> closing tag present
- Tags are properly paired
- No unclosed <html> tags
Error if: Missing either tag or improperly paired
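One way to sketch this check is to count opening and closing `<html>` tags with regular expressions. `check_html_tags` is a hypothetical helper, and the error strings are illustrative; note that this simple approach would also count tags that appear inside comments.

```python
import re

HTML_OPEN_RE = re.compile(r"<html\b[^>]*>", re.IGNORECASE)
HTML_CLOSE_RE = re.compile(r"</html\s*>", re.IGNORECASE)


def check_html_tags(html: str) -> list:
    """Return a list of error strings; an empty list means the check passes."""
    opens = len(HTML_OPEN_RE.findall(html))
    closes = len(HTML_CLOSE_RE.findall(html))
    errors = []
    if opens == 0:
        errors.append("missing <html>")
    if closes == 0:
        errors.append("missing </html>")
    if opens and closes and opens != closes:
        errors.append(f"unpaired <html> tags: {opens} opened, {closes} closed")
    return errors
```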
Check 3: Head Section
Requirement: Complete <head> section with metadata
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>...</title>
<link rel="stylesheet" href="../../styles/main.css">
</head>
Checks:
- <head> and </head> tags present
- <meta charset="UTF-8"> present
- <meta name="viewport"> present (warning if missing)
- <title> tag with content present
- CSS <link> tag present with href attribute
Error if: Missing charset, title, or CSS link
Warning if: Missing viewport meta tag
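The head checks could be approximated with Python's standard `html.parser` module, as in the sketch below. `HeadInspector` and `check_head` are hypothetical names, and the message strings are illustrative; the sketch covers the required elements only and does not verify that `</head>` is present.

```python
from html.parser import HTMLParser


class HeadInspector(HTMLParser):
    """Collects the facts Check 3 needs while walking <head>."""

    def __init__(self):
        super().__init__()
        self.seen_head = False
        self.in_head = False
        self.in_title = False
        self.has_charset = False
        self.has_viewport = False
        self.has_stylesheet = False
        self.title_text = ""

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "head":
            self.seen_head = self.in_head = True
        elif not self.in_head:
            return  # ignore everything outside <head>
        elif tag == "meta":
            if (attr.get("charset") or "").lower() == "utf-8":
                self.has_charset = True
            if (attr.get("name") or "").lower() == "viewport":
                self.has_viewport = True
        elif tag == "title":
            self.in_title = True
        elif tag == "link":
            rel = (attr.get("rel") or "").lower().split()
            if "stylesheet" in rel and attr.get("href"):
                self.has_stylesheet = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title_text += data


def check_head(html):
    """Return (errors, warnings) following the Error if / Warning if rules."""
    inspector = HeadInspector()
    inspector.feed(html)
    inspector.close()
    errors, warnings = [], []
    if not inspector.seen_head:
        errors.append("missing <head>")
    if not inspector.has_charset:
        errors.append("missing charset meta tag")
    if not inspector.title_text.strip():
        errors.append("missing or empty <title>")
    if not inspector.has_stylesheet:
        errors.append("missing CSS <link> with href")
    if not inspector.has_viewport:
        warnings.append("missing viewport meta tag")
    return errors, warnings
```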
Check 4: Body Section
Requirement: Proper <body> tags with content
<body>
<div class="page-container">
<main class="page-content">
...
</main>
</div>
</body>
Checks:
- <body> and </body> tags present
- <div class="page-container"> present
- <main class="page-content"> present inside container
- Body contains substantial content (> 100 bytes)
Error if: Missing tags or required container divs
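A sketch of the container check, again assuming `html.parser`. `ContainerFinder` and `check_body_containers` are hypothetical names. The parser keeps a stack of open elements so it can tell whether `<main class="page-content">` is nested inside `<div class="page-container">`; the byte-size check is left out.

```python
from html.parser import HTMLParser


class ContainerFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_tags = []  # (tag, classes) for each currently open element
        self.has_container = False
        self.has_main_in_container = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "page-container" in classes:
            self.has_container = True
        if tag == "main" and "page-content" in classes:
            if any(t == "div" and "page-container" in c for t, c in self.open_tags):
                self.has_main_in_container = True
        self.open_tags.append((tag, classes))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; this also discards
        # void elements such as <br> that never receive an end tag.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                del self.open_tags[i:]
                break


def check_body_containers(html):
    """Return error strings for missing required containers."""
    finder = ContainerFinder()
    finder.feed(html)
    finder.close()
    errors = []
    if not finder.has_container:
        errors.append('missing <div class="page-container">')
    if not finder.has_main_in_container:
        errors.append('missing <main class="page-content"> inside container')
    return errors
```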
Check 5: Tag Closure Validation
Requirement: All tags must be properly closed
Checks for:
- Unmatched opening tags (e.g., <p> without </p>)
- Improper nesting (e.g., <p><h2>text</h2></p>)
- Self-closing tags used correctly (e.g., <br/>, <img/>)
- Comment blocks properly formatted (<!-- -->)
Validation method:
- Parse HTML into tree structure
- Verify all nodes properly matched
- Check nesting doesn't violate HTML5 rules
Error if: Any unmatched or improperly nested tags
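Tag matching is commonly done with a stack: push on each opening tag, pop on each matching closing tag. The sketch below shows that idea with `html.parser`; `TagBalanceChecker` and `find_unclosed_tags` are hypothetical names, not the API of validate_html.py. It is deliberately stricter than HTML5 parsing rules, which allow some end tags (such as `</p>` and `</li>`) to be omitted, and it does not detect content-model violations such as `<h2>` inside `<p>`.

```python
from html.parser import HTMLParser

# Void elements defined by the HTML standard; they never take an end tag.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class TagBalanceChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # currently open non-void elements
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Called for <tag/> syntax; only void elements may use it.
        if tag not in VOID_ELEMENTS:
            self.errors.append(f"<{tag}/> is not a void element")

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def find_unclosed_tags(html):
    """Return error strings; an empty list means every tag is matched."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.errors + [f"unclosed <{tag}>" for tag in checker.stack]
```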
Check 6: Heading Tags (h1-h6)
Requirement: Valid heading hierarchy
<h1>Chapter Title</h1>
<h2>Section Heading</h2>
<h3>Subsection</h3>
Checks:
- All heading tags properly closed
- First heading should be h1 (warning if not)
- Heading levels don't skip dramatically (h1 → h4 is suspicious)
- All headings have text content (not empty)
Error if: Heading tags improperly closed
Warning if: Suspicious hierarchy
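The hierarchy warnings could be sketched as follows. `heading_warnings` and its `max_jump` parameter are hypothetical. The source only gives h1 → h4 as an example of a suspicious jump, so flagging any increase of more than one level is an assumption, as is reporting empty headings as warnings rather than errors.

```python
import re

# Matches a properly closed heading and captures its level and inner HTML.
HEADING_RE = re.compile(
    r"<h([1-6])\b[^>]*>(.*?)</h\1\s*>", re.IGNORECASE | re.DOTALL
)


def heading_warnings(html: str, max_jump: int = 1) -> list:
    """Return warning strings for first-heading, jump, and empty-heading issues."""
    warnings = []
    levels = []
    for match in HEADING_RE.finditer(html):
        level = int(match.group(1))
        text = re.sub(r"<[^>]+>", "", match.group(2)).strip()
        if not text:
            warnings.append(f"empty h{level}")
        levels.append(level)
    if levels and levels[0] != 1:
        warnings.append(f"first heading is h{levels[0]}, expected h1")
    for prev, cur in zip(levels, levels[1:]):
        # Assumption: only downward jumps deeper than max_jump are suspicious.
        if cur - prev > max_jump:
            warnings.append(f"heading jumps from h{prev} to h{cur}")
    return warnings
```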
Check 7: Content Structure
Requirement: Meaningful content in page container
Checks:
- <main class="page-content"> contains elements
- Content includes headings or paragraphs
- No completely empty content area
- Text nodes or elements present (> 100 words total)
Error if: No content or empty structure
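The word-count part of this check might be sketched like this. `MainTextCollector`, `main_word_count`, and `has_substantial_content` are hypothetical names; the 100-word threshold comes from the list above, while counting words by whitespace splitting is an assumption.

```python
from html.parser import HTMLParser


class MainTextCollector(HTMLParser):
    """Collects text that appears inside <main> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside <main>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "main" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def main_word_count(html: str) -> int:
    collector = MainTextCollector()
    collector.feed(html)
    collector.close()
    # Assumption: a "word" is any whitespace-delimited token.
    return len(" ".join(collector.chunks).split())


def has_substantial_content(html: str, min_words: int = 100) -> bool:
    return main_word_count(html) > min_words
```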
Check 8: List Integrity
Requirement: All lists properly structured
Checks for each <ul> or <ol>:
- List opening and closing tags matched
- List contains <li> elements
- All <li> tags properly closed
- <li> count matches opening/closing pairs
- No nested <ul> or <ol> improperly closed
Error if: Empty lists or unmatched <li> tags
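A possible sketch of the list checks: keep one `<li>` counter per open list so that nested lists are counted separately, and compare opened and closed `<li>` totals at the end. `ListChecker` and `list_errors` are hypothetical names, and the error strings are illustrative.

```python
from html.parser import HTMLParser


class ListChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_lists = []  # one <li> counter per open <ul>/<ol>
        self.li_opened = 0
        self.li_closed = 0
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.open_lists.append(0)
        elif tag == "li":
            self.li_opened += 1
            if self.open_lists:
                self.open_lists[-1] += 1
            else:
                self.errors.append("<li> outside of a list")

    def handle_endtag(self, tag):
        if tag == "li":
            self.li_closed += 1
        elif tag in ("ul", "ol"):
            if not self.open_lists:
                self.errors.append(f"unexpected </{tag}>")
            elif self.open_lists.pop() == 0:
                self.errors.append(f"empty <{tag}>")


def list_errors(html):
    """Return error strings; an empty list means all lists are well formed."""
    checker = ListChecker()
    checker.feed(html)
    checker.close()
    errors = list(checker.errors)
    if checker.li_opened != checker.li_closed:
        errors.append(
            f"<li> count mismatch: {checker.li_opened} opened, "
            f"{checker.li_closed} closed"
        )
    errors.extend("unclosed list" for _ in checker.open_lists)
    return errors
```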
Check 9: Image and Link Tags
Requirement: Self-closing tags properly formatted
Checks:
- All <img> tags have src and alt attributes
- All <a> tags have valid href attributes
- Image paths don't have obvious errors (no broken syntax)
- Self-closing tags use proper syntax
Warning if: Images missing alt text or links missing href
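The attribute checks can be sketched by inspecting each start tag. `AttributeChecker` and `attribute_warnings` are hypothetical names. An empty `alt=""` is accepted here because it is valid for decorative images; that choice, and reporting a missing `src` as a warning, are assumptions.

```python
from html.parser import HTMLParser


class AttributeChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "img":
            if not attr.get("src"):
                self.warnings.append("<img> missing src")
            # Presence check only: alt="" is allowed for decorative images.
            if "alt" not in attr:
                self.warnings.append("<img> missing alt")
        elif tag == "a" and not attr.get("href"):
            self.warnings.append("<a> missing href")


def attribute_warnings(html):
    checker = AttributeChecker()
    checker.feed(html)
    checker.close()
    return checker.warnings
```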
Check 10: Table Tags (if present)
Requirement: Proper table structure
Checks:
- <table>, <tr>, <td>, <th> tags properly nested
- All rows have consistent column counts
- Table headers and body properly structured
Error if: Malformed table structure
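The column-count rule could be checked by summing cells per row, as in this sketch. `TableRowCounter` and `table_errors` are hypothetical names. Honoring `colspan` and ignoring `rowspan` are simplifying assumptions, so tables that use `rowspan` may be reported incorrectly.

```python
from html.parser import HTMLParser


class TableRowCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tables = []  # finished tables: a list of per-row column counts
        self.stack = []   # tables still open (supports nested tables)

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.stack.append([])
        elif tag == "tr" and self.stack:
            self.stack[-1].append(0)
        elif tag in ("td", "th") and self.stack and self.stack[-1]:
            span = dict(attrs).get("colspan") or "1"
            width = int(span) if span.isdigit() and int(span) > 0 else 1
            self.stack[-1][-1] += width

    def handle_endtag(self, tag):
        if tag == "table" and self.stack:
            self.tables.append(self.stack.pop())


def table_errors(html):
    """Return one error string per table whose rows differ in width."""
    counter = TableRowCounter()
    counter.feed(html)
    counter.close()
    errors = []
    for index, rows in enumerate(counter.tables, start=1):
        if len(set(rows)) > 1:
            errors.append(
                f"table {index}: rows have differing column counts {rows}"
            )
    return errors
```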
Validation Report Format
Output: 06_validation_structure.json
{
"page": 16,
"book_page": 17,
"chapter": 2,
"validation_type": "structure",
"validation_timestamp": "2025-11-08T14:34:00Z",
"overall_status": "PASS",
"error_count": 0,
"warning_count": 1,
"checks_performed": [
{
"check_name": "DOCTYPE Declaration",
"status": "PASS",
"details": "Valid HTML5 DOCTYPE found"
},
{
"check_name": "HTML Tags",
"status": "PASS",
"details": "Proper <html> opening and closing tags"
},
{
"check_name": "Head Section",
"status": "PASS",
"details": "All required meta tags and title present"
},
{
"check_name": "Body Section",
"status": "PASS",
"details": "Body and content structure valid"
},
{
"check_name": "Tag Closure",
"status": "PASS",
"details": "All tags properly matched and closed"
},
{
"check_name": "Heading Hierarchy",
"status": "WARNING",
"details": "4 headings found, first heading is h2"
},
{
"check_name": "Content Structure",
"status": "PASS",
"details": "Main content area contains 245 words across 3 paragraphs"
},
{
"check_name": "List Integrity",
"status": "PASS",
"details": "1 list with 3 items, all properly formed"
},
{
"check_name": "Image Tags",
"status": "PASS",
"details": "No images on this page"
},
{
"check_name": "Table Tags",
"status": "PASS",
"details": "No tables on this page"
}
],
"errors": [],
"warnings": [
{
"check": "Heading Hierarchy",
"message": "First heading is h2, typically should be h1 for page opening",
"severity": "LOW"
}
],
"summary": {
"total_checks": 10,
"passed": 9,
"failed": 0,
"warnings": 1,
"html_valid": true,
"tags_matched": true,
"content_substantial": true
}
}
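A report with the top-level fields shown above could be assembled and saved roughly as follows. `build_report` and `save_report` are hypothetical helpers; the `"FAIL"` status value is an assumption (the example only shows `"PASS"`), and the `book_page` and `summary` fields are omitted for brevity.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def build_report(page, chapter, checks, errors, warnings):
    """Assemble the top-level report fields from the individual check results."""
    return {
        "page": page,
        "chapter": chapter,
        "validation_type": "structure",
        # UTC timestamp in the same format as the example report.
        "validation_timestamp": datetime.now(timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        ),
        # Assumption: any error makes the overall status "FAIL".
        "overall_status": "FAIL" if errors else "PASS",
        "error_count": len(errors),
        "warning_count": len(warnings),
        "checks_performed": checks,
        "errors": errors,
        "warnings": warnings,
    }


def save_report(report, output_dir):
    """Write the report as 06_validation_structure.json and return its path."""
    path = Path(output_dir) / "06_validation_structure.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return path
```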
Validation Rules
PASS Criteria
- DOCTYPE present and valid
- All required tags (html, head, body, main, div.page-container) present
- All tags properly closed and matched
- Title tag with content
- CSS stylesheet link present
- Content structure valid
- No structural errors
FAIL Criteria (BLOCKS PIPELINE)
- Missing DOCTYPE
- Missing required tags
- Unmatched or improperly nested tags
- Missing title or CSS link
- Empty content
- Malformed lists or tables
WARNING (Logged but doesn't block)
- Missing viewport meta tag
- First heading is not h1
- Large heading jumps (h1 → h4)
- Missing alt text on images
- Missing href on links
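The PASS, FAIL, and WARNING rules, combined with the `strict_mode` parameter, reduce to a small decision function. The sketch below uses a hypothetical name, `gate_exit_code`; the 0 and 1 exit codes and the strict-mode behavior are taken from this document.

```python
def gate_exit_code(error_count: int, warning_count: int,
                   strict_mode: bool = False) -> int:
    """Return 0 to continue the pipeline or 1 to stop it."""
    if error_count > 0:
        return 1  # any FAIL criterion blocks the pipeline
    if strict_mode and warning_count > 0:
        return 1  # in strict mode, warnings also fail
    return 0
```

A command-line wrapper would typically pass this value to `sys.exit`.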
Implementation: Using Python Script
This validation is performed by the existing validate_html.py tool, run in structure validation mode:
cd Calypso/tools
# Validate single page HTML
python3 validate_html.py \
../output/chapter_02/page_artifacts/page_16/04_page_16.html \
--output-json ../output/chapter_02/page_artifacts/page_16/06_validation_structure.json \
--strict-structure
# Exit code:
# 0 = VALID (continue to next skill)
# 1 = INVALID (STOP pipeline)
Hook Integration
When validation FAILS:
# Trigger hook: .claude/hooks/validate-structure.sh
# Receives:
# - Page number
# - HTML file path
# - Validation report path
# - Error details
# Hook behavior:
# - Log failure with details
# - Save error report
# - Notify user
# - STOP pipeline (no further processing)
Error Recovery
If validation fails:
- User reviews validation report
- User identifies issue in AI-generated HTML
- Options:
- Fix HTML manually and re-validate
- Re-run AI generation with improved prompt
- Review source extraction data for errors
- Proceed with caution (expert override)
Quality Metrics
Validation provides metrics:
- Percentage of checks passing
- Error severity levels
- Content size (word count, element count)
- Structure complexity
These metrics feed into final quality reports.
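As an illustration of the first metric, the percentage of passing checks can be derived from the `checks_performed` entries of a report. `pass_percentage` is a hypothetical helper, and treating every status other than `"PASS"` as not passing is an assumption.

```python
def pass_percentage(checks) -> float:
    """Percentage of entries in checks_performed whose status is "PASS"."""
    if not checks:
        return 0.0
    passed = sum(1 for check in checks if check.get("status") == "PASS")
    return 100.0 * passed / len(checks)
```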
Success Criteria
✓ Validation completes successfully
✓ All structural checks pass (0 errors)
✓ Validation report saved in JSON format
✓ Exit code 0 returned (or 1 if invalid)
✓ Clear error messages if validation fails
Next Steps After PASS
If validation passes:
- All pages of chapter processed through this gate
- Skill 4 (consolidate pages) merges individual page HTMLs
- Quality Gate 2 (semantic validate) checks semantic structure
- Continue through validation pipeline
Next Steps After FAIL
If validation fails:
- PIPELINE STOPS
- Hook validate-structure.sh triggered
- User receives error report with details
- User must fix issues and retry
Design Notes
- This is the first deterministic quality gate
- Uses proven validate_html.py tool
- Catches structural issues before semantic analysis
- Provides clear, actionable error messages
- Essential for ensuring pipeline reliability
Testing
To test structure validation:
# Test with known-good HTML
python3 validate_html.py ../output/chapter_01/chapter_01.html
# Should show: ✓ VALID
# Test with invalid HTML (if needed)
python3 validate_html.py broken_html.html
# Should show: ✗ INVALID with specific errors