| name | source-unifier |
| description | Merge multiple documentation sources (docs, GitHub, PDF) with conflict detection. Use when combining docs + code for complete skill coverage. |
| tools | Read, Write, Bash, Glob, Grep |
Source Unifier Skill
Purpose
Single responsibility: Intelligently merge documentation from multiple sources (websites, GitHub repos, PDFs) while detecting and transparently reporting conflicts between documented and implemented behavior. (BP-4)
Grounding Checkpoint (Archetype 1 Mitigation)
Before executing, VERIFY:
- All source URLs/paths are accessible
- Each source type is correctly identified (docs, github, pdf)
- Output directory is writable
- Merge mode is specified (rule-based or AI-enhanced)
- Conflict resolution strategy is defined
DO NOT merge without inspecting each source first.
Uncertainty Escalation (Archetype 2 Mitigation)
ASK USER instead of guessing when:
- Conflict severity unclear (is doc or code authoritative?)
- Multiple valid interpretations of API signature
- Source versions don't match (v2 docs vs v3 code)
- Merge strategy produces ambiguous results
NEVER silently resolve conflicts. Always report discrepancies.
Context Scope (Archetype 3 Mitigation)
| Context Type | Included | Excluded |
|---|---|---|
| RELEVANT | All specified sources, merge config | Unrelated documentation |
| PERIPHERAL | Version history for context | Other projects |
| DISTRACTOR | Previous merge attempts | Unrelated codebases |
Conflict Types
| Type | Severity | Description | Example |
|---|---|---|---|
| Missing in code | HIGH | Documented but not implemented | API endpoint in docs, not in code |
| Missing in docs | MEDIUM | Implemented but not documented | Hidden feature in code |
| Signature mismatch | MEDIUM | Different parameters/types | func(a, b) vs func(a, b, c=None) |
| Description mismatch | LOW | Different explanations | Wording differences |
Workflow Steps
Step 1: Verify Sources (Grounding)
# Test documentation URL
curl -I https://docs.example.com/
# Test GitHub repo
gh repo view owner/repo --json name,description
# Test PDF file
file manual.pdf && pdfinfo manual.pdf
Step 2: Create Unified Configuration
{
"name": "myframework",
"description": "Complete framework knowledge from docs + code",
"merge_mode": "rule-based",
"conflict_resolution": {
"missing_in_code": "warn",
"missing_in_docs": "include",
"signature_mismatch": "show_both",
"description_mismatch": "prefer_docs"
},
"sources": [
{
"type": "documentation",
"base_url": "https://docs.example.com/",
"extract_api": true,
"max_pages": 200
},
{
"type": "github",
"repo": "owner/myframework",
"include_code": true,
"code_analysis_depth": "surface",
"max_issues": 100
},
{
"type": "pdf",
"path": "docs/manual.pdf",
"extract_tables": true
}
]
}
Step 3: Execute Unified Scraping
Option A: With skill-seekers
skill-seekers unified --config unified-config.json
Option B: Manual merge workflow
- Scrape each source independently
- Extract API signatures from each
- Compare and detect conflicts
- Generate merged output with conflict annotations
Step 4: Review Conflict Report
The unifier generates a conflict report:
# Conflict Report: myframework
## Summary
- Total APIs analyzed: 245
- Conflicts detected: 18
- Missing in code: 3 (HIGH)
- Missing in docs: 8 (MEDIUM)
- Signature mismatches: 5 (MEDIUM)
- Description mismatches: 2 (LOW)
## HIGH Severity Conflicts
### `deprecated_function()`
- **Status**: Documented but not found in code
- **Documentation**: "Use this function to..."
- **Code**: NOT FOUND
- **Recommendation**: Remove from docs or implement
## MEDIUM Severity Conflicts
### `process_data(input: str)`
- **Status**: Signature mismatch
- **Documentation**: `process_data(input: str)`
- **Code**: `process_data(input: str, validate: bool = True)`
- **Recommendation**: Update documentation to include `validate` parameter
Step 5: Validate Merged Output
# Check merged skill structure
ls -la output/myframework/
# Verify conflict annotations
grep -r "⚠️\|Conflict\|WARNING" output/myframework/references/
# Count conflict markers
grep -c "Conflict" output/myframework/references/*.md
Recovery Protocol (Archetype 4 Mitigation)
On error:
- PAUSE - Preserve partial merge state
- DIAGNOSE - Check error type:
Source unavailable→ Skip source, note in reportParse error→ Check source format, retry with different parserMemory error→ Process sources sequentiallyConflict overflow→ Increase conflict threshold or filter by severity
- ADAPT - Adjust merge strategy
- RETRY - Resume merge (max 3 attempts)
- ESCALATE - Present partial results, ask user for conflict resolution
Checkpoint Support
State saved to: .aiwg/working/checkpoints/source-unifier/
checkpoints/source-unifier/
├── source_1_docs.json # Processed docs
├── source_2_github.json # Processed GitHub
├── source_3_pdf.json # Processed PDF
├── conflicts.json # Detected conflicts
└── merge_progress.json # Current merge state
Resume: skill-seekers unified --config config.json --resume
Output Structure
output/myframework/
├── SKILL.md # Main skill with conflict summary
├── references/
│ ├── index.md # Unified index
│ ├── api_reference.md # Merged API docs (with conflict markers)
│ ├── guides.md # Merged guides
│ └── conflicts.md # Detailed conflict report
├── sources/
│ ├── documentation.md # Original docs content
│ ├── github.md # GitHub-extracted content
│ └── pdf.md # PDF-extracted content
└── metadata/
├── sources.json # Source metadata
└── conflict_summary.json # Machine-readable conflicts
Conflict Markers in Output
Merged content includes inline conflict markers:
#### `process_data(input: str, validate: bool = True)`
⚠️ **Conflict**: Documentation signature differs from implementation
**Documentation says:**
```python
def process_data(input: str) -> dict:
"""Process input data and return results."""
Code implementation:
def process_data(input: str, validate: bool = True) -> dict:
"""Process input data with optional validation."""
Resolution: Documentation should be updated to include the validate parameter added in v2.3.
## Merge Modes
| Mode | Description | Use Case |
|------|-------------|----------|
| `rule-based` | Apply predefined rules for conflict resolution | Fast, deterministic |
| `ai-enhanced` | Use AI to intelligently merge conflicting content | Better quality, slower |
| `manual` | Generate conflicts only, user resolves | Full control |
## Troubleshooting
| Issue | Diagnosis | Solution |
|-------|-----------|----------|
| Too many conflicts | Sources very different | Filter by severity, merge incrementally |
| False positives | Parser differences | Normalize API extraction |
| Missing content | Source incomplete | Add supplementary source |
| Merge too slow | Large sources | Use parallel processing |
## References
- Skill Seekers Unified Scraping: https://github.com/jmagly/Skill_Seekers/blob/main/docs/UNIFIED_SCRAPING.md
- REF-001: Production-Grade Agentic Workflows (BP-4, BP-6 model consortium parallel)
- REF-002: LLM Failure Modes (Archetype 1-4 mitigations)