---
name: Knowledge Base Manager
description: Design, build, and maintain comprehensive knowledge bases. Bridges document-based (RAG) and entity-based (graph) knowledge systems. Use when building knowledge-intensive applications, managing organizational knowledge, or creating intelligent information systems.
version: 1.0.0
---
Knowledge Base Manager
Build and maintain high-quality knowledge bases for AI systems and human consumption.
Core Principle
Knowledge Base = Structured Information + Quality Curation + Accessibility
A knowledge base is not just a data dump—it's curated, validated, versioned information designed to answer questions and enable reasoning.
When to Use Knowledge Bases
Use Knowledge Bases When:
- ✅ Need to answer factual questions consistently
- ✅ Information changes frequently and needs version control
- ✅ Multiple sources need to be unified and reconciled
- ✅ Provenance and citation tracking is critical
- ✅ Building AI systems that need grounded, verifiable information
- ✅ Organizational knowledge needs to be preserved and searchable
- ✅ Complex domain with interconnected concepts
Don't Use Knowledge Bases When:
- ❌ Static documentation is sufficient (use docs + search)
- ❌ No one will maintain/update it (knowledge rot guaranteed)
- ❌ Simple FAQ covers all questions (<50 items)
- ❌ Information doesn't change (static site faster/cheaper)
- ❌ Team lacks resources for curation
Knowledge Base Types: Decision Framework
1. Document-Based Knowledge Base (RAG)
What it is: Collection of documents, chunked and embedded for semantic search
Best for:
- Technical documentation
- Support articles, FAQs
- Policy documents
- Research papers
- Blog content
- User manuals
Strengths:
- Easy to add new documents
- Preserves full context
- Natural for text-heavy content
Weaknesses:
- Hard to query relationships ("Who works where?")
- Duplicate information across documents
- Difficult to keep facts consistent
Use: rag-implementer skill + vector-database-mcp
2. Entity-Based Knowledge Base (Knowledge Graph)
What it is: Network of entities (people, places, things) connected by relationships
Best for:
- Organizational charts
- Product catalogs with relationships
- Social networks
- Recommendation systems
- Fraud detection
- Supply chain tracking
Strengths:
- Excellent for "how are X and Y related?" queries
- Consistent facts (one source of truth)
- Powerful traversal ("friends of friends")
Weaknesses:
- Upfront modeling required (ontology design)
- Harder to add unstructured information
- Learning curve for graph queries
Use: knowledge-graph-builder skill + graph-database-mcp
3. Hybrid Knowledge Base (RAG + Graph)
What it is: Documents for unstructured knowledge + Graph for structured entities/relationships
Best for:
- Enterprise knowledge management
- Research with citations and relationships
- Medical systems (documents + patient/drug relationships)
- Legal systems (cases + precedents + entities)
- E-commerce (products + specs + relationships)
Strengths:
- Best of both worlds
- Flexible for different knowledge types
- Rich querying capabilities
Weaknesses:
- Most complex to build and maintain
- Requires expertise in both RAG and graphs
- Higher infrastructure costs
Use: Both rag-implementer + knowledge-graph-builder skills
Decision Tree: Which KB Type?
```
What kind of knowledge do you have?
├─ Mostly unstructured text (docs, articles, content)?
│  └─ Document-Based KB (RAG)
│     Use: rag-implementer skill
│
├─ Mostly structured entities with relationships?
│  └─ Entity-Based KB (Graph)
│     Use: knowledge-graph-builder skill
│
└─ Mix of both?
   └─ Hybrid KB (RAG + Graph)
      Use: Both skills + this skill for integration
```
6-Phase Knowledge Base Implementation
Phase 1: Knowledge Audit & Architecture
Goal: Understand what knowledge exists and how to structure it
Actions:
Inventory existing knowledge sources
- Internal: databases, documents, wikis, Slack, emails
- External: public data, APIs, third-party sources
- Tribal: SME interviews, recorded conversations
Classify knowledge types
- Factual: Verifiable facts ("Product X costs $50")
- Procedural: How-to knowledge ("How to deploy")
- Conceptual: Definitions and explanations
- Relationship: Connections between entities
Choose KB architecture
- Document-based? Entity-based? Hybrid?
- Decision: Use framework above
Define knowledge schema
- For documents: metadata fields (source, date, author, category)
- For entities: ontology (entity types, relationship types, properties)
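As a concrete illustration, the schema decisions above can be captured as TypeScript types. This is a minimal sketch for a hybrid KB; the field names, entity types, and relationship types are placeholders, not a prescribed model:

```typescript
// Hypothetical schema: document metadata plus a small ontology.
interface DocumentMetadata {
  source: string;      // URL or internal reference
  author: string;
  category: string;
  updated_at: string;  // ISO 8601 date
}

type EntityType = 'Person' | 'Organization' | 'Product';
type RelationType = 'WORKS_AT' | 'MANUFACTURES' | 'PART_OF';

interface OntologyRelation {
  type: RelationType;
  from: EntityType;
  to: EntityType;
}

// Example: "Person WORKS_AT Organization", "Organization MANUFACTURES Product"
const ontology: OntologyRelation[] = [
  { type: 'WORKS_AT', from: 'Person', to: 'Organization' },
  { type: 'MANUFACTURES', from: 'Organization', to: 'Product' },
];
```

Validating this kind of schema with users before ingestion starts is much cheaper than remodeling after the KB is populated.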
Validation:
- All knowledge sources inventoried and prioritized
- KB architecture chosen and justified
- Schema defined and validated with users
- Success metrics established
Phase 2: Knowledge Curation & Ingestion
Goal: Transform raw information into high-quality knowledge
Actions:
Extract knowledge from sources
- Automated: scraping, API ingestion, file parsing
- Manual: expert input, annotation, validation
Clean and normalize
- Remove duplicates
- Standardize formats
- Fix inconsistencies
- Enrich with metadata
Structure knowledge
- For documents: chunk intelligently (semantic boundaries)
- For entities: extract entities, relationships, properties
Add provenance
- Source URL or reference
- Last updated timestamp
- Author/contributor
- Confidence score (if applicable)
Curation Best Practices:
- Single Source of Truth: One canonical answer per question
- Deduplication: Merge similar knowledge entries
- Conflict Resolution: When sources disagree, establish priority rules
- Metadata Richness: More metadata = better filtering and search
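A minimal sketch of deduplication during ingestion, assuming a simple word-overlap similarity check. The entry shape, helper names, and the 0.9 threshold are illustrative, not prescribed:

```typescript
interface CandidateEntry {
  content: string;
  source: string;      // provenance: where the fact came from
  updated_at: string;  // provenance: when it was last verified
}

// Very rough similarity: shared-word overlap between normalized texts.
function similarity(a: string, b: string): number {
  const tokens = (s: string) => new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
  const ta = tokens(a);
  const tb = tokens(b);
  const shared = [...ta].filter((t) => tb.has(t)).length;
  return shared / Math.max(ta.size, tb.size, 1);
}

// Keep only entries that are not near-duplicates of something already kept.
function dedupe(entries: CandidateEntry[], threshold = 0.9): CandidateEntry[] {
  const kept: CandidateEntry[] = [];
  for (const entry of entries) {
    const isDuplicate = kept.some((k) => similarity(k.content, entry.content) >= threshold);
    if (!isDuplicate) kept.push(entry);
  }
  return kept;
}
```

In practice you would likely use embedding similarity or a fuzzy-matching library, but the shape of the step is the same: compare, merge or drop, keep provenance on whatever survives.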
Validation:
- Knowledge extracted and structured
- Quality metrics above threshold (accuracy >95%)
- Provenance tracked for all entries
- Sample queries return relevant results
Phase 3: Storage & Retrieval Setup
Goal: Implement technical infrastructure for knowledge access
Architecture Patterns:
For Document-Based KB:
```typescript
// Vector database for semantic search
interface DocumentKB {
  store: 'Pinecone' | 'Weaviate' | 'pgvector';
  chunks: {
    content: string;
    embedding: number[];
    metadata: {
      source: string;
      title: string;
      updated_at: string;
      category: string;
    };
  }[];
}
```
For Entity-Based KB:
```typescript
// Graph database for relationship queries
interface EntityKB {
  store: 'Neo4j' | 'ArangoDB';
  nodes: {
    id: string;
    type: 'Person' | 'Organization' | 'Product' | 'Concept';
    properties: Record<string, any>;
  }[];
  relationships: {
    from: string;
    to: string;
    type: string;
    properties: Record<string, any>;
  }[];
}
```
For Hybrid KB:
```typescript
// Both vector DB + graph DB
interface HybridKB {
  vectorDB: DocumentKB;
  graphDB: EntityKB;
  linker: {
    // Links documents to entities mentioned in them
    linkDocumentToEntities(docId: string): string[];
    // Links entities to documents that mention them
    linkEntityToDocuments(entityId: string): string[];
  };
}
```
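A usage sketch for the linker above: answer a question by retrieving semantically similar chunks, then pulling in the entities those documents mention. The semanticSearch and getNode helpers are assumptions standing in for whatever vector and graph clients you actually use:

```typescript
// Hypothetical helpers assumed to exist on top of the two stores.
declare function semanticSearch(
  kb: DocumentKB,
  query: string,
  topK: number
): { docId: string; content: string }[];
declare function getNode(
  kb: EntityKB,
  id: string
): { id: string; type: string; properties: Record<string, any> } | undefined;

function answerWithContext(kb: HybridKB, query: string) {
  // 1. Retrieve the most relevant document chunks.
  const chunks = semanticSearch(kb.vectorDB, query, 5);

  // 2. Expand each hit into the entities it mentions via the linker.
  const entities = chunks
    .flatMap((c) => kb.linker.linkDocumentToEntities(c.docId))
    .map((id) => getNode(kb.graphDB, id))
    .filter((n) => n !== undefined);

  // 3. Return both kinds of context for the answering step (LLM or rules).
  return { chunks, entities };
}
```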
Actions:
Choose database(s)
- Document: Pinecone, Weaviate, pgvector
- Entity: Neo4j, ArangoDB
- Hybrid: Both + linking layer
Implement search/query layer
- Vector similarity search (for documents)
- Graph traversal (for entities)
- Hybrid queries (combining both)
Add caching and optimization
- Cache frequent queries
- Optimize for common access patterns
Validation:
- Database deployed and accessible
- Search/query functionality working
- Performance meets requirements (<100ms for most queries)
Phase 4: Quality Control & Validation
Goal: Ensure knowledge base accuracy and reliability
Quality Metrics:
- Accuracy: % of correct answers to test questions
- Coverage: % of user questions answerable
- Freshness: Average age of knowledge
- Consistency: % of entries with conflicting or contradictory facts (lower is better)
- Source Quality: % from authoritative sources
Validation Strategies:
1. Test Question Sets
Create 100+ test questions with known correct answers:
```typescript
interface TestQuestion {
  question: string;
  expected_answer: string;
  category: string;
  difficulty: 'easy' | 'medium' | 'hard';
}
```
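A sketch of an evaluation harness over such a test set. The queryKB and isCorrect callbacks are placeholders for whatever retrieval pipeline and grading rule (exact match, semantic similarity, human judgment) you actually use:

```typescript
async function evaluate(
  questions: TestQuestion[],
  queryKB: (question: string) => Promise<string>,
  isCorrect: (expected: string, actual: string) => boolean
): Promise<number> {
  let correct = 0;
  for (const q of questions) {
    const answer = await queryKB(q.question);
    if (isCorrect(q.expected_answer, answer)) correct++;
  }
  // Accuracy as a percentage, comparable to the >90% target below.
  return (correct / questions.length) * 100;
}
```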
2. Human Review
- Sample random knowledge entries
- Subject matter expert validation
- User feedback loops
3. Automated Checks
- Duplicate Detection: Find near-identical entries
- Conflict Detection: Find contradictory facts
- Staleness Detection: Flag outdated information
- Citation Validation: Verify sources still exist
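For example, staleness detection can be as simple as comparing each entry's last update against a maximum age; the entry shape and the 90-day default here are illustrative only:

```typescript
interface StalenessCheck {
  id: string;
  updated_at: string;     // ISO 8601 timestamp
  max_age_days?: number;  // per-entry override, if any
}

function findStaleEntries(entries: StalenessCheck[], defaultMaxAgeDays = 90): string[] {
  const now = Date.now();
  return entries
    .filter((e) => {
      const ageDays = (now - Date.parse(e.updated_at)) / (1000 * 60 * 60 * 24);
      return ageDays > (e.max_age_days ?? defaultMaxAgeDays);
    })
    .map((e) => e.id);
}
```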
4. Continuous Monitoring
```typescript
interface KBHealthMetrics {
  accuracy_score: number;     // 0-100
  coverage_score: number;     // % questions answered
  freshness_score: number;    // avg days since update
  consistency_score: number;  // % no conflicts
  user_satisfaction: number;  // feedback rating
}
```
Actions:
- Run test question validation (target: >90% accuracy)
- Conduct human review (sample 10% of entries)
- Fix detected issues (duplicates, conflicts, staleness)
- Establish monitoring dashboards
Validation:
- Accuracy >90% on test questions
- Coverage >80% of user questions
- <5% conflicting information
- Monitoring dashboard operational
Phase 5: Versioning & Evolution
Goal: Track knowledge changes over time and enable rollback
Why Versioning Matters:
- Knowledge changes (facts update, policies change)
- Need audit trail (who changed what when)
- Rollback capability (undo bad updates)
- Historical queries ("What was policy on X in 2023?")
Versioning Strategies:
1. Snapshot Versioning
```typescript
interface KnowledgeEntry {
  id: string;
  content: string;
  version: number;
  created_at: string;
  updated_at: string;
  updated_by: string;
  changelog: string;
  previous_version?: string; // ID of prior version
}
```
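A sketch of how an update might produce a new snapshot that links back to its predecessor; the ID scheme is a placeholder and persistence is left out:

```typescript
function createNewVersion(
  current: KnowledgeEntry,
  newContent: string,
  author: string,
  changelog: string
): KnowledgeEntry {
  const now = new Date().toISOString();
  return {
    ...current,
    id: `${current.id}-v${current.version + 1}`, // placeholder ID scheme
    content: newContent,
    version: current.version + 1,
    updated_at: now,
    updated_by: author,
    changelog,
    previous_version: current.id, // keeps the chain for rollback and audit
  };
}
```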
2. Event Sourcing
```typescript
interface KnowledgeEvent {
  event_id: string;
  entity_id: string;
  event_type: 'created' | 'updated' | 'deleted';
  timestamp: string;
  changes: {
    field: string;
    old_value: any;
    new_value: any;
  }[];
  author: string;
}
```
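With event sourcing, the current (or historical) state of an entity is reconstructed by replaying its events in order. A minimal replay sketch, assuming the events passed in all belong to one entity:

```typescript
function replay(events: KnowledgeEvent[], asOf?: string): Record<string, any> | null {
  let state: Record<string, any> | null = null;
  const sorted = [...events].sort((a, b) => a.timestamp.localeCompare(b.timestamp));
  for (const event of sorted) {
    if (asOf && event.timestamp > asOf) break; // supports "what was policy on X in 2023?" queries
    if (event.event_type === 'created') state = {};
    if (event.event_type === 'deleted') state = null;
    if (state !== null && event.event_type !== 'deleted') {
      for (const change of event.changes) {
        state[change.field] = change.new_value;
      }
    }
  }
  return state;
}
```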
3. Git-Style Versioning
- Treat knowledge like code
- Commit-based changes
- Branch for experimental knowledge
- Merge when validated
Actions:
- Implement version tracking
- Add changelog for all updates
- Create rollback mechanism
- Build version comparison tools
Validation:
- All changes tracked with versions
- Rollback tested and working
- Historical queries supported
- Audit trail complete
Phase 6: Maintenance & Governance
Goal: Keep knowledge base healthy long-term
Maintenance Tasks:
Daily:
- Monitor for errors and failures
- Review user feedback
- Address urgent corrections
Weekly:
- Review new content submissions
- Update time-sensitive knowledge
- Run automated quality checks
Monthly:
- Audit knowledge freshness
- Review and resolve conflicts
- Analyze usage patterns
- Update stale content
Quarterly:
- Comprehensive quality audit
- Schema/ontology review
- Performance optimization
- User satisfaction survey
Governance Framework:
1. Roles & Responsibilities
- Knowledge Owners: Domain experts responsible for content
- Curators: Review and approve changes
- Contributors: Submit new knowledge
- Consumers: Use knowledge and provide feedback
2. Change Process
Submit → Review → Approve → Publish → Monitor
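The change process can also be enforced in code as a small state machine. The statuses and allowed transitions below mirror the pipeline above and are just one way to model it:

```typescript
type ChangeStatus = 'submitted' | 'in_review' | 'approved' | 'published' | 'monitoring';

const allowedTransitions: Record<ChangeStatus, ChangeStatus[]> = {
  submitted: ['in_review'],
  in_review: ['approved', 'submitted'], // reviewer can send a change back for rework
  approved: ['published'],
  published: ['monitoring'],
  monitoring: [],
};

function transition(current: ChangeStatus, next: ChangeStatus): ChangeStatus {
  if (!allowedTransitions[current].includes(next)) {
    throw new Error(`Invalid transition: ${current} -> ${next}`);
  }
  return next;
}
```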
3. Quality Standards
- Minimum source quality requirements
- Citation requirements
- Update frequency requirements
- Conflict resolution process
Actions:
- Establish maintenance schedule
- Assign roles and responsibilities
- Create governance documentation
- Train team on processes
Validation:
- Maintenance schedule in place
- Governance documented and communicated
- Team trained on processes
- Quality trending upward
Knowledge Base Anti-Patterns
❌ Anti-Pattern 1: Data Dump Without Curation
Problem: Ingesting everything without quality filtering
Impact: Low signal-to-noise ratio, poor search results, user frustration
Solution: Curate before ingesting. Quality > Quantity
❌ Anti-Pattern 2: No Version Control
Problem: Knowledge changes but no history tracked
Impact: Can't audit changes, can't rollback errors, no accountability
Solution: Implement versioning from Phase 5
❌ Anti-Pattern 3: Stale Knowledge
Problem: Knowledge base outdated but no one knows
Impact: AI systems hallucinate using old facts, users get wrong answers
Solution: Freshness monitoring + scheduled updates
❌ Anti-Pattern 4: Duplicate Information
Problem: Same fact in multiple places, becomes inconsistent
Impact: Conflicting answers, confused users
Solution: Deduplication + single source of truth
❌ Anti-Pattern 5: No Provenance
Problem: Knowledge without source citations
Impact: Can't verify accuracy, can't trace errors
Solution: Always track source + timestamp + author
Integration with Other Skills
With rag-implementer
- Use for document-based portion of hybrid KB
- Follow RAG implementation phases
- Integrate vector search with KB queries
With knowledge-graph-builder
- Use for entity-based portion of hybrid KB
- Follow graph design patterns
- Integrate graph traversal with KB queries
With data-engineer
- For ETL pipelines (extract, transform, load knowledge)
- For data quality monitoring
- For performance optimization
With quality-auditor
- For automated quality checks
- For testing and validation
- For continuous monitoring
With technical-writer
- For knowledge documentation
- For user guides on KB usage
- For governance documentation
Tools & Technologies
Document-Based KB Stack
- Vector DB: Pinecone, Weaviate, pgvector
- Embeddings: OpenAI, Cohere, custom
- Search: Semantic + keyword hybrid
Entity-Based KB Stack
- Graph DB: Neo4j, ArangoDB
- Query: Cypher, AQL
- Visualization: Neo4j Bloom, Gephi
Curation Tools
- Deduplication: Custom algorithms, fuzzy matching
- Conflict Detection: Rule-based, ML-based
- Validation: Test question sets, human review
Monitoring
- Metrics: Custom dashboard (Grafana)
- Logging: Structured logging of queries/updates
- Alerts: Freshness, accuracy, error rate alerts
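Alert rules can reuse the KBHealthMetrics shape from Phase 4. A sketch with thresholds borrowed from the success metrics below; they are starting points, not requirements:

```typescript
function checkAlerts(metrics: KBHealthMetrics): string[] {
  const alerts: string[] = [];
  if (metrics.accuracy_score < 90) alerts.push('Accuracy below 90% on test questions');
  if (metrics.coverage_score < 80) alerts.push('Coverage below 80% of user questions');
  if (metrics.freshness_score > 30) alerts.push('Average knowledge age above 30 days');
  if (metrics.consistency_score < 95) alerts.push('More than 5% of entries in conflict');
  return alerts;
}
```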
Success Metrics
Knowledge Quality
- Accuracy: >90% on test questions
- Coverage: >80% of user questions answered
- Freshness: <30 days average age
- Consistency: <5% conflicting information
User Satisfaction
- Relevance: >85% query results rated relevant
- Usefulness: >80% users find KB valuable
- Speed: <100ms median query time
Operational Health
- Uptime: >99.9%
- Update frequency: Weekly minimum
- Team engagement: Regular contributions
Common Pitfalls & Solutions
Pitfall 1: "Build it and they will come"
Problem: No user validation, KB doesn't meet needs
Solution: Start with user research, validate continuously
Pitfall 2: Perfectionism
Problem: Waiting to launch until KB is "perfect"
Solution: Launch with 80% coverage, iterate based on usage
Pitfall 3: Over-engineering
Problem: Building complex hybrid system when simple docs would work
Solution: Start simple, add complexity only when needed
Pitfall 4: Maintenance neglect
Problem: Build once, never update
Solution: Establish maintenance schedule from day 1
Quick Start Checklist
Before you start:
- Read this entire skill
- Review rag-implementer if using document KB
- Review knowledge-graph-builder if using entity KB
- Have clear use case and success metrics
Phase 1 - Architecture (Week 1):
- Inventory knowledge sources
- Choose KB type (document/entity/hybrid)
- Define schema/ontology
- Set up infrastructure
Phase 2 - Initial Build (Week 2-3):
- Ingest and curate initial knowledge
- Implement search/query functionality
- Create test question set
- Validate with users
Phase 3 - Iterate (Ongoing):
- Add more knowledge based on usage
- Monitor quality metrics
- Fix issues as discovered
- Establish maintenance cadence
Related Resources
- Skills: rag-implementer, knowledge-graph-builder, data-engineer, quality-auditor
- MCPs: vector-database-mcp, graph-database-mcp, knowledge-base-mcp, semantic-search-mcp
- Patterns: STANDARDS/architecture-patterns/rag-pattern.md, knowledge-base-pattern.md (coming soon)
- Integrations: INTEGRATIONS/pinecone/, INTEGRATIONS/graph-databases/neo4j/
Further Reading
- The Knowledge Graph Cookbook
- Building Knowledge Bases with LLMs
- RAG: Retrieval-Augmented Generation
- Knowledge Management Best Practices
Remember: A knowledge base is only as good as its curation. Invest in quality from day 1, establish maintenance processes, and iterate based on user feedback. The goal is not to have all knowledge—it's to have the right knowledge, well-organized, and easily accessible.