| name | data-schema-knowledge-modeling |
| description | Use when designing database schemas, need to model domain entities and relationships clearly, building knowledge graphs or ontologies, creating API data models, defining system boundaries and invariants, migrating between data models, establishing taxonomies or hierarchies, user mentions "schema", "data model", "entities", "relationships", "ontology", "knowledge graph", or when scattered/inconsistent data structures need formalization. |
Data Schema & Knowledge Modeling
Table of Contents
- Purpose
- When to Use
- What Is It
- Workflow
- Schema Types
- Common Patterns
- Guardrails
- Quick Reference
Purpose
Create rigorous, validated models of entities, relationships, and constraints that enable correct system implementation, knowledge representation, and semantic reasoning.
When to Use
Invoke this skill when you need to:
- Design a database schema (SQL, NoSQL, or graph) for a new application
- Model a complex domain with many entities and relationships
- Build a knowledge graph or ontology for semantic search and reasoning
- Define API data models and contracts
- Create taxonomies or classification hierarchies
- Establish data governance and canonical models
- Migrate legacy schemas to modern architectures
- Resolve ambiguity in domain concepts and relationships
- Enable data integration across systems
- Document system invariants and business rules
Common trigger phrases:
- "Design a schema for..."
- "Model the entities and relationships"
- "Create a knowledge graph"
- "What's the data model?"
- "Define the ontology"
- "How should we structure this data?"
- "Map relationships between..."
- "Design the API data model"
What Is It
Data schema & knowledge modeling is the process of formally defining:
- Entities - Things that exist (User, Product, Order, Organization)
- Attributes - Properties of entities (name, price, status, createdAt)
- Relationships - Connections between entities (User owns Order, Product belongsTo Category)
- Constraints - Rules and invariants (unique email, price > 0, one primary address)
- Cardinality - How many of each (one-to-many, many-to-many)
Quick example (e-commerce schema):
- Entities: User, Product, Order, Cart, Payment
- Relationships: User has many Orders, Order contains many Products (via OrderItems), User has one Cart
- Constraints: Email must be unique, Order total matches sum of OrderItems, Payment amount equals Order total
- Result: Unambiguous model that prevents data inconsistencies
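A minimal sketch of this model as SQL DDL (PostgreSQL syntax assumed; table and column names are illustrative, not prescribed by this skill):

```sql
-- Users: email uniqueness enforced at the schema level
CREATE TABLE users (
    id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email      TEXT NOT NULL UNIQUE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE products (
    id    BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name  TEXT NOT NULL,
    price NUMERIC(10,2) NOT NULL CHECK (price > 0)   -- invariant: price > 0
);

-- User has many Orders (one-to-many via foreign key)
CREATE TABLE orders (
    id      BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users(id)
);

-- Junction entity: Order contains many Products (many-to-many)
CREATE TABLE order_items (
    order_id   BIGINT NOT NULL REFERENCES orders(id),
    product_id BIGINT NOT NULL REFERENCES products(id),
    quantity   INT NOT NULL CHECK (quantity > 0),
    unit_price NUMERIC(10,2) NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
```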
Workflow
Copy this checklist and track your progress:
Data Schema & Knowledge Modeling Progress:
- [ ] Step 1: Gather domain requirements and scope
- [ ] Step 2: Identify entities and attributes
- [ ] Step 3: Define relationships and cardinality
- [ ] Step 4: Specify constraints and invariants
- [ ] Step 5: Validate and document the model
Step 1: Gather domain requirements and scope
Ask the user for a domain description, core use cases (what queries and operations must this support), existing data (if migrating or integrating), performance and scale requirements, and technology constraints (SQL vs NoSQL vs graph database). Understanding use cases shapes the model: OLTP, OLAP, and graph-traversal workloads require different designs. See Schema Types for guidance.
Step 2: Identify entities and attributes
Extract nouns from requirements (those are candidate entities). For each entity, list attributes with types and nullability. Use resources/template.md for systematic entity identification. Verify each entity represents a distinct concept with independent lifecycle. Document entity purpose and examples.
Step 3: Define relationships and cardinality
Map connections between entities (one-to-one, one-to-many, many-to-many). For many-to-many, identify junction tables/entities. Specify relationship directionality and optionality (can X exist without Y?). Use resources/methodology.md for complex relationship patterns like hierarchies, polymorphic associations, and temporal relationships.
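For example, a many-to-many relationship between Product and Category resolves into a junction table, and optionality shows up as nullability (a sketch continuing the hypothetical e-commerce tables above, PostgreSQL syntax):

```sql
-- Self-referencing one-to-many: a category may have a parent
CREATE TABLE categories (
    id        BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE,
    parent_id BIGINT REFERENCES categories(id)   -- nullable: roots have no parent
);

-- Junction table resolving the many-to-many Product <-> Category relationship
CREATE TABLE product_categories (
    product_id  BIGINT NOT NULL REFERENCES products(id),
    category_id BIGINT NOT NULL REFERENCES categories(id),
    PRIMARY KEY (product_id, category_id)        -- each pairing at most once
);
```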
Step 4: Specify constraints and invariants
Define uniqueness constraints, foreign key relationships, check constraints, and business rules. Document domain invariants (rules that must ALWAYS be true). Identify derived/computed attributes vs stored. Use resources/methodology.md for advanced constraint patterns and validation strategies.
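Most of these constraint kinds map directly to DDL, and a derived attribute can become a generated column rather than application logic. A sketch against the hypothetical e-commerce tables (assuming PostgreSQL 12+ for generated columns):

```sql
-- Check constraint: business rule on line items
ALTER TABLE order_items
    ADD CONSTRAINT positive_unit_price CHECK (unit_price > 0);

-- Derived attribute stored as a generated column instead of application code
ALTER TABLE order_items
    ADD COLUMN line_total NUMERIC(12,2)
        GENERATED ALWAYS AS (quantity * unit_price) STORED;

-- Invariant "one primary address per user" via a partial unique index
CREATE TABLE addresses (
    id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id    BIGINT NOT NULL REFERENCES users(id),
    is_primary BOOLEAN NOT NULL DEFAULT false
);
CREATE UNIQUE INDEX one_primary_address
    ON addresses (user_id) WHERE is_primary;
```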
Step 5: Validate and document the model
Create the data-schema-knowledge-modeling.md file with the complete schema definition. Validate it against the use cases: can the schema support the required queries and operations? Decide where to normalize (eliminate redundancy) and where to denormalize (optimize specific queries). Self-assess using resources/evaluators/rubric_data_schema_knowledge_modeling.json. Minimum standard: average score ≥ 3.5.
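One practical validation technique is to express each invariant as a query that must return zero rows, e.g. "Order total matches the sum of OrderItems" (assuming orders also carries a stored total column; names are illustrative):

```sql
-- Invariant: every order's stored total equals the sum of its line items.
-- A valid database keeps this query empty; any row it returns is a violation.
SELECT o.id
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
GROUP BY o.id, o.total
HAVING o.total <> SUM(oi.quantity * oi.unit_price);
```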
Schema Types
Choose based on use case and technology:
Relational (SQL) Schema
- Best for: Transactional systems (OLTP), strong consistency, complex queries with joins
- Pattern: Normalized tables, foreign keys, ACID transactions
- Example use cases: E-commerce orders, banking transactions, HR systems
- Key decision: Normalization level (3NF for consistency vs denormalized for read performance)
Document/NoSQL Schema
- Best for: Flexible/evolving structure, high write throughput, denormalized reads
- Pattern: Nested documents, embedded relationships, no joins
- Example use cases: Content management, user profiles, event logs
- Key decision: Embed vs reference (embed for 1-to-few, reference for 1-to-many)
Graph Schema (Ontology)
- Best for: Complex relationships, traversal queries, semantic reasoning, knowledge graphs
- Pattern: Nodes (entities), edges (relationships), properties on both
- Example use cases: Social networks, fraud detection, recommendation engines, scientific research
- Key decision: Property graph vs RDF triples
Event/Time-Series Schema
- Best for: Audit logs, metrics, IoT data, append-only data
- Pattern: Immutable events, time-based partitioning, aggregation tables
- Example use cases: User activity tracking, monitoring, financial transactions
- Key decision: Raw events vs pre-aggregated summaries
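A sketch of an append-only event table with time-based partitioning (PostgreSQL declarative partitioning; names are illustrative):

```sql
-- Append-only: rows are inserted, never updated or deleted
CREATE TABLE events (
    occurred_at TIMESTAMPTZ NOT NULL,
    event_type  TEXT NOT NULL,
    payload     JSONB NOT NULL
) PARTITION BY RANGE (occurred_at);

-- One partition per month keeps queries and retention policies cheap
CREATE TABLE events_2025_01 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
```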
Dimensional (Data Warehouse) Schema
- Best for: Analytics (OLAP), aggregations, historical reporting
- Pattern: Fact tables + dimension tables (star/snowflake schema)
- Example use cases: Business intelligence, sales analytics, customer 360
- Key decision: Star schema (denormalized) vs snowflake (normalized dimensions)
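A minimal star schema sketch with one fact table and two denormalized dimensions (illustrative names, PostgreSQL syntax):

```sql
-- Dimensions: descriptive attributes, denormalized (star, not snowflake)
CREATE TABLE dim_product (
    product_key BIGINT PRIMARY KEY,
    name        TEXT NOT NULL,
    category    TEXT NOT NULL          -- flattened into the dimension
);

CREATE TABLE dim_date (
    date_key  INT PRIMARY KEY,         -- e.g. 20250115
    full_date DATE NOT NULL,
    month     INT NOT NULL,
    year      INT NOT NULL
);

-- Fact: one row per sale; foreign keys to dimensions plus additive measures
CREATE TABLE fact_sales (
    product_key BIGINT NOT NULL REFERENCES dim_product(product_key),
    date_key    INT NOT NULL REFERENCES dim_date(date_key),
    quantity    INT NOT NULL,
    revenue    NUMERIC(12,2) NOT NULL
);
```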
Common Patterns
Pattern: Entity Lifecycle Modeling
Track entity state changes explicitly. Example: Order (draft → pending → confirmed → shipped → delivered → completed/cancelled). Include a status field, timestamps for each state, and a transitions table if history is needed.
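A sketch of this pattern (PostgreSQL syntax, continuing the hypothetical orders table from the earlier example):

```sql
-- Current state lives on the entity, constrained to the allowed values
ALTER TABLE orders
    ADD COLUMN status TEXT NOT NULL DEFAULT 'draft'
        CHECK (status IN ('draft', 'pending', 'confirmed', 'shipped',
                          'delivered', 'completed', 'cancelled'));

-- Optional transitions table when the full history matters
CREATE TABLE order_status_transitions (
    order_id    BIGINT NOT NULL REFERENCES orders(id),
    from_status TEXT,                              -- NULL for the initial state
    to_status   TEXT NOT NULL,
    changed_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);
```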
Pattern: Soft Deletes
Never physically delete records; add a deletedAt timestamp instead. This allows data recovery, audit compliance, and preserved referential integrity. Filter with WHERE deletedAt IS NULL in queries.
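A sketch (PostgreSQL syntax, reusing the hypothetical products table); note that uniqueness constraints usually need to be scoped to live rows:

```sql
ALTER TABLE products ADD COLUMN deleted_at TIMESTAMPTZ;   -- NULL means "live"

-- "Delete" becomes an update
UPDATE products SET deleted_at = now() WHERE id = 42;

-- Every read filters out soft-deleted rows
SELECT * FROM products WHERE deleted_at IS NULL;

-- Caveat: uniqueness should usually apply to live rows only
CREATE UNIQUE INDEX live_product_name
    ON products (name) WHERE deleted_at IS NULL;
```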
Pattern: Polymorphic Associations
An entity relates to multiple types. Example: a Comment can be on a Post or a Photo. Options: (1) a type discriminator plus an id column (commentableType + commentableId), (2) junction tables per type, (3) single table inheritance.
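A sketch of option (1), with its main trade-off called out (PostgreSQL syntax, hypothetical names):

```sql
-- Option (1): a type discriminator plus an id column
CREATE TABLE comments (
    id               BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    body             TEXT NOT NULL,
    commentable_type TEXT NOT NULL CHECK (commentable_type IN ('post', 'photo')),
    commentable_id   BIGINT NOT NULL
    -- Trade-off: no real foreign key can point at two tables at once, so
    -- referential integrity must be enforced by triggers or application code.
);
```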
Pattern: Temporal/Historical Data
Track changes over time. Options: (1) effective/expiry dates per record, (2) a separate history table, (3) event sourcing (store all changes as events). Choose based on query patterns.
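A sketch of option (1), effective/expiry dates, with the point-in-time query it enables (PostgreSQL syntax, hypothetical names):

```sql
-- Option (1): effective dating; the current row has valid_to = NULL
CREATE TABLE product_prices (
    product_id BIGINT NOT NULL REFERENCES products(id),
    price      NUMERIC(10,2) NOT NULL CHECK (price > 0),
    valid_from TIMESTAMPTZ NOT NULL,
    valid_to   TIMESTAMPTZ,                        -- NULL = still in effect
    CHECK (valid_to IS NULL OR valid_to > valid_from)
);

-- Point-in-time query: the price as of a given moment
SELECT price FROM product_prices
WHERE product_id = 42
  AND valid_from <= '2025-06-01'
  AND (valid_to IS NULL OR valid_to > '2025-06-01');
```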
Pattern: Multi-tenancy
Isolate data per customer. Options: (1) separate databases (strong isolation), (2) a shared schema with a tenantId column (efficient), (3) separate schemas in the same DB (a middle ground). If shared, add tenantId to all queries.
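A sketch of option (2), the shared schema (PostgreSQL syntax, hypothetical names):

```sql
-- Option (2): shared schema; every tenant-owned table carries tenant_id
CREATE TABLE tenants (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY
);

CREATE TABLE projects (
    tenant_id BIGINT NOT NULL REFERENCES tenants(id),
    id        BIGINT GENERATED ALWAYS AS IDENTITY,
    name      TEXT NOT NULL,
    PRIMARY KEY (tenant_id, id)   -- composite key scopes every row to a tenant
);

-- Every query must filter by tenant; forgetting this is the classic leak
SELECT * FROM projects WHERE tenant_id = 7;
```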
Pattern: Hierarchies
Model trees and nested structures. Options: (1) adjacency list (parentId), (2) nested sets (left/right values), (3) path enumeration (materialized path), (4) closure table (all ancestor-descendant pairs). Each trades off read performance against write performance.
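A sketch of option (1), the adjacency list, with the recursive query it implies (PostgreSQL syntax, hypothetical names):

```sql
-- Option (1): adjacency list; cheap writes, recursive reads
CREATE TABLE org_units (
    id        BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id BIGINT REFERENCES org_units(id)   -- NULL = root
);

-- Fetch an entire subtree with a recursive CTE
WITH RECURSIVE subtree AS (
    SELECT id, name, parent_id FROM org_units WHERE id = 1
    UNION ALL
    SELECT c.id, c.name, c.parent_id
    FROM org_units c
    JOIN subtree s ON c.parent_id = s.id
)
SELECT * FROM subtree;
```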
Guardrails
✓ Do:
- Start with use cases - schema serves queries/operations
- Normalize first, then denormalize for specific performance needs
- Document all constraints and invariants explicitly
- Use meaningful, consistent naming conventions
- Consider future evolution - design for extensibility
- Validate model against ALL required use cases
- Model the real world accurately (don't force fit to technology)
✗ Don't:
- Design schema in isolation from use cases
- Optimize prematurely (denormalizing before measuring)
- Skip constraint definitions (leads to data corruption)
- Use generic names (data, value, thing) - be specific
- Ignore cardinality and nullability
- Model implementation details in domain entities
- Forget about data migration path from existing systems
- Create circular dependencies between entities
Quick Reference
Resources:
- resources/template.md - Structured process for entity identification, relationship mapping, and constraint definition
- resources/methodology.md - Advanced patterns: temporal modeling, graph ontologies, schema evolution, normalization strategies
- resources/examples/ - Worked examples showing complete schema designs with validation
- resources/evaluators/rubric_data_schema_knowledge_modeling.json - Quality assessment before delivery
When to choose which resource:
- Simple domain (< 10 entities) → Start with template
- Complex domain or graph/ontology → Study methodology for advanced patterns
- Need to see examples → Review examples folder
- Before delivering to user → Always validate with rubric
Expected deliverable:
data-schema-knowledge-modeling.md file containing: domain description, complete entity definitions with attributes and types, relationship mappings with cardinality, constraint specifications, diagram (ERD/graph visualization), validation against use cases, and implementation notes.
Common schema notations:
- ERD (Entity-Relationship Diagram): Visual representation of entities and relationships
- UML Class Diagram: Object-oriented view with inheritance and associations
- Graph Diagram: Nodes and edges for graph databases
- JSON Schema: API/document structure with validation rules
- SQL DDL: Executable CREATE TABLE statements
- Ontology (OWL/RDF): Semantic web knowledge representation