| name | data-systems-architecture |
| description | Use when designing databases for data-heavy applications, making schema decisions for performance, choosing between normalization and denormalization, selecting storage/indexing strategies, planning for scale, or evaluating OLTP vs OLAP trade-offs. Also use when encountering N+1 queries, ORM issues, or concurrency problems. |
Data Systems Architecture
Overview
Core principle: Good data system architecture balances reliability (correct operation under faults), scalability (handling growth gracefully), and maintainability (enabling productive change over time). Every architectural decision involves trade-offs between these concerns.
This skill synthesizes knowledge from three foundational texts:
- Designing Data-Intensive Applications (Kleppmann) - distributed systems, storage engines, scaling
- The Art of PostgreSQL (Fontaine) - PostgreSQL-specific patterns, SQL as programming
- PostgreSQL Query Optimization (Dombrovskaya et al.) - execution plans, performance tuning
When to Use
| Symptom |
Start With |
| Designing a new database/schema |
01-foundational-principles.md |
| Normalization vs denormalization decisions |
02-data-modeling.md |
| Need to understand OLTP vs OLAP |
03-storage-engines.md |
| Slow queries, index selection |
04-indexing.md |
| Planning for growth, read replicas |
05-scaling-patterns.md |
| Race conditions, deadlocks, isolation issues |
06-transactions-concurrency.md |
| N+1 queries, ORM problems, application integration |
07-application-integration.md |
Navigation
Reference Files (Load as needed)
01-foundational-principles.md - Reliability/Scalability/Maintainability, load parameters
02-data-modeling.md - Normalization, denormalization, schema design patterns
03-storage-engines.md - B-trees, LSM-trees, OLTP vs OLAP, PostgreSQL internals
04-indexing.md - Index types, compound indexes, covering indexes, maintenance
05-scaling-patterns.md - Replication, partitioning, sharding strategies
06-transactions-concurrency.md - ACID, isolation levels, MVCC, locking patterns
07-application-integration.md - ORM pitfalls, N+1, business logic placement, batch processing
Quick Decision Framework
New system design?
├─ Yes → Read 01, then 02 for data model
└─ No → What's the problem?
├─ "Queries are slow" → Read 04 (indexing) + 03 (storage patterns)
├─ "Data is inconsistent" → Read 02 (modeling) + 06 (transactions)
├─ "Can't handle the load" → Read 05 (scaling) + 03 (OLTP vs OLAP)
├─ "App makes too many queries" → Read 07 (N+1, ORM patterns)
└─ "Race conditions/deadlocks" → Read 06 (concurrency)
Core Concepts (Quick Reference)
The Three Pillars
| Concern |
Definition |
Key Question |
| Reliability |
System works correctly under faults |
What happens when things fail? |
| Scalability |
Handles growth gracefully |
What's 10x load look like? |
| Maintainability |
Easy to operate and evolve |
Can new engineers understand this? |
Data Model Selection
| Model |
Best For |
Avoid When |
| Relational |
Many-to-many relationships, joins, consistency |
Highly hierarchical data, constant schema changes |
| Document |
Self-contained docs, tree structures |
Need for joins, many-to-many |
| Graph |
Highly connected data, recursive queries |
Simple CRUD, no relationship traversal |
OLTP vs OLAP
| Aspect |
OLTP |
OLAP |
| Query pattern |
Point lookups, few rows |
Aggregates, many rows |
| Optimization |
Index everything used in WHERE |
Fewer indexes, full scans OK |
| Storage |
Row-oriented |
Consider column-oriented |
Index Type Quick Reference
| Type |
Use Case |
PostgreSQL |
| B-tree |
Equality, range, sorting |
Default, most queries |
| Hash |
Equality only |
Faster for exact match |
| GIN |
Arrays, JSONB, full-text |
@>, @@ operators |
| GiST |
Geometric, range types |
PostGIS, nearest-neighbor |
| BRIN |
Large, naturally ordered tables |
Time-series data |
Isolation Levels
| Level |
Prevents |
PostgreSQL Default? |
| Read Committed |
Dirty reads |
Yes |
| Repeatable Read |
+ Non-repeatable reads |
No |
| Serializable |
All anomalies |
No (uses SSI) |
Design Checklist
Before finalizing a data architecture:
References
- Kleppmann, M. Designing Data-Intensive Applications (O'Reilly, 2017)
- Fontaine, D. The Art of PostgreSQL (2nd ed., 2020)
- Dombrovskaya, H., Novikov, B., Bailliekova, A. PostgreSQL Query Optimization (Apress, 2021)