data-model-design
SKILL.md
Star Schema as Default
For analytics data products, default to star schema with strategic denormalization:
Fact tables contain events at the lowest useful grain:
- One row per event (transaction, visit, measurement, interaction)
- Foreign keys to dimension tables
- Numeric measures (amount, count, duration)
- Timestamps at the grain of the analysis
Dimension tables contain context:
- Descriptive attributes for filtering and grouping
- Human-readable labels alongside codes
- Hierarchies for drill-down (region -> state -> city)
NEVER fully denormalize into One Big Table. Many-to-many relationships cause exponential row growth. A patient with 10 conditions and 5 medications creates 50 rows instead of 15.
ALWAYS start with the query patterns. What questions will consumers ask? Design the schema to make those queries simple. If 80% of queries filter by date and group by category, those should be the primary dimensions.