Data Warehouse Design is the process of structuring data in a way that supports fast reporting, analytics, and business intelligence.
A data warehouse is optimized for analysis, not for day-to-day transactions.
What is a Data Warehouse?
A Data Warehouse is a centralized repository that stores:
- Historical data
- Structured data
- Integrated data from multiple sources
It is used for:
- Reporting
- Dashboarding
- Business analysis
- Decision-making
Popular cloud warehouses include:
- Amazon Redshift
- Google BigQuery
- Snowflake
Key Goals of Data Warehouse Design
- Fast query performance
- Easy reporting
- Scalable architecture
- Clean data structure
- Business-friendly schema
OLTP vs OLAP
OLTP (Online Transaction Processing):
- Used for daily operations
- Normalized structure
- Fast inserts/updates
OLAP (Online Analytical Processing):
- Used for analytics
- Denormalized structure
- Fast reads and aggregations
Data warehouses are designed for OLAP.
Core Concepts in Data Warehouse Design
1. Fact Tables
Fact tables store:
- Quantitative data (measures)
- Numeric metrics
Examples:
- Sales amount
- Quantity sold
- Revenue
- Profit
Fact tables usually contain:
- Foreign keys
- Measures
2. Dimension Tables
Dimension tables store:
- Descriptive attributes
- Context for facts
Examples:
- Customer
- Product
- Date
- Region
Dimensions help answer questions like:
- Who?
- What?
- When?
- Where?
Schema Types
1. Star Schema
- One fact table
- Multiple dimension tables
- Simple and fast queries
Most commonly used design.
2. Snowflake Schema
- Normalized dimensions
- More complex structure
- Saves storage but slower queries
Example: Sales Data Warehouse Design
Fact Table:
Fact_Sales
- order_id
- product_id
- customer_id
- date_id
- quantity
- revenue
Dimension Tables:
Dim_Product
Dim_Customer
Dim_Date
Dim_Region
This structure enables easy analytics.
Data Warehouse Layers
Typical architecture:
Data Sources
↓
Staging Layer
↓
Transformation Layer
↓
Data Warehouse
↓
BI Tools
Slowly Changing Dimensions (SCD)
When dimension data changes over time.
Common types:
- Type 1: Overwrite old data
- Type 2: Keep history (add new row)
- Type 3: Store previous value in new column
Type 2 is most common in enterprise systems.
Best Practices
- Use surrogate keys
- Keep fact tables narrow and tall
- Use indexing and partitioning
- Maintain consistent naming conventions
- Separate staging and production layers
- Document data definitions
Performance Optimization
- Partition large tables
- Use clustering
- Pre-aggregate heavy calculations
- Avoid unnecessary joins
Real-World Use Case
E-commerce Data Warehouse:
Business Questions:
- Total sales by month
- Top products by region
- Customer lifetime value
- Profit by category
Well-designed warehouse answers these quickly.
Interview Answer (Short Version)
Data Warehouse Design is the process of structuring analytical data using fact and dimension tables, typically in a star or snowflake schema, to support fast and scalable reporting and business intelligence.
Final Summary
Data Warehouse Design focuses on:
- Fact and dimension modeling
- Analytical performance
- Scalability
- Clean business structure
- Historical data management
It is a core skill for data engineers and BI professionals building enterprise reporting systems.