
Star, snowflake, and galaxy schemas are three common data warehouse schema designs used for organizing and structuring data for efficient querying and reporting.
1. Star Schema:
- Description: In a star schema, a central fact table is connected to multiple dimension tables through foreign key relationships. The fact table contains quantitative data (measurements or metrics), while the dimension tables store descriptive attributes that provide context to the data in the fact table.
- Example:
- Fact Table: Sales
- Columns: OrderID, ProductID, CustomerID, DateID, Quantity, Revenue
- Dimension Tables:
- Products
- Columns: ProductID, ProductName, Category, Brand
- Customers
- Columns: CustomerID, CustomerName, Address, Phone
- Dates
- Columns: DateID, Date, DayOfWeek, Month, Year
- Products
- Fact Table: Sales
2. Snowflake Schema:
- Description: The snowflake schema is an extension of the star schema. In a snowflake schema, dimension tables are normalized, meaning that they are organized into multiple related tables. This results in a structure that resembles a snowflake when visualized.
- Example:
- Fact Table: Sales
- Columns: OrderID, ProductID, CustomerID, DateID, Quantity, Revenue
- Dimension Tables:
- Products
- Columns: ProductID, ProductName
- ProductDetails
- Columns: ProductID, Category, Brand
- Customers
- Columns: CustomerID, CustomerName
- CustomerDetails
- Columns: CustomerID, Address, Phone
- Dates
- Columns: DateID, Date, DayOfWeek, Month, Year
- Products
- Fact Table: Sales
3. Galaxy Schema (Constellation Schema):
- Description: The galaxy schema is a hybrid design that combines elements of both the star and snowflake schemas. It allows for more flexibility and complexity by connecting multiple fact tables to multiple dimension tables. This can be useful when dealing with multiple business processes or areas.
- Example:
- Fact Table 1: Sales
- Columns: OrderID, ProductID, CustomerID, DateID, Quantity, Revenue
- Fact Table 2: Inventory
- Columns: ProductID, WarehouseID, StockLevel, LastStockUpdate
- Dimension Tables:
- Products
- Columns: ProductID, ProductName, Category, Brand
- Customers
- Columns: CustomerID, CustomerName, Address, Phone
- Dates
- Columns: DateID, Date, DayOfWeek, Month, Year
- Warehouses
- Columns: WarehouseID, WarehouseName, Location
- Products
- Fact Table 1: Sales
Conclusion:
Each schema design has its own advantages and considerations. The star schema is often preferred for its simplicity and ease of use in query performance, while the snowflake schema offers normalization benefits at the cost of increased complexity. The galaxy schema provides a balance between the two, allowing for more intricate data relationships. The choice of schema depends on the specific requirements and characteristics of the data warehouse environment.
For more detailed explanation refer to below video:
The Datapedia, sponsored by “The Data Channel”,
Leave a comment