Introduction to Data warehouse Schema
The Data Warehouse Schema is a structure that rationally defines the contents of the Data Warehouse, by facilitating the operations performed on the Data Warehouse and the maintenance activities of the Data Warehouse system, which usually includes the detailed description of the databases, tables, views, indexes, and the Data, that are regularly structured using predefined design types such as Star Schema, Snowflake Schema, Galaxy Schema (also known as Fact Constellation Schema).
A schema is a logical description that describes the entire database. In the data warehouse there includes the name and description of records. It has all data items and also different aggregates associated with the data. Like a database has a schema, it is required to maintain a schema for a data warehouse as well. There are different schemas based on the setup and data which are maintained in a data warehouse.
Types of Data Warehouse Schema
Following are the three major types of schemas:
- Star Schema
- Snowflake Schema
- Galaxy Schema
There are fact tables and dimension tables that form the basis of any schema in the data warehouse that are important to be understood. The fact tables should have data corresponding data to any business process. Every row represents any event that can be associated with any process. It stores quantitative information for analysis. A dimension table stores data about how the data in fact table is being analyzed. They facilitate the fact table in gathering different dimensions on the measures which are to be taken.
Let us have a look at all these in detail.
1. Star Schema
Here are some of the basic points of star schema which are as follows:
- In a star schema, as the structure of a star, there is one fact table in the middle and a number of associated dimension tables. This structure resembles a star and hence it is known as a star schema.
- The fact table here consists of primary information in the data warehouse. It surrounds the smaller dimension lookup tables which will have details for different fact tables. The primary key which is present in each dimension is related to a foreign key which is present in the fact table.
- This infers that fact table has two types of columns having foreign keys to dimension tables and measures which contain numeric facts. At the center of the star, there is a fact table and the points of the star are the dimension tables.
- The fact tables are in 3NF form and the dimension tables are in denormalized form. Every dimension in star schema should be represented by the only one-dimensional table. The dimension table should be joined to a fact table. The fact table should have a key and measure.
2. Snowflake Schema
Here are some of the basic points of snowflake schema which are as follows:
- Snowflake schema acts like an extended version of a star schema. There are additional dimensions added to Star schema. This schema is known as snowflake due to its structure.
- In this schema, the centralized fact table will be connected to different multiple dimensions. The dimensions present are in normalized form from the multiple related tables which are present. The snowflake structure is detailed and structured when compared to star schema.
- There are multiple levels of relationships and child tables involved that have multiple parent tables. In snowflake schema, the affected tables are only the dimension tables and not the fact tables.
- The difference between star and snowflake schema is that the dimensions of snowflake schema are maintained in such a way that they reduce the redundancy of data. The tables are easy to manage and maintain. They also save storage space.
- However, due to this, it is needed to have more joins in the query in order to execute the query. The further expansion of the tables leads to snowflaking. When a dimension table has a low cardinality attribute of dimensions then it is said to be snowflaked.
- The dimension tables have been divided into segregated normalized tables. Once they are segregated they are further joined with the original dimension table which has a referential constraint. This schema may hamper the performance as the number of tables that are required are more so that the joins are satisfied.
- The advantage of snowflake schema is that it uses small disk space. The implementation of dimensions is easy when they are added to this schema. The same set of attributes are published by different sources.
3. Fact Constellation Schema or Galaxy Schema
Here are some of the basic points of fact constellation schema which are as follows:
- A fact constellation can consist of multiple fact tables. These are more than two tables that share the same dimension tables. This schema is also known as galaxy schema.
- It is viewed as a collection of stars and hence the name galaxy. The shared dimensions in this schema are known as conformed dimensions. The dimensions in this schema are separated into segregated dimensions which are having different levels of hierarchy.
- As an example, we can consider the four levels of hierarchy taking geography into consideration as region, country, state, and city. This galaxy schema has four dimensions. Another way of creating a galaxy schema is by splitting one-star schema into more star schemas.
- The dimensions created as large and built on the basis of hierarchy. This schema is useful when aggregation of fact tables is necessary. Fact constellations are considered to be more complex than star and snowflake schemas. These are considered to be more flexible but hard to implement and maintain.
- This type of schema is usually used for sophisticated applications. The multiple number of tables present in this schema makes it difficult and complex. Implementing this schema is hence difficult. The architecture is thus more complex when compared to star and snowflake schema.
Like the databases have relational schemas where all data is saved and maintained in the form of schemas, the data warehouse also uses the same concept to maintain the data. Having schemas makes it easier to maintain the data. There are three main types of schemas as discussed above. They mainly operate on fact tables and dimension tables. The star schema is the easiest of all schemas. It consists of one fact table surrounded by multiple dimension tables. The snowflake schema has multiple dimension tables that are in normalized form. The last type consists of multiple fact tables. All three schemas segregate data and help in filtering and managing data in an efficient way. These schemas thus play a major role in setting up any environment. The performance of queries can also be enhanced by using these schemas.
This is a guide to Data Warehouse Schema. Here we discuss the different types of data warehouse schema such as star, snowflake, and fact constellation schema in detail. You may also look at the following articles to learn more-