Updated March 13, 2023
Introduction to Data Warehouse Components
In the article Data Warehouse Components Data warehouse (abbreviated as DWH) is the process of capturing and maintaining business data in Relational Database for Business Intelligence (BI) practices. DWH databases are efficiently designed to store large amounts of historical business transactions for data analysis and to support Business decisions. Moreover, DWH databases are architected and optimized for query processing and multi-dimensional analysis which helps businesses with comparative study, historical data pattern analyzes, and improved decision making.
There are several related industry terms in the BI world, such as
- Decision Support system(DSS)
- Business Information system
- Historical database
- Enterprise Data warehouse (EDW)
Types of Database
There are two types of databases:
1. OLTP(Online Transnational Processing)
- These types of databases are designed for real-time business operations or transactions.
- Optimized for single record insert or retrieve at a time from database table and business validation rules are applied on these tables.
- These databases generally support a very high number of concurrent user interactions to support regular business activities.
- Examples of an OLTP will be processing and string an accounts receivable transaction or an invoice.
2. OLAP (Online Analytical Processing)
- These databases are designed for reporting and analysis of business data.
- Optimized for Bulk load and complex query processing to support reporting needs on large volumes data set.
- The OLAP databases are configured for specific users who are part of the Business Intelligence or Data Analysis department.
- Examples of OLAP would be marinating sales data or order returned data for the business.
The basic DWH architecture consists of the following four major components
- Source systems
- ETL tools
- DWH databases
- BI Tools
1. Source Systems
The source systems are generally business applications like ERP and CRM system, which captures the transaction data from the front ad applications. The transaction data are stored mainly in relational databases or files. ETL tools connect these sources through the connectors and process these data.
2. ETL Tools
There is a popular concept associated with DWH which is called ETL(Extract, Transform & Load) which performs the Data movement from Sources systems or Databases or Files to DWH database. There are several industry demand enterprise ETL tools that are popular like Informatica power center, IBM Data Stage, Ab Initio, Alteryx, Oracle Data Integrator, and Open source tools like Talend.
3. DWH Databases
DWH is RDBMS systems that are generally created using popular database vendors like Oracle, SQL Server, or DB2. Also by the popularity of the cloud, there are several offerings from cloud vendors like AWS which provides Amazon Redshift database, or Microsoft Azure SQL Data warehouse solution are growing in demand.
4. BI Tools
Similarly, There are several BI tools that consume EDW data for BI Reports and charts creation which is referred by Business users to make data-driven business decisions. Some of the BI or Visualization tools are Tableau, Microsoft Power BI, SAP Crystal Reports, Quick View. Some organizations prefer using MS Excel also for basic Data Analysis tasks.
Also, there are several best practices that enhance the basic architecture of DWH, by introducing staging Tables and Datamart.
5. Staging table
These generally work as pre-process data tables which are useful to develop and maintain efficient ETL pipelines.
6. Data Mart
Datamart is a practice pertaining to a single subject-specific or focused business unit data. Whereas DWH is a broader scope of Data which is a superset of the data mart.
Examples: Revenue department data analysis or Sales Data analysis of a particular business unit. whereas DWH example would be a financial analysis of the organization.
Implementation Methodology – Data Warehouse Components
Based on the Datamart approach, there are two types of implementations methods or designs that are being used.
1. Inmon Methodology
- This is also known as the Top-down approach or Enterprise wise approach.
- It involves first creating DWH and subsequently building multiple Data marts.
2. Kimball Methodology
- This is also known as the Bottom-up approach or Data Mart approach.
- It involves first creating multiple data marts and subsequently building the DWH.
Next, we will discuss the data modeling part of the DWH which is the design principle for managing analysis-ready data in a larger volume. There are two popular types of schema which are being used.
- Star Schema
- Snowflake Schema
Before elaborating on these schema let’s understand a few important tables type based upon the data it captures, which are the components of the schema design.
- These are the type of table which stores the measures or facts.
- These tables include numeric measure columns and foreign keys to other dimension tables.
- Example – Sales Data fact table which has Sales in USD and timestamp columns.
- It’s possible to have a fact table without actual fact columns, these are called factless fact tables.
- Example-In an educational system DWH for students, teachers, and course data analysis.
- Also, there are different types of facts
- Additive fact –these are the fact that can be summed up through all the dimensions in the fact table
- Nonadditive fact-These are the facts that cannot be summed up with any of the dimensions in the fact table Example- Height, width, percentage.
- Semi additive fact-these are the facts that are summed up with some of the dimensions in the Fact table
- These are the type of table which stores the descriptive or textual value of the business attributes and most of the data are static in nature.
- These tables are joined to the fact table through the foreign key reference.
- Example – Product Dimension table which contains the product name, product category, product subcategory.
- Also, there is a specific type of dimension table is being used, which is called Confirmed Dimension Table.
- These are the dimension table which is shared by more than one data mart or more than one fact table.
- Example: buying a product and return the product.
- The fact table is surrounded by a dimension table in a star-like structure called a star schema. These dimensions are generally de-normalized for DWH.
- This schema supports the ease of developing reports and efferent summarization of queries.
- Following is a diagrammatical representation of Star schema for Sales Data analysis.
- The fact table is surrounded by de-normalized dimension tables and if anyone or more of the dimension tables are subdivided into another dimension table then it is called snowflake schema.
- This schema supports further drill down of reports by retrieving the related data from the dimension table.
- Example: Sales Data DWH with Snowflake schema where customer dimensions are subdivided further to countries dimension.
- Following is a diagrammatical representation of Snowflake schema for sales data analysis.
In an organization Data modelers or Data Architect are the professionals who develop these types of schemas for Multi-dimensional DWH. And they generally follow these three stages of processes for its designing.
- Conceptual Modelling – the conceptual level of the data model or schema.
- Logical Modelling – the logical level of the data model or schema.
- Physical Modelling – database-specific data model or schema.
Conclusion – Data Warehouse Components
The Data warehouse is in practice in the industry for a very long. It supports BI commonly through Batch processing or processes historical data for analysis. There are several advancements happening in the data warehouse for real-time data analysis or capturing and processing heterogeneous structures of data. The concept and technologies related to Data Lake is the latest trend in the industry which is an advancement of enterprise data warehouse concepts.
This has been a guide to the Data Warehouse Components. Here we also discuss their types, architecture, implementation methodology, and data modeling. You may also have a look at the following articles to learn more–