Data Warehouse Architecture
The Three-Tier Data Warehouse Architecture is the commonly used Data Warehouse design in order to build a Data Warehouse by including the required Data Warehouse Schema Model, the required OLAP server type, and the required front-end tools for Reporting or Analysis purposes, which as the name suggests contains three tiers such as Top tier, Bottom Tier and the Middle Tier that are procedurally linked with one another from Bottom tier(data sources) through Middle tier(OLAP servers) to the Top tier(Front-end tools).
Data Warehouse Architecture is the design based on which a Data Warehouse is built, to accommodate the desired type of Data Warehouse Schema, user interface application and database management system, for data organization and repository structure. The type of Architecture is chosen based on the requirement provided by the project team. Three-tier Data Warehouse Architecture is the commonly used choice, due to its detailing in the structure. The three different tiers here are termed as:
Each Tier can have different components based on the prerequisites presented by the decision-makers of the project but are subject to the novelty of their respective tier.
Three-Tier Data Warehouse Architecture
Here is a pictorial representation for the Three-Tier Data Warehouse Architecture
1. Bottom Tier
The Bottom Tier in the three-tier architecture of a data warehouse consists of the Data Repository. Data Repository is the storage space for the data extracted from various data sources, which undergoes a series of activities as a part of the ETL process. ETL stands for Extract, Transform and Load. As a preliminary process, before the data is loaded into the repository, all the data relevant and required are identified from several sources of the system. These data are then cleaned up, to avoid repeating or junk data from its current storage units. The next step is to transform all these data into a single format of storage. The final step of ETL is to Load the data on the repository. Few commonly used ETL tools are:
- Microsoft SSIS
- Apache Kafka
- Ab Initio
- IBM Infosphere
The storage type of the repository can be a relational database management system or a multidimensional database management system. A relational database system can hold simple relational data, whereas a multidimensional database system can hold data that more than one dimension. Whenever the Repository includes both relational and multidimensional database management systems, there exists a metadata unit. As the name suggests, the metadata unit consists of all the metadata fetched from both the relational database and multidimensional database systems. This Metadata unit provides incoming data to the next tier, that is, the middle tier. From the user’s standpoint, the data from the bottom tier can be accessed only with the use of SQL queries. The complexity of the queries depends on the type of database. Data from the relational database system can be retrieved using simple queries, whereas the multidimensional database system demands complex queries with multiple joins and conditional statements.
2. Middle Tier
The Middle tier here is the tier with the OLAP servers. The Data Warehouse can have more than one OLAP server, and it can have more than one type of OLAP server model as well, which depends on the volume of the data to be processed and the type of data held in the bottom tier. There are three types of OLAP server models, such as:
- Relational online analytical processing is a model of online analytical processing which carries out an active multidimensional breakdown of data stored in a relational database, instead of redesigning a relational database into a multidimensional database.
- This is applied when the repository consists of only the relational database system in it.
- Multidimensional online analytical processing is another model of online analytical processing that catalogs and comprises of directories directly on its multidimensional database system.
- This is applied when the repository consists of only the multidimensional database system in it.
- Hybrid online analytical processing is a hybrid of both relational and multidimensional online analytical processing models.
- When the repository contains both the relational database management system and the multidimensional database management system, HOLAP is the best solution for a smooth functional flow between the database systems. HOLAP allows storing data in both the relational and the multidimensional formats.
The Middle Tier acts as an intermediary component between the top tier and the data repository, that is, the top tier and the bottom tier respectively. From the user’s standpoint, the middle tier gives an idea about the conceptual outlook of the database.
3. Top Tier
The Top Tier is a front-end layer, that is, the user interface that allows the user to connect with the database systems. This user interface is usually a tool or an API call, which is used to fetch the required data for Reporting, Analysis, and Data Mining purposes. The type of tool depends purely on the form of outcome expected. It could be a Reporting tool, an Analysis tool, a Query tool or a Data mining tool.
It is essential that the Top Tier should be uncomplicated in terms of usability. Only user-friendly tools can give effective outcomes. Even when the bottom tier and middle tier are designed with at most cautiousness and clarity, if the Top tier is enabled with a bungling front-end tool, then the whole Data Warehouse Architecture can become an utter failure. This makes the selection of the user interface/ front-end tool as the Top Tier, which will serve as the face of the Data Warehouse system, a very significant part of the Three-Tier Data Warehouse Architecture designing process.
Below are the few commonly used Top Tier tools.
- IBM Cognos
- Microsoft BI Platform
- SAP Business Objects Web
- Crystal Reports
- SAP BW
- SAS Business Intelligence
To sum up, the processes involved in the Three Tier Architecture are ETL, querying, OLAP and the results produced in the Top Tier of this three-tier system. The front-end activities such as reporting, analytical results or data-mining are also a part of the process flow of the Data Warehouse system. The end result produced in the top tier is used for business decision making. Hence the quality and efficiency that can grant are palpable. It is also dependent on the competence of the other two tiers. This Three Tier Data Warehouse Architecture helps in achieving the excellence and worthiness that is expected out of a Data Warehouse system.
This is a guide to Three Tier Data Warehouse Architecture. Here we discuss the Introduction and the three tier data warehouse architecture which includes top, middle, and bottom tier. You may also have a look at the following articles to learn more –