Introduction to Data Mart
A data mart is a pattern used in data warehouse environments to deliver data to a specific group of users. It is a structure oriented to a single business domain or team, such as sales or finance. An organization can have several data marts, and they are often fed from the central data warehouse. There are three types of data mart: dependent, independent and hybrid. Dependent data marts take data that is already prepared in the data warehouse, independent data marts take data directly from operational or external sources, and hybrid data marts take data from both. Dependent data marts can be seen as logical subsets of the data warehouse.
Data Mart vs Data Warehouse:
- A data warehouse is a collection of data gathered from multiple sources and covering many subjects. Maintenance and control, such as collecting and processing the raw data, are mainly handled by corporate Information Technology (IT) groups, which provide these services to the parent organization.
- The data warehouse is also referred to as a central or enterprise data warehouse. A data warehouse therefore has many sources, whereas a data mart covers one subject and, in some cases, is simply a subset of the data warehouse.
Types of Data Mart
There are typically three types:
1. Dependent Data Mart
- A dependent data mart is sourced purely from the central data warehouse (DW), so it is a subset of it. Grouped together, the dependent data marts make up the enterprise data warehouse.
- Since clean and summarised data is already present in the central data warehouse, the ETT (Extraction, Transformation and Transportation) process is simplified. We just need to identify the relevant subset and perform ETT on it. These data marts are typically built to achieve better availability, improved performance, and better control and efficiency.
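To make "identify the subset and perform ETT on it" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`dw_sales`, `mart_marketing_sales`), the columns and the sample rows are all hypothetical, and a single in-memory database stands in for the central warehouse; a real warehouse would normally be a separate system.

```python
import sqlite3

# Hypothetical central warehouse: one in-memory database with a sales table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dw_sales "
    "(region TEXT, department TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO dw_sales VALUES (?, ?, ?, ?)",
    [
        ("EU", "marketing", "2023-01-15", 120.0),
        ("EU", "finance", "2023-01-20", 300.0),
        ("US", "marketing", "2023-02-03", 80.0),
        ("US", "marketing", "2023-02-10", 45.5),
    ],
)

# Dependent mart: keep only the marketing subset and summarise it per region.
# The warehouse data is already clean, so no extra cleansing step is needed.
conn.execute(
    """
    CREATE TABLE mart_marketing_sales AS
    SELECT region, COUNT(*) AS n_sales, SUM(amount) AS total_amount
    FROM dw_sales
    WHERE department = 'marketing'
    GROUP BY region
    """
)

mart_rows = conn.execute(
    "SELECT region, n_sales, total_amount "
    "FROM mart_marketing_sales ORDER BY region"
).fetchall()
print(mart_rows)
```

The mart holds one row per region instead of every sale, which is where the smaller data volume and faster queries come from.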
2. Independent Data Mart
- An independent data mart is not created from the central data warehouse. Its data comes directly from other sources, such as operational systems or external providers. Since the data does not come from the central DW, the ETT process differs: the data has to be extracted, cleaned and transformed from the raw sources rather than selected from already prepared data.
- Independent data marts are mostly used by smaller groups within an organization, and the number of sources is limited. An independent data mart is generally created when a solution is needed in a relatively short time frame.
3. Hybrid Data Mart
- A hybrid data mart lets you combine data from the central data warehouse (DW) with data from other sources.
- This is especially useful for ad hoc integration, for example when new products or groups are added to the organization from outside.
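A hybrid load can be sketched as two inserts into the same mart table, one from the warehouse and one from an outside feed. This is only an illustration: `dw_products`, `mart_products`, the `origin` column, the `build_hybrid_mart` helper and the CSV contents are assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical external feed, e.g. products that arrived through an acquisition.
EXTERNAL_CSV = "product_id,name\nX-1,Acquired Widget\nX-2,Acquired Gadget\n"


def build_hybrid_mart(conn, external_csv):
    """Fill mart_products from the warehouse table and from an external CSV."""
    conn.execute(
        "CREATE TABLE mart_products (product_id TEXT, name TEXT, origin TEXT)"
    )
    # Source 1: rows already cleaned and stored in the central warehouse.
    conn.execute(
        "INSERT INTO mart_products "
        "SELECT product_id, name, 'warehouse' FROM dw_products"
    )
    # Source 2: rows read directly from the external file.
    reader = csv.DictReader(io.StringIO(external_csv))
    conn.executemany(
        "INSERT INTO mart_products VALUES (?, ?, 'external')",
        [(row["product_id"], row["name"]) for row in reader],
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_products (product_id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO dw_products VALUES (?, ?)",
    [("P-1", "Core Widget"), ("P-2", "Core Gadget")],
)
build_hybrid_mart(conn, EXTERNAL_CSV)

origin_counts = conn.execute(
    "SELECT origin, COUNT(*) FROM mart_products GROUP BY origin ORDER BY origin"
).fetchall()
print(origin_counts)
```

Recording the origin of each row is a design choice of this sketch; it keeps warehouse data and externally sourced data distinguishable after they are merged.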
Features of Data Mart
Below are some of the features:
- Because the data is focused on one subject, user response time improves.
- For frequently required data, a data mart is beneficial because it is a subset of the central DW, so the data size is smaller.
- Since the volume of data is limited, processing time is considerably lower than in a central DW.
- Data marts are agile and can accommodate changes in the model more quickly and efficiently than a data warehouse.
- A data mart needs an expert in only one subject, whereas a data warehouse needs expertise across multiple subjects. This is another reason data marts are considered more agile.
- Access can be segregated at a fine-grained level with partitioned data, and this is much easier with a data mart.
- Infrastructure dependency is limited, and after segmentation the data can be stored on different hardware platforms.
Steps to Implement Data Mart
Below are the steps that are required to implement it:
1. Designing
- This is the first step, in which all the required tasks and sources are identified and the technical and business information is gathered. A logical plan is then drawn up and, after review, converted into a physical plan. The logical and physical structure of the data is also decided here, for example how to partition the data and which field to partition on, such as a date or another field.
2. Constructing
- This is the second phase, in which the physical database is created in the RDBMS chosen during design, following the logical structures. All the objects such as schemas, tables, indexes and views are created.
3. Populating
- This is the third phase, in which the data is sourced and loaded into the data mart. All the required transformations are applied before the data is loaded.
4. Accessing
- In this step the populated data is queried to create reports. End users use this step to explore and understand the data through queries.
5. Managing
- This is the last stage of implementation. It covers tasks such as access management, system optimization and tuning, adding fresh data to the data mart, and planning recovery scenarios to handle failures.
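The steps above can be walked through in a small sketch with Python's built-in `sqlite3` module. Everything named here is hypothetical: the `mart_sales` table, the choice of month as the partition-like field, the sample rows and the helper functions. Access management is left out because SQLite has no user permissions; the managing step is represented only by a backup copy.

```python
import sqlite3
from decimal import Decimal

# Design: the mart stores one row per sale, keyed by month (the assumed
# partitioning field), with amounts held as integer cents.
SOURCE_ROWS = [
    ("2023-01-15", " Widget ", "19.99"),
    ("2023-01-20", "gadget", "5.00"),
    ("2023-02-03", "Widget", "19.99"),
]


def construct(conn):
    """Construct: create the physical table, index and view."""
    conn.execute(
        "CREATE TABLE mart_sales (sale_month TEXT NOT NULL, "
        "product TEXT NOT NULL, amount_cents INTEGER NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_mart_sales_month ON mart_sales (sale_month)")
    conn.execute(
        "CREATE VIEW v_monthly_revenue AS "
        "SELECT sale_month, SUM(amount_cents) AS revenue_cents "
        "FROM mart_sales GROUP BY sale_month"
    )


def transform(row):
    """Apply the transformations before loading a source row."""
    sale_date, product, amount = row
    return (sale_date[:7], product.strip().lower(), int(Decimal(amount) * 100))


def populate(conn, rows):
    """Populate: load the transformed rows into the mart."""
    conn.executemany(
        "INSERT INTO mart_sales VALUES (?, ?, ?)", [transform(r) for r in rows]
    )
    conn.commit()


def access(conn):
    """Access: the kind of query an end user would run for a report."""
    return conn.execute(
        "SELECT sale_month, revenue_cents FROM v_monthly_revenue "
        "ORDER BY sale_month"
    ).fetchall()


def backup(conn):
    """Manage: take a copy of the mart that could be used for recovery."""
    copy = sqlite3.connect(":memory:")
    conn.backup(copy)
    return copy


conn = sqlite3.connect(":memory:")
construct(conn)
populate(conn, SOURCE_ROWS)
monthly = access(conn)
backup_count = backup(conn).execute("SELECT COUNT(*) FROM mart_sales").fetchone()[0]
print(monthly, backup_count)
```

Storing money as integer cents and parsing it with `Decimal` is a choice made for this sketch so that the monthly sums are exact.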
Benefits of Data Mart
Following are some of the benefits of using a data mart:
- It is a cost-effective alternative to a data warehouse when you only need to work on a small segment of data.
- Segregating data by source makes a data mart efficient, because a specific group of people can work on the data from a specific source instead of everyone using the data warehouse.
- Faster access to the data is possible when we know which subset we need.
- A data mart is much easier to use, so end users can query it easily.
- A data mart takes less time to implement than a data warehouse, since the data is segregated into groups.
- Historical data from a particular subject can be used for easy trend analysis.
Because a data mart concentrates on a single functional area, it offers numerous advantages to both the implementer and the end user. An efficient data mart implementation is therefore worth having alongside the data warehouse in an organization.