ETL Process

By Savi Jagga


Introduction to ETL Process

ETL is one of the core processes behind Business Intelligence. Business Intelligence relies on the data stored in data warehouses, from which analyses and reports are generated to help build more effective strategies and to support tactical and operational insights and decision-making. ETL stands for Extract, Transform, and Load. It is a data integration step in which data coming from different sources is extracted, transformed into a specific format according to business requirements, and loaded into a data warehouse.

Various tools that help perform these tasks include:

  • IBM DataStage
  • Ab Initio
  • Informatica
  • Tableau
  • Talend

[Figure: ETL Process]

How does it Work?

The ETL process is a three-step process: data is first extracted from various data sources, the raw data then undergoes transformations that make it suitable for storage in a data warehouse, and finally the data is loaded into the data warehouse in the required format, ready for analysis.

[Figure: How the ETL process works]
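The three steps above can be sketched as a minimal pipeline in plain Python. This is an illustrative sketch only; the field names and the in-memory "warehouse" list are hypothetical stand-ins for real sources and targets.

```python
import csv
import io

# Hypothetical raw source data, as it might arrive from a CSV export.
RAW = "id,name,amount\n1,alice,10.5\n2,bob,\n3,carol,7.25\n"

def extract(text):
    """Extract: read rows from the raw source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing amount and normalize names."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # data cleansing: skip incomplete records
        clean.append({"id": int(row["id"]),
                      "name": row["name"].title(),
                      "amount": float(row["amount"])})
    return clean

def load(rows, warehouse):
    """Load: append the transformed rows to the target store."""
    warehouse.extend(rows)

warehouse = []
load(transform(extract(RAW)), warehouse)
```

In a real pipeline, each function would talk to an actual source system, staging area, and warehouse, but the three-stage shape stays the same.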

Step 1: Extract

This step fetches the required data from various sources that hold it in different formats, such as XML, Hadoop files, flat files, JSON, etc. The extracted data is stored in a staging area, where further transformations are performed. Data is checked thoroughly there before being moved into the data warehouse, since reverting bad changes once they reach the warehouse is a challenge. A proper data map between source and target is required before extraction occurs, because the ETL process needs to interact with a variety of systems, such as Oracle databases, mainframes, real-time systems such as ATMs, Hadoop, etc., while fetching their data.

Care must also be taken that these source systems remain unaffected during extraction.
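As a sketch of the extract step, data arriving in two of the formats mentioned above (CSV and JSON; the payloads and field names here are hypothetical) can be normalized into one common staging structure:

```python
import csv
import io
import json

# Hypothetical source payloads in two different formats.
CSV_SOURCE = "order_id,total\n101,19.99\n102,5.00\n"
JSON_SOURCE = '[{"order_id": 103, "total": 12.5}]'

def extract_csv(text):
    """Extract rows from a CSV source into the common staging format."""
    return [{"order_id": int(r["order_id"]), "total": float(r["total"])}
            for r in csv.DictReader(io.StringIO(text))]

def extract_json(text):
    """Extract rows from a JSON source into the same staging format."""
    return [{"order_id": int(r["order_id"]), "total": float(r["total"])}
            for r in json.loads(text)]

# The staging area holds extracted rows from all sources in one format.
staging = extract_csv(CSV_SOURCE) + extract_json(JSON_SOURCE)
```

Both extractors only read from their sources, mirroring the rule that source systems must remain unaffected.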

Data Extraction Strategies:
  • Full Extraction: The entire dataset is pulled from the sources, typically when the data warehouse is being populated for the first time or when no incremental strategy has been set up.
  • Partial Extraction (with update notification): Also known as delta extraction; only the data that has changed since the last run is extracted and used to update the data warehouse.
  • Partial Extraction (without update notification): Only the specific required data is extracted from the sources and loaded into the data warehouse, instead of extracting the whole dataset.
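The difference between a full and a delta extraction can be illustrated with a watermark timestamp. This is a sketch under the assumption that each source row carries an `updated_at` column (a common but not universal convention):

```python
from datetime import datetime

# Hypothetical source table rows with an update timestamp.
SOURCE = [
    {"id": 1, "updated_at": datetime(2022, 1, 1)},
    {"id": 2, "updated_at": datetime(2022, 3, 15)},
    {"id": 3, "updated_at": datetime(2022, 6, 1)},
]

def full_extraction(rows):
    """Full extraction: take every row from the source."""
    return list(rows)

def delta_extraction(rows, last_run):
    """Partial (delta) extraction: only rows changed since the last run."""
    return [r for r in rows if r["updated_at"] > last_run]

full = full_extraction(SOURCE)
delta = delta_extraction(SOURCE, last_run=datetime(2022, 2, 1))
```

After each delta run, the pipeline would record the new watermark so the next run only picks up subsequent changes.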

Step 2: Transform

This is the most important step of ETL. Here, many transformations are performed to make the data ready for loading into the data warehouse:

a. Basic Transformations: These transformations are applied in every scenario, as they are a basic need when loading data extracted from various sources into the data warehouse.

  • Data Cleansing or Enrichment: Undesired or incorrect data is cleaned out in the staging area so that wrong data does not get loaded into the data warehouse.
  • Filtering: Only the data required by the business is kept out of the large volume present. For example, generating a sales report for a specific year needs only that year's sales records.
  • Consolidation: Extracted data is consolidated into the required format before it is loaded into the data warehouse.
  • Standardization: Data fields are transformed into a single required format; for example, all date fields might be standardized to MM/DD/YYYY.
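The cleansing, filtering, and standardization transformations above can be sketched together. The staged records and their field names here are hypothetical:

```python
from datetime import datetime

# Hypothetical staged sales records with an ISO date and some bad data.
staged = [
    {"sale_id": 1, "date": "2021-03-05", "year": 2021, "amount": 100.0},
    {"sale_id": 2, "date": "2022-07-19", "year": 2022, "amount": 250.0},
    {"sale_id": 3, "date": "2022-11-02", "year": 2022, "amount": None},
]

def basic_transform(rows, report_year):
    out = []
    for r in rows:
        if r["amount"] is None:
            continue  # cleansing: drop records with missing amounts
        if r["year"] != report_year:
            continue  # filtering: keep only the year the report needs
        d = datetime.strptime(r["date"], "%Y-%m-%d")
        # standardization: bring every date into MM/DD/YYYY
        out.append({**r, "date": d.strftime("%m/%d/%Y")})
    return out

result = basic_transform(staged, report_year=2022)
```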

b. Advanced Transformations: These transformations are specific to the business requirements.

  • Joining: Data from two or more sources is combined to produce rows that are related to each other, with only the desired columns.
  • Data Threshold Validation Check: Values in various fields are checked for correctness, such as ensuring bank account numbers are not null in banking data.
  • Use Lookups to Merge Data: Flat files or other reference files are used to enrich records by performing lookup operations against them.
  • Complex Data Validation: Complex validation rules are applied so that only valid data is taken from the source systems.
  • Calculated and Derived Values: Calculations are applied to transform the raw data into the required derived information.
  • Duplication: Duplicate records coming from the source systems are detected and removed before loading into the data warehouse.
  • Key Restructuring: When capturing slowly changing data, surrogate keys are generated to structure the data in the required format.
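A few of the advanced transformations above, validation, lookup merging, and deduplication, can be sketched in one pass. The transaction records and lookup table are hypothetical:

```python
# Hypothetical staged transactions and a lookup mapping branch codes to names.
transactions = [
    {"txn": 1, "account": "ACC1", "branch": "B1"},
    {"txn": 2, "account": None,   "branch": "B2"},  # fails validation
    {"txn": 3, "account": "ACC3", "branch": "B1"},
    {"txn": 3, "account": "ACC3", "branch": "B1"},  # duplicate record
]
branch_lookup = {"B1": "Downtown", "B2": "Airport"}

def advanced_transform(rows, lookup):
    seen, out = set(), []
    for r in rows:
        if r["account"] is None:
            continue  # validation check: account numbers must not be null
        if r["txn"] in seen:
            continue  # duplication: drop repeated transactions
        seen.add(r["txn"])
        # lookup merge: enrich the record with the branch name
        out.append({**r, "branch_name": lookup[r["branch"]]})
    return out

result = advanced_transform(transactions, branch_lookup)
```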

MPP (Massively Parallel Processing) is sometimes used to perform basic operations such as filtering or cleansing on data in the staging area, so that large amounts of data can be processed faster.
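The idea of parallelizing a staging-area filter can be illustrated with a thread pool standing in for a true MPP engine (which would split the work across many machines rather than local workers):

```python
from concurrent.futures import ThreadPoolExecutor

def clean_chunk(chunk):
    """Filter one chunk of staged values: keep only non-negative amounts."""
    return [x for x in chunk if x >= 0]

data = list(range(-5, 15))                       # hypothetical staged values
chunks = [data[i:i + 5] for i in range(0, len(data), 5)]

# Each worker cleans one chunk independently; results are merged in order.
with ThreadPoolExecutor(max_workers=4) as pool:
    cleaned = [x for part in pool.map(clean_chunk, chunks) for x in part]
```

The key property this mirrors is that filtering is embarrassingly parallel: each partition can be cleaned without seeing the others.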

Step 3: Load

This step loads the transformed data into the data warehouse, from where it can drive analytics decisions as well as reporting.

  • Initial Load: Performed when data is loaded into the data warehouse for the first time.
  • Incremental Load: Performed periodically to update the data warehouse with changes that have occurred in the source system data.
  • Full Refresh: The complete contents of a table are deleted and reloaded with fresh data.
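The three load types above can be sketched against an in-memory SQLite table (the table and column names are hypothetical; the incremental load uses SQLite's upsert syntax, available since version 3.24):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse (id INTEGER PRIMARY KEY, amount REAL)")

def initial_load(rows):
    """Initial load: first-time population of the warehouse table."""
    conn.executemany("INSERT INTO warehouse VALUES (?, ?)", rows)

def incremental_load(rows):
    """Incremental load: upsert only the rows that changed since last run."""
    conn.executemany(
        "INSERT INTO warehouse VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET amount = excluded.amount", rows)

def full_refresh(rows):
    """Full refresh: wipe the table and reload it with fresh data."""
    conn.execute("DELETE FROM warehouse")
    conn.executemany("INSERT INTO warehouse VALUES (?, ?)", rows)

initial_load([(1, 10.0), (2, 20.0)])
incremental_load([(2, 25.0), (3, 30.0)])
rows = conn.execute("SELECT id, amount FROM warehouse ORDER BY id").fetchall()
```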

The data warehouse can then serve OLAP-style analytical queries.

Disadvantages of ETL Process

The main disadvantages are:

  • Increasing Data: There is a limit to how much data an ETL tool can extract from sources and push into the data warehouse in a given window. As data volumes grow, working with the ETL tool and data warehouse becomes cumbersome.
  • Customization: Source systems may demand fast, effective responses to the data they generate, but routing everything through an ETL tool slows this process down.
  • Expensive: Using a data warehouse to store an ever-increasing amount of periodically generated data is a high cost for an organization.

Conclusion

The ETL process comprises the extract, transform, and load steps, and it helps generate information from data gathered across various source systems. Because data from source systems can arrive in any format and must be loaded into the data warehouse in any desired format, an ETL tool must support connectivity to all of these formats.

Recommended Articles

This is a guide to the ETL Process. Here we discussed the introduction, how the process works, ETL tools, and its disadvantages. You can also go through our other suggested articles to learn more –

  1. Informatica ETL Tools
  2. ETL Testing Tools
  3. What is ETL?
  4. What is ETL Testing?