EDUCBA

Hadoop Architecture

Introduction to Hadoop Architecture

Hadoop is an open-source framework for processing large data sets quickly using distributed computing, with the data spread across the nodes of a cluster. Its architecture follows a master-slave structure and is divided into two parts: storing data and processing it. HDFS does the storing, while MapReduce does the processing.

Hadoop Architecture

  • The basic idea of this architecture is that storage and processing are handled by two separate components. Processing is done through MapReduce programming, and storage is done on HDFS.
  • It has a master-slave architecture for both data storage and data processing. The master node for data storage in Hadoop is the name node. There is also a master for processing, the job tracker, which monitors and parallelizes data processing using Hadoop MapReduce.
  • The slaves are the other machines in the Hadoop cluster, which store the data and perform the computations. Each slave node runs a data node for storage and a task tracker for processing. The task trackers receive their tasks from the job tracker on the master, which helps run the processes and keep them synchronized. This type of system can be set up either in the cloud or on-premises.
  • The name node is a single point of failure when it is not running in high availability mode. The Hadoop architecture therefore has provisions for maintaining a standby name node to safeguard the system from losses. Earlier versions had only a secondary name node, which periodically merges the edit log into the FSimage. Despite its name, it is a checkpointing helper rather than a hot standby, and it does not take over automatically when the primary name node is down.

FSimage and Edit Log

  • The FSimage and the edit log make the file system metadata persistent. The name node stores this metadata in two files: the FSimage and the edit log. The FSimage holds a complete snapshot of the file system namespace at a given point in time. Changes made after that point also need to be recorded, and these incremental changes, such as renaming a file or appending data to it, are stored in the edit log.
  • Writing a new FSimage for every change would be expensive, so changes are appended to the edit log instead. A new FSimage is produced only at a checkpoint, when the edit log is merged into the previous FSimage. If the name node fails, it can restore its previous state on restart by loading the latest FSimage and replaying the edit log. The secondary name node performs this merge periodically and keeps a copy of the resulting checkpoint, so the metadata up to the last checkpoint can still be recovered from it if the name node's own copy is lost. Because the merge happens on the secondary name node, the name node does not have to stop serving requests while a checkpoint is built.
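
The relationship between the snapshot and the log can be illustrated with a small toy model. This is only a sketch of the idea, not how HDFS stores its metadata on disk: the class `NameNodeMetadata`, its method names, and the use of a plain dictionary for the namespace are all assumptions made for illustration.

```python
import copy


class NameNodeMetadata:
    """Toy model (hypothetical): a snapshot (FSimage) plus an edit log."""

    def __init__(self):
        self.fsimage = {}    # path -> file size in bytes, as of the last checkpoint
        self.edit_log = []   # changes recorded since the last checkpoint

    def create(self, path, size=0):
        self.edit_log.append(("create", path, size))

    def append(self, path, extra_bytes):
        self.edit_log.append(("append", path, extra_bytes))

    def rename(self, old_path, new_path):
        self.edit_log.append(("rename", old_path, new_path))

    def current_state(self):
        """Rebuild the namespace: load the snapshot, then replay the edit log."""
        state = copy.deepcopy(self.fsimage)
        for op, first, second in self.edit_log:
            if op == "create":
                state[first] = second
            elif op == "append":
                state[first] += second
            elif op == "rename":
                state[second] = state.pop(first)
        return state

    def checkpoint(self):
        """Merge the edit log into a new FSimage and start a fresh log."""
        self.fsimage = self.current_state()
        self.edit_log = []


meta = NameNodeMetadata()
meta.create("/logs/a.txt", 100)
meta.append("/logs/a.txt", 50)
meta.rename("/logs/a.txt", "/logs/b.txt")

before = meta.current_state()   # state recovered from snapshot + log
meta.checkpoint()               # log is folded into the snapshot
after = meta.current_state()    # same state, now read from the snapshot alone
```

The state seen by clients is the same before and after the checkpoint; only where it is recorded changes, which is why a checkpoint can be taken without losing any edits.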

Data Replication

  • HDFS is designed to store data reliably and to provide fast access to it. It stores data across the machines of large clusters. Each file is stored as a sequence of blocks, and these blocks are replicated for fault tolerance. The block size and the replication factor are configurable per file according to the user's requirements. By default, the replication factor is 3. The replication factor can be specified when a file is created and can be changed later.
  • The name node makes all decisions regarding these replicas. It receives a heartbeat and a block report at regular intervals from every data node in the cluster. Receipt of a heartbeat implies that the data node is working properly. A block report lists all the blocks present on that data node.
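
As a rough sketch of how these pieces fit together, the toy code below splits a file into blocks and lets a name node decide, from heartbeats and block reports, which blocks have fewer live replicas than the replication factor. The class `ToyNameNode`, the 128 MB block size, and the 30-second timeout are assumptions chosen for illustration; real clusters take these values from configuration.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024   # assumed block size (128 MB), configurable in practice
REPLICATION = 3                  # default replication factor from the text
HEARTBEAT_TIMEOUT = 30           # assumed timeout in seconds, illustrative only


def blocks_needed(file_size):
    """Number of blocks a file of file_size bytes is split into."""
    return math.ceil(file_size / BLOCK_SIZE)


class ToyNameNode:
    """Hypothetical name node that tracks heartbeats and block reports."""

    def __init__(self):
        self.last_heartbeat = {}   # data node id -> time of last heartbeat
        self.block_reports = {}    # data node id -> set of block ids it holds

    def receive_heartbeat(self, node, now):
        self.last_heartbeat[node] = now

    def receive_block_report(self, node, block_ids):
        self.block_reports[node] = set(block_ids)

    def live_nodes(self, now):
        return {n for n, t in self.last_heartbeat.items()
                if now - t <= HEARTBEAT_TIMEOUT}

    def under_replicated(self, now):
        """Blocks with fewer live replicas than the replication factor."""
        live = self.live_nodes(now)
        counts = {}
        for node, blocks in self.block_reports.items():
            for block in blocks:
                counts.setdefault(block, 0)
                if node in live:
                    counts[block] += 1
        return sorted(b for b, c in counts.items() if c < REPLICATION)


file_size = 300 * 1024 * 1024              # a 300 MB file
n_blocks = blocks_needed(file_size)        # 3 blocks of up to 128 MB each
raw_bytes = file_size * REPLICATION        # space used across the cluster

nn = ToyNameNode()
for node in ("dn1", "dn2", "dn3"):
    nn.receive_heartbeat(node, now=0)
    nn.receive_block_report(node, ["blk_1", "blk_2"])

healthy = nn.under_replicated(now=10)      # all three data nodes are alive
nn.receive_heartbeat("dn1", now=20)
nn.receive_heartbeat("dn2", now=20)        # dn3 stays silent
degraded = nn.under_replicated(now=40)     # dn3 has timed out
```

Once `dn3` stops sending heartbeats, both blocks drop to two live replicas, which is the signal a name node would use to schedule re-replication.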

Placement of Replicas

  • The placement of replicas is critical to Hadoop's reliability and performance. The replicas of a block are spread across racks, and the placement policy balances reliability, availability and network bandwidth utilization. A large cluster is usually spread across several racks. With a replication factor of 3, no more than two replicas of a block are placed on the same rack, and the third replica is placed on a separate rack to make the data more reliable.
  • Nodes on different racks communicate through switches, and the name node knows the rack id of each data node. Placing every replica on a different rack would prevent data loss when a whole rack fails and would allow reads to use bandwidth from multiple racks, but it makes writes more expensive. Keeping two replicas on one rack cuts the inter-rack write traffic and improves write performance. Because the chance of a rack failure is far smaller than that of a node failure, this does not noticeably weaken reliability. It does reduce the aggregate network bandwidth available for reads, since a block sits on two unique racks rather than three.
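
The policy described above can be sketched in a few lines. This is a simplified illustration, not HDFS's placement code: the function `place_replicas`, the `topology` dictionary and the node names are hypothetical, and the exact ordering of the replicas (which rack receives two of them) is an assumption that may differ between Hadoop versions.

```python
import random


def place_replicas(writer, topology, rng=random):
    """Pick three data nodes for one block (hypothetical helper).

    topology maps a rack id to the list of data nodes on that rack.
    writer is the data node where the writing client runs.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    local_rack = rack_of[writer]

    # First replica: the writer's own node.
    first = writer
    # Second replica: a different node on the same rack (cheap intra-rack copy).
    second = rng.choice([n for n in topology[local_rack] if n != first])
    # Third replica: any node on a different rack (survives a rack failure).
    remote_rack = rng.choice([r for r in topology if r != local_rack])
    third = rng.choice(topology[remote_rack])
    return [first, second, third]


topology = {
    "rack1": ["dn1", "dn2"],
    "rack2": ["dn3", "dn4"],
    "rack3": ["dn5", "dn6"],
}

replicas = place_replicas("dn1", topology, rng=random.Random(0))
racks_used = {rack for rack, nodes in topology.items()
              for node in nodes if node in replicas}
```

Whatever the random choices, the three replicas land on three distinct nodes but only two racks, so a write crosses the inter-rack switch once instead of twice.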

MapReduce

MapReduce is used to process the data stored on HDFS. It is a framework for writing distributed applications that process large amounts of data efficiently and in parallel. These applications run on large clusters of commodity hardware in a reliable, fault-tolerant manner. At the core of MapReduce are three operations: mapping the input into key-value pairs, shuffling (collecting and grouping) those pairs by key, and reducing each group to a result.
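
The three operations can be shown with the classic word count, written here as plain single-process Python rather than as a Hadoop job. The function names `map_phase`, `shuffle_phase` and `reduce_phase` are illustrative choices, not part of any Hadoop API; on a real cluster the framework runs many mappers and reducers in parallel and performs the shuffle itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


lines = ["Hadoop stores data", "Hadoop processes data"]

mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())
```

Each line can be mapped independently and each key can be reduced independently, which is what lets the work be spread across the nodes of a cluster.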

Conclusion – Hadoop Architecture

Hadoop is an open-source framework for building fault-tolerant systems that store and process large amounts of data reliably. Its two parts, storage in HDFS and processing through MapReduce, work together to handle that data correctly and efficiently. The name node manages the metadata for all blocks and keeps it current by persisting it in the FSimage and the edit log. Replication keeps multiple copies of each block, so data can be recovered after a failure. HDFS also moves deleted files to a trash directory first (when the trash feature is enabled), so they can be restored before their space is reclaimed.
