HDFS File System

Introduction to HDFS File System

In the Hadoop stack, HDFS (the Hadoop Distributed File System) is the service that manages storage. It is a distributed file system capable of handling very large amounts of data. One major advantage of HDFS is that it runs on commodity hardware, so no specialized hardware is required. A single Hadoop cluster can contain thousands of data nodes that store data in HDFS. Most Hadoop services depend on HDFS to store their data.

Syntax:

HDFS itself has no single syntax. It is a storage layer that a number of services run on top of, and each of those components has its own syntax. Choose the component that fits the requirement and use the syntax that belongs to it.

How Does the HDFS File System Work?

HDFS is a distributed file system that is both scalable and portable. It is written in Java for the Hadoop framework. Hadoop has two major parts: HDFS, which stores the data, and MapReduce, which processes it.

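The following services (daemons) run in a classic Hadoop cluster and make it scalable, portable, and robust. The first three belong to HDFS itself; the job tracker and task tracker belong to MapReduce and work closely with HDFS.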

  • Name Node
  • Data Node
  • Secondary Name Node
  • Job Tracker
  • Task Tracker

1. Name Node

The namenode is the master daemon (service) of HDFS. The master services communicate with each other. An HDFS file system has a single active namenode at any time. The namenode tracks files and blocks, manages the file system namespace, and maintains the HDFS metadata.

The metadata holds file-level and block-level information for everything stored on HDFS. Specifically, the namenode knows how many blocks each file has and which datanodes hold those blocks. It also manages block replication. HDFS clients contact the namenode directly.
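
The kind of metadata described above can be pictured as a mapping from file paths to blocks, and from blocks to the datanodes that hold them. The following is a minimal sketch under that assumption, not the namenode's real data structures; the class names `BlockInfo` and `NameNodeMetadata`, the path, and the hosts `dn1` to `dn4` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class BlockInfo:
    """One block of a file and the datanodes holding a replica of it."""
    block_id: int
    datanodes: List[str]


@dataclass
class NameNodeMetadata:
    """Hypothetical in-memory view: file path -> ordered list of blocks."""
    files: Dict[str, List[BlockInfo]] = field(default_factory=dict)

    def add_file(self, path: str, blocks: List[BlockInfo]) -> None:
        self.files[path] = blocks

    def block_count(self, path: str) -> int:
        return len(self.files[path])

    def locate(self, path: str) -> List[Tuple[int, List[str]]]:
        # Only locations are returned; the data itself lives on the datanodes.
        return [(b.block_id, list(b.datanodes)) for b in self.files[path]]


meta = NameNodeMetadata()
meta.add_file("/logs/app.log", [
    BlockInfo(1, ["dn1", "dn2", "dn3"]),
    BlockInfo(2, ["dn2", "dn3", "dn4"]),
])
print(meta.block_count("/logs/app.log"))
print(meta.locate("/logs/app.log"))
```

A client would call something like `locate` first and then read each block from one of the listed datanodes, which is why the namenode never has to carry file contents itself.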

2. Data Node

The datanode is a slave (worker) daemon. It stores the actual file contents on HDFS, in the form of blocks. When a client requests data, the namenode only returns the block information for the file; the data itself is served by the datanodes. By default, every datanode sends a heartbeat to the namenode every 3 seconds, which lets the namenode know that the datanode is alive.

The heartbeats continue for as long as the datanode is running. If the namenode receives no heartbeat from a datanode for an extended period (10 minutes 30 seconds with the default settings), it marks that datanode as dead. Data cannot be read from a dead datanode. To guard against this, HDFS replicates data. The default replication factor is 3, which means HDFS keeps 3 copies of each block on different datanodes. If one or two of those datanodes fail, requests are still served from a remaining copy. The namenode manages all of this.
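
The heartbeat check and the effect of replication can be sketched in a few lines. This is an illustration only, not Hadoop code: the function names, the timestamps, and the hosts are hypothetical. The timeout formula (twice the recheck interval plus ten heartbeat intervals) and the 5-minute recheck interval are stated here as assumptions about stock Hadoop defaults, so check them against your cluster's configuration.

```python
HEARTBEAT_INTERVAL_S = 3    # default heartbeat interval mentioned in the text
RECHECK_INTERVAL_S = 300    # assumed default recheck interval (5 minutes)


def dead_timeout_s(heartbeat_s=HEARTBEAT_INTERVAL_S, recheck_s=RECHECK_INTERVAL_S):
    """Seconds of silence after which a datanode is considered dead (assumed formula)."""
    return 2 * recheck_s + 10 * heartbeat_s


def live_datanodes(last_heartbeat, now, timeout_s):
    """Return the datanodes whose last heartbeat is recent enough."""
    return sorted(dn for dn, ts in last_heartbeat.items() if now - ts <= timeout_s)


def readable(block_replicas, live):
    """A block can be read as long as at least one replica is on a live datanode."""
    return any(dn in live for dn in block_replicas)


timeout = dead_timeout_s()
last_seen = {"dn1": 1000.0, "dn2": 300.0, "dn3": 200.0}  # hypothetical timestamps
live = live_datanodes(last_seen, now=1003.0, timeout_s=timeout)

print(timeout)
print(live)
print(readable(["dn1", "dn2", "dn3"], live))  # replication factor 3, one copy left
print(readable(["dn2", "dn3"], live))         # every replica on a dead node
```

With three replicas, the block stays readable even though two of the three datanodes have gone silent, which is the situation the replication factor is meant to cover.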

3. Secondary Name Node

In HDFS, the secondary namenode is a master-side daemon whose job is checkpointing. It is responsible for the metadata checkpoints of the HDFS file system: it periodically fetches the metadata from the active namenode and creates a checkpoint. Despite its name, it is not a standby and does not take over automatically when the namenode goes down. Instead, its latest checkpoint can be used to restore the namenode's metadata.

The edit log is what keeps the checkpoint in sync with the live namenode. It records the changes made to the file system since the last checkpoint. The secondary namenode applies those changes to the previous metadata image, so the checkpoint reflects the last recorded update of the namenode. A namenode restored from this checkpoint resumes from that state.
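
Checkpointing can be thought of as replaying the edit log on top of the last saved metadata image. The sketch below assumes a toy namespace (path to block count) and three hypothetical operations, `create`, `delete`, and `rename`; the real on-disk formats and operation set are considerably richer.

```python
def apply_edit(namespace, edit):
    """Apply one edit-log record to the namespace in place."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = edit[2]                 # edit[2]: number of blocks
    elif op == "delete":
        namespace.pop(path, None)
    elif op == "rename":
        namespace[edit[2]] = namespace.pop(path)  # edit[2]: new path
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(image, edit_log):
    """Merge the edit log into a copy of the image; return (new image, empty log)."""
    merged = dict(image)
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged, []


image = {"/a.txt": 2}
edits = [
    ("create", "/b.txt", 1),
    ("rename", "/a.txt", "/c.txt"),
    ("delete", "/b.txt"),
]
new_image, new_log = checkpoint(image, edits)
print(new_image)
print(new_log)
```

Because the merge works on a copy, the previous image stays untouched until the new one is complete, and the edit log can start again from empty afterwards.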

4. Job Tracker

The job tracker is the master service of MapReduce. It receives MapReduce execution requests from the client and then communicates with the active namenode. The namenode responds with the block metadata for the HDFS data the job needs, that is, the location of the data. The job tracker uses this location information for job processing.

5. Task Tracker

The task tracker is a slave service. The job tracker sends job-related information to the task tracker, and the task tracker takes its task assignments from the job tracker. It is also responsible for keeping track of all the tasks it is running against the data on HDFS.
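
The way block locations feed into job processing can be illustrated with a greedy, locality-first assignment: prefer a task tracker on a node that already holds the block, and fall back to any node with a free slot. This is a simplified sketch, not the actual job tracker scheduler; the function name `assign_tasks`, the block IDs, and the hosts are hypothetical.

```python
def assign_tasks(block_locations, free_slots):
    """Assign one task per block, preferring nodes that store the block.

    block_locations: block id -> hosts holding a replica (from the namenode)
    free_slots:      host -> number of free task slots on its task tracker
    """
    free = dict(free_slots)  # work on a copy so the caller's dict is unchanged
    assignment = {}
    for block_id, hosts in block_locations.items():
        local = [h for h in hosts if free.get(h, 0) > 0]
        candidates = local or [h for h, n in free.items() if n > 0]
        if not candidates:
            break  # no free slots anywhere; remaining tasks have to wait
        host = candidates[0]
        assignment[block_id] = host
        free[host] -= 1
    return assignment


blocks = {"b1": ["dn1", "dn2"], "b2": ["dn1", "dn3"], "b3": ["dn1"]}
slots = {"dn1": 1, "dn2": 1, "dn3": 1}
print(assign_tasks(blocks, slots))
```

In this run the tasks for `b1` and `b2` are placed on nodes that hold the data, while `b3` falls back to a remote node because its only local node has no free slot left.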

HDFS File System Overview

A Hadoop environment stores its data in distributed mode on HDFS, which is both scalable and portable.

Syntax:

The HDFS summary provides information about the HDFS file system.

Explanation:

  • The command returns a detailed overview of the HDFS file system.

Output:

[Figure: HDFS File System 1]

Conclusion

We have covered the concept of the HDFS file system with an example, explanation, and output. HDFS is used to store huge amounts of data and scales to thousands of nodes. The default replication factor is 3.
