HDFS Commands

Article by Anandkumar Murugesan
Reviewed by Ravi Rathore

Updated June 7, 2023


Introduction to HDFS Commands

Big data is a term for datasets that are so large or complex that conventional data processing software cannot cope with them. Hadoop is an open-source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. Hadoop is maintained by the Apache Software Foundation. In this topic, we will learn about the different HDFS commands.


Features of HDFS

  • HDFS runs on a master/slave architecture.
  • HDFS stores user data in files and supports a huge set of directories and files in a hierarchical format.
  • A file is split into smaller blocks, which are stored in a set of DataNodes.
  • NameNode and DataNode are pieces of software designed to run on commodity machines, which typically run a GNU/Linux OS.

Namenode

  • The NameNode maintains the file system namespace.
  • The NameNode also logs every change to the file system and keeps an in-memory image of the complete file system namespace and the file Blockmap.
  • Checkpointing is done periodically, so the file system can easily be recovered to its state before a crash.

Datanode

  • A DataNode stores data in files in its local file system.
  • To signal that it is alive, a DataNode sends a heartbeat to the NameNode.
  • A block report is generated for every 10th heartbeat received.
  • The data stored on these DataNodes is replicated.

Data Replication

  • A file is stored as a sequence of blocks, with a default block size of 128 MB.
  • All blocks in a file, except the last one, are the same size.
  • The NameNode receives a heartbeat from every DataNode in the cluster.
  • A BlockReport lists all the blocks on a DataNode.
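As a quick sanity check of the block model above, the number of blocks a file occupies follows from the 128 MB default block size (the 500 MB file size below is illustrative):

```shell
# How many 128 MB blocks does a 500 MB file occupy? (illustrative numbers)
FILE_MB=500
BLOCK_MB=128
# Ceiling division: every block is full-sized except possibly the last one.
BLOCKS=$(( (FILE_MB + BLOCK_MB - 1) / BLOCK_MB ))
echo "$BLOCKS blocks"   # 4 blocks: three of 128 MB and a final one of 116 MB
```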

Job tracker: The JobTracker consults the NameNode to determine the location of the data, and then locates the most suitable TaskTracker nodes to execute tasks based on data locality.

Task tracker: A TaskTracker is a node in the cluster that accepts tasks – Map, Reduce, and Shuffle operations – from a JobTracker.

Secondary Name node (or) checkpoint node: Fetches the EditLog from the NameNode at regular intervals and applies it to its copy of the FsImage; the merged FsImage is copied back to the NameNode, which uses it on the next restart. The Secondary NameNode’s whole purpose is to provide a checkpoint in HDFS.
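A checkpoint can also be forced by hand with dfsadmin. The sketch below only echoes the command sequence (a hypothetical dry-run wrapper); drop the echo to run it against a real cluster as the hdfs superuser:

```shell
run() { echo "$*"; }   # dry-run helper: print the command instead of executing it

run hdfs dfsadmin -safemode enter    # writes must stop before the checkpoint
run hdfs dfsadmin -saveNamespace     # merge the EditLog into a fresh FsImage
run hdfs dfsadmin -safemode leave    # resume normal operation
```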

YARN

  • YARN has a central ResourceManager component that manages resources and assigns them to each application.
  • The ResourceManager is the master that arbitrates the resources of the cluster; it is composed of two components: an ApplicationManager and a Scheduler. These two components manage the jobs on the cluster. Another component, the NodeManager (NM), monitors the users’ jobs and workflow on a given node.
  • The Standby NameNode holds an exact replica of the active NameNode’s data. It acts as a slave and maintains enough state to provide a fast failover, if required.

Basic HDFS Commands

Given below are the basic commands:


Sr.No | Command Property | HDFS Command
1 | Print the Hadoop version | $ hadoop version
2 | List the contents of the root directory in HDFS | $ hadoop fs -ls /
3 | Report the amount of space used and available on the currently mounted filesystem | $ hadoop fs -df hdfs:/
4 | Re-balance data across the DataNodes, moving blocks from over-utilized to under-utilized nodes | $ hadoop balancer
5 | Print help for the fs commands | $ hadoop fs -help
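The basic commands above can be strung together into a short session. The hfs helper below is a hypothetical dry-run wrapper that prints each command instead of executing it; remove the echo to run against a live cluster:

```shell
hfs() { echo "hadoop fs $*"; }   # hypothetical dry-run wrapper

hfs -ls /        # list the contents of the HDFS root directory
hfs -df hdfs:/   # report used and available space
hfs -help        # print usage for every fs subcommand
```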

Intermediate HDFS Commands

Given below are the intermediate commands:


Sr.No | Command Property | HDFS Command
6 | Create a directory at the specified HDFS location | $ hadoop fs -mkdir /user/cloudera/
7 | Copy data from the local file system to HDFS | $ hadoop fs -put data/sample.txt /user/training/Hadoop
8 | Show the space occupied by a particular directory in HDFS | $ hadoop fs -du -s -h /user/cloudera/
9 | Remove a directory in HDFS | $ hadoop fs -rm -r /user/cloudera/pigjobs/
10 | Remove all files in the given directory, bypassing the trash | $ hadoop fs -rm -skipTrash hadoop/retail/*
11 | Empty the trash | $ hadoop fs -expunge
12 | Copy data between the local file system and HDFS | $ hadoop fs -copyFromLocal /home/cloudera/sample/ /user/cloudera/flume/
 | | $ hadoop fs -copyToLocal /user/cloudera/pigjobs/* /home/cloudera/oozie/
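The du output from command 8 can also be used to read off the replication factor: without -h, hadoop fs -du -s prints the raw size, the size including replication, and the path (in recent Hadoop releases; the sample line below is made up for a directory replicated 3 times):

```shell
# Hypothetical `hadoop fs -du -s /user/cloudera/flume` output:
# <size> <size-with-replication> <path>
line="536870912 1610612736 /user/cloudera/flume"
# The ratio of the two size columns is the effective replication factor.
echo "$line" | awk '{printf "replication factor: %d\n", $2 / $1}'
```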

Advanced HDFS Commands

Given below are the advanced commands:


Sr.No | Command Property | HDFS Command
13 | Change file permissions | $ sudo -u hdfs hadoop fs -chmod 777 /user/cloudera/flume/
14 | Set the replication factor for a file | $ hadoop fs -setrep -w 5 /user/cloudera/pigjobs/
15 | Count the number of directories, files, and bytes under a path | $ hadoop fs -count hdfs:/
16 | Make the NameNode leave safe mode | $ sudo -u hdfs hdfs dfsadmin -safemode leave
17 | Format a NameNode | $ hadoop namenode -format
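Commands 13 to 16 often appear together when re-opening a directory for a job. The sketch below only echoes the sequence (the paths and the dry-run wrapper are illustrative); drop the echo to execute on a real cluster:

```shell
run() { echo "$*"; }   # dry-run helper: print instead of executing

run sudo -u hdfs hdfs dfsadmin -safemode leave       # make sure the NameNode accepts writes
run sudo -u hdfs hadoop fs -chmod 777 /user/cloudera/flume/
run hadoop fs -setrep -w 5 /user/cloudera/pigjobs/   # -w waits until replication completes
run hadoop fs -count hdfs:/
```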

Tips and Tricks to Use HDFS Commands

1)  Recovery is faster when the cluster has a higher node count.

2)  The more storage added per node, the longer the recovery time.

3)  NameNode hardware has to be very reliable.

4)  Sophisticated monitoring can be achieved through Apache Ambari.

5)  System starvation can be decreased by increasing the reducer count.

Recommended Articles

This has been a guide to HDFS Commands. Here we discussed the features of HDFS and its basic, intermediate, and advanced commands, along with useful tips and tricks. You can also go through our other suggested articles to learn more –

  1. Hadoop Ecosystem
  2. Hadoop fs Commands
  3. HDFS Architecture
  4. HDFS Federation
