EDUCBA

HDFS Commands

By Anandkumar Murugesan

Introduction to HDFS Commands

Big data is a term for datasets that are so large or complex that conventional data processing software cannot deal with them. Hadoop is an open-source, Java-based framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It is developed and maintained by the Apache Software Foundation. In this topic, we will learn about the different HDFS commands.

Features of HDFS

  • HDFS follows a master/slave architecture.
  • HDFS stores user data in files.
  • It holds a huge set of directories and files, which are stored in a hierarchical format.
  • Internally, a file is split into smaller blocks, and these blocks are stored in a set of DataNodes.
  • The NameNode and DataNode are pieces of software designed to run on commodity machines, which typically run a GNU/Linux OS.

Namenode

  • The file system namespace is maintained by the NameNode.
  • The NameNode also logs all file system changes and keeps an image of the complete file system namespace and the file block map in memory.
  • Checkpointing is done periodically, so the file system can easily be recovered to its state before a crash.

Datanode

  • A DataNode stores data as files in its local file system.
  • To signal that it is alive, the DataNode sends heartbeats to the NameNode.
  • A block report is sent to the NameNode periodically, for example on every 10th heartbeat; the exact interval is configurable.
  • The blocks stored on the DataNodes are replicated.
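
The heartbeat idea can be illustrated with a small Python sketch. This is not Hadoop code: the HeartbeatMonitor class, its method names, and the 30-second timeout are assumptions made up for the illustration. It only shows how a master could treat a node as alive while heartbeats keep arriving.

```python
class HeartbeatMonitor:
    """Hypothetical tracker of the last heartbeat seen from each DataNode."""

    def __init__(self, timeout_s):
        # timeout_s is an illustrative value, not a Hadoop default.
        self.timeout_s = timeout_s
        self.last_seen = {}

    def heartbeat(self, node_id, now):
        # Record the time (in seconds) at which this node last reported in.
        self.last_seen[node_id] = now

    def live_nodes(self, now):
        # A node counts as alive if its last heartbeat is recent enough.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen <= self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=30)
monitor.heartbeat("dn1", now=0)
monitor.heartbeat("dn2", now=0)
monitor.heartbeat("dn1", now=25)
print(monitor.live_nodes(now=40))  # ['dn1']
```

Here dn2 stops reporting after time 0, so by time 40 only dn1 is still considered alive.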

Data Replication

  • A file is stored as a sequence of blocks, with a default block size of 128 MB.
  • All blocks in a file except the last one are the same size.
  • The NameNode receives a heartbeat from every DataNode in the cluster.
  • A block report lists all the blocks on a DataNode.
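
The block layout described above is simple arithmetic, sketched here in Python. The function name split_into_blocks is made up for this example; the only figure taken from the text is the 128 MB default block size.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=128 * MB):
    """Return the size in bytes of each block that a file would occupy."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # Only the final block may be smaller than the block size.
        sizes.append(remainder)
    return sizes


sizes = split_into_blocks(300 * MB)
print([s // MB for s in sizes])  # [128, 128, 44]
```

A 300 MB file therefore occupies two full 128 MB blocks and one final 44 MB block.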

Job tracker: The JobTracker talks to the NameNode to determine the location of the data. It then locates the most suitable TaskTracker nodes to carry out the tasks, based on data locality.
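
As a rough illustration of data locality, the Python sketch below prefers a free TaskTracker on a host that already stores the block. The function pick_tasktracker and the host names are hypothetical, and the real scheduler considers more than this single rule.

```python
def pick_tasktracker(block_hosts, free_trackers):
    """Choose a TaskTracker host for a task that reads one block.

    block_hosts: set of hosts holding a replica of the block.
    free_trackers: list of hosts with a free task slot, in preference order.
    """
    for host in free_trackers:
        if host in block_hosts:
            # Data-local choice: the task runs where the block already lives.
            return host
    # No data-local tracker is free, so fall back to any free one.
    return free_trackers[0] if free_trackers else None


print(pick_tasktracker({"node2", "node3"}, ["node1", "node3"]))  # node3
```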

Task tracker: A TaskTracker is a node in the cluster that accepts tasks – Map, Reduce and Shuffle operations – from a JobTracker.

Secondary NameNode (or checkpoint node): Fetches the EditLog from the NameNode at regular intervals and applies it to its copy of the FS image. It then copies the completed FS image back to the NameNode, which uses it at its next restart. The sole purpose of the Secondary NameNode is to provide a checkpoint in HDFS.
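
The checkpoint step can be pictured as replaying a log of changes on top of a saved snapshot. The Python sketch below is a simplified model, not Hadoop's actual on-disk formats: the namespace is a plain set of paths, and the operation names "create" and "delete" are assumptions chosen for the example.

```python
def checkpoint(fsimage, edit_log):
    """Apply logged namespace changes to an image and return the new image."""
    namespace = set(fsimage)  # Work on a copy so the old image stays intact.
    for operation, path in edit_log:
        if operation == "create":
            namespace.add(path)
        elif operation == "delete":
            namespace.discard(path)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return namespace


old_image = {"/user", "/user/a.txt"}
edits = [("create", "/user/b.txt"), ("delete", "/user/a.txt")]
new_image = checkpoint(old_image, edits)
print(sorted(new_image))  # ['/user', '/user/b.txt']
```

Once the new image exists, the edits that were applied no longer need to be replayed on restart, which is why checkpointing shortens recovery.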

YARN

  • YARN has a central Resource Manager component that manages resources and assigns them to every application.
  • The Resource Manager is the master that arbitrates the resources of the cluster. It is composed of two components, the application manager and a scheduler, which together manage the jobs on the cluster. Another component, the Node Manager (NM), is responsible for managing the users' jobs and workflow on a given node.
  • The Standby NameNode holds an exact replica of the data in the active NameNode. It acts as a slave and maintains enough state to provide a fast failover if needed.

Basic HDFS Commands

Given below are the basic commands:

Sr.No | HDFS Command Property | HDFS Command
1 | Print the Hadoop version | $ hadoop version
2 | List the contents of the root directory in HDFS | $ hadoop fs -ls /
3 | Report the amount of space used and available on the currently mounted filesystem | $ hadoop fs -df hdfs:/
4 | Rebalance data across the DataNodes, moving blocks from over-utilized to under-utilized nodes | $ hdfs balancer
5 | Help command | $ hadoop fs -help

Intermediate HDFS Commands

Given below are the intermediate commands:

Sr.No | HDFS Command Property | HDFS Command
6 | Create a directory at the specified HDFS location | $ hadoop fs -mkdir /user/cloudera/
7 | Copy data from the local file system to HDFS | $ hadoop fs -put data/sample.txt /user/training/hadoop
8 | See the space occupied by a particular directory in HDFS | $ hadoop fs -du -s -h /user/cloudera/
9 | Remove a directory in HDFS | $ hadoop fs -rm -r /user/cloudera/pigjobs/
10 | Remove all the files in the given directory, bypassing the trash | $ hadoop fs -rm -skipTrash hadoop/retail/*
11 | Empty the trash | $ hadoop fs -expunge
12 | Copy data from the local file system to HDFS | $ hadoop fs -copyFromLocal /home/cloudera/sample/ /user/cloudera/flume/
12 | Copy data from HDFS to the local file system | $ hadoop fs -copyToLocal /user/cloudera/pigjobs/* /home/cloudera/oozie/
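
Commands like these are often scripted. The Python sketch below shows one way to do that, using hypothetical helper names (hadoop_fs and run). It builds the argument list for a hadoop fs sub-command and runs it only when a hadoop executable is found on the PATH, so defining the helpers is harmless on a machine without Hadoop.

```python
import shutil
import subprocess


def hadoop_fs(*args):
    """Build the argument list for a `hadoop fs` sub-command."""
    return ["hadoop", "fs", *args]


def run(argv):
    """Run the command if its executable is on PATH; otherwise return None."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, capture_output=True, text=True, check=False)


du_cmd = hadoop_fs("-du", "-s", "-h", "/user/cloudera/pigjobs/")
rm_cmd = hadoop_fs("-rm", "-r", "/user/cloudera/pigjobs/")
print(" ".join(du_cmd))  # hadoop fs -du -s -h /user/cloudera/pigjobs/
```

Passing the arguments as a list avoids shell quoting problems. Calling run(du_cmd) on a configured client returns the completed process, whose stdout holds the command output.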

Advanced HDFS Commands

Given below are the advanced commands:

Sr.No | HDFS Command Property | HDFS Command
13 | Change file permissions | $ sudo -u hdfs hadoop fs -chmod 777 /user/cloudera/flume/
14 | Set the replication factor for the files under a path and wait for replication to complete | $ hadoop fs -setrep -w 5 /user/cloudera/pigjobs/
15 | Count the number of directories, files, and bytes under HDFS | $ hadoop fs -count hdfs:/
16 | Make the NameNode leave safe mode | $ sudo -u hdfs hdfs dfsadmin -safemode leave
17 | Format a NameNode (this erases the existing file system metadata) | $ hdfs namenode -format
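
Two of the advanced commands take numeric arguments that are worth unpacking: the octal mode given to -chmod and the replication factor given to -setrep. The Python sketch below uses made-up helper names (describe_mode and raw_storage). It decodes a three-digit octal mode into rwx notation and estimates the raw cluster storage used by a file at a given replication factor.

```python
def describe_mode(octal_mode):
    """Translate a three-digit octal mode such as '777' into rwx notation."""
    parts = []
    for digit in octal_mode:
        bits = int(digit, 8)
        parts.append(
            ("r" if bits & 4 else "-")
            + ("w" if bits & 2 else "-")
            + ("x" if bits & 1 else "-")
        )
    return "".join(parts)


def raw_storage(file_size, replication):
    """Raw bytes consumed across the cluster when every block is replicated."""
    return file_size * replication


print(describe_mode("777"))  # rwxrwxrwx
print(describe_mode("750"))  # rwxr-x---
print(raw_storage(1024, 5))  # 5120
```

Mode 777 grants read, write, and execute to owner, group, and others, so it is usually too permissive outside of test clusters. A replication factor of 5 multiplies the storage footprint of the affected files by five.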

Tips and tricks to Use HDFS Commands

1)  We can achieve faster recovery when the cluster node count is higher.

2)  The increase in storage per unit time increases the recovery time.

3)  Namenode hardware has to be very reliable.

4)  Sophisticated monitoring can be achieved through Apache Ambari.

5)  System starvation can be decreased by increasing the reducer count.

© 2022 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.
