Hadoop Admin Interview Questions

Article by Priya Pedamkar

Updated March 1, 2023

Introduction To Hadoop Admin Interview Questions And Answers

So you have finally found your dream job in Hadoop administration but are wondering how to crack the 2023 Hadoop Admin interview and what the probable Hadoop Admin interview questions could be. Every interview is different, and the scope of a job is different too. Keeping this in mind, we have designed the most common Hadoop Admin interview questions and answers to help you succeed in your interview.


The following Hadoop Admin interview questions will help you crack your interview.

Hadoop Admin Interview Questions & Answers

Below are some useful Hadoop Admin interview questions and answers.

1. What is Rack awareness? And why is it necessary?

Answer:
Rack awareness is the practice of distributing data nodes across multiple racks. HDFS follows the rack awareness algorithm to place data blocks. A rack holds multiple servers, and a cluster can span multiple racks. Say a Hadoop cluster is set up with 12 nodes: there could be 3 racks with 4 servers each, all connected so that the 12 nodes form one cluster. While deciding on the rack count, the important point to consider is the replication factor. Suppose 100GB of data flows in every day with a replication factor of 3; then 300GB of data has to reside on the cluster. It is better to have the data replicated across racks: even if an entire rack goes down, a replica will still be available on another rack.
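In practice, rack awareness is enabled by pointing the `net.topology.script.file.name` property in core-site.xml at a script that maps each data node address to a rack path. A minimal sketch of such a script (the IP ranges and rack names here are hypothetical):

```shell
#!/bin/sh
# Hypothetical topology script: Hadoop invokes it with one or more
# datanode IPs/hostnames and expects one rack path per argument.
rack_for() {
  case "$1" in
    10.0.1.*) echo "/rack1" ;;
    10.0.2.*) echo "/rack2" ;;
    10.0.3.*) echo "/rack3" ;;
    *)        echo "/default-rack" ;;
  esac
}

for node in "$@"; do
  rack_for "$node"
done
```

With this script registered, the NameNode knows which nodes share a rack and can place replicas on different racks.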

2. What is the default block size, and how is it defined?

Answer:
128MB. It is defined in hdfs-site.xml and is customizable depending on the volume of the data and the level of access. Say 100GB of data flows in per day and gets segregated and stored across the cluster. How many blocks will that create? 800 blocks (100*1024/128; the 1024 converts GB to MB). There are two ways to set a custom block size.

  1. hadoop fs -D dfs.blocksize=134217728 (the value is in bytes) – sets the block size for files written by that command.
  2. In hdfs-site.xml, set the dfs.blocksize property to the desired size in bytes.

If you change the default size to 512MB because the data volume is huge, the number of blocks generated will be 200 (100*1024/512).
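The block-count arithmetic above can be checked quickly in a shell (pure arithmetic, no cluster needed):

```shell
# 100GB of daily data expressed in MB (1024 converts GB to MB).
data_mb=$((100 * 1024))

# Number of blocks at the default 128MB block size.
echo "128MB blocks: $((data_mb / 128))"   # 800

# Number of blocks after raising the block size to 512MB.
echo "512MB blocks: $((data_mb / 512))"   # 200
```

Fewer, larger blocks mean less NameNode metadata, which is why a larger block size is attractive for very large files.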

3. How do you get a report of the HDFS file system, covering disk availability and the number of active nodes?

Answer:
Command: sudo -u hdfs dfsadmin -report

The report displays the following information:

  1. Configured Capacity – Total capacity available to HDFS
  2. Present Capacity – The space actually available for data storage, after setting aside the space consumed by metadata (such as the fsimage) and non-DFS usage
  3. DFS Remaining – The storage space still available to HDFS for storing more files
  4. DFS Used – The storage space HDFS has already used up
  5. DFS Used% – The same, expressed as a percentage
  6. Under replicated blocks – The number of blocks with fewer replicas than the replication factor
  7. Blocks with corrupt replicas – The number of blocks with at least one corrupted replica
  8. Missing blocks
  9. Missing blocks (with replication factor 1)
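As a quick sketch of how the capacity fields relate to one another (the figures below are illustrative, not from a real cluster):

```shell
# Hypothetical report figures, in GB.
present_capacity_gb=1000
dfs_used_gb=250

# DFS Remaining is what is left of the present capacity.
dfs_remaining_gb=$((present_capacity_gb - dfs_used_gb))
echo "DFS Remaining: ${dfs_remaining_gb}GB"                       # 750GB

# DFS Used% expresses usage against the present capacity.
echo "DFS Used%: $((dfs_used_gb * 100 / present_capacity_gb))%"   # 25%
```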

4. What is Hadoop balancer, and why is it necessary?

Answer:
Data is not always spread across the nodes in the right proportion, meaning each node's utilization might not be balanced. One node might be over-utilized while another is under-utilized. This has a high cost impact when running processes, which end up running heavily on the over-utilized nodes. To solve this, the Hadoop balancer is used to balance the utilization of data across the nodes. Whenever the balancer is executed, data is moved around so that under-utilized nodes get filled up and over-utilized nodes are freed up.
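The balancer accepts a threshold (in percent): it moves blocks until every data node's utilization is within that threshold of the cluster average. A minimal sketch of the check it performs, using hypothetical utilization figures:

```shell
# Hypothetical datanode utilizations (%) and the balancer threshold.
threshold=10
avg=$(( (85 + 40 + 55) / 3 ))   # cluster average utilization = 60%

for util in 85 40 55; do
  diff=$((util - avg))
  [ "$diff" -lt 0 ] && diff=$((-diff))
  if [ "$diff" -gt "$threshold" ]; then
    echo "node at ${util}% is outside the ${threshold}% threshold"
  fi
done
```

On a real cluster, the same idea is triggered with `hdfs balancer -threshold 10`.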

5. What is the difference between Cloudera and Ambari?

Answer:

| Cloudera Manager | Ambari |
| --- | --- |
| Administration tool for Cloudera | Administration tool for Hortonworks |
| Monitors and manages the entire cluster and reports usage and any issues | Monitors and manages the entire cluster and reports usage and any issues |
| Comes with Cloudera's paid service | Open-source |

6. What are the main actions performed by the Hadoop admin?

Answer:
Monitor the health of the cluster – Many application pages have to be monitored while processes are running (Job History Server, YARN Resource Manager, Cloudera Manager/Ambari, depending on the distribution).

Turn on security – SSL or Kerberos.

Tune performance – Hadoop balancer.

Add new data nodes as needed – Infrastructure changes and configurations.

Optionally, restart the MapReduce Job History Server – Sometimes restarting the services helps release cached memory. This is done when the cluster has no running processes.

7. What is Kerberos?

Answer:
It’s an authentication required for each service to sync up to run the process. It is recommended to enable Kerberos.  Since we are dealing with distributed computing, it is always good practice to have encryption while accessing the data and processing it. As each node are connected, and any information passage is across a network. As Hadoop uses Kerberos, passwords not sent across the networks. Instead, passwords are used to compute the encryption keys. The messages are exchanged between the client and the server. In simple terms, Kerberos provides identity to each other (nodes) in a secure manner with the encryption.

Configuration in core-site.xml:
hadoop.security.authentication = kerberos
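Spelled out as core-site.xml entries (the property names are the standard Hadoop ones; enabling authorization alongside authentication is a common pairing, shown here as a sketch):

```xml
<!-- core-site.xml: switch authentication from "simple" to Kerberos. -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```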

8. What are the important HDFS commands?

Answer:

| Command | Purpose |
| --- | --- |
| hdfs dfs -ls <hdfs path> | List the files in the HDFS filesystem |
| hdfs dfs -put <local file> <hdfs folder> | Copy a file from the local filesystem to HDFS |
| hdfs dfs -chmod 777 <hdfs file> | Give read, write, and execute permissions on the file |
| hdfs dfs -get <hdfs folder/file> <local filesystem> | Copy a file from HDFS to the local filesystem |
| hdfs dfs -cat <hdfs file> | View the file content from HDFS |
| hdfs dfs -rm <hdfs file> | Remove the file from HDFS (it is moved to the trash path, like a recycle bin in Windows) |
| hdfs dfs -rm -skipTrash <hdfs file> | Remove the file permanently from the cluster |
| hdfs dfs -touchz <hdfs file> | Create an empty file in HDFS |

9. How do you check the logs of a Hadoop job submitted in the cluster, and how do you terminate an already running process?

Answer:
yarn logs -applicationId <application_id> – The application master generates logs on its container, appended with the id it generates. This is helpful for monitoring the running status of a process and its log information.

yarn application -kill <application_id> – If a process running in the cluster needs to be terminated, the kill command is used with the application id to terminate the job in the cluster.

Recommended Articles

This has been a guide to the list of Hadoop Admin interview questions and answers. Here we have listed the 9 most useful sets of interview questions so that the jobseeker can crack the interview with ease. You may also look at the following articles to learn more.

  1. Hadoop Cluster Interview Questions and Answers – Top 10 Most Useful
  2. Data Modeling Interview Questions – 10 Important Questions
  3. Hadoop Versions
  4. Hadoop Administrator Jobs
