EDUCBA Logo

EDUCBA

MENUMENU
  • Explore
    • EDUCBA Pro
    • PRO Bundles
    • All Courses
    • All Specializations
  • Blog
  • Enterprise
  • Free Courses
  • All Courses
  • All Specializations
  • Log in
  • Sign Up
Home Data Science Data Science Tutorials Hadoop Tutorial Hadoop Configuration
 

Hadoop Configuration

Updated March 6, 2023

Hadoop Configuration

 

 

Introduction to Hadoop Configuration

In the Hadoop stack, we are having the multiple services in it like HDFS, Yarn, Oozie, MapReduce, Spark, Atlas, Ranger, Zeppelin, Kafka, NiFi, Hive, HBase, etc. Every service is having its own functionality and working methodology. As we have said the working methodology is different than the configuration is also different for the different services. Before doing the Hadoop configuration, we need to take care of the operating system configuration also. In the Hadoop ecosystem, Hadoop configuration will come in the second part. In the primary part, we need to tune up the operating system configuration and make the operating system up to the mark. So, the operating system will able to handle the load of the Hadoop ecosystem.

Watch our Demo Courses and Videos

Valuation, Hadoop, Excel, Mobile Apps, Web Development & many more.

Syntax of Hadoop Configuration:

As such, there is no specific syntax available for the Hadoop /. Generally, we are using the number of services on it. As per the requirement or need, we will install the Hadoop services and configure the parameters. In the Hadoop stack, most of the time we will do the configuration form the UI level only. But for the troubleshooting or some different configuration, we will also use the CLI.

Different Hadoop Configuration

Given below are the different Hadoop Configuration:

1. Hadoop Configuration: HDFS

In the Hadoop environment, the Hadoop configuration command is very common. It is using very widely. It will help us to list out the number of files on the HDFS level.

Configuration Properties:

  • client.https.need-auth: It will help to check whether SSL client certificate authentication is required or not for the client and server communication.
  • client.cached.conn.retry: From the cache, the value will define the number of times the HDFS client will able to get a socket. If the number of socket try will be exceeded, the HDFS client will try to create new socket.
  • https.server.keystore.resource: It will be the same resource file from which we will get the SSL server Keystore evidence will be extracted.
  • client.https.keystore.resource: It will be the same resource file from which we will get the SSL server Keystore evidence will be extracted in terms of HTTPS communication.
  • datanode.https.address: It will be the configuration parameter of the datanode secure HTTPS server address and port information.
  • namenode.https-address: It will be the configuration parameter of the namenode secure https server address and port information.
  • qjournal.queued-edits.limit.mb: In terms of the quorum journal edits, it will help to define the queue size. The value will be defined in MB.
  • qjournal.select-input-streams.timeout.ms: In terms of the journal managers, it is the timeout value for accepting streams. The value would be in milliseconds.
  • qjournal.start-segment.timeout.mb: This configuration value will help to define the quorum timeout. The value will be in milliseconds.

Configuration Screenshot:

Hadoop Configuration 1

2. Hadoop Configuration: Yarn

The yarn is very important in terms of resource allocation to the jobs that are running in the Hadoop ecosystem.

Configuration Properties:

  • resource-types: It will be the addition of the resources. We need to define by Comma separated value. It will not include the configuration parameters like memory (in Mb or GB) or vcores values.
  • resource-types.<resource>.units: It will be the default unit for the specified resource type in the yarn config.
  • resource-types.<resource>.minimum-allocation: We can set the value for the minimum request for the definite resource type.
  • resource-types.<resource>.maximum-allocation: We can set the value for the maximum request for the definite resource type.
  • app.mapreduce.am.resource.mb: It will help to configure the memory requested for the application master container. The value will be in MB. The defaults of the configuration are 1536.
  • app.mapreduce.am.resource.memory: It will help to configure the memory requested for the application master container. The value will be in MB. The defaults of the configuration are 1536.
  • app.mapreduce.am.resource.memory-mb: It will help to set the memory requested for the application master. The value will be in MB. The defaults of the configuration are 1536.
  • app.mapreduce.am.resource.cpu-vcores: It will help to configure the CPU requested for the application master container to the value. The value will be in CPU count. The defaults of the configuration are 1.

Configuration Screenshot:

Hadoop Configuration 2

3. Hadoop Configuration: Oozie

In Hadoop, we are using the Oozie service to schedule the Hadoop level job.

Configuration Properties:

  • CATALINA_OPTS: It will help to setup the tomcat server. It will help to run the oozie java configuration or properties. The oozie properties will provided in term of the variable. There is no default value for this configuration.
  • OOZIE_CONFIG_FILE: With the help of this configuration property, we will load the oozie configuration file in the system. The value of this configuration is oozie-site.xml.
  • OOZIE_LOGS: It will help to store the oozie logs information to the specific directory. While installing the oozie server, it will define the value by its own.
  • OOZIE_LOG4J_FILE: With the help of this configuration property, we will load the oozie Log4J configuration file in the system. The value of this configuration is oozie-log4j.properties.
  • OOZIE_LOG4J_RELOAD: It will help to reload the Log4J configuration file on a specific interval of time. The value will be in seconds. The default value is 10.
  • OOZIE_HTTP_PORT: It will help to define the port for the Oozie server. The default port value is 11000.
  • OOZIE_ADMIN_PORT: It will help to define the admin port for the Oozie server. The default port value is 11001.
  • OOZIE_HTTP_HOSTNAME: It will help to define the hostname on which the Oozie server runs. As per the Hadoop architecture, we need to define the specific host or node for it.
  • OOZIE_BASE_URL: It will help to define the base URL for call-back actions URLs to Oozie server. The default configuration value for the property is http://$ {oozie server host name} : $ {oozie server HTTP port} / oozie.

Configuration Screenshot:

Oozie

Conclusion

We have seen the uncut concept of “Hadoop Configuration” with the proper example, explanation and configuration output. In terms of service level tuning, the Hadoop configuration is very important. While doing the Hadoop configuration, we need to consider the other Hadoop level services also. Before doing the Hadoop configuration, we need to tune the operating system as well.

Recommended Articles

This is a guide to Hadoop Configuration. Here we discuss the introduction and different Hadoop configurations respectively. You may also have a look at the following articles to learn more –

  1. Hadoop Versions
  2. Hadoop Commands
  3. What is Yarn in Hadoop?
  4. Hadoop Administrator
Primary Sidebar
Footer
Follow us!
  • EDUCBA FacebookEDUCBA TwitterEDUCBA LinkedINEDUCBA Instagram
  • EDUCBA YoutubeEDUCBA CourseraEDUCBA Udemy
APPS
EDUCBA Android AppEDUCBA iOS App
Blog
  • Blog
  • Free Tutorials
  • About us
  • Contact us
  • Log in
Courses
  • Enterprise Solutions
  • Free Courses
  • Explore Programs
  • All Courses
  • All in One Bundles
  • Sign up
Email
  • [email protected]

ISO 10004:2018 & ISO 9001:2015 Certified

© 2025 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you
EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you
EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you

Loading . . .
Quiz
Question:

Answer:

Quiz Result
Total QuestionsCorrect AnswersWrong AnswersPercentage

Explore 1000+ varieties of Mock tests View more

EDUCBA
Free Data Science Course

Hadoop, Data Science, Statistics & others

By continuing above step, you agree to our Terms of Use and Privacy Policy.
*Please provide your correct email id. Login details for this Free course will be emailed to you
EDUCBA Login

Forgot Password?

🚀 Limited Time Offer! - 🎁 ENROLL NOW