Hadoop Configuration

Introduction to Hadoop Configuration

The Hadoop stack contains many services, such as HDFS, YARN, Oozie, MapReduce, Spark, Atlas, Ranger, Zeppelin, Kafka, NiFi, Hive, and HBase. Each service has its own function and way of working, so each service is also configured differently. Hadoop configuration is the second step, though. First, we need to tune the operating system configuration so that the operating system can handle the load of the Hadoop ecosystem.

Syntax of Hadoop Configuration:

There is no single syntax for Hadoop configuration, because a Hadoop cluster runs a number of services. We install the services we need and configure their parameters as required. In the Hadoop stack, most of the configuration is done from the UI. For troubleshooting or for less common configuration, we also use the CLI.
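
When working outside the UI, it helps to know that Hadoop services read their settings from XML files made of `<property>` entries, each with a `<name>` and a `<value>`. The following is a minimal sketch of reading such a file with Python's standard library. The file contents, host name, and values are assumptions made up for illustration, and `parse_site_xml` is a hypothetical helper, not part of Hadoop.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of a *-site.xml file; the values are illustrative only.
SAMPLE_SITE_XML = """<configuration>
  <property>
    <name>dfs.client.cached.conn.retry</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.https-address</name>
    <value>nn1.example.com:9871</value>
  </property>
</configuration>
"""


def parse_site_xml(xml_text):
    """Return a dict that maps each property name to its value (as strings)."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


site_props = parse_site_xml(SAMPLE_SITE_XML)
for key in sorted(site_props):
    print(f"{key} = {site_props[key]}")
```

A script like this can be handy during troubleshooting to confirm what a configuration file on a node actually contains.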

Different Hadoop Configuration

Given below are the different Hadoop configurations:

1. Hadoop Configuration: HDFS

HDFS is the storage layer of the Hadoop environment, and its configuration is one of the most commonly adjusted. The properties below control how HDFS clients, NameNodes, DataNodes, and the quorum journal communicate.

Configuration Properties:

  • dfs.client.https.need-auth: Defines whether SSL client certificate authentication is required for client and server communication.
  • dfs.client.cached.conn.retry: The number of times the HDFS client will pull a socket from the cache. Once this number is exceeded, the HDFS client will try to create a new socket.
  • dfs.https.server.keystore.resource: The resource file from which the SSL server keystore information is extracted.
  • dfs.client.https.keystore.resource: The resource file from which the SSL client keystore information is extracted for HTTPS communication.
  • dfs.datanode.https.address: The address and port of the DataNode secure HTTPS server.
  • dfs.namenode.https-address: The address and port of the NameNode secure HTTPS server.
  • dfs.qjournal.queued-edits.limit.mb: The queue size for quorum journal edits. The value is in MB.
  • dfs.qjournal.select-input-streams.timeout.ms: The timeout for accepting streams from the journal managers. The value is in milliseconds.
  • dfs.qjournal.start-segment.timeout.ms: The quorum timeout for starting a log segment. The value is in milliseconds.
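
Properties like these are normally stored in `hdfs-site.xml` as `<property>` entries. As a rough sketch of what such a fragment looks like, the Python snippet below renders a few of the properties above into that XML layout. The values and the host name are assumptions chosen for illustration only, not recommendations, and `build_site_xml` is a hypothetical helper.

```python
from xml.sax.saxutils import escape

# Illustrative values only; they are assumptions, not tuning advice.
hdfs_settings = {
    "dfs.client.https.need-auth": "false",
    "dfs.namenode.https-address": "nn1.example.com:9871",
    "dfs.qjournal.queued-edits.limit.mb": "10",
    "dfs.qjournal.start-segment.timeout.ms": "20000",
}


def build_site_xml(settings):
    """Render a dict of property names and values as a Hadoop-style XML fragment."""
    lines = ["<configuration>"]
    for name, value in settings.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


hdfs_site_xml = build_site_xml(hdfs_settings)
print(hdfs_site_xml)
```

On a cluster managed through a UI, the management tool usually writes this file for you, so a hand-built fragment is mainly useful for review or for comparing against what is deployed.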

Configuration Screenshot:

Hadoop Configuration 1

2. Hadoop Configuration: Yarn

YARN is responsible for allocating resources to the jobs that run in the Hadoop ecosystem.

Configuration Properties:

  • yarn.resource-types: A comma-separated list of additional resources. It must not include the built-in resources, that is, memory (in MB or GB) or vcores.
  • yarn.resource-types.<resource>.units: The default unit for the specified resource type.
  • yarn.resource-types.<resource>.minimum-allocation: The minimum request for the specified resource type.
  • yarn.resource-types.<resource>.maximum-allocation: The maximum request for the specified resource type.
  • yarn.app.mapreduce.am.resource.mb: The memory requested for the application master container. The value is in MB. The default is 1536.
  • yarn.app.mapreduce.am.resource.memory: The memory requested for the application master container. The value is in MB. The default is 1536.
  • yarn.app.mapreduce.am.resource.memory-mb: The memory requested for the application master container. The value is in MB. The default is 1536.
  • yarn.app.mapreduce.am.resource.cpu-vcores: The CPU requested for the application master container. The value is a number of virtual cores. The default is 1.
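
To illustrate how the per-resource property names are formed, here is a small Python sketch that expands a description of custom resources into the corresponding `yarn.resource-types` properties. The resource names `resource1` and `resource2`, their values, and the helper `resource_type_properties` are all hypothetical. The check against built-in names reflects the rule that memory and vcores are not listed as additional resources.

```python
# Names treated as built-in in this sketch; they are not valid additional resources.
BUILT_IN_RESOURCES = {"memory", "memory-mb", "vcores"}


def resource_type_properties(resources):
    """Expand {resource_name: {"units"|"minimum-allocation"|"maximum-allocation": value}}
    into a flat dict of YARN property names and string values."""
    for name in resources:
        if name in BUILT_IN_RESOURCES:
            raise ValueError(f"{name!r} is built in and cannot be an additional resource")

    props = {"yarn.resource-types": ",".join(resources)}
    for name, spec in resources.items():
        for suffix in ("units", "minimum-allocation", "maximum-allocation"):
            if suffix in spec:
                props[f"yarn.resource-types.{name}.{suffix}"] = str(spec[suffix])
    return props


# Hypothetical resources with illustrative values.
yarn_props = resource_type_properties({
    "resource1": {"units": "G"},
    "resource2": {"minimum-allocation": 1, "maximum-allocation": 1024},
})
for key, value in yarn_props.items():
    print(f"{key} = {value}")
```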

Configuration Screenshot:

Hadoop Configuration 2

3. Hadoop Configuration: Oozie

In Hadoop, the Oozie service is used to schedule Hadoop jobs.

Configuration Properties:

  • CATALINA_OPTS: Settings for the Tomcat server that runs Oozie. Java system properties for Oozie are supplied through this variable. There is no default value.
  • OOZIE_CONFIG_FILE: The Oozie configuration file to load. The default value is oozie-site.xml.
  • OOZIE_LOGS: The directory in which the Oozie logs are stored. The Oozie server installation sets this value on its own.
  • OOZIE_LOG4J_FILE: The Oozie Log4J configuration file to load. The default value is oozie-log4j.properties.
  • OOZIE_LOG4J_RELOAD: The interval at which the Log4J configuration file is reloaded. The value is in seconds. The default value is 10.
  • OOZIE_HTTP_PORT: The port on which the Oozie server listens. The default value is 11000.
  • OOZIE_ADMIN_PORT: The admin port of the Oozie server. The default value is 11001.
  • OOZIE_HTTP_HOSTNAME: The hostname on which the Oozie server runs. As per the Hadoop architecture, we need to define the specific host or node for it.
  • OOZIE_BASE_URL: The base URL for the callback URLs sent to the Oozie server. The default value is http://${OOZIE_HTTP_HOSTNAME}:${OOZIE_HTTP_PORT}/oozie.
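
The way the default base URL is assembled from the hostname and port variables can be sketched in a few lines of Python. This is only an illustration of the defaulting logic described above, not Oozie's own code. The helper `oozie_base_url`, the `localhost` fallback, and the example host name are assumptions.

```python
def oozie_base_url(env):
    """Resolve the Oozie base URL from a mapping of environment variables.

    An explicit OOZIE_BASE_URL wins. Otherwise the URL is built from the
    hostname and HTTP port, with the port defaulting to 11000.
    """
    if "OOZIE_BASE_URL" in env:
        return env["OOZIE_BASE_URL"]
    # "localhost" is an assumed fallback for this sketch only.
    host = env.get("OOZIE_HTTP_HOSTNAME", "localhost")
    port = env.get("OOZIE_HTTP_PORT", "11000")
    return f"http://{host}:{port}/oozie"


# Hypothetical host name, default port.
print(oozie_base_url({"OOZIE_HTTP_HOSTNAME": "oozie.example.com"}))
# Hypothetical host name with a custom port.
print(oozie_base_url({"OOZIE_HTTP_HOSTNAME": "oozie.example.com",
                      "OOZIE_HTTP_PORT": "12000"}))
```

In a real deployment the mapping would typically come from the process environment, for example by passing `os.environ` to the function.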

Configuration Screenshot:

Oozie

Conclusion

We have covered Hadoop configuration for HDFS, YARN, and Oozie, along with the main configuration properties of each service. Hadoop configuration is very important for service-level tuning. While configuring one service, we also need to consider the other Hadoop services. Before doing the Hadoop configuration, we need to tune the operating system as well.
