EDUCBA

Hadoop Versions

By Priyanka Banerjee

What are Hadoop Versions?

Hadoop is open-source software that stores data on a distributed network rather than a centralized one and processes that data in parallel. This makes Hadoop one of the most reliable batch-processing engines, combined with a layered storage and resource management system. As the data being stored and processed grows in complexity, Hadoop evolves with it: its developers release new versions that fix bugs and simplify complex data processing. Hadoop development follows a trunk (base code) and branch (fix) model. This article covers two main versions in detail, a) Hadoop 1.x (Version 1) and b) Hadoop 2 (Version 2), and then looks at Version 3.

The Two Hadoop Versions

Below are the two Hadoop Versions:

  • Hadoop 1.x (Version 1)
  • Hadoop 2 (Version 2)

1. Hadoop 1.x

Below are the main characteristics and components of Hadoop 1.x:

1. The Hadoop Common module is a JAR file that acts as the base API on top of which all the other components work.

2. Being the first version to come into existence, Version 1 is rock solid and no longer receives new updates.

3. It limits scaling to a maximum of 4,000 nodes per cluster.

4. Its processing capacity is limited by the slot concept: each slot is configured to run either a map task or a reduce task, not both.

5. The next component is the Hadoop Distributed File System, commonly known as HDFS. It is a distributed storage system designed to handle large data, using a block size of 64 megabytes (64 MB). It is further divided into two components:

  • NameNode, which runs on the master node and stores the metadata about the DataNodes: details about the slave nodes, the index of the stored data and its locations, and timestamps that keep a timeline of changes.
  • DataNodes, which run on the slave nodes and store the data of the applications in use.

6. Hadoop 1 uses the MapReduce (MR) data processing model and cannot support non-MR tools.

MR has two components:

  • The JobTracker assigns MapReduce tasks to TaskTrackers running on the cluster nodes, and reassigns them if a task fails or a node shuts down. It also maintains a log of each TaskTracker's status.
  • The TaskTracker executes the tasks allocated to it by the JobTracker and sends status reports on those tasks back to the JobTracker.

7. The cluster network is formed by organizing the master node and the slave nodes. The cluster is further divided into racks, each containing a set of commodity computers (nodes).

8. Whenever the Hadoop system receives a large storage operation for a big data set, the data is divided into manageable, organized blocks that are distributed across different nodes.
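
The storage and processing model in points 4 to 8 can be sketched as a small Python simulation. This is not Hadoop code: the function names split_into_blocks, place_blocks, and assign_map_tasks, the round-robin block placement, and the two map slots per TaskTracker are hypothetical choices made for illustration. It cuts a file into 64 MB blocks, spreads them over DataNodes, and then hands one map task per block to TaskTrackers that each have a fixed number of map slots, preferring the node that holds the block. Real HDFS also replicates blocks and takes racks into account; both are left out here.

```python
from __future__ import annotations

from itertools import cycle

BLOCK_SIZE_MB = 64  # Hadoop 1.x HDFS block size, as given in point 5


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> list[int]:
    """Cut a file into fixed-size blocks; the last block may be smaller."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


def place_blocks(num_blocks: int, datanodes: list[str]) -> dict[int, str]:
    """Hypothetical round-robin placement of block IDs onto DataNodes."""
    nodes = cycle(datanodes)
    return {block_id: next(nodes) for block_id in range(num_blocks)}


def assign_map_tasks(
    placement: dict[int, str], map_slots: dict[str, int]
) -> tuple[dict[str, list[int]], list[int]]:
    """JobTracker-style scheduling sketch: one map task per block, preferring
    the TaskTracker on the node that stores the block, limited by each node's
    fixed number of map slots. Tasks with no free slot wait for a later round."""
    free = dict(map_slots)
    assigned: dict[str, list[int]] = {node: [] for node in map_slots}
    waiting: list[int] = []
    for block_id, node in placement.items():
        if free.get(node, 0) > 0:
            target = node  # data-local: run the task where the block lives
        else:
            # otherwise use any TaskTracker that still has a free map slot
            target = next((n for n, slots in free.items() if slots > 0), None)
        if target is None:
            waiting.append(block_id)
            continue
        free[target] -= 1
        assigned[target].append(block_id)
    return assigned, waiting


# Usage: a 300 MB file on a two-node cluster with 2 map slots per TaskTracker
blocks = split_into_blocks(300)  # [64, 64, 64, 64, 44]
placement = place_blocks(len(blocks), ["dn1", "dn2"])
assigned, waiting = assign_map_tasks(placement, {"dn1": 2, "dn2": 2})
print(assigned)  # {'dn1': [0, 2], 'dn2': [1, 3]}
print(waiting)   # [4]
```

Block 4 has to wait for a map slot to free up. Because slots are fixed to one task type (point 4), idle reduce slots on either node could not pick up that map task either.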

2. Hadoop Version 2

Version 2 of Hadoop was released to address the shortcomings users faced with Version 1. The main improvements in the new version are:

  • HDFS Federation provides horizontal scalability for the NameNode by allowing multiple NameNodes, each managing part of the file system namespace. In Version 1 the NameNode was a single point of failure; Version 2 removes this by supporting NameNode high availability. The Hadoop stack has also grown to include tools such as Hive and Pig, and the platform is now well equipped to handle NameNode failures.
  • YARN (Yet Another Resource Negotiator) adds the ability to process terabyte- and petabyte-scale data stored in HDFS using applications that are not MapReduce based, such as MPI and Apache Giraph.
  • Version 2.7.x, released on 31 May 2018: This update focused on two major functions, a per-application ApplicationMaster and a global ResourceManager, improving Hadoop's overall utility and versatility and increasing scalability to up to 10,000 nodes per cluster.
  • Version 2.8.x, released in September 2018: Improvements include the Capacity Scheduler, designed to provide multi-tenancy support for processing data on Hadoop. Hadoop was also made available to Windows users, to increase its adoption across the industry for solving big data problems.
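
To contrast YARN with the Hadoop 1 slot model, here is a minimal Python sketch of a global ResourceManager granting memory-sized containers to applications of any type (MapReduce, MPI, Giraph). It is a toy model, not the YARN API: the class names, the memory-only resource accounting, and the first-fit node choice are assumptions made for illustration. Real YARN schedulers, such as the Capacity Scheduler mentioned above, also account for CPU vcores, queues, and priorities.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """One worker node; YARN tracks its free resources instead of fixed slots."""
    name: str
    free_memory_mb: int


@dataclass
class Container:
    app_id: str
    node: str
    memory_mb: int


@dataclass
class ResourceManager:
    """Hypothetical global ResourceManager using first-fit placement."""
    nodes: list[NodeManager]
    granted: list[Container] = field(default_factory=list)

    def request(self, app_id: str, memory_mb: int) -> Container | None:
        # Any application type may ask; the RM only sees a resource amount.
        for node in self.nodes:
            if node.free_memory_mb >= memory_mb:
                node.free_memory_mb -= memory_mb
                container = Container(app_id, node.name, memory_mb)
                self.granted.append(container)
                return container
        return None  # no node has enough free memory; the request must wait

    def release(self, container: Container) -> None:
        # Return the container's memory to its node and forget the grant.
        for node in self.nodes:
            if node.name == container.node:
                node.free_memory_mb += container.memory_mb
        self.granted.remove(container)


rm = ResourceManager([NodeManager("nm1", 4096), NodeManager("nm2", 2048)])
mr = rm.request("mapreduce-job", 3072)   # placed on nm1
giraph = rm.request("giraph-job", 2048)  # nm1 has only 1024 MB left, so nm2
mpi = rm.request("mpi-job", 2048)        # None: no node has 2048 MB free
rm.release(mr)
mpi = rm.request("mpi-job", 2048)        # now fits on nm1
```

Because resources are handed out as generic amounts rather than map or reduce slots, the same pool serves MapReduce and non-MapReduce applications alike.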

3. Hadoop Version 3

Below is the latest Hadoop version at the time of writing:

Version 3.1.x, released on 21 October 2019: This update enables Hadoop to serve as a platform for a large share of data analytics functions and utilities, including event processing alongside real-time operations, for better results.

  • It builds on the container concept, which enables Hadoop to run generic workloads that were not possible with Version 1.
  • The latest version, 3.2.1, released on 22 September 2019, addresses earlier limitations: the lack of multi-tenancy support on DataNodes, the restriction to MapReduce-only processing, and, most significantly, the need for alternative data storage for real-time processing and graphical analysis.
  • With the ever-increasing avalanche of data, the Big Data Analytics market for business was estimated at 169 billion US dollars, with predicted growth to 274 billion dollars by 2022, so the market is growing rapidly.
  • This calls all the more for an integrated system that can handle the abundant data growing day by day. Hadoop is a strong solution for storing, processing, and accessing this heterogeneous data, whether structured or unstructured, in an organized manner.
  • Constant updates fix the bugs developers encounter while using Hadoop, and each improved version widens its scope of application and its flexibility. This increases the chances that Hadoop remains a leading tool for all functions related to big data processing and analytics.

© 2022 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.
