EDUCBA

Hadoop Streaming

By Priya Pedamkar


What is Hadoop Streaming?

Hadoop Streaming is a utility that ships with the Hadoop distribution. It lets us create and run MapReduce jobs with any executable or script as the mapper and/or reducer, so the job logic can be written in languages such as Java, Perl, Python, or Scala, or with Unix shell commands. The word streaming refers to how data moves through the standard input and standard output streams of those programs. The jobs themselves are batch MapReduce jobs over big data stored in the cluster, not a real-time data feed.

Understanding

Hadoop Streaming is a Java utility provided with the Hadoop distribution and packaged as a JAR file. With it, we can create and run MapReduce jobs whose mapper and reducer functions are executable scripts. The scripts are passed to Hadoop Streaming as command-line options. The utility then creates a MapReduce job, submits it to the cluster, and monitors its progress until it completes.

How does it Work?

The scripts specified as mapper and reducer work as follows.

When the mapper is initialized, each mapper task launches the script as a separate process, so every task has its own instance with its own process ID. While the task runs, it converts its input into lines and feeds them to the standard input of the process. At the same time, it collects the lines the process writes to standard output and converts each one into a key-value pair. These pairs form the output of the mapper. By default, the key and value are split at the first tab character: the part of the line up to the first tab is the key, and the rest of the line (excluding the tab) is the value. If a line contains no tab, the whole line is the key and the value is empty. This default can be adjusted according to business needs. A reducer script is handled the same way: each reducer task launches it as a separate process, feeds it the sorted key-value pairs as lines on standard input, and collects the lines it writes to standard output as the output of the reducer.
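
The default key-value rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not Hadoop's own code: the function name `split_key_value` is hypothetical, and `None` is used here to stand for "no value".

```python
def split_key_value(line, separator="\t"):
    """Split one line of mapper output at the first separator.

    Hypothetical helper for illustration only; Hadoop performs this
    step internally when it reads the script's standard output.
    """
    line = line.rstrip("\n")
    key, found, value = line.partition(separator)
    if not found:
        # No tab in the line: the whole line is the key, with no value.
        return line, None
    return key, value


if __name__ == "__main__":
    for sample in ["apple\t3", "a\tb\tc", "no-tab-here"]:
        print(split_key_value(sample))
```

Only the first tab matters: in `a<TAB>b<TAB>c` the key is `a` and the value keeps the second tab.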


Purpose

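The purpose of Hadoop Streaming is to let any program that reads from standard input and writes to standard output act as the mapper or reducer of a MapReduce job. These jobs run in batches over data already stored in the cluster. Real-time apps such as watching stock portfolios, share market analysis, weather reports, and traffic alerts need continuous ingestion and processing, which is handled by tools such as Kafka, Flume, and Spark. Hadoop Streaming can then be used to analyze the collected data in batches.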

Working of Hadoop Streaming

Below is a simple example of how it works:


$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper org.apache.hadoop.mapred.lib.IdentityMapper \
-reducer /bin/wc

The -input option specifies the input directory, while the -output option specifies the output directory. The -mapper option specifies the mapper and the -reducer option specifies the reducer; each can be either an executable or a Java class. In this example, the mapper is the Java class IdentityMapper, which passes its input through unchanged, and the reducer is the Unix executable /bin/wc.
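
To make the role of each option explicit, the command can also be assembled programmatically. The sketch below only builds the argument list and prints it; nothing is executed. The function name `build_streaming_command` and the install location `/opt/hadoop` are assumptions for illustration, and the JAR path simply mirrors the example above (its real location varies between Hadoop versions).

```python
import posixpath
import shlex


def build_streaming_command(hadoop_home, input_dir, output_dir, mapper, reducer):
    """Return the argument list for a streaming job (nothing is executed).

    Hypothetical helper; it uses only the options shown in the example.
    """
    return [
        posixpath.join(hadoop_home, "bin", "hadoop"),
        "jar",
        posixpath.join(hadoop_home, "hadoop-streaming.jar"),
        "-input", input_dir,      # where the job reads its data
        "-output", output_dir,    # where the job writes its results
        "-mapper", mapper,        # executable or Java class for the map step
        "-reducer", reducer,      # executable or Java class for the reduce step
    ]


if __name__ == "__main__":
    argv = build_streaming_command(
        "/opt/hadoop",  # assumed install location, adjust as needed
        "myInputDirs",
        "myOutputDir",
        "org.apache.hadoop.mapred.lib.IdentityMapper",
        "/bin/wc",
    )
    print(shlex.join(argv))
```

The resulting list could be handed to `subprocess.run` on a machine where Hadoop is installed.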

Advantages

Below are the advantages explained:


1. Availability

Hadoop Streaming ships with Hadoop, so it doesn’t require any separate software to be installed and managed. Other tools, such as Pig and Hive, have to be installed and managed separately.

2. Learning

It doesn’t require learning new technologies. It can be leveraged with minimum Unix skills for data analysis.

3. Reduce Development Time

Developing a streaming application only requires writing the mapper and reducer code as scripts in Unix. Doing the same work as a Java MapReduce application is more complex: the code must be compiled, tested, packaged, and exported as a JAR file before it can be run.
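
As an illustration of how little code a streaming job needs, here is a minimal word-count sketch in Python. The function names `map_words` and `reduce_counts` are hypothetical, and both steps are kept in one file so the example runs locally; in a real job each would typically be its own script reading `sys.stdin` and printing to standard output.

```python
from itertools import groupby


def map_words(lines):
    """Mapper: emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_counts(sorted_lines):
    """Reducer: sum the counts of consecutive lines that share a key.

    Relies on the input being sorted by key, which Hadoop guarantees
    for the lines it feeds to a reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Local stand-in for the cluster: map, sort by key, then reduce.
    text = ["to be or not", "to be"]
    for result in reduce_counts(sorted(map_words(text))):
        print(result)
```

There is no compile or packaging step: the scripts can be edited and rerun immediately.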

4. Faster Conversion

It takes very little time to convert data from one format to another using Hadoop Streaming. For example, we can convert data from a text file to a sequence file and back again, among other conversions. This is achieved with the input format and output format options of Hadoop Streaming.

5. Testing

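Because the mapper and reducer are ordinary executables, they can be tested quickly from the Unix shell or a shell script on sample input and output data before the job runs on the cluster.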
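
One way to run such a check is to chain the mapper, a sort, and the reducer on a small sample, in the spirit of a shell pipeline. The sketch below does this from Python with `subprocess`. The helper names and the two one-line stage commands are hypothetical, and the plain line sort only approximates Hadoop's shuffle-and-sort phase.

```python
import subprocess
import sys


def run_stage(command, text):
    """Feed text to a command's stdin and return what it writes to stdout."""
    done = subprocess.run(
        command, input=text, capture_output=True, text=True, check=True
    )
    return done.stdout


def simulate_job(mapper_cmd, reducer_cmd, input_text):
    """Approximate a streaming job locally: mapper, sort, then reducer."""
    mapped = run_stage(mapper_cmd, input_text)
    # A plain line sort stands in for Hadoop's shuffle-and-sort phase.
    shuffled = "".join(line + "\n" for line in sorted(mapped.splitlines()))
    return run_stage(reducer_cmd, shuffled)


# Hypothetical one-line stages so the sketch runs without extra files.
UPPERCASE_MAPPER = [
    sys.executable, "-c",
    "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())",
]
LINE_COUNT_REDUCER = [
    sys.executable, "-c",
    "import sys\nprint(sum(1 for _ in sys.stdin))",
]

if __name__ == "__main__":
    print(simulate_job(UPPERCASE_MAPPER, LINE_COUNT_REDUCER, "b\na\nc\n"))
```

Any executable can be substituted for the two stage commands, for example the paths of the real mapper and reducer scripts.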

6. Requirement for Business

For simple business requirements, such as filtering and aggregation operations, Hadoop Streaming combined with standard Unix utilities is sufficient.

7. Performance

Using this, we can get good performance when processing data with streaming scripts. Hadoop Streaming also has several disadvantages, which are addressed by other tools in the Hadoop ecosystem, such as Kafka, Flume, and Spark.

Why do we need Hadoop Streaming?

It lets us analyze large data sets with MapReduce jobs that run in parallel on a multi-node cluster, which is much faster than processing the data on a single machine, without having to write the jobs in Java. For real-time analysis, it is complemented by other technologies such as Spark and Kafka.

How will this technology help you in career growth?

Nowadays, many major enterprises are moving to Hadoop for their data analysis, and many of them may also require analysis of real-time data. The demand for processing such data is increasing day by day, and this technology is creating a lot of scope for individual career growth.

Conclusion

Hadoop Streaming offers a wide range of advantages for data processing: MapReduce jobs can be written as scripts in almost any language, developed quickly, and tested with ordinary Unix tools.

© 2022 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.
