Sqoop Commands

By Priya Pedamkar


Introduction to Sqoop Commands

In Sqoop, every row is treated as a record, and the tasks are subdivided into subtasks internally by the map task. The databases supported by Sqoop include MySQL, Oracle, IBM DB2, and PostgreSQL. Sqoop provides a simple command line through which we can fetch data from different databases. It is written in Java and uses JDBC to connect to other databases.

Sqoop stands for 'SQL to Hadoop and Hadoop to SQL' and is an open-source tool. It is an application connectivity tool that transfers bulk data between relational database systems and Hadoop (Hive, MapReduce, Mahout, Pig, HBase). It lets users specify a target location inside Hadoop and makes Sqoop move the data from the RDBMS to that target. It provides optimized MySQL connectors that use database-specific APIs to perform bulk transfers efficiently. Users can import data into stores such as Hive or HBase. Sqoop supports two file formats: delimited text file format and sequence file format.


Basic Commands of Sqoop

The basic commands are as explained below:

1. List Table

This command lists the tables of a particular database on the MySQL server.

Example:

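A minimal sketch; the database name and credentials below are placeholder assumptions:

$ sqoop list-tables --connect jdbc:mysql://localhost/inventory --username root --password root123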

2. Target directory

This command imports a table into a specific directory in HDFS. The -m flag denotes the mapper argument and takes an integer value.

Example:

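A sketch using an assumed table name and HDFS path:

$ sqoop import --connect jdbc:mysql://localhost/inventory --username root --password root123 --table lib --target-dir /user/data/lib -m 1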

3. Password Protection

Example:

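Sqoop can prompt for the password interactively with -P, or read it from a file with --password-file, so the password never appears on the command line. A sketch, assuming the password is stored at the HDFS path /user/root/.password:

$ sqoop import --connect jdbc:mysql://localhost/inventory --username root --password-file /user/root/.password --table lib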

4. sqoop-eval

This command quickly runs simple SQL queries against the respective database and prints the results to the console.

Example:

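A sketch with an assumed table name:

$ sqoop eval --connect jdbc:mysql://localhost/inventory --username root --password root123 --query "SELECT * FROM lib LIMIT 10"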

5. sqoop-version

This command displays the installed version of Sqoop.

Example:

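The tool takes no arguments:

$ sqoop version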

6. sqoop-job

This command allows us to create a saved job whose parameters can be invoked at any time. It takes options such as --create, --delete, --show, and --exec.

Example:

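A sketch that saves an import as a named job and runs it later; the job name and connection details are assumptions:

$ sqoop job --create myjob -- import --connect jdbc:mysql://localhost/inventory --username root --table lib

$ sqoop job --exec myjob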

7. Loading CSV file to SQL

Example:

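A CSV file sitting in HDFS can be exported into an RDBMS table with sqoop export. A sketch with placeholder paths and credentials; the comma delimiter is declared with --input-fields-terminated-by:

$ sqoop export --connect jdbc:mysql://localhost/inventory --username root --password root123 --table lib --export-dir /user/data/lib.csv --input-fields-terminated-by ','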

8. Connector

Example:

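Sqoop normally picks a connector based on the JDBC URL, but a specific JDBC driver class can be forced with the --driver option. A sketch; the driver class shown is the standard MySQL one, and the connection details are assumptions:

$ sqoop import --connect jdbc:mysql://localhost/inventory --driver com.mysql.jdbc.Driver --username root --table lib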

9. codegen

This Sqoop command generates Java class files that encapsulate the imported records. All the Java files are recreated, and new versions of the class are generated. The generated code is used to interact with database records, and it retrieves a list of all the columns and their data types.

Example:

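A sketch against an assumed table:

$ sqoop codegen --connect jdbc:mysql://localhost/inventory --username root --table lib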

10. List Databases

This Sqoop command lists all the available databases on the RDBMS server.


Example:

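A sketch; the credentials are placeholders:

$ sqoop list-databases --connect jdbc:mysql://localhost/ --username root --password root123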

Intermediate Commands of Sqoop

The intermediate commands are as follows:

1. sqoop-metastore

This command hosts a shared metadata repository. Multiple and remote users can define and run saved jobs against it.

Command:

$ sqoop metastore

Clients connect by setting the sqoop.metastore.client.autoconnect.url property, for example: jdbc:hsqldb:hsql://metastore.example.com/sqoop

2. sqoop-help

This command lists the tools available in Sqoop and their purpose.

Command:

$ sqoop help

$ bin/sqoop help import

3. Exporting

This command exports data from HDFS to the RDBMS database. In HDFS, data is stored as records.

Command:

$ sqoop export --connect jdbc:mysql://localhost/inventory --username jony --table lib --export-dir /user/jony/inventory

4. Insert

This command inserts new records from HDFS into an RDBMS table.

Command:

$ sqoop export --connect jdbc:mysql://localhost/sqoop_export --table emp_exported --export-dir /sqoop/newemp -m 1

5. Update

This command updates the records in the RDBMS from HDFS data.

Command:

$ sqoop export --connect jdbc:mysql://localhost/sqoop_export --table emp_exported --export-dir /sqoop/newemp -m 1 --update-key id

6. Batch Option

This command inserts multiple rows together; it optimizes the speed of insertion by using the JDBC driver's batch capability.

Command:

$ sqoop export --connect jdbc:mysql://hostname/<db-name> --username <username> --password <password> --export-dir <dir> --batch

7. Split

When this option is set to SPLIT, the WHERE clause is applied to the entire SQL statement; the option belongs to Sqoop's Oracle connector.

Command:

$ sqoop import -D oraoop.table.import.where.clause.location=SPLIT --table JUNK --where "rownum <= 12"

8. AVRO file into HDFS

This stores RDBMS data in HDFS as an Avro data file.

Command:

$ sqoop import --connect jdbc:mysql://localhost/Acadgild --username root --password pp.34 --table payment -m 1 --target-dir /sqoop_data/payment/avro/ --as-avrodatafile

Advanced Commands

The advanced commands are as follows:

1. Import Commands

Import commands support import control arguments; the main ones are listed below, and a combined usage sketch follows the output line arguments:

  • --boundary-query <statement>: boundary query used for creating splits
  • --as-textfile: imports data as plain text
  • --columns <col,col,col>: columns to import from the table
  • -m,--num-mappers <n>: number of parallel map tasks to use for the import
  • --split-by <column>: column of the table used to split work units
  • -z,--compress: enables compression of the data

Incremental import arguments:

  • --check-column <col>: indicates the column used to determine which rows to import
  • --incremental <mode>: imports only new rows (modes: append and lastmodified)

Output Line Arguments

  • --lines-terminated-by <char>: sets the end-of-line character
  • --mysql-delimiters: uses MySQL's default delimiter set (fields: , lines: \n escaped-by: \ optionally-enclosed-by: ')
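A sketch combining several of these arguments; the connection details, table, and column names are placeholder assumptions:

$ sqoop import --connect jdbc:mysql://localhost/inventory --username root --table lib --columns "id,title" --split-by id -m 4 -z --as-textfile --incremental append --check-column id --last-value 100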

2. Import to Hive

--hive-import: imports tables into Hive

--hive-partition-key: name of the Hive partition key to use

--hive-overwrite: overwrites the data in an existing Hive table
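A sketch; the table and connection details are assumptions:

$ sqoop import --connect jdbc:mysql://localhost/inventory --username root --table lib --hive-import --hive-table lib --hive-overwrite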

3. Import to HBase and Accumulo Arguments

The --accumulo-* flags target Accumulo; the analogous HBase flags are --hbase-table and --column-family (a sketch of the HBase variant follows this list).

--accumulo-table <tablename>: specifies the target table in Accumulo.

--accumulo-column-family <family>: sets the target column family for the import.

--accumulo-user <username>: name of the Accumulo user to import as.

--accumulo-password <password>: password of the Accumulo user.
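A sketch of the HBase variant, with assumed table and column family names:

$ sqoop import --connect jdbc:mysql://localhost/inventory --username root --table lib --hbase-table lib --column-family details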

4. Storing in Sequence files

$ sqoop import --connect jdbc:mysql://db.foo.com/emp --table inventory --class-name com.foo.com.Inventory --as-sequencefile

5. Query Import

This command specifies the SQL statement to import with the --query argument; the query must include a $CONDITIONS token, and --split-by is needed when running with multiple mappers.

$ sqoop import --query 'SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE $CONDITIONS' --split-by a.id --target-dir /user/foo/joinresults

6. Incremental Imports

The incremental flags belong to sqoop import:

$ sqoop import --connect <jdbc-uri> --table <table> --username <user> --password <password> --incremental append --check-column <column> --last-value <value>

7. Importing all tables to HDFS

$ sqoop import-all-tables --connect jdbc:mysql://localhost/sale_db --username root

8. Importing data to Hive

$ sqoop import --connect <jdbc-uri> --table <table> --username <user> --password <password> --hive-import --hive-table <hive-table>

9. Importing data to HBase

Command:

$ sqoop import --connect <jdbc-uri> --table <table> --username <user> --password <password> --hbase-table <hbase-table> --column-family <family>

10. Encode null values

Command:

$ sqoop import --connect jdbc:mysql://mysql.ex.com/sqoop --username sqoop --password sqoop --table lib --null-string '\N'

Tips and Tricks

If we want to execute data operations effectively, Sqoop is the tool to use: through a single command line we can perform many tasks and subtasks. Sqoop connects to different relational databases through connectors, which use JDBC drivers to interact with the database. Sqoop 1 is a self-contained client tool, so it runs without a separate server installation. Execution is fast because imports and exports are translated into MapReduce jobs, and the data is processed in parallel.

Conclusion

To conclude, Sqoop regulates the process of importing and exporting data. It provides the facility to update parts of a table through incremental loads. Data import in Sqoop is not event-driven. Sqoop2 adds a GUI for easy access alongside the command line. Data transfer is fast because it happens in parallel. Sqoop plays a vital role in the Hadoop environment, although it is not strictly necessary when importing small data sets.

Recommended Articles

This has been a guide to Sqoop commands. Here we have discussed basic, intermediate, and advanced Sqoop commands along with tips and tricks for using them. You may also look at the following articles to learn more –

  1. Sqoop Interview Questions and Answers
  2. Cheat Sheet SQL (Commands, Free Tips, and Tricks)
  3. Important VBA Commands
  4. Tableau Commands
  5. Sqoop | What is Sqoop? | How Does Sqoop Work?
  6. Learn Import Command in Sqoop with Syntax
