Introduction to Sqoop Commands
Sqoop, short for 'SQL to Hadoop' (and Hadoop to SQL), is an open-source tool. It is a connectivity tool that transfers bulk data between relational database systems and Hadoop (Hive, MapReduce, Mahout, Pig, HBase). Users specify a target location inside Hadoop, and Sqoop moves the data from the RDBMS to that target. Sqoop provides optimized connectors, such as the MySQL connector, that use database-specific APIs to perform bulk transfers efficiently. Users can also import data from external sources directly into Hive or HBase. Sqoop supports two basic file formats: delimited text files and SequenceFiles.
Sqoop treats every row as a record, and internally the work is subdivided into subtasks that run as map tasks. Databases supported by Sqoop include MySQL, Oracle, IBM DB2, and PostgreSQL. Sqoop provides a simple command-line interface through which we can fetch data from different databases. Sqoop is written in Java and uses JDBC to connect to databases.
Here are the basic Sqoop commands:
sqoop list-tables:
This command lists the tables in a particular database on the MySQL server.
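The invocation can be sketched in Python as an argument list that is ready to hand to subprocess.run. This is a minimal sketch assuming Sqoop 1 is on the PATH and a MySQL database is reachable over JDBC; the helper list_tables_cmd and the values localhost, sale_db and root are hypothetical. The -P flag makes Sqoop prompt for the password instead of reading it from the command line.

```python
import shlex


def list_tables_cmd(host: str, database: str, user: str) -> list[str]:
    """Hypothetical helper: build argv for `sqoop list-tables` on a MySQL database."""
    return [
        "sqoop", "list-tables",
        "--connect", f"jdbc:mysql://{host}/{database}",
        "--username", user,
        "-P",  # prompt for the password instead of passing it on the command line
    ]


if __name__ == "__main__":
    # Print the shell-quoted command; subprocess.run(cmd, check=True) would execute it.
    print(shlex.join(list_tables_cmd("localhost", "sale_db", "root")))
```

Running the script prints sqoop list-tables --connect jdbc:mysql://localhost/sale_db --username root -P.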
sqoop import:
This command imports a table into a specific directory in HDFS. -m (--num-mappers) is the mapper argument; it takes an integer value, the number of map tasks that run in parallel.
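A minimal Python sketch of such an import, assuming a MySQL source; the helper import_cmd and the company, emp and /user/hadoop/emp names are hypothetical. With -m 1 a single map task writes a single output file; with more mappers, Sqoop needs a primary key or a --split-by column to divide the table among them.

```python
import shlex


def import_cmd(jdbc_url: str, table: str, target_dir: str, mappers: int = 1) -> list[str]:
    """Hypothetical helper: argv for `sqoop import` of one table into an HDFS directory."""
    if mappers < 1:
        # -m takes a positive integer: the number of map tasks run in parallel.
        raise ValueError("mappers must be a positive integer")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,  # HDFS directory that receives the imported files
        "-m", str(mappers),
    ]


if __name__ == "__main__":
    print(shlex.join(import_cmd("jdbc:mysql://localhost/company", "emp", "/user/hadoop/emp", 4)))
```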
sqoop eval:
This command quickly runs SQL queries against the respective database and prints the results.
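Building the command as a list sidesteps most shell-quoting mistakes, which matter here because the query contains spaces and *. A minimal sketch; the helper eval_cmd and the company/emp names are hypothetical.

```python
import shlex


def eval_cmd(jdbc_url: str, user: str, query: str) -> list[str]:
    """Hypothetical helper: argv for `sqoop eval`, which runs one SQL statement and prints the result."""
    return ["sqoop", "eval", "--connect", jdbc_url, "--username", user, "-P", "--query", query]


if __name__ == "__main__":
    cmd = eval_cmd("jdbc:mysql://localhost/company", "root", "SELECT * FROM emp LIMIT 3")
    # shlex.join quotes the query so a shell would pass it to Sqoop as one argument.
    print(shlex.join(cmd))
```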
sqoop version:
This command displays the version of Sqoop.
sqoop job:
This command creates a saved job; the parameters saved with it can be invoked at any time. It takes options such as --create, --delete, --show, --list, and --exec.
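In Sqoop 1 a saved job is created by putting a bare -- after the job name and then the full tool invocation. The sketch below shows that layout plus the other actions; the helpers create_job_cmd and job_action_cmd and the job name emp_import are hypothetical.

```python
from typing import Optional


def create_job_cmd(name: str, tool_args: list[str]) -> list[str]:
    """Hypothetical helper: `sqoop job --create NAME -- TOOL ARGS...` saves a tool invocation."""
    # Everything after the bare "--" is the tool invocation stored under the job name.
    return ["sqoop", "job", "--create", name, "--", *tool_args]


def job_action_cmd(action: str, name: Optional[str] = None) -> list[str]:
    """Hypothetical helper for the other job actions: --list, --show, --exec, --delete."""
    if action == "list":
        return ["sqoop", "job", "--list"]
    if action in ("show", "exec", "delete"):
        if not name:
            raise ValueError(f"--{action} needs a job name")
        return ["sqoop", "job", f"--{action}", name]
    raise ValueError(f"unknown job action: {action}")


if __name__ == "__main__":
    print(create_job_cmd("emp_import", ["import", "--connect", "jdbc:mysql://localhost/company", "--table", "emp"]))
    print(job_action_cmd("exec", "emp_import"))
```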
Loading CSV file to SQL:
sqoop codegen:
This Sqoop command generates Java class files that encapsulate the imported records, along with the code to interact with the database records. If the Java source is lost, it can be recreated, and new versions of the class can be generated. To do this, it retrieves a list of all the columns and their data types.
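A minimal sketch of a codegen call that names the generated class (--class-name) and the directory for the generated source (--outdir). The helper codegen_cmd and its simple Java-name check are hypothetical additions, not part of Sqoop.

```python
import re

# A Java class name: dot-separated identifiers, e.g. com.foo.Employee (ASCII-only check).
_JAVA_NAME = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*(?:\.[A-Za-z_$][A-Za-z0-9_$]*)*")


def codegen_cmd(jdbc_url: str, table: str, class_name: str, outdir: str) -> list[str]:
    """Hypothetical helper: argv for `sqoop codegen`, which writes a Java class for the table's records."""
    if not _JAVA_NAME.fullmatch(class_name):
        raise ValueError(f"not a valid Java class name: {class_name!r}")
    return [
        "sqoop", "codegen",
        "--connect", jdbc_url,
        "--table", table,
        "--class-name", class_name,  # name of the generated record class
        "--outdir", outdir,          # local directory for the generated .java source
    ]
```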
sqoop list-databases:
This Sqoop command lists all the available databases on the RDBMS server.
Intermediate Sqoop Commands:
1. sqoop metastore:
This command hosts a shared metadata repository, so multiple and remote users can define and run saved jobs.
Example: jdbc:hsqldb:hsql://metastore.example.com/sqoop
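The server side is started with sqoop metastore; other users then point the sqoop job tool at it with --meta-connect, so saved jobs live in the shared repository instead of each user's local metastore. A minimal sketch using the example URL shown above; the helper shared_job_cmd and the job name nightly_import are hypothetical.

```python
METASTORE_URL = "jdbc:hsqldb:hsql://metastore.example.com/sqoop"  # example URL from the text


def shared_job_cmd(*job_args: str, meta_url: str = METASTORE_URL) -> list[str]:
    """Hypothetical helper: run a `sqoop job` action against the shared metastore."""
    return ["sqoop", "job", "--meta-connect", meta_url, *job_args]


if __name__ == "__main__":
    print(shared_job_cmd("--list"))
    print(shared_job_cmd("--exec", "nightly_import"))
```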
2. sqoop help:
This command lists the tools available in Sqoop and their purpose.
$ sqoop help
$ bin/sqoop help import
3. sqoop export:
This command exports data from HDFS to the RDBMS database. The files in HDFS are read and parsed into records, which are then written to the target table.
$ sqoop export --connect jdbc:mysql://localhost/inventory --username jony --table lib --export-dir /user/jony/inventory
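A minimal sketch of the same export as an argument list, assuming the target table already exists in the database (Sqoop export does not create it). The helper export_cmd is hypothetical; the optional field_sep maps to --input-fields-terminated-by, which tells Sqoop how to split each HDFS line into fields.

```python
from typing import Optional


def export_cmd(jdbc_url: str, user: str, table: str, export_dir: str,
               field_sep: Optional[str] = None) -> list[str]:
    """Hypothetical helper: argv for `sqoop export` from an HDFS directory into a table."""
    argv = ["sqoop", "export", "--connect", jdbc_url, "--username", user, "-P",
            "--table", table,            # target table; it must already exist
            "--export-dir", export_dir]  # HDFS directory holding the files to export
    if field_sep is not None:
        # Field delimiter Sqoop uses when parsing the HDFS files into records.
        argv += ["--input-fields-terminated-by", field_sep]
    return argv
```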
4. Insert:
This command inserts new records from HDFS into an RDBMS table.
$ sqoop export --connect jdbc:mysql://localhost/sqoop_export --table emp_exported --export-dir /sqoop/newemp
5. Update:
This Sqoop command updates existing records in the RDBMS table from HDFS data; --update-key names the column used to match each record to a row.
$ sqoop export --connect jdbc:mysql://localhost/sqoop_export --table emp_exported --export-dir /sqoop/newemp --update-key id
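With --update-key, Sqoop's --update-mode chooses between updateonly (the default) and allowinsert; whether allowinsert is available depends on the connector. The in-memory model below only illustrates the two behaviours and is not Sqoop's implementation: a hypothetical dict keyed by id stands in for the table.

```python
def apply_export(table: dict, records: list[dict], update_key: str = "id",
                 mode: str = "updateonly") -> dict:
    """Illustrative model of export with --update-key: `table` maps key -> row dict."""
    if mode not in ("updateonly", "allowinsert"):
        raise ValueError(f"unknown update mode: {mode}")
    result = {key: dict(row) for key, row in table.items()}  # leave the input untouched
    for record in records:
        key = record[update_key]
        if key in result:
            result[key].update(record)   # like UPDATE ... WHERE id = key
        elif mode == "allowinsert":
            result[key] = dict(record)   # no matching row: insert it
        # updateonly: a record with no matching row changes nothing
    return result
```

In updateonly mode, a record with no matching row is silently skipped rather than treated as an error.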
6. Batch Option:
This option inserts multiple rows together; --batch speeds up insertion by using JDBC batch mode for the underlying statements.
$ sqoop export --connect jdbc:mysql://hostname/<db-name> --username <username> --password <password> --table <table-name> --export-dir <dir> --batch
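The sketch below builds the batch-mode export and, separately, shows the idea behind batching: rows are grouped so that many INSERTs travel to the database in one round trip. The helpers batch_export_cmd and batches and the shop/orders names are hypothetical; the real batch sizes are decided by Sqoop and the JDBC driver, not by this code.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_export_cmd(jdbc_url: str, user: str, table: str, export_dir: str) -> list[str]:
    """Hypothetical helper: argv for an export that uses JDBC batch mode."""
    return ["sqoop", "export", "--connect", jdbc_url, "--username", user, "-P",
            "--table", table, "--export-dir", export_dir, "--batch"]


def batches(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Illustrative only: group rows into lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk
```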
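7. Where clause:
With the Data Connector for Oracle and Hadoop, setting oraoop.table.import.where.clause.location to SPLIT applies the where clause to the entire SQL statement used by each split.
$ sqoop import -D oraoop.table.import.where.clause.location=SPLIT --table JUNK --where "rownum <= 12"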
8. AVRO file into HDFS:
This stores RDBMS data as an Avro data file in HDFS.
$ sqoop import --connect jdbc:mysql://localhost/Acadgild --username root --password pp.34 --table payment -m 1 --target-dir /sqoop_data/payment/avro/ --as-avrodatafile
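The output format is chosen by exactly one --as-* flag. A minimal sketch mapping a format name to its flag; the helper import_as_cmd and the short format names are hypothetical, and --as-parquetfile exists only in newer Sqoop 1.4.x releases, so check your version.

```python
FORMAT_FLAGS = {
    "text": "--as-textfile",          # delimited text, the default
    "sequence": "--as-sequencefile",
    "avro": "--as-avrodatafile",
    "parquet": "--as-parquetfile",    # newer 1.4.x releases only
}


def import_as_cmd(jdbc_url: str, table: str, target_dir: str,
                  fmt: str = "text", mappers: int = 1) -> list[str]:
    """Hypothetical helper: argv for an import that writes the chosen file format."""
    try:
        flag = FORMAT_FLAGS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format {fmt!r}; choose from {sorted(FORMAT_FLAGS)}") from None
    return ["sqoop", "import", "--connect", jdbc_url, "--table", table,
            "-m", str(mappers), "--target-dir", target_dir, flag]
```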
Advanced Sqoop Commands:
Import commands take import control arguments. The various arguments are as follows:
- --boundary-query <statement>: boundary query used for creating splits
- --as-textfile: imports data as plain text (the default)
- --columns <col,col,...>: columns to import from the table
- -m,--num-mappers <n>: uses n map tasks to import in parallel
- --split-by <column-name>: column of the table used to split work units
- -z,--compress: enables compression of the data
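These arguments combine freely on one import command line. A minimal sketch of assembling them; the helper import_control_args is hypothetical, and a boundary query would typically look like SELECT MIN(id), MAX(id) FROM emp.

```python
from typing import Optional, Sequence


def import_control_args(columns: Optional[Sequence[str]] = None,
                        split_by: Optional[str] = None,
                        boundary_query: Optional[str] = None,
                        mappers: Optional[int] = None,
                        compress: bool = False) -> list[str]:
    """Hypothetical helper: build the import control part of a `sqoop import` argv."""
    args: list[str] = []
    if columns:
        args += ["--columns", ",".join(columns)]  # comma-separated, no spaces
    if split_by:
        args += ["--split-by", split_by]
    if boundary_query:
        args += ["--boundary-query", boundary_query]
    if mappers is not None:
        args += ["-m", str(mappers)]
    if compress:
        args.append("-z")  # short form of --compress
    return args
```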
Incremental Import Arguments:
- --check-column (col): specifies the column examined to determine which rows to import
- --incremental (mode): specifies how Sqoop determines which rows are new; legal values are append and lastmodified
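In append mode, only rows whose check column is greater than the previous --last-value are imported. At the end of a run Sqoop prints the value to pass as --last-value next time, and saved jobs record it automatically. The model below illustrates that rule on hypothetical in-memory rows; it is not Sqoop's code.

```python
def incremental_append(rows: list[dict], check_column: str, last_value):
    """Illustrative model of --incremental append: keep rows whose check column exceeds last_value."""
    new_rows = [row for row in rows if row[check_column] > last_value]
    # The largest value seen becomes the --last-value for the next run.
    next_value = max((row[check_column] for row in new_rows), default=last_value)
    return new_rows, next_value
```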
Output Line Formatting Arguments:
- --lines-terminated-by <char>: sets the end-of-line character
- --mysql-delimiters: uses MySQL's default delimiter set (fields: , lines: \n escaped-by: \ optionally-enclosed-by: ')
Import to Hive Arguments:
- --hive-import: imports tables into Hive
- --hive-partition-key: name of the Hive field on which the partitions are sharded
- --hive-overwrite: overwrites the existing data in the Hive table
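A minimal sketch of the Hive part of an import command. --hive-partition-key is normally paired with --hive-partition-value to name the partition being loaded; the helper hive_import_args and the emp/dt values are hypothetical.

```python
from typing import Optional, Tuple


def hive_import_args(hive_table: str, overwrite: bool = False,
                     partition: Optional[Tuple[str, str]] = None) -> list[str]:
    """Hypothetical helper: Hive-related arguments to append to a `sqoop import` argv."""
    args = ["--hive-import", "--hive-table", hive_table]
    if overwrite:
        args.append("--hive-overwrite")  # replace existing data in the Hive table
    if partition is not None:
        key, value = partition
        args += ["--hive-partition-key", key, "--hive-partition-value", value]
    return args
```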
Import to Accumulo Arguments:
- --accumulo-table <table-name>: specifies the target table in Accumulo
- --accumulo-column-family <family>: sets the target column family for the import
- --accumulo-user <username>: name of the Accumulo user to import as
- --accumulo-password <password>: password of the Accumulo user
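Besides the arguments listed above, an Accumulo import also needs to know which Accumulo instance to reach; Sqoop takes that through --accumulo-instance and --accumulo-zookeepers (a comma-separated host:port list). A minimal sketch; the helper accumulo_import_args and all sample values are hypothetical.

```python
from typing import Sequence


def accumulo_import_args(table: str, column_family: str, user: str, password: str,
                         instance: str, zookeepers: Sequence[str]) -> list[str]:
    """Hypothetical helper: Accumulo-related arguments for a `sqoop import` argv."""
    if not zookeepers:
        raise ValueError("at least one ZooKeeper host:port is required")
    return [
        "--accumulo-table", table,
        "--accumulo-column-family", column_family,
        "--accumulo-user", user,
        "--accumulo-password", password,
        "--accumulo-instance", instance,
        "--accumulo-zookeepers", ",".join(zookeepers),  # comma-separated host:port list
    ]
```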
Storing in Sequence files:
$ sqoop import --connect jdbc:mysql://db.foo.com/emp --table inventory --class-name com.foo.com.Inventory --as-sequencefile
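Free-form query import:
This command specifies the SQL statement with the --query argument. The query must contain the $CONDITIONS token, which Sqoop replaces with a range condition for each mapper; --split-by names the column used to divide the work, and --target-dir is required.
$ sqoop import --connect <jdbc-uri> --query 'SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE $CONDITIONS' --split-by a.id --target-dir <target-dir>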
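Conceptually, Sqoop finds the minimum and maximum of the split-by column, divides that range among the mappers, and gives each mapper a copy of the query with $CONDITIONS replaced by its own range. The model below is simplified and assumes an integer column; Sqoop's real splitter computes its boundaries in its own way and also handles other column types. The helpers split_conditions and mapper_queries are hypothetical.

```python
def split_conditions(column: str, lo: int, hi: int, mappers: int) -> list[str]:
    """Simplified model: divide [lo, hi] of an integer split-by column into per-mapper ranges."""
    if mappers < 1 or hi < lo:
        raise ValueError("need mappers >= 1 and lo <= hi")
    span = hi - lo + 1
    n = min(mappers, span)                                  # never more ranges than values
    bounds = [lo + (span * i) // n for i in range(n + 1)]   # bounds[n] == hi + 1
    conds = [f"{column} >= {bounds[i]} AND {column} < {bounds[i + 1]}" for i in range(n - 1)]
    conds.append(f"{column} >= {bounds[n - 1]} AND {column} <= {hi}")  # last range is closed
    return conds


def mapper_queries(query: str, column: str, lo: int, hi: int, mappers: int) -> list[str]:
    """Substitute each range for the $CONDITIONS token, one query per mapper."""
    if "$CONDITIONS" not in query:
        raise ValueError("a free-form query must contain $CONDITIONS")
    return [query.replace("$CONDITIONS", f"({c})")
            for c in split_conditions(column, lo, hi, mappers)]
```

In the shell, the query is single-quoted so that the shell does not try to expand $CONDITIONS itself.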
Incremental import:
$ sqoop import --connect <jdbc-uri> --table <table> --username <user> --password <password> --incremental <mode> --check-column <column> --last-value <value>
Importing all tables to HDFS:
$ sqoop import-all-tables --connect jdbc:mysql://localhost/sale_db --username root
Importing data to Hive:
$ sqoop import --connect <jdbc-uri> --table <table> --username <user> --password <password> --hive-import --hive-table <hive-table>
Importing data to HBase:
$ sqoop import --connect <jdbc-uri> --table <table> --username <user> --password <password> --hbase-table <hbase-table> --column-family <family>
Encode null values:
$ sqoop import --connect jdbc:mysql://mysql.ex.com/sqoop --username sqoop --password sqoop --table lib --null-string '<null-representation>'
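When neither --null-string nor --null-non-string is given, Sqoop writes the string null for NULL values in text imports. Hive's text format reads \N as NULL, so '\\N' is a common choice for both. The model below only illustrates which option applies to which column; the helper encode_row and the column flags are hypothetical.

```python
from typing import Optional, Sequence


def encode_row(values: Sequence[Optional[object]], is_string: Sequence[bool],
               null_string: str = "null", null_non_string: str = "null",
               sep: str = ",") -> str:
    """Illustrative model: render one record as a delimited text line, substituting NULLs."""
    if len(values) != len(is_string):
        raise ValueError("values and is_string must have the same length")
    fields = []
    for value, string_col in zip(values, is_string):
        if value is None:
            # --null-string covers string columns, --null-non-string everything else.
            fields.append(null_string if string_col else null_non_string)
        else:
            fields.append(str(value))
    return sep.join(fields)
```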
Tips and Tricks for Using Sqoop Commands:
If we want to execute data operations effectively, we should use Sqoop: a single command line can perform many tasks and subtasks. Sqoop connects to different relational databases through connectors, which use JDBC drivers to interact with them. Because Sqoop 1 runs as a client-side tool, it does not need a separate server installation on the cluster. Sqoop is efficient because it executes transfers in parallel: imports and exports run as MapReduce jobs, which split the work across map tasks.
Conclusion – Sqoop Commands :
To conclude, Sqoop commands streamline the process of importing and exporting data. Sqoop can update parts of a table through incremental loads. Data import in Sqoop is not event-driven. Sqoop2 adds a GUI for easy access alongside the command line. Data transfer is fast because Sqoop transfers data in parallel. Sqoop plays a vital role in the Hadoop environment; it does its job on its own, though it is not necessary when importing small data sets.
This has been a guide to Sqoop commands. Here we have discussed basic, intermediate, and advanced Sqoop commands.