EDUCBA

EDUCBA

MENUMENU
  • Free Tutorials
  • Free Courses
  • Certification Courses
  • 360+ Courses All in One Bundle
  • Login
Home Data Science Data Science Tutorials Hive Tutorial External Table in Hive
Secondary Sidebar
Hive Tutorial
  • Basics
    • Hive JDBC Driver
    • What is a Hive
    • Hive Architecture
    • Hive Installation
    • How To Install Hive
    • Hive Versions
    • Hive Commands
    • Hive Data Types
    • Hive Built-in Functions
    • Hive Function
    • Hive String Functions
    • Date Functions in Hive
    • Hive Table
    • Hive Drop Table
    • Hive Show Tables
    • Hive Group By
    • Hive Order By
    • Hive Cluster By
    • Joins in Hive
    • Hive Inner Join
    • Map Join in Hive
    • Hive nvl
    • Hive UDF
    • Dynamic Partitioning in Hive
    • HiveQL
    • HiveQL Queries
    • HiveQL Group By
    • Partitioning in Hive
    • Bucketing in Hive
    • Views in Hive
    • Indexes in Hive
    • External Table in Hive
    • Hive TimeStamp
    • Hive Database
    • Hive Interview Questions
    • Hive insert into

Related Courses

Hive Certification Course

Hadoop Course Training

All in One Data Science Course

External Table in Hive

By Priya PedamkarPriya Pedamkar

External Table in Hive

Introduction to External Table in Hive

An external table is a table that describes the schema or metadata of external files. Fundamentally, there are two types of tables in HIVE – Managed or Internal tables and external tables. The primary purpose of defining an external table is to access and execute queries on data stored outside the Hive. These data files may be stored in other tools like Pig, Azure storage Volumes (ASV) or any remote HDFS location. Hive assumes that it has no ownership of the data for external tables, and thus, it does not require to manage the data as in managed or internal tables. Also, for external tables, data is not deleted on dropping the table.

Working and Creating External tables in Hive

By default, in Hive table directory is created under the database directory. The exception is the default database. The location user/hive/warehouse does not have a directory so that the default database tables will have its directory directly created under this location. It is recommended to create external tables if we don’t want to use the default location.

An external table is generally used when data is located outside the Hive.

Let us create an external table using the keyword “EXTERNAL” with the below command.

Start Your Free Data Science Course

Hadoop, Data Science, Statistics & others

CREATE EXTERNAL TABLE if not exists students
(
Roll_id   Int,
Class      Int,
Name     String,
Rank      Int)
Row format delimited fields terminated by ‘,’
Location ‘/data/students_details’;

Output:

External Table in Hive

An external table can also be created by copying the schema and data of an existing table, with below command:

CREATE EXTERNAL TABLE if not exists students_v2 LIKE students
Location ‘/data/students_details’;

If we omit the EXTERNAL keyword, then the new table created will be external if the base table is external. Similarly, if the base table is managed with the external keyword, the new table created will be external.

To identify the type of table created, the DESCRIBE FORMATTED clause can be used. At the end of the detailed table description output table type will either be “Managed table” or “External table”.

Partitioned tables help in dividing the data into logical sub-segments or partitions, making query performance more efficient. A partitioned table can be created as seen below.

CREATE EXTERNAL TABLE if not exists students
( roll_id  Int,
name      String,
Rank      Int)
partitioned by (class Int)
Row format delimited fields terminated by ‘\t’

Output:

CREATE EXTERNAL TABLE

There May Be Instances when Partition or Structure of An External Table Is Changed, Then by Using This Command the Metadata Information Can Be Refreshed:

MSCK REPAIR TABLE external_table_name

1. Partitioned external table

While creating a non-partitioned external table, the LOCATION clause is required. But for a partitioned external table, it is not required. ALTER TABLE statement is required to add partitions along with the LOCATION clause.

ALTER TABLE students ADD PARTITION (class =10)
Location ‘here://master_server/data/log_messages/2012/01/02’;

From Hive v0.8.0 onwards, multiple partitions can be added in the same query. Also, the location for a partition can be changed by below query, without moving or deleting the data from the old location.

ALTER TABLE students_v2 partition( class = 10)
Set location ‘s2n://buckets/students_v2/10’;

To drop a partition, below query is used:

ALTER TABLE students DROP IF EXISTS PARTITION (class = 12);

This command will delete the data and metadata of the partition for managed or internal tables. However, for external tables, data is not deleted.

2. Operations on the external table

The operations like SELECT, JOINS, ORDER BY, GROUP BY, CLUSTER BY, and others are implemented on external tables. External tables can be easily joined with other tables to carry out complex data manipulations. Concepts of Partitioning, bucketing and indexing are also implemented on external tables in the same way as for managed or internal tables. All the configuration properties in Hive are applicable to external tables also. For example, by setting skip.header.line. Count = 1, we can skip the header row from the data file.

All File formats like ORC, AVRO, TEXTFILE, SEQUENCE FILE, or PARQUET are supported for Hive’s internal and external tables. On creating a table, positional mapping is used to insert data into the column, and that order is maintained.

Datatypes in external tables: In external tables, the collection data types are also supported along with primitive data types (like integer, string, character). It is necessary to specify the delimiters of the elements of collection data types (like an array, struct, and map).

When to use External Tables in Hive?

Generally, internal tables are created in Hive. But for certain scenarios, an external table can be helpful. These are:

  1. When data is placed outside the Hive or HDFS location, creating an external table helps as the other tools that may be using the table, places no lock on these files.
  2. An external table can be created when data is not present in any existing table (i.e., using the SELECT clause).
  3. The external table must be created if we don’t want Hive to own the data or have other data controls.
  4. The external table also prevents any accidental loss of data, as on dropping an external table, the base data is not deleted. This acts as a security feature in the Hive. This is the reason why TRUNCATE will also not work for external tables.

Features

There are certain features in Hive which are available only for either managed or external tables. These are:

  • Commands like ARCHIVE/UNARCHIVE/TRUNCATE/CONCATENATE/MERGE works only for internal tables.
  • The ACID works only for managed or internal tables.
  • DROP clause will delete only metadata for external tables. However, it deletes underlying data also for internal tables.
  • Query results caching is possible only for managed tables.
  • RELY constraint is allowed on external tables only.
  • Some features of materialized views work only for managed tables.

Conclusion

In this tutorial, we saw when and how to use external tables in Hive. The highlights of this tutorial are to create a background on the tables other than managed and analyzing data outside the Hive.

Recommended Articles

This is a guide to External Table in Hive. Here we discuss introducing External Tables in the Hive and the Features, and Queries. You can also go through our other related articles to learn more –

  1. Partitioning in Hive
  2. Map Join in Hive
  3. Hive Data Types
  4. Hive Installation
  5. Guide to Hive Cluster By
Popular Course in this category
Hive Training (2 Courses, 5+ Projects)
  2 Online Courses |  5 Hands-on Projects |  25+ Hours |  Verifiable Certificate of Completion
4.5
Price

View Course

Related Courses

All in One Data Science Bundle(360+ Courses, 50+ projects)
Python TutorialMachine LearningAWSArtificial Intelligence
TableauR ProgrammingPowerBIDeep Learning
Price
View Courses
360+ Online Courses | 50+ projects | 1500+ Hours | Verifiable Certificates | Lifetime Access
4.7 (86,408 ratings)
Hadoop Training Program (20 Courses, 14+ Projects, 4 Quizzes)4.9
All in One Data Science Bundle (360+ Courses, 50+ projects)4.8
0 Shares
Share
Tweet
Share
Primary Sidebar
Footer
About Us
  • Blog
  • Who is EDUCBA?
  • Sign Up
  • Live Classes
  • Corporate Training
  • Certificate from Top Institutions
  • Contact Us
  • Verifiable Certificate
  • Reviews
  • Terms and Conditions
  • Privacy Policy
  •  
Apps
  • iPhone & iPad
  • Android
Resources
  • Free Courses
  • Database Management
  • Machine Learning
  • All Tutorials
Certification Courses
  • All Courses
  • Data Science Course - All in One Bundle
  • Machine Learning Course
  • Hadoop Certification Training
  • Cloud Computing Training Course
  • R Programming Course
  • AWS Training Course
  • SAS Training Course

ISO 10004:2018 & ISO 9001:2015 Certified

© 2022 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.

EDUCBA
Free Data Science Course

SPSS, Data visualization with Python, Matplotlib Library, Seaborn Package

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA Login

Forgot Password?

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA
Free Data Science Course

Hadoop, Data Science, Statistics & others

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

Let’s Get Started

By signing up, you agree to our Terms of Use and Privacy Policy.

This website or its third-party tools use cookies, which are necessary to its functioning and required to achieve the purposes illustrated in the cookie policy. By closing this banner, scrolling this page, clicking a link or continuing to browse otherwise, you agree to our Privacy Policy

Loading . . .
Quiz
Question:

Answer:

Quiz Result
Total QuestionsCorrect AnswersWrong AnswersPercentage

Explore 1000+ varieties of Mock tests View more