Introduction to Views in Apache Hive
Apache Hive supports the features of views, which are logical constructs and treated the same as tables. It makes it possible to save the query, on which all DML (Data Manipulation Language) commands can be performed. It is similarly used as views in SQL, however, views are created based on user requirement. In Hive, the query which is referencing the view is executed first and then the result obtained is used in the rest of the query. Also, the Hive’s query planner helps in the execution plan.
As Apache Hive supports array types, along with other primitive data types, LATERAL VIEW can also be created. The major use of this type of view is to convert or expand the arrays into rows using EXPLODE or INLINE function. Let us take an example for creating a LATERAL VIEW.
SELECT ph.* FROM (SELECT 0) b
LATERAL VIEW explode ( array(98992233442, 98993344556)) ph as ph_number
The output of this query as executed on HUE editor is:
Why Do We Need Hive Views?
Let us know the reason why we need hive views:
- One of the major reasons for using it is to reduce the complexity of a query. There are many times when long or complicated queries become difficult to manage or develop on because of too many nested queries. By turning the nested queries into views, end users can encapsulate the complexity of a query and create re-usable parts. This concept is similar to functions in any programming language or a layered design concept in software.
- Views are also widely used to filter or restrict data from a table based on the value of one or more columns. The use case of restricted data is when we don’t want the end-user to see all the information in the base table. So, we restrict the data by using the “where” clause on a table and store it as a view. Changes done in the underlying table would not be reflected in the view, however, the underlying table must be present otherwise the view will fail.
- Along with the primitive data types, the Hive also supports data types like maps, arrays, and struct. This gives Hive an ability to consider a field as a map, rather than fixed columns. So, by using it, a base table can be divided into multiple logical constructs or tables.
- Conceptually, it is evident that the Hive first executes the views and then uses its results to evaluate or execute the query. This can be better explained with an example that is, if the view and the query both have a LIMIT clause, where view says LIMIT 200 and query says LIMIT 400, then we will get at most 200 results.
- It also facilitates creating partitioned views even if the underlying table is not partitioned.ALTER VIEW students ADD/ DROP PARTITION [ if not exists] partition_spec is one way. Whereas, for creating a partitioned view, the command used is CREATE VIEW…PARTITIONED ON, while for creating a partitioned table, the command is CREATE TABLE…PARTITION BY. Metastore does not store the partition location or partition column storage descriptors as no data is stored for a hive view partition.
Examples for Creating Views in Hive
Let us take an example of creating a view that brings in the college students’ details attending the “English” class. So, first, we will create a students table as below:
1. Creating Table Students
Select * from students
Select * from mark_sheet
2. Creating view English_class
CREATE VIEW IF NOT EXISTS English_class AS
SELECT * FROM students AS a
INNER JOIN mark_sheet AS b
ON a.roll_id = b.roll_id
WHERE upper(b.subject) = “ENGLISH”;
Select * from English_class;
This view can then be used to solve the more complex problem of returning the university rank of students who have attended the “English” class.
SELECT * FROM university_rnklist AS rank
JOIN English_class AS b
ON rank.roll_id = b.roll_id
WHERE rank <=100;
The clauses which can be used on VIEWS in the same way as they are used on TABLES are
- IF NOT EXISTS and COMMENT clause are used in the same way as in tables. The name of a view must be unique, it cannot be the same as any table or database or view’s name.
- CREATE TABLE…LIKE clause can be used to copy a view into another.
CREATE TABLE English_class2 LIKE English_class
- SHOW TABLES is used to show both tables and views. There is nothing like SHOW VIEWS in Hive.
- DESCRIBE and DESCRIBE EXTENDED statements can be used for views in a similar way as for tables, however, for DESCRIBE EXTENDED, the detailed table information has a variable named typeable which has value = ‘virtual view’ for views.
- EXTERNAL and LOCATION clause also works for views.
- WHERE, ORDER BY, SORT BY and LIMIT clause can be used on views.
- ALTER VIEW works on views to alter the metadata of a view. Being read-only, INSERT INTO or LOAD INTO will not work on views.
3. Deleting a View
In order to delete a view created by you, use the below command.
DROP VIEW IF EXISTS English_class;
DROP TABLE command cannot be used to drop a view. IF EXISTS clause works in a similar fashion as for tables.
Advantages of Views
Given below are the advantages expressed.
- It helps in reducing the complexity of a query. For example, if we need only 5 columns from a table of 50 columns, we can create a view.
- It also helps in creating staging or intermediate tables which can be used to further create queries.
- Views are read-only and thus commands like INSERT or LOAD INTO cannot be used for a view, thus it helps in maintaining the integrity of base tables.
The view is a standard Relational database management concept, which is similar to that in SQL. It does not support materialized views, but this view has an advantage like other features of Apache Hive. The major difference between a view and a table is that view does not store data, it is just a logical construct. With the advantage of reducing the complexity of nested queries in Hive, views are widely used.
This is a guide to Views in Hive. Here we discuss the basic concept and why do we need it along with advantages and examples for creating view in hive. You may also have a look at the following articles to learn more –