Updated March 21, 2023
Introduction to Hive Built-in Functions
Functions are built to perform different analytical requirements and operations like mathematical, logical, arithmetic and relational, on huge datasets and tables. Functions are used when we have to re-use similar operations multiple times. In this advanced BigData world, to handle and process huge datasets, writing MapReduce programs becomes difficult. So, Hive Query Language (HQL) was introduced on top of Apache Hadoop in October 2010, using which we can easily run queries, similar to SQLs. HQLs are further internally converted into Mapreduce programs to yield the outputs. In this topic, we are going to discuss Hive Built-in Functions.
Hive functions are broadly categorized into two :
- Built-in Functions
- User-Defined Functions
Built-In Functions of Hive
Built-in Functions are predefined and readily available to use in Hive. They are categorized as below:
1. String Functions
Used for string manipulations and transformation.
Function Name | Return Type | Description | Example |
substr(string, int<pos>, int<len>) | String | It returns a string from the specified start position to specified length | substr(‘Hive query ‘,5,5) results in ‘query’ |
rtrim(string X) | String | It returns a string without right leading spaces | rtrim(‘Hello ‘) results in ‘Hello’ |
ltrim(string X) | String | It returns a string without left leading spaces | ltrim(‘ Hello’) results in ‘Hello’ |
reverse(string X) | String | It returns a reversed string of X | reverse(‘hello’) results in ‘olleh’ |
rpad(string X, int <len>, string <‘pad’>) | String | It returns a right padded string with ‘pad’ to a length of <len> | rpad(‘Hello ‘,3,’hi ‘) results in ‘Hello hi hi hi ‘ |
lpad(string X, int <len>, string <‘pad’>) | String | It returns a left padded string with ‘pad’ to a length of <len> | lpad(‘Hello’,3,’ Hi ‘) results in ‘ Hi Hi Hi Hello’ |
split(string X,string pat) | Array | It returns an array of strings after splitting the string around the specified pattern | split(“A,B,C,D”,”,”) results in [‘A’,B’,’C’,’D’] |
length(string X) | Integer | It returns length of string X | length(‘Bigdata’) results in 7 |
concat(string X,String Y) | String | It returns a concatenated string of both X and Y | concat(‘Hello’,’World’) results in ‘HelloWorld’ |
2. Mathematical Functions
Used for applying mathematical operations like rounding, ceiling, flooring, etc. instead of creating UDFs.
Function Name | Return Type | Description | Example |
round(double X) | Double | It returns the rounded off value of X | round(29.5) results in 30 |
round(double X,Int d) | Double | It returns the rounded off value of X to d decimal places | round(29.3456,2) results in 29.34 |
ceil(double X) | Double | It returns a minimum BIGINT value that is equal to or greater than X | ceil(20.5555) results in 21 |
floor(double X) | Double | It returns a maximum BIGINT value that is equal to or greater than X | ceil(20.5555) results in 20 |
rand() | Double | It returns random numbers between 0 to 9 | rand() results in rand(0-9) |
abs(double X) | Double | It returns the absolute value of the number X | abs(-25) results in 25 |
pow(double X,double Y) | Double | It returns X value raised to the Y power | pow(2,3) results in 8 |
exp(double X) | Double | It returns the exponent value of X | exp(2)results in 7.389 |
3. Conditional Functions
Used to check an expression for True or False and returns corresponding results.
Function Name | Return Type | Description | Example |
isnull(X) | Boolean | It returns TRUE if X is NULL else false | isnull(‘NULL’) returns in TRUE |
isnotnull(X) | Boolean | It returns TRUE if X is not NULL else false | isnotnull(‘NULL’) returns in FALSE |
nvl(arg X,arg Y) | String | It returns arg Y if arg X is NULL else return arg X | nvl(‘NULL’,’Value is Null’) results in ‘Value is Null’ |
if(Boolean testCondition, T valueTrue, T valueFalseOrNull) | T | It returns True when test condition is true and False or Null otherwise | if (2=2,’True’,’False’) |
coalesce(X,Y) | (Any) | It returns the first non-null values from the list{I have a doubt on this—will it return the very first value or all non null values?} | coalesce(null,null,null,1,5,null,null,6) results in 1 |
4. Date Functions
Used for performing date manipulations and date type conversions.
Function Name | Return Type | Description | Example |
current_date | Date | It returns current date of query execution | current_date() returns today’s date |
unix_timestamp() | BigInt | It returns current unix timestamp of query execution in seconds | unix_timestamp() returns current unix timestamp |
year(string date) | Int | It returns year part of a date | year(‘1994-11-24’) returns 1994 |
quarter(string date) | Int | It returns the quarter of the year of a date | quarter(‘1994-11-24’) returns 4 |
month(string date) | Int | It returns the month part of a date | month(‘1994-11-24’) returns 11 |
hour(string date) | Int | It returns the hour part of the timestamp | hour(‘1994-11-24 12:45:23’) returns 12 |
minute(string date) | Int | It returns the minute part of the timestamp | year(‘1994-11-24 12:45:23’) returns 45 |
to_date() | String | It returns date part of timestamp string | to_date() returns date part |
date_sub(string date, int days) | String | It returns subtraction of number of days to the date | date_sub(‘1994-11-24’,20) returns ‘1994-11-04’ |
months_between(date1,date2) | Double | It returns the number of months between date1 and date2{months or no of days?} | months_between(‘1994-11-24′,’1994-11-04’) returns 20 |
5. Collection Function
Used for transforming and retrieving parts of collection types like map, array, etc.
Function Name | Return Type | Description | Example |
size(map<k,v>) | Int | It returns the total number of elements in the map | size([‘a’:2,’b’:6]) returns 2 |
size(array<A>) | Int | It returns the total number of elements in the array | size([1,3,4,7]) returns 4 |
array_contains(array<A>,value) | Boolean | It returns true if the array contains the value | array_contains([1,3,9,8,5,4,7],5) returns TRUE |
map_keys(map<k,v>) | Array<k> | It returns an unordered array of keys of the map | map_keys([‘a’:2,’b’:6,’c’:1]) returns [‘b’,’c’,’a’] |
map_valuess(map<k,v>) | Array<v> | It returns an unordered array of values of the map | map_keys([‘a’:2,’b’:6,’c’:1]) returns [1,6,2] |
sort_array(array<A>) | Array<A> | It returns a sorted array of the given array | sort_array([1,3,9,8,5,4,7]) returns [1,3,4,5,7,8,9] |
Conclusion
So far we have discussed different built-in functions in Hive. Compared to MapReduce, Hive is more convenient and time-saving. Anyone who has basic SQL knowledge can easily write HQLs, rather than complicated MapReduce programs for data processing.
Recommended Articles
This is a guide to Hive Built-in Functions. Here we discuss the Built-in Functions which are predefined and readily available to use in Hive. You may also look at the following article to learn more –