EDUCBA

EDUCBA

MENUMENU
  • Free Tutorials
  • Free Courses
  • Certification Courses
  • 360+ Courses All in One Bundle
  • Login

Pig Data Types

By Priya PedamkarPriya Pedamkar

Home » Data Science » Data Science Tutorials » Data Analytics Basics » Pig Data Types

Pig Data Types

Introduction to Pig Data Types

Apache Hadoop is a file system it stores data but to perform data processing we need SQL like language which can manipulate data or perform complex data transformation as per our requirement this manipulation of data can be achieved by Apache PIG. It is a high-level scripting language like SQL used with Hadoop and is called as Pig Latin. Pig Data Types works with structured or unstructured data and it is translated into number of MapReduce job run on Hadoop cluster.

To understand Operators in Pig Latin we must understand Pig Data Types. Any data loaded in pig has certain structure and schema using structure of the processed data pig data types makes data model. Data model get defined when data is loaded and to understand structure data goes through a mapping. pig can handle any data due to SQL like structure it works well with Single value structure and nested hierarchical datastructure.

Start Your Free Data Science Course

Hadoop, Data Science, Statistics & others

Pig’s Data Types

Since, pig Latin works well with single or nested data structure. Its data type can be broken into two categories:

Scalar/Primitive Types: Contain single value and simple data types.

ComplexTypes: Contains otherNested/Hierarchical data types.

Scalar Data Types

Pig’s scalar data types are also called as primitive datatypes, this is a simple data types that appears in programming languages.

Simple Type Description and Size Example
Int 32 -bit signed integer 26
Long 64 -bit signed integer 26L or 26l
Float 32 -bit floating point 26.5F or 26.5f or 26.5e2f or 26.5E2F
Double 64 -bit floating point 26.5 or 26.5e2
Chararray Character array (string) in Unicode UTF-8 format Welcome To educba
Bytearray Blob (Byte array)
Boolean Boolean True/False
Datetime Date-time 1970-01-01T00:00:00.000+00:00
Biginteger Java BigIneger 67499292089
bigdecimal Java BIgDecimal 150.76376256272893883

All datatypes are represented in java.lang classes except byte arrays. Default datatype is byte array in pig if type is not assigned. If schema is given in load statement, load function will apply schema and if data and datatype is different than loader will load Null values or generate error. Explicit casting is not supported like cast chararray to float. Pig does not support list or set type to store an items.

Complex Data Types

Complex datatypes are also termed as collection datatype. There are 3 complex datatypes:

Popular Course in this category
Data Scientist Training (76 Courses, 60+ Projects)76 Online Courses | 60 Hands-on Projects | 632+ Hours | Verifiable Certificate of Completion | Lifetime Access
4.8 (9,019 ratings)
Course Price

View Course

Related Courses
Machine Learning Training (17 Courses, 27+ Projects)Cloud Computing Training (18 Courses, 5+ Projects)
  • MAP
  • TUPLES
  • BAGS
1. MAP

Map is set of key-value pair data element mapping. Data in key-value pair can be of any type, including complex type.

Key: Index to find an element, key should be unique and must be an chararray.
Value: Any type of data can be stored in value and each key has certain dataassociated with it.Map are formed using bracket and a hash between key and values.Commas to separate more than one key-value pair.

Code:

['keyname'#'valuename']

Code:

['resource'#'EDUCBA', 'year'#2019]

Explanation: Above example creates a Map withKeys as : ‘resource’ and ‘year’ andValue as :EDUCBA and 2019

2. TUPLE

Tuple is an fixed length, ordered collection of fields formed by grouping scalar datatypes. It is similar to ROW in SQL table with field representing sql columns.
fields need not to be of same datatypes and we can refer to the field by its position as it is ordered.Tuple may or may not have schema provided with it for representing each fields type and name.

Fields: Can be of any type, field is just single/piece of data. Tuple is enclosed in parenthesis

Code:

(field[,fields....])

Code:

(EduCba,2019,kabir,1,2)

3. BAGS

A bag is an unordered collection of non-unique tuples. Bag may or may not have schema associated with it and schema is flexible as each tuple can have a number of fields with any type.Bag is used to store collection when grouping and bag do not need to fit into memory it can spill bags to disks if needed.

Bag is constructed using braces and tuples are separated by commas.

Code:

{tuple,[,tuple...]}

Code:

{('Hadoop',2.7),('Hive','1.13'),('Spark',2.0)}

Null Values: A null value is a non-existent or unknown value and any type of data can null. Pig treats null value the same as SQL. Pig gets Null values if data is missing or error occurred during the processing of data. Also, null can be used as a placeholder for optional values.

Examples to Implement Pig Data Types

Below are the examples:

Example #1

Sample Data:

Pig Data Types1

Load Data:

DATA = LOAD ‘/user/educba/data’ AS (M:map []);
DESCRIBE DATA;

Pig Data Types2

Viewing The Data:

DUMP DATA;

Pig Data Types3

Example #2

Sample Data:

Tuple Example

Load Data:

DATA= LOAD ‘/user/educba/data_tuple’ AS((F:tuple(f1:int,f2:int,f3:int),T:tuple(t1:chararray,t2:int));
DESCRIBE DATA;

Pig Data Types5

Viewing The Data:

DUMP DATA;

Pig Data Types6

Example #3

Sample Data:

Bag Example1

Load Data:

DATA_BAG= LOAD ‘/user/educba/data_bag’ AS (B: bag {T: tuple(t1:int, t2:int, t3:int)});
DESCRIBE DATA_BAG;

Bag Example2

Viewing The Data:

DUMP DATA_BAG;

Bag Example3

Conclusion

Apache pig is a part of the Hadoop ecosystem which supports SQL like structure and also It supports data types used in SQL which are represented in java.lang classes. Because of complex data types pig is used for tasks involving structured and unstructured data processing. Yahoo uses around 40% of their jobs for search as Pig extract the data, perform operations, and dumps data in the HDFS file system.

Recommended Articles

This is a guide to Pig Data Types. Here we discuss the introduction to Pig Data Types along with complex data types and examples for better understanding. You can also go through our other related articles to learn more –

  1. Pig Architecture
  2. Big Data Architecture
  3. Pig Commands
  4. Apache PIG Interview Questions

All in One Data Science Bundle (360+ Courses, 50+ projects)

360+ Online Courses

50+ projects

1500+ Hours

Verifiable Certificates

Lifetime Access

Learn More

0 Shares
Share
Tweet
Share
Primary Sidebar
Data Analytics Basics
  • Basics
    • What is Natural Language Processing
    • What Is Apache
    • What is Business Intelligence
    • Predictive Modeling
    • What is NoSQL Database
    • Types of NoSQL Databases
    • What is Cluster Computing
    • Uses of Salesforce
    • The Beginners Guide to Startup Analytics
    • Analytics Software is Hiding From You
    • Real Time Analytics
    • Lean Analytics
    • Important Elements of Mudbox Software
    • Business Intelligence Tools (Benefits)
    • Mechatronics Projects
    • Know about A Business Analyst
    • Flexbox Essentials For Beginners
    • Predictive Analytics Tool
    • Data Modeling Tools (Free)
    • Modern Data Integration
    • Crowd Sourcing Data
    • Build a Data Supply Chain
    • What is Minitab
    • Sqoop Commands
    • Pig Commands
    • What is Apache Flink
    • What is Predictive Analytics
    • What is Business Analytics
    • What is Pig
    • What is Fuzzy Logic
    • What is Apache Tomcat
    • Talend Data Integration
    • Talend Open Studio
    • How MapReduce Works
    • Types of Data Model
    • Test Data Generation
    • Apache Flume
    • NoSQL Data Models
    • Advantages of NoSQL
    • What is Juypter Notebook
    • What is CentOS
    • What is MuleSoft
    • MapReduce Algorithms
    • What is Dropbox
    • Pandas.Dropna()
    • Salesforce IoT Cloud
    • Talend Tools
    • Data Integration Tool
    • Career in Business Analytics
    • Marketing Analytics For Dummies
    • Risk Analytics Helps in Risk management
    • Salesforce Certification
    • Tips to Become Certified Salesforce Admin
    • Customer Analytics Techniques
    • What is Data Engineering?
    • Business Analysis Tools
    • Business Analytics Techniques
    • Smart City Application
    • COBOL Data Types
    • Business Intelligence Dashboard
    • What is MDM?
    • What is Logstash?
    • CAP Theorem
    • Pig Architecture
    • Pig Data Types
    • KMP Algorithm
    • What is Metadata?
    • Data Modelling Tools
    • Sqoop Import
    • Apache Solr
    • What is Impala?
    • Impala Database
    • What is Digital Image?
    • What is Kibana?
    • Kibana Visualization
    • Kibana Logstash
    • Kibana_query
    • Kibana Reporting
    • Kibana Alert
    • Longitudinal Data Analysis
    • Metadata Management Tools
    • Time Series Analysis
    • Types of Arduino
    • Arduino Shields
    • What is Arduino UNO?
    • Arduino Sensors
    • Arduino Boards
    • Arduino Application
    • 8085 Architecture
    • Dynatrace Competitors
    • Data Migration Tools
    • Likert Scale Data Analysis
    • Predictive Analytics Techniques
    • Data Governance
    • What is RTK
    • Data Virtualization
    • Knowledge Engineering
    • Data Dictionaries
    • Types of Dimensions
    • What is Google Chrome?
    • Embedded Systems Architecture
    • Data Collection Tools
    • Panel Data Analysis
    • Sqoop Export
    • What is Metabase?

Related Courses

Data Science Certification

Online Machine Learning Training

Cloud Computing Certification

Footer
About Us
  • Blog
  • Who is EDUCBA?
  • Sign Up
  • Corporate Training
  • Certificate from Top Institutions
  • Contact Us
  • Verifiable Certificate
  • Reviews
  • Terms and Conditions
  • Privacy Policy
  •  
Apps
  • iPhone & iPad
  • Android
Resources
  • Free Courses
  • Database Management
  • Machine Learning
  • All Tutorials
Certification Courses
  • All Courses
  • Data Science Course - All in One Bundle
  • Machine Learning Course
  • Hadoop Certification Training
  • Cloud Computing Training Course
  • R Programming Course
  • AWS Training Course
  • SAS Training Course

© 2020 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.

EDUCBA Login

Forgot Password?

EDUCBA
Free Data Science Course

Hadoop, Data Science, Statistics & others

*Please provide your correct email id. Login details for this Free course will be emailed to you
Book Your One Instructor : One Learner Free Class

Let’s Get Started

This website or its third-party tools use cookies, which are necessary to its functioning and required to achieve the purposes illustrated in the cookie policy. By closing this banner, scrolling this page, clicking a link or continuing to browse otherwise, you agree to our Privacy Policy

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you
EDUCBA
Free Data Science Course

Hadoop, Data Science, Statistics & others

*Please provide your correct email id. Login details for this Free course will be emailed to you

Special Offer - Data Science Certification Learn More