Data types & Load Operator

Like any other language, Pig provides a set of built-in data types. The table below describes each of them.

| Data Type | Description |
| --- | --- |
| int | 32-bit signed integer. Ex: 2, 3, 10 |
| long | 64-bit signed integer. Ex: 5L, 10L |
| float | 32-bit floating point. Ex: 10.5F |
| double | 64-bit floating point. Ex: 5.5, 10.0 |
| chararray | Character string. Ex: 'HELLO', 'Hi' |
| datetime | Date and time value. |
| bytearray | Byte array (blob). |
| boolean | Boolean value. Ex: true, false |
| tuple | Ordered set of fields. Ex: (10,15) |
| bag | Collection of tuples. Ex: {(10,15),(20,30)} |
| map | Set of key-value pairs. Ex: [name#Jack] |
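As a rough illustration of how these types appear in practice, the sketch below declares a schema that mixes simple and complex types. The file path /students.txt and all field names are hypothetical and chosen only for this example.

```pig
-- Hypothetical tab-separated file; the path and field names are assumptions.
students = LOAD '/students.txt' USING PigStorage('\t') AS (
    id:int,
    name:chararray,
    gpa:double,
    enrolled:boolean,
    address:tuple(city:chararray, zip:chararray),
    marks:bag{m:tuple(subject:chararray, score:int)},
    contact:map[chararray]
);

-- Print the schema Pig has recorded for the relation.
DESCRIBE students;
```

In a matching input line, the tuple would be written as (Delhi,110001), the bag as {(math,90),(physics,85)} and the map as [email#jack@example.com].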

Load Operator:

Let’s use the data set below to understand basic Pig Latin operations.

| id | name | country | mailid | carbrand | age |
| --- | --- | --- | --- | --- | --- |
| 100 | Rick | Netherlands | rickgr@gmail.com | BMW | 25 |
| 101 | Jason | Aus | json.j@gmail.com | Mini | 37 |
| 102 | Maggie | Usa | mg@hotmail.com | mazda | 55 |
| 104 | Eugine | Denmark | eu@gmail.com | honda | 29 |
| 105 | Jacob | Usa | jacob@hotmail.com | volvo | 19 |
| 110 | | Aus | john@gmail.com | jaguar | 34 |
| 112 | Negan | Ind | Negan@gmail.com | Audi | 28 |

Load data with schema:  

Use the syntax below to load the data set with a schema.

 
records = LOAD '/emp.txt' USING PigStorage(',') AS (id:int, name:chararray, country:chararray, email:chararray, carbrand:chararray, age:int); 

PigStorage() is a load function that takes the field separator as its argument (the default is a tab). Because our data set is a comma-separated file, we use PigStorage(',').
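Once a schema is declared, fields can be referenced by name and compared according to their types. A minimal sketch, assuming the file /emp.txt from this post is available; the relation names older and summary are made up for illustration:

```pig
-- Same LOAD statement as in this post.
records = LOAD '/emp.txt' USING PigStorage(',')
          AS (id:int, name:chararray, country:chararray,
              email:chararray, carbrand:chararray, age:int);

-- Show the declared schema.
DESCRIBE records;

-- age is an int, so it can be compared numerically without a cast.
older = FILTER records BY age > 30;

-- Keep only a few columns, referenced by name.
summary = FOREACH older GENERATE id, name, carbrand;

DUMP summary;
```

Note that a value which cannot be converted to its declared type (for example, a header line whose first field is the text "id") is loaded as null rather than causing the load to fail.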

Load data without a schema:

Pig can also load data without an explicit schema. In this case, Pig treats each field as bytearray (not chararray), and fields are referenced by position ($0, $1, ...). The command below loads the data without a schema.

records = LOAD '/emp.txt' USING PigStorage(',');
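When a relation is loaded with no AS clause at all, its fields default to bytearray and have no names, so they are usually cast and named in a later step. A minimal sketch, assuming the same /emp.txt file; the relation names raw and typed are illustrative:

```pig
-- No AS clause: Pig records no schema for the relation.
raw = LOAD '/emp.txt' USING PigStorage(',');

-- Fields are addressed by position; cast them to get typed, named fields.
typed = FOREACH raw GENERATE
            (int)$0       AS id,
            (chararray)$1 AS name,
            (chararray)$4 AS carbrand,
            (int)$5       AS age;

-- typed now carries a schema, unlike raw.
DESCRIBE typed;
```

Declaring the schema in the LOAD statement is generally more convenient, but this pattern is useful when the layout of a file is not known in advance.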
