Data types & Load Operator

Like any other language, Pig provides a set of built-in data types. The table below describes each of them.

Data Type    Description
int          Signed 32-bit integer. Ex: 2, 3, 10
long         Signed 64-bit integer. Ex: 5L, 10L
float        32-bit floating point number. Ex: 10.5F
double       64-bit floating point number. Ex: 5.5, 10.0
chararray    Character string (UTF-8). Ex: HELLO, Hi
datetime     Date and time value.
bytearray    Byte array (blob).
boolean      Boolean value. Ex: true, false
tuple        Ordered set of fields. Ex: (10,15)
bag          Collection of tuples. Ex: {(10,15),(20,30)}
map          Set of key-value pairs. Ex: [name#Jack]
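
The complex types can also appear in a schema. The snippet below is a minimal sketch of how that might look; the file '/students.txt' and the field names (name, scores, info, city) are hypothetical and not part of the data set used in this tutorial.

```pig
-- Hypothetical tab-separated file with one simple and two complex fields, e.g.
-- Jack	{(math,80),(physics,75)}	[city#Sydney]
students = LOAD '/students.txt'
    AS (name:chararray,
        scores:bag{t:tuple(subject:chararray, mark:int)},
        info:map[]);

-- Look up a map value by key with the # operator
cities = FOREACH students GENERATE name, info#'city';
DUMP cities;
```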

Load Operator:

Let’s use the data set below to understand basic Pig Latin operations.

id name country mailid carbrand age
100 Rick Netherlands BMW 25
101 Jason Aus Mini 37
102 Maggie Usa mazda 55
104 Eugine Denmark honda 29
105 Jacob Usa volvo 19
110 Aus jaguar 34
112 Negan Ind Audi 28

Load data with schema:  

Use the syntax below to load the dataset with a schema.

records = LOAD '/emp.txt' USING PigStorage(',') AS (id:int, name:chararray, country:chararray, email:chararray, carbrand:chararray, age:int); 

PigStorage() is a load function that takes the field separator as its argument. Because our dataset is a comma-separated file, we use PigStorage(','). If no separator is given, PigStorage assumes tab-separated fields.
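
To check that the schema was applied, the relation can be inspected right after loading. This is a small sketch that reuses the LOAD statement above and assumes '/emp.txt' exists at that path; the DESCRIBE output in the comment shows the general shape to expect.

```pig
records = LOAD '/emp.txt' USING PigStorage(',')
    AS (id:int, name:chararray, country:chararray,
        email:chararray, carbrand:chararray, age:int);

-- Print the schema of the relation, roughly:
-- records: {id: int,name: chararray,country: chararray,email: chararray,carbrand: chararray,age: int}
DESCRIBE records;

-- Run the load and print every tuple to the console
DUMP records;
```

DESCRIBE only shows the declared schema and does not read the file, so it is a cheap way to catch a misspelled field name or type before running DUMP.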

Load data without a schema:

Pig can also load data without a schema. In this case, Pig treats every field as 'bytearray', and fields are referenced by position ($0, $1, and so on) instead of by name. The command below loads the data without a schema.

records = LOAD '/emp.txt' USING PigStorage(',');
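
When no schema is declared at all, fields are untyped (bytearray) and have no names, so they are addressed by position and cast where a specific type is needed. The sketch below assumes the same '/emp.txt' file and column order as the data set above; the alias names raw and ids_ages are made up for illustration.

```pig
-- Load with no AS clause: fields have no names and no declared types
raw = LOAD '/emp.txt' USING PigStorage(',');

-- With no schema, Pig reports: Schema for raw unknown.
DESCRIBE raw;

-- Refer to fields by position and cast them explicitly
ids_ages = FOREACH raw GENERATE (int)$0 AS id, (int)$5 AS age;
DUMP ids_ages;
```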

