Read, Dump & Store Operators

Pig Read | Pig Dump | Pig Store Operators

In this blog, you will find information regarding the Pig Read, Pig Dump, and Pig Store operators.

Pig Read Operator

Since Pig runs on top of HDFS, we can use HDFS commands to list and read large files before processing them in Pig. Run the commands below to inspect a file from the shell.

 
ubuntu@ip-172-31-80-83:~$ hadoop fs -ls /pig-data

Output:

Found 6 items
-rw-r--r--   1 ubuntu supergroup        279 2019-08-28 07:50 /pig-data/Student_data1.txt
-rw-r--r--   1 ubuntu supergroup         78 2019-08-28 07:51 /pig-data/Student_data2.txt
-rw-r--r--   1 ubuntu supergroup        182 2019-08-28 07:21 /pig-data/customers.txt
-rw-r--r--   1 ubuntu supergroup        279 2019-08-28 05:27 /pig-data/data-file.txt
-rw-r--r--   1 ubuntu supergroup        135 2019-08-28 06:05 /pig-data/data-file1.txt
-rw-r--r--   1 ubuntu supergroup        124 2019-08-28 07:22 /pig-data/orders.txt
 
ubuntu@ip-172-31-80-83:~$ hadoop fs -cat /pig-data/data-file.txt

Output:


100,Rick,Netherlands,rickgr@gmail.com,BMW,25
101,Jason,Aus,json.j@gmail.com,Mini,37
102,Maggie,Usa,mg@hotmail.com,mazda,55
104,Eugine,Denmark,eu@gmail.com,honda,29
105,Jacob,Usa,jacob@hotmail.com,volvo,19
110,,Aus,john@gmail.com,jaguar,34
112,Negan,Ind,Negan@gmail.com,Audi,28

Pig Dump Operator

The ‘dump’ operator is used to display the results of a relation on the screen. Because Pig evaluates statements lazily, issuing ‘dump’ triggers execution of all the preceding Pig statements that the relation depends on. It is generally used for debugging purposes.

 
record = LOAD 'hdfs://localhost:9000/pig-data/data-file.txt' USING PigStorage(',')
    AS (id:int, name:chararray, country:chararray, email:chararray, carbrand:chararray, age:int);
dump record;

Output:


(100,Rick,Netherlands,rickgr@gmail.com,BMW,25)
(101,Jason,Aus,json.j@gmail.com,Mini,37)
(102,Maggie,Usa,mg@hotmail.com,mazda,55)
(104,Eugine,Denmark,eu@gmail.com,honda,29)
(105,Jacob,Usa,jacob@hotmail.com,volvo,19)
(110,,Aus,john@gmail.com,jaguar,34)
(112,Negan,Ind,Negan@gmail.com,Audi,28)
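
Because ‘dump’ triggers execution of everything a relation depends on, it is handy for checking intermediate results as you build a script. As a quick illustration (a sketch assuming the ‘record’ relation loaded above; the alias ‘adults’ is chosen here for the example), you can filter the relation and dump only the matching tuples:

-- keep only the customers older than 30; nothing runs until dump is issued
adults = FILTER record BY age > 30;
dump adults;

With the sample data above, this should print only the three tuples whose age field is greater than 30 (Jason, Maggie, and the record with id 110).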

Pig Store Operator

The ‘store’ operator is used to write the output of a Pig relation to the file system. The below example describes the usage of the ‘store’ operator.

 
STORE record INTO 'hdfs://localhost:9000/pig_Output/' USING PigStorage(',');

Output:


HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
1.2.1	0.16.0	ubuntu	2019-08-29 08:40:04	2019-08-29 08:40:15	UNKNOWN

Success!

Job Stats (time in seconds):
JobId	Maps	Reduces	MaxMapTime	MinMapTime	AvgMapTime	MedianMapTime	MaxReduceTime	MinReduceTime	AvgReduceTime	MedianReducetime	Alias	Feature	Outputs
job_201908280520_0012	1	0	3	3	3	3	0	0	0	0	record	MAP_ONLY	hdfs://localhost:9000/pig_Output,

Input(s):
Successfully read 7 records (650 bytes) from: "hdfs://localhost:9000/pig-data/data-file.txt"

Output(s):
Successfully stored 7 records (279 bytes) in: "hdfs://localhost:9000/pig_Output"

Counters:
Total records written : 7
Total bytes written : 279
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201908280520_0012

2019-08-29 08:40:15,892 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success! 

The ‘store’ command will create an output directory, and the output will be stored in part files as shown below.

 
ubuntu@ip-172-31-80-83:~$ hadoop fs -ls /pig_Output

Output:


Found 3 items
-rw-r--r-- 1 ubuntu supergroup 0 2019-08-29 08:40 /pig_Output/_SUCCESS
drwxr-xr-x - ubuntu supergroup 0 2019-08-29 08:40 /pig_Output/_logs
-rw-r--r-- 1 ubuntu supergroup 279 2019-08-29 08:40 /pig_Output/part-m-00000
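
To verify the stored data, the part file can be loaded back into Pig just like any other file on HDFS. This is a sketch assuming the same schema and part-file name shown above (the alias ‘stored’ is chosen for the example):

-- load the output written by the store operator and display it
stored = LOAD 'hdfs://localhost:9000/pig_Output/part-m-00000' USING PigStorage(',')
    AS (id:int, name:chararray, country:chararray, email:chararray, carbrand:chararray, age:int);
dump stored;

This should print the same seven tuples that were written by the ‘store’ operator.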

Conclusion:

In this blog, you learned about the Pig Read, Pig Dump, and Pig Store operators.
