Read, Dump & Store Operators
In this blog, you will learn how to use the Pig Read, Dump, and Store operators.
Pig Read Operator
Since Pig runs on top of HDFS, we can use the HDFS shell to inspect the input files before loading them into Pig. Run the commands below to list the data directory and read a file:
ubuntu@ip-172-31-80-83:~$ hadoop fs -ls /pig-data
Output:
Found 6 items
-rw-r--r-- 1 ubuntu supergroup 279 2019-08-28 07:50 /pig-data/Student_data1.txt
-rw-r--r-- 1 ubuntu supergroup 78 2019-08-28 07:51 /pig-data/Student_data2.txt
-rw-r--r-- 1 ubuntu supergroup 182 2019-08-28 07:21 /pig-data/customers.txt
-rw-r--r-- 1 ubuntu supergroup 279 2019-08-28 05:27 /pig-data/data-file.txt
-rw-r--r-- 1 ubuntu supergroup 135 2019-08-28 06:05 /pig-data/data-file1.txt
-rw-r--r-- 1 ubuntu supergroup 124 2019-08-28 07:22 /pig-data/orders.txt
ubuntu@ip-172-31-80-83:~$ hadoop fs -cat /pig-data/data-file.txt
Output:
100,Rick,Netherlands,rickgr@gmail.com,BMW,25
101,Jason,Aus,json.j@gmail.com,Mini,37
102,Maggie,Usa,mg@hotmail.com,mazda,55
104,Eugine,Denmark,eu@gmail.com,honda,29
105,Jacob,Usa,jacob@hotmail.com,volvo,19
110,,Aus,john@gmail.com,jaguar,34
112,Negan,Ind,Negan@gmail.com,Audi,28
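The same HDFS commands can also be run from inside Pig's Grunt shell using the fs keyword, so you can inspect files without leaving Pig. A minimal sketch, using the paths from the listing above:
grunt> fs -ls /pig-data
grunt> fs -cat /pig-data/data-file.txt
These print the same directory listing and file contents shown above.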
Pig Dump Operator
The ‘dump‘ operator is used to display the results of a relation on the screen. When you issue a dump, Pig executes all the statements that the relation depends on and then prints the result. It is generally used for debugging purposes.
record = LOAD 'hdfs://localhost:9000/pig-data/data-file.txt'
    USING PigStorage(',')
    AS (id:int, name:chararray, country:chararray, email:chararray, carbrand:chararray, age:int);
DUMP record;
Output:
(100,Rick,Netherlands,rickgr@gmail.com,BMW,25)
(101,Jason,Aus,json.j@gmail.com,Mini,37)
(102,Maggie,Usa,mg@hotmail.com,mazda,55)
(104,Eugine,Denmark,eu@gmail.com,honda,29)
(105,Jacob,Usa,jacob@hotmail.com,volvo,19)
(110,,Aus,john@gmail.com,jaguar,34)
(112,Negan,Ind,Negan@gmail.com,Audi,28)
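On large inputs, dumping an entire relation is expensive, because Pig runs the whole pipeline just to print it. A common debugging pattern is to dump only a small sample using the LIMIT operator. A minimal sketch against the record relation loaded above (the alias few is just an example name):
few = LIMIT record 3;
DUMP few;
This prints only the first three tuples instead of the full dataset.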
Pig Store Operator
The ‘store‘ operator is used to write the output of a Pig relation to the file system. The example below shows its usage, storing the record relation loaded earlier:
STORE record INTO 'hdfs://localhost:9000/pig_Output/' USING PigStorage(',');
Output:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.2.1 0.16.0 ubuntu 2019-08-29 08:40:04 2019-08-29 08:40:15 UNKNOWN
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_201908280520_0012 1 0 3 3 3 3 0 0 0 0 record MAP_ONLY hdfs://localhost:9000/pig_Output,
Input(s):
Successfully read 7 records (650 bytes) from: "hdfs://localhost:9000/pig-data/data-file.txt"
Output(s):
Successfully stored 7 records (279 bytes) in: "hdfs://localhost:9000/pig_Output"
Counters:
Total records written : 7
Total bytes written : 279
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201908280520_0012
2019-08-29 08:40:15,892 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
The ‘STORE‘ command creates the output directory, and the results are written to part files inside it, as shown below.
ubuntu@ip-172-31-80-83:~$ hadoop fs -ls /pig_Output
Output:
Found 3 items
-rw-r--r-- 1 ubuntu supergroup 0 2019-08-29 08:40 /pig_Output/_SUCCESS
drwxr-xr-x - ubuntu supergroup 0 2019-08-29 08:40 /pig_Output/_logs
-rw-r--r-- 1 ubuntu supergroup 279 2019-08-29 08:40 /pig_Output/part-m-00000
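To verify what was written, you can cat the part file (part-m-00000 in the listing above); it should contain the same seven records that were loaded:
ubuntu@ip-172-31-80-83:~$ hadoop fs -cat /pig_Output/part-m-00000
Because the job was map-only (note the MAP_ONLY feature in the job stats), there is a single part-m-* file rather than reducer output files.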
Conclusion:
In this blog, you learned about the Pig Read, Dump, and Store operators.