Diagnostic Operators

Apache Pig Diagnostic Operators

Apache Pig diagnostic operators are used to verify the statements of a Pig Latin script. There are four diagnostic operators:

  1. Dump operator
  2. Describe operator
  3. Explain operator
  4. Illustrate operator

Dump Operator:

The ‘dump’ operator is used to display the contents of a relation on the screen. It triggers execution of all the preceding Pig statements and then runs the dump itself, so it is generally used for debugging purposes.

 
record = LOAD 'hdfs://localhost:9000/pig-data/data-file.txt' USING PigStorage(',') as (id:int, name:chararray, country:chararray, email:chararray, carbrand:chararray, age:int);
dump record;

Output:

(100,Rick,Netherlands,rickgr@gmail.com,BMW,25)
(101,Jason,Aus,json.j@gmail.com,Mini,37)
(102,Maggie,Usa,mg@hotmail.com,mazda,55)
(104,Eugine,Denmark,eu@gmail.com,honda,29)
(105,Jacob,Usa,jacob@hotmail.com,volvo,19)
(110,,Aus,john@gmail.com,jaguar,34)
(112,Negan,Ind,Negan@gmail.com,Audi,28) 
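Because dump materializes whatever relation it is given, it is also handy for spot-checking intermediate relations while debugging. A small sketch, assuming the same record relation as above (the FILTER step and the adults alias are illustrative, not part of the original example):

```pig
-- Inspect only a subset of the data while debugging;
-- 'record' is the relation loaded above.
adults = FILTER record BY age > 30;
dump adults;
```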

Describe Operator:

The ‘describe’ operator is used to verify the schema of a relation.

 
record = LOAD 'hdfs://localhost:9000/pig-data/data-file.txt' USING PigStorage(',') as (id:int, name:chararray, country:chararray, email:chararray, carbrand:chararray, age:int);
describe record;

Output:
record: {id: int,name: chararray,country: chararray,email: chararray,carbrand: chararray,age: int}
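describe is particularly helpful after operators that change a relation's shape, such as GROUP, where the result contains a nested bag. A sketch, assuming the same record relation (group_data is an alias chosen here for illustration):

```pig
-- Grouping produces a 'group' field plus a bag of the original tuples
group_data = GROUP record BY country;
describe group_data;
-- Expected schema, roughly:
-- group_data: {group: chararray,record: {(id: int,name: chararray,country: chararray,email: chararray,carbrand: chararray,age: int)}}
```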

Explain Operator:

The ‘explain’ operator is used to review the logical plan, the physical plan, and the MapReduce plan of a relation.

 
explain record; 

Output:

2019-08-31 10:05:57,181 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2019-08-31 10:05:57,182 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
record: (Name: LOStore Schema: id#25:int,name#26:chararray,country#27:chararray,email#28:chararray,carbrand#29:chararray,age#30:int)
|
|---record: (Name: LOForEach Schema: id#25:int,name#26:chararray,country#27:chararray,email#28:chararray,carbrand#29:chararray,age#30:int)
    |   |
    |   (Name: LOGenerate[false,false,false,false,false,false] Schema: id#25:int,name#26:chararray,country#27:chararray,email#28:chararray,carbrand#29:chararray,age#30:int)ColumnPrune:InputUids=[25, 27, 26, 29, 28, 30]ColumnPrune:OutputUids=[25, 27, 26, 29, 28, 30]
    |   |   |
    |   |   (Name: Cast Type: int Uid: 25)
    |   |   |
    |   |   |---id:(Name: Project Type: bytearray Uid: 25 Input: 0 Column: (*))
    |   |   |
    |   |   (Name: Cast Type: chararray Uid: 26)
    |   |   |
    |   |   |---name:(Name: Project Type: bytearray Uid: 26 Input: 1 Column: (*))
    |   |   |
    |   |   (Name: Cast Type: chararray Uid: 27)
    |   |   |
    |   |   |---country:(Name: Project Type: bytearray Uid: 27 Input: 2 Column: (*))
    |   |   |
    |   |   (Name: Cast Type: chararray Uid: 28)
    |   |   |
    |   |   |---email:(Name: Project Type: bytearray Uid: 28 Input: 3 Column: (*))
    |   |   |
    |   |   (Name: Cast Type: chararray Uid: 29)
    |   |   |
    |   |   |---carbrand:(Name: Project Type: bytearray Uid: 29 Input: 4 Column: (*))
    |   |   |
    |   |   (Name: Cast Type: int Uid: 30)
    |   |   |
    |   |   |---age:(Name: Project Type: bytearray Uid: 30 Input: 5 Column: (*))
    |   |
    |   |---(Name: LOInnerLoad[0] Schema: id#25:bytearray)
    |   |
    |   |---(Name: LOInnerLoad[1] Schema: name#26:bytearray)
    |   |
    |   |---(Name: LOInnerLoad[2] Schema: country#27:bytearray)
    |   |
    |   |---(Name: LOInnerLoad[3] Schema: email#28:bytearray)
    |   |
    |   |---(Name: LOInnerLoad[4] Schema: carbrand#29:bytearray)
    |   |
    |   |---(Name: LOInnerLoad[5] Schema: age#30:bytearray)
    |
    |---record: (Name: LOLoad Schema: id#25:bytearray,name#26:bytearray,country#27:bytearray,email#28:bytearray,carbrand#29:bytearray,age#30:bytearray)RequiredFields:null
#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
record: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-42
|
|---record: New For Each(false,false,false,false,false,false)[bag] - scope-41
    |   |
    |   Cast[int] - scope-24
    |   |
    |   |---Project[bytearray][0] - scope-23
    |   |
    |   Cast[chararray] - scope-27
    |   |
    |   |---Project[bytearray][1] - scope-26
    |   |
    |   Cast[chararray] - scope-30
    |   |
    |   |---Project[bytearray][2] - scope-29
    |   |
    |   Cast[chararray] - scope-33
    |   |
    |   |---Project[bytearray][3] - scope-32
    |   |
    |   Cast[chararray] - scope-36
    |   |
    |   |---Project[bytearray][4] - scope-35
    |   |
    |   Cast[int] - scope-39
    |   |
    |   |---Project[bytearray][5] - scope-38
    |
    |---record: Load(hdfs://localhost:9000/pig-data/new-data-set:PigStorage(',')) - scope-22

2019-08-31 10:05:57,193 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2019-08-31 10:05:57,195 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2019-08-31 10:05:57,195 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
#--------------------------------------------------
# Map Reduce Plan                                  
#--------------------------------------------------
MapReduce node scope-43
Map Plan
record: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-42
|
|---record: New For Each(false,false,false,false,false,false)[bag] - scope-41
    |   |
    |   Cast[int] - scope-24
    |   |
    |   |---Project[bytearray][0] - scope-23
    |   |
    |   Cast[chararray] - scope-27
    |   |
    |   |---Project[bytearray][1] - scope-26
    |   |
    |   Cast[chararray] - scope-30
    |   |
    |   |---Project[bytearray][2] - scope-29
    |   |
    |   Cast[chararray] - scope-33
    |   |
    |   |---Project[bytearray][3] - scope-32
    |   |
    |   Cast[chararray] - scope-36
    |   |
    |   |---Project[bytearray][4] - scope-35
    |   |
    |   Cast[int] - scope-39
    |   |
    |   |---Project[bytearray][5] - scope-38
    |
    |---record: Load(hdfs://localhost:9000/pig-data/new-data-set:PigStorage(',')) - scope-22--------
Global sort: false
----------------
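explain also accepts options to redirect or condense its output, which is useful when the plans are long. A sketch based on the options documented in the Pig reference (the output path here is illustrative):

```pig
-- Write the plans to files under a directory instead of the console
explain -out /tmp/plans record;

-- Print condensed plans without expanding nested detail
explain -brief record;
```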

Illustrate Operator:

The ‘illustrate’ operator is used to review how data is transformed through each statement, step by step. This is useful when transforming a huge dataset: you can verify the transformation on a small sample before the full result is stored to HDFS.

 
record = LOAD 'hdfs://localhost:9000/pig-data/data-file.txt' USING PigStorage(',') as (id:int,name:chararray, country:chararray,email:chararray,carbrand:chararray,age:int); 
group_data = GROUP record by age;

illustrate group_data;

Output:

-------------------------------------------------------------------------------------------------------------------------------------
| record     | id:int     | name:chararray     | country:chararray     | email:chararray     | carbrand:chararray     | age:int     | 
-------------------------------------------------------------------------------------------------------------------------------------
|            | 112        | Negan              | Ind                   | Negan@gmail.com     | Audi                   | 28          | 
|            | 112        | Negan              | Ind                   | Negan@gmail.com     | Audi                   | 28          | 
-------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| group_data     | group:int     | record:bag{:tuple(id:int,name:chararray,country:chararray,email:chararray,carbrand:chararray,age:int)}                                 | 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                | 28            | {(112, ..., 28), (112, ..., 28)}                                                                                                       | 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
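Like explain, illustrate can also be run against a whole script file rather than a single alias, per the Pig reference (the script path below is illustrative):

```pig
-- Walk through every transformation in a script on sampled data
illustrate -script /path/to/myscript.pig;
```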

In the next section, we will discuss Eval functions.
