Apache Pig Introduction
Writing a MapReduce program requires solid programming knowledge, and users spend most of their time writing and debugging code even for small tasks. Pig was introduced to avoid these problems. It was originally developed at ‘Yahoo’ and is now maintained by the Apache Software Foundation.
‘Pig’ is a tool that lets users run MapReduce jobs without writing MapReduce code by hand. It has its own language, called ‘Pig Latin’, and it converts Pig Latin scripts into MapReduce programs.
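To make the idea concrete, here is a minimal sketch of a word count written in Pig Latin and saved from the shell. The file names (wordcount.pig, input.txt, wordcount_out) are illustrative assumptions, not part of the original tutorial, and the run step is skipped when Pig or the input file is missing.

```shell
# Write a small Pig Latin word-count script (file names are illustrative).
cat > wordcount.pig <<'EOF'
-- Load each line of the input file as one chararray field.
lines   = LOAD 'input.txt' USING TextLoader() AS (line:chararray);
-- Split every line into words, one word per record.
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- Group identical words and count each group.
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS total;
-- Storing the result is what makes Pig plan and run the job.
STORE counts INTO 'wordcount_out';
EOF

# Run the script only if Pig and a local input file are available.
# Dropping "-x local" would run it on the cluster against HDFS paths instead.
if command -v pig >/dev/null 2>&1 && [ -f input.txt ]; then
  pig -x local wordcount.pig
else
  echo "Skipping run: pig or input.txt not available" >&2
fi
```

The same logic written directly against the MapReduce API would need a mapper class, a reducer class, and a driver, which is the overhead Pig is meant to remove.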
Why Pig:
- Pig requires fewer lines of code.
- Pig lets you test your logic against a data set quickly.
- No Java experience is required.
- A data set can be tested on the local system, without a cluster.
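As a rough illustration of testing on the local system, the sketch below creates a tiny sample file and filters it with Pig in local mode. The file name people.csv, its contents, and the age threshold are made up for this example.

```shell
# Create a tiny comma-separated sample file (contents are made up).
printf 'alice,30\nbob,25\ncarol,41\n' > people.csv

if command -v pig >/dev/null 2>&1; then
  # -x local reads people.csv from the local disk instead of HDFS;
  # -e executes the quoted Pig Latin statements and exits.
  pig -x local -e "people = LOAD 'people.csv' USING PigStorage(',') AS (name:chararray, age:int); older = FILTER people BY age > 26; DUMP older;"
else
  echo "pig not found on PATH; sample file written to people.csv" >&2
fi
```

Because local mode needs no running Hadoop services, it is a convenient way to try a script on a small sample before pointing it at the full data set.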
Pig vs Hive:
Pig and Hive serve similar purposes: both reduce the need to write complex Java code and help build efficient jobs. Each tool is useful in different ways and for different kinds of problems.
Below are a few differences between Hive and Pig.
Hive | Pig |
Hive's query language, HiveQL, is based on SQL. | Pig's language is called Pig Latin. |
HiveQL is a declarative language. | Pig Latin is a procedural language with lazy evaluation. |
HiveQL is schema bound. | Pig Latin is not schema bound. |
Hive is used for structured data. | Pig can be used for both structured and semi-structured data. |
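The "not schema bound" and "lazy evaluation" rows can be sketched with a short script. This is an illustrative example only: the file names visits.csv and no_schema.pig and the sample rows are assumptions, and the run is skipped when Pig is not installed.

```shell
# Sample data for the sketch (made-up rows).
printf 'home,12\nabout,3\n' > visits.csv

# The heredoc is quoted so the shell leaves $0 and $1 untouched.
cat > no_schema.pig <<'EOF'
-- No AS clause: the fields are loaded without a declared schema.
raw  = LOAD 'visits.csv' USING PigStorage(',');
-- Fields are addressed by position: $0 is the first, $1 the second.
cols = FOREACH raw GENERATE $0, $1;
-- Nothing has run up to this point; DUMP triggers the execution.
DUMP cols;
EOF

if command -v pig >/dev/null 2>&1; then
  pig -x local no_schema.pig
else
  echo "pig not found on PATH; script written to no_schema.pig" >&2
fi
```

In Hive, by contrast, the columns would first have to be declared in a table definition before the data could be queried.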
Pig Setup / Pig Installation
Now let’s go through the steps for installing and setting up Pig.
Step-1:
Download a Pig release from the Apache archive, or fetch it with ‘wget’ as below (this tutorial uses version 0.16.0).
wget https://archive.apache.org/dist/pig/pig-0.16.0/pig-0.16.0.tar.gz
Step-2:
Extract the tarball and move it to an installation directory.
tar -zxvf pig-0.16.0.tar.gz
mv pig-0.16.0 /usr/local/pig
Step-3:
Open the ‘.bashrc’ file and add the export lines below to set PIG_HOME and update the PATH variable.
vi ~/.bashrc
export PIG_HOME=/usr/local/pig
export PATH=$PATH:$PIG_HOME/bin
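If you prefer not to edit the file by hand, the same two lines can be appended from the shell. This is a small sketch, assuming the /usr/local/pig install path used above; the rc_file variable name is just for illustration, and the grep guard keeps the lines from being added twice.

```shell
# Append the Pig variables to ~/.bashrc only if they are not already there.
rc_file="$HOME/.bashrc"
if ! grep -q 'PIG_HOME=' "$rc_file" 2>/dev/null; then
  # The heredoc is quoted so $PATH and $PIG_HOME are written literally
  # and expanded later, when the shell reads .bashrc.
  cat >> "$rc_file" <<'EOF'
export PIG_HOME=/usr/local/pig
export PATH=$PATH:$PIG_HOME/bin
EOF
fi
```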
Step-4:
Reload the ‘.bashrc’ file so the changes take effect in the current shell.
source ~/.bashrc
Step-5:
To verify the Pig installation, type pig at the command prompt and press Enter. If the installation steps were followed correctly, this opens the Pig Grunt shell, as shown below.
ubuntu@ip-172-31-80-83:~$ pig
19/08/28 05:39:11 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
19/08/28 05:39:11 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
19/08/28 05:39:11 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2019-08-28 05:39:11,888 [main] INFO org.apache.pig.Main - Apache Pig version 0.16.0 (r1746530) compiled Jun 01 2016, 23:09:59
2019-08-28 05:39:11,888 [main] INFO org.apache.pig.Main - Logging error messages to: /home/ubuntu/pig_1566970751885.log
2019-08-28 05:39:11,916 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/ubuntu/.pigbootup not found
2019-08-28 05:39:12,525 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
2019-08-28 05:39:12,847 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hdfs://localhost:9001
2019-08-28 05:39:12,905 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session:
2019-08-28 05:39:12,906 [main] WARN org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false
>
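The log above shows Pig picking the MAPREDUCE execution type and connecting to hdfs://localhost:9000, so a plain pig call expects Hadoop to be reachable. A quick, non-interactive check of the setup might look like the sketch below; the pig_status variable is only an illustrative name.

```shell
# Check that the pig launcher is reachable through PATH.
if command -v pig >/dev/null 2>&1; then
  pig_status="found at $(command -v pig)"
  # -version prints the Pig version and exits without opening Grunt.
  pig -version
else
  pig_status="not found"
  echo "pig is not on PATH; re-check PIG_HOME and PATH in ~/.bashrc" >&2
fi
echo "pig: $pig_status"
echo "PIG_HOME: ${PIG_HOME:-unset}"
```

To open the Grunt shell without a running cluster, start Pig in local mode with pig -x local.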
Conclusion:
In this blog, you learned about the Pig tool, its uses, and Pig setup. Stay tuned for the next Pig tutorial blogs to learn more about Pig.