Apache Pig Introduction

Writing MapReduce programs requires considerable programming knowledge, and users can spend most of their time writing and debugging code even for small tasks. Pig was introduced to address this problem. It was originally developed at ‘Yahoo’ and is now maintained by the Apache Software Foundation.

‘Pig’ is a tool that lets users run MapReduce jobs without writing MapReduce programs by hand. Scripts are written in Pig's own language, ‘Pig Latin’, and Pig compiles each Pig Latin script into MapReduce jobs.

Why Pig:

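  1. Pig requires far fewer lines of code than equivalent Java MapReduce programs.
  2. Pig lets you explore and test a data set quickly.
  3. No Java experience is required.
  4. Scripts can be tested against data on the local file system (local mode) before running on a cluster.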
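To give a feel for how short a Pig Latin script can be, here is a minimal sketch of the classic word count. The file names `input.txt` and `wordcount.pig` are hypothetical. The script is run with `pig -x local`, which reads from the local file system instead of HDFS, and only if `pig` is already on the PATH (installation is covered below).

```shell
#!/usr/bin/env bash
# Minimal sketch: write a tiny sample input and a Pig Latin word-count script,
# then run it in local mode if Pig is installed. File names are hypothetical.
set -euo pipefail

# Sample data on the local file system.
printf 'pig latin is simple\npig runs on hadoop\n' > input.txt

# Pig Latin: each statement defines a new relation from the previous one.
cat > wordcount.pig <<'EOF'
lines   = LOAD 'input.txt' AS (line:chararray);
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
DUMP counts;
EOF

# -x local runs against local files, so no Hadoop cluster is needed.
if command -v pig >/dev/null 2>&1; then
  pig -x local wordcount.pig
else
  echo "pig not found on PATH; script written to wordcount.pig" >&2
fi
```

Five lines of Pig Latin replace a mapper class, a reducer class, and a driver in Java. Adding `EXPLAIN counts;` to the script makes Pig print the logical, physical, and MapReduce plans it generates for the `counts` relation.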

Pig vs Hive:

Pig and Hive serve similar purposes: both reduce the need to write complex Java MapReduce code. Each tool is better suited to different kinds of problems.

Below are a few differences between Hive and Pig.

Hive                                             | Pig
------------------------------------------------ | ---------------------------------------------------------------
Query language is HiveQL, which is based on SQL. | Language is called Pig Latin.
HiveQL is a declarative language.                | Pig Latin is a procedural (data-flow) language with lazy evaluation.
HiveQL is schema-bound: tables need a schema.    | Pig Latin is not schema-bound: a schema is optional.
Hive is used for structured data.                | Pig can be used for both structured and semi-structured data.
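To illustrate the procedural and schema-optional rows, here is a minimal sketch that loads a hypothetical comma-separated file `sales.csv` without an `AS` clause. Fields are then untyped and referred to by position (`$0`, `$1`, `$2`), with an explicit cast where a numeric comparison is needed. Each statement builds a new relation from the previous one, which is what "procedural" means in practice. The file names are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Minimal sketch: Pig Latin without a schema.
# sales.csv and schemaless.pig are hypothetical file names.
set -euo pipefail

printf 'north,apples,120\nsouth,pears,80\neast,plums,150\n' > sales.csv

cat > schemaless.pig <<'EOF'
-- No AS clause: fields are untyped (bytearray) and addressed by position.
raw   = LOAD 'sales.csv' USING PigStorage(',');
large = FILTER raw BY (int)$2 > 100;
names = FOREACH large GENERATE $0 AS region, $1 AS product;
DUMP names;
EOF

if command -v pig >/dev/null 2>&1; then
  pig -x local schemaless.pig
else
  echo "pig not found on PATH; script written to schemaless.pig" >&2
fi
```

With Pig installed, only the `north` and `east` rows pass the filter, since their third field exceeds 100.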

Pig Setup / Pig Installation

Now let's go through the steps to install and set up Pig.


Download a Pig release from the Apache archive, or download it with ‘wget’ as shown below. This guide uses version 0.16.0.

wget https://archive.apache.org/dist/pig/pig-0.16.0/pig-0.16.0.tar.gz 


Extract the tarball and move the extracted directory to an installation directory.

tar -zxvf pig-0.16.0.tar.gz
sudo mv pig-0.16.0 /usr/local/pig


Open the ‘~/.bashrc’ file and add the following two lines to set PIG_HOME and update the PATH variable.

vi ~/.bashrc
export PIG_HOME=/usr/local/pig/
export PATH=$PATH:$PIG_HOME/bin
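If you prefer not to edit the file interactively, the same two lines can be appended from the shell. This is a small sketch, assuming Pig was moved to `/usr/local/pig` as above and that your shell reads `~/.bashrc`. The helper `add_line_once` is a hypothetical name; it adds each line only if an identical line is not already present, so running the script twice does not duplicate entries.

```shell
#!/usr/bin/env bash
# Append the Pig environment settings to ~/.bashrc only if they are missing.
set -euo pipefail

BASHRC="$HOME/.bashrc"
touch "$BASHRC"

# Add a line to a file unless an identical whole line already exists.
add_line_once() {
  local line="$1" file="$2"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Single quotes keep $PATH and $PIG_HOME literal in the file.
add_line_once 'export PIG_HOME=/usr/local/pig/' "$BASHRC"
add_line_once 'export PATH=$PATH:$PIG_HOME/bin' "$BASHRC"
```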


Reload the ‘.bashrc’ file so the changes take effect in the current shell.

source ~/.bashrc
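Before starting Pig, it can help to confirm that the environment variables took effect. The following is a minimal sketch; `check_pig_env` is a hypothetical helper name, and it assumes only the `PIG_HOME` and `PATH` settings above. It reports the first problem it finds and returns a non-zero status in that case.

```shell
#!/usr/bin/env bash
# Return 0 if PIG_HOME contains an executable bin/pig and pig resolves on PATH.
check_pig_env() {
  if [ -z "${PIG_HOME:-}" ]; then
    echo "PIG_HOME is not set; re-check ~/.bashrc" >&2
    return 1
  fi
  if [ ! -x "$PIG_HOME/bin/pig" ]; then
    echo "No executable pig launcher under $PIG_HOME/bin" >&2
    return 1
  fi
  if ! command -v pig >/dev/null 2>&1; then
    echo "pig is not on the PATH" >&2
    return 1
  fi
  echo "OK: pig resolves to $(command -v pig)"
}

check_pig_env || true
```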

To verify the installation, type pig at the command prompt and press Enter. If the steps above were followed correctly, Pig starts and opens the Grunt shell, printing output similar to the following:

ubuntu@ip-172-31-80-83:~$ pig
19/08/28 05:39:11 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
19/08/28 05:39:11 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
19/08/28 05:39:11 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2019-08-28 05:39:11,888 [main] INFO org.apache.pig.Main - Apache Pig version 0.16.0 (r1746530) compiled Jun 01 2016, 23:09:59
2019-08-28 05:39:11,888 [main] INFO org.apache.pig.Main - Logging error messages to: /home/ubuntu/pig_1566970751885.log
2019-08-28 05:39:11,916 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/ubuntu/.pigbootup not found
2019-08-28 05:39:12,525 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
2019-08-28 05:39:12,847 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hdfs://localhost:9001
2019-08-28 05:39:12,905 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: 
2019-08-28 05:39:12,906 [main] WARN org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false
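The log shows that Pig picked MAPREDUCE as the execution type and connected to HDFS at hdfs://localhost:9000, so this mode expects a running Hadoop setup. If Hadoop is not available, Pig can be started in local mode with `pig -x local`, which uses the local file system, and `pig -version` prints the version without opening the Grunt shell. The sketch below chooses a mode; treating a set `HADOOP_HOME` as the signal for MapReduce mode is a simplifying assumption, and `pig_exec_type` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Choose Pig's execution type: mapreduce when HADOOP_HOME is set, otherwise local.
# Using HADOOP_HOME as the signal is an assumption made for this sketch.
pig_exec_type() {
  if [ -n "${HADOOP_HOME:-}" ]; then
    echo mapreduce
  else
    echo local
  fi
}

mode="$(pig_exec_type)"
echo "Chosen Pig execution type: $mode"
if command -v pig >/dev/null 2>&1; then
  pig -version              # print version information and exit
  # pig -x "$mode"          # uncomment to open the Grunt shell in the chosen mode
else
  echo "pig is not on the PATH" >&2
fi
```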

In this blog, you learned what Pig is, why it is used, and how to install and set it up. Stay tuned for the next Pig tutorials to learn more about Pig.
