Pig Latin Operators

Pig Latin provides a number of operators that filter, join, or otherwise organize data.

FOREACH: The FOREACH command operates on each element of a data bag. This is useful, for instance, for processing each input record in a bag returned by a LOAD statement.

result = FOREACH bagname GENERATE expression, expression, ...;

This statement iterates over the contents of a bag, evaluating the expressions to the right of the GENERATE keyword against the current record. The expressions may be, for example, the names of fields. So to extract the IDs of all users who accessed the site (based on the query_log.txt example shown above), we could write a query like:

users = FOREACH queries GENERATE userId;

In a FOREACH statement, each element of the bag is considered independently: no expression can reference more than one element of the bag at a time. This allows the statement to be processed in parallel using Hadoop MapReduce.
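To make the record-at-a-time behavior concrete, here is a minimal plain-Python model of FOREACH ... GENERATE. It is not Pig itself. The sample records and the helper names `foreach_generate` and `foreach_generate_parallel` are hypothetical. The model shows that each expression sees only one record, so splitting the bag into chunks and processing them separately gives the same result as a single pass.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the query log bag: (userId, query, timestamp).
queries = [
    ("alice", "hadoop pig", 1001),
    ("bob", "latin grammar", 1002),
    ("alice", "cogroup", 1003),
]

def foreach_generate(bag, *exprs):
    """Model of FOREACH bag GENERATE expr, ...: every expression is a
    function of a single record, applied to each record in turn."""
    return [tuple(expr(rec) for expr in exprs) for rec in bag]

def foreach_generate_parallel(bag, *exprs, chunks=2):
    """Same result, computed chunk by chunk, the way independent map tasks
    could each handle one split of the input."""
    size = max(1, -(-len(bag) // chunks))  # ceiling division
    parts = [bag[i:i + size] for i in range(0, len(bag), size)]
    with ThreadPoolExecutor() as pool:
        # pool.map yields results in the same order as the input chunks
        results = list(pool.map(lambda part: foreach_generate(part, *exprs), parts))
    return [row for part in results for row in part]

def user_id(rec):
    return rec[0]  # like GENERATE userId

print(foreach_generate(queries, user_id))
print(foreach_generate_parallel(queries, user_id) == foreach_generate(queries, user_id))
```

The thread pool only stands in for separate machines. What matters is that no expression needs to see a neighboring record.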

The expressions in a GENERATE clause are not limited to field names. They can be fields (referenced by name, like userId, or by position, like $0), constants, arithmetic expressions, map lookups, conditional expressions, or FLATTEN expressions, described below.
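The kinds of expressions listed above can be mimicked one per output field in a short Python sketch. The record layout (userId, age, prefs) and the field values are hypothetical. The comments show roughly corresponding Pig Latin forms, such as `prefs#'lang'` for a map lookup and the `? :` conditional.

```python
# Hypothetical records: (userId, age, prefs), where prefs is a map.
records = [
    ("alice", 34, {"lang": "en"}),
    ("bob", 17, {"lang": "fr"}),
]

def generate(rec):
    """One output tuple per record, mixing the kinds of GENERATE expressions."""
    user = rec[0]                                    # field by position, like $0
    source = "web"                                   # constant
    next_age = rec[1] + 1                            # arithmetic, like age + 1
    lang = rec[2].get("lang")                        # map lookup, like prefs#'lang' (None if missing)
    bucket = "adult" if rec[1] >= 18 else "minor"    # conditional, like (age >= 18 ? 'adult' : 'minor')
    return (user, source, next_age, lang, bucket)

print([generate(rec) for rec in records])
```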

Finally, these expressions may also call user-defined functions written in Java. Such a function can be given the entire current record through a Pig library. In this way, Pig does the heavy lifting of record-by-record processing, while an application-specific Java function performs tricky parsing or evaluation logic. Pig also provides built-in versions of several commonly needed functions, such as COUNT, AVG, MIN, MAX, and SUM.
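The split between a per-record user function and the built-in aggregates can be modeled in Python. This is only a sketch. `parse_line` is a hypothetical stand-in for a Java user-defined function, and the pipe-separated log format is invented. The dictionary shows what COUNT, SUM, AVG, MIN, and MAX compute over a bag of values.

```python
# Hypothetical raw log lines: user|request|latency in milliseconds.
raw = ["alice|GET /a|120", "bob|GET /b|80", "alice|POST /c|200"]

def parse_line(line):
    """Stand-in for a user-defined function: sees one whole record, returns a tuple."""
    user, request, millis = line.split("|")
    method, path = request.split(" ", 1)
    return (user, method, path, int(millis))

# Like FOREACH raw GENERATE parse_line(line): the UDF runs once per record.
parsed = [parse_line(line) for line in raw]

# Models of the built-in aggregate functions applied to a bag of values.
latencies = [rec[3] for rec in parsed]
stats = {
    "COUNT": len(latencies),
    "SUM": sum(latencies),
    "AVG": sum(latencies) / len(latencies),
    "MIN": min(latencies),
    "MAX": max(latencies),
}
print(parsed)
print(stats)
```

In Pig, the aggregates are typically applied to the bags produced by GROUP, discussed below. Here they simply run over the whole list.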

FLATTEN is an expression that removes one level of nesting. Given a tuple that contains a bag, FLATTEN emits one tuple per record in the bag, each combining that record with the tuple's other fields. For example, suppose we had a bag of records containing a person's name and a list of the types of pets they own:

(Alice, { turtle, goldfish, cat })
(Bob, { dog, cat })

Applying FLATTEN to the inner bag in a FOREACH ... GENERATE statement eliminates the nesting like so:

(Alice, turtle)
(Alice, goldfish)
(Alice, cat)
(Bob, dog)
(Bob, cat)
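A plain-Python model reproduces the listing above using the pets example. The function name `flatten` is hypothetical. One detail also carries over from Pig: a record whose inner bag is empty produces no output rows at all.

```python
# The pets example: (name, bag of pet types).
owners = [
    ("Alice", ["turtle", "goldfish", "cat"]),
    ("Bob", ["dog", "cat"]),
]

def flatten(records):
    """Model of FOREACH owners GENERATE name, FLATTEN(pets):
    one output tuple per element of the inner bag."""
    return [(name, pet) for name, pets in records for pet in pets]

for row in flatten(owners):
    print(row)
```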

FILTER statements iterate over a bag and return a new bag containing all elements that satisfy a conditional expression, e.g.:

adults = FILTER people BY age > 21;
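The same selection can be modeled in Python. The `people` records and the helper `filter_by` are hypothetical. Note that `age > 21` is a strict comparison, so a 21-year-old is excluded.

```python
# Hypothetical people bag: (name, age).
people = [("Alice", 34), ("Bob", 17), ("Cindy", 22), ("Mark", 21)]

def filter_by(bag, predicate):
    """Model of FILTER bag BY condition: keep records whose condition is true."""
    return [rec for rec in bag if predicate(rec)]

adults = filter_by(people, lambda rec: rec[1] > 21)  # FILTER people BY age > 21
print(adults)  # [('Alice', 34), ('Cindy', 22)]
```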

The COGROUP and JOIN operations perform similar functions: they bring together related records from multiple data sets. JOIN acts like the SQL JOIN statement. It creates a flat set of output records containing the cross-product of the input records that share each key. COGROUP, on the other hand, groups the records by their common field and returns one record per key value, containing the key followed by two separate bags. The first bag holds the records of the first data set with that key, and the second holds the records of the second data set with that key.

To illustrate the difference, suppose we had the flattened data set mapping people to their pets, and another flattened data set mapping people to their friends. From these we could build a "pets of friends" data set. Here are the input data sets:

pets: (owner, pet)
(Alice, turtle)
(Alice, goldfish)
(Alice, cat)
(Bob, dog)
(Bob, cat)

friends: (friend1, friend2)
(Cindy, Alice)
(Mark, Alice)
(Paul, Bob)

The COGROUP statement

grouped = COGROUP pets BY owner, friends BY friend2;

returns:

( Alice, {(Alice, turtle), (Alice, goldfish), (Alice, cat)}, {(Cindy, Alice), (Mark, Alice)} )
( Bob, {(Bob, dog), (Bob, cat)}, {(Paul, Bob)} )
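The grouping can be sketched in Python as a model of the semantics, not Pig's implementation. The helper name `cogroup` is hypothetical. Output is sorted by key only for readability, since Pig does not promise an order. A key that appears in only some inputs gets an empty bag for the others, and the same function accepts three or more inputs.

```python
from collections import defaultdict

pets = [("Alice", "turtle"), ("Alice", "goldfish"), ("Alice", "cat"),
        ("Bob", "dog"), ("Bob", "cat")]
friends = [("Cindy", "Alice"), ("Mark", "Alice"), ("Paul", "Bob")]

def cogroup(*inputs):
    """Model of COGROUP in1 BY key1, in2 BY key2, ...
    Each input is (bag, key_function). Returns one record per distinct key:
    (key, bag_from_input_1, bag_from_input_2, ...)."""
    groups = defaultdict(lambda: [[] for _ in inputs])
    for i, (bag, key) in enumerate(inputs):
        for rec in bag:
            groups[key(rec)][i].append(rec)
    return [(k, *bags) for k, bags in sorted(groups.items())]

for row in cogroup((pets, lambda r: r[0]), (friends, lambda r: r[1])):
    print(row)
```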

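Contrast this with the more familiar, non-hierarchical JOIN operator:

joined = JOIN pets BY owner, friends BY friend2;

JOIN keeps every field from both inputs, so each output record ends with friend2, which repeats the owner's name. It returns:

(Alice, turtle, Cindy, Alice)
(Alice, turtle, Mark, Alice)
(Alice, goldfish, Cindy, Alice)
(Alice, goldfish, Mark, Alice)
(Alice, cat, Cindy, Alice)
(Alice, cat, Mark, Alice)
(Bob, dog, Paul, Bob)
(Bob, cat, Paul, Bob)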
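For comparison, an inner join can be modeled in Python by indexing one input by key and emitting the per-key cross-product. This is a hash-join style sketch chosen for brevity, not a description of how Pig executes joins. The helper name `join` is hypothetical. Like Pig's JOIN, the model concatenates all fields of both records, so each output tuple also carries friend2.

```python
from collections import defaultdict

pets = [("Alice", "turtle"), ("Alice", "goldfish"), ("Alice", "cat"),
        ("Bob", "dog"), ("Bob", "cat")]
friends = [("Cindy", "Alice"), ("Mark", "Alice"), ("Paul", "Bob")]

def join(left, left_key, right, right_key):
    """Model of an inner JOIN: for every key, emit the cross product of the
    matching left and right records, concatenating all of their fields."""
    index = defaultdict(list)
    for r in right:
        index[right_key(r)].append(r)
    # Left records with no matching key produce nothing (inner join).
    return [l + r for l in left for r in index.get(left_key(l), [])]

for row in join(pets, lambda r: r[0], friends, lambda r: r[1]):
    print(row)
```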

In general, the COGROUP command supports grouping as many data sets as desired, so three or more data sets can be combined this way. It is also possible to group the elements of a single data set; this is supported through an alternate keyword, GROUP.

A GROUP ... BY statement organizes a bag of records into bags of related items, based on the field identified as their common key. For example, the pets bag from the previous example could be grouped with:

byOwner = GROUP pets BY owner;

which returns:

( Alice, {(Alice, turtle), (Alice, goldfish), (Alice, cat)} )
( Bob, {(Bob, dog), (Bob, cat)} )

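In this way, GROUP and FLATTEN are effectively inverses of one another. Flattening the bags produced by GROUP yields the original records again, though not necessarily in the same order.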
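The round trip can be checked with a small Python model. The helpers `group_by` and `flatten_groups` are hypothetical stand-ins for GROUP and for a FOREACH that flattens each group's bag. The records are compared after sorting because neither operator guarantees an order.

```python
from collections import defaultdict

pets = [("Alice", "turtle"), ("Alice", "goldfish"), ("Alice", "cat"),
        ("Bob", "dog"), ("Bob", "cat")]

def group_by(bag, key):
    """Model of GROUP bag BY key: one (key, bag of matching records) per key."""
    groups = defaultdict(list)
    for rec in bag:
        groups[key(rec)].append(rec)
    return list(groups.items())  # dicts keep first-seen key order

def flatten_groups(grouped):
    """Model of FOREACH grouped GENERATE FLATTEN(inner bag): emit each inner record."""
    return [rec for _key, inner in grouped for rec in inner]

by_owner = group_by(pets, lambda rec: rec[0])
print(by_owner)
restored = flatten_groups(by_owner)
print(sorted(restored) == sorted(pets))  # True: same records, order aside
```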

More complicated statements are possible as well. An operation that expects a data set as input does not need an explicitly named one; it can use one generated "inline" by another FILTER, GROUP, or other statement.

When a Pig Latin script has created its final data set, the output can be saved to a file with the STORE command, which takes the form:

STORE alias INTO 'filename' USING function();

The function specifies how to serialize the data to the file. If the USING clause is omitted, the default storage function, PigStorage, writes plain-text, tab-delimited files.
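As a rough illustration of that default format, here is a simplified Python model that writes one line per record with tab-separated fields. It is an assumption-laden sketch. The function name `store_tab_delimited` is hypothetical, and real handling of nulls, bags, maps, and output directories is not modeled.

```python
import io

def store_tab_delimited(records, out):
    """Simplified model of default plain-text output: one line per record,
    fields separated by tabs. Nulls, bags, and maps are not handled here."""
    for rec in records:
        out.write("\t".join(str(field) for field in rec) + "\n")

buf = io.StringIO()
store_tab_delimited([("Alice", "turtle"), ("Bob", "dog")], buf)
print(repr(buf.getvalue()))  # 'Alice\tturtle\nBob\tdog\n'
```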

A number of additional operators exist for removing duplicate records, sorting records, and so on. This paper explains the additional operators and expression syntaxes in greater detail.

