Pig Latin provides a number of operators which filter, join, or otherwise organize data.
FOREACH: The FOREACH command operates on each element of a data bag. This is useful, for instance, for processing each input record in a bag returned by a LOAD statement.
FOREACH bagname GENERATE expression, expression...
This statement iterates over the contents of a bag. It applies the expressions on the right of the GENERATE keyword to the data provided by the current record emitted from the bag. The expressions may be, for example, the names of fields. So to extract the IDs of all users who accessed the site (based on the query_log.txt example shown above), we could write a query like:
FOREACH queries GENERATE userId;
In the FOREACH statement, each element of the bag is considered independently. No expression can reference more than one element of the bag at a time; this independence is what allows the statement to be processed in parallel using Hadoop MapReduce.
Expressions emitted by the GENERATE element are not limited to the names of fields; they can be fields (by name like userId or by position like $0), constants, algebraic operations, map lookups, conditional expressions, or FLATTEN expressions, described below.
Finally, these expressions may also call user-provided functions that are written in Java. These user-provided functions have access to the entire current record through a Pig library; in this way, Pig can be used as the heavy-lifting component to automate record-by-record mapping using an application-specific Java function to perform tricky parsing or evaluation logic. Pig also provides several of the most commonly-needed functions, such as COUNT, AVG, MIN, MAX, and SUM.
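The record-at-a-time behavior of FOREACH ... GENERATE can be modeled in a few lines of Python. This is only a sketch of the semantics, not Pig code: the sample records, the field layout, and the helper name foreach_generate are assumptions made up for illustration (only the userId field name comes from the text).

```python
# Model of FOREACH ... GENERATE: every record is mapped independently,
# so the per-record work could be spread across many workers.
# Hypothetical sample data; each record is (userId, time, queryString).
queries = [
    ("alice", "10:15", "hadoop tutorial"),
    ("bob", "10:17", "pig latin"),
]


def foreach_generate(bag, *expressions):
    """Apply each expression to each record; emit one output tuple per record."""
    return [tuple(expr(record) for expr in expressions) for record in bag]


# Roughly in the spirit of: FOREACH queries GENERATE userId, $2, 1;
result = foreach_generate(
    queries,
    lambda r: r[0],  # field by name (userId sits at position 0 in this sketch)
    lambda r: r[2],  # field by position ($2)
    lambda r: 1,     # a constant
)
print(result)
```

Because each lambda sees only the current record, no output tuple depends on any other input record, which is the property that makes the operation easy to parallelize.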
FLATTEN is an expression which will eliminate a level of nesting. Given a tuple which contains a bag, FLATTEN will emit several tuples each of which contains one record from the bag. For example, if we had a bag of records containing a person's name and a list of types of pets they own:
(Alice, { turtle, goldfish, cat })
(Bob, { dog, cat })
A FLATTEN command would eliminate the inner bags like so:
(Alice, turtle)
(Alice, goldfish)
(Alice, cat)
(Bob, dog)
(Bob, cat)
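The same un-nesting can be sketched in Python. This models what FLATTEN does to the example above and is not Pig itself; the helper name flatten is an assumption, and Python lists stand in for bags (Pig bags are unordered, so the ordering here is incidental).

```python
# Model of FLATTEN: remove one level of nesting by emitting one output
# tuple per element of each record's inner bag.
people_pets = [
    ("Alice", ["turtle", "goldfish", "cat"]),
    ("Bob", ["dog", "cat"]),
]


def flatten(records):
    """Pair the outer field with every item of the nested bag."""
    return [(name, item) for name, bag in records for item in bag]


flat = flatten(people_pets)
for row in flat:
    print(row)
```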
FILTER statements iterate over a bag and return a new bag containing all elements which pass a conditional expression, e.g.:
adults = FILTER people BY age > 21;
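As a rough model of this statement in Python (a sketch of the semantics only; the sample people records and the helper name filter_by are hypothetical):

```python
# Model of FILTER ... BY: keep only the records that satisfy the condition.
# Hypothetical sample data; each record is (name, age).
people = [("Alice", 34), ("Bob", 19), ("Cindy", 21)]


def filter_by(bag, predicate):
    """Return a new bag containing the records for which predicate is true."""
    return [record for record in bag if predicate(record)]


# Mirrors: adults = FILTER people BY age > 21;
adults = filter_by(people, lambda r: r[1] > 21)
print(adults)
```

Note that the comparison is strict, so a record with age exactly 21 does not pass the filter.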
The COGROUP and JOIN operations perform similar functions: they unite related data elements from multiple data sets. The difference is that JOIN acts like the SQL JOIN statement, creating a flat set of output records containing the joined cross-product of the input records. The COGROUP operator, on the other hand, groups the elements by their common field and returns a set of records each containing two separate bags. The first bag holds the records of the first data set that share the common field value, and the second bag holds the matching records of the second data set.
To illustrate the difference, suppose we had the flattened data set mapping people to their pets, and another flattened data set mapping people to their friends. We could create a "pets of friends" data set out of these like the following. Here are the input data sets:
pets: (owner, pet)
----------------------
(Alice, turtle)
(Alice, goldfish)
(Alice, cat)
(Bob, dog)
(Bob, cat)
friends: (friend1, friend2)
----------------------
(Cindy, Alice)
(Mark, Alice)
(Paul, Bob)
Here is what is returned by COGROUP:
COGROUP pets BY owner, friends BY friend2; returns:
( Alice, {(Alice, turtle), (Alice, goldfish), (Alice, cat)},
{(Cindy, Alice), (Mark, Alice)} )
( Bob, {(Bob, dog), (Bob, cat)}, {(Paul, Bob)} )
Contrasted with the more familiar, non-hierarchical JOIN operator:
JOIN pets BY owner, friends BY friend2; returns:
(Alice, turtle, Cindy)
(Alice, turtle, Mark)
(Alice, goldfish, Cindy)
(Alice, goldfish, Mark)
(Alice, cat, Cindy)
(Alice, cat, Mark)
(Bob, dog, Paul)
(Bob, cat, Paul)
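To make the relationship between the two operators concrete, here is a Python sketch that builds the JOIN result from the COGROUP result. It models the semantics only and is not how Pig is implemented; the helper names cogroup and join are assumptions. Pig's JOIN keeps every field from both inputs, so the sketch projects away the repeated key at the end to match the abbreviated three-field listing above.

```python
# Model of COGROUP and JOIN on the pets and friends data sets.
pets = [
    ("Alice", "turtle"), ("Alice", "goldfish"), ("Alice", "cat"),
    ("Bob", "dog"), ("Bob", "cat"),
]
friends = [("Cindy", "Alice"), ("Mark", "Alice"), ("Paul", "Bob")]


def cogroup(left, left_key, right, right_key):
    """Group both inputs by key; emit (key, left_bag, right_bag) per key."""
    groups = {}
    for rec in left:
        groups.setdefault(rec[left_key], ([], []))[0].append(rec)
    for rec in right:
        groups.setdefault(rec[right_key], ([], []))[1].append(rec)
    return [(key, lbag, rbag) for key, (lbag, rbag) in groups.items()]


def join(left, left_key, right, right_key):
    """Flatten each cogroup into the cross-product of its two bags."""
    return [
        lrec + rrec
        for _key, lbag, rbag in cogroup(left, left_key, right, right_key)
        for lrec in lbag
        for rrec in rbag
    ]


# pets BY owner (index 0), friends BY friend2 (index 1)
cogrouped = cogroup(pets, 0, friends, 1)
joined = join(pets, 0, friends, 1)

# Drop the duplicated key to get the (owner, pet, friend1) form shown above.
projected = [(owner, pet, friend1) for owner, pet, friend1, _friend2 in joined]
for row in projected:
    print(row)
```

A key that appears in only one input still produces a cogroup record with one empty bag, but contributes nothing to the join, since the cross-product with an empty bag is empty.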
In general, the COGROUP command supports grouping on as many data sets as desired, so three or more data sets can be grouped together in this fashion. It is also possible to group the elements of a single data set; this is supported through an alternate keyword, GROUP.
A GROUP ... BY statement organizes a bag of records into bags of related items based on the field identified as their common key. For example, the pets bag from the previous example could be grouped with:
GROUP pets BY owner; returns:
( Alice, {(Alice, turtle), (Alice, goldfish), (Alice, cat)} )
( Bob, {(Bob, dog), (Bob, cat)} )
In this way, GROUP and FLATTEN are effectively inverses of one another.
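The inverse relationship can be checked with a small Python model (a sketch of the semantics, with hypothetical helper names group_by and flatten_groups): grouping the pets records by owner and then flattening the groups gives back the original records, up to ordering.

```python
# Model of GROUP ... BY followed by FLATTEN on the pets data set.
pets = [
    ("Alice", "turtle"), ("Alice", "goldfish"), ("Alice", "cat"),
    ("Bob", "dog"), ("Bob", "cat"),
]


def group_by(bag, key_index):
    """Emit one (key, bag_of_records) pair per distinct key value."""
    groups = {}
    for rec in bag:
        groups.setdefault(rec[key_index], []).append(rec)
    return list(groups.items())


def flatten_groups(grouped):
    """Undo the grouping by emitting every record of every inner bag."""
    return [rec for _key, inner in grouped for rec in inner]


grouped = group_by(pets, 0)
round_trip = flatten_groups(grouped)
print(grouped)
print(sorted(round_trip) == sorted(pets))
```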
More complicated statements can be realized as well: operations which expect a data set as input do not need to use an explicitly-named data set; they can use one generated "inline" with another FILTER, GROUP or other statement.
When the final data set has been created by a Pig Latin script, the output can be saved to a file with the STORE command, which follows the form:
STORE dataset INTO 'filename' USING function()
The provided function specifies how to serialize the data to the file; if it is omitted, then a default serializer will write plain-text tab-delimited files.
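The role of the serializer function can be illustrated with a Python sketch. This is a model only: the store helper and its serialize parameter are hypothetical and do not correspond to a Pig API, and an in-memory buffer stands in for the output file.

```python
import io


def tab_delimited(record):
    """Default serializer in this sketch: fields joined by tabs."""
    return "\t".join(str(field) for field in record)


def store(bag, out, serialize=tab_delimited):
    """Write one serialized line per record to the output stream."""
    for record in bag:
        out.write(serialize(record) + "\n")


records = [("Alice", "turtle"), ("Bob", "dog")]

# No serializer given: fall back to plain-text, tab-delimited lines.
default_out = io.StringIO()
store(records, default_out)
text = default_out.getvalue()

# A caller-supplied serializer changes the on-disk format.
csv_out = io.StringIO()
store(records, csv_out, serialize=lambda rec: ",".join(rec))
csv_text = csv_out.getvalue()
print(repr(text))
print(repr(csv_text))
```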
A number of additional operators exist for the purposes of removing duplicate records, sorting records, etc. This paper explains the additional operators and expression syntaxes in greater detail.