Pig Latin Data Types

Values in Pig Latin can be expressed by four basic data types:

* An atom is any atomic value, e.g., "fish".
* A tuple is a record of multiple values with fixed arity, e.g., ("dog", "sparky").
* A data bag is a collection of an arbitrary number of values, e.g., {("dog", "sparky"), ("fish", "goldie")}. Data bags support a scan operation for iterating through their contents.
* A data map is a collection with a lookup function that translates keys to values, e.g., ["age" : 25].
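
For readers more familiar with general-purpose languages, the four types can be loosely mirrored in Python. This is only an analogy under my own assumptions, not Pig Latin syntax: the variable names and the `scan` helper are hypothetical, and a Python list stands in for a bag.

```python
# Python stand-ins for the four Pig Latin data types (an analogy, not Pig syntax).
atom = "fish"                                   # atom: a single atomic value
tup = ("dog", "sparky")                         # tuple: record with fixed arity
bag = [("dog", "sparky"), ("fish", "goldie")]   # bag: collection of any number of values
data_map = {"age": 25}                          # map: lookup from keys to values


def scan(items):
    """Yield each element in turn, mirroring the scan operation on a bag."""
    for item in items:
        yield item


# Iterate through the bag and pull the second field out of each tuple.
names = [pet_name for (_kind, pet_name) in scan(bag)]

# Look a value up by key in the map.
age = data_map["age"]
```

Here `names` collects the second field of every tuple in the bag, and `age` is retrieved through the map's key lookup.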

All data types are fully nestable: bags may contain tuples, maps may contain bags or other maps, and so on. This differs from a traditional database model, where data must be normalized into lists of atoms. Because data types can be composed in this manner, Pig queries line up better with the programmer's conceptual model of the data.

Data types may also be heterogeneous. The fields of a tuple may each have a different type: some may be atoms, others may be tuples, and so on. The values in a bag may have different types, as may the values in a data map, and these types can vary from one record to the next in the bag. Data map keys, however, must be atoms, for efficiency reasons.
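
Nesting, heterogeneity, and the atom-only rule for map keys can be sketched with Python stand-ins. Everything here is illustrative rather than taken from Pig: the sample `record`, the choice of `str`, `int`, and `float` as "atoms", and the `is_atom` and `keys_are_atoms` helpers are all assumptions.

```python
# Hypothetical nested record: a tuple whose fields are an atom, a bag, and a map.
record = (
    "alice",                                            # atom
    [("dog", "sparky"), ("fish", "goldie", 3)],         # bag whose tuples differ in shape
    {"age": 25, "pets_by_kind": {"dog": ["sparky"]}},   # map holding an atom and a nested map
)

# Assumption: these Python types play the role of atoms in this sketch.
ATOM_TYPES = (str, int, float)


def is_atom(value):
    """Return True if the value counts as an atom in this sketch."""
    return isinstance(value, ATOM_TYPES)


def keys_are_atoms(value):
    """Recursively check that every map key inside a nested value is an atom."""
    if isinstance(value, dict):
        return all(is_atom(k) and keys_are_atoms(v) for k, v in value.items())
    if isinstance(value, (tuple, list)):
        return all(keys_are_atoms(v) for v in value)
    return True  # atoms contain no maps, so there is nothing to check


ok = keys_are_atoms(record)                    # every key in the record is a string
bad = keys_are_atoms({("dog", "sparky"): 1})   # a tuple used as a key is rejected
```

The check passes for `record` because every map key is a string, and fails for the second value because its key is a tuple rather than an atom.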
