Functions
Table of contents
Transformations
Transformation functions allow modifying elements of collections in various ways. In this section you can find functions related to each type of collections.
PCollection<T>
Method | Description |
---|---|
distinct() ↪️ PCollection<T> | Returns a collection with all distinct elements by the window. |
filter(f) ↪️ PCollection<T> | Returns a PCollection<T> with elements matched to the predicate.Arguments: · f: (T) -> Boolean - the predicate functions to check the elements |
flatMap(f) ↪️ PCollection<U> | Takes each element, produces zero, one, or more elements, and returns a new PCollection<U> with all of them.Arguments: · f: (T) -> Iterable<U> - a function to transform the element |
forEach(f) ↪️ Unit | Applies the defined functions to each element from the input and returns nothing. Arguments: · f: (T) -> Unit - a function to apply |
keyBy(f) ↪️ PCollection<KV<K, T>> | Assigns a new key to each value and returns a collection of key-value pairs. Arguments: · f: (T) -> K - a function to extract the key from the element |
map(f) ↪️ PCollection<U> | Takes each element from the input, transforms it into a new one, and returns a new PCollection<U> with all of them.Arguments: · f: (T) -> U - a function to transform the element |
partition(partitions) ↪️ PCollectionList<T> | Splits the elements of the input collection into the defined number of partitions, and returns a PCollectionList<T> that bundles all collections containing the split elements.Arguments: · partitions: Int - a number of partitions |
partitionBy(partitions, f) ↪️ PCollectionList<T> | Uses the defined function to split the elements of the input collection into partitions, and returns a PCollectionList<T> that bundles all collections containing the split elements.Arguments: · partitions: Int - a number of partitions· f: (T) -> Int - a function to define the number of partition for the element |
peek(f) ↪️ PCollection<T> | Applies the defined functions to each element and returns the same collection. Arguments: · f: (T) -> Unit - a function to apply |
sample(sampleSize) ↪️ PCollection<Iterable<T>> | Uniformly at random selects the defined number of the elements from the input and returns a collection containing the selected elements. Arguments: · sampleSize: Int - a size of the sample |
take(limit) ↪️ PCollection<T> | Returns a new collection containing up to limit elements of the input. Arguments: · limit: Long - a size of the limit |
top(count) ↪️ PCollection<List<T>> | Returns a new collection with a single element containing the largest elements of the input collection. Arguments: · count: Int - a limit count of the results |
PCollection<KV<K, T>>
Method | Description |
---|---|
flatMapValues(f) ↪️ PCollection<KV<K, U>> | Takes the value from each input key-value pair, produces zero, one, or more elements, and returns a new collection with key-value pairs for all results. Arguments: · f: (V) -> Iterable<U> - a function to transform the value |
keys() ↪️ PCollection<K> | Returns a collection of the keys from the input collection of key-value pairs. |
mapValues(f) ↪️ PCollection<KV<K, U>> | Takes the value from each input key-value pair, transforms it into a new one, and returns a new collection with key-value pairs with the results as values. Arguments: · f: (V) -> U - a function to transform the value |
swap() ↪️ PCollection<KV<V, K>> | Returns a new collection, where all the keys and values have been swapped. |
toPair() ↪️ PCollection<Pair<K, V>> | Returns a collection of Kotlin Pair instances from the input key-value pairs. |
values() ↪️ PCollection<V> | Returns a collection of the values from the input collection of key-value pairs. |
PCollection<Pair<K, T>>
Method | Description |
---|---|
toKV() ↪️ PCollection<KV<K, V>> | Returns a new collection of KV instances from the input collection of Kotlin Pair elements. |
Combinations
Combining functions are responsible for aggregating elements of the input collections.
PCollection<T>
Method | Description |
---|---|
and(pc) ↪️ PCollectionList<T> | Combines input collections to a list. Arguments: · pc: PCollection<T> - other collection to combine |
approximateQuantiles(num) ↪️ PCollection<List<T>> | Returns a collection with a single value as a list of the approximate N-tiles of the elements of the input collection. Arguments: · num: Int - a size of quantiles list |
aggregate(zero, seq, comb) ↪️ PCollection<U> | Combines elements from the input collection using given functions and the beginning value. Arguments: · zero: U - a beginning value to aggregate· seq: (U, T) -> U - adds the current element to result· comb: (U, U) -> U - combines two results into the one |
combine(cc, mv, mc) ↪️ PCollection<U> | Combines elements from the input collection using given functions. Arguments: · cc: (T) -> U - creates a new combiner from the element· mv: (U, T) -> U - adds the current element to the combiner· mc: (U, U) -> U - merges two combiners into the one |
countApproxDistinct(maxErr) ↪️ PCollection<Long> | Returns a collection with a single value that is an estimate of the number of distinct elements in the input collection. Arguments: · maxErr: Double - a desired maximum estimation error |
countApproxDistinct(sampleSize) ↪️ PCollection<Long> | Returns a collection with a single value that is an estimate of the number of distinct elements in the input collection. Arguments: · sampleSize: Int - controls the estimation error, which is about 2 / sqrt(sampleSize) |
count() ↪️ PCollection<Long> | Returns a collection with a single value that is the number of elements in the input collection. |
countByValue() ↪️ PCollection<KV<T, Long>> | Returns a keyed collection with the elements from the input collection as the keys and counts of them as the values. |
fold(zero, f) ↪️ PCollection<T> | Folds all elements from the input collection to the one result using the given function and the beginning value. Arguments: · zero: T - a beginning value to aggregate· f: (T, T) -> T - combines two elements into the one result |
groupBy(f) ↪️ PCollection<KV<K, Iterable<T>>> | Returns a keyed collection representing a map from each distinct key from the elements of the input collection to an Iterable over all the values associated with that key.Arguments: · f: (T) -> K - a function to extract a key from the element |
latest() ↪️ PCollection<T> | Returns a collection with the latest element according to its event time, or null if there are no elements. |
max() ↪️ PCollection<T> | Returns a collection with the maximum according to the natural ordering of T of the input elements, or null if there are no elements. |
min() ↪️ PCollection<T> | Returns a collection with the minimum according to the natural ordering of T of the input elements, or null if there are no elements. |
reduce(f) ↪️ PCollection<T> | Reduces all elements from the input collection to the one result using the given function. Arguments: · f: (T, T) -> T - combines two elements into the one result |
union(pc) ↪️ PCollection<T> | Returns a new collection with all elements from the input collections. Arguments: · pc: PCollection<T> - other collection to combine |
PCollection<T : Number>
Method | Description |
---|---|
mean() ↪️ PCollection<Double> | Returns a collection with the mean of the input elements, or 0 if there are no elements. |
sum() ↪️ PCollection<T> | Returns a collection with the sum of the input elements, or 0 if there are no elements. |
PCollection<KV<K, T>>
Method | Description |
---|---|
aggregateByKey(zero, seq, comb) ↪️ PCollection<KV<K, U>> | Combines values from the input key-value pairs using given functions and the beginning value. Arguments: · zero: U - a beginning value to aggregate· seq: (U, V) -> U - adds the current value to result· comb: (U, U) -> U - combines two results into the one |
approximateQuantilesByKey(num) ↪️ PCollection<KV<K, List<V>>> | Returns a collection with lists of the approximate N-tiles of the values for each key of the input key-value pairs. Arguments: · num: Int - a size of quantiles list |
combineByKey(cc, mv, mc) ↪️ PCollection<KV<K, U>> | Combines values from the input key-value pairs using given functions. Arguments: · cc: (V) -> U - creates a new combiner from the value· mv: (U, V) -> U - adds the current value to the combiner· mc: (U, U) -> U - merges two combiners into the one |
countByKey() ↪️ PCollection<KV<K, Long>> | Returns a collection with counts of values for each key in the input key-value pairs. |
foldByKey(zero, f) ↪️ PCollection<KV<K, V>> | Folds all values for each key from the input key-value pairs to the one result using the given function and the beginning value. Arguments: · zero: V - a beginning value to aggregate· f: (V, V) -> V - combines two values into the one result |
groupByKey() ↪️ PCollection<KV<K, Iterable<V>>> | Returns a keyed collection representing a map from each distinct key from the elements of the input collection to an Iterable over all the values associated with that key. |
groupToBatches(size) ↪️ PCollection<KV<K, Iterable<V>>> | Combines the input key-value pairs to a desired batch size for each key. |
latestByKey() ↪️ PCollection<KV<K, V>> | Returns a collection with the latest values for each key according to its event time. |
maxByKey() ↪️ PCollection<KV<K, V>> | Returns a collection with the maximum value for each key according to the natural ordering of V . |
minByKey() ↪️ PCollection<KV<K, V>> | Returns a collection with the minimum value for each key according to the natural ordering of V . |
reduceByKey(f) ↪️ PCollection<KV<K, V>> | Reduces all values for each key from the input key-value pairs to the one result using the given function. Arguments: · f: (V, V) -> V - combines two values into the one result |
PCollection<KV<K, V : Number>>
Method | Description |
---|---|
meanByKey() ↪️ PCollection<KV<K, Double>> | Returns a collection with the means of the input values for each key. |
sumByKey() ↪️ PCollection<KV<K, V>> | Returns a collection with the sums of the input values for each key. |
Reifying
Reifying functions for converting between the explicit and implicit forms of various Beam values.
PCollection<T>
Method | Description |
---|---|
asTimestamped() ↪️ PCollection<TimestampedValue<T>> | Returns a collection with all elements of the input wrapped in values with timestamps. |
asWindowed() ↪️ PCollection<ValueInSingleWindow<T>> | Returns a collection with all elements of the input enriched with additional information about windowing. |
PCollection<KV<K, V>>
Method | Description |
---|---|
asTimestampedValues() ↪️ PCollection<KV<K, TimestampedValue<T>>> | Returns a collection with all values of the input key-value pairs wrapped in values with timestamps. |
asWindowedValues() ↪️ PCollection<KV<K, ValueInSingleWindow<T>>> | Returns a collection with all values of the input key-value pairs enriched with additional information about windowing. |