Link Search Menu Expand Document

Functions

Table of contents

  1. Transformations
    1. PCollection<T>
    2. PCollection<KV<K, T>>
    3. PCollection<Pair<K, T>>
  2. Combinations
    1. PCollection<T>
    2. PCollection<T : Number>
    3. PCollection<KV<K, T>>
    4. PCollection<KV<K, V : Number>>
  3. Reifying
    1. PCollection<T>
    2. PCollection<KV<K, V>>

Transformations

Transformation functions allow modifying elements of collections in various ways. In this section you can find functions related to each type of collections.

PCollection<T>

Method Description
distinct()
↪️ PCollection<T>
Returns a collection with all distinct elements by the window.
filter(f)
↪️ PCollection<T>
Returns a PCollection<T> with elements matched to the predicate.

Arguments:
· f: (T) -> Boolean - the predicate functions to check the elements
flatMap(f)
↪️ PCollection<U>
Takes each element, produces zero, one, or more elements, and returns a new PCollection<U> with all of them.

Arguments:
· f: (T) -> Iterable<U> - a function to transform the element
forEach(f)
↪️ Unit
Applies the defined functions to each element from the input and returns nothing.

Arguments:
· f: (T) -> Unit - a function to apply
keyBy(f)
↪️ PCollection<KV<K, T>>
Assigns a new key to each value and returns a collection of key-value pairs.

Arguments:
· f: (T) -> K - a function to extract the key from the element
map(f)
↪️ PCollection<U>
Takes each element from the input, transforms it into a new one, and returns a new PCollection<U> with all of them.

Arguments:
· f: (T) -> U - a function to transform the element
partition(partitions)
↪️ PCollectionList<T>
Splits the elements of the input collection into the defined number of partitions, and returns a PCollectionList<T> that bundles all collections containing the split elements.

Arguments:
· partitions: Int - a number of partitions
partitionBy(partitionsf)
↪️ PCollectionList<T>
Uses the defined function to split the elements of the input collection into partitions, and returns a PCollectionList<T> that bundles all collections containing the split elements.

Arguments:
· partitions: Int - a number of partitions
· f: (T) -> Int - a function to define the number of partition for the element
peek(f)
↪️ PCollection<T>
Applies the defined functions to each element and returns the same collection.

Arguments:
· f: (T) -> Unit - a function to apply
sample(sampleSize)
↪️ PCollection<Iterable<T>>
Uniformly at random selects the defined number of the elements from the input and returns a collection containing the selected elements.

Arguments:
· sampleSize: Int - a size of the sample
take(limit)
↪️ PCollection<T>
Returns a new collection containing up to limit elements of the input.

Arguments:
· limit: Long - a size of the limit
top(count)
↪️ PCollection<List<T>>
Returns a new collection with a single element containing the largest elements of the input collection.

Arguments:
· count: Int - a limit count of the results

PCollection<KV<K, T>>

Method Description
flatMapValues(f)
↪️ PCollection<KV<K, U>>
Takes the value from each input key-value pair, produces zero, one, or more elements, and returns a new collection with key-value pairs for all results.

Arguments:
· f: (V) -> Iterable<U> - a function to transform the value
keys()
↪️ PCollection<K>
Returns a collection of the keys from the input collection of key-value pairs.
mapValues(f)
↪️ PCollection<KV<K, U>>
Takes the value from each input key-value pair, transforms it into a new one, and returns a new collection with key-value pairs with the results as values.

Arguments:
· f: (V) -> U - a function to transform the value
swap()
↪️ PCollection<KV<V, K>>
Returns a new collection, where all the keys and values ​​have been swapped.
toPair()
↪️ PCollection<Pair<K, V>>
Returns a collection of Kotlin Pair instances from the input key-value pairs.
values()
↪️ PCollection<V>
Returns a collection of the values from the input collection of key-value pairs.

PCollection<Pair<K, T>>

Method Description
toKV()
↪️ PCollection<KV<K, V>>
Returns a new collection of KV instances from the input collection of Kotlin Pair elements.

Combinations

Combining functions are responsible for aggregating elements of the input collections.

PCollection<T>

Method Description
and(pc)
↪️ PCollectionList<T>
Combines input collections to a list.

Arguments:
· pc: PCollection<T> - other collection to combine
approximateQuantiles(num)
↪️ PCollection<List<T>>
Returns a collection with a single value as a list of the approximate N-tiles of the elements of the input collection.

Arguments:
· num: Int - a size of quantiles list
aggregate(zeroseqcomb)
↪️ PCollection<U>
Combines elements from the input collection using given functions and the beginning value.

Arguments:
· zero: U - a beginning value to aggregate
· seq: (U, T) -> U - adds the current element to result
· comb: (U, U) -> U - combines two results into the one
combine(ccmvmc)
↪️ PCollection<U>
Combines elements from the input collection using given functions.

Arguments:
· cc: (T) -> U - creates a new combiner from the element
· mv: (U, T) -> U - adds the current element to the combiner
· mc: (U, U) -> U - merges two combiners into the one
countApproxDistinct(maxErr)
↪️ PCollection<Long>
Returns a collection with a single value that is an estimate of the number of distinct elements in the input collection.

Arguments:
· maxErr: Double - a desired maximum estimation error
countApproxDistinct(sampleSize)
↪️ PCollection<Long>
Returns a collection with a single value that is an estimate of the number of distinct elements in the input collection.

Arguments:
· sampleSize: Int - controls the estimation error, which is about 2 / sqrt(sampleSize)
count()
↪️ PCollection<Long>
Returns a collection with a single value that is the number of elements in the input collection.
countByValue()
↪️ PCollection<KV<T, Long>>
Returns a keyed collection with the elements from the input collection as the keys and counts of them as the values.
fold(zerof)
↪️ PCollection<T>
Folds all elements from the input collection to the one result using the given function and the beginning value.

Arguments:
· zero: T - a beginning value to aggregate
· f: (T, T) -> T - combines two elements into the one result
groupBy(f)
↪️ PCollection<KV<K, Iterable<T>>>
Returns a keyed collection representing a map from each distinct key from the elements of the input collection to an Iterable over all the values associated with that key.

Arguments:
· f: (T) -> K - a function to extract a key from the element
latest()
↪️ PCollection<T>
Returns a collection with the latest element according to its event time, or null if there are no elements.
max()
↪️ PCollection<T>
Returns a collection with the maximum according to the natural ordering of T of the input elements, or null if there are no elements.
min()
↪️ PCollection<T>
Returns a collection with the minimum according to the natural ordering of T of the input elements, or null if there are no elements.
reduce(f)
↪️ PCollection<T>
Reduces all elements from the input collection to the one result using the given function.

Arguments:
· f: (T, T) -> T - combines two elements into the one result
union(pc)
↪️ PCollection<T>
Returns a new collection with all elements from the input collections.

Arguments:
· pc: PCollection<T> - other collection to combine

PCollection<T : Number>

Method Description
mean()
↪️ PCollection<Double>
Returns a collection with the mean of the input elements, or 0 if there are no elements.
sum()
↪️ PCollection<T>
Returns a collection with the sum of the input elements, or 0 if there are no elements.

PCollection<KV<K, T>>

Method Description
aggregateByKey(zeroseqcomb)
↪️ PCollection<KV<K, U>>
Combines values from the input key-value pairs using given functions and the beginning value.

Arguments:
· zero: U - a beginning value to aggregate
· seq: (U, V) -> U - adds the current value to result
· comb: (U, U) -> U - combines two results into the one
approximateQuantilesByKey(num)
↪️ PCollection<KV<K, List<V>>>
Returns a collection with lists of the approximate N-tiles of the values for each key of the input key-value pairs.

Arguments:
· num: Int - a size of quantiles list
combineByKey(ccmvmc)
↪️ PCollection<KV<K, U>>
Combines values from the input key-value pairs using given functions.

Arguments:
· cc: (V) -> U - creates a new combiner from the value
· mv: (U, V) -> U - adds the current value to the combiner
· mc: (U, U) -> U - merges two combiners into the one
countByKey()
↪️ PCollection<KV<K, Long>>
Returns a collection with counts of values for each key in the input key-value pairs.
foldByKey(zerof)
↪️ PCollection<KV<K, V>>
Folds all values for each key from the input key-value pairs to the one result using the given function and the beginning value.

Arguments:
· zero: V - a beginning value to aggregate
· f: (V, V) -> V - combines two values into the one result
groupByKey()
↪️ PCollection<KV<K, Iterable<V>>>
Returns a keyed collection representing a map from each distinct key from the elements of the input collection to an Iterable over all the values associated with that key.
groupToBatches(size)
↪️ PCollection<KV<K, Iterable<V>>>
Combines the input key-value pairs to a desired batch size for each key.
latestByKey()
↪️ PCollection<KV<K, V>>
Returns a collection with the latest values for each key according to its event time.
maxByKey()
↪️ PCollection<KV<K, V>>
Returns a collection with the maximum value for each key according to the natural ordering of V.
minByKey()
↪️ PCollection<KV<K, V>>
Returns a collection with the minimum value for each key according to the natural ordering of V.
reduceByKey(f)
↪️ PCollection<KV<K, V>>
Reduces all values for each key from the input key-value pairs to the one result using the given function.

Arguments:
· f: (V, V) -> V - combines two values into the one result

PCollection<KV<K, V : Number>>

Method Description
meanByKey()
↪️ PCollection<KV<K, Double>>
Returns a collection with the means of the input values for each key.
sumByKey()
↪️ PCollection<KV<K, V>>
Returns a collection with the sums of the input values for each key.

Reifying

Reifying functions for converting between the explicit and implicit forms of various Beam values.

PCollection<T>

Method Description
asTimestamped()
↪️ PCollection<TimestampedValue<T>>
Returns a collection with all elements of the input wrapped in values with timestamps.
asWindowed()
↪️ PCollection<ValueInSingleWindow<T>>
Returns a collection with all elements of the input enriched with additional information about windowing.

PCollection<KV<K, V>>

Method Description
asTimestampedValues()
↪️ PCollection<KV<K, TimestampedValue<T>>>
Returns a collection with all values of the input key-value pairs wrapped in values with timestamps.
asWindowedValues()
↪️ PCollection<KV<K, ValueInSingleWindow<T>>>
Returns a collection with all values of the input key-value pairs enriched with additional information about windowing.