
Input and output basics

Table of contents

  1. Readers
  2. Writers

Readers

A reader is a general-purpose element for getting data from any source. The Kio context class has a read method that creates an abstract reader instance. The methods this reader offers for getting data depend on which connectors are attached through the dependencies list (see Connectors).

The core module contains built-in methods to read data from text files:

kio.read().text(
    "/path/to/input.txt",
    compression = Compression.GZIP
)

This method returns a PCollection of Strings, one for each line of the input text file. The compression argument is optional and defaults to Compression.AUTO. The other possible values are listed in the Compression documentation.
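To make the "one String per line" result concrete, here is a minimal sketch in plain Kotlin that uses only the JDK, not Kio. The helper name readGzipLines is hypothetical, and the sketch says nothing about how Kio or the runner actually reads files. It only shows what the elements of the resulting collection would contain for a small GZIP-compressed input.

```kotlin
import java.io.File
import java.util.zip.GZIPInputStream
import java.util.zip.GZIPOutputStream

// Plain-JDK illustration, not Kio code: read a gzip-compressed text file
// and return one String per line of the decompressed text.
fun readGzipLines(file: File): List<String> =
    GZIPInputStream(file.inputStream()).bufferedReader().use { it.readLines() }

fun main() {
    // Create a small gzip-compressed sample file with two lines.
    val file = File.createTempFile("input", ".txt.gz")
    file.deleteOnExit()
    GZIPOutputStream(file.outputStream()).bufferedWriter().use {
        it.write("first line\nsecond line\n")
    }

    println(readGzipLines(file)) // [first line, second line]
}
```

Unlike this local list, a PCollection is processed by the runner, so the order of its elements should not be relied on.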

Writers

Writers are responsible for sending results to various sinks. As with readers, the set of available sink methods depends on which connectors are attached through the dependencies list (see Connectors).

Every instance of the PCollection class in Kio has a write method that creates an abstract writer instance. The core module contains built-in methods to write data to text files:

collection.write().text(
    path = "/path/to/output",
    numShards = 5,
    suffix = ".txt",
    compression = Compression.GZIP
)

The path argument is required; all other arguments are optional:

  • numShards: Int - number of output files (default: 0, which leaves the decision to the runner)
  • suffix: String - name ending for the output files (default: .txt)
  • compression: Compression - method to compress output files (default: Compression.UNCOMPRESSED)
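
As a sketch of how the reader and writer fit together, the fragment below chains the two calls shown on this page and relies on the defaults for every optional argument. It assumes kio is an already created Kio context. Creating the context and running the pipeline are not covered in this section, so they are left out rather than guessed.

```kotlin
// Assumes `kio` is an existing Kio context (its creation is not shown here).
kio.read()
    .text("/path/to/input.txt")      // compression defaults to Compression.AUTO
    .write()
    .text(path = "/path/to/output")  // numShards = 0, suffix = ".txt", Compression.UNCOMPRESSED
```

Passing only path keeps the call short and leaves the number of output files to the runner.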