Input and output basics
Readers
A reader is a general-purpose element for getting data from any source. The Kio context class has a read method that creates an abstract reader instance. The reader instance provides the methods for getting data; which methods are available depends on the connectors attached via the dependencies list (see Connectors).
The core module contains built-in methods to read data from text files:
kio.read().text(
    "/path/to/input.txt",
    compression = Compression.GZIP
)
This method returns a PCollection of Strings, each corresponding to one line of the input text file. The compression argument is optional and defaults to Compression.AUTO. The other possible values are listed here.
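To make the "one String per line" result concrete, the following sketch reads a gzip-compressed text file with plain JDK classes. It is not how Kio implements the reader: the helpers readGzipLines and writeGzipText are hypothetical names introduced only for this illustration, and the result is an ordinary List rather than a PCollection.

```kotlin
import java.io.File
import java.util.zip.GZIPInputStream
import java.util.zip.GZIPOutputStream

// Hypothetical helper, not part of Kio: decompresses a gzip text file
// and returns one String per line, mirroring the shape of the reader's output.
fun readGzipLines(file: File): List<String> =
    GZIPInputStream(file.inputStream())
        .bufferedReader(Charsets.UTF_8)
        .use { it.readLines() }

// Hypothetical helper used only to create sample input for the sketch.
fun writeGzipText(file: File, text: String) {
    GZIPOutputStream(file.outputStream())
        .bufferedWriter(Charsets.UTF_8)
        .use { it.write(text) }
}

fun main() {
    val file = File.createTempFile("input", ".txt.gz")
    file.deleteOnExit()
    writeGzipText(file, "first line\nsecond line\n")
    println(readGzipLines(file)) // prints [first line, second line]
}
```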
Writers
Writers send results to various sinks. As with readers, the available sink methods depend on the connectors attached via the dependencies list (see Connectors).
All instances of the PCollection class in Kio have a write method that creates an abstract writer instance, and the core module contains built-in methods to write data to text files:
collection.write().text(
    path = "/path/to/output",
    numShards = 5,
    suffix = ".txt",
    compression = Compression.GZIP
)
The path argument is required; all other arguments are optional:
numShards: Int - number of output files (default: 0, meaning the runner decides)
suffix: String - name ending for the output files (default: .txt)
compression: Compression - method used to compress the output files (default: Compression.UNCOMPRESSED)
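As a rough illustration of how path, numShards, and suffix might combine into file names, the sketch below builds names using Apache Beam's default "-SSSSS-of-NNNNN" shard template. This rests on two assumptions the text above does not confirm: that Kio's text writer delegates to Beam's default naming, and that the output is uncompressed (a compression setting may append its own extension). The helper shardFileNames is hypothetical and not part of Kio.

```kotlin
// Hypothetical helper, not part of Kio: lists the file names a sharded write
// would produce, ASSUMING Beam's default "-SSSSS-of-NNNNN" shard template.
fun shardFileNames(path: String, numShards: Int, suffix: String = ".txt"): List<String> {
    // numShards = 0 leaves the count to the runner, so names cannot be predicted.
    require(numShards > 0) { "numShards must be positive to predict file names" }
    return (0 until numShards).map { shard ->
        "%s-%05d-of-%05d%s".format(path, shard, numShards, suffix)
    }
}

fun main() {
    // Prints /path/to/output-00000-of-00005.txt through /path/to/output-00004-of-00005.txt
    shardFileNames("/path/to/output", numShards = 5).forEach(::println)
}
```

With numShards left at its default of 0 the number of files is chosen by the runner, which is why the sketch rejects that value instead of guessing.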