ConsoleSink is a streaming sink that shows the DataFrame (for a batch) to the console. ConsoleSink is registered as the console format (by ConsoleSinkProvider).
Name | Default Value | Description
---|---|---
numRows | 20 | Number of rows to display
truncate | true | Truncate the data to display to 20 characters
scala> spark.version
res0: String = 2.3.0-SNAPSHOT
import org.apache.spark.sql.streaming.{OutputMode, Trigger}
import scala.concurrent.duration._
val query = spark.
readStream.
format("rate").
load.
writeStream.
format("console"). // <-- use ConsoleSink
option("truncate", false).
option("numRows", 10).
trigger(Trigger.ProcessingTime(10.seconds)).
queryName("rate-console").
start
-------------------------------------------
Batch: 0
-------------------------------------------
+---------+-----+
|timestamp|value|
+---------+-----+
+---------+-----+
addBatch(batchId: Long, data: DataFrame): Unit
Note: addBatch is a part of the Sink Contract.
Internally, addBatch records the input batchId in the lastBatchId internal property. addBatch then collects the input data DataFrame and creates a brand new DataFrame that it then shows (per the numRowsToShow and isTruncated properties).
-------------------------------------------
Batch: [batchId]
-------------------------------------------
+---------+-----+
|timestamp|value|
+---------+-----+
+---------+-----+
Note: You may see Rerun batch: instead if the input batchId is below lastBatchId (likely due to a batch failure).
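The batch-tracking behaviour described above can be modelled with a small, Spark-free sketch. ConsoleSinkSketch, its Seq[String]-based rows, and the exact separator width are illustrative assumptions, not Spark's actual ConsoleSink code:

```scala
// A minimal, Spark-free sketch of the behaviour described above: addBatch
// records the input batchId in a lastBatchId property and uses a
// "Rerun batch:" header instead of "Batch:" when the input batchId is below
// the last one recorded (e.g. after a batch failure).
// ConsoleSinkSketch and its Seq[String] rows are illustrative stand-ins;
// the real sink receives a DataFrame and shows it with numRowsToShow/isTruncated.
class ConsoleSinkSketch {
  private var lastBatchId = -1L  // no batch seen yet

  def addBatch(batchId: Long, rows: Seq[String]): String = {
    // A batch id below the last recorded one means the batch is being rerun.
    val header =
      if (batchId < lastBatchId) s"Rerun batch: $batchId"
      else s"Batch: $batchId"
    lastBatchId = batchId  // record the input batchId
    val separator = "-" * 43
    (Seq(separator, header, separator) ++ rows).mkString("\n")
  }
}

val sink = new ConsoleSinkSketch
println(sink.addBatch(0, Seq("+---------+-----+", "|timestamp|value|", "+---------+-----+")))
println(sink.addBatch(1, Seq.empty))
```

Running the sketch prints a Batch: 0 header for the first batch, a Batch: 1 header for the next, and would print a Rerun batch: header if an earlier batch id were replayed.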