Configuration Properties

The following table lists the properties that you can use to fine-tune Spark Structured Streaming applications.

You can set them on a SparkSession at instantiation time using the config method, e.g.

import org.apache.spark.sql.SparkSession

// Enable Dropwizard metrics reporting for streaming queries of this session
val spark: SparkSession = SparkSession.builder
  .master("local[*]")
  .appName("My Spark Application")
  .config("spark.sql.streaming.metricsEnabled", true)
  .getOrCreate()
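
The same properties can also be inspected (and, unless a property is static, changed) at runtime through the session's RuntimeConfig. A minimal sketch, assuming the spark session created above:

// Read a property back; the second argument is the value returned when the property is unset
spark.conf.get("spark.sql.streaming.metricsEnabled", "false")

// Non-static properties can also be changed at runtime
spark.conf.set("spark.sql.streaming.numRecentProgressUpdates", 50L)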
Table 1. Structured Streaming's Properties (in alphabetical order)

spark.sql.streaming.checkpointLocation

Default: (empty)

Default checkpoint directory for storing checkpoint data for streaming queries (see the example after this table).

spark.sql.streaming.metricsEnabled

Default: false

Flag that controls whether Dropwizard/CodaHale metrics are reported for active streaming queries.

spark.sql.streaming.minBatchesToRetain

Default: 100

(internal) The minimum number of batches that must be retained and made recoverable.

Used…FIXME

spark.sql.streaming.numRecentProgressUpdates

Default: 100

Number of progress updates to retain for a streaming query.

spark.sql.streaming.pollingDelay

Default: 10 (millis)

(internal) Time delay before StreamExecution polls for new data when no data was available in a batch.

spark.sql.streaming.stateStore.maintenanceInterval

Default: 60s

The initial delay and how often to execute StateStore's maintenance task.

spark.sql.streaming.stateStore.providerClass

Default: org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider

(internal) The fully-qualified class name of the provider used to manage state data in stateful streaming queries. The class must be a subclass of StateStoreProvider and must have a zero-argument constructor (see the example after this table).

spark.sql.streaming.unsupportedOperationCheck

Default: true

(internal) When enabled (i.e. true), StreamingQueryManager makes sure that the logical plan of a streaming query uses supported operations only.
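
The following sketch illustrates spark.sql.streaming.checkpointLocation and spark.sql.streaming.numRecentProgressUpdates together. The checkpoint directory /tmp/checkpoints, the query name rates, and the rate/console formats are arbitrary choices for this example, not requirements of the properties.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .master("local[*]")
  .appName("Streaming Properties Demo")
  // Queries without an explicit checkpointLocation option use this directory
  .config("spark.sql.streaming.checkpointLocation", "/tmp/checkpoints")
  // Keep at most 10 StreamingQueryProgress objects per query
  .config("spark.sql.streaming.numRecentProgressUpdates", 10L)
  .getOrCreate()

// Start a simple streaming query backed by the built-in rate source
val query = spark.readStream
  .format("rate")
  .load()
  .writeStream
  .format("console")
  .queryName("rates")
  .start()

// recentProgress holds up to spark.sql.streaming.numRecentProgressUpdates entries
query.recentProgress.foreach(progress => println(progress.prettyJson))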
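
spark.sql.streaming.stateStore.providerClass can be set the same way. The sketch below simply pins it to the built-in default to show the shape of the setting; any other value must name a StateStoreProvider subclass with a zero-argument constructor.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .master("local[*]")
  .appName("State Store Provider Demo")
  // Must be a StateStoreProvider subclass with a zero-arg constructor;
  // here we just spell out the built-in default explicitly
  .config(
    "spark.sql.streaming.stateStore.providerClass",
    "org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider")
  .getOrCreate()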