Structured Streaming Programming Guide

Miscellaneous Notes

  • Several configurations cannot be modified after a query has run. To change them, discard the checkpoint and start a new query. These configurations include:
    • spark.sql.shuffle.partitions
      • This is due to the physical partitioning of state: state is partitioned by applying a hash function to the key, so the number of state partitions must remain unchanged.
      • If you want to run fewer tasks for stateful operations, coalesce helps avoid unnecessary repartitioning.
        • After coalesce, the number of (reduced) tasks will be kept unless another shuffle happens.
    • spark.sql.streaming.stateStore.providerClass: To read the previous state of the query properly, the class of state store provider should be unchanged.
    • spark.sql.streaming.multipleWatermarkPolicy: Modifying this would lead to inconsistent watermark values when the query contains multiple watermarks, so the policy should be unchanged.
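
To see why the number of state partitions has to stay fixed, the following minimal sketch hashes keys into partitions and counts how many keys would land in a different partition if the partition count changed between runs. It is an illustration only: `partition_for` and `moved_keys` are hypothetical helper names, and `zlib.crc32` is a stand-in hash, not the hash function Spark uses internally.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index by hashing it (stand-in hash, not Spark's)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_keys(keys, old_n, new_n):
    """Return the keys whose partition index differs between the two partition counts."""
    return [k for k in keys if partition_for(k, old_n) != partition_for(k, new_n)]


# Hypothetical keys; state for each key is stored in the partition it hashes to.
keys = [f"user-{i}" for i in range(1000)]

# Simulate restarting a query with a different partition count.
moved = moved_keys(keys, 200, 100)
print(f"{len(moved)} of {len(keys)} keys map to a different partition")
```

Any key that moves would be looked up in a partition that does not hold its previously checkpointed state, which is why the setting is pinned to the checkpoint.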

Related Resources

Further Reading

Talks

  • Spark Summit Europe 2017
  • Spark Summit 2016

Migration Guide

The migration guide is now archived on this page.