Wednesday, 3 April 2019

Spark tuning parameters

Spark parameters

Dynamic Executor Allocation
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=2m
spark.dynamicAllocation.minExecutors=1
spark.dynamicAllocation.maxExecutors=2000
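
One way to apply the settings above is to pass them to spark-submit as --conf flags. The sketch below only builds the command line and does not launch anything. The helper name to_submit_args and the application file app.py are hypothetical. On YARN with Spark 2.x, dynamic allocation normally also expects the external shuffle service to be enabled (spark.shuffle.service.enabled=true), so check that for your cluster.

```python
import shlex

# Values copied from the list above; Spark reads every value as a string.
DYNAMIC_ALLOCATION = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.executorIdleTimeout": "2m",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "2000",
}


def to_submit_args(conf):
    """Turn a {key: value} mapping into a flat list of --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # app.py is a placeholder for the real application.
    cmd = ["spark-submit", *to_submit_args(DYNAMIC_ALLOCATION), "app.py"]
    print(shlex.join(cmd))
```

The same mapping can be reused for the other groups of settings in this post by swapping in a different dictionary.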

Better fetch failure handling
spark.max.fetch.failures.per.stage = 10

Scaling Spark driver
spark.rpc.io.serverThreads = 64

Tuning memory configurations
  1. Enable off-heap memory
  spark.memory.offHeap.enabled = true
  spark.memory.offHeap.size = 3g
  spark.executor.memory = 3g
  spark.yarn.executor.memoryOverhead = 0.1 * (spark.executor.memory + spark.memory.offHeap.size)
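
The overhead formula above is not something Spark evaluates for you; the value has to be computed and set as a number. A minimal sketch of the arithmetic for the 3g + 3g case follows. The helper names are hypothetical, and it assumes the result is given in MiB, which is how Spark 2.x reads a plain number for spark.yarn.executor.memoryOverhead.

```python
def to_mib(size):
    """Parse a size such as '3g' or '512m' into MiB (only g and m suffixes are handled)."""
    units = {"g": 1024, "m": 1}
    return int(size[:-1]) * units[size[-1].lower()]


def memory_overhead_mib(executor_memory, off_heap_size, factor=0.1):
    """Overhead = factor * (executor memory + off-heap size), rounded down to whole MiB."""
    return int(factor * (to_mib(executor_memory) + to_mib(off_heap_size)))


# 0.1 * (3072 MiB + 3072 MiB) = 614.4 MiB, rounded down
print(memory_overhead_mib("3g", "3g"))  # 614
```

With the values above this gives spark.yarn.executor.memoryOverhead = 614.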

  2. Garbage collection tuning
  spark.executor.extraJavaOptions = -XX:ParallelGCThreads=4 -XX:+UseParallelGC

Eliminate Disk I/O bottleneck
1. spark.shuffle.file.buffer=1m
   spark.unsafe.sorter.spill.reader.buffer.size=1m
2. spark.file.transferTo=false
   spark.shuffle.unsafe.file.output.buffer=5m
3. spark.io.compression.lz4.blockSize=512k
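
Spark does not reject configuration keys it does not recognise, so a misspelled key is silently ignored and the default stays in effect. The sketch below is one possible guard against that: it compares the keys you plan to set against a reference list and reports near misses. The reference list here is only the keys from this section, and the function name and the 0.8 similarity cutoff are assumptions to adjust.

```python
import difflib

# Reference list: the keys from this section. Extend it with whatever you use.
KNOWN_KEYS = [
    "spark.shuffle.file.buffer",
    "spark.unsafe.sorter.spill.reader.buffer.size",
    "spark.file.transferTo",
    "spark.shuffle.unsafe.file.output.buffer",
    "spark.io.compression.lz4.blockSize",
]


def suspicious_keys(conf):
    """Return {unknown_key: closest_known_key} for keys that look like typos."""
    result = {}
    for key in conf:
        if key in KNOWN_KEYS:
            continue
        # Keep only the single best match above the similarity cutoff.
        match = difflib.get_close_matches(key, KNOWN_KEYS, n=1, cutoff=0.8)
        if match:
            result[key] = match[0]
    return result


if __name__ == "__main__":
    planned = {"spark.shuffle.file.bufer": "1m", "spark.file.transferTo": "false"}
    for bad, good in suspicious_keys(planned).items():
        print(f"{bad}: did you mean {good}?")
```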

Cache index files on Shuffle Server
spark.shuffle.service.index.cache.entries=2048

Scaling External Shuffle Service
Tune shuffle service worker threads and backlog
spark.shuffle.io.serverThreads=128
spark.shuffle.io.backLog=8192

Configurable shuffle registration timeout and retry
spark.shuffle.registration.timeout = 2m
spark.shuffle.registration.maxAttempts = 5
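
Raising both values also raises how long an executor can wait before giving up on registering with the shuffle service. A rough upper bound can be sketched as timeout times attempts, assuming every attempt runs for the full timeout and ignoring any pause between attempts. The helper names are hypothetical.

```python
def to_seconds(duration):
    """Parse a duration such as '2m' or '30s' into seconds (only s, m, h suffixes are handled)."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(duration[:-1]) * units[duration[-1].lower()]


def max_registration_wait_seconds(timeout, max_attempts):
    """Rough upper bound: every attempt runs for the full timeout before failing."""
    return to_seconds(timeout) * max_attempts


# 2 minutes per attempt, 5 attempts
print(max_registration_wait_seconds("2m", 5))  # 600
```

So with the settings above an executor may spend about 10 minutes on registration in the worst case.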