Apache Spark: setting spark.eventLog.enabled and spark.eventLog.dir at submit or Spark start


I would like to set spark.eventLog.enabled and spark.eventLog.dir at the spark-submit or start-all level, without having to enable them in Scala/Java/Python code. I have tried various things with no success:

Setting spark-defaults.conf as:

spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://namenode:8021/directory

or

spark.eventLog.enabled           true
spark.eventLog.dir               file:///some/where
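Two things easily go wrong in a file like this: several properties end up on one line, and the keys lose their camelCase (Spark property names are case-sensitive, so spark.eventlog.enabled is not the same key as spark.eventLog.enabled). The sketch below is a hypothetical lint helper, not part of Spark. It assumes the simple "key, whitespace, value" layout of Spark's spark-defaults.conf.template and ignores other Java-properties syntax such as key=value or line continuations.

```python
# Hypothetical lint helper for spark-defaults.conf (not part of Spark).
# Assumes one "key<whitespace>value" pair per line; '#' starts a comment line.

EXPECTED_KEYS = {"spark.eventLog.enabled", "spark.eventLog.dir"}


def parse_spark_defaults(text: str) -> dict[str, str]:
    """Parse 'key value' lines into a dict, skipping blanks and comments."""
    props: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first run of whitespace: key first, the rest is the value.
        parts = line.split(None, 1)
        props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return props


def lint(props: dict[str, str]) -> list[str]:
    """Report event-log keys that are missing, miscased, or have extra tokens."""
    problems = []
    lowered = {k.lower(): k for k in props}
    for key in sorted(EXPECTED_KEYS):
        actual = key if key in props else lowered.get(key.lower())
        if actual is None:
            problems.append(f"{key}: missing")
            continue
        if actual != key:
            problems.append(f"{actual}: wrong case, expected {key}")
        # A value containing whitespace usually means two properties
        # were flattened onto one line.
        if len(props[actual].split()) > 1:
            problems.append(f"{actual}: value has extra tokens")
    return problems


if __name__ == "__main__":
    broken = "spark.eventlog.enabled  true spark.eventlog.dir  file:///some/where\n"
    fixed = (
        "spark.eventLog.enabled  true\n"
        "spark.eventLog.dir      file:///some/where\n"
    )
    for name, text in [("broken", broken), ("fixed", fixed)]:
        print(name, lint(parse_spark_defaults(text)))
```

For the flattened, lowercase version it reports the missing dir key, the wrong case, and the extra tokens; for the corrected file it reports nothing.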

Running spark-submit as:

spark-submit --conf "spark.eventLog.enabled=true" --conf "spark.eventLog.dir=file:///tmp/test" --master spark://server:7077 examples/src/main/python/pi.py

Starting Spark with these environment variables:

SPARK_DAEMON_JAVA_OPTS="-Dspark.eventLog.enabled=true -Dspark.history.fs.logDirectory=$sparkhistorydir -Dspark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider -Dspark.history.fs.cleaner.enabled=true -Dspark.history.fs.cleaner.interval=2d"

and, as overkill:

SPARK_HISTORY_OPTS="-Dspark.eventLog.enabled=true -Dspark.history.fs.logDirectory=$sparkhistorydir -Dspark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider -Dspark.history.fs.cleaner.enabled=true -Dspark.history.fs.cleaner.interval=2d"
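As far as I know, SPARK_DAEMON_JAVA_OPTS and SPARK_HISTORY_OPTS are read by the standalone daemons and the history server, not by each application's driver, which is where spark.eventLog.* takes effect; the spark.history.* keys configure only the history server. The hypothetical helper below (not a Spark API) renders a dict of properties into the -Dkey=value string for SPARK_HISTORY_OPTS, keeping only spark.history.* keys. The NameNode address in the example is a placeholder.

```python
# Hypothetical helper: build a SPARK_HISTORY_OPTS value from a dict of
# Spark properties, keeping only the history server's spark.history.* keys.


def history_opts(props: dict[str, str]) -> str:
    """Render spark.history.* properties as space-separated -Dkey=value flags."""
    flags = []
    for key, value in props.items():
        if not key.startswith("spark.history."):
            continue  # e.g. spark.eventLog.* is set on the application instead
        # Assumption: the variable is word-split when expanded, so a value
        # containing whitespace would be broken into separate arguments.
        if any(ch.isspace() for ch in value):
            raise ValueError(f"{key}: value must not contain whitespace")
        flags.append(f"-D{key}={value}")
    return " ".join(flags)


if __name__ == "__main__":
    props = {
        "spark.eventLog.enabled": "true",  # dropped: not a history-server key
        "spark.history.fs.logDirectory": "hdfs://namenode:8020/eventlogging",
        "spark.history.provider": "org.apache.spark.deploy.history.FsHistoryProvider",
        "spark.history.fs.cleaner.enabled": "true",
        "spark.history.fs.cleaner.interval": "2d",
    }
    # A line suitable for conf/spark-env.sh
    print(f'export SPARK_HISTORY_OPTS="{history_opts(props)}"')
```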

Where and how must these settings be configured to enable history for arbitrary jobs?

I solved the problem, although strangely I had tried the same thing before. It now seems to be a stable solution:

Create a directory in HDFS for the logs, /eventlogging:

hdfs dfs -mkdir /eventlogging 
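This step matters because, as far as I know, Spark does not create the spark.eventLog.dir base directory itself and fails at SparkContext startup if it is missing. A minimal sketch of a pre-flight step, assuming a hypothetical helper name ensure_event_log_dir: local file:// directories are created with os.makedirs, and hdfs:// directories by shelling out to the hdfs CLI (assumed to be on PATH) with -mkdir -p so an existing directory is not an error.

```python
import os
import subprocess
from urllib.parse import urlparse


def ensure_event_log_dir(uri: str) -> None:
    """Create the event-log base directory if it does not exist yet."""
    parsed = urlparse(uri)
    if parsed.scheme in ("", "file"):
        # Local filesystem: file:///tmp/test -> /tmp/test
        os.makedirs(parsed.path, exist_ok=True)
    elif parsed.scheme == "hdfs":
        # Same effect as `hdfs dfs -mkdir /eventlogging`; -p makes it
        # idempotent. Requires the hdfs command-line client.
        subprocess.run(["hdfs", "dfs", "-mkdir", "-p", uri], check=True)
    else:
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")


if __name__ == "__main__":
    ensure_event_log_dir("file:///tmp/test")
    print(os.path.isdir("/tmp/test"))
```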

Then spark-shell or spark-submit (or whatever) can be run with the following options:

--conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://<hdfsnamenodeaddress>:8020/eventlogging

For example:

spark-shell --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://<hdfsnamenodeaddress>:8020/eventlogging
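If these flags are added to many jobs from a script, a small wrapper keeps the keys spelled consistently. This is a hypothetical helper (not a Spark API) that only builds the argument list; the NameNode host namenode is a placeholder, and the master URL and pi.py path are taken from the question.

```python
import shlex


def spark_cmd(tool: str, conf: dict[str, str], *args: str) -> list[str]:
    """Build an argv list such as ['spark-shell', '--conf', 'k=v', ...]."""
    argv = [tool]
    for key, value in conf.items():
        argv += ["--conf", f"{key}={value}"]
    argv += list(args)
    return argv


if __name__ == "__main__":
    event_log = {
        "spark.eventLog.enabled": "true",
        # Placeholder host: substitute your NameNode address and port.
        "spark.eventLog.dir": "hdfs://namenode:8020/eventlogging",
    }
    cmd = spark_cmd("spark-submit", event_log,
                    "--master", "spark://server:7077",
                    "examples/src/main/python/pi.py")
    # Print the shell form; pass cmd to subprocess.run(cmd, check=True) to launch.
    print(shlex.join(cmd))
```

Building a list rather than a single string avoids shell quoting issues when the command is launched with subprocess.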
