Apache Spark: setting spark.eventLog.enabled and spark.eventLog.dir at submit or Spark start
I would like to set spark.eventLog.enabled and spark.eventLog.dir at the spark-submit or start-all level, so that enabling them is not required in the Scala/Java/Python code. I have tried various things with no success:
Setting spark-defaults.conf as:
spark.eventLog.enabled true
spark.eventLog.dir hdfs://namenode:8021/directory
or
spark.eventLog.enabled true
spark.eventLog.dir file:///some/where
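For reference, spark-defaults.conf is read from $SPARK_HOME/conf; a minimal sketch of creating it from the bundled template (paths assume a standard Spark distribution layout):
cd $SPARK_HOME/conf
cp spark-defaults.conf.template spark-defaults.conf
# then add the spark.eventLog.* lines shown above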
Running spark-submit as:
spark-submit --conf "spark.eventLog.enabled=true" --conf "spark.eventLog.dir=file:///tmp/test" --master spark://server:7077 examples/src/main/python/pi.py
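As a sanity check, spark-submit --verbose prints the Spark properties it has resolved at launch, which shows whether the --conf values are actually being picked up; for example:
spark-submit --verbose --conf "spark.eventLog.enabled=true" --conf "spark.eventLog.dir=file:///tmp/test" --master spark://server:7077 examples/src/main/python/pi.py 2>&1 | grep -i eventlog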
Starting Spark with these environment variables:
SPARK_DAEMON_JAVA_OPTS="-Dspark.eventLog.enabled=true -Dspark.history.fs.logDirectory=$sparkHistoryDir -Dspark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider -Dspark.history.fs.cleaner.enabled=true -Dspark.history.fs.cleaner.interval=2d"
and, as overkill:
SPARK_HISTORY_OPTS="-Dspark.eventLog.enabled=true -Dspark.history.fs.logDirectory=$sparkHistoryDir -Dspark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider -Dspark.history.fs.cleaner.enabled=true -Dspark.history.fs.cleaner.interval=2d"
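These variables would normally go in conf/spark-env.sh so they are picked up when the daemons start; a minimal sketch (the log directory value is an assumption, reusing the HDFS path from above):
# $SPARK_HOME/conf/spark-env.sh
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://namenode:8021/directory -Dspark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider -Dspark.history.fs.cleaner.enabled=true -Dspark.history.fs.cleaner.interval=2d"
Note that SPARK_DAEMON_JAVA_OPTS and SPARK_HISTORY_OPTS only configure the Spark daemon JVMs (master/worker and History Server), not submitted applications, so spark.eventLog.enabled set there does not take effect for jobs.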
Where and how must these things be set to get history on arbitrary jobs?
I solved the problem, yet strangely I had tried this before... it is the same as before, but now it seems to be a stable solution:
Create a directory in HDFS for logging, say /eventLogging:
hdfs dfs -mkdir /eventLogging
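If jobs are submitted by a different user than the one that created the directory, the directory also needs to be writable by that user; a hedged example using standard hdfs dfs commands:
hdfs dfs -chown <submittingUser> /eventLogging
# or, more loosely: hdfs dfs -chmod 777 /eventLogging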
Then spark-shell or spark-submit (or whatever) can be run with the following options:
--conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://<hdfsNameNodeAddress>:8020/eventLogging
such as:
spark-shell --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://<hdfsNameNodeAddress>:8020/eventLogging
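To browse those logs in the History Server UI, the History Server has to point at the same directory; a minimal sketch, assuming the same <hdfsNameNodeAddress> placeholder and a default install:
# verify that event log files are being written
hdfs dfs -ls hdfs://<hdfsNameNodeAddress>:8020/eventLogging
# point the History Server at the same directory and start it
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://<hdfsNameNodeAddress>:8020/eventLogging"
$SPARK_HOME/sbin/start-history-server.sh
The UI is then available on port 18080 of the History Server host by default.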