Apache Spark: setting spark.eventLog.enabled and spark.eventLog.dir at submit or Spark start
I would like to set spark.eventLog.enabled and spark.eventLog.dir at the spark-submit or start-all level, so that enabling it does not require anything in the Scala/Java/Python code. I have tried various things with no success:
Setting spark-defaults.conf as:

spark.eventLog.enabled true
spark.eventLog.dir hdfs://namenode:8021/directory

or:

spark.eventLog.enabled true
spark.eventLog.dir file:///some/where

Running spark-submit as:
spark-submit --conf "spark.eventLog.enabled=true" --conf "spark.eventLog.dir=file:///tmp/test" --master spark://server:7077 examples/src/main/python/pi.py

Starting the Spark environment with the following variables:
SPARK_DAEMON_JAVA_OPTS="-Dspark.eventLog.enabled=true -Dspark.history.fs.logDirectory=$sparkHistoryDir -Dspark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider -Dspark.history.fs.cleaner.enabled=true -Dspark.history.fs.cleaner.interval=2d"

and, for overkill:
SPARK_HISTORY_OPTS="-Dspark.eventLog.enabled=true -Dspark.history.fs.logDirectory=$sparkHistoryDir -Dspark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider -Dspark.history.fs.cleaner.enabled=true -Dspark.history.fs.cleaner.interval=2d"

Where and how must these things be set to get history on arbitrary jobs?
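(For context: these variables are normally picked up from conf/spark-env.sh before the standalone daemons are launched. A minimal sketch, assuming a standard standalone install and that $sparkHistoryDir points at an existing, writable HDFS path:

# conf/spark-env.sh -- sourced by the sbin/ scripts; $sparkHistoryDir is an assumed, pre-created HDFS path
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=$sparkHistoryDir"

# restart the daemons so the options take effect
./sbin/stop-all.sh
./sbin/start-all.sh
./sbin/start-history-server.sh

This alone only configures the history server; the event logging itself still has to be enabled per application or via spark-defaults.conf.)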
I solved the problem, although strangely I had tried this before... All the same, it now seems to be a stable solution:
Create a directory in HDFS for logging, say /eventLogging:

hdfs dfs -mkdir /eventLogging

Then spark-shell or spark-submit (or whatever) can be run with the following options:
--conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://<hdfsNameNodeAddress>:8020/eventLogging

such as:

spark-shell --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://<hdfsNameNodeAddress>:8020/eventLogging
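A quick way to confirm it is working is to list the directory after an application finishes, and to point the history server at the same path so the finished jobs show up in its UI. A sketch, assuming the same /eventLogging path and a NameNode on port 8020:

# event log files should appear here, one per application ID
hdfs dfs -ls /eventLogging

# have the history server read the same location (e.g. via SPARK_HISTORY_OPTS or spark-defaults.conf):
#   -Dspark.history.fs.logDirectory=hdfs://<hdfsNameNodeAddress>:8020/eventLogging
./sbin/start-history-server.sh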