Spark 1.4 image for Google Cloud?
With bdutil, the latest version of the Spark tarball I can find is for Spark 1.3.1:
gs://spark-dist/spark-1.3.1-bin-hadoop2.6.tgz
There are a few new DataFrame features in Spark 1.4 that I want to use. Is there any chance a Spark 1.4 image will be made available for bdutil, or is there a workaround?
Update:
Following Angus Davis's suggestion, I downloaded spark-1.4.1-bin-hadoop2.6.tgz and pointed the deployment at it, and the deployment went well. However, I run into an error when calling sqlContext.parquetFile(). I cannot explain why this exception is even possible, since GoogleHadoopFileSystem should be a subclass of org.apache.hadoop.fs.FileSystem. I will continue to investigate this.
Caused by: java.lang.ClassCastException: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem cannot be cast to org.apache.hadoop.fs.FileSystem
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2595)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:354)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:112)
    at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:144)
    at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159)
    at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:504)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:523)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:397)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:356)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4944)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:171)
I asked a separate question about this exception here.
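For what it's worth, a ClassCastException between a class and a type it clearly extends usually means that two different classloaders each loaded their own copy of the type, and in the JVM a class's identity is its name plus the loader that defined it. A rough sketch of that mechanism in Python (loading the same module source twice under different names, standing in for two classloaders; the module and class names here are illustrative, not Spark's actual internals):

```python
import importlib.util
import os
import tempfile

# Write a tiny module to disk (stands in for the GCS connector jar).
src = "class GoogleHadoopFileSystem:\n    pass\n"
path = os.path.join(tempfile.mkdtemp(), "gcsfs.py")
with open(path, "w") as f:
    f.write(src)

def load(name):
    """Load the module under a distinct name, like a separate classloader."""
    spec = importlib.util.spec_from_file_location(name, path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod

a = load("gcsfs_a")  # e.g. the application's loader
b = load("gcsfs_b")  # e.g. an isolated metastore loader

obj = a.GoogleHadoopFileSystem()
# Same source, but two distinct class objects, so the "cast" fails.
print(isinstance(obj, b.GoogleHadoopFileSystem))
```

That would explain how GoogleHadoopFileSystem can "not be" a FileSystem at runtime even though the inheritance is correct in the source.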
Update:
The error turned out to be a Spark defect; the resolution/workaround is provided in the question linked above.
Thanks!
Haiying
If a local workaround is acceptable, you can copy spark-1.4.1-bin-hadoop2.6.tgz from an Apache mirror into a bucket you control. You can then edit extensions/spark/spark-env.sh and change SPARK_HADOOP2_TARBALL_URI='<your copy of spark 1.4.1>' (make sure the service account running the VMs has permission to read the tarball).
Note that I haven't done any testing to see whether Spark 1.4.1 works out of the box right now, so I'd be interested in hearing about your experience if you decide to give it a go.
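Concretely, the steps above might look like the following sketch. The bucket name and paths are placeholders, and the exact variable name should be double-checked against the extensions/spark/spark-env.sh shipped with your bdutil version:

```sh
# Copy the Spark 1.4.1 tarball (downloaded from an Apache mirror)
# into a GCS bucket you control.
gsutil cp spark-1.4.1-bin-hadoop2.6.tgz \
    gs://your-bucket/spark/spark-1.4.1-bin-hadoop2.6.tgz

# Then, in extensions/spark/spark-env.sh, point bdutil at your copy:
SPARK_HADOOP2_TARBALL_URI='gs://your-bucket/spark/spark-1.4.1-bin-hadoop2.6.tgz'
```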