hadoop - Install Spark on a YARN cluster
I'm looking for a guide on how to install Spark on an existing virtual YARN cluster.
I have a YARN cluster consisting of 2 nodes. I ran a MapReduce job on it and it worked perfectly; I looked at the results in the logs and everything is working fine.
Now I need to add the Spark installation commands and configuration files to my Vagrantfile. I can't find a guide for this; could you give me a link?
I used this guide to set up the YARN cluster.
Thanks in advance!
I don't know Vagrant, but I have installed Spark on top of Hadoop 2.6 (in the guide it is referred to as post-YARN), and I hope this helps.
Installing Spark on an existing Hadoop setup is easy; you only need to install it on one machine. To do that, download the package pre-built for your Hadoop version from the official website (I guess you could also use the "without Hadoop" version, but then you need to point it to the location of the Hadoop binaries on your system). Then decompress it:
tar -xvf spark-2.0.0-bin-hadoop2.x.tgz -C /opt
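If you do pick the "without Hadoop" package, here is a minimal sketch of pointing it at the Hadoop binaries already on the system (this line goes into the $SPARK_HOME/conf/spark-env.sh file described below; SPARK_DIST_CLASSPATH is the variable Spark's "Hadoop free" build documentation uses for this):
# tell a "Hadoop free" Spark build where the existing Hadoop jars live
export SPARK_DIST_CLASSPATH=$(hadoop classpath)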
Now you need to set some environment variables. First, in ~/.bashrc (or ~/.zshrc) you can set SPARK_HOME and add it to your PATH if you want:
export SPARK_HOME=/opt/spark-2.0.0-bin-hadoop2.x
export PATH=$PATH:$SPARK_HOME/bin
Also, for the changes to take effect you can run:
source ~/.bashrc
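To check that everything is picked up (just a quick sanity check, assuming the paths above):
# the Spark binaries should now be on your PATH
which spark-shell
spark-submit --version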
Second, you need to point Spark to your Hadoop configuration directories. To do that, set these two environment variables in $SPARK_HOME/conf/spark-env.sh:
export HADOOP_CONF_DIR=[your-hadoop-conf-dir, usually $HADOOP_PREFIX/etc/hadoop]
export YARN_CONF_DIR=[your-yarn-conf-dir, usually the same as the last variable]
If this file doesn't exist, you can copy the contents of $SPARK_HOME/conf/spark-env.sh.template and start from there.
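For example, assuming Hadoop is installed under /usr/local/hadoop (a placeholder; use whatever location your YARN guide used):
# create spark-env.sh from the template
cp $SPARK_HOME/conf/spark-env.sh.template $SPARK_HOME/conf/spark-env.sh
# then add these two lines to $SPARK_HOME/conf/spark-env.sh
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export YARN_CONF_DIR=/usr/local/hadoop/etc/hadoop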
Now to start the shell in YARN mode you can run:
spark-shell --master yarn --deploy-mode client
(You can't run the shell in cluster deploy mode.)
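If you want a quick end-to-end check as well, you can submit the bundled SparkPi example to YARN (a sketch; the exact name of the examples jar depends on the package you downloaded, so check $SPARK_HOME/examples/jars/ first):
# run the SparkPi example on YARN as a smoke test
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn --deploy-mode client \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.0.0.jar 10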
----------- Update
I forgot to mention that with this configuration you can also submit jobs in cluster mode (thanks @juliancienfuegos):
spark-submit --master yarn --deploy-mode cluster project-spark.py
This way you can't see the output in the terminal, and the command exits as soon as the job is submitted (not completed).
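To see the output of a cluster-mode job afterwards, you can fetch the logs through YARN itself (assuming log aggregation is enabled in your yarn-site.xml; the application id below is just a placeholder):
# find your application id, then pull its aggregated logs
yarn application -list
yarn logs -applicationId application_1234567890123_0001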
You can use --deploy-mode client to see the output right there in the terminal while testing, but keep in mind that the job gets canceled if the command is interrupted (e.g. you press Ctrl+C, or your session ends).