hadoop - Install Spark on a YARN cluster
I am looking for a guide on how to install Spark on an existing virtual YARN cluster.
I have a YARN cluster consisting of two nodes. I ran a map-reduce job and it worked perfectly; I looked at the results in the log and everything is working fine.
Now I need to add the Spark installation commands and configuration files to my Vagrantfile. I can't find a guide for this; could someone give me a link?
I used this guide for setting up the YARN cluster.
Thanks in advance!
I don't know about Vagrant, but I have installed Spark on top of Hadoop 2.6 (in the guide it is referred to as post-YARN) and I hope this helps.
Installing Spark on an existing Hadoop cluster is very easy; you only need to install it on one machine. For that you have to download the version pre-built for your Hadoop version from its official website (I guess you can use the "without Hadoop" version, but you would need to point it to the location of the Hadoop binaries on your system).
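For example, the pre-built packages for every release are kept in the Apache archive, so something along these lines should work (double-check the exact file name on the download page for your Hadoop version; this one assumes the 2.0.0 release built for Hadoop 2.6):
wget https://archive.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.6.tgz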
Then decompress it:
tar -xvf spark-2.0.0-bin-hadoop2.x.tgz -C /opt
Now you need to set some environment variables. First, in your ~/.bashrc (or ~/.zshrc) you can set SPARK_HOME and add it to your PATH if you want:
export SPARK_HOME=/opt/spark-2.0.0-bin-hadoop2.x
export PATH=$PATH:$SPARK_HOME/bin
For these changes to take effect you can run:
source ~/.bashrc
Second, you need to point Spark to your Hadoop configuration directories. To do this, set these two environment variables in $SPARK_HOME/conf/spark-env.sh:
export HADOOP_CONF_DIR=[your-hadoop-conf-dir, usually $HADOOP_PREFIX/etc/hadoop]
export YARN_CONF_DIR=[your-yarn-conf-dir, usually the same as the previous variable]
If this file doesn't exist, you can copy the contents of $SPARK_HOME/conf/spark-env.sh.template and start from there.
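For instance, assuming a fairly standard layout where Hadoop is installed under /usr/local/hadoop (adjust the path to your own installation), spark-env.sh would end up containing something like:
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export YARN_CONF_DIR=/usr/local/hadoop/etc/hadoop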
Now, to start the shell in YARN mode you can run:
spark-shell --master yarn --deploy-mode client
(You can't run the shell in cluster deploy mode.)
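If you want a quick sanity check that executors really run on YARN, you can also submit the bundled SparkPi example (the examples jar ships with the distribution; the exact file name under $SPARK_HOME/examples/jars depends on the release you downloaded):
spark-submit --master yarn --deploy-mode client --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.11-2.0.0.jar 10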
----------- Update
I forgot to mention that with this configuration you can also submit jobs to the cluster (thanks @juliancienfuegos):
spark-submit --master yarn --deploy-mode cluster project-spark.py
This way you can't see the output in the terminal, and the command exits as soon as the job is submitted (not completed).
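If you need the driver output of a cluster-mode job afterwards, you can pull the aggregated logs with YARN's CLI (assuming log aggregation is enabled in your yarn-site.xml), using the application id that spark-submit prints:
yarn logs -applicationId [your-application-id]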
You can also use --deploy-mode client to see the output right there in the terminal, but only do this for testing, since the job gets canceled if the command is interrupted (e.g. you press Ctrl+C, or your session ends).