Stand-alone cluster configuration notes
Skip this section if you have already configured a Hadoop cluster. Create users on all nodes:
useradd hduser
passwd hduser
groupadd hadoop
usermod -a -G hadoop hduser
Log in as the new user:
sudo su - hduser
Spark nodes communicate over ssh, so password-less ssh should be enabled on all nodes:
ssh-keygen -t rsa -P ''
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
Check that ssh works locally without a password:
ssh localhost
Copy the public key from the master node to each worker node:
ssh-copy-id -i ~/.ssh/id_rsa.pub user@worker_node
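With more than a couple of workers it is easier to push the key in a loop. A minimal sketch, assuming a hypothetical workers.txt file with one worker hostname per line:
# Push the master's public key to every worker listed in workers.txt
# (you will be prompted for each worker's password once).
while read -r node; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub "hduser@$node"
done < workers.txt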
Spark compilation notes
Spark needs Java (at least 1.7), Scala 2.10 (2.11 is not supported), and Maven; Hadoop is optional. Install Java and Scala using the package manager (yum or apt-get), or download the rpm files from the corresponding sites and install them:
sudo rpm -i java.rpm
sudo rpm -i scala.rpm
Download and unpack Maven to /usr/local/maven. If needed, configure a proxy for Maven in maven/conf/settings.xml; another config file might be in ~/.m2/settings.xml. Add to your ~/.bashrc:
export M2_HOME=/usr/local/maven
export M2=$M2_HOME/bin
export PATH=$M2:$PATH
If Java is < 1.8:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
If you are behind a proxy, also export:
export http_proxy=http://my-web-proxy.net:8088
export https_proxy=http://my-web-proxy.net:8088
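A quick sanity check that the toolchain is in place before compiling:
java -version    # expect 1.7 or newer
scala -version   # expect 2.10.x
mvn -version     # should report the Maven home under /usr/local/maven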
Clone from git, compile, and change the owner to hduser (the user with password-less ssh between nodes):
sudo git clone https://github.com/apache/spark.git /usr/local/spark
cd /usr/local/spark
mvn -Dhadoop.version=1.2.1 -Pnetlib-lgpl -DskipTests clean package
sudo chown -R hduser:hadoop /usr/local/spark
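Before going distributed, it is worth a local smoke test. A sketch using the example runner that ships with Spark (SparkPi is one of the bundled examples):
cd /usr/local/spark
./bin/run-example SparkPi 10
# Look for a line like "Pi is roughly 3.14..." in the output.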
Spark installation notes
Assume that Spark was compiled on the master node. One needs to copy Spark to /usr/local/spark on all other nodes in the cluster and change its owner to hduser, as above (a copy loop is sketched below). Also, add to hduser's ~/.bashrc on all nodes:
export SPARK_HOME=/usr/local/spark
export _JAVA_OPTIONS=-Djava.io.tmpdir=[Folder with a lot of space]
The latter option sets the Java temporary directory, which Spark uses when it writes shuffle data. By default it is /tmp, which is usually small.
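A minimal sketch of the copy step, again assuming the hypothetical workers.txt with one worker hostname per line:
# Copy the compiled Spark tree to every worker.
# Assumes /usr/local/spark already exists on each worker and is
# writable by hduser (create and chown it with sudo first, as above).
while read -r node; do
  rsync -az /usr/local/spark/ "hduser@$node:/usr/local/spark/"
done < workers.txt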
Also, if there is a Hadoop installation, it is useful to force Spark to read its configuration instead of using the default one (e.g., for the replication factor, etc.):
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
Spark configuration notes
Some theory:
- Spark runs one Master and several Workers. It is not recommended to run both the Master and a Worker on the same node. It is worth having a single Worker per node that owns all of its RAM and CPU cores, unless the node has many CPUs or the task is better solved by many Workers.
- When one submits a task, Spark creates a Driver on the Master node and Executors on the Worker nodes.
It would be nice if one only had to configure the Master node and all options were transferred to the Workers; however, that is not the case. There is, though, a minimal configuration for which you don't need to touch each Worker's config: one Worker per node.
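If a node should not hand everything to its single Worker, the per-node resources can be capped in spark/conf/spark-env.sh on that node. A sketch, with assumed values:
# spark/conf/spark-env.sh on a worker node (values are assumptions, adjust per node)
export SPARK_WORKER_CORES=8      # CPU cores the Worker may hand out to executors
export SPARK_WORKER_MEMORY=16g   # total RAM the Worker may hand out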
spark/conf/spark-defaults.conf:
spark.master spark://mymaster.com:7077
spark.driver.memory 16g
spark.driver.cores 4
spark.executor.memory 16g #no more than the available RAM, otherwise it will fail
spark.local.dir /home/hduser/tmp #shuffle directory, should be on a fast, big disk
spark.sql.shuffle.partitions 2000 #number of reducers for SQL, default is 200
List all the Worker nodes in spark/conf/slaves:
my-node2.net
my-node3.net
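Since the options are not transferred to the Workers automatically, one way to keep them in sync is to push the conf directory from the master (same hypothetical workers.txt as before):
while read -r node; do
  rsync -az $SPARK_HOME/conf/ "hduser@$node:$SPARK_HOME/conf/"
done < workers.txt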
Spark start
Start all nodes:
$SPARK_HOME/sbin/start-all.sh
You should be able to see the web interface at my-node1.net:8080 (the Master web UI default port).
Start Spark shell:
$SPARK_HOME/bin/spark-shell --master spark://my-node1.net:7077
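To verify that the cluster actually executes work, a trivial job can be piped through the shell (a smoke test; the exact result line is an assumption about REPL numbering):
# Count 1000 numbers across the cluster; expect something like "res0: Long = 1000"
echo 'sc.parallelize(1 to 1000).count()' | \
  $SPARK_HOME/bin/spark-shell --master spark://my-node1.net:7077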
Stop all nodes:
$SPARK_HOME/sbin/stop-all.sh