{"id":255,"date":"2015-09-07T16:03:40","date_gmt":"2015-09-07T16:03:40","guid":{"rendered":"http:\/\/onlinelab.info\/?p=255"},"modified":"2015-09-07T16:03:40","modified_gmt":"2015-09-07T16:03:40","slug":"install-apache-hadoop-on-ubuntu-14-10-centos-7-single-node-cluster","status":"publish","type":"post","link":"https:\/\/www.asianux.org.vn\/index.php\/2015\/09\/07\/install-apache-hadoop-on-ubuntu-14-10-centos-7-single-node-cluster\/","title":{"rendered":"Install Apache Hadoop on Ubuntu 14.10 \/ CentOS 7 (Single Node Cluster)"},"content":{"rendered":"<figure id=\"attachment_8452\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-8452 size-full\" title=\"Hadoop\" src=\"http:\/\/www.itzgeek.com\/wp-content\/uploads\/2015\/02\/Hadoop-Logo.jpg\" alt=\"Hadoop Logo\" width=\"150\" height=\"120\" \/><figcaption class=\"wp-caption-text\">Hadoop<\/figcaption><\/figure>\n<p>Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing. It handles very large data sets by distributing them across computer clusters. 
Rather than relying on hardware for high availability, the Hadoop modules are designed to detect and handle failures at the application layer, which gives you a highly available service.<\/p>\n<p>The Hadoop framework consists of the following modules:<\/p>\n<ul>\n<li>Hadoop Common \u2013 contains the common set of libraries and utilities that support the other Hadoop modules.<\/li>\n<li>Hadoop Distributed File System (HDFS) \u2013 a Java-based distributed file system that stores data and provides very high throughput to applications.<\/li>\n<li>Hadoop YARN \u2013 manages resources on compute clusters and uses them to schedule users\u2019 applications.<\/li>\n<li>Hadoop MapReduce \u2013 a framework for large-scale data processing.<\/li>\n<\/ul>\n<p>This guide will help you install Apache Hadoop on <a title=\"Linux-Dash Web based Monitoring tool for Ubuntu 14.10 \/ 14.04\" href=\"http:\/\/www.itzgeek.com\/tag\/ubuntu-14.10\" target=\"_blank\" rel=\"noopener\">Ubuntu 14.10<\/a> \/ <a href=\"http:\/\/www.itzgeek.com\/tag\/centos-7\" target=\"_blank\" rel=\"noopener\">CentOS 7<\/a>.<\/p>\n<h2>Prerequisites:<\/h2>\n<p>Since Hadoop is based on Java, make sure you have a Java JDK installed on the system. If your machine does not have Java, follow the steps below. 
You can skip this step if Java is already installed.<\/p>\n<p>Download Oracle Java using the following command, assuming a 64-bit operating system.<\/p>\n<pre># wget --no-check-certificate --no-cookies --header \"Cookie: oraclelicense=accept-securebackup-cookie\" http:\/\/download.oracle.com\/otn-pub\/java\/jdk\/8u5-b13\/jdk-8u5-linux-x64.tar.gz<\/pre>\n<p>Extract the downloaded archive and move it to \/usr.<\/p>\n<pre># tar -zxvf jdk-8u5-linux-x64.tar.gz<\/pre>\n<pre># mv jdk1.8.0_05\/ \/usr\/<\/pre>\n<h2>Create Hadoop user:<\/h2>\n<p>It is recommended to create a normal user to configure Apache Hadoop. Create the user with the following commands.<\/p>\n<pre># useradd -m -d \/home\/hadoop hadoop\n\n# passwd hadoop<\/pre>\n<p>Once you have created the user, <a title=\"SSH Passwordless Login \u2013 CentOS 7 \/ RHEL 7\" href=\"http:\/\/www.itzgeek.com\/how-tos\/linux\/centos-how-tos\/ssh-passwordless-login-centos-7-rhel-7.html\" target=\"_blank\" rel=\"noopener\">configure passwordless SSH to the local system<\/a>. 
Create an SSH key using the following commands:<\/p>\n<pre># su - hadoop\n\n$ ssh-keygen\n\n$ cat ~\/.ssh\/id_rsa.pub &gt;&gt; ~\/.ssh\/authorized_keys<\/pre>\n<p>Verify passwordless communication to the local system. If you are connecting over SSH for the first time, type \u201cyes\u201d to add the RSA key to the known hosts.<\/p>\n<pre>$ ssh 127.0.0.1<\/pre>\n<h2>Download Hadoop:<\/h2>\n<p>You can visit the <a href=\"http:\/\/www.apache.org\/dyn\/closer.cgi\/hadoop\/common\/\" target=\"_blank\" rel=\"noopener\">Apache Hadoop<\/a> page to download the latest Hadoop package, or simply run the following commands in a terminal to download Hadoop 2.6.0.<\/p>\n<pre>$ wget http:\/\/apache.bytenet.in\/hadoop\/common\/hadoop-2.6.0\/hadoop-2.6.0.tar.gz\n\n$ tar -zxvf hadoop-2.6.0.tar.gz\n\n$ mv hadoop-2.6.0 hadoop<\/pre>\n<h2>Install Apache Hadoop:<\/h2>\n<p>Hadoop supports three cluster modes:<\/p>\n<ol>\n<li>Local (Standalone) Mode \u2013 runs as a single Java process.<\/li>\n<li>Pseudo-Distributed Mode \u2013 each Hadoop daemon runs in a separate process.<\/li>\n<li>Fully Distributed Mode \u2013 an actual multi-node cluster, ranging from a few nodes to an extremely large cluster.<\/li>\n<\/ol>\n<h3>Set up environment variables:<\/h3>\n<p>Here we will configure Hadoop in Pseudo-Distributed mode. Set the environment variables in the ~\/.bashrc file.<\/p>\n<pre>$ vi ~\/.bashrc\n\nexport JAVA_HOME=\/usr\/jdk1.8.0_05\/\nexport HADOOP_HOME=\/home\/hadoop\/hadoop\nexport HADOOP_INSTALL=$HADOOP_HOME\nexport HADOOP_MAPRED_HOME=$HADOOP_HOME\nexport HADOOP_COMMON_HOME=$HADOOP_HOME\nexport HADOOP_HDFS_HOME=$HADOOP_HOME\nexport HADOOP_YARN_HOME=$HADOOP_HOME\nexport HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME\/lib\/native\nexport PATH=$PATH:$HADOOP_HOME\/sbin:$HADOOP_HOME\/bin<\/pre>\n<p>Apply the environment variables to the current session.<\/p>\n<pre>$ source ~\/.bashrc<\/pre>\n<h3>Modify configuration files:<\/h3>\n<p>Edit 
$HADOOP_HOME\/etc\/hadoop\/hadoop-env.sh and set the JAVA_HOME environment variable.<\/p>\n<pre>export JAVA_HOME=\/usr\/jdk1.8.0_05\/<\/pre>\n<p>Hadoop has many configuration files, and which ones you edit depends on the cluster mode. Since we are setting up a Pseudo-Distributed cluster, edit the following files.<\/p>\n<pre>$ cd $HADOOP_HOME\/etc\/hadoop<\/pre>\n<p>Edit <strong>core-site.xml<\/strong><\/p>\n<pre>&lt;configuration&gt;\n&lt;property&gt;\n&lt;name&gt;fs.defaultFS&lt;\/name&gt;\n&lt;value&gt;hdfs:\/\/localhost:9000&lt;\/value&gt;\n&lt;\/property&gt;\n&lt;\/configuration&gt;\n\n<\/pre>\n<p>Edit <strong>hdfs-site.xml<\/strong><\/p>\n<pre>&lt;configuration&gt;\n&lt;property&gt;\n&lt;name&gt;dfs.replication&lt;\/name&gt;\n&lt;value&gt;1&lt;\/value&gt;\n&lt;\/property&gt;\n\n&lt;property&gt;\n&lt;name&gt;dfs.name.dir&lt;\/name&gt;\n&lt;value&gt;file:\/\/\/home\/hadoop\/hadoopdata\/hdfs\/namenode&lt;\/value&gt;\n&lt;\/property&gt;\n\n&lt;property&gt;\n&lt;name&gt;dfs.data.dir&lt;\/name&gt;\n&lt;value&gt;file:\/\/\/home\/hadoop\/hadoopdata\/hdfs\/datanode&lt;\/value&gt;\n&lt;\/property&gt;\n&lt;\/configuration&gt;\n\n<\/pre>\n<p>Edit <strong>mapred-site.xml<\/strong><\/p>\n<pre>$ cp $HADOOP_HOME\/etc\/hadoop\/mapred-site.xml.template $HADOOP_HOME\/etc\/hadoop\/mapred-site.xml<\/pre>\n<pre>&lt;configuration&gt;\n&lt;property&gt;\n&lt;name&gt;mapreduce.framework.name&lt;\/name&gt;\n&lt;value&gt;yarn&lt;\/value&gt;\n&lt;\/property&gt;\n&lt;\/configuration&gt;<\/pre>\n<p class=\"heading3\">Edit <strong>yarn-site.xml<\/strong><\/p>\n<pre>&lt;configuration&gt;\n&lt;property&gt;\n&lt;name&gt;yarn.nodemanager.aux-services&lt;\/name&gt;\n&lt;value&gt;mapreduce_shuffle&lt;\/value&gt;\n&lt;\/property&gt;\n&lt;\/configuration&gt;<\/pre>\n<p>Now format the NameNode using the following command, and do not forget to check the storage directory.<\/p>\n<pre>$ hdfs namenode -format<\/pre>\n<p>Start the NameNode and DataNode daemons using the scripts provided by Hadoop. Make sure you are in the sbin directory of 
Hadoop.<\/p>\n<pre>$ cd $HADOOP_HOME\/sbin\/<\/pre>\n<pre>$ start-dfs.sh<\/pre>\n<p>Browse the web interface for the NameNode; by default it is available at: <strong>http:\/\/your-ip-address:50070\/<\/strong><\/p>\n<figure id=\"attachment_8447\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-8447\" src=\"http:\/\/www.itzgeek.com\/wp-content\/uploads\/2015\/02\/Hadoop-NameNode-Information.jpg\" alt=\"Hadoop NameNode Information\" width=\"793\" height=\"683\" title=\"\"><figcaption class=\"wp-caption-text\">Hadoop NameNode Information<\/figcaption><\/figure>\n<p>Start the ResourceManager and NodeManager daemons:<\/p>\n<pre>$ start-yarn.sh<\/pre>\n<p>Browse the web interface for the ResourceManager; by default it is available at: <strong>http:\/\/your-ip-address:8088\/<\/strong><\/p>\n<figure id=\"attachment_8448\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-8448\" src=\"http:\/\/www.itzgeek.com\/wp-content\/uploads\/2015\/02\/Hadoop-YARN-Cluster-Information.jpg\" alt=\"Hadoop YARN - Cluster Information\" width=\"900\" height=\"680\" title=\"\"><figcaption class=\"wp-caption-text\">Hadoop YARN \u2013 Cluster Information<\/figcaption><\/figure>\n<h2>Testing Hadoop single node cluster:<\/h2>\n<p>Before uploading, let\u2019s create a directory in HDFS to hold the files.<\/p>\n<pre>$ hdfs dfs -mkdir \/raj<\/pre>\n<p>Let\u2019s upload the messages file into the HDFS directory called \u201craj\u201d:<\/p>\n<pre>$ hdfs dfs -put \/var\/log\/messages \/raj<\/pre>\n<p>Uploaded files can be viewed by visiting the following URL: 
<strong>http:\/\/your-ip-address:50070\/explorer.html#\/raj<\/strong><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-8449\" src=\"http:\/\/www.itzgeek.com\/wp-content\/uploads\/2015\/02\/Hadoop-Directory-browsing.jpg\" alt=\"Hadoop Directory browsing\" width=\"905\" height=\"682\" title=\"\"><\/p>\n<p>Copy the files from HDFS to your local file system.<\/p>\n<pre>$ hdfs dfs -get \/raj \/tmp\/<\/pre>\n<p>You can delete the files and directories using the following commands.<\/p>\n<pre>$ hdfs dfs -rm \/raj\/messages\n$ hdfs dfs -rm -r -f \/raj<\/pre>\n<p>That\u2019s all! You have successfully configured a single-node Hadoop cluster.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hadoop Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed process, it handles very large size of data sets by distributing it across computer clusters. Rather than rely&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-255","post","type-post","status-publish","format-standard","hentry","category-linux"],"_links":{"self":[{"href":"https:\/\/www.asianux.org.vn\/index.php\/wp-json\/wp\/v2\/posts\/255","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.asianux.org.vn\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.asianux.org.vn\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.asianux.org.vn\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.asianux.org.vn\/index.php\/wp-json\/wp\/v2\/comments?post=255"}],"version-history":[{"count":0,"href":"https:\/\/www.asianux.org.vn\/index.php\/wp-json\/wp\/v2\/posts\/255\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.asianux.org.vn\/index.php\
/wp-json\/wp\/v2\/media?parent=255"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.asianux.org.vn\/index.php\/wp-json\/wp\/v2\/categories?post=255"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.asianux.org.vn\/index.php\/wp-json\/wp\/v2\/tags?post=255"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}