Over the past decade, Apache Hadoop has grown from nothing into the foundation of some of the largest production clusters in the world. The current release line is Hadoop 3.0, built against JDK 1.8 (Hadoop 2.x was built against JDK 1.7, whose public updates ended in April 2015). It introduces a number of important features and optimizations, including HDFS erasure coding, support for more than two NameNodes, MapReduce native task optimization, YARN cgroup-based memory and disk I/O isolation, YARN container resizing, and classpath isolation. The components of the Hadoop family are listed below; the core is HDFS and MapReduce, and the other components extend and integrate around that core.
Hadoop: an open-source distributed computing framework from the Apache Software Foundation. It provides a distributed file system subproject (HDFS) and a software architecture that supports MapReduce distributed computation.
Hive: a data warehouse tool built on Hadoop. It maps structured data files to database tables and runs simple MapReduce statistics through SQL-like statements, so no dedicated MapReduce application has to be written, which makes it well suited to data warehouse analytics.
Pig: a large-scale data analysis tool on Hadoop. Its SQL-like language is called Pig Latin, and its compiler translates SQL-like analysis requests into a series of optimized MapReduce jobs.
HBase: a highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase, large structured-storage clusters can be built on inexpensive PC servers.
Sqoop: a tool for moving data between Hadoop and relational databases. It can import data from a relational database (MySQL, Oracle, Postgres, etc.) into HDFS, and export data from HDFS back into a relational database.
ZooKeeper: a distributed, open-source coordination service designed for distributed applications. It addresses the data management problems that distributed applications commonly face, simplifies coordination and management, and provides high-performance distributed services.
Mahout: a distributed framework for machine learning and data mining on Hadoop. Mahout implements part of its data-mining algorithms on MapReduce, solving the problem of parallel mining.
Cassandra: an open-source distributed NoSQL database system. Originally developed at Facebook to store simple-format data, it combines Google BigTable's data model with Amazon Dynamo's fully distributed architecture.
Avro: a data serialization system designed for data-intensive applications that exchange data in large volumes. Avro is a newer serialization format and transport tool that is gradually replacing Hadoop's original IPC mechanism.
Ambari: a web-based tool that supports provisioning, managing, and monitoring Hadoop clusters.
Chukwa: an open-source data collection system for monitoring large distributed systems. It collects data of various kinds into files suitable for Hadoop processing, stores them in HDFS, and makes them available to MapReduce jobs.
Hama: a BSP (Bulk Synchronous Parallel) computing framework on top of HDFS. Hama can be used for large-scale, big-data computation, including graph, matrix, and network algorithms.
Flume: a distributed, reliable, highly available system for aggregating massive amounts of log data. It can be used for log collection, processing, and transport.
Giraph: a scalable distributed iterative graph processing system built on Hadoop, inspired by BSP (bulk synchronous parallel) and Google's Pregel.
Oozie: a workflow engine server for managing and coordinating jobs that run on the Hadoop platform (HDFS, Pig, and MapReduce).
Crunch: a Java library, modeled on Google's FlumeJava library, for creating MapReduce programs. Like Hive and Pig, Crunch provides a library of patterns for common tasks such as joining data, performing aggregations, and sorting records.
Whirr: a set of libraries for running cloud services (including Hadoop) that offers a high degree of complementarity. Whirr supports Amazon EC2 and Rackspace services.
Bigtop: a tool for packaging, distributing, and testing Hadoop and its surrounding ecosystem.
HCatalog: table and storage management for Hadoop. It provides central metadata and schema management across Hadoop and RDBMSs, and exposes relational views through Pig and Hive.
Hue: a web-based monitoring and management system that provides web-based operation and management of HDFS, MapReduce/YARN, HBase, Hive, and Pig.
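Erasure coding, mentioned in the introduction, is administered through the `hdfs ec` subcommand in Hadoop 3.x. The sketch below follows the 3.0.0 GA documentation; the alpha releases used slightly different flags, and the directory and policy name are only examples:

```bash
# List the erasure coding policies the cluster knows about.
hdfs ec -listPolicies

# Apply a Reed-Solomon policy to a directory; files written there afterwards are erasure coded.
hdfs dfs -mkdir -p /archive
hdfs ec -setPolicy -path /archive -policy RS-6-3-1024k

# Confirm which policy is in effect on the directory.
hdfs ec -getPolicy -path /archive
```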
The cluster consists of three nodes:
10.0.10.80  hdpm.ohsdba.cn   hdpm
10.0.10.81  hdps1.ohsdba.cn  hdps1
10.0.10.82  hdps2.ohsdba.cn  hdps2
The files that need to be configured for Hadoop 3.0 are core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, hadoop-env.sh, and workers.
Configure the hosts file (run on every node)
[root@hdpm ~]$ cat /etc/hosts
127.0.0.1   localhost4.localdomain4 localhost
::1         localhost.localdomain localhost
10.0.10.80  hdpm.ohsdba.cn hdpm
10.0.10.81  hdps1.ohsdba.cn hdps1
10.0.10.82  hdps2.ohsdba.cn hdps2
[root@hdpm ~]$
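A quick way to confirm that name resolution is consistent is to loop over the three hostnames; this is an added check, not part of the original steps:

```bash
# Verify that every hostname resolves via /etc/hosts and answers ping (run on each node).
for h in hdpm.ohsdba.cn hdps1.ohsdba.cn hdps2.ohsdba.cn; do
  getent hosts "$h"
  ping -c 1 -W 2 "$h" >/dev/null && echo "$h reachable" || echo "$h UNREACHABLE"
done
```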
Create the user and group (run on every node)
[root@hdpm ~]# groupadd hadoop
[root@hdpm ~]# useradd -g hadoop hdp
[root@hdpm ~]# passwd hdp
Changing password for user hdp.
New password:
BAD PASSWORD: it is based on a dictionary word
BAD PASSWORD: is too simple
Retype new password:
passwd: all authentication tokens updated successfully.
[root@hdpm ~]#
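A small added sanity check to confirm the account looks the same on every node:

```bash
# Confirm the hdp user exists and its primary group is hadoop (run on each node).
id hdp                 # expected: uid=...(hdp) gid=...(hadoop) groups=...(hadoop)
getent group hadoop    # expected: hadoop:x:<gid>:
```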
Establish SSH trust among the three nodes
[hdp@hdpm ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hdp/.ssh/id_rsa):
Created directory '/home/hdp/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hdp/.ssh/id_rsa.
Your public key has been saved in /home/hdp/.ssh/id_rsa.pub.
The key fingerprint is:
63:9c:ba:e4:e8:9c:e8:2d:46:96:85:cd:df:d3:89:2b hdp@hdpm.ohsdba.cn
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
|        +        |
|     . + . .     |
|    o . .So .    |
|     + .o+.o     |
|      o o o      |
|      o+ =E..    |
|      ooo* o.    |
+-----------------+
[hdp@hdpm ~]$
[root@hdps1 ~]# su - hdp
[hdp@hdps1 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hdp/.ssh/id_rsa):
Created directory '/home/hdp/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hdp/.ssh/id_rsa.
Your public key has been saved in /home/hdp/.ssh/id_rsa.pub.
The key fingerprint is:
bf:20:43:8a:54:ee:60:bc:e9:02:35:87:bf:0c:05:b9 hdp@hdps1.ohsdba.cn
The key's randomart image is:
+--[ RSA 2048]----+
|        .        |
|         o       |
|        +.       |
|       .Eoo      |
|      .==. . S   |
|.o.*.o .         |
|. +oo.o . .      |
|.. o o . .       |
|  .. .           |
+-----------------+
[hdp@hdps1 ~]$
[hdp@hdps1 ~]$
[root@hdps2 ~]# su - hdp
[hdp@hdps2 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hdp/.ssh/id_rsa):
Created directory '/home/hdp/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hdp/.ssh/id_rsa.
Your public key has been saved in /home/hdp/.ssh/id_rsa.pub.
The key fingerprint is:
af:77:90:6a:5f:50:8f:8c:a4:e0:aa:a9:f4:32:31:59 hdp@hdps2.ohsdba.cn
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
|     . . .       |
|    E . . o + o  |
|    o . S o.o .  |
|     + . .o.     |
|     .o . ....   |
|    .o.o o....   |
|    ..=. ..o..   |
+-----------------+
[hdp@hdps2 ~]$
[hdp@hdpm ~]$ cat .ssh/id_rsa.pub >>.ssh/authorized_keys
[hdp@hdpm ~]$ scp hdp@hdps1:~/.ssh/id_rsa.pub .ssh/id_rsa.pub.hdps1
The authenticity of host 'hdps1 (10.0.10.81)' can't be established.
RSA key fingerprint is 4f:68:99:eb:54:4b:61:fb:aa:f3:d9:fa:cd:09:f2:f4.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hdps1,10.0.10.81' (RSA) to the list of known hosts.
hdp@hdps1's password:
id_rsa.pub                                   100%  401     0.4KB/s   00:00
[hdp@hdpm ~]$ scp hdp@hdps2:~/.ssh/id_rsa.pub .ssh/id_rsa.pub.hdps2
The authenticity of host 'hdps2 (10.0.10.82)' can't be established.
RSA key fingerprint is 4f:68:99:eb:54:4b:61:fb:aa:f3:d9:fa:cd:09:f2:f4.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hdps2,10.0.10.82' (RSA) to the list of known hosts.
hdp@hdps2's password:
id_rsa.pub                                   100%  401     0.4KB/s   00:00
[hdp@hdpm ~]$
[hdp@hdpm .ssh]$ cat id_rsa.pub.hdps1 id_rsa.pub.hdps2 >>authorized_keys
[hdp@hdpm .ssh]$ scp authorized_keys hdps1:`pwd`
hdp@hdps1's password:
authorized_keys                              100% 1202     1.2KB/s   00:00
[hdp@hdpm .ssh]$ scp authorized_keys hdps2:`pwd`
hdp@hdps2's password:
authorized_keys                              100% 1202     1.2KB/s   00:00
[hdp@hdpm .ssh]$
[hdp@hdpm .ssh]$ chmod 600 authorized_keys
[hdp@hdpm .ssh]$ ssh hdps1.ohsdba.cn
[hdp@hdps1 ~]$ chmod 600 .ssh/authorized_keys
[hdp@hdps1 ~]$ exit
Connection to hdps1 closed.
[hdp@hdpm .ssh]$ ssh hdps2.ohsdba.cn
[hdp@hdps2 ~]$ chmod 600 .ssh/authorized_keys
[hdp@hdps2 ~]$
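The manual copying of public keys above can also be done with ssh-copy-id; this is an optional sketch (not how the original steps were performed) that assumes the hdp password is known for each host:

```bash
# Run as hdp on each node after ssh-keygen: push this node's public key to all three hosts.
for h in hdpm.ohsdba.cn hdps1.ohsdba.cn hdps2.ohsdba.cn; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub "hdp@$h"     # prompts for hdp's password once per host
done

# Then confirm passwordless login works without any interactive prompt.
for h in hdpm.ohsdba.cn hdps1.ohsdba.cn hdps2.ohsdba.cn; do
  ssh -o BatchMode=yes "hdp@$h" hostname
done
```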
Remove Java 1.7 and 1.6 if they are installed, then install JDK 8 (run on every node)
[root@hdps2 ~]# rpm -qa|grep jdk
java-1.7.0-openjdk-1.7.0.99-2.6.5.1.0.1.el6.x86_64
java-1.6.0-openjdk-1.6.0.38-1.13.10.4.el6.x86_64
[root@hdps2 ~]# rpm -e java-1.7.0-openjdk
[root@hdps2 ~]# rpm -e java-1.6.0-openjdk
[root@hdps2 ~]#
[root@hdpm ~]# rpm -ivh /home/hdp/jdk-8u112-linux-x64.rpm
Preparing...                ########################################### [100%]
   1:jdk1.8.0_112           ########################################### [100%]
Unpacking JAR files...
        tools.jar...
        plugin.jar...
        javaws.jar...
        deploy.jar...
        rt.jar...
        jsse.jar...
        charsets.jar...
        localedata.jar...
[root@hdpm ~]#
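After the RPM install, a quick verification on each node (an added check):

```bash
# Confirm only the JDK 8 package remains and that the java binary reports 1.8.0_112.
rpm -qa | grep -i jdk
/usr/java/jdk1.8.0_112/bin/java -version
```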
Create the directories (in preparation for installing Hadoop)
[hdp@hdpm ~]$ mkdir -p /pohs/tmp /pohs/hdfs/data /pohs/hdfs/name
[hdp@hdpm ~]$
[hdp@hdps1 ~]$ mkdir -p /pohs/tmp /pohs/hdfs/data /pohs/hdfs/name
[hdp@hdps1 ~]$
[hdp@hdps2 ~]$ mkdir -p /pohs/tmp /pohs/hdfs/data /pohs/hdfs/name
[hdp@hdps2 ~]$
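These directories must be writable by the hdp user; the commands above are run as hdp, which assumes /pohs itself already exists and is owned by hdp:hadoop. A quick added check:

```bash
# Confirm the HDFS metadata, data and temp directories belong to hdp:hadoop (run on each node).
ls -ld /pohs /pohs/tmp /pohs/hdfs/name /pohs/hdfs/data
# If anything is still owned by root, fix it as root:
#   chown -R hdp:hadoop /pohs
```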
Download the Hadoop installation file (on the hdpm node)
[root@hdpm ~]# wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.0.0-alpha1/hadoop-3.0.0-alpha1.tar.gz
--2016-10-25 18:09:29--  http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.0.0-alpha1/hadoop-3.0.0-alpha1.tar.gz
Resolving mirrors.tuna.tsinghua.edu.cn... 166.111.206.63, 2402:f000:1:416:166:111:206:63
Connecting to mirrors.tuna.tsinghua.edu.cn|166.111.206.63|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 331219821 (316M) [application/octet-stream]
Saving to: “hadoop-3.0.0-alpha1.tar.gz”
100%[=============================>] 331,219,821  2.45M/s   in 2m 4s
2016-10-25 18:11:33 (2.55 MB/s) - “hadoop-3.0.0-alpha1.tar.gz.1” saved [331219821/331219821]
[root@hdpm ~]# ls -l hadoop-3.0.0-alpha1.tar.gz
-rw-r--r--. 1 root root 331219821 Sep  7 00:48 hadoop-3.0.0-alpha1.tar.gz
[root@hdpm ~]# su - hdp
[hdp@hdpm pohs]$ tar zxvf hadoop-3.0.0-alpha1.tar.gz
[hdp@hdpm pohs]$ mv hadoop-3.0.0-alpha1 hadoop3
[hdp@hdpm pohs]$ pwd
/pohs
[hdp@hdpm pohs]$
[hdp@hdpm pohs]$ ls -ltr
total 323480
drwxr-xr-x. 9 hdp hadoop      4096 Aug 30 15:18 hadoop3
drwx------. 2 hdp hadoop     16384 Oct 28 13:56 lost+found
-rwxr-xr-x. 1 hdp hadoop 331219821 Oct 29 12:47 hadoop-3.0.0-alpha1.tar.gz
[hdp@hdpm pohs]$
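Before extracting, it can be worth confirming that the download is intact; this is an optional check that was not part of the original steps:

```bash
# Confirm the tarball is a readable gzip archive and peek at its contents.
gzip -t hadoop-3.0.0-alpha1.tar.gz && echo "archive OK"
tar -tzf hadoop-3.0.0-alpha1.tar.gz | head -5
# If a checksum file was also downloaded from the Apache site, compare it against
# `sha256sum hadoop-3.0.0-alpha1.tar.gz` (the exact checksum file name varies by release).
```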
Set environment variables (all nodes)
Edit .bash_profile and add the following:
export JAVA_HOME=/usr/java/jdk1.8.0_112
export HADOOP_HOME=/pohs/hadoop3
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export CLASSPATH=:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export CLASSPATH=:$CLASSPATH:$HADOOP_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
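To make the variables take effect in the current shell and confirm that the Hadoop binaries are found (an added verification):

```bash
# Reload the profile and check the environment (run as hdp on each node).
source ~/.bash_profile
echo "$HADOOP_HOME"     # expected: /pohs/hadoop3
which hadoop            # expected: /pohs/hadoop3/bin/hadoop
hadoop version          # prints the Hadoop build information
```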
Modify the configuration files (on the master)
The configuration files live under $HADOOP_HOME/etc/hadoop/. Once they have been edited on the master node, the entire HADOOP_HOME directory can be copied to the other nodes.
hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_112
core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdpm.ohsdba.cn:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/pohs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hdpm.ohsdba.cn:9001</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>268435456</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>100</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/pohs/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/pohs/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.admin.user.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hdpm.ohsdba.cn:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hdpm.ohsdba.cn:19888</value>
  </property>
</configuration>
yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hdpm.ohsdba.cn:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hdpm.ohsdba.cn:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hdpm.ohsdba.cn:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hdpm.ohsdba.cn:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hdpm.ohsdba.cn:8088</value>
  </property>
</configuration>
workers
hdps1.ohsdba.cn
hdps2.ohsdba.cn
Note: workers is the file name in Hadoop 3.0; in 2.x.x the file is named slaves.
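Before copying the configuration to the other nodes, a quick well-formedness check of the XML files can catch copy-and-paste mistakes; this is an optional sketch that assumes xmllint (from libxml2) is installed:

```bash
# Check that each *-site.xml under the config directory is well-formed XML.
cd "$HADOOP_HOME/etc/hadoop"
for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
  xmllint --noout "$f" && echo "$f OK"
done
cat workers   # should list hdps1.ohsdba.cn and hdps2.ohsdba.cn, one per line
```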
Copy the Hadoop files to the other nodes
[hdp@hdpm ~]$ cd /pohs
[hdp@hdpm ~]$ scp -rp hadoop3 hdps1:/pohs
[hdp@hdpm ~]$ scp -rp hadoop3 hdps2:/pohs
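An added check that the copy landed where the environment variables on the workers expect it:

```bash
# Confirm the Hadoop tree exists on both workers (run as hdp on the master).
for h in hdps1 hdps2; do
  ssh "$h" 'ls -ld /pohs/hadoop3 && /pohs/hadoop3/bin/hadoop version | head -1'
done
```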
Format the NameNode (run on the master node; this step must be completed before starting the cluster)
[hdp@hdpm ~]$ hdfs namenode -format
WARNING: /pohs/hadoop3/logs does not exist. Creating.
2016-11-02 11:56:34,520 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   user = hdp
STARTUP_MSG:   host = hdpm.ohsdba.cn/10.0.10.80
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 3.0.0-alpha1
STARTUP_MSG:   classpath =
STARTUP_MSG:   build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r a990d2ebcd6de5d7dc2d3684930759b0f0ea4dc3; compiled by 'andrew' on 2016-08-30T07:02Z
STARTUP_MSG:   java = 1.8.0_112
************************************************************/
2016-11-02 11:56:34,579 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2016-11-02 11:56:34,596 INFO namenode.NameNode: createNameNode [-format]
2016-11-02 11:56:35,730 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-f76c6f99-04f2-4b4d-bffc-e6e0006b36b6
2016-11-02 11:56:36,531 INFO namenode.FSEditLog: Edit logging is async:false
2016-11-02 11:56:36,538 INFO namenode.FSNamesystem: KeyProvider: null
2016-11-02 11:56:36,538 INFO namenode.FSNamesystem: fsLock is fair:true
2016-11-02 11:56:36,599 INFO namenode.FSNamesystem: fsOwner = hdp (auth:SIMPLE)
2016-11-02 11:56:36,605 INFO namenode.FSNamesystem: supergroup = supergroup
2016-11-02 11:56:36,605 INFO namenode.FSNamesystem: isPermissionEnabled = true
2016-11-02 11:56:36,606 INFO namenode.FSNamesystem: HA Enabled: false
2016-11-02 11:56:36,824 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
2016-11-02 11:56:36,825 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
2016-11-02 11:56:36,834 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
2016-11-02 11:56:36,852 INFO blockmanagement.BlockManager: The block deletion will start around 2016 Nov 02 11:56:36
2016-11-02 11:56:36,854 INFO util.GSet: Computing capacity for map BlocksMap
2016-11-02 11:56:36,855 INFO util.GSet: VM type = 64-bit
2016-11-02 11:56:36,861 INFO util.GSet: 2.0% max memory 421.5 MB = 8.4 MB
2016-11-02 11:56:36,861 INFO util.GSet: capacity = 2^20 = 1048576 entries
2016-11-02 11:56:36,963 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
2016-11-02 11:56:36,968 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
2016-11-02 11:56:36,968 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
2016-11-02 11:56:36,968 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
2016-11-02 11:56:36,968 INFO blockmanagement.BlockManager: defaultReplication = 3
2016-11-02 11:56:36,969 INFO blockmanagement.BlockManager: maxReplication = 512
2016-11-02 11:56:36,969 INFO blockmanagement.BlockManager: minReplication = 1
2016-11-02 11:56:36,969 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
2016-11-02 11:56:36,970 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
2016-11-02 11:56:36,970 INFO blockmanagement.BlockManager: encryptDataTransfer = false
2016-11-02 11:56:36,970 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
2016-11-02 11:56:37,676 INFO util.GSet: Computing capacity for map INodeMap
2016-11-02 11:56:37,677 INFO util.GSet: VM type = 64-bit
2016-11-02 11:56:37,677 INFO util.GSet: 1.0% max memory 421.5 MB = 4.2 MB
2016-11-02 11:56:37,677 INFO util.GSet: capacity = 2^19 = 524288 entries
2016-11-02 11:56:37,678 INFO namenode.FSDirectory: ACLs enabled? false
2016-11-02 11:56:37,678 INFO namenode.FSDirectory: XAttrs enabled? true
2016-11-02 11:56:37,680 INFO namenode.NameNode: Caching file names occuring more than 10 times
2016-11-02 11:56:37,696 INFO util.GSet: Computing capacity for map cachedBlocks
2016-11-02 11:56:37,697 INFO util.GSet: VM type = 64-bit
2016-11-02 11:56:37,697 INFO util.GSet: 0.25% max memory 421.5 MB = 1.1 MB
2016-11-02 11:56:37,697 INFO util.GSet: capacity = 2^17 = 131072 entries
2016-11-02 11:56:37,709 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2016-11-02 11:56:37,710 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2016-11-02 11:56:37,710 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2016-11-02 11:56:37,714 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2016-11-02 11:56:37,714 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2016-11-02 11:56:37,718 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2016-11-02 11:56:37,718 INFO util.GSet: VM type = 64-bit
2016-11-02 11:56:37,719 INFO util.GSet: 0.029999999329447746% max memory 421.5 MB = 129.5 KB
2016-11-02 11:56:37,719 INFO util.GSet: capacity = 2^14 = 16384 entries
2016-11-02 11:56:37,831 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1576596739-10.0.10.80-1478058997799
2016-11-02 11:56:37,896 INFO common.Storage: Storage directory /pohs/hdfs/name has been successfully formatted.
2016-11-02 11:56:37,977 INFO namenode.FSImageFormatProtobuf: Saving image file /pohs/hdfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2016-11-02 11:56:38,155 INFO namenode.FSImageFormatProtobuf: Image file /pohs/hdfs/name/current/fsimage.ckpt_0000000000000000000 of size 331 bytes saved in 0 seconds.
2016-11-02 11:56:38,223 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2016-11-02 11:56:38,232 INFO util.ExitUtil: Exiting with status 0
2016-11-02 11:56:38,244 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hdpm.ohsdba.cn/10.0.10.80
************************************************************/
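The line "Storage directory /pohs/hdfs/name has been successfully formatted" is the key indicator of success. An optional follow-up look at the freshly created metadata (an added sketch):

```bash
# Inspect the formatted NameNode metadata directory (run as hdp on hdpm).
ls -l /pohs/hdfs/name/current/
cat /pohs/hdfs/name/current/VERSION   # shows namespaceID, clusterID and blockpoolID
```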
Start Hadoop (running this on the master node is sufficient, since SSH trust was configured earlier)
[hdp@hdpm name]$ start-dfs.sh
Starting namenodes on [hdpm.ohsdba.cn]
hdpm.ohsdba.cn: Warning: Permanently added 'hdpm.ohsdba.cn,10.0.10.80' (RSA) to the list of known hosts.
Starting datanodes
hdps2.ohsdba.cn: WARNING: /pohs/hadoop3/logs does not exist. Creating.
hdps1.ohsdba.cn: WARNING: /pohs/hadoop3/logs does not exist. Creating.
Starting secondary namenodes [hdpm.ohsdba.cn]
2016-11-02 11:59:46,439 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hdp@hdpm name]$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers
[hdp@hdpm ~]$ mapred --daemon start historyserver
[hdp@hdpm ~]$
Note: start-all.sh is deprecated in 3.0; the cluster is started with start-dfs.sh and start-yarn.sh. The mr-jobhistory-daemon.sh start historyserver script has been replaced by mapred --daemon start historyserver.
[hdp@hdpm name]$ start-all.sh
This script is deprecated. Use start-dfs.sh and start-yarn.sh instead.
Check the processes
[hdp@hdpm name]$ jps
5104 Jps
4498 SecondaryNameNode
4341 NameNode
4826 ResourceManager
[hdp@hdpm name]$
[hdp@hdps1 ~]$ jps
15236 NodeManager
15942 Jps
15112 DataNode
[hdp@hdps1 ~]$
[hdp@hdps2 ~]$ jps
15393 Jps
14694 NodeManager
14570 DataNode
[hdp@hdps2 ~]$
[hdp@hdpm ~]$ hdfs dfsadmin -report
2016-11-02 15:29:04,259 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 96546439168 (89.92 GB)
Present Capacity: 88428929024 (82.36 GB)
DFS Remaining: 88428871680 (82.36 GB)
DFS Used: 57344 (56 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (2):

Name: 10.0.10.81:9866 (hdps1.ohsdba.cn)
Hostname: hdps1.ohsdba.cn
Decommission Status : Normal
Configured Capacity: 48273219584 (44.96 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 4058755072 (3.78 GB)
DFS Remaining: 44214435840 (41.18 GB)
DFS Used%: 0.00%
DFS Remaining%: 91.59%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Nov 02 15:29:05 CST 2016

Name: 10.0.10.82:9866 (hdps2.ohsdba.cn)
Hostname: hdps2.ohsdba.cn
Decommission Status : Normal
Configured Capacity: 48273219584 (44.96 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 4058755072 (3.78 GB)
DFS Remaining: 44214435840 (41.18 GB)
DFS Used%: 0.00%
DFS Remaining%: 91.59%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Nov 02 15:29:04 CST 2016

[hdp@hdpm ~]$
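A simple end-to-end smoke test confirms that HDFS and YARN actually work together; this is an optional sketch, and the examples jar path is an assumption based on the 3.0.0-alpha1 directory layout:

```bash
# Write and read a small file in HDFS, then run the bundled pi estimator on YARN.
hdfs dfs -mkdir -p /user/hdp
echo "hello hadoop" | hdfs dfs -put - /user/hdp/hello.txt
hdfs dfs -cat /user/hdp/hello.txt

# Examples jar name assumed for hadoop-3.0.0-alpha1.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha1.jar pi 2 10
```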
Check via the web interfaces
Daemon | Web Interface | Notes |
---|---|---|
NameNode | http://10.0.10.80:9870 | Default HTTP port is 9870. |
ResourceManager | http://10.0.10.80:8088 | Default HTTP port is 8088. |
MapReduce JobHistory Server | http://10.0.10.80:19888 | Default HTTP port is 19888. |
If the pages above open normally, the Hadoop cluster has been installed successfully.
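The same check can be scripted from any machine that can reach the master; a minimal added sketch using curl:

```bash
# Probe each web UI and print the HTTP status code (200 means the daemon is answering).
for url in http://10.0.10.80:9870 http://10.0.10.80:8088 http://10.0.10.80:19888; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
  echo "$url returned HTTP $code"
done
```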
Stop Hadoop
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/bin/mapred --daemon stop historyserver
Integrating HBase, Hive, and other components
These two components are quite commonly used. In testing, Hive 2.1.0 did not yet work with Hadoop 3.0.0; if you are interested, try integrating them with a Hadoop 2.x release instead.
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
https://hadoopecosystemtable.github.io/
http://blog.fens.me/series-hadoop-family/