Environment:

  • Physical machine: custom-built PC, i7 (4 cores, 8 threads), 16 GB RAM
  • VirtualBox Host: Ubuntu 14.04
  • VirtualBox: 4.3.20
  • VirtualBox Guest: CentOS 6.6 x86_64

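The purpose of this article is only to test and understand how Hadoop runs across a cluster of machines. Do not use it as a reference for operating a Hadoop cluster on physical machines.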

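First install one VM. Once it is set up, make 3 linked clones of it; the clones only need their IP, hostname, and a few necessary settings changed. This saves a lot of time as well as disk space on the host.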

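There are 4 VMs in total: the first gets 6 GB of memory and the others 2 GB each. My host has only 16 GB, so this allocation should not exhaust its memory. The first machine carries the heavier load, so it gets more memory than the other three.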
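The memory budget can be checked with a little arithmetic. This is only a sketch of the numbers stated above (one 6 GB VM, three 2 GB VMs, a 16 GB host); the variable names are made up for illustration.

```shell
#!/bin/sh
# Memory plan from the text: one 6 GB VM plus three 2 GB VMs on a 16 GB host.
host_mb=16384
total_vm_mb=$((6144 + 3 * 2048))
headroom_mb=$((host_mb - total_vm_mb))
echo "VMs: ${total_vm_mb} MB, left for the host: ${headroom_mb} MB"
```

The remainder has to cover the Ubuntu host and VirtualBox itself.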

Create the first machine, which serves as the template for the clones

If you have not installed CentOS before, readers in China can download it from the Beijing Institute of Technology mirror.

http://mirror.bit.edu.cn/centos/6.6/isos/x86_64/

VirtualBox settings:

  • Network: Bridged Adapter
  • Create an 80 GB VDI, dynamically allocated
  • 2 GB memory, 2 CPU cores
  • Point the CD-ROM at the CentOS ISO

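If you have installation experience, you can use the expert text-mode install and install only the minimal set of packages. At the boot menu, press Tab on the first entry and append linux text to the end of the line.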

After installation, apply the following configuration

vi /etc/resolv.conf

search rockyfeng.me
nameserver 10.0.0.1

vi /etc/sysconfig/network

NETWORKING=yes
HOSTNAME=centostpl.rockyfeng.me
GATEWAY=10.0.0.1

vi /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.0.0.200
NETMASK=255.255.255.0

vi /etc/selinux/config

SELINUX=disabled

vi /etc/yum/pluginconf.d/fastestmirror.conf

enabled=0

Turn off the firewall at boot and bring up the network interface

chkconfig iptables off
/etc/init.d/network restart
ifconfig

Update the system to the latest packages

yum update
reboot

After the reboot the following error appears; it is caused by disabling SELinux

Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Not tainted 2.6.32-279.5.1.el6.x86_64 #1
Call Trace:
[<ffffffff814fd24a>] ? panic+0xa0/0x168
[<ffffffff81070bd2>] ? do_exit+0x862/0x870
[<ffffffff8117cba5>] ? fput+0x25/0x30
[<ffffffff81070c38>] ? do_group_exit+0x58/0xd0
[<ffffffff81070cc7>] ? sys_exit_group+0x17/0x20
[<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b

See http://superuser.com/questions/730387/kernel-panic-not-syncing-attempted-to-kill-init-pid-1-comm-init-not-tainted-c

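You have to edit the GRUB entry at the boot menu to get into the system first, then add the following kernel options in the GRUB configuration file /boot/grub/grub.conf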

selinux=0 enforcing=0
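The grub.conf edit can also be scripted. The following is a minimal sketch, assuming the boot entries use lines that start with the word kernel, as in CentOS 6's legacy GRUB. The function name append_kernel_args is made up for illustration.

```shell
#!/bin/sh
# append_kernel_args GRUB_CONF
# Appends "selinux=0 enforcing=0" to every kernel line that does not
# already contain selinux=0. Writes back with cat so the file keeps its
# original permissions.
append_kernel_args() {
    _conf=$1
    _tmp=$(mktemp) || return 1
    sed '/^[[:space:]]*kernel /{/selinux=0/!s/$/ selinux=0 enforcing=0/;}' \
        "$_conf" > "$_tmp" && cat "$_tmp" > "$_conf"
    _status=$?
    rm -f "$_tmp"
    return $_status
}
```

Usage, as root and after backing up the file: append_kernel_args /boot/grub/grub.conf. Running it a second time leaves the file unchanged.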

Edit the hosts file so that the cloned machines can resolve one another by name: vi /etc/hosts

10.0.0.201 hadoop1.rockyfeng.me hadoop1
10.0.0.202 hadoop2.rockyfeng.me hadoop2
10.0.0.203 hadoop3.rockyfeng.me hadoop3
10.0.0.204 hadoop4.rockyfeng.me hadoop4
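Since the four entries follow one pattern, they can be generated rather than typed. This is just a sketch of that pattern; the function name hosts_entries is hypothetical.

```shell
#!/bin/sh
# Print the four /etc/hosts entries used in this article:
# 10.0.0.20N hadoopN.rockyfeng.me hadoopN, for N = 1..4.
hosts_entries() {
    for n in 1 2 3 4; do
        printf '10.0.0.20%s hadoop%s.rockyfeng.me hadoop%s\n' "$n" "$n" "$n"
    done
}
hosts_entries
```

To apply it on the template VM, run hosts_entries >> /etc/hosts as root after checking the output.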

Set up SSH so that the machines can log in to one another

yum -y install openssh-clients
ssh-keygen
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys

vi /etc/ssh/ssh_config

StrictHostKeyChecking no

Now shut down the VM and get ready to clone it

poweroff

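When cloning, be sure to tick the option that reinitializes the MAC address of the network cards. Name the clones hadoop[1-4]; set the memory of the first machine to 6144 MB and leave the others at 2048 MB.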
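The same cloning can be done from the host's command line with VBoxManage instead of the GUI. The sketch below only prints the commands so they can be reviewed first. It assumes the template VM is registered under the name centostpl (taken from its hostname, not confirmed by the text) and uses a hypothetical snapshot name, base. Linked clones are made from a snapshot, and clonevm generates new MAC addresses unless told to keep them.

```shell
#!/bin/sh
# Print (do not run) the VBoxManage commands for four linked clones.
TEMPLATE=centostpl   # assumption: name of the template VM in VirtualBox
SNAPSHOT=base        # hypothetical snapshot name

clone_commands() {
    echo "VBoxManage snapshot $TEMPLATE take $SNAPSHOT"
    for n in 1 2 3 4; do
        # hadoop1 gets 6144 MB, the rest 2048 MB, as described above
        if [ "$n" -eq 1 ]; then mem=6144; else mem=2048; fi
        echo "VBoxManage clonevm $TEMPLATE --snapshot $SNAPSHOT --options link --name hadoop$n --register"
        echo "VBoxManage modifyvm hadoop$n --memory $mem"
    done
}
clone_commands
```

Once the printed commands look right, they can be executed with clone_commands | sh on the host.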

Then boot each machine and do the following on every one of them

vi /etc/sysconfig/network

HOSTNAME=hadoop[n].rockyfeng.me

vi /etc/sysconfig/network-scripts/ifcfg-eth0

IPADDR=10.0.0.20[n]

Delete the following file, because it records the MAC address of the eth0 NIC

rm /etc/udev/rules.d/70-persistent-net.rules

Reboot the machine: reboot
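The per-clone hostname and IP edits can be scripted so that only the node number changes. This is a minimal sketch, assuming both files already contain HOSTNAME= and IPADDR= lines inherited from the template; the function name set_node_identity is made up for illustration.

```shell
#!/bin/sh
# set_node_identity N NETWORK_FILE IFCFG_FILE
# Rewrites HOSTNAME and IPADDR for node N (1-4), following the
# hadoop[n].rockyfeng.me / 10.0.0.20[n] scheme used in this article.
set_node_identity() {
    _n=$1 _network=$2 _ifcfg=$3
    _tmp=$(mktemp) || return 1
    sed "s/^HOSTNAME=.*/HOSTNAME=hadoop${_n}.rockyfeng.me/" "$_network" > "$_tmp" &&
        cat "$_tmp" > "$_network" &&
        sed "s/^IPADDR=.*/IPADDR=10.0.0.20${_n}/" "$_ifcfg" > "$_tmp" &&
        cat "$_tmp" > "$_ifcfg"
    _status=$?
    rm -f "$_tmp"
    return $_status
}
```

On hadoop2, for example: set_node_identity 2 /etc/sysconfig/network /etc/sysconfig/network-scripts/ifcfg-eth0, run as root before the reboot.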

Install Cloudera Manager on hadoop1

curl -O http://archive.cloudera.com/cm4/installer/latest/cloudera-manager-installer.bin
chmod +x cloudera-manager-installer.bin
./cloudera-manager-installer.bin

The installer is very slow. If you cannot stand the wait, install manually instead.

First download the latest JDK from the Oracle website

http://download.oracle.com/otn-pub/java/jdk/7u71-b14/jdk-7u71-linux-x64.rpm

rpm -Uvh jdk-7u71-linux-x64.rpm

If /usr/bin/java has not been set up, run the following command

alternatives --install /usr/bin/java java /usr/java/default/bin/java 3

Install PostgreSQL

yum install postgresql postgresql-server

Mirror the Cloudera Manager repository; first install the tools it needs

yum install yum-utils createrepo

Edit the repo configuration: vi /etc/yum.repos.d/cloudera-manager.repo

[cloudera-manager]
name=Cloudera Manager
baseurl=http://archive.cloudera.com/cm4/redhat/6/x86_64/cm/4/
gpgkey=http://archive.cloudera.com/cm4/redhat/6/x86_64/cm/RPM-GPG-KEY-cloudera
gpgcheck=1

Start downloading the Cloudera Manager repository. It is a large download, so go get some exercise in the meantime.

mkdir -p /usr/local/repos
cd /usr/local/repos
reposync -r cloudera-manager
createrepo /usr/local/repos/cloudera-manager
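Before pointing yum at the local copy, it is worth confirming that createrepo actually produced its index, which it writes to repodata/repomd.xml inside the repository directory. A small sketch of that check follows; the function name repo_has_metadata is hypothetical.

```shell
#!/bin/sh
# repo_has_metadata DIR
# Succeeds if DIR contains the repodata/repomd.xml index written by createrepo.
repo_has_metadata() {
    [ -f "$1/repodata/repomd.xml" ]
}
```

For example: repo_has_metadata /usr/local/repos/cloudera-manager && echo "repo ready".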

Change the repo path to the local copy: vi /etc/yum.repos.d/cloudera-manager.repo

baseurl=file:///usr/local/repos/cloudera-manager/

Import the GPG key

rpm --import http://archive.cloudera.com/cm4/redhat/6/x86_64/cm/RPM-GPG-KEY-cloudera

Install the Cloudera Manager components

yum install cloudera-manager-daemons
yum install cloudera-manager-server
yum install cloudera-manager-server-db

Initialize the Cloudera Manager database

service cloudera-scm-server-db initdb

Likewise, mirror the CDH repository to the local machine: vi /etc/yum.repos.d/cloudera-cdh4.repo

[cloudera-cdh4]
name=Cloudera Distribution for Hadoop, Version 4
baseurl=http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/4/
gpgkey=http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
gpgcheck=1

Download it

cd /usr/local/repos
reposync -r cloudera-cdh4
createrepo /usr/local/repos/cloudera-cdh4

Change the repo path to the local copy: vi /etc/yum.repos.d/cloudera-cdh4.repo

baseurl=file:///usr/local/repos/cloudera-cdh4/

Import the GPG key

rpm --import http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera

Install the packages CDH needs

yum install hadoop-0.20-mapreduce-jobtracker
yum install hadoop-hdfs-namenode
yum install hadoop-hdfs-secondarynamenode
yum install hadoop-0.20-mapreduce-tasktracker
yum install hadoop-hdfs-datanode
yum install hadoop-client
yum install zookeeper-server
yum install hbase

Start the services on the management machine

service cloudera-scm-server-db start
service cloudera-scm-server start

Use Nginx to serve the local repos to the other machines. First add the Nginx repo: vi /etc/yum.repos.d/nginx.repo

[nginx]
name=nginx repo
baseurl=http://nginx.org/packages/centos/$releasever/$basearch/
gpgcheck=0
enabled=1

Install Nginx

yum update
yum install nginx
service nginx start

vi /etc/nginx/conf.d/default.conf

location /cdh4 {
    autoindex on;
    alias   /usr/local/repos/cloudera-cdh4;
}

location /cm4 {
    autoindex on;
    alias   /usr/local/repos/cloudera-manager;
}

Restart Nginx: service nginx restart

Log in to 10.0.0.202, i.e. hadoop2, and do the following; then set up hadoop3 and hadoop4 the same way

vi /etc/yum.repos.d/cloudera-manager.repo

[cloudera-manager]
name = Cloudera Manager, Version 4.8.5
baseurl=http://hadoop1.rockyfeng.me/cm4
gpgkey = http://archive.cloudera.com/redhat/cdh/RPM-GPG-KEY-cloudera
gpgcheck = 1

vi /etc/yum.repos.d/cloudera-cdh4.repo

[cloudera-cdh4]
name=Cloudera Distribution for Hadoop, Version 4
baseurl=http://hadoop1.rockyfeng.me/cdh4
gpgkey=http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
gpgcheck=1

Import the key

rpm --import  http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera

Install the services

yum update
yum install hadoop-0.20-mapreduce-jobtracker
yum install hadoop-hdfs-namenode
yum install hadoop-hdfs-secondarynamenode
yum install hadoop-0.20-mapreduce-tasktracker
yum install hadoop-hdfs-datanode
yum install hadoop-client
yum install zookeeper-server
yum install hbase

Open http://10.0.0.201:7180/ in a browser and log in

Choose the free edition, then enter the machines to install on

hadoop2.rockyfeng.me hadoop3.rockyfeng.me hadoop4.rockyfeng.me

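Enter the root password to start the installation. This is another long process, mainly because many additional packages still have to be downloaded along the way.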

  • http://blog.cloudera.com/blog/2014/01/how-to-create-a-simple-hadoop-cluster-with-virtualbox/
  • http://darutk-oboegaki.blogspot.com/2012/12/install-cloudera-manager-and-cdh4.html