HAWQ is an HDFS-based MPP (Massively Parallel Processing) SQL engine. It supports standard SQL and transactions, and its performance is hundreds of times faster than native Hive.
This article walks through setting up HAWQ on an E-MapReduce cluster.
HAWQ supports several deployment modes:

- non-HA
- HA

This article uses the HA (with YARN) mode as an example; the other deployment modes are simpler to configure and can be set up by following the documentation.
Create a cluster on the E-MapReduce product page; this example uses an HA cluster.

Five machines:

- master: emr-header-1
- standby: emr-header-2
- slaves: emr-worker-1, emr-worker-2, emr-worker-3
Run the following on every machine in the cluster:
> sudo su hadoop
> sudo useradd -G hadoop gpadmin
> sudo passwd gpadmin    # set a password
> sudo vi /etc/sudoers   # append "gpadmin ALL=(ALL) NOPASSWD: ALL" at the end and save
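As a quick sanity check (a suggestion, not part of the original steps), you can confirm on each node that gpadmin belongs to the hadoop group and can run sudo without a password:

```bash
# Sanity check for the gpadmin account created above
id gpadmin                       # should list the hadoop group
sudo -u gpadmin sudo -n whoami   # should print "root" without prompting for a password
```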
On the master node
Install HAWQ
> sudo su root
> wget http://emr-agent-pack.oss-cn-hangzhou.aliyuncs.com/hawq/hawq-2.0.0.0-22126.x86_64.rpm
> rpm -ivh hawq-2.0.0.0-22126.x86_64.rpm
Set up passwordless SSH
> sudo su gpadmin
> vi hosts     ## add the IPs of all nodes in the cluster
> vi segment   ## add the IPs of all slave nodes
> vi masters   ## add the IPs of all master/standby nodes
> source /usr/local/hawq/greenplum_path.sh
> hawq ssh-exkeys -f hosts
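For reference, here is a minimal sketch of what the three host-list files could contain for the cluster above. The text says to use node IPs; the hostnames shown here are only an illustration and assume they resolve from every machine, so substitute your actual IPs as needed:

```bash
# hosts: every node in the cluster
cat > hosts <<'EOF'
emr-header-1
emr-header-2
emr-worker-1
emr-worker-2
emr-worker-3
EOF

# masters: the master and standby nodes
cat > masters <<'EOF'
emr-header-1
emr-header-2
EOF

# segment: the slave (segment) nodes
cat > segment <<'EOF'
emr-worker-1
emr-worker-2
emr-worker-3
EOF
```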
Adjust kernel parameters
> hawq ssh -f hosts -e 'sudo sysctl -w kernel.sem=\"50100 128256000 50100 2560\"'
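To confirm the setting took effect everywhere, you can read it back (an optional check, not part of the original steps):

```bash
# Each node should report: kernel.sem = 50100  128256000  50100  2560
hawq ssh -f hosts -e 'sysctl kernel.sem'
```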
Install HAWQ on the other nodes
> hawq scp -f hosts hawq-2.0.0.0-22126.x86_64.rpm =:~/
> hawq ssh -f hosts -e "sudo rpm -ivh ~/hawq-*.rpm"
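Optionally verify the package is installed on every node (this assumes the RPM package name is `hawq`, matching the file name above):

```bash
# Each node should report the installed package, e.g. hawq-2.0.0.0-22126.x86_64
hawq ssh -f hosts -e 'rpm -q hawq'
```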
Create the directories HAWQ needs
> hawq ssh -f masters -e 'sudo mkdir /mnt/disk{2..4}'
> hawq ssh -f masters -e 'sudo chown hdfs:hadoop /mnt/disk{2..4}'
> hawq ssh -f masters -e 'sudo chmod 770 /mnt/disk{2..4}'
> hawq ssh -f masters -e 'mkdir -p /mnt/disk1/hawq/data/master'
> hawq ssh -f segment -e 'mkdir -p /mnt/disk1/hawq/data/segment'
> hawq ssh -f hosts -e 'mkdir -p /mnt/disk{1..4}/hawq/tmp'
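A quick way to check that the layout matches what was just created (an optional check based on the directories above):

```bash
# Master/standby: master data directory plus the four temp directories
hawq ssh -f masters -e 'ls -ld /mnt/disk1/hawq/data/master /mnt/disk{1..4}/hawq/tmp'
# Segments: segment data directory plus the four temp directories
hawq ssh -f segment -e 'ls -ld /mnt/disk1/hawq/data/segment /mnt/disk{1..4}/hawq/tmp'
```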
Switch YARN to the CapacityScheduler
> vi /etc/emr/hadoop-conf/yarn-site.xml

Add the following property, sync the master's yarn-site.xml to all the other nodes, and then restart YARN on the cluster:

Property | Value | Notes |
---|---|---|
yarn.resourcemanager.scheduler.class | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler | |
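For clarity, this is how the property would look inside the `<configuration>` element of yarn-site.xml (printed with a here-doc purely for illustration; in practice just edit the file):

```bash
cat <<'EOF'
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
EOF
```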
Modify the HAWQ configuration

> vi /usr/local/hawq/etc/hawq-site.xml

Set the following properties (a short XML sketch follows the table):
Property | Value | Notes |
---|---|---|
hawq_master_address_host | emr-header-1 | |
hawq_standby_address_host | emr-header-2 | |
hawq_dfs_url | emr-cluster/hawq_default | |
hawq_master_directory | /mnt/disk1/hawq/data/master | |
hawq_segment_directory | /mnt/disk1/hawq/data/segment | |
hawq_master_temp_directory | /mnt/disk1/hawq/tmp,/mnt/disk2/hawq/tmp,/mnt/disk3/hawq/tmp,/mnt/disk4/hawq/tmp | |
hawq_segment_temp_directory | /mnt/disk1/hawq/tmp,/mnt/disk2/hawq/tmp,/mnt/disk3/hawq/tmp,/mnt/disk4/hawq/tmp | |
hawq_global_rm_type | yarn | |
hawq_rm_yarn_address | emr-header-1:8032,emr-header-2:8032 | |
hawq_rm_yarn_scheduler_address | emr-header-1:8030,emr-header-2:8030 |
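Each row of the table maps to one `<property>` element in hawq-site.xml. A minimal sketch for two of the rows above (again printed with a here-doc only for illustration):

```bash
cat <<'EOF'
<property>
  <name>hawq_dfs_url</name>
  <value>emr-cluster/hawq_default</value>
</property>
<property>
  <name>hawq_global_rm_type</name>
  <value>yarn</value>
</property>
EOF
```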
> vi /usr/local/hawq/etc/hdfs-client.xml   # uncomment the HA section and set the following properties
Property | Value | Notes |
---|---|---|
dfs.nameservices | emr-cluster | |
dfs.ha.namenodes.emr-cluster | nn1,nn2 | |
dfs.namenode.rpc-address.emr-cluster.nn1 | emr-header-1:8020 | |
dfs.namenode.rpc-address.emr-cluster.nn2 | emr-header-2:8020 | |
dfs.namenode.http-address.emr-cluster.nn1 | emr-header-1:50070 | |
dfs.namenode.http-address.emr-cluster.nn2 | emr-header-2:50070 |
> vi /usr/local/hawq/etc/yarn-client.xml   # uncomment the HA section and set the following properties
Property | Value | Notes |
---|---|---|
yarn.resourcemanager.ha | emr-header-1:8032,emr-header-2:8032 | |
yarn.resourcemanager.scheduler.ha | emr-header-1:8030,emr-header-2:8030 | |
> vi /usr/local/hawq/etc/slaves   # add the IPs of the segment nodes (see the sketch below)
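As with the host-list files earlier, here is a sketch of what slaves might look like for this cluster (hostnames shown instead of IPs for readability; use your actual IPs if the names do not resolve):

```bash
cat > /usr/local/hawq/etc/slaves <<'EOF'
emr-worker-1
emr-worker-2
emr-worker-3
EOF
```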
Once the HAWQ configuration on the master node is complete, sync it to all the other nodes:
> hawq scp -f hosts /usr/local/hawq/etc/yarn-client.xml /usr/local/hawq/etc/hdfs-client.xml /usr/local/hawq/etc/hawq-site.xml /usr/local/hawq/etc/slaves =:/usr/local/hawq/etc/
Initialize the cluster:

> hawq init cluster
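Once initialization finishes, you can check that everything came up using the standard HAWQ management commands (shown here as a suggestion, not part of the original steps):

```bash
# Overall status of master, standby and segments
hawq state
# Confirm the resource manager type configured above
hawq config -s hawq_global_rm_type
```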
Run a quick test to verify the installation:

    > psql -d postgres
    postgres=# create database mytest;
    CREATE DATABASE
    postgres=# \c mytest
    You are now connected to database "mytest" as user "gpadmin".
    mytest=# create table t (i int);
    CREATE TABLE
    mytest=# insert into t select generate_series(1,100);
    INSERT 0 100
    mytest=# \timing
    Timing is on.
    mytest=# select count(*) from t;
     count
    -------
       100
    (1 row)
    Time: 77.333 ms
    mytest=# select * from t;