This is how to install Pacemaker on Red Hat-family distributions.
Pacemaker is the high-availability cluster stack that ships with Red Hat.
- Corosync: cluster infrastructure layer (quorum management, messaging, etc.)
- Pacemaker: cluster resource manager
- pcs: management tool that makes corosync and pacemaker easy to administer
Test environment
| Item | node01 | node02 |
|---|---|---|
| hostname | cluster01 | cluster02 |
| OS | CentOS 7.6 | CentOS 7.6 |
| IP | 172.10.2.5 | 172.10.2.6 |
| VirtualIP | 172.10.2.4 (shared) | |
Preliminary steps
1. Host resolution setup (both nodes)
[root@cluster01 ~]# echo -e "\n172.10.2.5\tcluster01
172.10.2.6\tcluster02" >> /etc/hosts
[root@cluster01 ~]#
[root@cluster01 ~]#
[root@cluster01 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.10.2.5 cluster01
172.10.2.6 cluster02
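Before moving on, it is worth confirming that both hostnames resolve and that the peer answers; a quick check along these lines (not part of the original transcript) would be:
[root@cluster01 ~]# getent hosts cluster01 cluster02 ## both entries should come back from /etc/hosts
[root@cluster01 ~]# ping -c 1 cluster02 ## the peer node should be reachable by hostname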
2. Name server configuration
[root@cluster01 ~]# cat /etc/resolv.conf
nameserver 168.126.64.1
nameserver 8.8.4.4
3. Disable SELinux and stop the firewall
If the firewall (iptables/firewalld) stays in use, TCP ports 2224, 3121, and 21064 and UDP port 5405 must be opened.
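With firewalld that would look roughly like the sketch below; it relies on the 'high-availability' service definition shipped with firewalld on CentOS 7 (adding the individual ports with --add-port is equivalent).
[root@cluster01 ~]# firewall-cmd --permanent --add-service=high-availability ## covers 2224/tcp, 3121/tcp, 21064/tcp, 5405/udp and a few related ports
[root@cluster01 ~]# firewall-cmd --reload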
<selinux disable>
[root@cluster01 ~]# sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config ## change the config for a permanent setting (a reboot is required for it to take effect)
[root@cluster01 ~]# getenforce ## check the current SELinux mode
Enforcing
[root@cluster01 ~]# setenforce 0 ## disable temporarily (until the next reboot)
[root@cluster01 ~]# getenforce
Permissive
<Stop the Linux firewall>
[root@cluster01 ~]# systemctl status firewalld.service ## check the firewall status
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2019-06-24 17:29:07 KST; 17h ago
Docs: man:firewalld(1)
Main PID: 2832 (firewalld)
CGroup: /system.slice/firewalld.service
└─2832 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid
Jun 24 17:29:07 cluster01 systemd[1]: Starting firewalld - dynamic firewall daemon...
Jun 24 17:29:07 cluster01 systemd[1]: Started firewalld - dynamic firewall daemon.
[root@cluster01 ~]# systemctl stop firewalld.service ## stop the firewall
[root@cluster01 ~]# systemctl disable firewalld.service ## keep it from starting again at boot
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
PKG Install
1. Install the packages
[root@cluster01 ~]# yum install -y pacemaker corosync pcs psmisc policycoreutils-python
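On CentOS 7.6 these packages come from the standard repositories. On a subscribed RHEL 7 system the High Availability add-on repository usually has to be enabled first; assuming the standard repository ID, that would be:
[root@cluster01 ~]# subscription-manager repos --enable=rhel-ha-for-rhel-7-server-rpms ## RHEL only; requires the HA add-on entitlement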
2. Start the pcs daemon
>> Together with the pcs command-line interface, it synchronizes the configuration across all cluster nodes.
[root@cluster01 ~]# systemctl status pcsd.service ## check the status after installation
● pcsd.service - PCS GUI and remote configuration interface
Loaded: loaded (/usr/lib/systemd/system/pcsd.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:pcsd(8)
man:pcs(8)
[root@cluster01 ~]# systemctl start pcsd.service ## start the pcs daemon
[root@cluster01 ~]# systemctl status pcsd.service ## check the status after starting it
● pcsd.service - PCS GUI and remote configuration interface
Loaded: loaded (/usr/lib/systemd/system/pcsd.service; disabled; vendor preset: disabled)
Active: active (running) since Tue 2019-06-25 11:32:28 KST; 44s ago
Docs: man:pcsd(8)
man:pcs(8)
Main PID: 21819 (pcsd)
CGroup: /system.slice/pcsd.service
└─21819 /usr/bin/ruby /usr/lib/pcsd/pcsd
Jun 25 11:32:27 cluster01 systemd[1]: Starting PCS GUI and remote configuration interface...
Jun 25 11:32:28 cluster01 systemd[1]: Started PCS GUI and remote configuration interface.
[root@cluster01 ~]# systemctl enable pcsd.service ## enable it so it also starts after a reboot
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
[root@cluster01 ~]#
3. Set the password for the hacluster account
>> The hacluster account is created automatically when the packages are installed.
[root@cluster01 ~]# cat /etc/passwd | grep "hacluster"
hacluster:x:189:189:cluster user:/home/hacluster:/sbin/nologin
[root@cluster01 ~]# passwd hacluster
Changing password for user hacluster.
New password: cluster.123
Retype new password: cluster.123
passwd: all authentication tokens updated successfully.
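The same hacluster password must also be set on cluster02, because pcs cluster auth authenticates against pcsd on every node. A non-interactive sketch using the --stdin option that RHEL/CentOS passwd supports:
[root@cluster02 ~]# echo "cluster.123" | passwd --stdin hacluster ## same password on the second node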
4. Configure corosync.
>> Do this on one node only.
<User authentication>
[root@cluster01 ~]# pcs cluster auth cluster01 cluster02
Username: hacluster
Password:
cluster02: Authorized
cluster01: Authorized
** Note: if the authentication step takes a noticeably long time, check that /etc/hosts is configured correctly and that the hostnames answer ping.
5. Build the corosync configuration and synchronize it.
[root@cluster01 ~]# pcs cluster setup --name tcluster cluster01 cluster02
Destroying cluster on nodes: cluster01, cluster02...
cluster01: Stopping Cluster (pacemaker)...
cluster02: Stopping Cluster (pacemaker)...
cluster01: Successfully destroyed cluster
cluster02: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'cluster01', 'cluster02'
cluster01: successful distribution of the file 'pacemaker_remote authkey'
cluster02: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
cluster01: Succeeded
cluster02: Succeeded
Synchronizing pcsd certificates on nodes cluster01, cluster02...
cluster02: Success
cluster01: Success
Restarting pcsd on the nodes in order to reload the certificates...
cluster02: Success
cluster01: Success
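pcs writes the result to /etc/corosync/corosync.conf on every node. The content below is an illustrative sketch of what a two-node setup like this one typically gets (not captured from the test system); it can be inspected with cat:
[root@cluster01 ~]# cat /etc/corosync/corosync.conf
totem {
    version: 2
    cluster_name: tcluster
    transport: udpu
}
nodelist {
    node {
        ring0_addr: cluster01
        nodeid: 1
    }
    node {
        ring0_addr: cluster02
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}
logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}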
6. Verification
6-1. Start the cluster and check that it comes up
[root@cluster01 ~]# pcs cluster start --all
cluster01: Starting Cluster (corosync)...
cluster02: Starting Cluster (corosync)...
cluster01: Starting Cluster (pacemaker)...
cluster02: Starting Cluster (pacemaker)...
[root@cluster01 ~]#
6-2. Check cluster communication
<cluster01>
[root@cluster01 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 172.10.2.5
status = ring 0 active with no faults
<cluster02>
[root@cluster02 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
id = 172.10.2.6
status = ring 0 active with no faults
6-3. Check membership and quorum
[root@cluster01 ~]# corosync-cmapctl | egrep -i members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(172.10.2.5)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(172.10.2.6)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
<cluster01>
[root@cluster01 ~]# pcs status corosync
Membership information
----------------------
Nodeid Votes Name
1 1 cluster01 (local)
2 1 cluster02
[root@cluster01 ~]# pcs status
Cluster name: tcluster
WARNINGS:
No stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: cluster01 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 11:50:08 2019
Last change: Tue Jun 25 11:46:09 2019 by hacluster via crmd on cluster01
2 nodes configured
0 resources configured
Online: [ cluster01 cluster02 ]
No resources
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
<cluster02>
[root@cluster02 ~]# pcs status corosync
Membership information
----------------------
Nodeid Votes Name
1 1 cluster01
2 1 cluster02 (local)
[root@cluster02 ~]# pcs status
Cluster name: tcluster
WARNINGS:
No stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: cluster01 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 11:50:07 2019
Last change: Tue Jun 25 11:46:09 2019 by hacluster via crmd on cluster01
2 nodes configured
0 resources configured
Online: [ cluster01 cluster02 ]
No resources
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
7. Create an active/passive cluster
>> STONITH is enabled by default to protect data integrity, so the first validation run reports errors; after disabling STONITH and running it again, no errors are reported.
[root@cluster01 ~]# crm_verify -L -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
[root@cluster01 ~]# pcs property set stonith-enabled=false
[root@cluster01 ~]# crm_verify -L -V
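crm_verify now reports nothing, and the property can be double-checked with pcs. Note that disabling STONITH is only reasonable for a test setup like this one; a production cluster with shared data should configure a real fence device instead.
[root@cluster01 ~]# pcs property show stonith-enabled ## should now report stonith-enabled: false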
8. Create the cluster VIP
>> This makes the cluster assign the VIP to the network interface of whichever node is currently active.
[root@cluster01 ~]# pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=172.10.2.4 cidr_netmask=20 op monitor interval=30s ## create the VIP resource and assign the VIP
[root@cluster01 ~]# pcs status ## check the resource created in the cluster
Cluster name: tcluster
Stack: corosync
Current DC: cluster01 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 11:58:21 2019
Last change: Tue Jun 25 11:58:14 2019 by root via cibadmin on cluster01
2 nodes configured
1 resource configured
Online: [ cluster01 cluster02 ]
Full list of resources:
VirtualIP (ocf::heartbeat:IPaddr2): Started cluster01 ### also shows which node holds the VIP
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@cluster01 ~]# ip a | grep secondary ## verify the VIP has been configured
inet 172.10.2.4/20 brd 172.10.15.255 scope global secondary eth0
[root@cluster01 ~]#
## How to delete the VIP
[root@cluster01 ~]# pcs status
Cluster name: tcluster
Stack: corosync
Current DC: cluster01 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 12:03:30 2019
Last change: Tue Jun 25 12:03:25 2019 by root via cibadmin on cluster01
2 nodes configured
1 resource configured
Online: [ cluster01 cluster02 ]
Full list of resources:
VirtualIP (ocf::heartbeat:IPaddr2): Started cluster01
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@cluster01 ~]# pcs resource delete VirtualIP
Attempting to stop: VirtualIP... Stopped
[root@cluster01 ~]# ip a | grep secondary
[root@cluster01 ~]# pcs status
Cluster name: tcluster
Stack: corosync
Current DC: cluster01 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 12:03:55 2019
Last change: Tue Jun 25 12:03:37 2019 by root via cibadmin on cluster01
2 nodes configured
0 resources configured
Online: [ cluster01 cluster02 ]
No resources
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@cluster01 ~]#
resource "ocf:heartbeat:IPaddr2" 의 filed정보는 아래와 같다.
ocf:heartbeat:IPaddr2 ┃ ┃ ┖-> 리소스의 스크립트의 이름 ┃ ┖-> 리소스의 프로바이더 ┖-> 리소스의 standard 정보 |
How to check the resource standards:
[root@cluster01 ~]# pcs resource standards
lsb
ocf
service
systemd
How to check the resource providers:
[root@cluster01 ~]# pcs resource providers
heartbeat
openstack
pacemaker
How to check the resource agent (script) names:
[root@cluster01 ~]# pcs resource agents ocf:heartbeat
aliyun-vpc-move-ip
apache
aws-vpc-move-ip
awseip
awsvip
azure-lb
clvm
conntrackd
CTDB
db2
Delay
dhcpd
docker
Dummy
ethmonitor
exportfs
Filesystem
galera
garbd
iface-vlan
IPaddr
IPaddr2
IPsrcaddr
iSCSILogicalUnit
iSCSITarget
LVM
LVM-activate
lvmlockd
MailTo
mysql
nagios
named
nfsnotify
nfsserver
nginx
NodeUtilization
oraasm
oracle
oralsnr
pgsql
portblock
postfix
rabbitmq-cluster
redis
Route
rsyncd
SendArp
slapd
Squid
sybaseASE
symlink
tomcat
vdo-vol
VirtualDomain
Xinetd
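To see which parameters a particular agent accepts (for example the ip, cidr_netmask and monitor settings used for VirtualIP above), the agent metadata can be queried:
[root@cluster01 ~]# pcs resource describe ocf:heartbeat:IPaddr2 ## lists the agent's parameters, defaults and operations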
9. Failover test
>> Stop cluster01 to force a failover.
<cluster01>
[root@cluster01 ~]# pcs cluster stop cluster01 ## stop the cluster on this node
cluster01: Stopping Cluster (pacemaker)...
cluster01: Stopping Cluster (corosync)...
[root@cluster01 ~]# pcs status ## once the cluster is stopped, its status can no longer be checked here
Error: cluster is not currently running on this node
[root@cluster01 ~]# ip a | grep secondary ## the configured VIP also moves to the other node
[root@cluster01 ~]#
<cluster02>
[root@cluster02 ~]# pcs status
Cluster name: tcluster
Stack: corosync
Current DC: cluster02 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 12:14:51 2019
Last change: Tue Jun 25 12:04:17 2019 by root via cibadmin on cluster01
2 nodes configured
1 resource configured
Online: [ cluster02 ]
OFFLINE: [ cluster01 ]
Full list of resources:
VirtualIP (ocf::heartbeat:IPaddr2): Started cluster02
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@cluster02 ~]# ip a | grep secondary ## the VIP that was on cluster01 is now configured here
inet 172.10.2.4/20 brd 172.10.15.255 scope global secondary eth0
** Even after cluster01 is started again, the VIP keeps running on cluster02 and does not fail back automatically (see the sketch at the end of this section for controlling this).
<cluster01>
[root@cluster01 ~]# pcs cluster start cluster01
cluster01: Starting Cluster (corosync)...
cluster01: Starting Cluster (pacemaker)...
[root@cluster01 ~]# pcs status ; ip a | grep secondary
Cluster name: tcluster
Stack: corosync
Current DC: cluster02 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 12:18:38 2019
Last change: Tue Jun 25 12:04:17 2019 by root via cibadmin on cluster01
2 nodes configured
1 resource configured
Online: [ cluster01 cluster02 ]
Full list of resources:
VirtualIP (ocf::heartbeat:IPaddr2): Started cluster02
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@cluster01 ~]#
<cluster02>
[root@cluster02 ~]# pcs status
Cluster name: tcluster
Stack: corosync
Current DC: cluster02 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 12:18:16 2019
Last change: Tue Jun 25 12:04:17 2019 by root via cibadmin on cluster01
2 nodes configured
1 resource configured
Online: [ cluster01 cluster02 ]
Full list of resources:
VirtualIP (ocf::heartbeat:IPaddr2): Started cluster02
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@cluster02 ~]# ip a | grep secondary
inet 172.10.2.4/20 brd 172.10.15.255 scope global secondary eth0
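Whether the VIP fails back to cluster01 is governed by resource stickiness and location scores. A sketch with assumed example values (not from the original run) that would make the resource prefer cluster01, or move it back by hand:
[root@cluster01 ~]# pcs resource defaults resource-stickiness=100 ## keep a resource where it is after a failover
[root@cluster01 ~]# pcs constraint location VirtualIP prefers cluster01=50 ## failback happens only if this score exceeds the stickiness
[root@cluster01 ~]# pcs resource move VirtualIP cluster01 ## or move it back manually (this leaves a constraint behind)
[root@cluster01 ~]# pcs resource clear VirtualIP ## remove the constraint created by 'move'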