
[CentOS/RHEL] HA Cluster - Pacemaker (1)

louky 2019. 6. 25. 15:40

 

This post covers how to install Pacemaker on Red Hat-family systems.

 

Pacemaker is the high-availability clustering solution that ships with Red Hat distributions.

 

  • Corosync: provides the cluster infrastructure (quorum management, messaging, etc.)
  • Pacemaker: the cluster resource manager
  • pcs: a management tool that makes Corosync and Pacemaker easy to administer

 

Test environment

Item        node01        node02
hostname    cluster01     cluster02
OS          CentOS 7.6    CentOS 7.6
IP          172.10.2.5    172.10.2.6
VirtualIP   172.10.2.4 (shared by both nodes)

 

 

Preliminary setup

1. Configure host resolution (on both nodes)

[root@cluster01 ~]# echo -e "\n172.10.2.5\tcluster01
172.10.2.6\tcluster02" >> /etc/hosts
[root@cluster01 ~]#
[root@cluster01 ~]#
[root@cluster01 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6


172.10.2.5    cluster01
172.10.2.6    cluster02
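
A quick way to verify the entries on each node (the hostnames match the /etc/hosts entries above):

getent hosts cluster02        ## should resolve to the peer's IP
ping -c 2 cluster02           ## confirm the peer answers by hostname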

2. Configure the name server

[root@cluster01 ~]# cat /etc/resolv.conf
nameserver 168.126.64.1
nameserver 8.8.4.4

 

3. Disable SELinux and stop the firewall

If you keep the firewall running instead, you must open TCP ports 2224, 3121, and 21064 and UDP port 5405.
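
For reference, if you would rather leave firewalld enabled than turn it off, a sketch of opening the required ports (firewalld on CentOS 7 ships a predefined high-availability service that covers them):

firewall-cmd --permanent --add-service=high-availability    ## use the predefined HA service
firewall-cmd --reload
## or open the individual ports instead:
firewall-cmd --permanent --add-port={2224,3121,21064}/tcp --add-port=5405/udp
firewall-cmd --reload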

 

<Disable SELinux>

[root@cluster01 ~]# sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config   ## permanent change (a reboot is required for it to take effect)
[root@cluster01 ~]# getenforce        ## check the current SELinux status
Enforcing
[root@cluster01 ~]# setenforce 0      ## disable temporarily (until the next reboot)
[root@cluster01 ~]# getenforce
Permissive

 

<Stop the Linux firewall>

[root@cluster01 ~]# systemctl status firewalld.service     ## check the firewall status
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-06-24 17:29:07 KST; 17h ago
     Docs: man:firewalld(1)
Main PID: 2832 (firewalld)
   CGroup: /system.slice/firewalld.service
           └─2832 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid


Jun 24 17:29:07 cluster01 systemd[1]: Starting firewalld - dynamic firewall daemon...
Jun 24 17:29:07 cluster01 systemd[1]: Started firewalld - dynamic firewall daemon.
[root@cluster01 ~]# systemctl stop firewalld.service       ## stop the firewall
[root@cluster01 ~]# systemctl disable firewalld.service    ## keep it from starting on reboot
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.

 

PKG Install  

1. Install the packages

[root@cluster01 ~]# yum install -y pacemaker corosync pcs psmisc policycoreutils-python

 

2. Start the pcs daemon

>> Together with the pcs command-line interface, pcsd keeps the configuration synchronized across all cluster nodes.

[root@cluster01 ~]# systemctl status pcsd.service         ## check the status after installation
● pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:pcsd(8)
           man:pcs(8)
[root@cluster01 ~]# systemctl start pcsd.service          ## start the pcs daemon
[root@cluster01 ~]# systemctl status pcsd.service         ## check the status after starting
● pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2019-06-25 11:32:28 KST; 44s ago
     Docs: man:pcsd(8)
           man:pcs(8)
Main PID: 21819 (pcsd)
   CGroup: /system.slice/pcsd.service
           └─21819 /usr/bin/ruby /usr/lib/pcsd/pcsd


Jun 25 11:32:27 cluster01 systemd[1]: Starting PCS GUI and remote configuration interface...
Jun 25 11:32:28 cluster01 systemd[1]: Started PCS GUI and remote configuration interface.
[root@cluster01 ~]# systemctl enable pcsd.service       ## make sure it also starts after a reboot
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
[root@cluster01 ~]#

 

3. Set a password for the hacluster account

>> The hacluster account is created automatically when the packages are installed.

[root@cluster01 ~]# cat /etc/passwd | grep "hacluster"
hacluster:x:189:189:cluster user:/home/hacluster:/sbin/nologin
[root@cluster01 ~]# passwd hacluster
Changing password for user hacluster.
New password: cluster.123
Retype new password: cluster.123
passwd: all authentication tokens updated successfully.
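
The same password must be set on both nodes. To skip the interactive prompt, the RHEL-family passwd accepts the password on stdin:

echo "cluster.123" | passwd --stdin hacluster    ## set the hacluster password non-interactively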

 

4. Configure corosync

>> Do this on one node only.

 

<User authentication>

[root@cluster01 ~]# pcs cluster auth cluster01 cluster02
Username: hacluster
Password:
cluster02: Authorized
cluster01: Authorized

** Caution: If the authentication step takes unusually long, check that /etc/hosts is configured correctly and that each hostname responds to ping.
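
For reference, pcs 0.9 (the version shipped with CentOS 7) also accepts the credentials as options, so the prompts can be skipped:

pcs cluster auth cluster01 cluster02 -u hacluster -p cluster.123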

 

5. Create the cluster and synchronize the corosync configuration

[root@cluster01 ~]# pcs cluster setup --name tcluster cluster01 cluster02
Destroying cluster on nodes: cluster01, cluster02...
cluster01: Stopping Cluster (pacemaker)...
cluster02: Stopping Cluster (pacemaker)...
cluster01: Successfully destroyed cluster
cluster02: Successfully destroyed cluster


Sending 'pacemaker_remote authkey' to 'cluster01', 'cluster02'
cluster01: successful distribution of the file 'pacemaker_remote authkey'
cluster02: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
cluster01: Succeeded
cluster02: Succeeded


Synchronizing pcsd certificates on nodes cluster01, cluster02...
cluster02: Success
cluster01: Success
Restarting pcsd on the nodes in order to reload the certificates...
cluster02: Success
cluster01: Success
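
The setup command generates /etc/corosync/corosync.conf and distributes it to every node; it can be inspected directly if you want to see what was written:

cat /etc/corosync/corosync.conf    ## totem, nodelist, and quorum sections generated by pcs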

 

6. Verification

 6-1. Start the cluster on all nodes

[root@cluster01 ~]# pcs cluster start --all
cluster01: Starting Cluster (corosync)...
cluster02: Starting Cluster (corosync)...
cluster01: Starting Cluster (pacemaker)...
cluster02: Starting Cluster (pacemaker)...
[root@cluster01 ~]#

6-2. Check cluster communication

 

<cluster01>

[root@cluster01 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
    id    = 172.10.2.5
    status    = ring 0 active with no faults

<cluster02> 

[root@cluster02 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
    id    = 172.10.2.6
    status    = ring 0 active with no faults

6-3. Check membership and quorum

[root@cluster01 ~]# corosync-cmapctl | egrep -i members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(172.10.2.5)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(172.10.2.6)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
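
corosync-quorumtool gives a more readable quorum summary than the raw cmap keys, if you prefer it:

corosync-quorumtool -s    ## prints quorum state, total votes, and membership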

 

<cluster01>

[root@cluster01 ~]# pcs status corosync


Membership information
----------------------
    Nodeid      Votes Name
         1          1 cluster01 (local)
         2          1 cluster02

[root@cluster01 ~]# pcs status
Cluster name: tcluster


WARNINGS:
No stonith devices and stonith-enabled is not false


Stack: corosync
Current DC: cluster01 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 11:50:08 2019
Last change: Tue Jun 25 11:46:09 2019 by hacluster via crmd on cluster01


2 nodes configured
0 resources configured


Online: [ cluster01 cluster02 ]


No resources




Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

 

<cluster02>

[root@cluster02 ~]# pcs status corosync


Membership information
----------------------
    Nodeid      Votes Name
         1          1 cluster01
         2          1 cluster02 (local)

[root@cluster02 ~]# pcs status
Cluster name: tcluster


WARNINGS:
No stonith devices and stonith-enabled is not false


Stack: corosync
Current DC: cluster01 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 11:50:07 2019
Last change: Tue Jun 25 11:46:09 2019 by hacluster via crmd on cluster01


2 nodes configured
0 resources configured


Online: [ cluster01 cluster02 ]


No resources




Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
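
Note that Daemon Status reports corosync and pacemaker as active/disabled: they are running now but will not start automatically after a reboot. If auto-start is wanted, pcs can enable both on every node:

pcs cluster enable --all    ## enable corosync and pacemaker at boot on all nodes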

 

7. Create an active/passive cluster

>> STONITH is enabled by default to protect data integrity, so the first validation run reports errors; after disabling STONITH and running the check again, no errors occur. (As the error message below notes, clusters with shared data need STONITH, so disable it only in a test setup like this one.)

[root@cluster01 ~]# crm_verify -L -V
   error: unpack_resources:    Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:    Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:    NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
[root@cluster01 ~]# pcs property set stonith-enabled=false
[root@cluster01 ~]# crm_verify -L -V
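
To confirm the property was stored:

pcs property list    ## the output should now include: stonith-enabled: false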

 

8. Create the cluster VIP

>> This tells the cluster to configure the VIP on the network interface of whichever node is currently active.

[root@cluster01 ~]# pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=172.10.2.4 cidr_netmask=20 op monitor interval=30s     ## create the VIP resource and assign the VIP
[root@cluster01 ~]# pcs status              ## check the resource created in the cluster
Cluster name: tcluster
Stack: corosync
Current DC: cluster01 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 11:58:21 2019
Last change: Tue Jun 25 11:58:14 2019 by root via cibadmin on cluster01


2 nodes configured
1 resource configured


Online: [ cluster01 cluster02 ]


Full list of resources:


VirtualIP    (ocf::heartbeat:IPaddr2):    Started cluster01        ### also shows which node holds the VIP


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@cluster01 ~]# ip a | grep secondary            ## confirm the VIP is configured
    inet 172.10.2.4/20 brd 172.10.15.255 scope global secondary eth0
[root@cluster01 ~]#
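
To review the resource's settings later (pcs 0.9 syntax):

pcs resource show VirtualIP    ## prints the agent, the ip/cidr_netmask attributes, and the monitor operation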

 

## How to delete the VIP

[root@cluster01 ~]# pcs status
Cluster name: tcluster
Stack: corosync
Current DC: cluster01 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 12:03:30 2019
Last change: Tue Jun 25 12:03:25 2019 by root via cibadmin on cluster01


2 nodes configured
1 resource configured


Online: [ cluster01 cluster02 ]


Full list of resources:


VirtualIP    (ocf::heartbeat:IPaddr2):    Started cluster01


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@cluster01 ~]# pcs resource delete VirtualIP
Attempting to stop: VirtualIP... Stopped

[root@cluster01 ~]# ip a | grep secondary
[root@cluster01 ~]# pcs status
Cluster name: tcluster
Stack: corosync
Current DC: cluster01 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 12:03:55 2019
Last change: Tue Jun 25 12:03:37 2019 by root via cibadmin on cluster01


2 nodes configured
0 resources configured


Online: [ cluster01 cluster02 ]


No resources




Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@cluster01 ~]#

 

 

resource  "ocf:heartbeat:IPaddr2" 의 filed정보는 아래와 같다. 

ocf:heartbeat:IPaddr2

                   -> 리소스의 스크립트의 이름 

           ->  리소스의 프로바이더 

  ┖->  리소스의 standard 정보 

 

 

How to check the resource standards

[root@cluster01 ~]# pcs resource standards
lsb
ocf
service
systemd

 

How to check the resource providers

[root@cluster01 ~]# pcs resource providers
heartbeat
openstack
pacemaker

 

How to check the resource script (agent) names

[root@cluster01 ~]# pcs resource agents ocf:heartbeat
aliyun-vpc-move-ip
apache
aws-vpc-move-ip
awseip
awsvip
azure-lb
clvm
conntrackd
CTDB
db2
Delay
dhcpd
docker
Dummy
ethmonitor
exportfs
Filesystem
galera
garbd
iface-vlan
IPaddr
IPaddr2
IPsrcaddr
iSCSILogicalUnit
iSCSITarget
LVM
LVM-activate
lvmlockd
MailTo
mysql
nagios
named
nfsnotify
nfsserver
nginx
NodeUtilization
oraasm
oracle
oralsnr
pgsql
portblock
postfix
rabbitmq-cluster
redis
Route
rsyncd
SendArp
slapd
Squid
sybaseASE
symlink
tomcat
vdo-vol
VirtualDomain
Xinetd

 

 

 

9. Failover test

>> Stop cluster01 so that a failover occurs.

 

<cluster01>

[root@cluster01 ~]# pcs cluster stop cluster01    ## stop the cluster on this node
cluster01: Stopping Cluster (pacemaker)...
cluster01: Stopping Cluster (corosync)...
[root@cluster01 ~]# pcs status                    ## once the cluster is stopped, its status can no longer be checked from this node
Error: cluster is not currently running on this node
[root@cluster01 ~]# ip a | grep secondary         ## the configured VIP has moved to the other node
[root@cluster01 ~]#

 

<cluster02>

[root@cluster02 ~]# pcs status
Cluster name: tcluster
Stack: corosync
Current DC: cluster02 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 12:14:51 2019
Last change: Tue Jun 25 12:04:17 2019 by root via cibadmin on cluster01


2 nodes configured
1 resource configured


Online: [ cluster02 ]
OFFLINE: [ cluster01 ]


Full list of resources:


VirtualIP    (ocf::heartbeat:IPaddr2):    Started cluster02


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@cluster02 ~]# ip a | grep secondary            ## the VIP that was on cluster01 is now configured here
    inet 172.10.2.4/20 brd 172.10.15.255 scope global secondary eth0

 

** Even if cluster01 is started again, the VIP keeps running on cluster02 and does not fail back automatically (a way to move it back by hand is sketched after the output below).

<cluster01>

[root@cluster01 ~]# pcs cluster start cluster01
cluster01: Starting Cluster (corosync)...
cluster01: Starting Cluster (pacemaker)...
[root@cluster01 ~]# pcs status ; ip a | grep secondary
Cluster name: tcluster
Stack: corosync
Current DC: cluster02 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 12:18:38 2019
Last change: Tue Jun 25 12:04:17 2019 by root via cibadmin on cluster01


2 nodes configured
1 resource configured


Online: [ cluster01 cluster02 ]


Full list of resources:


VirtualIP    (ocf::heartbeat:IPaddr2):    Started cluster02


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@cluster01 ~]#

 

<cluster02>

[root@cluster02 ~]# pcs status
Cluster name: tcluster
Stack: corosync
Current DC: cluster02 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 12:18:16 2019
Last change: Tue Jun 25 12:04:17 2019 by root via cibadmin on cluster01


2 nodes configured
1 resource configured


Online: [ cluster01 cluster02 ]


Full list of resources:


VirtualIP    (ocf::heartbeat:IPaddr2):    Started cluster02


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@cluster02 ~]# ip a | grep secondary
    inet 172.10.2.4/20 brd 172.10.15.255 scope global secondary eth0
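
If the VIP is wanted back on cluster01, pcs can move it explicitly. "pcs resource move" works by adding a location constraint, so clear that constraint once the move is done; a sketch (pcs 0.9 syntax):

pcs resource move VirtualIP cluster01    ## moves the VIP by creating a location constraint
pcs resource clear VirtualIP             ## removes that constraint so placement is free again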

 
