
[CentOS/RHEL] HA Cluster - Pacemaker (1)

louky 2019. 6. 25. 15:40

 

This post covers how to install Pacemaker on Red Hat-family systems.

 

Pacemaker is the high-availability clustering solution that ships with Red Hat distributions.

 

  • Corosync: provides the cluster infrastructure (quorum management, messaging, etc.)
  • Pacemaker: the cluster resource manager
  • pcs: a management tool that makes Corosync and Pacemaker easy to administer

 

Test environment

Item        node01        node02
hostname    cluster01     cluster02
OS          CentOS 7.6    CentOS 7.6
IP          172.10.2.5    172.10.2.6
VirtualIP   172.10.2.4 (shared by both nodes)

 

 

Preliminary setup

1. Configure host resolution (on both nodes)

[root@cluster01 ~]# echo -e "\n172.10.2.5\tcluster01
172.10.2.6\tcluster02" >> /etc/hosts
[root@cluster01 ~]#
[root@cluster01 ~]#
[root@cluster01 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6


172.10.2.5    cluster01
172.10.2.6    cluster02
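
A quick way to verify the entries on each node (the hostnames match the /etc/hosts entries above):

getent hosts cluster02        ## should resolve to the peer's IP
ping -c 2 cluster02           ## confirm the peer answers by hostname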

2. Configure the name server

[root@cluster01 ~]# cat /etc/resolv.conf
nameserver 168.126.64.1
nameserver 8.8.4.4

 

3. Disable SELinux and stop the firewall

If you keep the firewall running instead, you must open TCP ports 2224, 3121, and 21064 and UDP port 5405.
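
For reference, if you would rather leave firewalld enabled than turn it off, a sketch of opening the required ports (firewalld on CentOS 7 ships a predefined high-availability service that covers them):

firewall-cmd --permanent --add-service=high-availability    ## use the predefined HA service
firewall-cmd --reload
## or open the individual ports instead:
firewall-cmd --permanent --add-port={2224,3121,21064}/tcp --add-port=5405/udp
firewall-cmd --reload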

 

<Disable SELinux>

[root@cluster01 ~]# sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config   ## permanent change (a reboot is required for it to take effect)
[root@cluster01 ~]# getenforce        ## check the current SELinux status
Enforcing
[root@cluster01 ~]# setenforce 0      ## disable temporarily (until the next reboot)
[root@cluster01 ~]# getenforce
Permissive

 

<Stop the Linux firewall>

[root@cluster01 ~]# systemctl status firewalld.service     ## check the firewall status
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-06-24 17:29:07 KST; 17h ago
     Docs: man:firewalld(1)
Main PID: 2832 (firewalld)
   CGroup: /system.slice/firewalld.service
           └─2832 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid


Jun 24 17:29:07 cluster01 systemd[1]: Starting firewalld - dynamic firewall daemon...
Jun 24 17:29:07 cluster01 systemd[1]: Started firewalld - dynamic firewall daemon.
[root@cluster01 ~]# systemctl stop firewalld.service       ## stop the firewall
[root@cluster01 ~]# systemctl disable firewalld.service    ## keep it from starting on reboot
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.

 

PKG Install  

1. Install the packages

[root@cluster01 ~]# yum install -y pacemaker corosync pcs psmisc policycoreutils-python

 

2. Start the pcs daemon

>> Together with the pcs command-line interface, pcsd keeps the configuration synchronized across all cluster nodes.

[root@cluster01 ~]# systemctl status pcsd.service         ## check the status after installation
● pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:pcsd(8)
           man:pcs(8)
[root@cluster01 ~]# systemctl start pcsd.service          ## start the pcs daemon
[root@cluster01 ~]# systemctl status pcsd.service         ## check the status after starting
● pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2019-06-25 11:32:28 KST; 44s ago
     Docs: man:pcsd(8)
           man:pcs(8)
Main PID: 21819 (pcsd)
   CGroup: /system.slice/pcsd.service
           └─21819 /usr/bin/ruby /usr/lib/pcsd/pcsd


Jun 25 11:32:27 cluster01 systemd[1]: Starting PCS GUI and remote configuration interface...
Jun 25 11:32:28 cluster01 systemd[1]: Started PCS GUI and remote configuration interface.
[root@cluster01 ~]# systemctl enable pcsd.service       ## make sure it also starts after a reboot
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
[root@cluster01 ~]#

 

3. Set a password for the hacluster account

>> The hacluster account is created automatically when the packages are installed.

[root@cluster01 ~]# cat /etc/passwd | grep "hacluster"
hacluster:x:189:189:cluster user:/home/hacluster:/sbin/nologin
[root@cluster01 ~]# passwd hacluster
Changing password for user hacluster.
New password: cluster.123
Retype new password: cluster.123
passwd: all authentication tokens updated successfully.
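
The same password must be set on both nodes. To skip the interactive prompt, the RHEL-family passwd accepts the password on stdin:

echo "cluster.123" | passwd --stdin hacluster    ## set the hacluster password non-interactively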

 

4. Configure corosync

>> Do this on one node only.

 

<User authentication>

[root@cluster01 ~]# pcs cluster auth cluster01 cluster02
Username: hacluster
Password:
cluster02: Authorized
cluster01: Authorized

** Caution: If the authentication step takes unusually long, check that /etc/hosts is configured correctly and that each hostname responds to ping.
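
For reference, pcs 0.9 (the version shipped with CentOS 7) also accepts the credentials as options, so the prompts can be skipped:

pcs cluster auth cluster01 cluster02 -u hacluster -p cluster.123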

 

5. Create the cluster and synchronize the corosync configuration

[root@cluster01 ~]# pcs cluster setup --name tcluster cluster01 cluster02
Destroying cluster on nodes: cluster01, cluster02...
cluster01: Stopping Cluster (pacemaker)...
cluster02: Stopping Cluster (pacemaker)...
cluster01: Successfully destroyed cluster
cluster02: Successfully destroyed cluster


Sending 'pacemaker_remote authkey' to 'cluster01', 'cluster02'
cluster01: successful distribution of the file 'pacemaker_remote authkey'
cluster02: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
cluster01: Succeeded
cluster02: Succeeded


Synchronizing pcsd certificates on nodes cluster01, cluster02...
cluster02: Success
cluster01: Success
Restarting pcsd on the nodes in order to reload the certificates...
cluster02: Success
cluster01: Success
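
The setup command generates /etc/corosync/corosync.conf and distributes it to every node; it can be inspected directly if you want to see what was written:

cat /etc/corosync/corosync.conf    ## totem, nodelist, and quorum sections generated by pcs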

 

6. Verification

 6-1. Start the cluster on all nodes

[root@cluster01 ~]# pcs cluster start --all
cluster01: Starting Cluster (corosync)...
cluster02: Starting Cluster (corosync)...
cluster01: Starting Cluster (pacemaker)...
cluster02: Starting Cluster (pacemaker)...
[root@cluster01 ~]#

6-2. Check cluster communication

 

<cluster01>

[root@cluster01 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
    id    = 172.10.2.5
    status    = ring 0 active with no faults

<cluster02> 

[root@cluster02 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
    id    = 172.10.2.6
    status    = ring 0 active with no faults

6-3. Check membership and quorum

[root@cluster01 ~]# corosync-cmapctl | egrep -i members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(172.10.2.5)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(172.10.2.6)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
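
corosync-quorumtool gives a more readable quorum summary than the raw cmap keys, if you prefer it:

corosync-quorumtool -s    ## prints quorum state, total votes, and membership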

 

<cluster01>

[root@cluster01 ~]# pcs status corosync


Membership information
----------------------
    Nodeid      Votes Name
         1          1 cluster01 (local)
         2          1 cluster02

[root@cluster01 ~]# pcs status
Cluster name: tcluster


WARNINGS:
No stonith devices and stonith-enabled is not false


Stack: corosync
Current DC: cluster01 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 11:50:08 2019
Last change: Tue Jun 25 11:46:09 2019 by hacluster via crmd on cluster01


2 nodes configured
0 resources configured


Online: [ cluster01 cluster02 ]


No resources




Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

 

<cluster02>

[root@cluster02 ~]# pcs status corosync


Membership information
----------------------
    Nodeid      Votes Name
         1          1 cluster01
         2          1 cluster02 (local)

[root@cluster02 ~]# pcs status
Cluster name: tcluster


WARNINGS:
No stonith devices and stonith-enabled is not false


Stack: corosync
Current DC: cluster01 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 11:50:07 2019
Last change: Tue Jun 25 11:46:09 2019 by hacluster via crmd on cluster01


2 nodes configured
0 resources configured


Online: [ cluster01 cluster02 ]


No resources




Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
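
Note that Daemon Status reports corosync and pacemaker as active/disabled: they are running now but will not start automatically after a reboot. If auto-start is wanted, pcs can enable both on every node:

pcs cluster enable --all    ## enable corosync and pacemaker at boot on all nodes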

 

7. Create an active/passive cluster

>> STONITH is enabled by default to protect data integrity, so the first validation run reports errors; after disabling STONITH and running the check again, no errors occur. (As the error message below notes, clusters with shared data need STONITH, so disable it only in a test setup like this one.)

[root@cluster01 ~]# crm_verify -L -V
   error: unpack_resources:    Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:    Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:    NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
[root@cluster01 ~]# pcs property set stonith-enabled=false
[root@cluster01 ~]# crm_verify -L -V
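
To confirm the property was stored:

pcs property list    ## the output should now include: stonith-enabled: false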

 

8. Create the cluster VIP

>> This tells the cluster to configure the VIP on the network interface of whichever node is currently active.

[root@cluster01 ~]# pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=172.10.2.4 cidr_netmask=20 op monitor interval=30s     ## create the VIP resource and assign the VIP
[root@cluster01 ~]# pcs status              ## check the resource created in the cluster
Cluster name: tcluster
Stack: corosync
Current DC: cluster01 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 11:58:21 2019
Last change: Tue Jun 25 11:58:14 2019 by root via cibadmin on cluster01


2 nodes configured
1 resource configured


Online: [ cluster01 cluster02 ]


Full list of resources:


VirtualIP    (ocf::heartbeat:IPaddr2):    Started cluster01        ### also shows which node holds the VIP


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@cluster01 ~]# ip a | grep secondary            ## confirm the VIP is configured
    inet 172.10.2.4/20 brd 172.10.15.255 scope global secondary eth0
[root@cluster01 ~]#
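
To review the resource's settings later (pcs 0.9 syntax):

pcs resource show VirtualIP    ## prints the agent, the ip/cidr_netmask attributes, and the monitor operation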

 

## How to delete the VIP

[root@cluster01 ~]# pcs status
Cluster name: tcluster
Stack: corosync
Current DC: cluster01 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 12:03:30 2019
Last change: Tue Jun 25 12:03:25 2019 by root via cibadmin on cluster01


2 nodes configured
1 resource configured


Online: [ cluster01 cluster02 ]


Full list of resources:


VirtualIP    (ocf::heartbeat:IPaddr2):    Started cluster01


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@cluster01 ~]# pcs resource delete VirtualIP
Attempting to stop: VirtualIP... Stopped

[root@cluster01 ~]# ip a | grep secondary
[root@cluster01 ~]# pcs status
Cluster name: tcluster
Stack: corosync
Current DC: cluster01 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 12:03:55 2019
Last change: Tue Jun 25 12:03:37 2019 by root via cibadmin on cluster01


2 nodes configured
0 resources configured


Online: [ cluster01 cluster02 ]


No resources




Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@cluster01 ~]#

 

 

resource  "ocf:heartbeat:IPaddr2" 의 filed정보는 아래와 같다. 

ocf:heartbeat:IPaddr2

                   -> 리소스의 스크립트의 이름 

           ->  리소스의 프로바이더 

  ┖->  리소스의 standard 정보 

 

 

How to check the resource standards

[root@cluster01 ~]# pcs resource standards
lsb
ocf
service
systemd

 

How to check the resource providers

[root@cluster01 ~]# pcs resource providers
heartbeat
openstack
pacemaker

 

How to check the resource script (agent) names

[root@cluster01 ~]# pcs resource agents ocf:heartbeat
aliyun-vpc-move-ip
apache
aws-vpc-move-ip
awseip
awsvip
azure-lb
clvm
conntrackd
CTDB
db2
Delay
dhcpd
docker
Dummy
ethmonitor
exportfs
Filesystem
galera
garbd
iface-vlan
IPaddr
IPaddr2
IPsrcaddr
iSCSILogicalUnit
iSCSITarget
LVM
LVM-activate
lvmlockd
MailTo
mysql
nagios
named
nfsnotify
nfsserver
nginx
NodeUtilization
oraasm
oracle
oralsnr
pgsql
portblock
postfix
rabbitmq-cluster
redis
Route
rsyncd
SendArp
slapd
Squid
sybaseASE
symlink
tomcat
vdo-vol
VirtualDomain
Xinetd

 

 

 

9. Failover test

>> Stop cluster01 so that a failover occurs.

 

<cluster01>

[root@cluster01 ~]# pcs cluster stop cluster01    ## stop the cluster on this node
cluster01: Stopping Cluster (pacemaker)...
cluster01: Stopping Cluster (corosync)...
[root@cluster01 ~]# pcs status                    ## once the cluster is stopped, its status can no longer be checked from this node
Error: cluster is not currently running on this node
[root@cluster01 ~]# ip a | grep secondary         ## the configured VIP has moved to the other node
[root@cluster01 ~]#

 

<cluster02>

[root@cluster02 ~]# pcs status
Cluster name: tcluster
Stack: corosync
Current DC: cluster02 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 12:14:51 2019
Last change: Tue Jun 25 12:04:17 2019 by root via cibadmin on cluster01


2 nodes configured
1 resource configured


Online: [ cluster02 ]
OFFLINE: [ cluster01 ]


Full list of resources:


VirtualIP    (ocf::heartbeat:IPaddr2):    Started cluster02


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@cluster02 ~]# ip a | grep secondary            ## the VIP that was on cluster01 is now configured here
    inet 172.10.2.4/20 brd 172.10.15.255 scope global secondary eth0

 

** Even if cluster01 is started again, the VIP keeps running on cluster02 and does not fail back automatically (a way to move it back by hand is sketched after the output below).

<cluster01>

[root@cluster01 ~]# pcs cluster start cluster01
cluster01: Starting Cluster (corosync)...
cluster01: Starting Cluster (pacemaker)...
[root@cluster01 ~]# pcs status ; ip a | grep secondary
Cluster name: tcluster
Stack: corosync
Current DC: cluster02 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 12:18:38 2019
Last change: Tue Jun 25 12:04:17 2019 by root via cibadmin on cluster01


2 nodes configured
1 resource configured


Online: [ cluster01 cluster02 ]


Full list of resources:


VirtualIP    (ocf::heartbeat:IPaddr2):    Started cluster02


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@cluster01 ~]#

 

<cluster02>

[root@cluster02 ~]# pcs status
Cluster name: tcluster
Stack: corosync
Current DC: cluster02 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Tue Jun 25 12:18:16 2019
Last change: Tue Jun 25 12:04:17 2019 by root via cibadmin on cluster01


2 nodes configured
1 resource configured


Online: [ cluster01 cluster02 ]


Full list of resources:


VirtualIP    (ocf::heartbeat:IPaddr2):    Started cluster02


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@cluster02 ~]# ip a | grep secondary
    inet 172.10.2.4/20 brd 172.10.15.255 scope global secondary eth0
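
If the VIP is wanted back on cluster01, pcs can move it explicitly. "pcs resource move" works by adding a location constraint, so clear that constraint once the move is done; a sketch (pcs 0.9 syntax):

pcs resource move VirtualIP cluster01    ## moves the VIP by creating a location constraint
pcs resource clear VirtualIP             ## removes that constraint so placement is free again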

 
