Remove a faulty node and add a new node on Pacemaker cluster

Cluster nodes sometimes drop out of a cluster because of a disk failure, an operating system crash, or some other fault. This demonstration shows how to remove a failed node from a Pacemaker cluster and add a new node in its place.


Topic

  • How to remove a faulty node from a Pacemaker cluster and add a new node?
  • How to add a new node to a Pacemaker cluster?
  • How to remove a node from a Pacemaker cluster?



Solution


In this demonstration, we have an existing two-node cluster. node2 is not responding, so we will remove node2 from the cluster and then add a new node to the Pacemaker cluster.

Remove the node from the Pacemaker cluster [1]
Execute the following command on the surviving node (node1) to remove node2 from the Pacemaker cluster.
[root@node1 ~]# pcs cluster node remove node2.example.local
node2.example.local: Stopping Cluster (pacemaker)...
node2.example.local: Successfully destroyed cluster
node1.example.local: Corosync updated

Verify the Pacemaker cluster status [2]
[root@node1 ~]# pcs status 
Cluster name: pgsqlcluster
Stack: corosync
Current DC: node1.example.local (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Sun Dec 29 09:45:39 2019      Last change: Sun Dec 29 09:44:24 2019 by root via crm_node on node1.example.local

1 node and 4 resources configured

Online: [ node1.example.local ]

Full list of resources:

 Resource Group: postgres-group
     postgres-lvm-res   (ocf::heartbeat:LVM):   Started node1.example.local
     postgres-fs-res    (ocf::heartbeat:Filesystem):    Started node1.example.local
     POSTGRES-VIP   (ocf::heartbeat:IPaddr2):   Started node1.example.local
     postgresql_service (ocf::heartbeat:pgsql): Started node1.example.local

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
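
The check above can also be scripted. The following is a minimal sketch, assuming the `Online: [ ... ]` and `OFFLINE: [ ... ]` line layout shown in the output above; the function names `listed_nodes` and `node_is_listed` are hypothetical helpers, not part of pcs.

```shell
#!/bin/sh
# Print every node named on the "Online:" and "OFFLINE:" lines of
# `pcs status` output read from stdin, one node per line.
# Assumes the line layout shown in this article: "Online: [ a b ]".
listed_nodes() {
    awk '/^(Online|OFFLINE): \[/ {
        # Fields 1 and 2 are the label and "[", the last field is "]".
        for (i = 3; i < NF; i++) print $i
    }'
}

# Succeed (exit 0) only if node $1 appears in the status text on stdin.
node_is_listed() {
    listed_nodes | grep -qxF "$1"
}

# Example on a live cluster (requires pcs):
#   if pcs status | node_is_listed node2.example.local; then
#       echo "node2 is still part of the cluster"
#   fi
```

After a successful removal, `node_is_listed node2.example.local` is expected to fail, because node2 no longer appears in either list.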

Add a new node to the Pacemaker cluster [3]
  • On the new node, set the password for the hacluster user.
[root@node2 ~]# echo "centos" | passwd hacluster --stdin
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.

  • Start and enable the pcsd service on the new node.
# systemctl start pcsd.service; systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.

Authorize the new node in the Pacemaker cluster [4]
  • Execute the following command on the active cluster node to authorize the new cluster node.
# pcs cluster auth node2.example.local -u hacluster -p centos
node2.example.local: Authorized
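
The `pcs cluster auth` form used above belongs to the pcs 0.9 series shipped with EL7, which matches the package versions in this demonstration. In pcs 0.10 and later the equivalent subcommand is `pcs host auth`. The sketch below picks the subcommand from a version string; `auth_subcommand` is a hypothetical helper name, and it assumes the version has the form `MAJOR.MINOR.PATCH` as printed by `pcs --version`.

```shell
#!/bin/sh
# Print the pcs authentication subcommand for a given pcs version.
# $1 = version string such as "0.9.168" or "0.10.4" (assumed format).
auth_subcommand() {
    major=${1%%.*}        # text before the first dot
    rest=${1#*.}          # text after the first dot
    minor=${rest%%.*}     # second version component
    if [ "$major" -eq 0 ] && [ "$minor" -lt 10 ]; then
        echo "cluster auth"   # pcs 0.9.x
    else
        echo "host auth"      # pcs 0.10 and later
    fi
}

# Example on a live node (requires pcs); omitting -p makes pcs prompt
# for the password instead of exposing it on the command line:
#   pcs $(auth_subcommand "$(pcs --version)") node2.example.local -u hacluster
```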

  • Execute the following command on the active cluster node to add the new node to the existing cluster. This command syncs the cluster configuration file from the existing node to the new cluster node.
[root@node1 ~]# pcs cluster node add node2.example.local
Disabling SBD service...
node2.example.local: sbd disabled
node1.example.local: Corosync updated
Setting up corosync...
node2.example.local: Succeeded
Synchronizing pcsd certificates on nodes node2.example.local...
node2.example.local: Success

Restarting pcsd on the nodes in order to reload the certificates...
node2.example.local: Success

Start the cluster service manually on the new node (node2) [5]
[root@node2 ~]# pcs cluster start

[root@node2 ~]# pcs status
Cluster name: pgsqlcluster
Stack: corosync
Current DC: node1.example.local (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Sun Dec 29 11:36:07 2019
Last change: Sun Dec 29 11:32:43 2019 by hacluster via crmd on node1.example.local

2 nodes configured
4 resources configured

Online: [ node1.example.local node2.example.local ]

Full list of resources:

 Resource Group: postgres-group
     postgres-lvm-res   (ocf::heartbeat:LVM):   Started node1.example.local
     postgres-fs-res    (ocf::heartbeat:Filesystem):    Started node1.example.local
     POSTGRES-VIP   (ocf::heartbeat:IPaddr2):   Started node1.example.local
     postgresql_service (ocf::heartbeat:pgsql): Started node1.example.local

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
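
The Daemon Status block above reports corosync and pacemaker as `active/disabled`, meaning they are running but will not start at boot, so a rebooted node stays offline until `pcs cluster start` is run on it again. The sketch below lists the daemons in that state; `disabled_daemons` is a hypothetical helper that assumes the `name: state/enablement` layout shown above.

```shell
#!/bin/sh
# Print the daemons that `pcs status` (read from stdin) reports as
# disabled at boot, one name per line.
disabled_daemons() {
    awk '
        /^Daemon Status:/ { in_block = 1; next }
        in_block && $2 ~ /\/disabled$/ {
            sub(/:$/, "", $1)   # strip the trailing colon from the name
            print $1
        }'
}

# Example on a live node (requires pcs):
#   pcs status | disabled_daemons
# To start the cluster services at boot on every node:
#   pcs cluster enable --all
```

Whether to enable the services at boot is a design choice: some administrators leave them disabled so that a node that crashed is inspected before it rejoins the cluster.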

Validate the Pacemaker cluster service
  • Reboot the active cluster node and check whether the cluster resources move to the newly added node.
[root@node1 ~]# reboot

[root@node2 ~]# pcs status 
Cluster name: pgsqlcluster
Stack: corosync
Current DC: node2.example.local (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Sun Dec 29 11:41:44 2019
Last change: Sun Dec 29 11:41:00 2019 by hacluster via crmd on node2.example.local

2 nodes configured
4 resources configured

Online: [ node2.example.local ]
OFFLINE: [ node1.example.local ]

Full list of resources:

 Resource Group: postgres-group
     postgres-lvm-res   (ocf::heartbeat:LVM):   Started node2.example.local
     postgres-fs-res    (ocf::heartbeat:Filesystem):    Started node2.example.local
     POSTGRES-VIP   (ocf::heartbeat:IPaddr2):   Started node2.example.local
     postgresql_service (ocf::heartbeat:pgsql): Started node2.example.local

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
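
The failover shown above can also be confirmed by a script rather than by eye. This is a minimal sketch, assuming the resource line layout shown in this article (`name (ocf::...): Started node`); `resources_not_on` is a hypothetical helper name.

```shell
#!/bin/sh
# Read `pcs status` output from stdin and print the name of every OCF
# resource that is NOT reported as "Started" on the node given in $1.
# Empty output means all resources run on that node.
resources_not_on() {
    awk -v node="$1" '
        /\(ocf::/ {
            # The last two fields are expected to be "Started <node>".
            if ($(NF-1) != "Started" || $NF != node) print $1
        }'
}

# Example on a live node (requires pcs):
#   if [ -z "$(pcs status | resources_not_on node2.example.local)" ]; then
#       echo "all resources are running on node2"
#   fi
```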

  • All resources are now running on node2, which confirms that the cluster service fails over to the newly added node.


About the Author: Andrew Joseph
