
Cluster nodes can drop out of a cluster because of disk failure, an operating system crash, or some other fault. In this demonstration we will show how to remove a failed node from a Pacemaker cluster and add a new node in its place.
Topic
- How to remove a faulty node and add a new node to a Pacemaker cluster?
- How to add a new node to a Pacemaker cluster?
- How to remove a node from a Pacemaker cluster?
Solution
In this demonstration, we have an existing two-node cluster. node2 is not responding, so we will remove node2 from the cluster and add a new node to the Pacemaker cluster.
Remove a node from the Pacemaker cluster [1]
Execute the following command to remove node2 from the Pacemaker cluster.
[root@node1 ~]# pcs cluster node remove node2.example.local
node2.example.local: Stopping Cluster (pacemaker)...
node2.example.local: Successfully destroyed cluster
node1.example.local: Corosync updated
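Before moving on, it is worth confirming that the removed node is really gone from the membership. The commands below are a suggested check, not part of the original procedure; both are standard pcs/corosync tooling on CentOS/RHEL 7.

```shell
# Confirm node2 no longer appears in the cluster membership.
pcs status nodes

# The corosync nodelist should now contain only node1.
grep -A 3 nodelist /etc/corosync/corosync.conf
```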
Verify the Pacemaker cluster status [2]
[root@node1 ~]# pcs status
Cluster name: pgsqlcluster
Stack: corosync
Current DC: node1.example.local (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Sun Dec 29 09:45:39 2019
Last change: Sun Dec 29 09:44:24 2019 by root via crm_node on node1.example.local
1 node and 4 resources configured
Online: [ node1.example.local ]
Full list of resources:
Resource Group: postgres-group
postgres-lvm-res (ocf::heartbeat:LVM): Started node1.example.local
postgres-fs-res (ocf::heartbeat:Filesystem): Started node1.example.local
POSTGRES-VIP (ocf::heartbeat:IPaddr2): Started node1.example.local
postgresql_service (ocf::heartbeat:pgsql): Started node1.example.local
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Add a new node to the Pacemaker cluster [3]
- We removed node2 from the cluster configuration. Now build a new node with the appropriate settings as described in the following articles.
- Make sure to set the same password for the hacluster user.
[root@node2 ~]# echo "centos" | passwd hacluster --stdin
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
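One prerequisite the walkthrough assumes implicitly is that both nodes can resolve each other's hostnames. A minimal sketch, with hypothetical placeholder IP addresses (substitute your own), is to add static entries on each node:

```shell
# Hypothetical addresses for illustration only; use the real cluster IPs.
cat >> /etc/hosts <<'EOF'
192.168.1.11 node1.example.local node1
192.168.1.12 node2.example.local node2
EOF
```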
- Start and enable the pcsd service.
# systemctl start pcsd.service; systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
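If firewalld is running on the new node, the cluster ports must also be opened; otherwise the authorization and join steps below can fail. On CentOS/RHEL 7, firewalld ships a predefined "high-availability" service covering the Pacemaker, Corosync, and pcsd ports:

```shell
# Open the Pacemaker/Corosync/pcsd ports using the predefined service.
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload
```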
Authorize the new node to the Pacemaker cluster [4]
- Execute the following command on the active cluster node to authorize the new cluster node.
# pcs cluster auth node2.example.local -u hacluster -p centos
node2.example.local: Authorized
- Execute the following command on the active cluster node to add the new node to the existing cluster. This command syncs the cluster configuration file from the existing node to the new cluster node.
[root@node1 ~]# pcs cluster node add node2.example.local
Disabling SBD service...
node2.example.local: sbd disabled
node1.example.local: Corosync updated
Setting up corosync...
node2.example.local: Succeeded
Synchronizing pcsd certificates on nodes node2.example.local...
node2.example.local: Success
Restarting pcsd on the nodes in order to reload the certificates...
node2.example.local: Success
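Note that the "Daemon Status" output in this walkthrough shows corosync and pacemaker as active/disabled, meaning the cluster stack will not start automatically after a reboot. If you prefer automatic startup (a deliberate design choice, since some administrators want a rebooted node to stay out of the cluster until inspected), you can enable it on all nodes:

```shell
# Optional: start the cluster stack automatically at boot on every node.
pcs cluster enable --all
```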
Start the cluster service manually on the new node (node2) [5]
[root@node2 ~]# pcs cluster start
[root@node2 ~]# pcs status
Cluster name: pgsqlcluster
Stack: corosync
Current DC: node1.example.local (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Sun Dec 29 11:36:07 2019
Last change: Sun Dec 29 11:32:43 2019 by hacluster via crmd on node1.example.local
2 nodes configured
4 resources configured
Online: [ node1.example.local node2.example.local ]
Full list of resources:
Resource Group: postgres-group
postgres-lvm-res (ocf::heartbeat:LVM): Started node1.example.local
postgres-fs-res (ocf::heartbeat:Filesystem): Started node1.example.local
POSTGRES-VIP (ocf::heartbeat:IPaddr2): Started node1.example.local
postgresql_service (ocf::heartbeat:pgsql): Started node1.example.local
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Validate the Pacemaker cluster service
- Reboot the active cluster node and check whether the cluster resources fail over to the newly added cluster node.
[root@node1 ~]# reboot
[root@node2 ~]# pcs status
Cluster name: pgsqlcluster
Stack: corosync
Current DC: node2.example.local (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Sun Dec 29 11:41:44 2019
Last change: Sun Dec 29 11:41:00 2019 by hacluster via crmd on node2.example.local
2 nodes configured
4 resources configured
Online: [ node2.example.local ]
OFFLINE: [ node1.example.local ]
Full list of resources:
Resource Group: postgres-group
postgres-lvm-res (ocf::heartbeat:LVM): Started node2.example.local
postgres-fs-res (ocf::heartbeat:Filesystem): Started node2.example.local
POSTGRES-VIP (ocf::heartbeat:IPaddr2): Started node2.example.local
postgresql_service (ocf::heartbeat:pgsql): Started node2.example.local
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
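After node1 finishes rebooting it will stay OFFLINE, because the output above shows pacemaker as active/disabled (not started at boot). A suggested final step, consistent with the manual-start approach used for node2, is to start the cluster stack on node1 and confirm both nodes rejoin:

```shell
# Run on node1 once it is back up, then verify membership.
pcs cluster start
pcs status nodes
```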
- This confirms that the cluster resources have successfully moved to the new cluster node.