RHCS, Red Hat Cluster Suite

Lab objectives:

  • Demo 1: Build a dual-node, dual-service, mutually redundant VSFTPD RHCS cluster on Red Hat Enterprise Linux 7.6

    Guest Name    Hostname        Management IP    HeartBeat IP    Storage IP (Optional)
    rhel76-01     rhel76-node01   192.168.161.12   10.168.161.12   20.168.161.12
    rhel76-02     rhel76-node02   192.168.161.13   10.168.161.13   20.168.161.13
    rhel76-qnetd  rhel76-qnetd                     10.168.161.14
  • Demo 2: Build a dual-node, dual-service, mutually redundant VSFTPD RHCS cluster on Red Hat Enterprise Linux 6.4

    Guest Name    Hostname        Management IP    HeartBeat IP    Storage IP (Optional)
    rhel64-01     rhel64-node01   192.168.161.16   10.168.161.16   20.168.161.16
    rhel64-02     rhel64-node02   192.168.161.17   10.168.161.17   20.168.161.17

Environment preparation

Prepare a time source

Set up a time source using chrony or ntp; not covered in detail here

Prepare shared storage

Types of shared storage:

  Production: shared storage is usually provided by SAN storage or enterprise iSCSI software (e.g. `OpenFiler`)
  Lab: iSCSI shared storage can be provided by the Linux-IO Target on a Linux host, or by virtual shared disks on a virtualization platform such as KVM/VMware
  • Linux-IO Target

    Example (only one shared disk is configured below; a dual-node dual-service setup needs two shared disks):

    • Install

      yum install targetcli
      
      systemctl enable --now target
      ~] targetcli ls 
      
      o- / .................................................................... [...]
        o- backstores ......................................................... [...]
        | o- block ............................................. [Storage Objects: 0]
        | o- fileio ............................................ [Storage Objects: 0]
        | o- pscsi ............................................. [Storage Objects: 0]
        | o- ramdisk ........................................... [Storage Objects: 0]
        o- iscsi ....................................................... [Targets: 0]
        o- loopback .................................................... [Targets: 0]
    • Create a block backstore

      targetcli /backstores/block create disk01 /dev/sdb
    • Create the target and assign it to the initiators

      targetcli /iscsi create iqn.2019-12.com.test:rhcs
      targetcli /iscsi/iqn.2019-12.com.test:rhcs/tpg1/acls create iqn.2019-12.com.test:rhcs_node01
      targetcli /iscsi/iqn.2019-12.com.test:rhcs/tpg1/acls create iqn.2019-12.com.test:rhcs_node02
    • Map the block backstore created above to the target

      targetcli /iscsi/iqn.2019-12.com.test:rhcs/tpg1/luns create /backstores/block/disk01
    • Configure the portal: remove the default 0.0.0.0:3260 and listen on the storage-network IP 20.168.161.240:3260

      targetcli /iscsi/iqn.2019-12.com.test:rhcs/tpg1/portals delete 0.0.0.0 3260
      
      targetcli /iscsi/iqn.2019-12.com.test:rhcs/tpg1/portals create 20.168.161.240 3260

    Configuration complete:

    ~] targetcli ls 
    
    o- / ........................................................................... [...]
      o- backstores ................................................................ [...]
      | o- block .................................................... [Storage Objects: 1]
      | | o- disk01 ............................ [/dev/sdb (10.0GiB) write-thru activated]
      | |   o- alua ..................................................... [ALUA Groups: 1]
      | |     o- default_tg_pt_gp ......................... [ALUA state: Active/optimized]
      | o- fileio ................................................... [Storage Objects: 0]
      | o- pscsi .................................................... [Storage Objects: 0]
      | o- ramdisk .................................................. [Storage Objects: 0]
      o- iscsi .............................................................. [Targets: 1]
      | o- iqn.2019-12.com.test:rhcs ........................................... [TPGs: 1]
      |   o- tpg1 ................................................. [no-gen-acls, no-auth]
      |     o- acls ............................................................ [ACLs: 2]
      |     | o- iqn.2019-12.com.test:rhcs_node01........................ [Mapped LUNs: 1]
      |     | | o- mapped_lun0 .................................. [lun0 block/disk01 (rw)]
      |     | o- iqn.2019-12.com.test:rhcs_node02........................ [Mapped LUNs: 1]
      |     |   o- mapped_lun0 .................................. [lun0 block/disk01 (rw)]
      |     o- luns ............................................................ [LUNs: 1]
      |     | o- lun0 ........................[block/disk01 (/dev/sdb) (default_tg_pt_gp)]
      |     o- portals ...................................................... [Portals: 1]
      |       o- 20.168.161.240:3260 ................................................ [OK]
      o- loopback ........................................................... [Targets: 0]
    
    • Firewall configuration

      ~] netstat -an | grep 3260
      
      tcp        0      0 20.168.161.240:3260       0.0.0.0:*               LISTEN 
      firewall-cmd --add-service=iscsi-target --permanent
      firewall-cmd --reload
  • Known issue: if a client mounts the disk shared by the server and creates LVM PVs, VGs and LVs on it, the target may lose its block backstore after the server's operating system is rebooted.

      Cause: the server's lvm2-lvmetad.service detects and takes over the client's LVM metadata, so the disk /dev/sdb bound to the target can no longer be claimed by the target.

      Fix: edit /etc/lvm/lvm.conf and set volume_list = [ "rhel_host0" ], i.e. list only the host's own volume groups and nothing else. After the change, stop the target service and restart lvm2-lvmetad.service (rebooting the operating system is recommended).
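
      A minimal sketch of the workaround above, assuming the host's own root volume group is named rhel_host0 (adjust to the actual VG name):

      ~] vi /etc/lvm/lvm.conf
      volume_list = [ "rhel_host0" ]              # only the host's local VGs, nothing from the clients
      
      ~] systemctl stop target
      ~] systemctl restart lvm2-lvmetad.service   # or simply reboot the host
      ~] systemctl start target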

  • Shared disks for KVM guests

    # Create the disk images
    qemu-img create -f raw /path/to/10g-01.raw 10G
    qemu-img create -f raw /path/to/10g-02.raw 10G
    
    # Attach the shared disks to both nodes
    # The two nodes are assumed to be node01 and node02
    virsh attach-disk --domain node01 --source /path/to/10g-01.raw --target vdb --targetbus virtio --driver qemu --subdriver raw --shareable --current
    virsh attach-disk --domain node01 --source /path/to/10g-01.raw --target vdb --targetbus virtio --driver qemu --subdriver raw --shareable --config
    virsh attach-disk --domain node01 --source /path/to/10g-02.raw --target vdc --targetbus virtio --driver qemu --subdriver raw --shareable --current
    virsh attach-disk --domain node01 --source /path/to/10g-02.raw --target vdc --targetbus virtio --driver qemu --subdriver raw --shareable --config
    
    virsh attach-disk --domain node02 --source /path/to/10g-01.raw --target vdb --targetbus virtio --driver qemu --subdriver raw --shareable --current
    virsh attach-disk --domain node02 --source /path/to/10g-01.raw --target vdb --targetbus virtio --driver qemu --subdriver raw --shareable --config
    virsh attach-disk --domain node02 --source /path/to/10g-02.raw --target vdc --targetbus virtio --driver qemu --subdriver raw --shareable --current
    virsh attach-disk --domain node02 --source /path/to/10g-02.raw --target vdc --targetbus virtio --driver qemu --subdriver raw --shareable --config
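    
    After attaching, it can be confirmed that both guests reference the same image files (a quick check with standard virsh commands; node01/node02 are the domain names assumed above):
    
    virsh domblklist node01
    virsh domblklist node02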
  • Shared disks for VMware guests

    Shared disks can be created and used on Workstation/vSphere and similar platforms; not covered here.

Demo 1 - RHEL 7.6 - Dual-node, dual-service, mutually redundant VSFTPD RHCS cluster

1.1 Configure time synchronization

Point both nodes at the same time source; either ntpd or chronyd will do

1.2 Configure host name resolution

Both nodes need this configuration; add the following two lines to /etc/hosts. Note that the heartbeat IPs are used here; if resources are limited they may be shared with the management IPs

~] vi /etc/hosts

10.168.161.12 rhel76-node01
10.168.161.13 rhel76-node02

1.3 Configure NIC bonding

If network redundancy is required, configure Team or Bonding. Refer to: Bonding or Team

1.4 Configure shared storage

For KVM/VMware guests using shared disks, attaching them on the platform is enough and no extra steps are needed; the iSCSI initiator configuration is described below

  • 1.4.1 Install

    yum install iscsi-initiator-utils
  • 1.4.2 Configure

    Change the InitiatorName to match what was configured on the target side:

    # node01
    ~] vi /etc/iscsi/initiatorname.iscsi
    InitiatorName=iqn.2019-12.com.test:rhcs_node01
    
    # node02
    ~] vi /etc/iscsi/initiatorname.iscsi   
    InitiatorName=iqn.2019-12.com.test:rhcs_node02

    Start the iscsi and iscsid services and enable them at boot:

    systemctl restart iscsi
    systemctl restart iscsid
    
    systemctl enable iscsi
    systemctl enable iscsid
  • 1.4.3 Discover the iSCSI target

    ~] iscsiadm --mode discoverydb --type sendtargets --portal 20.168.161.240 --discover
    
    20.168.161.240:3260,1 iqn.2019-12.com.test:rhcs
  • 1.4.4 Log in / connect

    ~] iscsiadm --mode node --targetname iqn.2019-12.com.test:rhcs --portal 20.168.161.240:3260 --login
    
    Logging in to [iface: default, target: iqn.2019-12.com.test:rhcs, portal: 20.168.161.240,3260] (multiple)
    Login to [iface: default, target: iqn.2019-12.com.test:rhcs, portal: 20.168.161.240,3260] successful.
  • 1.4.5 Log out / disconnect

    Release all use of the disks first, then run:

    iscsiadm --mode node --targetname iqn.2019-12.com.test:rhcs --portal 20.168.161.240:3260 --logout

For the three steps above (discovery, login, logout), the EXAMPLES section of the iscsiadm man page is a good reference
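
To have the sessions restored automatically after a node reboot, the node's startup mode can be set to automatic (a sketch; on RHEL the shipped iscsid.conf usually already defaults to node.startup = automatic, so this is only needed if it was changed):

iscsiadm --mode node --targetname iqn.2019-12.com.test:rhcs --portal 20.168.161.240:3260 --op update --name node.startup --value automatic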

Both nodes see the disks, which indicates the configuration is working:

~] lsblk

NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0          11:0    1 1024M  0 rom  
vda         253:0    0   20G  0 disk 
├─vda1      253:1    0    1G  0 part /boot
├─vda2      253:2    0    2G  0 part [SWAP]
└─vda3      253:3    0   17G  0 part /
vdb         253:16   0   10G  0 disk 
vdc         253:32   0   10G  0 disk 

1.5 Configure the file system

Run the creation steps on either node:

pvcreate /dev/vdb
vgcreate rhcs01 /dev/vdb
lvcreate -n data01 -l 100%FREE rhcs01
mkfs.xfs /dev/mapper/rhcs01-data01

pvcreate /dev/vdc
vgcreate rhcs02 /dev/vdc
lvcreate -n data02 -l 100%FREE rhcs02
mkfs.xfs /dev/mapper/rhcs02-data02

Export and import the volume groups so that both nodes can see the LVM metadata:

  • On the current node, deactivate and then export the volume groups:

    vgchange -an rhcs01 rhcs02
    vgexport rhcs01 rhcs02
  • On the other node, import and activate the volume groups:

    vgimport rhcs01 rhcs02
    vgchange -ay rhcs01 rhcs02

    Check:

    ~] lvs
      LV     VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      data01 rhcs01 -wi-a----- <10.00g
      data02 rhcs02 -wi-a----- <10.00g
  • Once they are recognized correctly, deactivate the volume groups on all nodes

    vgchange -an rhcs01
    vgchange -an rhcs02

1.6 Configure the VSFTPD service

This lab builds a dual-node, dual-service, mutually redundant VSFTPD cluster, so the VSFTPD service must be configured on both nodes

  • 1.6.1 Add users and mount points

    mkdir /data01
    mkdir /data02
    yum install -y vsftpd
    
    useradd ftpuser01
    useradd ftpuser0101
    useradd ftpuser02
    useradd ftpuser0202
    for user in ftpuser01{,01} ftpuser02{,02} ; do echo '111' | passwd --stdin ${user}; done
  • 1.6.2 Modify the VSFTPD configuration files

    Both nodes need these two configuration files, /etc/vsftpd/vsftpd_01.conf and /etc/vsftpd/vsftpd_02.conf, one for each VSFTPD instance:

    ~] vi /etc/vsftpd/vsftpd_01.conf
    
    anonymous_enable=NO
    
    local_enable=YES
    local_root=/data01
    chroot_local_user=NO
    chroot_list_enable=YES
    chroot_list_file=/etc/vsftpd/chroot_list01
    allow_writeable_chroot=NO
    guest_enable=NO
    
    dirmessage_enable=YES
    connect_from_port_20=YES
    listen=YES
    listen_address=192.168.161.14
    listen_ipv6=NO
    pam_service_name=vsftpd
    userlist_enable=YES
    userlist_deny=NO
    userlist_file=/etc/vsftpd/user_list01
    tcp_wrappers=YES
    
    # Logging configuration
    xferlog_enable=YES
    xferlog_std_format=YES
    xferlog_file=/var/log/xferlog01
    dual_log_enable=YES
    vsftpd_log_file=/var/log/vsftpd01.log
    ~] vi /etc/vsftpd/vsftpd_02.conf
    
    anonymous_enable=NO
    
    local_enable=YES
    local_root=/data02
    chroot_local_user=NO
    chroot_list_enable=YES
    chroot_list_file=/etc/vsftpd/chroot_list02
    allow_writeable_chroot=NO
    guest_enable=NO
    
    dirmessage_enable=YES
    connect_from_port_20=YES
    listen=YES
    listen_address=192.168.161.15
    listen_ipv6=NO
    pam_service_name=vsftpd
    userlist_enable=YES
    userlist_deny=NO
    userlist_file=/etc/vsftpd/user_list02
    tcp_wrappers=YES
    
    # Logging configuration
    xferlog_enable=YES
    xferlog_std_format=YES
    xferlog_file=/var/log/xferlog02
    dual_log_enable=YES
    vsftpd_log_file=/var/log/vsftpd02.log

    Both nodes also need the user_list and chroot_list files under /etc/vsftpd/, four files in total, matching the corresponding options in the configuration files above:

    ~] vi user_list01
    ftpuser01
    ftpuser0101
    
    ~] vi user_list02
    ftpuser02
    ftpuser0202
    
    ~] vi chroot_list01
    ftpuser01
    ftpuser0101
    
    ~] vi chroot_list02
    ftpuser02
    ftpuser0202

    To "disable active mode and enable passive mode" and restrict the port range, the following settings can be used as a reference:

    port_enable=NO
    pasv_enable=YES
    pasv_min_port=2226
    pasv_max_port=2229
    
  • 1.6.3 Firewall configuration

    If the firewall is enabled, a policy must be added:

    firewall-cmd --add-service=ftp --permanent
    firewall-cmd --reload
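
    Note: the ftp firewalld service essentially covers port 21 plus the FTP connection-tracking helper. If the passive-port range shown above (2226-2229 in this example) is used and the helper does not track the data connections, those ports may need to be opened explicitly (a sketch):

    firewall-cmd --add-port=2226-2229/tcp --permanent
    firewall-cmd --reload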

1.7 Configure the cluster

  • 1.7.1 Install the cluster packages

    yum groupinstall 'High Availability'

    If the firewall is enabled, a policy must be added:

    firewall-cmd --add-service=high-availability --permanent
    firewall-cmd --reload
  • 1.7.2 Initialize the cluster

    • (1) Start the pcsd service

      Start the service and enable it at boot:

      systemctl start pcsd.service
      systemctl enable pcsd.service
    • (2) Set the password for the hacluster service user

      The hacluster user is used for pcsd authentication within the cluster; its password is required when nodes are added to the cluster

      echo '123qweQ' | passwd hacluster  --stdin
    • (3) Authenticate the nodes

      pcs cluster auth [node] [...] [-u username] [-p password]
      • The pcsd administrative user name on every node must be hacluster
      • If no user name or password is given, the command prompts for them for each node
      • If no nodes are given and this command has been run before, it re-authenticates pcsd on all nodes previously specified with pcs cluster setup
      • Authorization tokens are stored in ~/.pcs/tokens or /var/lib/pcsd/tokens
      ~] pcs cluster auth rhel76-node01 rhel76-node02
      
      Username: hacluster
      Password: 
      rhel76-node01: Authorized
      rhel76-node02: Authorized
  • 1.7.3 Create the cluster

    (1) Create

    pcs cluster setup --name Cluster-VSFTPD rhel76-node01 rhel76-node02

    The cluster status can be checked after creation; at this point the cluster is not yet started

    ~] pcs status
    Error: cluster is not currently running on this node

    (2) Start

    pcs cluster start --all

    The command above is equivalent to the following two commands:

    systemctl start corosync.service
    systemctl start pacemaker.service

    (3) Enable autostart

    systemctl enable corosync.service pacemaker.service
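
    Equivalently, pcs can enable both daemons on every node in one step:

    pcs cluster enable --all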
  • 1.7.4 Status checks

    • (1) Check corosync status

      • corosync communication status:

         ~] corosync-cfgtool -s
        
        Printing ring status.
        Local node ID 2
        RING ID 0
                id      = 10.168.161.13
                status  = ring 0 active with no faults
      • Membership and quorum:

        ~] corosync-cmapctl  | grep members
        
        runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
        runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.168.161.12) 
        runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
        runtime.totem.pg.mrp.srp.members.1.status (str) = joined
        runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
        runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.168.161.13) 
        runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
        runtime.totem.pg.mrp.srp.members.2.status (str) = joined
        
        ~] pcs status corosync
        
        Membership information
        ----------------------
            Nodeid      Votes Name
                 1          1 rhel76-node01 (local)
                 2          1 rhel76-node02
    • (2) Check pacemaker status

      ~] ps axf |grep pacemaker
      
      4810 pts/0    S+     0:00      |   \_ grep --color=auto pacemaker
      4619 ?        Ss     0:00 /usr/sbin/pacemakerd -f
      4620 ?        Ss     0:00  \_ /usr/libexec/pacemaker/cib
      4621 ?        Ss     0:00  \_ /usr/libexec/pacemaker/stonithd
      4622 ?        Ss     0:00  \_ /usr/libexec/pacemaker/lrmd
      4623 ?        Ss     0:00  \_ /usr/libexec/pacemaker/attrd
      4624 ?        Ss     0:00  \_ /usr/libexec/pacemaker/pengine
      4625 ?        Ss     0:00  \_ /usr/libexec/pacemaker/crmd
      
      ~] pcs status
      ~] pcs cluster cib
    • (3) Validate the basic cluster configuration

      ~] crm_verify -L -V
      
         error: unpack_resources:	Resource start-up disabled since no STONITH resources have been defined
         error: unpack_resources:	Either configure some or disable STONITH with the stonith-enabled option
         error: unpack_resources:	NOTE: Clusters with shared data need STONITH to ensure data integrity
      Errors found during check: config not valid

      Note: STONITH/Fencing is enabled by default; it can be turned off temporarily for now:

      By default pacemaker enables STONITH (Shoot The Other Node In The Head) / Fencing in order to protect the data. Fencing is mandatory when you use shared storage, to avoid data corruption.

      ~] pcs property set stonith-enabled=false
      
      ~] pcs property show stonith-enabled
      Cluster Properties:
       stonith-enabled: false

1.8 Put the services under cluster management

  • 1.8.1 Preparation

    List the resource agent standards supported by the cluster:

    ~] pcs resource standards
    lsb            # Linux Standard Base (legacy init scripts)
    ocf            # Open Cluster Framework
    service        # Based on the Linux "service" command
    systemd        # systemd-based service management
    stonith        # Fencing resource standard (not present in my actual test; possibly because it is a VM)

    List the ocf resource agent providers:

    ~] pcs resource providers
    heartbeat
    openstack
    pacemaker

    List the built-in agents provided for the ocf standard by heartbeat:

    pcs resource agents ocf             # list all built-in agents of the ocf standard
    
    pcs resource agents ocf:heartbeat   # list built-in agents of the ocf standard provided by heartbeat

    List all resource agents:

    pcs resource list

    Show the details of a specific resource agent:

    pcs resource list IPaddr2
    pcs resource describe IPaddr2
  • 1.8.2 Add the IP resources

    pcs resource create IP_161.14 ocf:heartbeat:IPaddr2 ip=192.168.161.14 cidr_netmask=24 nic=eth0 op monitor interval=30s
    
    pcs resource create IP_161.15 ocf:heartbeat:IPaddr2 ip=192.168.161.15 cidr_netmask=24 nic=eth0 op monitor interval=30s

    About op monitor interval=30s: this overrides the default and sets the monitor interval to 30s. The 30s interval does not mean a check runs every 30s on the clock; the next check starts 30s after the previous one has completed

    Default operations of IPaddr2:

    Default operations:
      start: interval=0s timeout=20s
      stop: interval=0s timeout=20s
      monitor: interval=10s timeout=20s
    

    Inspect the resources after creation:

    ~] pcs resource show IP_161.14
    
     Resource: IP_161.14 (class=ocf provider=heartbeat type=IPaddr2)
      Attributes: cidr_netmask=24 ip=192.168.161.14 nic=eth0
      Operations: monitor interval=30s (IP_161.14-monitor-interval-30s)
                  start interval=0s timeout=20s (IP_161.14-start-interval-0s)
                  stop interval=0s timeout=20s (IP_161.14-stop-interval-0s)
    
    ~] pcs resource show IP_161.15
    
     Resource: IP_161.15 (class=ocf provider=heartbeat type=IPaddr2)
      Attributes: cidr_netmask=24 ip=192.168.161.15 nic=eth0
      Operations: monitor interval=30s (IP_161.15-monitor-interval-30s)
                  start interval=0s timeout=20s (IP_161.15-start-interval-0s)
                  stop interval=0s timeout=20s (IP_161.15-stop-interval-0s)
  • 1.8.3 Add HA-LVM

    To hand the volume groups over to the RHCS cluster, first release them from local LVM management, then configure cluster resources to manage them

    • (1) Release the volume groups from local LVM management

      • Modify the configuration file

        ~] vi /etc/lvm/lvm.conf
        
        locking_type = 1
        use_lvmetad = 0
        volume_list = [ "rhel-root" ]

        Note: volume_list = [ "rhel-root" ] lists the volume groups managed by local LVM; every volume group except the cluster-managed ones must be listed here. If there are none, set it to "volume_list = [ ]"

      • Stop the services:

        systemctl stop lvm2-lvmetad.service lvm2-lvmetad.socket
        systemctl disable lvm2-lvmetad.service

      The two steps above can also be done by running lvmconf --enable-halvm --services --startstopservices directly; then check /etc/lvm/lvm.conf and make sure every non-cluster-managed volume group is included in volume_list = [ ]

      • Rebuild the initramfs

        cp /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.$(date +'%Y-%m-%d-%H%M%S').bak
        dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)
      • Reboot the operating system

    • (2) Configure cluster resources to manage the volume groups

      pcs resource create VG_rhcs01 ocf:heartbeat:LVM volgrpname=rhcs01 exclusive=yes
      pcs resource create VG_rhcs02 ocf:heartbeat:LVM volgrpname=rhcs02 exclusive=yes

      Note: (1) ocf:heartbeat:LVM can be shortened to LVM; (2) exclusive=yes means exclusive activation

      After they are added, the two volume groups are activated on different nodes:

      ~] pcs status 
      
      Cluster name: Cluster-VSFTPD
      Stack: corosync
      Current DC: rhel76-node01 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
      Last updated: Tue Mar 15 10:14:45 2022
      Last change: Tue Mar 15 09:59:46 2022 by root via cibadmin on rhel76-node01
      
      2 nodes configured
      4 resources configured
      
      Online: [ rhel76-node01 rhel76-node02 ]
      
      Full list of resources:
      
       IP_161.14      (ocf::heartbeat:IPaddr2):       Started rhel76-node01
       IP_161.15      (ocf::heartbeat:IPaddr2):       Started rhel76-node02
       VG_rhcs01      (ocf::heartbeat:LVM):   Started rhel76-node01    # <= node 1
       VG_rhcs02      (ocf::heartbeat:LVM):   Started rhel76-node02    # <= node 2
      
      Daemon Status:
        corosync: active/enabled
        pacemaker: active/enabled
        pcsd: active/enabled

      At this point, log in to each node and check the LV information; on each node only one LV is in the active state

      rhel76_node01 ] lvs
        LV     VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
        data01 rhcs01 -wi-a----- <10.00g  # <= a: active
        data02 rhcs02 -wi------- <10.00g 
      
      rhel76_node02 ] lvs
        LV     VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
        data01 rhcs01 -wi------- <10.00g  
        data02 rhcs02 -wi-a----- <10.00g  # <= a: active
  • 1.8.4 Add the Filesystem resources

    pcs resource create FS_data01 ocf:heartbeat:Filesystem device="/dev/mapper/rhcs01-data01" directory="/data01" fstype="xfs"
    pcs resource create FS_data02 ocf:heartbeat:Filesystem device="/dev/mapper/rhcs02-data02" directory="/data02" fstype="xfs"

    Note: (1) ocf:heartbeat:Filesystem can be shortened to Filesystem

  • 1.8.5 Add the VSFTPD services

    Disable the systemd autostart:

    systemctl disable vsftpd

    Put the services under cluster management:

    pcs resource create VSFTPD_01 systemd:vsftpd@vsftpd_01
    pcs resource create VSFTPD_02 systemd:vsftpd@vsftpd_02
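
    The systemd:vsftpd@vsftpd_01 syntax relies on the vsftpd@.service template unit shipped with the vsftpd package, which typically starts /usr/sbin/vsftpd with /etc/vsftpd/%i.conf; that is why the two instance configuration files above are named vsftpd_01.conf and vsftpd_02.conf. A quick way to confirm the mapping (output abbreviated, may vary by version):

    ~] systemctl cat vsftpd@.service
    ...
    ExecStart=/usr/sbin/vsftpd /etc/vsftpd/%i.conf
    ...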
  • 1.8.6 Create the resource groups

    pcs resource group add VSFTPD_GROUP_01 IP_161.14 VG_rhcs01 FS_data01 VSFTPD_01
    pcs resource group add VSFTPD_GROUP_02 IP_161.15 VG_rhcs02 FS_data02 VSFTPD_02
  • 1.8.7 Add constraints

    Constraints can be listed with the following commands:

    pcs constraint ref <resource>                               # list the constraints that reference the given resource
    pcs constraint [order|colocation|location] [show] [--full]  # list constraints
        --full      # If '--full' is specified also list the constraint ids
    
    • (1) Add order constraints

      Syntax:

      order [action] <resource id> then [action] <resource id> [options]
      

      Configuration:

      1. Make sure VSFTPD is started only after its IP and FS are up
      2. Make sure the FS is mounted only after its VG is recognized
      pcs constraint order start IP_161.14 then VSFTPD_01
      pcs constraint order start FS_data01 then VSFTPD_01
      pcs constraint order start VG_rhcs01 then FS_data01
      
      pcs constraint order start IP_161.15 then VSFTPD_02
      pcs constraint order start FS_data02 then VSFTPD_02
      pcs constraint order start VG_rhcs02 then FS_data02

      Check the result:

      ~] pcs constraint order --full
      
      Ordering Constraints:
        start IP_161.14 then start VSFTPD_01 (kind:Mandatory) (id:order-IP_161.14-VSFTPD_01-mandatory)
        start FS_data01 then start VSFTPD_01 (kind:Mandatory) (id:order-FS_data01-VSFTPD_01-mandatory)
        start VG_rhcs01 then start FS_data01 (kind:Mandatory) (id:order-VG_rhcs01-FS_data01-mandatory)
        start IP_161.15 then start VSFTPD_02 (kind:Mandatory) (id:order-IP_161.15-VSFTPD_02-mandatory)
        start FS_data02 then start VSFTPD_02 (kind:Mandatory) (id:order-FS_data02-VSFTPD_02-mandatory)
        start VG_rhcs02 then start FS_data02 (kind:Mandatory) (id:order-VG_rhcs02-FS_data02-mandatory)
    • (2) Add colocation constraints

      Note: if resource groups are used, colocation constraints are not strictly needed, because a resource group always runs on a single node anyway

      Syntax:

      colocation add [master|slave] <source resource id> with [master|slave] <target resource id> [score] [options] [id=constraint-id]
      
      # Request <source resource> to run on the same node where pacemaker has determined <target resource> should run.
      

      Configuration:

      pcs constraint colocation add VG_rhcs01 with FS_data01
      pcs constraint colocation add IP_161.14 with VSFTPD_01
      pcs constraint colocation add FS_data01 with VSFTPD_01
      
      pcs constraint colocation add VG_rhcs02 with FS_data02
      pcs constraint colocation add IP_161.15 with VSFTPD_02
      pcs constraint colocation add FS_data02 with VSFTPD_02

      Check the result:

      ~] pcs constraint colocation
      Colocation Constraints:
        VG_rhcs01 with FS_data01 (score:INFINITY)
        IP_161.14 with VSFTPD_01 (score:INFINITY)
        FS_data01 with VSFTPD_01 (score:INFINITY)
        VG_rhcs02 with FS_data02 (score:INFINITY)
        IP_161.15 with VSFTPD_02 (score:INFINITY)
        FS_data02 with VSFTPD_02 (score:INFINITY)
      
    • (3) Add location constraints

      Syntax:

      # Create a location constraint on a resource to prefer the specified node with score (default score: INFINITY).
      location <resource> prefers <node>[=<score>] [<node>[=<score>]]...
      
      # Create a location constraint on a resource to avoid the specified node with score (default score: INFINITY).
      location <resource> avoids <node>[=<score>] [<node>[=<score>]]...
      

      Configuration:

      pcs constraint location VSFTPD_GROUP_01 prefers rhel76-node01=200 rhel76-node02=20
      pcs constraint location VSFTPD_GROUP_02 prefers rhel76-node01=20 rhel76-node02=200

      Check the result:

      ~] pcs constraint location show
      Location Constraints:
        Resource: VSFTPD_GROUP_01
          Enabled on: rhel76-node01 (score:200)
          Enabled on: rhel76-node02 (score:20)
        Resource: VSFTPD_GROUP_02
          Enabled on: rhel76-node01 (score:20)
          Enabled on: rhel76-node02 (score:200)
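
      With the groups and location scores in place, a simple failover test is to put one node into standby, watch its group move to the other node, and then bring it back (a sketch using standard pcs commands; with the default resource-stickiness of 0, VSFTPD_GROUP_01 should move back because of the higher location score):

      # move everything off rhel76-node01
      pcs cluster standby rhel76-node01
      pcs status
      
      # bring the node back and re-check
      pcs cluster unstandby rhel76-node01
      pcs status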

1.9 Configure fencing

  • 1.9.1 Introduction

    With the configuration above in place, VSFTPD_GROUP_01 runs on rhel76-node01 and VSFTPD_GROUP_02 runs on rhel76-node02.

    If the heartbeat NIC eth1 on rhel76-node01 is brought down to simulate a NIC failure:

    • rhel76-node02 "believes" rhel76-node01 is gone - and starts taking over the VSFTPD_GROUP_01 services
    • rhel76-node01 likewise "believes" rhel76-node02 is gone - and starts taking over the VSFTPD_GROUP_02 services

    This situation easily leads to the nodes fighting over resources without releasing what they have already grabbed; in severe cases it can cause data loss or disk corruption.

    To avoid this, each cluster node must be configured with fencing to monitor node state; when a node fails without releasing its resources, the preset action is taken so the cluster keeps working correctly. For clusters with an even number of nodes / two nodes, this is complemented by a quorum device.

  • 1.9.2 Fence types

    1. For an RHCS cluster built on VMware vSphere virtual machines, fencing can be configured through the vCenter/ESXi API (fence_vmware_soap)
    2. For an RHCS cluster built on KVM-type virtual machines, fence_virtd can be configured on the host to carry out node fencing (fence_xvm)
    3. For an RHCS cluster built on physical servers, fencing can be configured through the out-of-band / management port / IPMI interface (fence_ipmilan)
  • 1.9.3 Prerequisites

    When a fence action is triggered, the node must be powered off/rebooted immediately ("powered off immediately"), not shut down normally ("shutdown gracefully")

    To achieve this, the ACPI Soft-Off feature of the host/operating system must be disabled:

    1. At the host level, it can be disabled in the BIOS
    2. At the operating system level, it can be disabled by disabling the corresponding service, or disabled completely via a kernel parameter

    The specific steps are as follows:

    • RHEL 5,6:

      The preferred method of disabling ACPI Soft-Off is with chkconfig management. If the preferred method is not effective for your cluster, you can disable ACPI Soft-Off with the BIOS power management. If neither of those methods is effective for your cluster, you can disable ACPI completely by appending acpi=off to the kernel boot command line in the grub.conf file.

      • Disabling ACPI Soft-Off with the BIOS

        BIOS CMOS Setup Utility, Soft-Off by PWR-BTTN set to Instant-Off

        Refer to Redhat Document ->

      • Disabling ACPI Soft-Off with chkconfig

        chkconfig --del acpid

        or

        chkconfig --level 345 acpid off

        Then reboot the node.

      • Disabling ACPI Completely in the grub.conf File

        ~] vi /boot/grub/grub.conf
        ...
        title Red Hat Enterprise Linux Server (2.6.32-193.el6.x86_64)
                root (hd0,0)
                kernel /vmlinuz-2.6.32-193.el6.x86_64 ... acpi=off   # <= 添加 acpi=off
        ...
        
        ~] reboot
    • RHEL 7,8:

      You can disable ACPI Soft-Off with one of the following alternate methods:

      • Disabling ACPI Soft-Off with the BIOS

        BIOS CMOS Setup Utility, Soft-Off by PWR-BTTN set to Instant-Off

        Refer to Redhat Document ->

      • Disabling ACPI Soft-Off in the logind.conf file

        ~] vi /etc/systemd/logind.conf
        HandlePowerKey=ignore
        
        ~] systemctl daemon-reload
        ~] systemctl restart systemd-logind.service
        
      • Disabling ACPI Completely in the GRUB 2 File

        This method completely disables ACPI; some computers do not boot correctly if ACPI is completely disabled. Use this method only if the other methods are not effective for your cluster.

        ~] grubby --args=acpi=off --update-kernel=ALL
        ~] reboot
  • 1.9.4 Add vCenter or ESXi as the fence device

    # Examples:
        Hostnames: node1, node2.
        VM names: node1-vm, node2-vm.
    

    Check that the connection works:

    ~] fence_vmware_soap -a <vCenter/ESXi IP address> -l <vCenter/ESXi username> -p <vCenter/ESXi password> [--ssl] --ssl-insecure -o status
    Status: ON

    Locate the virtual machine information:

    ~] fence_vmware_soap -a <vCenter/ESXi IP address> -l <vCenter/ESXi username> -p <vCenter/ESXi password> [--ssl] --ssl-insecure -o list | egrep '(node1-vm|node2-vm)'
    node1-vm,11111111-aaaa-bbbb-cccc-111111111111
    node2-vm,22222222-dddd-eeee-ffff-222222222222

    Add the fencing device:

    Reference: https://access.redhat.com/solutions/917813

    # Show the configuration reference for fence_vmware_soap
    pcs stonith describe fence_vmware_soap
    
    # Add
    pcs stonith create FTP_fence_vmware fence_vmware_soap inet4_only=1 ipport=443 ipaddr="192.168.163.252" login="[email protected]" passwd="1qaz@WSX4rfv" ssl_insecure=1 pcmk_host_map="node1:11111111-aaaa-bbbb-cccc-111111111111;node2:22222222-dddd-eeee-ffff-222222222222" pcmk_host_list="node1-vm,node2-vm" pcmk_host_check=static-list
    # pcmk_host_map can also be written as "node1:node1-vm;node2:node2-vm"
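
    After the stonith resource is created, the fencing path can be exercised manually with pcs (this reboots the target node, so only run it in a test window):

    # fence node2 from node1 to confirm the agent works end to end
    pcs stonith fence node2
    pcs status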
  • 1.9.5 Configure fencing via IPMI

    # Check the connection status
    ~] fence_ipmilan -a <IP> -P -l <username> -p <password> -o status
    Status: ON    # ON means the connection is OK
    
    # Check the connection status
    ~] ipmitool -H <IP> -I lanplus -U <username> [-L ADMINISTRATOR] -P <password> chassis power status -vvv
    
    # Configure
    ~] pcs stonith create <NAME> fence_ipmilan pcmk_host_list='cnsz03016' pcmk_host_check='static-list' ipaddr='10.0.64.115' login='USERID' passwd='PASSW0RD' lanplus=1 power_wait=4 pcmk_reboot_action='reboot' op monitor interval=30s

    pcmk_reboot_action specifies the fence action; the default is reboot and it can be changed as needed, e.g. to off (power off without powering back on)

  • 1.9.6 Configure fencing for KVM virtual machines

    • KVM host configuration

      fence_virtd must be set up on the KVM host so that fence_xvm can be configured inside the virtual machines. fence_virtd is a host daemon designed to route fencing requests for virtual machines.

      1. Install:

        yum install fence-virt fence-virtd fence-virtd-libvirt fence-virtd-multicast fence-virtd-serial
      2. Create and distribute fence key:

        mkdir -p /etc/cluster
        dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=4k count=1
        
        # copy key to all nodes
        scp /etc/cluster/fence_xvm.key nodeX:/etc/cluster/
      3. Create /etc/fence_virt.conf file:

        ~] fence_virtd -c
        
        ...
        Interface [virbr0]: br-heartb   # <= br-heartb: heartbeat network bridge
        ...
        ...
        Replace /etc/fence_virt.conf with the above [y/N]? y    # <= y: confirm the change
      4. Start the fence_virtd service

        # RHEL <= 6
        service fence_virtd restart
        chkconfig fence_virtd on
        
        # RHEL >= 7
        systemctl restart fence_virtd
        systemctl enable fence_virtd
    • Node configuration

      1. Ensure fence-virt package is installed on each cluster node

        rpm -qa fence-virt
      2. Firewall settings

        # RHEL <= 6
        iptables -I INPUT -m state --state NEW -p tcp --dport 1229 -j ACCEPT
        service iptables save
        service iptables restart
        
        # RHEL >= 7
        firewall-cmd --permanent --add-port=1229/tcp
        firewall-cmd --reload
      3. Test fencing: In order that the fencing to be successful, below command should succeed on host as well as cluster nodes.

        fence_xvm -o list
        fence_xvm -o reboot -H <cluster-node>
      4. (Optional) Edit /etc/hosts: decide as needed whether to add resolution records mapping the fencing network addresses to the virtual machine names (preferably a subnet different from the heartbeat IPs)

        Once the resolution records are in place, the fence device can be configured with IPs directly instead of specifying the guest names of the hosts

        ~] vi /etc/hosts
        
        10.168.161.12 rhel76-node01
        10.168.161.13 rhel76-node02
        
        xx.xx.xx.xx rhel76-01
        xx.xx.xx.xx rhel76-02
    • Add the fence agent for the cluster nodes

      pcs stonith create VSFTPD_xvmfence fence_xvm key_file=/etc/cluster/fence_xvm.key
      # or, with an explicit host map:
      pcs stonith create VSFTPD_xvmfence fence_xvm pcmk_host_check=static-list pcmk_host_map="rhel76-node01:rhel76-01;rhel76-node02:rhel76-02" key_file=/etc/cluster/fence_xvm.key
  • 1.9.7 Post-configuration

    STONITH/Fencing was temporarily disabled earlier; after the configuration is complete it must be enabled again:

    ~] pcs property set stonith-enabled=true
    
    ~] pcs property show
    
    Cluster Properties:
     cluster-infrastructure: corosync
     cluster-name: Cluster-VSFTPD
     dc-version: 1.1.19-8.el7-c3c624ea3d
     have-watchdog: false
     last-lrm-refresh: 1647849911
     stonith-enabled: true   # <= now changed to true
  • 1.9.8 View the fence configuration

    ~] pcs stonith show --full
    
    Resource: FTP_fence_vmware (class=stonith type=fence_vmware_soap)
    Attributes: inet4_only=1 ipaddr=192.168.163.252 ipport=443 [email protected] passwd=1qaz@WSX4rfv pcmk_host_check=static-list pcmk_host_list=node01,node02 pcmk_host_map=node01:422a97b9-5f92-a095-db50-c6a08eccda73;node02:422aa805-fe81-638a-02a5-a1985085f68e ssl_insecure=1
    Operations: monitor interval=60s (FTP_fence_vmware-monitor-interval-60s)
    

1.10 Configure quorum

RHEL uses the votequorum service, together with fencing, to prevent "split-brain" situations in the cluster. The following is an overview of quorum:

  • 1.10.1 Quorum - votequorum

    Refer to: votequorum(5)

    • (1) View the current cluster quorum status

      The following command shows the quorum configuration.

      pcs quorum [config]

      The following command shows the quorum runtime status.

      pcs quorum status
      ~] pcs quorum status
      Quorum information
      ------------------
      Date:             Sat Mar 26 23:23:35 2022
      Quorum provider:  corosync_votequorum
      Nodes:            2
      Node ID:          1
      Ring ID:          1/212
      Quorate:          Yes
      
      Votequorum information
      ----------------------
      Expected votes:   2
      Highest expected: 2
      Total votes:      2
      Quorum:           1  
      Flags:            2Node Quorate WaitForAll 
      
      Membership information
      ----------------------
          Nodeid      Votes    Qdevice Name
              1          1         NR rhel76-node01 (local)
              2          1         NR rhel76-node02
    • (2) Modify the cluster quorum options

      pcs quorum update [auto_tie_breaker=[0|1]] [last_man_standing=[0|1]] [last_man_standing_window=[time-in-ms] [wait_for_all=[0|1]]
      • two_node

        Enables two node cluster operations (default: 0).

        NOTES: enabling two_node: 1 automatically enables wait_for_all. It is still possible to override wait_for_all by explicitly setting it to 0. If more than 2 nodes join the cluster, the two_node option is automatically disabled.

      • wait_for_all

        Enables Wait For All (WFA) feature (default: 0).

        The general behaviour of votequorum is to switch a cluster from inquorate to quorate as soon as possible. For example, in an 8 node cluster, where every node has 1 vote, expected_votes is set to 8 and quorum is (50% + 1) 5. As soon as 5 (or more) nodes are visible to each other, the partition of 5 (or more) becomes quorate and can start operating. (As soon as 5 nodes become quorate, with the other 3 still offline, the remaining 3 nodes will be fenced.)

        When WFA is enabled, the cluster will be quorate for the first time only after all nodes have been visible at least once at the same time.

      • last_man_standing / last_man_standing_window: 10000

        Enables Last Man Standing (LMS) feature (default: 0). Tunable last_man_standing_window (default: 10 seconds expressed in ms).

        Using for example an 8 node cluster where each node has 1 vote, expected_votes is set to 8 and quorate to 5. This condition allows a total failure of 3 nodes. If a 4th node fails, the cluster becomes inquorate and it will stop providing services.

        Enabling LMS allows the cluster to dynamically recalculate expected_votes and quorum under specific circumstances. It is essential to enable WFA when using LMS in High Availability clusters.

        Using the above 8 node cluster example, with LMS enabled the cluster can retain quorum and continue operating by losing, in a cascade fashion, up to 6 nodes with only 2 remaining active.

        Example chain of events:

           1) cluster is fully operational with 8 nodes.
              (expected_votes: 8 quorum: 5)
        
           2) 3 nodes die, cluster is quorate with 5 nodes.
        
           3) after last_man_standing_window timer expires,
              expected_votes and quorum are recalculated.
              (expected_votes: 5 quorum: 3)
        
           4) at this point, 2 more nodes can die and
              cluster will still be quorate with 3.
        
           5) once again, after last_man_standing_window
              timer expires expected_votes and quorum are
              recalculated.
              (expected_votes: 3 quorum: 2)
        
           6) at this point, 1 more node can die and
              cluster will still be quorate with 2.
        
           7) one more last_man_standing_window timer
              (expected_votes: 2 quorum: 2)
        

        NOTES:

        In order for the cluster to downgrade automatically from 2 nodes to a 1 node cluster, the auto_tie_breaker feature must also be enabled (see below).

        If auto_tie_breaker is not enabled, and one more failure occurs, the remaining node will not be quorate.

        LMS does not work with asymmetric voting schemes, each node must vote 1.

        LMS is also incompatible with quorum devices, if last_man_standing is specified in corosync.conf then the quorum device will be disabled.

      • auto_tie_breaker

        Enables Auto Tie Breaker (ATB) feature (default: 0).

        The general behaviour of votequorum allows a simultaneous node failure up to 50% - 1 node, assuming each node has 1 vote.

        When enabled, the cluster can suffer up to 50% of the nodes failing at the same time, in a deterministic fashion. The cluster partition, or the set of nodes that are still in contact with the nodeid configured in auto_tie_breaker_node (or lowest nodeid if not set), will remain quorate. The other nodes will be inquorate.

      • auto_tie_breaker_node: lowest|highest|<list of node IDs>

        When the nodes become partitioned: lowest (the default) makes the partition containing the lowest node id quorate; highest makes the partition containing the highest node id quorate; <list of node IDs> gives an explicit priority order (space separated; node ids can be looked up with pcs quorum status)
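
      A concrete example of the pcs quorum update syntax shown above (a sketch; changing these options generally requires the cluster to be stopped first):

      pcs cluster stop --all
      pcs quorum update wait_for_all=1 auto_tie_breaker=1
      pcs cluster start --all
      pcs quorum config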

    • (3) Unblock quorum

      pcs cluster quorum unblock
    • (4) Manage the quorum device

      See 1.10.2 below for details

    Summary of quorum-related management commands:

    pcs quorum [config]
    pcs quorum status
    pcs quorum device status [--full]
    pcs quorum device add [<generic options>] model <device model> [<model options>]
    pcs quorum device update [<generic options>] [model <model options>]
    pcs quorum device remove
    pcs quorum expected-votes <votes>
    pcs quorum unblock [--force]
    pcs quorum update [auto_tie_breaker=[0|1]] [last_man_standing=[0|1]] [last_man_standing_window=[<time in ms>]] [wait_for_all=[0|1]]
  • 1.10.2 Quorum Device

    Starting with RHEL 7.4/CentOS 7.4, Pacemaker added the Quorum Device feature: an additional server acts as the quorum device, the existing nodes connect to it over the network, and the quorum device arbitrates.

    QDevice and QNetd take part in quorum decisions. With the help of the arbitrator corosync-qnetd, corosync-qdevice provides a configurable number of votes, allowing the cluster to survive more node failures than the standard quorum rules would permit.

    QNetd (corosync-qnetd): a systemd service that is not part of the cluster; a systemd daemon that provides votes to the corosync-qdevice daemon.

    QDevice (corosync-qdevice): a systemd service that runs alongside Corosync on every cluster node. It is the client of corosync-qnetd. QDevice can work with different arbitrators, but currently only QNetd is supported.

    The existing nodes are left untouched; a new machine is used to build the quorum device. Note: a cluster can connect to only one quorum device, but one quorum device can serve multiple clusters. So if there are several cluster environments, a single quorum device machine is enough to serve them all.

    Refer to: corosync-qdevice(8)

    Configure the quorum device host:

    1. Find an additional host (10.168.161.14) and install pcs and corosync-qnetd

      yum install pcs corosync-qnetd
    2. Start the pcsd service

      systemctl enable --now pcsd
    3. Firewall configuration

      # Allow the whole high-availability service
      firewall-cmd --add-service=high-availability
      
      # Or simply disable the firewall
      systemctl disable --now firewalld
    4. Configure the quorum device

      Currently only the net type of quorum device is supported; it offers the following two algorithms:

      • ffsplit: fifty-fifty split. Gives the vote to the partition with the highest number of active nodes.

      • lms: last-man-standing. If the node is the only one in the cluster that can see the qnetd server (the quorum device), it gets the vote.

      (1) Add and start a quorum device of type net, and enable it at boot

      ~] pcs qdevice setup model net --enable --start
      
      Quorum device 'net' initialized
      quorum device enabled
      Starting quorum device...
      quorum device started

      (2) After adding it, check the quorum device status

      ~] pcs qdevice status net --full
      
      QNetd address:                  *:5403
      TLS:                            Supported (client certificate required)
      Connected clients:              0
      Connected clusters:             0
      Maximum send/receive size:      32768/32768 bytes

      Summary of management commands for the quorum device node:

      pcs qdevice setup model <device model> [--enable] [--start]
      pcs qdevice status <device model> [--full] [<cluster_name>]
      pcs qdevice [start|stop|enable|disable|kill] <device model>
      pcs qdevice destroy <device model>
    5. Add the quorum device to the cluster

      (1) Authenticate the cluster against the quorum device node

      # Set the hacluster user's password
      rhel76-qnetd ~] echo '123qweQ.' | passwd --stdin hacluster
      
      # Configure hosts
      rhel76-qnetd ~] vi /etc/hosts
      ...
      10.168.161.12 rhel76-node01
      10.168.161.13 rhel76-node02
      10.168.161.14 rhel76-qnetd
      ...
      
      rhel76-node01 ~] vi /etc/hosts
      ...
      10.168.161.12 rhel76-node01
      10.168.161.13 rhel76-node02
      10.168.161.14 rhel76-qnetd
      ...
      
      rhel76-node02 ~] vi /etc/hosts
      ...
      10.168.161.12 rhel76-node01
      10.168.161.13 rhel76-node02
      10.168.161.14 rhel76-qnetd
      ...
      
      # Authenticate the new node: on any cluster node, run the following command to authenticate against the quorum device node
      rhel76-node01 ~] pcs cluster auth rhel76-qnetd

      (2) Add the quorum device

      pcs cluster stop --all
      pcs quorum device add model net host=rhel76-qnetd algorithm=ffsplit
      pcs cluster start --all

      (3) View the quorum configuration

      ~] pcs quorum config
      
      Options:
      Device:
      votes: 1
      Model: net
          algorithm: ffsplit
          host: rhel76-qnetd

      (4) View the quorum runtime status

      ~] pcs quorum status
      
      Quorum information
      ------------------
      Date:             Sun Mar 27 16:39:29 2022
      Quorum provider:  corosync_votequorum
      Nodes:            2
      Node ID:          2
      Ring ID:          1/240
      Quorate:          Yes
      
      Votequorum information
      ----------------------
      Expected votes:   3
      Highest expected: 3
      Total votes:      3
      Quorum:           2  
      Flags:            Quorate Qdevice 
      
      Membership information
      ----------------------
          Nodeid      Votes    Qdevice Name
               1          1    A,V,NMW rhel76-node01 (local)
               2          1    A,V,NMW rhel76-node02
               0          1            Qdevice

      NOTES:

      1. pcs quorum status is equivalent to running the corosync-quorumtool command directly

      2. Quorate: Yes means the cluster quorum state is healthy and the current node is healthy

      3. Qdevice flags:

        Flag                    Meaning
        A / NA (active)         Whether there is a connection between QDevice and Corosync
        V / NV (vote)           Whether the quorum device has cast a vote for the node; in a degraded two-node cluster, one node shows V and the other NV
        MW / NMW (master_wins)  Whether master wins is in effect
        NR (not registered)     The node is not using a quorum device

      (5) View the quorum device runtime status

      ~] pcs quorum device status 
      
      Qdevice information
      -------------------
      Model:                  Net
      Node ID:                2
      Configured node list:
          0   Node ID = 1
          1   Node ID = 2
      Membership node list:   1, 2
      
      Qdevice-net information
      ----------------------
      Cluster name:           Cluster-VSFTPD
      QNetd host:             rhel76-qnetd:5403
      Algorithm:              Fifty-Fifty split
      Tie-breaker:            Node with lowest node ID
      State:                  Connected

      Summary of quorum device configuration commands:

      pcs quorum device status [--full]
      pcs quorum device add [<generic options>] model <device model> [<model options>]
      pcs quorum device update [<generic options>] [model <model options>]
      pcs quorum device remove
      pcs quorum device heuristics remove

Demo 2 - RHEL 6.4 - Dual-node, dual-service, mutually redundant VSFTPD RHCS cluster

Hostname        Management IP    HeartBeat IP    Storage IP (Optional)
rhel64-node01   192.168.161.16   10.168.161.16   20.168.161.16
rhel64-node02   192.168.161.17   10.168.161.17   20.168.161.17

2.1 Configure time synchronization

Point both nodes at the same time source; either ntpd or a scheduled ntpdate run will do.

2.2 Configure host name resolution

Both nodes need this configuration; add the following two lines to /etc/hosts. Note that the heartbeat IPs are used here; if resources are limited they may be shared with the management IPs

~] vi /etc/hosts

10.168.161.16 rhel64-node01
10.168.161.17 rhel64-node02

2.3 Configure NIC bonding

NetworkManager must be disabled

If network redundancy is required, configure Team or Bonding. Refer to: Bonding or Team

2.4 Configure shared storage

The experiment below uses KVM virtual machines; following "Prepare shared storage" above, add two shared disks to both nodes. If iSCSI shared storage is needed instead, the configuration is the same as in 1.4 Configure shared storage

Run on the KVM host:

# Create the disk images
qemu-img create -f raw /var/lib/libvirt/images/rhel64-rhcs-10g-01.raw 10G
qemu-img create -f raw /var/lib/libvirt/images/rhel64-rhcs-10g-02.raw 10G

# Attach
virsh attach-disk --domain rhel64-01 --source /var/lib/libvirt/images/rhel64-rhcs-10g-01.raw --target vdb --targetbus virtio --driver qemu --subdriver raw --shareable --current
virsh attach-disk --domain rhel64-01 --source /var/lib/libvirt/images/rhel64-rhcs-10g-01.raw --target vdb --targetbus virtio --driver qemu --subdriver raw --shareable --config
virsh attach-disk --domain rhel64-01 --source /var/lib/libvirt/images/rhel64-rhcs-10g-02.raw --target vdc --targetbus virtio --driver qemu --subdriver raw --shareable --current
virsh attach-disk --domain rhel64-01 --source /var/lib/libvirt/images/rhel64-rhcs-10g-02.raw --target vdc --targetbus virtio --driver qemu --subdriver raw --shareable --config

virsh attach-disk --domain rhel64-02 --source /var/lib/libvirt/images/rhel64-rhcs-10g-01.raw --target vdb --targetbus virtio --driver qemu --subdriver raw --shareable --current
virsh attach-disk --domain rhel64-02 --source /var/lib/libvirt/images/rhel64-rhcs-10g-01.raw --target vdb --targetbus virtio --driver qemu --subdriver raw --shareable --config
virsh attach-disk --domain rhel64-02 --source /var/lib/libvirt/images/rhel64-rhcs-10g-02.raw --target vdc --targetbus virtio --driver qemu --subdriver raw --shareable --current
virsh attach-disk --domain rhel64-02 --source /var/lib/libvirt/images/rhel64-rhcs-10g-02.raw --target vdc --targetbus virtio --driver qemu --subdriver raw --shareable --config

Both nodes see the disks, which indicates the configuration is working:

~] lsblk

NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0                           11:0    1 1024M  0 rom  
vda                          252:0    0   20G  0 disk 
├─vda1                       252:1    0  500M  0 part /boot
└─vda2                       252:2    0 19.5G  0 part 
  ├─vg_rhel64-lv_root (dm-0) 253:0    0 17.6G  0 lvm  /
  └─vg_rhel64-lv_swap (dm-1) 253:1    0    2G  0 lvm  [SWAP]
vdb                          252:16   0   10G  0 disk 
vdc                          252:32   0   10G  0 disk 

2.5 Configure the file system

Run the creation steps on either node:

pvcreate /dev/vdb
vgcreate rhcs01 /dev/vdb
lvcreate -n data01 -l 100%FREE rhcs01
mkfs.ext4 /dev/mapper/rhcs01-data01

pvcreate /dev/vdc
vgcreate rhcs02 /dev/vdc
lvcreate -n data02 -l 100%FREE rhcs02
mkfs.ext4 /dev/mapper/rhcs02-data02

Export and import the volume groups so that both nodes can see the LVM metadata:

  • On the current node, deactivate and then export the volume groups:

    vgchange -an rhcs01 rhcs02
    vgexport rhcs01 rhcs02
  • On the other node, import and activate the volume groups:

    vgimport rhcs01 rhcs02
    vgchange -ay rhcs01 rhcs02

    Check:

    ~] lvs
      LV     VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
      data01 rhcs01 -wi-a----- <10.00g
      data02 rhcs02 -wi-a----- <10.00g
  • Once they are recognized correctly, deactivate the volume groups on all nodes

    vgchange -an rhcs01
    vgchange -an rhcs02

2.6 Configure the VSFTPD service

Follow 1.6 Configure the VSFTPD service. If "firewall configuration" is needed, make sure iptables contains the following rules:

-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp --dport 21 -j ACCEPT
-A OUTPUT -p tcp --sport 20 -j ACCEPT
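
On RHEL 6, passive-mode FTP through iptables additionally relies on the FTP connection-tracking helper; a common way to load it (a sketch, assuming the stock /etc/sysconfig/iptables-config) is:

~] vi /etc/sysconfig/iptables-config
IPTABLES_MODULES="nf_conntrack_ftp"

~] service iptables restart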

2.7 Configure the cluster

  • 2.7.1 Install the cluster packages

    yum groupinstall 'High Availability'
    yum install -y luci             # needed for the luci/conga web UI (install as needed, not required on every node)
    yum install -y lvm2-cluster     # needed if clvm is used (required on every node)

    If the firewall is enabled, rules must be added. There are two ways to configure the firewall:

    • Option 1, the lazy way: open all ports between the cluster nodes with no restrictions

      #  On rhel64-node01, trust rhel64-node02 (use the heartbeat IP)
      -A INPUT -s 10.168.161.17 -j ACCEPT
      -A OUTPUT -d 10.168.161.17 -j ACCEPT
      
      #  On rhel64-node02, trust rhel64-node01 (use the heartbeat IP)
      -A INPUT -s 10.168.161.16 -j ACCEPT
      -A OUTPUT -d 10.168.161.16 -j ACCEPT
    • Option 2: open only the specific ports

      Port        Protocol  Component
      5404, 5405  UDP       corosync/cman (cluster manager)
      21064       TCP       dlm
      16851       TCP       modclusterd
      11111       TCP       ricci (provides the interface for luci)
      8084 [1]    TCP       luci (conga web UI)

      Based on the ports listed above, the access policy for node02 can be configured on node01 as follows (the configuration on node02 is similar):

      -A INPUT -m state --state NEW -p udp -s <node02> -d <node01> -m multiport --dports 5404,5405 -j ACCEPT
      -A INPUT -m addrtype --dst-type MULTICAST -m state --state NEW -p udp -m multiport -s <node02> --dports 5404,5405 -j ACCEPT
      -A INPUT -m state --state NEW -p tcp -s <node02> -d <node01> -m multiport --dports 11111,21064,16851 -j ACCEPT
      -A INPUT -m state --state NEW -p tcp -s <IP_of_Luci_CLient> -d <IP_of_Luci_Listen> --dport 8084 -j ACCEPT
      -A INPUT -p igmp -j ACCEPT  # For igmp (Internet Group Management Protocol)

      The rules above are taken from the official Red Hat documentation; they can be simplified somewhat:

      -A INPUT -p udp -s <node02> -m multiport --dports 5404,5405 -j ACCEPT
      -A INPUT -p tcp -s <node02> -m multiport --dports 11111,21064,16851 -j ACCEPT
      -A INPUT -p igmp -j ACCEPT       # For igmp (Internet Group Management Protocol)
      
      -A INPUT -p tcp --dport 8084 -j ACCEPT  # if luci is installed

      [1] The port can be changed via "port = 8084" in the luci configuration file "/etc/sysconfig/luci"

  • 2.7.2 Initialize the cluster

    • (1) Start the ricci service

      Start the service and enable it at boot:

      chkconfig ricci on
      service ricci start
    • (2) Set the password for the ricci service user

      The ricci user is used for cluster authentication; its password is required when nodes are added to the cluster

      echo '123qweQ' | passwd ricci --stdin
    • (3) Node authentication

      Unlike RHCS 7, node authentication (entering the password) only happens later, when the cluster is created, nodes are added, the configuration file is synchronized, and so on.

  • 2.7.3 Create the cluster

    Cluster Operations:
          --createcluster <cluster>
                            Create a new cluster.conf (removing old one if it exists)
          --getversion      Get the current cluster.conf version
          --setversion <n>  Set the cluster.conf version
          --incversion      Increment the cluster.conf version by 1
          --startall        Start *AND* enable cluster services on reboot for all nodes
          --stopall         Stop *AND* disable cluster services on reboot for all nodes
          --start           Start *AND* enable cluster services on reboot for host specified with -h
          --stop            Stop *AND* disable cluster services on reboot for host specified with -h
    Node Operations:
          --lsnodes         List all nodes in the cluster
          --addnode <node>  Add node <node> to the cluster
          --rmnode <node>
                            Remove a node from the cluster
          --nodeid <nodeid> Specify nodeid when adding a node
          --votes <votes>   Specify number of votes when adding a node
          --addalt <node name> <alt name> [alt options]
                            Add an altname to a node for RRP
          --rmalt <node name>
                            Remove an altname from a node for RRP

    (1) Create

    Run the command on one of the nodes to create the cluster:

    # ccs -h <host> --createcluster <cluster_name>
    ccs -h rhel64-node01 --createcluster Cluster-VSFTPD  # <= enter the ricci user's password on rhel64-node01

    The command above actually creates a new configuration file /etc/cluster/cluster.conf on the rhel64-node01 node

    ~] cat /etc/cluster/cluster.conf
    ~] ccs -f /etc/cluster/cluster.conf --getconf   # view the given configuration file
    ~] ccs -h rhel64-node01 --getconf               # view the configuration file on the given node
    
    <?xml version="1.0"?>
    <cluster config_version="1" name="Cluster-VSFTPD">  
      <fence_daemon/>   
      <clusternodes/>  
      <cman/>  
      <fencedevices/>  
      <rm>    
        <failoverdomains/>    
        <resources/>    
      </rm>  
    </cluster>

    (2) Add nodes

    # ccs -h <host> --addnode <host> [--nodeid <node_id>] [--votes <votes>]
    # "--addnode": add a node (only one node can be added at a time); use "--rmnode" to remove a node
    # "--nodeid": specify the node id
    # "--votes": specify the node's number of votes
    
    ccs -h rhel64-node01 --addnode rhel64-node01
    ccs -h rhel64-node01 --addnode rhel64-node02
    ~] ccs -h localhost --lsnodes
    
    rhel64-node01: nodeid=1
    rhel64-node02: nodeid=2
    
    ~] ccs -h rhel64-node01 --getconf
    
    <cluster config_version="3" name="Cluster-VSFTPD">  
      <fence_daemon/>  
      <clusternodes>    
        <clusternode name="rhel64-node01" nodeid="1"/>    # < 新增的行
        <clusternode name="rhel64-node02" nodeid="2"/>    # < 新增的行
      </clusternodes>  
      <cman/>  
      <fencedevices/>  
      <rm>    
        <failoverdomains/>    
        <resources/>    
      </rm>  
    </cluster>

    NOTES: Looking at /etc/cluster/cluster.conf you can see that two clusternode lines have been added and that config_version has gone from 1 to 3. Every modification of the cluster configuration file by any node increments this value by 1; when the configuration is later synchronized between the nodes, the config_version value decides which copy is the "newest".
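
    At this point the configuration only exists on rhel64-node01. It can be pushed to the other nodes (and activated) with the standard ccs --sync --activate options, which is also what keeps config_version consistent across the cluster:

    ccs -h rhel64-node01 --sync --activate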

2.8 Configure resources

Service Operations:
      --lsserviceopts [service type]
                        List available services.  If a service type is
                        specified, then list options for the specified
                        service type
      --lsservices      List currently configured services and resources in
                        the cluster
      --addresource <resource type> [resource options] ...
                        Add global cluster resources to the cluster
                        Resource types and variables can be found in the
                        online documentation under 'HA Resource Parameters'
      --rmresource <resource type> [resource options]
                        Remove specified resource with resource options
      --addservice <servicename> [service options] ...
                        Add service to cluster
      --rmservice <servicename>
                        Removes a service and all of its subservices
      --addvm <virtual machine name> [vm options] ...
                        Add a virtual machine to the cluster
      --rmvm <virtual machine name>
                        Removes named virtual machine from the cluster
      --addsubservice <servicename> <subservice> [service options] ...
                        Add individual subservices, if adding child services,
                        use ':' to separate parent and child subservices
                        and brackets to identify subservices of the same type

                        Subservice types and variables can be found in the
                        online documentation in 'HA Resource Parameters'

                        To add a nfsclient subservice as a child of the 2nd
                        nfsclient subservice in the 'service_a' service use
                        the following example: --addsubservice service_a \
                                               nfsclient[1]:nfsclient \
                                               ref=/test
      --rmsubservice <servicename> <subservice>
                        Removes a specific subservice specified by the
                        subservice, using ':' to separate elements and
                        brackets to identify between subservices of the
                        same type.
                        To remove the 1st nfsclient child subservice
                        of the 2nd nfsclient subservice in the 'service_a'
                        service, use the following example:
                                            --rmsubservice service_a \
                                            nfsclient[1]:nfsclient
  • 2.8.1 Preparation

    About resource and service: multiple resources can be bound together and created as a single service, similar to a "resource group" in RHCS 7.

    ccs -h <host> --lsresourceopt       # list all supported resources
    ccs -h <host> --lsresourceopt ip    # list the configuration options of the given resource
    
    ccs -h <host> --lsservices          # list all configured resources and services
    ccs -h <host> --addresource resourcetype [resource options]   # add
    ccs -h <host> --rmresource resourcetype [resource options]    # remove
    ~] ccs -h rhel64-node01 --lsresourceopt
    
    service - Defines a service (resource group).
    ASEHAagent - Sybase ASE Failover Instance
    SAPDatabase - Manages any SAP database (based on Oracle, MaxDB, or DB2)
    SAPInstance - SAP instance resource agent
    apache - Defines an Apache web server
    clusterfs - Defines a cluster file system mount.
    fs - Defines a file system mount.
    ip - This is an IP address.
    lvm - LVM Failover script
    mysql - Defines a MySQL database server
    named - Defines an instance of named server
    netfs - Defines an NFS/CIFS file system mount.
    nfsclient - Defines an NFS client.
    nfsexport - This defines an NFS export.
    nfsserver - This defines an NFS server resource.
    openldap - Defines an Open LDAP server
    oracledb - Oracle 10g Failover Instance
    orainstance - Oracle 10g Failover Instance
    oralistener - Oracle 10g Listener Instance
    postgres-8 - Defines a PostgreSQL server
    samba - Dynamic smbd/nmbd resource agent
    script - LSB-compliant init script as a clustered resource.
    tomcat-6 - Defines a Tomcat server
    vm - Defines a Virtual Machine
  • 2.8.2 Add IP resources

    ~] ccs -h rhel64-node01 --lsserviceopts ip
    
    ip - This is an IP address.
      Required Options:
        address: IP Address
      Optional Options:
        family: Family
        monitor_link: Monitor NIC Link
        nfslock: Enable NFS lock workarounds
        sleeptime: Amount of time (seconds) to sleep.
        disable_rdisc: Disable updating of routing using RDISC protocol
        prefer_interface: Network interface
        __independent_subtree: Treat this and all children as an independent subtree.
        __enforce_timeouts: Consider a timeout for operations as fatal.
        __max_failures: Maximum number of failures before returning a failure to a status check.
        __failure_expire_time: Amount of time before a failure is forgotten.
        __max_restarts: Maximum number restarts for an independent subtree before giving up.
        __restart_expire_time: Amount of time before a failure is forgotten for an independent subtree.

    使用以下命令 添加/删除 IP 资源:

    # 添加
    ccs -h rhel64-node01 --addresource ip address="192.168.161.18/24" family=ipv4 monitor_link=1 sleeptime=10 prefer_interface=eth0
    ccs -h rhel64-node01 --addresource ip address="192.168.161.19/24" family=ipv4 monitor_link=1 sleeptime=10 prefer_interface=eth0
    
    # 删除
    # ccs -h <host> --rmresource <resourcetype> [resource options]
    ccs -h rhel64-node01 --rmresource ip address="192.168.161.18/24"
    ccs -h rhel64-node01 --rmresource ip address="192.168.161.19/24"
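
    添加后可以做一个简单的本地检查 (示例, 假设管理网段地址已配置在 eth0 上; ip 资源代理按子网匹配网卡, prefer_interface 进一步限定接口):

    ip addr show eth0 | grep "192.168.161."      # eth0 上应已有 192.168.161.0/24 网段的地址
    ccs -h rhel64-node01 --lsservices            # 确认两个 ip 资源均已添加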
  • 2.8.3 添加 HA-LVM

    将卷组交由 RHCS 集群管理, 需先解除本地 LVM 对卷组的管理, 然后配置集群资源管理卷组。RHCS 6 中有两种方法配置 HA-LVM:

    • (Preferred) 使用 CLVM 在节点上管理 LVM (此时节点会独占卷组上所有的逻辑卷)

      1. 安装软件包

        yum groupinstall "Resilient Storage"
        # or
        yum install lvm2-cluster
      2. 修改 lvm 配置

        ~] vi /etc/lvm/lvm.conf
        
        # locking_type = 1
        locking_type = 3
      3. 需要启动 clvmd

        service clvmd start
        chkconfig clvmd on
      4. 创建卷组时的注意点

        示例:

        pvcreate /dev/vdb1
        vgcreate -cy shared_vg /dev/vdb1    # 此时要为卷组指定 -c, --clustered {y|n}
        lvcreate -L 10G -n ha_lv shared_vg
        mkfs.ext4 /dev/shared_vg/ha_lv
        lvchange -an shared_vg/ha_lv
    • 使用 LVM 本地 tag 管理

      1. 修改 lvm 配置

        ~] vi /etc/lvm/lvm.conf
        
        locking_type = 1
        use_lvmetad = 0
        volume_list = [ "VolGroup00", "@rhel64-node01" ] # 填写本机使用的卷组, 集群管理的卷组不能写进去
                                                         # 同时填写主机名, 要与集群配置的节点名称一致
        
        # 另一个节点使用:  volume_list = [ "VolGroup00", "@rhel64-node02" ] 

        使用 lvmconf --enable-halvm 命令可以直接把 locking_type 和 use_lvmetad 配置好

      2. 重建 initramfs

        cp /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.$(date +%m-%d-%H%M%S).bak
        dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)
      3. reboot
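
      重启后可按以下思路做一个简单验证 (示例, 假设集群卷组为 rhcs01 / rhcs02):

        grep volume_list /etc/lvm/lvm.conf            # 确认 volume_list 未包含集群卷组
        vgs -o vg_name,vg_tags                        # 集群接管后, 活动节点上的集群卷组会带有本机主机名 tag
        lvs -o lv_name,vg_name,lv_attr rhcs01 rhcs02  # lv_attr 第 5 位为 "a" 表示已激活; 未被集群接管时应为 "-"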

    配置完以后, 添加 HA-LVM 到集群:

    ~] ccs -h rhel64-node01 --lsserviceopt lvm
     lvm - LVM Failover script
     Required Options:
       name: Name
       vg_name: Volume group name
     Optional Options:
       lv_name: Logical Volume name (optional).
       self_fence: Fence the node if it is not able to clean up LVM tags
       nfslock: Enable NFS lock workarounds
       __independent_subtree: Treat this and all children as an independent subtree.
       __enforce_timeouts: Consider a timeout for operations as fatal.
       __max_failures: Maximum number of failures before returning a failure to a status check.
       __failure_expire_time: Amount of time before a failure is forgotten.
       __max_restarts: Maximum number restarts for an independent subtree before giving up.
       __restart_expire_time: Amount of time before a failure is forgotten for an independent subtree.
    
    
    ~] ccs -h rhel64-node01 --addresource lvm name="LVM_RHCS01" vg_name="rhcs01" lv_name="data01" self_fence=1
    ~] ccs -h rhel64-node01 --addresource lvm name="LVM_RHCS02" vg_name="rhcs02" lv_name="data02" self_fence=1
  • 2.8.4 添加 FileSystem

    ~] ccs -h rhel64-node01 --lsserviceopt fs
    
    fs - Defines a file system mount.
      Required Options:
        name: File System Name
        mountpoint: Mount Point
        device: Device or Label
      Optional Options:
        fstype: File system type
        force_unmount: Force Unmount
        quick_status: Quick/brief status checks.
        self_fence: Seppuku Unmount
        nfslock: Enable NFS lock workarounds
        nfsrestart: Enable NFS daemon and lockd workaround
        fsid: NFS File system ID
        force_fsck: Force fsck support
        options: Mount Options
        __independent_subtree: Treat this and all children as an independent subtree.
        __enforce_timeouts: Consider a timeout for operations as fatal.
        __max_failures: Maximum number of failures before returning a failure to a status check.
        __failure_expire_time: Amount of time before a failure is forgotten.
        __max_restarts: Maximum number restarts for an independent subtree before giving up.
        __restart_expire_time: Amount of time before a failure is forgotten for an independent subtree.
    
    
    ~] ccs -h rhel64-node01 --addresource fs name="FS_data01" mountpoint="/data01" device="/dev/mapper/rhcs01-data01" fstype="ext4" self_fence=1 force_fsck=1
    ~] ccs -h rhel64-node01 --addresource fs name="FS_data02" mountpoint="/data02" device="/dev/mapper/rhcs02-data02" fstype="ext4" self_fence=1 force_fsck=1
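
    添加前也可以先在当前激活了对应逻辑卷的节点上手工验证一次挂载 (示例, 挂载点需提前在两个节点上创建):

    mkdir -p /data01 /data02
    mount -t ext4 /dev/mapper/rhcs01-data01 /data01 && umount /data01
    mount -t ext4 /dev/mapper/rhcs02-data02 /data02 && umount /data02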
  • 2.8.5 添加 VSFTPD

    RHCS 6 中无法直接把系统 (init) 服务作为资源添加到集群, 需要使用 script 资源来代替。

    1. 从 /etc/init.d/vsftpd 复制两份出来, 分别作为两个 VSFTPD 业务的服务文件 (启动脚本); 两个节点上都需要这两份脚本及对应的配置文件

      cp -a /etc/init.d/vsftpd /etc/init.d/vsftpd_01
      cp -a /etc/init.d/vsftpd /etc/init.d/vsftpd_02
    2. 修改服务文件, 保证只按指定的配置文件启动 VSFTPD

      将原有的 CONFS 行注释, 新增一行 CONFS, 使其只匹配本实例的配置文件 (两个实例的最小配置文件示例见本列表之后):

      ~] vi /etc/init.d/vsftpd_01
      ...
      # CONFS=`ls /etc/vsftpd/*.conf 2>/dev/null`
      CONFS=`ls /etc/vsftpd/vsftpd_01.conf 2>/dev/null`
      ...
      
      ~] vi /etc/init.d/vsftpd_02
      ...
      # CONFS=`ls /etc/vsftpd/*.conf 2>/dev/null`
      CONFS=`ls /etc/vsftpd/vsftpd_02.conf 2>/dev/null`
      ...
    3. 添加 script 到集群

      ~] ccs -h rhel64-node01 --lsserviceopt script
      script - LSB-compliant init script as a clustered resource.
        Required Options:
          name: Name
          file: Path to script
        Optional Options:
          service_name: Inherit the service name.
          __independent_subtree: Treat this and all children as an independent subtree.
          __enforce_timeouts: Consider a timeout for operations as fatal.
          __max_failures: Maximum number of failures before returning a failure to a status check.
          __failure_expire_time: Amount of time before a failure is forgotten.
          __max_restarts: Maximum number restarts for an independent subtree before giving up.
          __restart_expire_time: Amount of time before a failure is forgotten for an independent subtree.
      
      
      ~] ccs -h rhel64-node01 --addresource script name="VSFTPD_01" file="/etc/init.d/vsftpd_01"
      ~] ccs -h rhel64-node01 --addresource script name="VSFTPD_02" file="/etc/init.d/vsftpd_02"
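
    配置文件的一个最小示例如下 (仅为思路示意, 监听地址与数据目录为本文假设, 请按实际环境调整; vsftpd_02.conf 将地址/目录换成业务 2 对应的值即可)。注意: 监听地址依赖浮动 IP, 这两个脚本只应由集群启动, 不要再用 chkconfig 设为开机自启:

    # /etc/vsftpd/vsftpd_01.conf (示例)
    listen=YES
    listen_address=192.168.161.18        # 假设: 绑定业务 1 的浮动 IP
    anonymous_enable=YES
    anon_root=/data01/ftp                # 假设: 数据目录放在集群文件系统 /data01 下
    local_enable=YES
    write_enable=YES
    xferlog_enable=YES
    xferlog_file=/var/log/vsftpd_01.log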

    添加完 IP, LVM, FS 和 SCRIPT 后, 配置文件内容如下:

    ~] ccs -h rhel64-node01 --getconf
    
    <cluster config_version="15" name="Cluster-VSFTPD">  
      <fence_daemon/>  
      <clusternodes>    
        <clusternode name="rhel64-node01" nodeid="1"/>    
        <clusternode name="rhel64-node02" nodeid="2"/>    
      </clusternodes>  
      <cman/>  
      <fencedevices/>  
      <rm>    
        <failoverdomains/>    
        <resources>      
          <ip address="192.168.161.18/24" family="ipv4" monitor_link="1" prefer_interface="eth0" sleeptime="10"/>      <!-- IP -->
          <ip address="192.168.161.19/24" family="ipv4" monitor_link="1" prefer_interface="eth0" sleeptime="10"/>      <!-- IP -->
          <lvm lv_name="data01" name="LVM_RHCS01" self_fence="1" vg_name="rhcs01"/>      <!-- LVM -->
          <lvm lv_name="data02" name="LVM_RHCS02" self_fence="1" vg_name="rhcs02"/>      <!-- LVM -->
          <fs device="/dev/mapper/rhcs01-data01" fstype="ext4" mountpoint="/data01" name="FS_data01" self_fence="1"/>      <!-- FS -->
          <fs device="/dev/mapper/rhcs02-data02" fstype="ext4" mountpoint="/data02" name="FS_data02" self_fence="1"/>      <!-- FS -->
          <script file="/etc/init.d/vsftpd_01" name="VSFTPD_01"/>      <!-- SCRIPT -->
          <script file="/etc/init.d/vsftpd_02" name="VSFTPD_02"/>      <!-- SCRIPT -->
        </resources>    
      </rm>  
    </cluster>
    ~] ccs -h rhel64-node01 --lsservices
    
    resources: 
      ip: monitor_link=1, sleeptime=10, prefer_interface=eth0, family=ipv4, address=192.168.161.18/24
      ip: monitor_link=1, sleeptime=10, prefer_interface=eth0, family=ipv4, address=192.168.161.19/24
      lvm: name=LVM_RHCS01, self_fence=1, vg_name=rhcs01, lv_name=data01
      lvm: name=LVM_RHCS02, self_fence=1, vg_name=rhcs02, lv_name=data02
      fs: name=FS_data01, device=/dev/mapper/rhcs01-data01, mountpoint=/data01, self_fence=1, fstype=ext4
      fs: name=FS_data02, device=/dev/mapper/rhcs02-data02, mountpoint=/data02, self_fence=1, fstype=ext4
      script: name=VSFTPD_01, file=/etc/init.d/vsftpd_01
      script: name=VSFTPD_02, file=/etc/init.d/vsftpd_02

2.9 配置 Fence

  • 前言

    RHCS 6 配置 Fence 时, 有两种配置方式。以双节点为例:

    • 方式一: 配置一个 Fence 设备, 两个节点作为两个实例添加到该 Fence 设备。适用于选择 vCenter/ESXi/KVM 等虚拟化平台或者集中式电源管理作为 Fence 设备的情况。配置示例:

          <clusternode name="rhel64-node01" nodeid="1">      
              <fence>        
                  <method name="xvm_method">          
                      <device name="XVM_FENCE" port="rhel64-01"/>          
                  </method>        
              </fence>      
          </clusternode>    
          <clusternode name="rhel64-node02" nodeid="2">      
              <fence>        
                  <method name="xvm_method">          
                      <device name="XVM_FENCE" port="rhel64-02"/>          
                  </method>        
              </fence>      
          </clusternode> 
      ...
      <fencedevices>    
              <fencedevice agent="fence_xvm" name="XVM_FENCE"/>    
      </fencedevices> 
    • 方式二: 配置两个 Fence 设备, 两个节点分别使用不同的 Fence 设备。适用于使用物理机 IPMI/带外/管理 接口作为 Fence 设备的情况。vCenter/ESXi/KVM 同样适用。配置示例:

          <clusternode name="rhel64-node01" nodeid="1" votes="1">
              <fence>
                  <method name="xvm_method">
                      <device delay="5" name="fencedev1"/>
                  </method>
              </fence>
              </clusternode>
          <clusternode name="rhel64-node02" nodeid="2" votes="1">
              <fence>
                  <method name="xvm_method">
                      <device name="fencedev2"/>
                  </method>
              </fence>
          </clusternode>
      ...
      <fencedevices>
          <fencedevice agent="fence_xvm" name="XVM_FENCE_1" port="rhel64-01"/>
          <fencedevice agent="fence_xvm" name="XVM_FENCE_2" port="rhel64-02"/>
      </fencedevices>


    配置语法:

    Fencing Operations:
          --lsfenceopts [fence type]
                            List available fence devices.  If a fence type is
                            specified, then list options for the specified
                            fence type
          --lsfencedev      List all of the fence devices configured
          --lsfenceinst [<node>]
                            List all of the fence methods and instances on the
                            specified node or all nodes if no node is specified
          --addmethod <method> <node>
                            Add a fence method to a specific node
          --rmmethod <method> <node>
                            Remove a fence method from a specific node
          --addfencedev <device name> [fence device options]
                            Add fence device. Fence devices and parameters can be
                            found in online documentation in 'Fence Device
                            Parameters'
          --rmfencedev <fence device name>
                            Remove fence device
          --addfenceinst <fence device name> <node> <method> [options]
                            Add fence instance. Fence instance parameters can be
                            found in online documentation in 'Fence Device
                            Parameters'
          --rmfenceinst <fence device name> <node> <method>
                            Remove all instances of the fence device listed from
                            the given method and node
          --addunfenceinst <fence device name> <node> [options]
                            Add an unfence instance
          --rmunfenceinst <fence device name> <node>
                            Remove all instances of the fence device listed from
                            the unfence section of the node
    

    常用的 Fence 设备:

    ~] ccs -h rhel64-node01 --lsfenceopt
    
    ...
    fence_ipmilan - Fence agent for IPMI over LAN
    fence_vmware_soap - Fence agent for VMWare over SOAP API
    fence_xvm - Fence agent for virtual machines
  • 前置操作

    参考 1.9 配置 Fence 1.9.3 前置配置 中的前置操作

    When using SELinux with the High Availability Add-On in a VM environment, you should ensure that the SELinux boolean fenced_can_network_connect is persistently set to on. This allows the fence_xvm fencing agent to work properly, enabling the system to fence virtual machines.

    关于 post_fail_delay 和 post_join_delay 两个参数:

    • post_fail_delay: the number of seconds the fence daemon ( fenced ) waits before fencing a node (a member of the fence domain) after the node has failed (default 0) .
    • post_join_delay: the number of seconds the fence daemon ( fenced ) waits before fencing a node after the node joins the fence domain. The post_join_delay default value is 6. A typical setting for post_join_delay is between 20 and 30 seconds, but can vary according to cluster and network performance.

    这两个参数需要同时设置, 如果只单独设置一个, 另一个会重置为默认值

    ccs -h rhel64-node01 --setfencedaemon post_fail_delay=0 post_join_delay=25
  • 使用 vCenter 作为 Fence 设备

    ] ccs -h rhel64-node01 --lsfenceopt fence_vmware_soap
    
    fence_vmware_soap - Fence agent for VMWare over SOAP API
      Required Options:
      Optional Options:
        option: No description available
        action: Fencing Action
        ipaddr: IP Address or Hostname
        login: Login Name
        passwd: Login password or passphrase
        passwd_script: Script to retrieve password
        ssl: SSL connection
        port: Physical plug number or name of virtual machine
        uuid: The UUID of the virtual machine to fence.
        ipport: TCP port to use for connection with device
        verbose: Verbose mode
        debug: Write debug information to given file
        version: Display version information and exit
        help: Display help and exit
        separator: Separator for CSV created by operation list
        power_timeout: Test X seconds for status change after ON/OFF
        shell_timeout: Wait X seconds for cmd prompt after issuing command
        login_timeout: Wait X seconds for cmd prompt after login
        power_wait: Wait X seconds after issuing ON/OFF
        delay: Wait X seconds before fencing is started
        retry_on: Count of attempts to retry power on
    # Example
    # Hostname: node01,node02; 
    # VM name: vm-node01,vm-node02
    
    # 找到虚拟机
    ~] fence_vmware_soap -a 192.168.163.252 -z -l [email protected] -p 1qaz@WSX4rfv -o list
    ...
    vm-node01,422ad512-3ce5-c046-0046-9516094be718
    vm-node02,422ac3f0-e2f9-31a7-1816-7980e4757b80
    ...
    
    # 创建 fence 设备
    ~] ccs -h node01 --addfencedev VC_Fence agent=fence_vmware_soap ipaddr="192.168.163.252" login="[email protected]" passwd="1qaz@WSX4rfv" action="reboot"
    
    # 为节点添加一个 method
    ~] ccs -h node01 --addmethod method_name node01
    ~] ccs -h node01 --addmethod method_name node02
    
    # 添加实例
    ~] ccs -h node01 --addfenceinst VC_Fence node01 method_name port=vm-node01 ssl=on uuid=422ad512-3ce5-c046-0046-9516094be718
    ~] ccs -h node01 --addfenceinst VC_Fence node02 method_name port=vm-node02 ssl=on uuid=422ac3f0-e2f9-31a7-1816-7980e4757b80
    
    # 删除
    ccs -h <host> --rmmethod <method> <node>
    ccs -h <host> --rmfenceinst <fence device name> <node> <method>
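
    实例添加完成后, 可用 fence agent 手工确认能查询到虚拟机的电源状态 (示例, 参数沿用上面 -o list 的示例, -n 对应前面列出的 port 参数):

    fence_vmware_soap -a 192.168.163.252 -z -l [email protected] -p 1qaz@WSX4rfv -o status -n vm-node01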
  • ipmi: fence_ipmilan

    ~] ccs -h rhel64-node01 --lsfenceopt fence_ipmilan
    
    fence_ipmilan - Fence agent for IPMI over LAN
      Required Options:
      Optional Options:
        option: No description available
        auth: IPMI Lan Auth type (md5, password, or none)
        ipaddr: IPMI Lan IP to talk to
        passwd: Password (if required) to control power on IPMI device
        passwd_script: Script to retrieve password (if required)
        lanplus: Use Lanplus
        login: Username/Login (if required) to control power on IPMI device
        action: Operation to perform. Valid operations: on, off, reboot, status, list, diag, monitor or metadata
        timeout: Timeout (sec) for IPMI operation
        cipher: Ciphersuite to use (same as ipmitool -C parameter)
        method: Method to fence (onoff or cycle)
        power_wait: Wait X seconds after on/off operation
        delay: Wait X seconds before fencing is started
        privlvl: Privilege level on IPMI device
        verbose: Verbose mode
    # 验证
    ~] ipmitool -I lanplus -H x.x.x.x -U root -P 'Yth@2019' -v chassis power status
    
    # 创建 Fence 设备
    ccs -h node01 --addfencedev IPMI_Fence_01 agent=fence_ipmilan ipaddr="192.168.1.10" auth="password" login="admin" passwd="passw0rd" lanplus=1 power_wait=4
    ccs -h node01 --addfencedev IPMI_Fence_02 agent=fence_ipmilan ipaddr="192.168.1.11" auth="password" login="admin" passwd="passw0rd" lanplus=1 power_wait=4
    
    # 添加 method 和 instances
    ccs -h node01 --addmethod ipmi_method node01
    ccs -h node01 --addmethod ipmi_method node02
    
    ccs -h node01 --addfenceinst IPMI_Fence_01 node01 ipmi_method
    ccs -h node01 --addfenceinst IPMI_Fence_02 node02 ipmi_method
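
    也可以先用 fence agent 直接验证带外口可达 (示例, 地址与凭据沿用上面创建 Fence 设备时的假设值):

    fence_ipmilan -a 192.168.1.10 -l admin -p passw0rd -P -o status
    fence_ipmilan -a 192.168.1.11 -l admin -p passw0rd -P -o status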
  • KVM 虚拟机: fence_xvm

    1. 从 KVM 宿主机 (已配置 fence_virtd) 获取 Key 文件

      rhel64-node01 ~] scp {kvm_host}:/etc/cluster/fence_xvm.key /etc/cluster/
      rhel64-node02 ~] scp {kvm_host}:/etc/cluster/fence_xvm.key /etc/cluster/
    2. 验证本地能通过以下命令获取到各个节点信息, 并且状态 on

      ~] fence_xvm -o list
      rhel64-01            1cdcf5d4-d6f6-4251-9864-ec4b516fd344 on
      rhel64-02            999303cd-a80e-4a44-af38-b15fe7302f86 on
    3. 添加 Fence device, method, instance

      ccs -h rhel64-node01 --addfencedev XVM_FENCE_01 agent="fence_xvm" key_file="/etc/cluster/fence_xvm.key" port="rhel64-01"
      ccs -h rhel64-node01 --addfencedev XVM_FENCE_02 agent="fence_xvm" key_file="/etc/cluster/fence_xvm.key" port="rhel64-02"
      
      ccs -h rhel64-node01 --addmethod xvm_method rhel64-node01
      ccs -h rhel64-node01 --addmethod xvm_method rhel64-node02
      
      ccs -h rhel64-node01 --addfenceinst XVM_FENCE_01 rhel64-node01 xvm_method
      ccs -h rhel64-node01 --addfenceinst XVM_FENCE_02 rhel64-node02 xvm_method
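
      添加完成后, 可在任一节点上用 fence_xvm 手工查询对端虚拟机状态做验证 (示例):

      fence_xvm -k /etc/cluster/fence_xvm.key -H rhel64-02 -o status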
  • 后置操作

    检查/测试 fence 状态:

    ~] fence_check     # 需要启动集群才能验证
    
    fence_check run at Wed Oct 14 14:49:47 CST 2020 pid: 19117
    Testing node03 method 1: success
    Testing node04 method 1: success

    测试 Fence 某个节点:

    ~] fence_node node01
    ~] fence_node -vv node01

2.10 配置故障切换域

Failover Domain Operations:
      --lsfailoverdomain
                        Lists all of the failover domains and failover domain
                        nodes configured in the cluster
      --addfailoverdomain <name> [restricted] [ordered] [nofailback]
                        Add failover domain
      --rmfailoverdomain <name>
                        Remove failover domain
      --addfailoverdomainnode <failover domain> <node> [priority]
                        Add node to given failover domain
      --rmfailoverdomainnode <failover domain> <node>
                        Remove node from failover domain

关于参数解释:

  1. restricted: 配置此参数, 集群服务限制在该故障切换域内运行; 如果域中无可用成员, 则服务启动失败。
  2. ordered: 配置此参数, 故障切换域成员按列表顺序排优先级, 列表顶端的成员是首选成员, 接下来是列表中的第二个成员, 依此类推。
  3. nofailback: 配置此参数, 故障节点恢复后, 服务不切回到原来节点上运行

创建故障切换域:

ccs -h rhel64-node01 --addfailoverdomain VSFTPD_Domain_01 restricted ordered

ccs -h rhel64-node01 --addfailoverdomain VSFTPD_Domain_02 restricted ordered

添加域成员, 并指定顺序:

ccs -h rhel64-node01 --addfailoverdomainnode VSFTPD_Domain_01 rhel64-node01 1 
ccs -h rhel64-node01 --addfailoverdomainnode VSFTPD_Domain_01 rhel64-node02 2 

ccs -h rhel64-node01 --addfailoverdomainnode VSFTPD_Domain_02 rhel64-node02 1 
ccs -h rhel64-node01 --addfailoverdomainnode VSFTPD_Domain_02 rhel64-node01 2 

添加完以后, 查看配置情况:

~] ccs -h rhel64-node01 --lsfailoverdomain

VSFTPD_Domain_01: restricted=1, ordered=1, nofailback=0
  rhel64-node01: priority=1
  rhel64-node02: priority=2
VSFTPD_Domain_02: restricted=1, ordered=1, nofailback=0
  rhel64-node02: priority=1
  rhel64-node01: priority=2

~] ccs -h rhel64-node01 --getconf

<cluster config_version="21" name="Cluster-VSFTPD">  
  <fence_daemon/>  
  <clusternodes>    
    <clusternode name="rhel64-node01" nodeid="1"/>    
    <clusternode name="rhel64-node02" nodeid="2"/>    
  </clusternodes>  
  <cman/>  
  <fencedevices/>  
  <rm>    
    <failoverdomains>      
      <failoverdomain name="VSFTPD_Domain_01" nofailback="0" ordered="1" restricted="1">        <!-- Failback Domain -->
        <failoverdomainnode name="rhel64-node01" priority="1"/>        
        <failoverdomainnode name="rhel64-node02" priority="2"/>        
      </failoverdomain>      
      <failoverdomain name="VSFTPD_Domain_02" nofailback="0" ordered="1" restricted="1">        <!-- Failback Domain -->
        <failoverdomainnode name="rhel64-node02" priority="1"/>        
        <failoverdomainnode name="rhel64-node01" priority="2"/>        
      </failoverdomain>      
    </failoverdomains>    
    <resources>      
      <ip address="192.168.161.18/24" family="ipv4" monitor_link="1" prefer_interface="eth0" sleeptime="10"/>      
      <ip address="192.168.161.19/24" family="ipv4" monitor_link="1" prefer_interface="eth0" sleeptime="10"/>      
      <lvm lv_name="data01" name="LVM_RHCS01" self_fence="1" vg_name="rhcs01"/>      
      <lvm lv_name="data02" name="LVM_RHCS02" self_fence="1" vg_name="rhcs02"/>      
      <fs device="/dev/mapper/rhcs01-data01" fstype="ext4" mountpoint="/data01" name="FS_data01" self_fence="1"/>      
      <fs device="/dev/mapper/rhcs02-data02" fstype="ext4" mountpoint="/data02" name="FS_data02" self_fence="1"/>      
      <script file="/etc/init.d/vsftpd_01" name="VSFTPD_01"/>      
      <script file="/etc/init.d/vsftpd_02" name="VSFTPD_02"/>      
    </resources>    
  </rm>  
</cluster>

2.11 配置仲裁

Quorum Disk is a disk-based quorum daemon, qdiskd, that provides supplemental heuristics to determine node fitness. With heuristics you can determine factors that are important to the operation of the node in the event of a network partition. For example, in a four-node cluster with a 3:1 split, ordinarily, the three nodes automatically "win" because of the three-to-one majority. Under those circumstances, the one node is fenced. With qdiskd however, you can set up heuristics that allow the one node to win based on access to a critical resource (for example, a critical network path). If your cluster requires additional methods of determining node health, then you should configure qdiskd to meet those needs.

仲裁磁盘由基于磁盘的仲裁守护进程 qdiskd 实现, 它提供补充的试探法 (heuristics) 来判断节点是否正常运作。借助试探法, 您可以把网络分区时对节点运行至关重要的因素纳入判断。例如: 在一个按 3:1 分裂的四节点集群中, 通常三个节点一侧会因多数票自动"获胜", 剩下的那个节点则被 fence。但使用 qdiskd 后, 可以设置试探法, 让这一个节点因能访问关键资源 (例如关键网络路径) 而获胜。如果您的集群需要额外的方法来判断节点健康状态, 就应该按需配置 qdiskd。

配置仲裁的一些要求:

  1. 每个集群节点投票权 (vote) 相同, 且都为 1;
  2. 仲裁设备成员超时值是根据 CMAN 成员超时值 ( 即 CMAN 认为节点已死, 并不再是成员前该节点不响应的时间 ) 自动配置的; 如果要修改这个值, 应当保证 CMAN 超时值至少是 仲裁设备的 2 倍;
  3. Fence 可用;
  4. 最多支持 16 节点;
  5. 需要一块最小 10MB 的共享磁盘作为仲裁盘。
Quorum Operations:
      --lsquorum        List quorum options and heuristics
      --setquorumd [quorumd options] ...
                        Add quorumd options
      --addheuristic [heuristic options] ...
                        Add heuristics to quorumd
      --rmheuristic [heuristic options] ...
                        Remove heuristic specified by heurstic options
  • 2.11.1 为节点添加一块共享磁盘, 映射为 "vdd"

    kvm-host ~] qemu-img create -f raw rhel64-rhcs-100m.raw 100M
    
    kvm-host ~] virsh attach-disk --domain rhel64-01 --source /var/lib/libvirt/images/rhel64-rhcs-100m.raw --target vdd --targetbus virtio --driver qemu --subdriver raw --shareable --current
    kvm-host ~] virsh attach-disk --domain rhel64-01 --source /var/lib/libvirt/images/rhel64-rhcs-100m.raw --target vdd --targetbus virtio --driver qemu --subdriver raw --shareable --config
    
    kvm-host ~] virsh attach-disk --domain rhel64-02 --source /var/lib/libvirt/images/rhel64-rhcs-100m.raw --target vdd --targetbus virtio --driver qemu --subdriver raw --shareable --current
    kvm-host ~] virsh attach-disk --domain rhel64-02 --source /var/lib/libvirt/images/rhel64-rhcs-100m.raw --target vdd --targetbus virtio --driver qemu --subdriver raw --shareable --config
  • 2.11.2 格式化磁盘为仲裁盘

    usage: mkqdisk -L | -f <label> | -c <device> -l <label> [-d]
    
    ~] mkqdisk -c /dev/vdd -l rhel64-rhcs-qdisk
    
    ~] mkqdisk -L       # 检查创建结果, 两个节点都检查一下
    mkqdisk v3.0.12.1
    
    /dev/block/252:48:
    /dev/disk/by-path/pci-0000:00:0c.0-virtio-pci-virtio7:
    /dev/vdd:
            Magic:                eb7a62c2
            Label:                rhel64-rhcs-qdisk
            Created:              Fri Apr  1 15:21:06 2022
            Host:                 rhel64-node01
            Kernel Sector Size:   512
            Recorded Sector Size: 512
  • 2.11.3 添加仲裁盘到集群, 并配置启发式 (heuristic, 即检测脚本, 频率等)

    # ccs -h host --setquorumd [quorumd options]
    
    ccs -h rhel64-node01 --setquorumd label=rhel64-rhcs-qdisk device=/dev/vdd

    quorum disk options:

    | Parameter | Description |
    | --- | --- |
    | interval | The frequency of read/write cycles, in seconds. |
    | votes | The number of votes the quorum daemon advertises to cman when it has a high enough score. |
    | tko | The number of cycles a node must miss to be declared dead. |
    | min_score | The minimum score for a node to be considered "alive". If omitted or set to 0, the default function, floor((n+1)/2), is used, where n is the sum of the heuristic scores. The Minimum Score value must never exceed the sum of the heuristic scores; otherwise, the quorum disk cannot be available. |
    | device | The storage device the quorum daemon uses. The device must be the same on all nodes. |
    | label | Specifies the quorum disk label created by the mkqdisk utility. If this field contains an entry, the label overrides the Device field. If this field is used, the quorum daemon reads /proc/partitions and checks for qdisk signatures on every block device found, comparing the label against the specified label. This is useful in configurations where the quorum device name differs among nodes. |
    # ccs -h host --addheuristic [heuristic options]
    
    ccs -h rhel64-node01 --addheuristic program="/bin/ping -c1 -t2 10.168.161.14" interval=1 score=1 tko=5

    注: 实验测试过程中, 使用 KVM 宿主机的 bridge 网卡 IP (10.168.161.1) 作为 ping 检测的目标 IP, 会让 quorum 产生错误的判断: 当在节点 1 上执行 ifdown eth1 以后, 两个节点的日志文件中都出现了 fence 对方节点的日志, 但是实际上节点 2 会被先 fence; 节点 2 正常启动以后, 节点 1 重启。可能和 KVM/qemu 的网络有关系, 为了避免出错, 建议使用另一台虚拟机上的 IP 作为检查。

    quorum disk heuristic:

    | Parameter | Description |
    | --- | --- |
    | program | The path to the program used to determine if this heuristic is available. This can be anything that can be executed by /bin/sh -c. A return value of 0 indicates success; anything else indicates failure. This parameter is required to use a quorum disk. |
    | interval | The frequency (in seconds) at which the heuristic is polled. The default interval for every heuristic is 2 seconds. |
    | score | The weight of this heuristic. Be careful when determining scores for heuristics. The default score for each heuristic is 1. |
    | tko | The number of consecutive failures required before this heuristic is declared unavailable. |

    注: 本例只配置了一个 score=1 的 heuristic, 若不显式指定 min_score, 则默认取 floor((n+1)/2) = floor((1+1)/2) = 1, 即该 heuristic 一旦失败, 节点得分就会低于 min_score。
  • 2.11.4 添加后检查

    ~] ccs -h rhel64-node01 --lsquorum
    
    Quorumd: device=/dev/vdd, label=rhel64-rhcs-qdisk
      heuristic: program=/bin/ping -c1 -t2 10.168.161.1, interval=2, score=1, tko=2
    
    
    ~] ccs -h rhel64-node01 --getconf
    
      <quorumd device="/dev/vdd" label="rhel64-rhcs-qdisk">    
        <heuristic interval="2" program="/bin/ping -c1 -t2 10.168.161.14" score="1" tko="2"/>    
      </quorumd> 

2.12 配置服务

  • 创建服务

    ~] ccs -h host --addservice <servicename> [service options]
    

    Service Options:

    • autostart — Specifies whether to autostart the service when the cluster starts. Use "1" to enable and "0" to disable; the default is enabled.

    • domain — Specifies a failover domain (if required).

    • exclusive — Specifies a policy wherein the service only runs on nodes that have no other services running on them.

    • recovery — Specifies a recovery policy for the service. The options are to relocate, restart, disable, or restart-disable the service.

      • The "restart" recovery policy indicates that the system should attempt to restart the failed service before trying to relocate the service to another node.
      • The "relocate" policy indicates that the system should try to restart the service in a different node.
      • The "disable" policy indicates that the system should disable the resource group if any component fails.
      • The "restart-disable" policy indicates that the system should attempt to restart the service in place if it fails, but if restarting the service fails the service will be disabled instead of being moved to another host in the cluster.

      If you select restart or restart-disable as the recovery policy for the service, you can specify the maximum number of restart failures before relocating or disabling the service, and you can specify the length of time in seconds after which to forget a restart.

    • __independent_subtree - Treat this and all children as an independent subtree.

    • __enforce_timeouts - Consider a timeout for operations as fatal.

    • __max_failures - Maximum number of failures before returning a failure to a status check.

    • __failure_expire_time - Amount of time before a failure is forgotten.

    • __max_restarts - Maximum number restarts for an independent subtree before giving up.

    • __restart_expire_time - Amount of time before a failure is forgotten for an independent subtree.

    ccs -h rhel64-node01 --addservice VSFTPD_SERVICE_01 autostart=1 domain=VSFTPD_Domain_01 exclusive=0 recovery=restart __max_failures=3 __restart_expire_time=300
    ccs -h rhel64-node01 --addservice VSFTPD_SERVICE_02 autostart=1 domain=VSFTPD_Domain_02 exclusive=0 recovery=restart __max_failures=3 __restart_expire_time=300
  • 添加全局资源到服务

    service: name=VSFTPD_SERVICE_01, exclusive=0, domain=VSFTPD_Domain_01, __max_failures=3, autostart=1, __restart_expire_time=300, recovery=restart
    service: name=VSFTPD_SERVICE_02, exclusive=0, domain=VSFTPD_Domain_02, __max_failures=3, autostart=1, __restart_expire_time=300, recovery=restart
    resources: 
      ip: monitor_link=1, sleeptime=10, prefer_interface=eth0, family=ipv4, address=192.168.161.18/24
      ip: monitor_link=1, sleeptime=10, prefer_interface=eth0, family=ipv4, address=192.168.161.19/24
      lvm: name=LVM_RHCS01, self_fence=1, vg_name=rhcs01, lv_name=data01
      lvm: name=LVM_RHCS02, self_fence=1, vg_name=rhcs02, lv_name=data02
      fs: name=FS_data01, device=/dev/mapper/rhcs01-data01, mountpoint=/data01, self_fence=1, fstype=ext4
      fs: name=FS_data02, device=/dev/mapper/rhcs02-data02, mountpoint=/data02, self_fence=1, fstype=ext4
      script: name=VSFTPD_01, file=/etc/init.d/vsftpd_01
      script: name=VSFTPD_02, file=/etc/init.d/vsftpd_02
    

    将 ip, lvm, fs, script 都添加到服务中:

    # ccs -h host --addsubservice servicename subservice [service options]
    
    ccs -h rhel64-node01 --addsubservice VSFTPD_SERVICE_01 ip ref="192.168.161.18/24"
    ccs -h rhel64-node01 --addsubservice VSFTPD_SERVICE_01 lvm ref="LVM_RHCS01"
    ccs -h rhel64-node01 --addsubservice VSFTPD_SERVICE_01 fs ref="FS_data01"
    ccs -h rhel64-node01 --addsubservice VSFTPD_SERVICE_01 script ref="VSFTPD_01"
    
    ccs -h rhel64-node01 --addsubservice VSFTPD_SERVICE_02 ip ref="192.168.161.19/24"
    ccs -h rhel64-node01 --addsubservice VSFTPD_SERVICE_02 lvm ref="LVM_RHCS02"
    ccs -h rhel64-node01 --addsubservice VSFTPD_SERVICE_02 fs ref="FS_data02"
    ccs -h rhel64-node01 --addsubservice VSFTPD_SERVICE_02 script ref="VSFTPD_02"

    添加完以后配置文件如下:

    ~] ccs -h rhel64-node01 --getconf
    
    ...
    <service __max_failures="3" __restart_expire_time="300" autostart="1" domain="VSFTPD_Domain_01" exclusive="0" name="VSFTPD_SERVICE_01" recovery="restart">      
      <ip ref="192.168.161.18/24"/>      
      <lvm ref="LVM_RHCS01"/>      
      <fs ref="FS_data01"/>      
      <script ref="VSFTPD_01"/>      
    </service>    
    <service __max_failures="3" __restart_expire_time="300" autostart="1" domain="VSFTPD_Domain_02" exclusive="0" name="VSFTPD_SERVICE_02" recovery="restart">      
      <ip ref="192.168.161.19/24"/>      
      <lvm ref="LVM_RHCS02"/>      
      <fs ref="FS_data02"/>      
      <script ref="VSFTPD_02"/>      
    </service> 
    ... 

    由于我们添加的资源有 “先后” 关系, 如 IP 启动后才能正常启动 VSFTPD, LVM 启动后 FS 才能正常挂载。

    因此应按以下方式重新添加子资源, 建立 "父-子" 层级:

    # 移除资源
    ccs -h rhel64-node01 --rmsubservice VSFTPD_SERVICE_01 ip
    ccs -h rhel64-node01 --rmsubservice VSFTPD_SERVICE_01 lvm 
    ccs -h rhel64-node01 --rmsubservice VSFTPD_SERVICE_01 fs
    ccs -h rhel64-node01 --rmsubservice VSFTPD_SERVICE_01 script
    
    ccs -h rhel64-node01 --rmsubservice VSFTPD_SERVICE_02 ip
    ccs -h rhel64-node01 --rmsubservice VSFTPD_SERVICE_02 lvm 
    ccs -h rhel64-node01 --rmsubservice VSFTPD_SERVICE_02 fs
    ccs -h rhel64-node01 --rmsubservice VSFTPD_SERVICE_02 script
    
    # 重新添加资源, 按 "父-子" 顺序
    ccs -h rhel64-node01 --addsubservice VSFTPD_SERVICE_01 ip ref="192.168.161.18/24"
    ccs -h rhel64-node01 --addsubservice VSFTPD_SERVICE_01 ip:lvm ref="LVM_RHCS01"
    ccs -h rhel64-node01 --addsubservice VSFTPD_SERVICE_01 ip:lvm:fs ref="FS_data01"
    ccs -h rhel64-node01 --addsubservice VSFTPD_SERVICE_01 ip:lvm:fs:script ref="VSFTPD_01"
    
    ccs -h rhel64-node01 --addsubservice VSFTPD_SERVICE_02 ip ref="192.168.161.19/24"
    ccs -h rhel64-node01 --addsubservice VSFTPD_SERVICE_02 ip:lvm ref="LVM_RHCS02"
    ccs -h rhel64-node01 --addsubservice VSFTPD_SERVICE_02 ip:lvm:fs ref="FS_data02"
    ccs -h rhel64-node01 --addsubservice VSFTPD_SERVICE_02 ip:lvm:fs:script ref="VSFTPD_02"

    此时, 配置文件内容如下(注意与第一次添加时对比的差异):

    ~] ccs -h rhel64-node01 --getconf
    
    ...
    <service __max_failures="3" __restart_expire_time="300" autostart="1" domain="VSFTPD_Domain_01" exclusive="0" name="VSFTPD_SERVICE_01" recovery="restart">      
      <ip ref="192.168.161.18/24">        
        <lvm ref="LVM_RHCS01">          
          <fs ref="FS_data01">            
            <script ref="VSFTPD_01"/>            
          </fs>          
        </lvm>        
      </ip>      
    </service>    
    <service __max_failures="3" __restart_expire_time="300" autostart="1" domain="VSFTPD_Domain_02" exclusive="0" name="VSFTPD_SERVICE_02" recovery="restart">      
      <ip ref="192.168.161.19/24">        
        <lvm ref="LVM_RHCS02">          
          <fs ref="FS_data02">            
            <script ref="VSFTPD_02"/>            
          </fs>          
        </lvm>        
      </ip>      
    </service> 
    ...

2.13 配置其他集群属性

  • 2.13.1 查看集群其他属性配置情况

    ccs -h host --lsmisc
  • 2.13.2 集群配置文件版本

    ccs -h host --getversion     # 查看版本
    ccs -h host --setversion n   # 设置
    ccs -h host --incversion     # 版本值 +1
  • 2.13.3 多播地址

    ccs -h <host> --setmulticast <multicastaddress>  # 设置
    ccs -h host --setmulticast                       # 移除(不添加参数)

    如果未指定多播地址, cman 会基于集群 ID 自动生成: 239.192.x.x (IPv4) / FF15:: (IPv6)

  • 2.13.4 两节点集群的配置

    ccs -h <host> --setcman two_node=1 expected_votes=1
    
    # ccs -h rhel64-node01 --setcman two_node=1 expected_votes=1
  • 2.13.5 日志配置

    ~] man cluster.conf
    
    Logging
        Cluster daemons use a common logging section to configure their 
        logging behavior.
    
            <cluster name="alpha" config_version="1">
                    <logging/>
            </cluster>
    
        Global settings apply to all:
    
            <logging debug="on"/>
    
        Per-daemon logging_daemon subsections override the global settings. 
        Daemon names that can be configured include: corosync, qdiskd, groupd, 
        fenced, dlm_controld, gfs_controld, rgmanager.
    
            <logging>
                    <logging_daemon name="qdiskd" debug="on"/>
                    <logging_daemon name="fenced" debug="on"/>
            </logging>
    
        Corosync daemon settings apply to all corosync subsystems by default, 
        but subsystems can also be configured individually. These include CLM, 
        CPG, MAIN, SERV, CMAN, TOTEM, QUORUM, CONFDB, CKPT, EVT.
    
            <logging>
                    <logging_daemon name="corosync" subsys="QUORUM" debug="on"/>
                    <logging_daemon name="corosync" subsys="CONFDB" debug="on"/>
            </logging>
    
        The attributes available at global, daemon and subsystem levels are:
    
        to_syslog
                enable/disable messages to syslog (yes/no), default "yes"
    
        to_logfile
                enable/disable messages to log file (yes/no), default "yes"
    
        syslog_facility
                facility used for syslog messages, default "daemon"
    
        syslog_priority
                messages at this level and up will be sent to syslog, default "info"
    
        logfile_priority
                messages at this level and up will be written to log file, default "info"
    
        logfile
                the log file name, default /var/log/cluster/<daemon>.log
    
        debug="on"
    
    EXAMPLE
        An explicit configuration for the default settings would be:
    
            <logging to_syslog="yes" to_logfile="yes" syslog_facility="daemon"
                    syslog_priority="info" logfile_priority="info">
                <logging_daemon name="qdiskd"
                        logfile="/var/log/cluster/qdiskd.log"/>
                <logging_daemon name="fenced"
                        logfile="/var/log/cluster/fenced.log"/>
                <logging_daemon name="dlm_controld"
                        logfile="/var/log/cluster/dlm_controld.log"/>
                <logging_daemon name="gfs_controld"
                        logfile="/var/log/cluster/gfs_controld.log"/>
                <logging_daemon name="rgmanager"
                        logfile="/var/log/cluster/rgmanager.log"/>
                <logging_daemon name="corosync"
                        logfile="/var/log/cluster/corosync.log"/>
            </logging>
    
        To include debug messages (and above) from all daemons in their default log files, 
        either of the following which are equivalent:
    
            <logging debug="on"/>
            <logging logfile_priority="debug"/>
    
        To exclude all log messages from syslog:
    
            <logging to_syslog="no"/>
    
        To disable logging to all log files:
    
            <logging to_logfile="no"/>
    
        To include debug messages (and above) from all daemons in syslog:
    
            <logging syslog_priority="debug"/>
    
        To limit syslog messages to error (and above), keeping info (and above) in log files 
        (this logfile_priority setting is the default so could be omitted):
    
            <logging syslog_priority="error" logfile_priority="info"/>

    典型配置:

    ccs -h rhel64-node01 --setlogging to_syslog=yes syslog_facility=daemon syslog_priority=info to_logfile=yes logfile_priority=info
    ccs -h rhel64-node01 --addlogging name=qdiskd logfile="/var/log/cluster/qdiskd.log"
    ccs -h rhel64-node01 --addlogging name=fenced logfile="/var/log/cluster/fenced.log"
    ccs -h rhel64-node01 --addlogging name=dlm_controld logfile="/var/log/cluster/dlm_controld.log"
    ccs -h rhel64-node01 --addlogging name=gfs_controld logfile="/var/log/cluster/gfs_controld.log"
    ccs -h rhel64-node01 --addlogging name=rgmanager logfile="/var/log/cluster/rgmanager.log"
    ccs -h rhel64-node01 --addlogging name=corosync logfile="/var/log/cluster/corosync.log"
  • 2.13.6 同步配置文件到其他节点

    ccs -h <host> --sync --activate
    ccs -h <host> --checkconf
    ccs -f <file> -h <host> --setconf
    ccs -f file --checkconf

2.14 管理集群

  • 集群管理

    ccs -h <host> --start  # Start *AND* enable cluster services on reboot for host specified with "-h"
    ccs -h <host> --stop   # Stop *AND* disable cluster services on reboot for host specified with "-h"
    ccs -h <host> --startall [--noenable]  # Start *AND* enable cluster services on reboot for all nodes
    ccs -h <host> --stopall [--noenable]   # Stop *AND* disable cluster services on reboot for all nodes
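
    一个可能的上线顺序示例 (假设在 rhel64-node01 上操作):

    ccs -h rhel64-node01 --sync --activate   # 同步并激活最新配置到所有节点
    ccs -h rhel64-node01 --startall          # 在所有节点启动集群服务
    clustat                                  # 确认节点与服务状态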
  • 节点管理

    ccs -h <host> --lsnode
    ccs -h <host> --addnode <node> [--nodeid <nodeid>] [--votes <votes>]
    ccs -h <host> --rmnode <node>
  • 集群服务管理: clusvcadm

    Resource Group Control Commands:
      -v                     Display version and exit
      -d <group>             Disable <group>.  This stops a group
                             until an administrator enables it again,
                             the cluster loses and regains quorum, or
                             an administrator-defined event script
                             explicitly enables it again.
      -e <group>             Enable <group>
      -e <group> -F          Enable <group> according to failover
                             domain rules (deprecated; always the
                             case when using central processing)
      -e <group> -m <member> Enable <group> on <member>
      -r <group> -m <member> Relocate <group> [to <member>]
                             Stops a group and starts it on another
                             cluster member.
      -M <group> -m <member> Migrate <group> to <member>
                             (e.g. for live migration of VMs)
      -q                     Quiet operation
      -R <group>             Restart a group in place.
      -s <group>             Stop <group>.  This temporarily stops
                         a group.  After the next group or
                         cluster member transition, the group
                             will be restarted (if possible).
      -Z <group>             Freeze resource group.  This prevents
                             transitions and status checks, and is 
                             useful if an administrator needs to 
                             administer part of a service without 
                             stopping the whole service.
      -U <group>             Unfreeze (thaw) resource group.  Restores
                             a group to normal operation.
      -c <group>             Convalesce (repair, fix) resource group.
                             Attempts to start failed, non-critical 
                             resources within a resource group.
    Resource Group Locking (for cluster Shutdown / Debugging):
      -l                     Lock local resource group managers.
                             This prevents resource groups from
                             starting.
      -S                     Show lock state
      -u                     Unlock resource group managers.
                             This allows resource groups to start.
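
    针对本实验两个服务的常用操作示例:

    clustat                                           # 查看集群与服务状态
    clusvcadm -e VSFTPD_SERVICE_01 -m rhel64-node01   # 在指定节点上启用服务
    clusvcadm -r VSFTPD_SERVICE_01 -m rhel64-node02   # 手工把服务切换到另一节点, 验证故障切换
    clusvcadm -R VSFTPD_SERVICE_01                    # 原地重启服务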
    

对比 RHCS 6 和 RHCS 7

  • Cluster configuration file locations

    | Redhat Cluster Releases | Configuration files | Description |
    | --- | --- | --- |
    | Prior to Redhat Cluster 7 | /etc/cluster/cluster.conf | Stores all the configuration of the cluster |
    | Redhat Cluster 7 (RHEL 7) | /etc/corosync/corosync.conf | Membership and quorum configuration |
    | Redhat Cluster 7 (RHEL 7) | /var/lib/pacemaker/cib/cib.xml | Cluster node and resource configuration |
  • Commands

    | Configuration Method | Prior to Redhat Cluster 7 | Redhat Cluster 7 (RHEL 7) |
    | --- | --- | --- |
    | Command line utility | ccs | pcs |
    | GUI tool | luci | pcsd (Pacemaker Web GUI) |
  • Services

    | Redhat Cluster Releases | Services | Description |
    | --- | --- | --- |
    | Prior to Redhat Cluster 7 | rgmanager | Cluster resource manager |
    | Prior to Redhat Cluster 7 | cman | Manages cluster quorum and cluster membership |
    | Prior to Redhat Cluster 7 | ricci | Cluster management and configuration daemon |
    | Redhat Cluster 7 (RHEL 7) | pcsd.service | Cluster management and configuration daemon (资源管理由 pacemaker.service 负责) |
    | Redhat Cluster 7 (RHEL 7) | corosync.service | Manages cluster quorum and cluster membership |

    NOTES: 上表中的 cman 服务, 实际上也是由 corosync 提供:

    ~] service cman status
    corosync is stopped
  • Cluster user

    | User Access | Prior to Redhat Cluster 7 | Redhat Cluster 7 (RHEL 7) |
    | --- | --- | --- |
    | Cluster user name | ricci | hacluster |
  • How simple is it to create a cluster on RHEL 7?

    | Redhat Cluster Releases | Cluster Creation | Description |
    | --- | --- | --- |
    | Prior to Redhat Cluster 7 | ccs -h node1.ua.com --createcluster uacluster | Create the cluster on the first node using ccs |
    | Prior to Redhat Cluster 7 | ccs -h node1.ua.com --addnode node2.ua.com | Add the second node to the existing cluster |
    | Redhat Cluster 7 (RHEL 7) | pcs cluster setup --name uacluster node1 node2 | Create a cluster on both nodes using pcs |
  • Is it painful to remove a cluster in RHEL 7? No, it's very simple.

    | Redhat Cluster Releases | Remove Cluster | Description |
    | --- | --- | --- |
    | Prior to Redhat Cluster 7 | rm /etc/cluster/cluster.conf | Remove the cluster.conf file on each cluster node |
    | Prior to Redhat Cluster 7 | service rgmanager stop; service cman stop; service ricci stop | Stop the cluster services on each cluster node |
    | Prior to Redhat Cluster 7 | chkconfig rgmanager off; chkconfig cman off; chkconfig ricci off | Disable the cluster services from startup |
    | Redhat Cluster 7 (RHEL 7) | pcs cluster destroy | Destroy the cluster in one shot using pacemaker |

Others