Compare VCS (Veritas Cluster Server) and SGLX (Serviceguard cluster on Linux)
SGLX overview
Serviceguard for Linux is HP's business continuity solution: your application runs in a cluster environment and can fail over between nodes,
so the application is effectively always running, with only a short downtime while it fails over.
Cluster quorum
When a cluster node fails, the cluster reforms itself with the surviving nodes. Each node has one vote.
During cluster reformation, nodes that can communicate with each other regroup and reform a cluster.
A cluster can be reformed only if the cluster quorum condition is met: more than 50% of the nodes previously running must be available to form the cluster. For example, if 4 nodes were running, at least 3 must still be able to communicate.
Split brain
This is the situation when the cluster partitions and exactly 50% of the previously running nodes are on each side, so some method is needed to break the tie.
This method guarantees that only one half can form a cluster; the other half is shut down.
The tie-breaker is the cluster lock: either a Lock LUN or a Quorum Server can be configured as the cluster lock.
A cluster lock is of course mandatory in clusters with only two nodes.
Lock LUN (for SGLX with up to 4 nodes)
The Lock LUN acts as a quorum arbitration method. It is an external storage LUN (say 1 GB) shared among the nodes.
In a split-brain situation the nodes race to obtain a lock on the Lock LUN. Once the lock is obtained, the LUN is marked so the other nodes recognize it as taken.
My drawing shows a 2-node cluster with a Lock LUN: when communication between the nodes is lost, both nodes race for the lock; node 2 wins and node 1 reboots.
To get the running configuration of SGLX, run: cmgetconf | grep -v ^# | grep -v ^$
Example output:
CLUSTER_NAME cluster_name
HOSTNAME_ADDRESS_FAMILY IPV4
NODE_NAME node-1
NETWORK_INTERFACE bond0
HEARTBEAT_IP 16.44.128.142
NETWORK_INTERFACE bond1
HEARTBEAT_IP 16.44.140.83
CLUSTER_LOCK_LUN /dev/sdd
NODE_NAME node-2
NETWORK_INTERFACE bond0
HEARTBEAT_IP 16.44.128.143
NETWORK_INTERFACE bond1
HEARTBEAT_IP 16.44.140.84
CLUSTER_LOCK_LUN /dev/sdd
MEMBER_TIMEOUT 14000000
AUTO_START_TIMEOUT 600000000
NETWORK_POLLING_INTERVAL 2000000
SUBNET 16.44.128.0
IP_MONITOR OFF
SUBNET 16.44.140.0
IP_MONITOR OFF
MAX_CONFIGURED_PACKAGES 300
USER_NAME ANY_USER
USER_HOST ANY_SERVICEGUARD_NODE
USER_ROLE monitor
|
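Note that the timeout and interval values above are in microseconds; a sketch of the arithmetic for the values shown:
MEMBER_TIMEOUT 14000000            # 14,000,000 us = 14 s without heartbeats before a node is declared failed
NETWORK_POLLING_INTERVAL 2000000   # 2,000,000 us = 2 s between network polls
AUTO_START_TIMEOUT 600000000       # 600 s = 10 min wait for all nodes at cluster auto-start
|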
Quorum server
The QS can also be used as the membership arbitration method, and a single QS can serve multiple clusters at a time.
There is no limitation on the number of nodes in the cluster. The QS can be configured as an SG package or standalone, but in either case it must run on a system outside the cluster for which it provides quorum services.
The QS listens to connection requests from the SG nodes on a known port.
The QS maintains a special area in memory for each cluster, and when a node obtains the cluster lock, this area is marked so that other nodes can recognize the lock as taken.
My drawing shows QS operation in a 2-node cluster.
When there is a loss of communication between the nodes, the QS chooses node 2 to continue running in the cluster; node 1 is reset.
Again, to get the running configuration of SGLX, run: cmgetconf | grep -v ^# | grep -v ^$
Example output:
CLUSTER_NAME some_cluster_name
HOSTNAME_ADDRESS_FAMILY IPV4
QS_HOST qs_hostname.domain.com
QS_POLLING_INTERVAL 300000000
NODE_NAME node1_hostname
NETWORK_INTERFACE bond0
HEARTBEAT_IP 16.216.169.21
NETWORK_INTERFACE bond1
HEARTBEAT_IP 16.217.168.159
NODE_NAME node2_hostname
NETWORK_INTERFACE bond0
HEARTBEAT_IP 16.216.169.173
NETWORK_INTERFACE bond1
HEARTBEAT_IP 16.217.168.163
MEMBER_TIMEOUT 20000000
AUTO_START_TIMEOUT 600000000
NETWORK_POLLING_INTERVAL 2000000
SUBNET 16.216.168.0
IP_MONITOR OFF
SUBNET 16.217.168.0
IP_MONITOR OFF
MAX_CONFIGURED_PACKAGES 300
USER_NAME ANY_USER
USER_HOST ANY_SERVICEGUARD_NODE
USER_ROLE monitor
|
If you need to reconfigure the cluster (change the QS, etc.), first get its configuration with cmgetconf and redirect it into a file,
edit the file, and then check the configuration with: cmcheckconf -C cl.ascii
If there are no errors, apply the new configuration with: cmapplyconf -C cl.ascii
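Putting the steps together (a sketch; cl.ascii is just an example file name):
cmgetconf > cl.ascii      # dump the running cluster configuration
vi cl.ascii               # change QS_HOST, MEMBER_TIMEOUT, etc.
cmcheckconf -C cl.ascii   # validate the edited file
cmapplyconf -C cl.ascii   # apply it if the check passed
|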
If your QS host runs RHEL, it has the "qs" rpm installed:
# rpm -q -i qs
Name : qs Relocations: (not relocatable)
Version : A.04.00.00 Vendor: (none)
Release : 0.rhel5 Build Date: Tue 09 Dec 2008 08:29:59 AM UTC
Install Date: Tue 03 Apr 2012 11:30:20 PM UTC Build Host: itv4.cup.hp.com
Group : System Source RPM: qs-A.04.00.00-0.rhel5.src.rpm
Size : 199105 License: Hewlett Packard Company, see the /usr/local/qs/conf/COPYRIGHT file for details
Signature : (none)
Packager : integ
URL : http://h18022.www1.hp.com/solutions/enterprise/highavailability/linux/serviceguard/
Summary : High Availability Clustering Quorum Server
Description : A network quorum device used for Serviceguard high availibility clustering.
|
# rpm -q -l qs
/usr/local/qs
/usr/local/qs/bin
/usr/local/qs/bin/qs
/usr/local/qs/bin/qsc
/usr/local/qs/conf
/usr/local/qs/conf/COPYRIGHT
/usr/local/qs/core
/usr/local/qs/doc
/usr/local/qs/doc/man
/usr/local/qs/doc/man/man1
/usr/local/qs/doc/man/man1/qs.1
/var/log/qs
|
The file that lists the cluster nodes is /usr/local/qs/conf/qs_authfile (or any other configured path); it looks like
node1.domain-B.com # comment on cluster 1
node2.domain-B.com # comment on cluster 1
node-1.domain-A.com # comment on cluster 2
node-2.domain-A.com # comment on cluster 2
|
After adding new nodes to this file, run the command /usr/local/qs/bin/qs -update
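For example (a sketch; the node name is hypothetical, and check the log under /var/log/qs to confirm the update was picked up):
echo "node-3.domain-A.com # comment on cluster 2" >> /usr/local/qs/conf/qs_authfile
/usr/local/qs/bin/qs -update
|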
Node fencing
A node reboots itself if it cannot communicate with the majority of cluster members for a predetermined time,
during split brain, or under other circumstances such as failure of the cluster daemon (cmcld).
This reboot is initiated by the DEADMAN driver, which acts as the fencing mechanism in an SGLX cluster.
The DEADMAN driver is a dynamically loadable kernel module that is compiled against the kernel automatically when SGLX is installed.
# modinfo deadman
filename: /lib/modules/2.6.32-358.18.1.el6.x86_64/extra/deadman.ko
supported: external
description: Deadman Timer
author: Eric Soderberg
license: GPL
srcversion: A308092D2E273EDC451BAE6
depends:
vermagic: 2.6.32-358.18.1.el6.x86_64 SMP mod_unload modversions
parm: mode:Mode for system operation when safety time expires : reboot or panic (charp)
|
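A quick check that the module is actually loaded on a cluster node (sketch):
lsmod | grep deadman
|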
Networking in the SGLX cluster
SGLX uses one or more heartbeat networks to send heartbeats among all nodes and to maintain the cluster membership.
SGLX also uses the heartbeat network for communication between nodes, so it is recommended to configure multiple heartbeat networks.
Applications deployed in SGLX can use their own networks in the cluster (for their client access). SGLX can monitor and manage the IPs used by the applications on these networks.
During failover, SGLX moves the application's IP from the failed node to the target node.
This "app network" can also be configured to carry heartbeat traffic.
SGLX heartbeats exchange only small messages and do not have demanding bandwidth requirements.
VCS overview
VCS (Veritas Cluster Server) is Symantec's business continuity solution for your applications.
VCS cluster membership
VCS has two (2) types of communications:
- intra-system communications : VCS uses Inter Process Messaging (IPM) to communicate with the GUI, CLI, and the agents
- inter-system communications : VCS uses the cluster interconnects for network communication between nodes (LLT and GAB share data among the cluster nodes)
LLT (Low Latency Transport) provides kernel-to-kernel communication; natively it uses its own Ethernet protocol rather than TCP/IP, though it can also be configured to run over UDP, as in the llttab example below.
LLT has configuration files:
- /etc/llthosts (exists and is identical on every node; lists system IDs and node names)
0 hostname-1
1 hostname-2
|
- /etc/llttab (exists on every node but is not identical; it defines the local system's private network links to the other nodes, see "man llttab")
# set-node is unique for each node
set-node local_node_hostname
# cluster ID is same on every node
set-cluster 1323
# link defines local network cards used by LLT
# this is for hostname-1 and hostname-1-hb
link link1 udp - udp 50000 - x.x.x.x -
link link2 udp - udp 50001 - y.y.y.y -
# set-addr sets the peer node's IPs on the links; system ID 1 is hostname-2
# link1 is like hostname-2 interface = z.z.z.z
# link2 is hostname-2-hb interface = q.q.q.q
set-addr 1 link1 z.z.z.z
set-addr 1 link2 q.q.q.q
|
- Once the LLT config files are set up, start LLT with lltconfig -c
- Verify it with lltconfig -a list
Link 0 (link1):
Node 0 hostname-1 : 16.192.16.95 permanent
Node 1 hostname-2 : 16.196.48.126 permanent
Link 1 (link2):
Node 0 hostname-1 : 16.193.16.91 permanent
Node 1 hostname-2 : 16.197.48.219 permanent
|
- lltstat reports LLT status, see "man lltstat"
# lltstat -n
LLT node information:
Node State Links
* 0 hostname-1 OPEN 2
1 hostname-2 OPEN 2
|
The Group Membership Services/Atomic Broadcast protocol (GAB) is responsible for cluster membership and cluster communications.
When a node goes down, it stops sending heartbeats to the other nodes (the others wait 21 seconds before considering the node dead).
- The gabconfig command is run from GAB's config file, /etc/gabtab, which here configures the GAB driver for use with 2 nodes; a typical file is sketched below
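A typical /etc/gabtab contains a single line; a sketch for this 2-node cluster (the -n seed count equals the number of nodes):
# cat /etc/gabtab
/sbin/gabconfig -c -n2
|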
- To display GAB driver port memberships
# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 37a20a membership 01
Port h gen 37a20c membership 01
|
Each node runs a kernel fencing module which is responsible for ensuring a consistent cluster membership during any cluster change, through the process of membership arbitration.
# lsmod | grep -e vxfen -e gab -e llt -e Module
Module Size Used by
vxfen 242088 0
gab 242576 4 vxfen
llt 167192 7 gab
# lsmod | grep vx
vxodm 206503 1
vxfen 306607 1
gab 283802 6 vxfen
vxspec 3174 4
vxio 3294713 9 vxspec
vxdmp 381042 72 vxspec,vxio
vxportal 5855 0
fdd 53986 2 vxodm
vxfs 3004314 630 vxportal,fdd
# service --status-all | grep vx
USAGE: OVTrcSrv start|stop|restart
Status of Veritas vxdbd
/opt/VRTSdbed/common/bin/vxdbd ping SUCCESS
vxodm is running...
in.vxrsyncd is stopped
vxconfigd (pid 9086) is running...
vxconfigd (pid 9086) is running...
vxesd (pid 12108) is running...
vxrelocd (pid 12371 12316) is running...
vxattachd (pid 12454 12396) is running...
vxcached (pid 12551 12493) is running...
vxvvrsecdgd is stopped
vxconfigbackupd (pid 12653 12595) is running...
# chkconfig | grep vx
vxatd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
vxdbdctrl 0:off 1:off 2:off 3:on 4:off 5:on 6:off
vxdcli 0:off 1:off 2:off 3:on 4:off 5:on 6:off
vxfen 0:off 1:off 2:off 3:on 4:on 5:on 6:off
vxfs 0:off 1:on 2:on 3:on 4:on 5:on 6:off
vxnm-vxnetd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
vxodm 0:off 1:off 2:off 3:on 4:on 5:on 6:off
vxrsyncd.sh 0:off 1:off 2:on 3:on 4:on 5:on 6:off
vxvm-boot 0:off 1:on 2:on 3:on 4:on 5:on 6:off
vxvm-reconfig 0:off 1:off 2:on 3:on 4:on 5:on 6:off
vxvm-recover 0:off 1:off 2:on 3:on 4:on 5:on 6:off
|
A VCS cluster uses a service called the coordination point for membership arbitration.
Coordination points provide a lock mechanism to determine which nodes get to fence off data drives from other nodes.
A node must eject a peer from the coordination points before it can fence the peer from the data drives.
A coordination point is usually a SAN LUN, but it can be a server. Having three (3) coordinator LUNs is recommended (say 1 GB each), so that one partition can win a clear majority (2 of 3) of the races.
At the time of cluster reformation, the kernel fencing module on each node races for control of the coordination LUNs.
VCS prevents split brain by allowing the winning partition to fence the ejected nodes away from the data disks.
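On each node, fencing is driven by two small config files; a sketch of typical contents (the disk group name matches the example below; the vxfenmode values are common SCSI-3/DMP settings and may differ per setup):
# cat /etc/vxfendg
dgcoord
# cat /etc/vxfenmode
vxfen_mode=scsi3
scsi3_disk_policy=dmp
|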
Example: Replacing coordinator LUNs online
SCSI-3 Persistent Reservations (SCSI-3 PR) are required for I/O fencing and resolve the issues of using SCSI reservations in VCS.
SCSI-3 PR enables access for multiple nodes to a device and simultaneously blocks access for other nodes.
SCSI-3 reservations are persistent across SCSI bus resets and support multiple paths from a host to a disk.
In contrast, only one host can use SCSI-2 reservations with one path.
If the need arises to block access to a device because of data integrity concerns, only one host and one path remain active.
The requirements for larger clusters, with multiple nodes reading and writing to storage in a controlled manner, make SCSI-2 reservations obsolete.
SCSI-3 PR uses a concept of registration and reservation.
Each system registers its own "key" with a SCSI-3 device.
Multiple systems registering keys form a membership and establish a reservation, typically set to "Write Exclusive Registrants Only."
The WERO setting enables only registered systems to perform write operations.
For a given disk, only one reservation can exist amidst numerous registrations.
With SCSI-3 PR technology, blocking write access is as simple as removing a registration from a device.
Only registered members can "eject" the registration of another member.
A member wishing to eject another member issues a "preempt and abort" command.
Ejecting a node is final and atomic; an ejected node cannot eject another node.
In VCS, a node registers the same key for all paths to the device.
A single preempt and abort command ejects a node from all paths to the storage device.
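Registrations and the reservation on a LUN can be inspected with the generic sg_persist utility from sg3_utils (a sketch; /dev/sdd is an example device, and VCS also ships its own vxfenadm tool for the same purpose):
sg_persist --in --read-keys /dev/sdd          # keys registered by the cluster nodes
sg_persist --in --read-reservation /dev/sdd   # current (WERO) reservation holder
|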
Prepare disk for Veritas:
/etc/vx/bin/vxdisksetup -i $DISK format=cdsdisk
|
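For the three replacement coordinator LUNs marked "new coordinator LUN" in the listing below, that means (a sketch; device names are taken from the example):
for DISK in sdax sday sdaz; do
    /etc/vx/bin/vxdisksetup -i $DISK format=cdsdisk
done
|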
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
cciss/c0d0 auto:none - - online invalid
sda auto - - LVM
sdaw auto:LVM - - LVM
sdax auto:cdsdisk - - online thinrclm new coordinator LUN
sday auto:cdsdisk - - online thinrclm new coordinator LUN
sdaz auto:cdsdisk - - online thinrclm new coordinator LUN
sdb auto:cdsdisk sdb dg02 online shared
sdba auto:cdsdisk - - online thinrclm
sdbb auto:cdsdisk - - online thinrclm
sdc auto:cdsdisk - - online
sdd auto:cdsdisk - (dgcoord) online to be replaced
sde auto:cdsdisk - (dgcoord) online to be replaced
sdf auto:cdsdisk - (dgcoord) online to be replaced
# cat /etc/vxfendg
dgcoord
# vxdg -tfC import `cat /etc/vxfendg`
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
cciss/c0d0 auto:none - - online invalid
sda auto - - LVM
sdaw auto:LVM - - LVM
sdax auto:cdsdisk - - online thinrclm
sday auto:cdsdisk - - online thinrclm
sdaz auto:cdsdisk - - online thinrclm
sdb auto:cdsdisk sdb dg02 online shared
sdba auto:cdsdisk - - online thinrclm
sdbb auto:cdsdisk - - online thinrclm
sdc auto:cdsdisk - - online
sdd auto:cdsdisk sdd dgcoord online
sde auto:cdsdisk sde dgcoord online
sdf auto:cdsdisk sdf dgcoord online
# vxdg -g dgcoord set coordinator=off
# vxdiskadm ** Use vxdiskadm to remove sdd and sde **
# vxdiskadm
Volume Manager Support Operations
Menu: VolumeManager/Disk
1 Add or initialize one or more disks
2 Encapsulate one or more disks
3 Remove a disk <------------------ remove disk
4 Remove a disk for replacement
5 Replace a failed or removed disk
6 Mirror volumes on a disk
7 Move volumes from a disk
8 Enable access to (import) a disk group
9 Remove access to (deport) a disk group
10 Enable (online) a disk device
11 Disable (offline) a disk device
12 Mark a disk as a spare for a disk group
13 Turn off the spare flag on a disk
14 Unrelocate subdisks back to a disk
15 Exclude a disk from hot-relocation use
16 Make a disk available for hot-relocation use
17 Prevent multipathing/Suppress devices from VxVM's view
18 Allow multipathing/Unsuppress devices from VxVM's view
19 List currently suppressed/non-multipathed devices
20 Change the disk naming scheme
21 Change/Display the default disk layouts
22 Mark a disk as allocator-reserved for a disk group
23 Turn off the allocator-reserved flag on a disk
list List disk information
? Display help about menu
?? Display help about the menuing system
q Exit from menus
Select an operation to perform: <--- Select 3 (Remove a disk), follow instructions ...
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
cciss/c0d0 auto:none - - online invalid
sda auto - - LVM
sdaw auto:LVM - - LVM
sdax auto:cdsdisk - - online thinrclm
sday auto:cdsdisk - - online thinrclm
sdaz auto:cdsdisk - - online thinrclm
sdb auto:cdsdisk sdb dg02 online shared
sdba auto:cdsdisk - - online thinrclm
sdbb auto:cdsdisk - - online thinrclm
sdc auto:cdsdisk - - online
sdd auto:cdsdisk - - online
sde auto:cdsdisk - - online
sdf auto:cdsdisk sdf dgcoord online
# vxdg -g dgcoord adddisk sdax sday *** add new LUNs ***
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
cciss/c0d0 auto:none - - online invalid
sda auto - - LVM
sdaw auto:LVM - - LVM
sdax auto:cdsdisk sdax dgcoord online thinrclm
sday auto:cdsdisk sday dgcoord online thinrclm
sdaz auto:cdsdisk - - online thinrclm
sdb auto:cdsdisk sdb dg02 online shared
sdba auto:cdsdisk - - online thinrclm
sdbb auto:cdsdisk - - online thinrclm
sdc auto:cdsdisk - - online
sdd auto:cdsdisk - - online
sde auto:cdsdisk - - online
sdf auto:cdsdisk sdf dgcoord online
# vxdiskadm **** remove sdf ****
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
cciss/c0d0 auto:none - - online invalid
sda auto - - LVM
sdaw auto:LVM - - LVM
sdax auto:cdsdisk sdax dgcoord online thinrclm
sday auto:cdsdisk sday dgcoord online thinrclm
sdaz auto:cdsdisk - - online thinrclm
sdb auto:cdsdisk sdb dg02 online shared
sdba auto:cdsdisk - - online thinrclm
sdbb auto:cdsdisk - - online thinrclm
sdc auto:cdsdisk - - online
sdd auto:cdsdisk - - online
sde auto:cdsdisk - - online
sdf auto:cdsdisk - - online
# vxdg -g dgcoord adddisk sdaz *** add new LUN ***
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
cciss/c0d0 auto:none - - online invalid
sda auto - - LVM
sdaw auto:LVM - - LVM
sdax auto:cdsdisk sdax dgcoord online thinrclm
sday auto:cdsdisk sday dgcoord online thinrclm
sdaz auto:cdsdisk sdaz dgcoord online thinrclm
sdb auto:cdsdisk sdb dg02 online shared
sdba auto:cdsdisk - - online thinrclm
sdbb auto:cdsdisk - - online thinrclm
sdc auto:cdsdisk - - online
sdd auto:cdsdisk - - online
sde auto:cdsdisk - - online
sdf auto:cdsdisk - - online
# vxdg -g dgcoord set coordinator=on
# vxdg deport dgcoord
# vxdg -t import dgcoord
# vxdg deport dgcoord
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
cciss/c0d0 auto:none - - online invalid
sda auto - - LVM
sdaw auto:LVM - - LVM
sdax auto:cdsdisk - (dgcoord) online thinrclm
sday auto:cdsdisk - (dgcoord) online thinrclm
sdaz auto:cdsdisk - (dgcoord) online thinrclm
sdb auto:cdsdisk sdb dg02 online shared
sdba auto:cdsdisk - - online thinrclm
sdbb auto:cdsdisk - - online thinrclm
sdc auto:cdsdisk - - online
sdd auto:cdsdisk - - online
sde auto:cdsdisk - - online
sdf auto:cdsdisk - - online
|
Comparison of some terminologies between VCS and SGLX
VCS | SGLX | Description |
Service group | Package | Collection of all hardware/software resources together as single unit |
Resource agent | Modules |
Way to manage (start/stop/monitor) hardware/software resources via an SGLX package or a VCS service group.
In VCS, you include resources in a service group and the corresponding resource agent monitors them (IP resource agent, filesystem resource agent, mount resource agent, and so on).
In Serviceguard, you include resources in a package by including the corresponding resource module (volume group, IP, and filesystem modules).
These modules only start and stop the resources.
For monitoring the resources, you configure generic resources in the SGLX package.
To configure a generic resource, a monitoring script needs to be written which monitors the resource and sets the status of the generic resource.
|
Agent | Toolkit |
A framework to manage (start/stop/monitor) a specific application.
VCS agents are processes that manage resources. VCS has one resource agent per resource type,
and a single agent manages all resources of that type.
The agent starts/stops the resource, periodically monitors it, and updates the VCS engine with the resource status.
The VCS agents are:
- Bundled agents (provided with VCS; include agents for disk, IP, mount, etc.)
- Enterprise agents (control 3rd-party applications, like Oracle)
- Custom agents (developed for apps not covered by the previous two)
SGLX toolkits integrate applications with SGLX and also start/monitor/stop the application.
SGLX provides separate toolkits for supported applications (Oracle, NFS, Apache, MySQL, Samba) to create application-specific SG packages.
|
Disk fencing | Lock LUN | One of the cluster membership arbitration mechanisms |
Coordination point server | Quorum Server | One of the cluster membership arbitration mechanisms |
Failover service groups | Failover package | A packaged single unit which contains software/hardware resources and runs on one node at a time |
Parallel service groups | Multi node package | A service group/package that can run on multiple nodes at a time |
hastart | cmruncl | Command to start cluster |
hastop | cmhaltcl | Command to stop cluster |
hagrp -online <service-group> -sys <node> | cmrunpkg -n <node> pkgname | Command to start service_group or package |
hagrp -offline <service-group> -sys <node> | cmhaltpkg -n <node> pkgname | Command to stop service group or package |
hastatus -sum | cmviewcl | Command to view cluster status |
haconf -makerw ; hagrp -add <service_group> ; haconf -dump -makero | cmmakepkg <pkg_conf_file> ; edit the file ; cmcheckconf -P pkg_conf ; cmapplyconf -P pkg_conf | To add/create a VCS service group or SG package |
haconf -makerw ; hagrp -modify <service_group> ; haconf -dump -makero | cmgetconf -p pkg_name > pkg.ascii ; edit file ; cmcheckconf -P pkg.ascii ; cmapplyconf -P pkg.ascii | To modify a VCS service group or SG package |
haconf -makerw ; hagrp -delete <service_group> ; haconf -dump -makero | cmdeleteconf -p pkg_name | To delete a VCS service group or SG package |
Example : VCS Failover
# hagrp -list
ClusterService g5t0580g
ClusterService g6t0582g
gvt1207 g5t0580g
gvt1207 g6t0582g
# hagrp -switch ClusterService -to g6t0582g
# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A g5t0580g RUNNING 0
A g6t0582g RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B ClusterService g5t0580g Y N OFFLINE
B ClusterService g6t0582g Y N ONLINE
B gvt1207 g5t0580g Y N ONLINE
B gvt1207 g6t0582g Y N OFFLINE
# hagrp -switch gvt1207 -to g6t0582g
# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A g5t0580g RUNNING 0
A g6t0582g RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B ClusterService g5t0580g Y N OFFLINE
B ClusterService g6t0582g Y N ONLINE
B gvt1207 g5t0580g Y N STOPPING|PARTIAL
B gvt1207 g6t0582g Y N OFFLINE
-- RESOURCES OFFLINING
-- Group Type Resource System IState
F gvt1207 Application app_TidalSG3 g5t0580g W_OFFLINE_PROPAGATE
# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A g5t0580g RUNNING 0
A g6t0582g RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B ClusterService g5t0580g Y N OFFLINE
B ClusterService g6t0582g Y N ONLINE
B gvt1207 g5t0580g Y N OFFLINE
B gvt1207 g6t0582g Y N STARTING|PARTIAL
-- RESOURCES ONLINING
-- Group Type Resource System IState
E gvt1207 Application app_TidalSG3 g6t0582g W_ONLINE
# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A g5t0580g RUNNING 0
A g6t0582g RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B ClusterService g5t0580g Y N OFFLINE
B ClusterService g6t0582g Y N ONLINE
B gvt1207 g5t0580g Y N OFFLINE
B gvt1207 g6t0582g Y N ONLINE
|
Or use Veritas Cluster Manager - Java Console, right click on service group and select Switch to ...
Example : Adding nfs module to SG package
Install the NFS rpms with yum install nfs-utils
Install the SG toolkit serviceguard-nfs-toolkit with yum install serviceguard-nfs-toolkit
Add the nfs module to the SG package:
# cmmakepkg -i gvt2062.ascii.7.2 -m tkit/nfs/nfs gvt2062.ascii.7.2.nfs
Package template is created.
This file must be edited before it can be used.
|
Add the NFS clients to a netgroup (file /etc/netgroup) and sync it to the other nodes.
Alter /etc/nsswitch.conf so that netgroup lookups consult the local file (i.e. a line like: netgroup: files).
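The package file below exports the filesystems to @upp, i.e. a netgroup named upp; a sketch of the matching /etc/netgroup entry (host names are examples; each member is a (host,user,domain) triple):
upp (client1.domain.com,,) (client2.domain.com,,)
|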
An example package ascii file:
package_name gvt2062
package_description "Serviceguard Package"
module_name sg/basic
module_version 1
module_name sg/priority
module_version 1
module_name sg/pr_cntl
module_version 2
module_name sg/all
module_version 2
module_name sg/failover
module_version 1
module_name sg/dependency
module_version 1
module_name sg/weight
module_version 1
module_name sg/monitor_subnet
module_version 1
module_name sg/package_ip
module_version 1
module_name sg/service
module_version 1
module_name sg/generic_resource
module_version 1
module_name sg/volume_group
module_version 1
module_name sg/filesystem
module_version 1
module_name sg/pev
module_version 1
module_name sg/external_pre
module_version 1
module_name sg/external
module_version 1
module_name sg/acp
module_version 1
module_name tkit/nfs/nfs
module_version 1
package_type failover
node_name *
auto_run yes
node_fail_fast_enabled no
run_script_timeout no_timeout
halt_script_timeout no_timeout
successor_halt_timeout no_timeout
script_log_file /usr/local/cmcluster/run/log/gvt2062.log
operation_sequence $SGCONF/scripts/sg/external_pre.sh
operation_sequence $SGCONF/scripts/sg/pr_cntl.sh
operation_sequence $SGCONF/scripts/sg/volume_group.sh
operation_sequence $SGCONF/scripts/sg/filesystem.sh
operation_sequence $SGCONF/scripts/sg/package_ip.sh
operation_sequence $SGCONF/scripts/tkit/nfs/tkit_module.sh
operation_sequence $SGCONF/scripts/sg/external.sh
operation_sequence $SGCONF/scripts/sg/service.sh
failover_policy configured_node
failback_policy manual
priority no_priority
ip_subnet 16.216.168.0
ip_subnet_node g9t4711c
ip_subnet_node g9t4712c
ip_address 16.216.170.52
service_name gvt2062_nfs_monitor
service_cmd "$SGCONF/scripts/tkit/nfs/tkit_module.sh nfs_monitor"
service_restart none
service_fail_fast_enabled no
service_halt_timeout 300
vgchange_cmd "vgchange -a y"
vg vg01
concurrent_fsck_operations 1
concurrent_mount_and_umount_operations 1
fs_mount_retry_count 0
fs_umount_retry_count 1
fs_name /dev/vg01/lvol01
fs_server ""
fs_directory /opt/nfs/upp100
fs_type "ext4"
fs_mount_opt ""
fs_umount_opt ""
fs_fsck_opt ""
fs_name /dev/vg01/lvol02
fs_server ""
fs_directory /opt/nfs/upp200
fs_type "ext4"
fs_mount_opt ""
fs_umount_opt ""
fs_fsck_opt ""
fs_name /dev/vg01/lvol03
fs_server ""
fs_directory /opt/nfs/upp300
fs_type "ext4"
fs_mount_opt ""
fs_umount_opt ""
fs_fsck_opt ""
tkit/nfs/nfs/TKIT_DIR /usr/local/cmcluster/conf/modules/tkit/nfs
tkit/nfs/nfs/XFS "-o rw @upp:/opt/nfs/upp100"
tkit/nfs/nfs/XFS "-o rw @upp:/opt/nfs/upp200"
tkit/nfs/nfs/XFS "-o rw @upp:/opt/nfs/upp300"
tkit/nfs/nfs/QUOTA_MON yes
tkit/nfs/nfs/LOCK_MIGRATION no
tkit/nfs/nfs/MAINTENANCE_FLAG yes
tkit/nfs/nfs/MONITOR_INTERVAL 30
tkit/nfs/nfs/RETRY_INTERVAL 2
tkit/nfs/nfs/RETRY_TIMES 0
|
Check and apply configuration
# cmcheckconf -v -P gvt2062.ascii.7.2.nfs
# cmapplyconf -v -P gvt2062.ascii.7.2.nfs
|
After applying, the file hanfs.conf will be created in /usr/local/cmcluster/conf/modules/tkit/nfs/
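Once the package is running, the exports can be sanity-checked from a client (a sketch; the package IP 16.216.170.52 is taken from the example file above):
showmount -e 16.216.170.52                         # should list /opt/nfs/upp100, upp200 and upp300
mount -t nfs 16.216.170.52:/opt/nfs/upp100 /mnt    # test mount
|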
Example : Samba in SG environment (not using the Samba toolkit)
Samba SG package configuration is:
# cmgetconf -p mapbase_itg | grep -v ^# | grep -v ^$
package_name mapbase_itg
package_description "Serviceguard Package"
module_name sg/basic
module_version 1
module_name sg/priority
module_version 1
module_name sg/pr_cntl
module_version 2
module_name sg/all
module_version 2
module_name sg/failover
module_version 1
module_name sg/dependency
module_version 1
module_name sg/weight
module_version 1
module_name sg/monitor_subnet
module_version 1
module_name sg/package_ip
module_version 1
module_name sg/service
module_version 1
module_name sg/generic_resource
module_version 1
module_name sg/volume_group
module_version 1
module_name sg/filesystem
module_version 1
module_name sg/pev
module_version 1
module_name sg/external_pre
module_version 1
module_name sg/external
module_version 1
module_name sg/acp
module_version 1
package_type failover
node_name *
auto_run yes
node_fail_fast_enabled no
run_script_timeout no_timeout
halt_script_timeout no_timeout
successor_halt_timeout no_timeout
script_log_file /usr/local/cmcluster/run/log/mapbase_itg.log
operation_sequence $SGCONF/scripts/sg/external_pre.sh
operation_sequence $SGCONF/scripts/sg/pr_cntl.sh
operation_sequence $SGCONF/scripts/sg/volume_group.sh
operation_sequence $SGCONF/scripts/sg/filesystem.sh
operation_sequence $SGCONF/scripts/sg/package_ip.sh
operation_sequence $SGCONF/scripts/sg/external.sh
operation_sequence $SGCONF/scripts/sg/service.sh
log_level 3
priority no_priority
failover_policy configured_node
failback_policy manual
ip_subnet 16.44.128.0
ip_subnet_node s48t0044c
ip_subnet_node s48t0045c
ip_address 16.44.128.169
service_name mapbase_itg_samba_monitor
service_cmd /usr/local/cmcluster/conf/mapbase/mapbase-mon.sh
service_restart none
service_fail_fast_enabled no
service_halt_timeout 300
vgchange_cmd "vgchange -a y"
vg vg03
concurrent_fsck_operations 1
concurrent_mount_and_umount_operations 1
fs_mount_retry_count 0
fs_umount_retry_count 1
fs_name /dev/vg03/lvol01
fs_server ""
fs_directory /opt/mapbase
fs_type "ext4"
fs_mount_opt ""
fs_umount_opt ""
fs_fsck_opt ""
external_script /usr/local/cmcluster/conf/mapbase/mapbase-samba.sh
|
The monitoring script /usr/local/cmcluster/conf/mapbase/mapbase-mon.sh is:
#!/bin/bash

# Monitor the mapbase smbd/nmbd daemons; exit 1 when either dies,
# so Serviceguard fails the service and the package can fail over.
function monitor_mapbase_command
{
    #sg_log 5 "monitoring_command"
    sleep 5
    smbd_pid=`cat /var/run/samba/mapbase/smbd-smb.conf.mapbase.pid`
    nmbd_pid=`cat /var/run/samba/mapbase/nmbd-smb.conf.mapbase.pid`
    while :; do
        sleep 300
        # both daemons still alive? keep looping; otherwise fall out
        if [ -f /proc/$smbd_pid/stat ] && [ -f /proc/$nmbd_pid/stat ]
        then
            continue
        else
            break
        fi
    done
    exit 1
}
monitor_mapbase_command
|
The external script (used by SGLX to start/stop samba) is:
# cat /usr/local/cmcluster/conf/mapbase/mapbase-samba.sh | grep -v ^#
if [[ -z $SG_UTILS ]]
then
    . /etc/cmcluster.conf
    SG_UTILS=$SGCONF/scripts/mscripts/utils.sh
fi

if [[ -f ${SG_UTILS} ]]; then
    . ${SG_UTILS}
    if (( $? != 0 ))
    then
        echo "ERROR: Unable to source package utility functions file: ${SG_UTILS}"
        exit 1
    fi
else
    echo "ERROR: Unable to find package utility functions file: ${SG_UTILS}"
    exit 1
fi

sg_source_pkg_env $*

function validate_command
{
    # Output messages will only be displayed in STDOUT, if there is
    # any error condition while executing the master control script
    # with a "validate" parameter from cmcheckconf/cmapplyconf command
    sg_log 5 "validate_command"
    # ADD your package validation steps here
    return 0
}

function start_command
{
    sg_log 5 "start_command"
    /usr/sbin/smbd -D -s /etc/samba/smb.conf.mapbase
    /usr/sbin/nmbd -D -s /etc/samba/smb.conf.mapbase
    # ADD your package start steps here
    return 0
}

function stop_command
{
    sg_log 5 "stop_command"
    /bin/kill `cat /var/run/samba/mapbase/smbd-smb.conf.mapbase.pid`
    /bin/kill `cat /var/run/samba/mapbase/nmbd-smb.conf.mapbase.pid`
    # ADD your package halt steps here
    return 0
}

sg_log 5 "customer defined script"

typeset -i exit_val=0
case ${1} in
    start)
        start_command $*
        exit_val=$?
        ;;
    stop)
        stop_command $*
        exit_val=$?
        ;;
    validate)
        validate_command $*
        exit_val=$?
        ;;
    *)
        sg_log 0 "INFO: Unknown operation: $1"
        ;;
esac
exit $exit_val
|
See running processes:
# ps -ef | grep smb
root 8953 1 0 02:09 ? 00:00:00 /usr/sbin/smbd -D -s /etc/samba/smb.conf.mapbase
root 8955 1 0 02:09 ? 00:00:00 /usr/sbin/nmbd -D -s /etc/samba/smb.conf.mapbase
root 8963 8953 0 02:10 ? 00:00:00 /usr/sbin/smbd -D -s /etc/samba/smb.conf.mapbase
root 10127 17672 0 02:10 pts/1 00:00:00 grep smb
|
See samba configuration file:
# cat /etc/samba/smb.conf.mapbase
[global]
workgroup = AMERICAS
security = ADS
realm = AMERICAS.DOMAIN.COM
# use short name
netbios name = mapbaseitg
interfaces = 16.44.128.169/22 16.44.128.169/255.255.252.0 127.0.0.0/8
#interfaces = 16.44.128.169
bind interfaces only = yes
encrypt passwords = yes
domain master = no
os level = 10
load printers = no
guest ok = Yes
create mask = 0644
kernel oplocks = yes
unix extensions = no
client lanman auth = no
client ntlmv2 auth = yes
client plaintext auth = no
client schannel = auto
client signing = auto
client use spnego = yes
log file = /var/log/samba/mapbase/logs/log.%m
pid directory = /var/run/samba/mapbase
lock directory = /var/run/samba/mapbase/locks
private dir = /var/log/samba/mapbase/private
[testers]
comment = mapbase share
path = /opt/mapbase
browseable = yes
valid users = EMEA\name1 AMERICAS\name2
read only = no
create mode = 0666
|
Make sure the private directory is on a shared LUN, so that after failover the Samba instance behind the virtual IP doesn't need to re-join the domain.
Samba uses a lightweight database called Trivial Database (tdb) to store persistent and transient data (in the private directory).
Some tdb files are removed before restarting Samba, but others store information that is vital to Samba's behavior.
# ls -la /var/log/samba/mapbase/private
lrwxrwxrwx 1 root sys 27 Jul 11 02:06 /var/log/samba/mapbase/private -> /opt/mapbase/.samba/private
# ls -la /opt/mapbase/.samba/private
total 88
drwxr-xr-x 2 root sys 4096 Jul 10 22:51 .
drwxr-xr-x 3 root sys 4096 Jul 10 22:50 ..
-rw------- 1 root root 36864 Jul 10 22:26 passdb.tdb
-rw------- 1 root root 45056 Jul 10 22:26 secrets.tdb
|
See: https://www.samba.org/samba/docs/man/Samba-HOWTO-Collection/tdb.html
passdb.tdb - This stores the Samba SAM account information when using a tdbsam password backend.
secrets.tdb - This tdb file stores the machine and domain SIDs, secret passwords used with LDAP,
the machine secret token, etc. It is an essential file and is kept in a secure area.
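The tdb files can be inspected read-only with tdbdump from the tdb-tools package (sketch):
tdbdump /opt/mapbase/.samba/private/secrets.tdb | head
|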
To check the config file:
# testparm /etc/samba/smb.conf.mapbase
Load smb config files from /etc/samba/smb.conf.mapbase
rlimit_max: increasing rlimit_max (1024) to minimum Windows limit (16384)
Processing section "[testers]"
Loaded services file OK.
Server role: ROLE_DOMAIN_MEMBER
Press enter to see a dump of your service definitions
[global]
workgroup = AMERICAS
realm = AMERICAS.DOMAIN.COM
netbios name = MAPBASE_ITG.IRL.DOMAIN.COM
interfaces = 16.44.128.169
bind interfaces only = Yes
security = ADS
private dir = /var/log/samba/mapbase/private
client NTLMv2 auth = Yes
log file = /var/log/samba/mapbase/logs/log.%m
os level = 10
domain master = No
lock directory = /var/run/samba/mapbase/locks
pid directory = /var/run/samba/mapbase
[testers]
comment = mapbase share
path = /opt/mapbase
valid users = EMEA\name1, AMERICAS\name2
read only = No
create mask = 0666
|
Joining the domain
- In a cluster, the Samba package must be running when you join, since the private folder is on the shared LUN
- Run : net ads join -W "IT Servers\NGDC CIFS" -U account -s /etc/samba/smb.conf.gradebook
Commands to test
- net ads info -s /etc/samba/smb.conf.gradebook
- net ads status -s /etc/samba/smb.conf.gradebook
- net ads lookup -s /etc/samba/smb.conf.gradebook