Compare VCS (Veritas Cluster Server) and SGLX (Serviceguard cluster on Linux)
SGLX overview
Serviceguard for Linux is HP's business continuity solution: your application runs in a cluster environment and can fail over between nodes,
so the application is effectively always running, with only a short downtime while it fails over.
Cluster quorum
When a cluster node fails, the cluster reforms itself with the surviving nodes. Each node has one vote.
During cluster reformation, nodes that can communicate with each other regroup and reform a cluster.
A cluster can be reformed only if the cluster quorum condition is met: more than 50% of the nodes previously running must be available to form the cluster. For example, if 4 nodes were running, at least 3 must still be able to communicate.
Split brain
This is the situation when the cluster partitions and exactly 50% of the previously running nodes are on each side, so some method is needed to break the tie.
This method guarantees that only one half can form a cluster; the other half is shut down.
The tie-breaker is the cluster lock: either a Lock LUN or a Quorum Server can be configured as the cluster lock.
A cluster lock is of course mandatory in clusters with only two nodes.
Lock LUN (for SGLX with up to 4 nodes)
The Lock LUN acts as a quorum arbitration method. It is an external storage LUN (say 1 GB) shared among the nodes.
In a split-brain situation the nodes race to obtain a lock on the Lock LUN. Once the lock is obtained, the LUN is marked so the other nodes recognize it as taken.
My drawing shows a 2-node cluster with a Lock LUN: when communication between the nodes is lost, both nodes race for the lock; node 2 wins and node 1 reboots.
To get the running configuration of SGLX, run: cmgetconf | grep -v ^# | grep -v ^$
Example output:
CLUSTER_NAME cluster_name
HOSTNAME_ADDRESS_FAMILY IPV4
NODE_NAME node-1
NETWORK_INTERFACE bond0
HEARTBEAT_IP 16.44.128.142
NETWORK_INTERFACE bond1
HEARTBEAT_IP 16.44.140.83
CLUSTER_LOCK_LUN /dev/sdd
NODE_NAME node-2
NETWORK_INTERFACE bond0
HEARTBEAT_IP 16.44.128.143
NETWORK_INTERFACE bond1
HEARTBEAT_IP 16.44.140.84
CLUSTER_LOCK_LUN /dev/sdd
MEMBER_TIMEOUT 14000000
AUTO_START_TIMEOUT 600000000
NETWORK_POLLING_INTERVAL 2000000
SUBNET 16.44.128.0
IP_MONITOR OFF
SUBNET 16.44.140.0
IP_MONITOR OFF
MAX_CONFIGURED_PACKAGES 300
USER_NAME ANY_USER
USER_HOST ANY_SERVICEGUARD_NODE
USER_ROLE monitor
|
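Note that the timeout and interval values above are in microseconds; a sketch of the arithmetic for the values shown:
MEMBER_TIMEOUT 14000000            # 14,000,000 us = 14 s without heartbeats before a node is declared failed
NETWORK_POLLING_INTERVAL 2000000   # 2,000,000 us = 2 s between network polls
AUTO_START_TIMEOUT 600000000       # 600 s = 10 min wait for all nodes at cluster auto-start
|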
Quorum server
The QS can also be used as the membership arbitration method, and a single QS can serve multiple clusters at a time.
There is no limitation on the number of nodes in the cluster. The QS can be configured as an SG package or standalone, but in either case it must run on a system outside the cluster for which it provides quorum services.
The QS listens to connection requests from the SG nodes on a known port.
The QS maintains a special area in memory for each cluster, and when a node obtains the cluster lock, this area is marked so that other nodes can recognize the lock as taken.
My drawing shows QS operation in a 2-node cluster.
When there is a loss of communication between the nodes, the QS chooses node 2 to continue running in the cluster; node 1 is reset.
Again, to get the running configuration of SGLX, run: cmgetconf | grep -v ^# | grep -v ^$
Example output:
CLUSTER_NAME some_cluster_name
HOSTNAME_ADDRESS_FAMILY IPV4
QS_HOST qs_hostname.domain.com
QS_POLLING_INTERVAL 300000000
NODE_NAME node1_hostname
NETWORK_INTERFACE bond0
HEARTBEAT_IP 16.216.169.21
NETWORK_INTERFACE bond1
HEARTBEAT_IP 16.217.168.159
NODE_NAME node2_hostname
NETWORK_INTERFACE bond0
HEARTBEAT_IP 16.216.169.173
NETWORK_INTERFACE bond1
HEARTBEAT_IP 16.217.168.163
MEMBER_TIMEOUT 20000000
AUTO_START_TIMEOUT 600000000
NETWORK_POLLING_INTERVAL 2000000
SUBNET 16.216.168.0
IP_MONITOR OFF
SUBNET 16.217.168.0
IP_MONITOR OFF
MAX_CONFIGURED_PACKAGES 300
USER_NAME ANY_USER
USER_HOST ANY_SERVICEGUARD_NODE
USER_ROLE monitor
|
If you need to reconfigure the cluster (change the QS, etc.), first get its configuration with cmgetconf and redirect it into a file,
edit the file, and then check the configuration with: cmcheckconf -C cl.ascii
If there are no errors, apply the new configuration with: cmapplyconf -C cl.ascii
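Putting the steps together (a sketch; cl.ascii is just an example file name):
cmgetconf > cl.ascii      # dump the running cluster configuration
vi cl.ascii               # change QS_HOST, MEMBER_TIMEOUT, etc.
cmcheckconf -C cl.ascii   # validate the edited file
cmapplyconf -C cl.ascii   # apply it if the check passed
|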
If your QS host runs RHEL, it has the "qs" rpm installed:
# rpm -q -i qs
Name : qs Relocations: (not relocatable)
Version : A.04.00.00 Vendor: (none)
Release : 0.rhel5 Build Date: Tue 09 Dec 2008 08:29:59 AM UTC
Install Date: Tue 03 Apr 2012 11:30:20 PM UTC Build Host: itv4.cup.hp.com
Group : System Source RPM: qs-A.04.00.00-0.rhel5.src.rpm
Size : 199105 License: Hewlett Packard Company, see the /usr/local/qs/conf/COPYRIGHT file for details
Signature : (none)
Packager : integ
URL : http://h18022.www1.hp.com/solutions/enterprise/highavailability/linux/serviceguard/
Summary : High Availability Clustering Quorum Server
Description : A network quorum device used for Serviceguard high availibility clustering.
|
# rpm -q -l qs
/usr/local/qs
/usr/local/qs/bin
/usr/local/qs/bin/qs
/usr/local/qs/bin/qsc
/usr/local/qs/conf
/usr/local/qs/conf/COPYRIGHT
/usr/local/qs/core
/usr/local/qs/doc
/usr/local/qs/doc/man
/usr/local/qs/doc/man/man1
/usr/local/qs/doc/man/man1/qs.1
/var/log/qs
|
The file that lists the cluster nodes is /usr/local/qs/conf/qs_authfile (or any other configured path); it looks like
node1.domain-B.com # comment on cluster 1
node2.domain-B.com # comment on cluster 1
node-1.domain-A.com # comment on cluster 2
node-2.domain-A.com # comment on cluster 2
|
After adding new nodes to this file, run the command /usr/local/qs/bin/qs -update
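For example (a sketch; the node name is hypothetical, and check the log under /var/log/qs to confirm the update was picked up):
echo "node-3.domain-A.com # comment on cluster 2" >> /usr/local/qs/conf/qs_authfile
/usr/local/qs/bin/qs -update
|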
Node fencing
A node reboots itself if it cannot communicate with the majority of cluster members for a predetermined time,
during split brain, or under other circumstances such as failure of the cluster daemon (cmcld).
This reboot is initiated by the DEADMAN driver, which acts as the fencing mechanism in an SGLX cluster.
The DEADMAN driver is a dynamically loadable kernel module that is compiled against the kernel automatically when SGLX is installed.
# modinfo deadman
filename: /lib/modules/2.6.32-358.18.1.el6.x86_64/extra/deadman.ko
supported: external
description: Deadman Timer
author: Eric Soderberg
license: GPL
srcversion: A308092D2E273EDC451BAE6
depends:
vermagic: 2.6.32-358.18.1.el6.x86_64 SMP mod_unload modversions
parm: mode:Mode for system operation when safety time expires : reboot or panic (charp)
|
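A quick check that the module is actually loaded on a cluster node (sketch):
lsmod | grep deadman
|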
Networking in the SGLX cluster
SGLX uses one or more heartbeat networks to send heartbeats among all nodes and to maintain the cluster membership.
SGLX also uses the heartbeat network for communication between nodes, so it is recommended to configure multiple heartbeat networks.
Applications deployed in SGLX can use their own networks in the cluster (for their client access). SGLX can monitor and manage the IPs used by the applications on these networks.
During failover, SGLX moves the application's IP from the failed node to the target node.
This "app network" can also be configured to carry heartbeat traffic.
SGLX heartbeats exchange only small messages and do not have demanding bandwidth requirements.
VCS overview
VCS (Veritas Cluster Server) is Symantec's business continuity solution for your applications.
VCS cluster membership
VCS has two (2) types of communications:
- intra-system communications : VCS uses Inter Process Messaging (IPM) to communicate with the GUI, CLI, and the agents
- inter-system communications : VCS uses the cluster interconnects for network communication between nodes (LLT and GAB share data among the cluster nodes)
LLT (Low Latency Transport) provides kernel-to-kernel communication; natively it uses its own Ethernet protocol rather than TCP/IP, though it can also be configured to run over UDP, as in the llttab example below.
LLT has configuration files:
- /etc/llthosts (exists and is identical on every node; lists system IDs and node names)
0 hostname-1
1 hostname-2
|
- /etc/llttab (exists on every node but is not identical; it defines the local system's private network links to the other nodes, see "man llttab")
# set-node is unique for each node
set-node local_node_hostname
# cluster ID is same on every node
set-cluster 1323
# link defines local network cards used by LLT
# this is for hostname-1 and hostname-1-hb
link link1 udp - udp 50000 - x.x.x.x -
link link2 udp - udp 50001 - y.y.y.y -
# set-addr sets the peer node's IPs on the links; system ID 1 is hostname-2
# link1 is like hostname-2 interface = z.z.z.z
# link2 is hostname-2-hb interface = q.q.q.q
set-addr 1 link1 z.z.z.z
set-addr 1 link2 q.q.q.q
|
- Once the LLT config files are set up, start LLT with lltconfig -c
- Verify it with lltconfig -a list
Link 0 (link1):
Node 0 hostname-1 : 16.192.16.95 permanent
Node 1 hostname-2 : 16.196.48.126 permanent
Link 1 (link2):
Node 0 hostname-1 : 16.193.16.91 permanent
Node 1 hostname-2 : 16.197.48.219 permanent
|
- lltstat reports LLT status, see "man lltstat"
# lltstat -n
LLT node information:
Node State Links
* 0 hostname-1 OPEN 2
1 hostname-2 OPEN 2
|
The Group Membership Services/Atomic Broadcast protocol (GAB) is responsible for cluster membership and cluster communications.
When a node goes down, it stops sending heartbeats to the other nodes (the others wait 21 seconds before considering the node dead).
- The gabconfig command is run from GAB's config file, /etc/gabtab, which here configures the GAB driver for use with 2 nodes; a typical file is sketched below
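A typical /etc/gabtab contains a single line; a sketch for this 2-node cluster (the -n seed count equals the number of nodes):
# cat /etc/gabtab
/sbin/gabconfig -c -n2
|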
- To display GAB driver port memberships
# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 37a20a membership 01
Port h gen 37a20c membership 01
|
Each node runs a kernel fencing module which is responsible for ensuring a consistent cluster membership during any cluster change, through the process of membership arbitration.
# lsmod | grep -e vxfen -e gab -e llt -e Module
Module Size Used by
vxfen 242088 0
gab 242576 4 vxfen
llt 167192 7 gab
# lsmod | grep vx
vxodm 206503 1
vxfen 306607 1
gab 283802 6 vxfen
vxspec 3174 4
vxio 3294713 9 vxspec
vxdmp 381042 72 vxspec,vxio
vxportal 5855 0
fdd 53986 2 vxodm
vxfs 3004314 630 vxportal,fdd
# service --status-all | grep vx
USAGE: OVTrcSrv start|stop|restart
Status of Veritas vxdbd
/opt/VRTSdbed/common/bin/vxdbd ping SUCCESS
vxodm is running...
in.vxrsyncd is stopped
vxconfigd (pid 9086) is running...
vxconfigd (pid 9086) is running...
vxesd (pid 12108) is running...
vxrelocd (pid 12371 12316) is running...
vxattachd (pid 12454 12396) is running...
vxcached (pid 12551 12493) is running...
vxvvrsecdgd is stopped
vxconfigbackupd (pid 12653 12595) is running...
# chkconfig | grep vx
vxatd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
vxdbdctrl 0:off 1:off 2:off 3:on 4:off 5:on 6:off
vxdcli 0:off 1:off 2:off 3:on 4:off 5:on 6:off
vxfen 0:off 1:off 2:off 3:on 4:on 5:on 6:off
vxfs 0:off 1:on 2:on 3:on 4:on 5:on 6:off
vxnm-vxnetd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
vxodm 0:off 1:off 2:off 3:on 4:on 5:on 6:off
vxrsyncd.sh 0:off 1:off 2:on 3:on 4:on 5:on 6:off
vxvm-boot 0:off 1:on 2:on 3:on 4:on 5:on 6:off
vxvm-reconfig 0:off 1:off 2:on 3:on 4:on 5:on 6:off
vxvm-recover 0:off 1:off 2:on 3:on 4:on 5:on 6:off
|
A VCS cluster uses a service called the coordination point for membership arbitration.
Coordination points provide a lock mechanism to determine which nodes get to fence off data drives from other nodes.
A node must eject a peer from the coordination points before it can fence the peer from the data drives.
A coordination point is usually a SAN LUN, but it can be a server. Having three (3) coordinator LUNs is recommended (say 1 GB each), so that one partition can win a clear majority (2 of 3) of the races.
At the time of cluster reformation, the kernel fencing module on each node races for control of the coordination LUNs.
VCS prevents split brain by allowing the winning partition to fence the ejected nodes away from the data disks.
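On each node, fencing is driven by two small config files; a sketch of typical contents (the disk group name matches the example below; the vxfenmode values are common SCSI-3/DMP settings and may differ per setup):
# cat /etc/vxfendg
dgcoord
# cat /etc/vxfenmode
vxfen_mode=scsi3
scsi3_disk_policy=dmp
|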
Example: Replacing coordinator LUNs online
SCSI-3 Persistent Reservations (SCSI-3 PR) are required for I/O fencing and resolve the issues of using SCSI reservations in VCS.
SCSI-3 PR enables access for multiple nodes to a device and simultaneously blocks access for other nodes.
SCSI-3 reservations are persistent across SCSI bus resets and support multiple paths from a host to a disk.
In contrast, only one host can use SCSI-2 reservations with one path.
If the need arises to block access to a device because of data integrity concerns, only one host and one path remain active.
The requirements for larger clusters, with multiple nodes reading and writing to storage in a controlled manner, make SCSI-2 reservations obsolete.
SCSI-3 PR uses a concept of registration and reservation.
Each system registers its own "key" with a SCSI-3 device.
Multiple systems registering keys form a membership and establish a reservation, typically set to "Write Exclusive Registrants Only."
The WERO setting enables only registered systems to perform write operations.
For a given disk, only one reservation can exist amidst numerous registrations.
With SCSI-3 PR technology, blocking write access is as simple as removing a registration from a device.
Only registered members can "eject" the registration of another member.
A member wishing to eject another member issues a "preempt and abort" command.
Ejecting a node is final and atomic; an ejected node cannot eject another node.
In VCS, a node registers the same key for all paths to the device.
A single preempt and abort command ejects a node from all paths to the storage device.
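Registrations and the reservation on a LUN can be inspected with the generic sg_persist utility from sg3_utils (a sketch; /dev/sdd is an example device, and VCS also ships its own vxfenadm tool for the same purpose):
sg_persist --in --read-keys /dev/sdd          # keys registered by the cluster nodes
sg_persist --in --read-reservation /dev/sdd   # current (WERO) reservation holder
|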
Prepare disk for Veritas:
/etc/vx/bin/vxdisksetup -i $DISK format=cdsdisk
|
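For the three replacement coordinator LUNs marked "new coordinator LUN" in the listing below, that means (a sketch; device names are taken from the example):
for DISK in sdax sday sdaz; do
    /etc/vx/bin/vxdisksetup -i $DISK format=cdsdisk
done
|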
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
cciss/c0d0 auto:none - - online invalid
sda auto - - LVM
sdaw auto:LVM - - LVM
sdax auto:cdsdisk - - online thinrclm new coordinator LUN
sday auto:cdsdisk - - online thinrclm new coordinator LUN
sdaz auto:cdsdisk - - online thinrclm new coordinator LUN
sdb auto:cdsdisk sdb dg02 online shared
sdba auto:cdsdisk - - online thinrclm
sdbb auto:cdsdisk - - online thinrclm
sdc auto:cdsdisk - - online
sdd auto:cdsdisk - (dgcoord) online to be replaced
sde auto:cdsdisk - (dgcoord) online to be replaced
sdf auto:cdsdisk - (dgcoord) online to be replaced
# cat /etc/vxfendg
dgcoord
# vxdg -tfC import `cat /etc/vxfendg`
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
cciss/c0d0 auto:none - - online invalid
sda auto - - LVM
sdaw auto:LVM - - LVM
sdax auto:cdsdisk - - online thinrclm
sday auto:cdsdisk - - online thinrclm
sdaz auto:cdsdisk - - online thinrclm
sdb auto:cdsdisk sdb dg02 online shared
sdba auto:cdsdisk - - online thinrclm
sdbb auto:cdsdisk - - online thinrclm
sdc auto:cdsdisk - - online
sdd auto:cdsdisk sdd dgcoord online
sde auto:cdsdisk sde dgcoord online
sdf auto:cdsdisk sdf dgcoord online
# vxdg -g dgcoord set coordinator=off
# vxdiskadm ** Use vxdiskadm to remove sdd and sde **
# vxdiskadm
Volume Manager Support Operations
Menu: VolumeManager/Disk
1 Add or initialize one or more disks
2 Encapsulate one or more disks
3 Remove a disk <------------------ remove disk
4 Remove a disk for replacement
5 Replace a failed or removed disk
6 Mirror volumes on a disk
7 Move volumes from a disk
8 Enable access to (import) a disk group
9 Remove access to (deport) a disk group
10 Enable (online) a disk device
11 Disable (offline) a disk device
12 Mark a disk as a spare for a disk group
13 Turn off the spare flag on a disk
14 Unrelocate subdisks back to a disk
15 Exclude a disk from hot-relocation use
16 Make a disk available for hot-relocation use
17 Prevent multipathing/Suppress devices from VxVM's view
18 Allow multipathing/Unsuppress devices from VxVM's view
19 List currently suppressed/non-multipathed devices
20 Change the disk naming scheme
21 Change/Display the default disk layouts
22 Mark a disk as allocator-reserved for a disk group
23 Turn off the allocator-reserved flag on a disk
list List disk information
? Display help about menu
?? Display help about the menuing system
q Exit from menus
Select an operation to perform: <--- Select 3 (Remove a disk), follow instructions ...
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
cciss/c0d0 auto:none - - online invalid
sda auto - - LVM
sdaw auto:LVM - - LVM
sdax auto:cdsdisk - - online thinrclm
sday auto:cdsdisk - - online thinrclm
sdaz auto:cdsdisk - - online thinrclm
sdb auto:cdsdisk sdb dg02 online shared
sdba auto:cdsdisk - - online thinrclm
sdbb auto:cdsdisk - - online thinrclm
sdc auto:cdsdisk - - online
sdd auto:cdsdisk - - online
sde auto:cdsdisk - - online
sdf auto:cdsdisk sdf dgcoord online
# vxdg -g dgcoord adddisk sdax sday *** add new LUNs ***
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
cciss/c0d0 auto:none - - online invalid
sda auto - - LVM
sdaw auto:LVM - - LVM
sdax auto:cdsdisk sdax dgcoord online thinrclm
sday auto:cdsdisk sday dgcoord online thinrclm
sdaz auto:cdsdisk - - online thinrclm
sdb auto:cdsdisk sdb dg02 online shared
sdba auto:cdsdisk - - online thinrclm
sdbb auto:cdsdisk - - online thinrclm
sdc auto:cdsdisk - - online
sdd auto:cdsdisk - - online
sde auto:cdsdisk - - online
sdf auto:cdsdisk sdf dgcoord online
# vxdiskadm **** remove sdf ****
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
cciss/c0d0 auto:none - - online invalid
sda auto - - LVM
sdaw auto:LVM - - LVM
sdax auto:cdsdisk sdax dgcoord online thinrclm
sday auto:cdsdisk sday dgcoord online thinrclm
sdaz auto:cdsdisk - - online thinrclm
sdb auto:cdsdisk sdb dg02 online shared
sdba auto:cdsdisk - - online thinrclm
sdbb auto:cdsdisk - - online thinrclm
sdc auto:cdsdisk - - online
sdd auto:cdsdisk - - online
sde auto:cdsdisk - - online
sdf auto:cdsdisk - - online
# vxdg -g dgcoord adddisk sdaz *** add new LUN ***
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
cciss/c0d0 auto:none - - online invalid
sda auto - - LVM
sdaw auto:LVM - - LVM
sdax auto:cdsdisk sdax dgcoord online thinrclm
sday auto:cdsdisk sday dgcoord online thinrclm
sdaz auto:cdsdisk sdaz dgcoord online thinrclm
sdb auto:cdsdisk sdb dg02 online shared
sdba auto:cdsdisk - - online thinrclm
sdbb auto:cdsdisk - - online thinrclm
sdc auto:cdsdisk - - online
sdd auto:cdsdisk - - online
sde auto:cdsdisk - - online
sdf auto:cdsdisk - - online
# vxdg -g dgcoord set coordinator=on
# vxdg deport dgcoord
# vxdg -t import dgcoord
# vxdg deport dgcoord
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
cciss/c0d0 auto:none - - online invalid
sda auto - - LVM
sdaw auto:LVM - - LVM
sdax auto:cdsdisk - (dgcoord) online thinrclm
sday auto:cdsdisk - (dgcoord) online thinrclm
sdaz auto:cdsdisk - (dgcoord) online thinrclm
sdb auto:cdsdisk sdb dg02 online shared
sdba auto:cdsdisk - - online thinrclm
sdbb auto:cdsdisk - - online thinrclm
sdc auto:cdsdisk - - online
sdd auto:cdsdisk - - online
sde auto:cdsdisk - - online
sdf auto:cdsdisk - - online
|
Comparison of some terminologies between VCS and SGLX
VCS | SGLX | Description |
Service group | Package | Collection of all hardware/software resources together as single unit |
Resource agent | Modules |
Way to manage (start/stop/monitor) hardware/software resources via an SGLX package or a VCS service group.
In VCS, you include resources in a service group and the corresponding resource agent monitors them (IP resource agent, filesystem resource agent, mount resource agent, and so on).
In Serviceguard, you include resources in a package by including the corresponding resource module (volume group, IP, and filesystem modules).
These modules only start and stop the resources.
For monitoring the resources, you configure generic resources in the SGLX package.
To configure a generic resource, a monitoring script needs to be written which monitors the resource and sets the status of the generic resource.
|
Agent | Toolkit |
A framework to manage (start/stop/monitor) a specific application.
VCS agents are processes that manage resources. VCS has one resource agent per resource type,
and a single agent manages all resources of that type.
The agent starts/stops the resource, periodically monitors it, and updates the VCS engine with the resource status.
The VCS agents are:
- Bundled agents (provided with VCS; include agents for disk, IP, mount, etc.)
- Enterprise agents (control 3rd-party applications, like Oracle)
- Custom agents (developed for apps not covered by the previous two)
SGLX toolkits integrate applications with SGLX and also start/monitor/stop the application.
SGLX provides separate toolkits for supported applications (Oracle, NFS, Apache, MySQL, Samba) to create application-specific SG packages.
|
Disk fencing | Lock LUN | One of the cluster membership arbitration mechanisms |
Coordination point server | Quorum Server | One of the cluster membership arbitration mechanisms |
Failover service groups | Failover package | A packaged single unit which contains software/hardware resources and runs on one node at a time |
Parallel service groups | Multi node package | A service group/package that can run on multiple nodes at a time |
hastart | cmruncl | Command to start cluster |
hastop | cmhaltcl | Command to stop cluster |
hagrp -online <service-group> -sys <node> | cmrunpkg -n <node> pkgname | Command to start service_group or package |
hagrp -offline <service-group> -sys <node> | cmhaltpkg -n <node> pkgname | Command to stop service group or package |
hastatus -sum | cmviewcl | Command to view cluster status |
haconf -makerw ; hagrp -add <service_group> ; haconf -dump -makero | cmmakepkg <pkg_conf_file> ; edit the file ; cmcheckconf -P pkg_conf ; cmapplyconf -P pkg_conf | To add/create a VCS service group or SG package |
haconf -makerw ; hagrp -modify <service_group> ; haconf -dump -makero | cmgetconf -p pkg_name > pkg.ascii ; edit file ; cmcheckconf -P pkg.ascii ; cmapplyconf -P pkg.ascii | To modify a VCS service group or SG package |
haconf -makerw ; hagrp -delete <service_group> ; haconf -dump -makero | cmdeleteconf -p pkg_name | To delete a VCS service group or SG package |
Example : VCS Failover
# hagrp -list
ClusterService g5t0580g
ClusterService g6t0582g
gvt1207 g5t0580g
gvt1207 g6t0582g
# hagrp -switch ClusterService -to g6t0582g
# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A g5t0580g RUNNING 0
A g6t0582g RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B ClusterService g5t0580g Y N OFFLINE
B ClusterService g6t0582g Y N ONLINE
B gvt1207 g5t0580g Y N ONLINE
B gvt1207 g6t0582g Y N OFFLINE
# hagrp -switch gvt1207 -to g6t0582g
# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A g5t0580g RUNNING 0
A g6t0582g RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B ClusterService g5t0580g Y N OFFLINE
B ClusterService g6t0582g Y N ONLINE
B gvt1207 g5t0580g Y N STOPPING|PARTIAL
B gvt1207 g6t0582g Y N OFFLINE
-- RESOURCES OFFLINING
-- Group Type Resource System IState
F gvt1207 Application app_TidalSG3 g5t0580g W_OFFLINE_PROPAGATE
# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A g5t0580g RUNNING 0
A g6t0582g RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B ClusterService g5t0580g Y N OFFLINE
B ClusterService g6t0582g Y N ONLINE
B gvt1207 g5t0580g Y N OFFLINE
B gvt1207 g6t0582g Y N STARTING|PARTIAL
-- RESOURCES ONLINING
-- Group Type Resource System IState
E gvt1207 Application app_TidalSG3 g6t0582g W_ONLINE
# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A g5t0580g RUNNING 0
A g6t0582g RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B ClusterService g5t0580g Y N OFFLINE
B ClusterService g6t0582g Y N ONLINE
B gvt1207 g5t0580g Y N OFFLINE
B gvt1207 g6t0582g Y N ONLINE
|
Or use Veritas Cluster Manager - Java Console, right click on service group and select Switch to ...
Example : Adding nfs module to SG package
Install the NFS rpms with yum install nfs-utils
Install the SG toolkit serviceguard-nfs-toolkit with yum install serviceguard-nfs-toolkit
Add the nfs module to the SG package:
# cmmakepkg -i gvt2062.ascii.7.2 -m tkit/nfs/nfs gvt2062.ascii.7.2.nfs
Package template is created.
This file must be edited before it can be used.
|
Add the NFS clients to a netgroup (file /etc/netgroup) and sync it to the other nodes.
Alter /etc/nsswitch.conf so that netgroup lookups consult the local file (i.e. a line like: netgroup: files).
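The package file below exports the filesystems to @upp, i.e. a netgroup named upp; a sketch of the matching /etc/netgroup entry (host names are examples; each member is a (host,user,domain) triple):
upp (client1.domain.com,,) (client2.domain.com,,)
|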
An example package ascii file:
package_name gvt2062
package_description "Serviceguard Package"
module_name sg/basic
module_version 1
module_name sg/priority
module_version 1
module_name sg/pr_cntl
module_version 2
module_name sg/all
module_version 2
module_name sg/failover
module_version 1
module_name sg/dependency
module_version 1
module_name sg/weight
module_version 1
module_name sg/monitor_subnet
module_version 1
module_name sg/package_ip
module_version 1
module_name sg/service
module_version 1
module_name sg/generic_resource
module_version 1
module_name sg/volume_group
module_version 1
module_name sg/filesystem
module_version 1
module_name sg/pev
module_version 1
module_name sg/external_pre
module_version 1
module_name sg/external
module_version 1
module_name sg/acp
module_version 1
module_name tkit/nfs/nfs
module_version 1
package_type failover
node_name *
auto_run yes
node_fail_fast_enabled no
run_script_timeout no_timeout
halt_script_timeout no_timeout
successor_halt_timeout no_timeout
script_log_file /usr/local/cmcluster/run/log/gvt2062.log
operation_sequence $SGCONF/scripts/sg/external_pre.sh
operation_sequence $SGCONF/scripts/sg/pr_cntl.sh
operation_sequence $SGCONF/scripts/sg/volume_group.sh
operation_sequence $SGCONF/scripts/sg/filesystem.sh
operation_sequence $SGCONF/scripts/sg/package_ip.sh
operation_sequence $SGCONF/scripts/tkit/nfs/tkit_module.sh
operation_sequence $SGCONF/scripts/sg/external.sh
operation_sequence $SGCONF/scripts/sg/service.sh
failover_policy configured_node
failback_policy manual
priority no_priority
ip_subnet 16.216.168.0
ip_subnet_node g9t4711c
ip_subnet_node g9t4712c
ip_address 16.216.170.52
service_name gvt2062_nfs_monitor
service_cmd "$SGCONF/scripts/tkit/nfs/tkit_module.sh nfs_monitor"
service_restart none
service_fail_fast_enabled no
service_halt_timeout 300
vgchange_cmd "vgchange -a y"
vg vg01
concurrent_fsck_operations 1
concurrent_mount_and_umount_operations 1
fs_mount_retry_count 0
fs_umount_retry_count 1
fs_name /dev/vg01/lvol01
fs_server ""
fs_directory /opt/nfs/upp100
fs_type "ext4"
fs_mount_opt ""
fs_umount_opt ""
fs_fsck_opt ""
fs_name /dev/vg01/lvol02
fs_server ""
fs_directory /opt/nfs/upp200
fs_type "ext4"
fs_mount_opt ""
fs_umount_opt ""
fs_fsck_opt ""
fs_name /dev/vg01/lvol03
fs_server ""
fs_directory /opt/nfs/upp300
fs_type "ext4"
fs_mount_opt ""
fs_umount_opt ""
fs_fsck_opt ""
tkit/nfs/nfs/TKIT_DIR /usr/local/cmcluster/conf/modules/tkit/nfs
tkit/nfs/nfs/XFS "-o rw @upp:/opt/nfs/upp100"
tkit/nfs/nfs/XFS "-o rw @upp:/opt/nfs/upp200"
tkit/nfs/nfs/XFS "-o rw @upp:/opt/nfs/upp300"
tkit/nfs/nfs/QUOTA_MON yes
tkit/nfs/nfs/LOCK_MIGRATION no
tkit/nfs/nfs/MAINTENANCE_FLAG yes
tkit/nfs/nfs/MONITOR_INTERVAL 30
tkit/nfs/nfs/RETRY_INTERVAL 2
tkit/nfs/nfs/RETRY_TIMES 0
|
Check and apply configuration
# cmcheckconf -v -P gvt2062.ascii.7.2.nfs
# cmapplyconf -v -P gvt2062.ascii.7.2.nfs
|
After applying, the file hanfs.conf will be created in /usr/local/cmcluster/conf/modules/tkit/nfs/
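Once the package is running, the exports can be sanity-checked from a client (a sketch; the package IP 16.216.170.52 is taken from the example file above):
showmount -e 16.216.170.52                         # should list /opt/nfs/upp100, upp200 and upp300
mount -t nfs 16.216.170.52:/opt/nfs/upp100 /mnt    # test mount
|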
Example : Samba in SG environment (not using the Samba toolkit)
Samba SG package configuration is:
# cmgetconf -p mapbase_itg | grep -v ^# | grep -v ^$
package_name mapbase_itg
package_description "Serviceguard Package"
module_name sg/basic
module_version 1
module_name sg/priority
module_version 1
module_name sg/pr_cntl
module_version 2
module_name sg/all
module_version 2
module_name sg/failover
module_version 1
module_name sg/dependency
module_version 1
module_name sg/weight
module_version 1
module_name sg/monitor_subnet
module_version 1
module_name sg/package_ip
module_version 1
module_name sg/service
module_version 1
module_name sg/generic_resource
module_version 1
module_name sg/volume_group
module_version 1
module_name sg/filesystem
module_version 1
module_name sg/pev
module_version 1
module_name sg/external_pre
module_version 1
module_name sg/external
module_version 1
module_name sg/acp
module_version 1
package_type failover
node_name *
auto_run yes
node_fail_fast_enabled no
run_script_timeout no_timeout
halt_script_timeout no_timeout
successor_halt_timeout no_timeout
script_log_file /usr/local/cmcluster/run/log/mapbase_itg.log
operation_sequence $SGCONF/scripts/sg/external_pre.sh
operation_sequence $SGCONF/scripts/sg/pr_cntl.sh
operation_sequence $SGCONF/scripts/sg/volume_group.sh
operation_sequence $SGCONF/scripts/sg/filesystem.sh
operation_sequence $SGCONF/scripts/sg/package_ip.sh
operation_sequence $SGCONF/scripts/sg/external.sh
operation_sequence $SGCONF/scripts/sg/service.sh
log_level 3
priority no_priority
failover_policy configured_node
failback_policy manual
ip_subnet 16.44.128.0
ip_subnet_node s48t0044c
ip_subnet_node s48t0045c
ip_address 16.44.128.169
service_name mapbase_itg_samba_monitor
service_cmd /usr/local/cmcluster/conf/mapbase/mapbase-mon.sh
service_restart none
service_fail_fast_enabled no
service_halt_timeout 300
vgchange_cmd "vgchange -a y"
vg vg03
concurrent_fsck_operations 1
concurrent_mount_and_umount_operations 1
fs_mount_retry_count 0
fs_umount_retry_count 1
fs_name /dev/vg03/lvol01
fs_server ""
fs_directory /opt/mapbase
fs_type "ext4"
fs_mount_opt ""
fs_umount_opt ""
fs_fsck_opt ""
external_script /usr/local/cmcluster/conf/mapbase/mapbase-samba.sh
|
The monitoring script /usr/local/cmcluster/conf/mapbase/mapbase-mon.sh is:
#!/bin/bash

# Monitor the mapbase smbd/nmbd daemons; exit 1 when either dies,
# so Serviceguard fails the service and the package can fail over.
function monitor_mapbase_command
{
    #sg_log 5 "monitoring_command"
    sleep 5
    smbd_pid=`cat /var/run/samba/mapbase/smbd-smb.conf.mapbase.pid`
    nmbd_pid=`cat /var/run/samba/mapbase/nmbd-smb.conf.mapbase.pid`
    while :; do
        sleep 300
        # both daemons still alive? keep looping; otherwise fall out
        if [ -f /proc/$smbd_pid/stat ] && [ -f /proc/$nmbd_pid/stat ]
        then
            continue
        else
            break
        fi
    done
    exit 1
}
monitor_mapbase_command
|
The external script (used by SGLX to start/stop samba) is:
# cat /usr/local/cmcluster/conf/mapbase/mapbase-samba.sh | grep -v ^#
if [[ -z $SG_UTILS ]]
then
    . /etc/cmcluster.conf
    SG_UTILS=$SGCONF/scripts/mscripts/utils.sh
fi

if [[ -f ${SG_UTILS} ]]; then
    . ${SG_UTILS}
    if (( $? != 0 ))
    then
        echo "ERROR: Unable to source package utility functions file: ${SG_UTILS}"
        exit 1
    fi
else
    echo "ERROR: Unable to find package utility functions file: ${SG_UTILS}"
    exit 1
fi

sg_source_pkg_env $*

function validate_command
{
    # Output messages will only be displayed in STDOUT, if there is
    # any error condition while executing the master control script
    # with a "validate" parameter from cmcheckconf/cmapplyconf command
    sg_log 5 "validate_command"
    # ADD your package validation steps here
    return 0
}

function start_command
{
    sg_log 5 "start_command"
    /usr/sbin/smbd -D -s /etc/samba/smb.conf.mapbase
    /usr/sbin/nmbd -D -s /etc/samba/smb.conf.mapbase
    # ADD your package start steps here
    return 0
}

function stop_command
{
    sg_log 5 "stop_command"
    /bin/kill `cat /var/run/samba/mapbase/smbd-smb.conf.mapbase.pid`
    /bin/kill `cat /var/run/samba/mapbase/nmbd-smb.conf.mapbase.pid`
    # ADD your package halt steps here
    return 0
}

sg_log 5 "customer defined script"

typeset -i exit_val=0
case ${1} in
    start)
        start_command $*
        exit_val=$?
        ;;
    stop)
        stop_command $*
        exit_val=$?
        ;;
    validate)
        validate_command $*
        exit_val=$?
        ;;
    *)
        sg_log 0 "INFO: Unknown operation: $1"
        ;;
esac
exit $exit_val
|
See running processes:
# ps -ef | grep smb
root 8953 1 0 02:09 ? 00:00:00 /usr/sbin/smbd -D -s /etc/samba/smb.conf.mapbase
root 8955 1 0 02:09 ? 00:00:00 /usr/sbin/nmbd -D -s /etc/samba/smb.conf.mapbase
root 8963 8953 0 02:10 ? 00:00:00 /usr/sbin/smbd -D -s /etc/samba/smb.conf.mapbase
root 10127 17672 0 02:10 pts/1 00:00:00 grep smb
|
See samba configuration file:
# cat /etc/samba/smb.conf.mapbase
[global]
workgroup = AMERICAS
security = ADS
realm = AMERICAS.DOMAIN.COM
# use short name
netbios name = mapbaseitg
interfaces = 16.44.128.169/22 16.44.128.169/255.255.252.0 127.0.0.0/8
#interfaces = 16.44.128.169
bind interfaces only = yes
encrypt passwords = yes
domain master = no
os level = 10
load printers = no
guest ok = Yes
create mask = 0644
kernel oplocks = yes
unix extensions = no
client lanman auth = no
client ntlmv2 auth = yes
client plaintext auth = no
client schannel = auto
client signing = auto
client use spnego = yes
log file = /var/log/samba/mapbase/logs/log.%m
pid directory = /var/run/samba/mapbase
lock directory = /var/run/samba/mapbase/locks
private dir = /var/log/samba/mapbase/private
[testers]
comment = mapbase share
path = /opt/mapbase
browseable = yes
valid users = EMEA\name1 AMERICAS\name2
read only = no
create mode = 0666
|
Make sure the private directory is on a shared LUN, so that after failover the Samba instance behind the virtual IP doesn't need to re-join the domain.
Samba uses a lightweight database called Trivial Database (tdb) to store persistent and transient data (in the private directory).
Some tdb files are removed before restarting Samba, but others store information that is vital to Samba's behavior.
# ls -la /var/log/samba/mapbase/private
lrwxrwxrwx 1 root sys 27 Jul 11 02:06 /var/log/samba/mapbase/private -> /opt/mapbase/.samba/private
# ls -la /opt/mapbase/.samba/private
total 88
drwxr-xr-x 2 root sys 4096 Jul 10 22:51 .
drwxr-xr-x 3 root sys 4096 Jul 10 22:50 ..
-rw------- 1 root root 36864 Jul 10 22:26 passdb.tdb
-rw------- 1 root root 45056 Jul 10 22:26 secrets.tdb
|
See: https://www.samba.org/samba/docs/man/Samba-HOWTO-Collection/tdb.html
passdb.tdb - This stores the Samba SAM account information when using a tdbsam password backend.
secrets.tdb - This tdb file stores the machine and domain SIDs, secret passwords used with LDAP,
the machine secret token, etc. It is an essential file and is kept in a secure area.
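The tdb files can be inspected read-only with tdbdump from the tdb-tools package (sketch):
tdbdump /opt/mapbase/.samba/private/secrets.tdb | head
|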
To check the config file:
# testparm /etc/samba/smb.conf.mapbase
Load smb config files from /etc/samba/smb.conf.mapbase
rlimit_max: increasing rlimit_max (1024) to minimum Windows limit (16384)
Processing section "[testers]"
Loaded services file OK.
Server role: ROLE_DOMAIN_MEMBER
Press enter to see a dump of your service definitions
[global]
workgroup = AMERICAS
realm = AMERICAS.DOMAIN.COM
netbios name = MAPBASE_ITG.IRL.DOMAIN.COM
interfaces = 16.44.128.169
bind interfaces only = Yes
security = ADS
private dir = /var/log/samba/mapbase/private
client NTLMv2 auth = Yes
log file = /var/log/samba/mapbase/logs/log.%m
os level = 10
domain master = No
lock directory = /var/run/samba/mapbase/locks
pid directory = /var/run/samba/mapbase
[testers]
comment = mapbase share
path = /opt/mapbase
valid users = EMEA\name1, AMERICAS\name2
read only = No
create mask = 0666
|
Joining the domain
- In a cluster, the Samba package must be running when you join, since the private folder is on the shared LUN
- Run : net ads join -W "IT Servers\NGDC CIFS" -U account -s /etc/samba/smb.conf.gradebook
Commands to test
- net ads info -s /etc/samba/smb.conf.gradebook
- net ads status -s /etc/samba/smb.conf.gradebook
- net ads lookup -s /etc/samba/smb.conf.gradebook