================================================================================
http://ftp.support.veritas.com/pub/support/products/ClusterServer_UNIX/283995.pdf
http://seer.entsupport.symantec.com/docs #docs
================================================================================
Install logs: /opt/VRTS/install/logs
================================================================================
GCO port - 14155, open between any/all systems that have GCO SG
ClusterService must run (and must have this name - keyword, don't change its SG name) for GCO to work.
================================================================================
hagrp -switch globalgroup-sg -any -clus remote-clus-name
================================================================================
import shared diskgroups:
vxdg -s import DG
to make them NON shared
vxdg -Cf import DG
================================================================================
Samples:
FILE=/opt/VRTSvcs/bin/sample_triggers/[postonline,preonline,preoffline,postoffline]
cp /opt/VRTSvcs/bin/sample_triggers/FILE /opt/VRTSvcs/bin/triggers
chmod 755 /opt/VRTSvcs/bin/triggers/FILE
# run the trigger to test
hatrigger -postonline 0 slkprdm21 dbms-dm-g-sg
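A trigger copied from sample_triggers can be cut down to something very small while testing. Below is a minimal sketch of a postonline body. It assumes the script receives the system name and then the group name as its first two arguments (the same order as in the hatrigger test call above). The function name and the log path are made up for illustration and are not part of VCS.

```shell
#!/bin/sh
# Hypothetical postonline trigger body: append one line per event to a log.
# Assumption: $1 = system name, $2 = service group name.
log_postonline() {
    system=$1
    group=$2
    # Illustrative log location; pass a third argument to override it.
    logfile=${3:-/var/VRTSvcs/log/postonline_custom.log}
    printf '%s postonline: %s is online on %s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$group" "$system" >> "$logfile"
}
```

Inside the trigger script itself the call would be log_postonline "$1" "$2".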
================================================================================
/opt/VRTSvcs/wizards/bin/nfs
/opt/VRTSvcs/wizards/bin/applications
================================================================================
#
#print out what is online, versus parsing the hastatus -sum
#
hagrp -display | grep ONLINE | sed -e 's/|//g' | awk '{print $1":"$3":"$4}'
#
# Get a group list from the main.cf
#
sed -n '/^group aqrydb01/,/^group/p' /etc/*vcs/conf/config/main.cf \
| grep -v '//' | sed '$d'
================================================================================
How to Configure the VERITAS Cluster Server GUI to work through a Web
Details:
The VERITAS Cluster Server (VCS) Web-based graphical user interface enables
you to monitor the cluster and perform basic cluster administration,
including many of the same operations as the command-line interface and VCS
Java Console. Below is a sample configuration:
Sample Configuration
The following sample illustrates a typical configuration of the
ClusterService service group.
ClusterService (
SystemList = { vcssun5, vcssun6 }
AutoStartList = { vcssun5, vcssun6 }
OnlineRetryLimit = 3
)
IP webip (
Address = "162.39.9.85"
NetMask = "255.255.255.0"
Device = "hme0"
)
Process VRTSweb (
PathName = "/opt/VRTSvcs/bin/haweb"
Arguments = "172.29.9.108 8181"
)
VRTSweb requires webip
To log into this cluster using a Web browser, point the browser at port 8181
on the cluster's virtual IP address (IP_address:8181).
Connecting to Cluster Manager (Web Console)
This graphical user interface works from a Web server on a single system in
the cluster. This ensures high availability for Cluster Manager (Web
Console). When a system is taken offline, the Web server fails over to
another system, ensuring continuous access. Cluster Manager (Web Console) can
be accessed with the URL http://IP_alias:8181/vcs The variable IP_alias is
the virtual IP address configured in the ClusterService service group and
8181 is the default VERITAS Web port. You must use a valid VCS user name and
password to log on to the cluster.
Refer to the VCS 2.0 user guide for more information.
================================================================================
cd /opt/VRTSvcs/install; ./installvcs sys1 sys2 -usessh -configure
installvcs -configfile #unattended install
installvcs -installonly #do not config
installvcs -configonly #do not install (previously installed)
================================================================================
LLT links can be added or removed while clients are connected. Shutting down
GAB or the high-availability daemon, HAD, is not required.
To add a link
lltconfig -d device -t tag
To remove a link
lltconfig -u tag
Changes take effect immediately and are lost on the next reboot. For changes
to span reboots you must also update /etc/llttab.
Note: LLT clients do not recognize the difference unless only one link is
available and GAB declares jeopardy.
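Because the runtime change is lost on reboot, it helps to generate the matching /etc/llttab line at the same time. Below is a minimal sketch, assuming an Ethernet link that uses the default node range, SAP and MTU (the "- ether - -" form used in the sample llttab in these notes). The function name llt_link_line is made up for illustration.

```shell
#!/bin/sh
# llt_link_line TAG DEVICE
# Print the llttab "link" directive that corresponds to
#   lltconfig -t TAG -d DEVICE
# Assumes an Ethernet link with default node range, SAP and MTU.
llt_link_line() {
    tag=$1
    device=$2
    if [ -z "$tag" ] || [ -z "$device" ]; then
        echo "usage: llt_link_line TAG DEVICE" >&2
        return 1
    fi
    printf 'link %s %s - ether - -\n' "$tag" "$device"
}
```

For example, llt_link_line qfe1 /dev/qfe:1 prints a line that can be placed with the other link directives in /etc/llttab.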
================================================================================
After a system or private network card is replaced in a VERITAS Cluster
Server (VCS) cluster, GAB may refuse to start. One of the following messages
may appear on the console or in /var/adm/messages:
GAB: Port a registration waiting for seed port membership
or
GAB: Port a closed
To avoid this problem, enter the following command on all remaining systems
in the cluster, preferably after the old system or network card has been
removed, and before the new system or network card is introduced:
# /sbin/lltconfig -a flush
If the problem has already occurred, shut down VCS by entering the following
commands on any system in the cluster:
# haconf -dump -makero
# /opt/VRTSvcs/bin/hastop -all -force
Next, shut down GAB by entering the following command on each system in the
cluster:
# /sbin/gabconfig -u
Run the following commands on each system:
# lltconfig -a flush
# sh -x /etc/rc2.d/S92gab start
# hastart -force
The lltconfig -a flush command flushes the ARP-like cache of private network
MAC addresses contained within LLT. LLT can now learn the MAC address of the
new system or network card. See lltconfig(1M) for more information regarding
other -a options.
Note: You must use the -force option with the command hastart when restarting
VCS.
================================================================================
VCS on windows
net stop vcscomm
hastop
net stop gab
net stop llt
net start llt
net start gab
hastart
================================================================================
#Start VCS with only one node
on singlenode/onenode: gabconfig -c -x
Start as a single node cluster:
hastart -onenode
Permanent - in startup script (note: AIX has an automatic check for this and will start on a single node if the main.cf has only one system listed)
HASTART="/opt/VRTSvcs/bin/hastart -onenode"
================================================================================
VXFS: mincache=dsync,convosync=dsync,largefiles
================================================================================
hacf -verify /etc/VRTSvcs/conf/config
================================================================================
NEW: http://WEB-VIRTUAL:8181/cmc/
NEW: https://WEB-VIRTUAL:8443/cmc/
OLD: http://WEB-VIRTUAL:8181/vcs/index
OLD: https://WEB-VIRTUAL:8443/vcs/index
================================================================================
Combine many little clusters into one large cluster.
Combine HB's into similar VLANs (HB1 and HB2). Do so without incurring cluster outages: either shut down the clusters OR move a single HB network at a time and wait until things get back into sync before doing the next.
ssh keys or rsh enabled from all nodes to all nodes
main.cf - combine and copy to all nodes (make sure to verify it). all objects must have unique names
select one ClusterService IP for the entire cluster
select a cluster name that encompasses all sub-clusters
llttab - same cluster ID in all files
llthosts - all hosts listed w/ consistent index numbers
have DNS entries for the other ClusterService IPs deleted, have the "living" one renamed to ${clustername}-adm (or something generic)
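The llthosts requirement above can be checked before the merged file is pushed to the nodes. Below is a minimal sketch, assuming the usual "ID NAME" layout with one node per line. The function name check_llthosts is made up for illustration.

```shell
#!/bin/sh
# check_llthosts FILE
# Report duplicate node IDs or duplicate node names in an llthosts-style
# file ("ID NAME" per line). Returns non-zero if any duplicate is found.
check_llthosts() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }    # skip blank and commented lines
        {
            if ($1 in ids) {
                printf "duplicate node ID %s (%s, %s)\n", $1, ids[$1], $2
                bad = 1
            }
            if ($2 in names) {
                printf "duplicate node name %s (IDs %s, %s)\n", $2, names[$2], $1
                bad = 1
            }
            ids[$1] = $2
            names[$2] = $1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Run it against the merged file first, then compare the copies on every node (for example with cksum) to confirm they are identical.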
================================================================================
if auto-disable
on host where it is auto-disabled (or what last had it, or where you want it to start (as long as it is not seen on another node - avoid split brain))
hagrp -autoenable GROUP -sys HOST
================================================================================
LOGS: /var/VRTSvcs/log/
engine_[A,B].log
hashadow_[A,B].log
AGENTNAME_[A,B].log
/var/adm/streams (GAB, LLT, HB messages)
================================================================================
Agents: /etc/VRTSvcs/conf/AGENTNAME/[online,offline,monitor]
haagent -stop $AGENT -sys `hostname`
haagent -start $AGENT -sys `hostname`
haagent -stop SRDF -sys `hostname`
ps -ef | grep SRDF
haagent -start SRDF -sys `hostname`
(The "open" script runs in /opt/*vcs/bin/SRDF and creates a lock file for
each SRDF disk group that "learns" the type of SRDF it is dealing with.)
ps -ef | grep SRDF
see agent AND perl script (until perl finishes - runs the open)
root 524330 483830 0 18:14:17 pts/14 0:00 grep SRDF
root 536808 1 0 18:12:42 - 0:00
/opt/VRTSvcs/bin/SRDF/SRDFAgent -type SRDF
root 909540 536808 0 18:13:11 - 0:00 /opt/VRTSperl/bin/perl -I
/opt/VRTSvcs/bin/SRDF/../../lib -S /opt/VRTSvcs/bin/SRDF/online dbms-dm-srdf
/usr/symcli slkprdm_db 2 0 0
================================================================================
perl debug of an Agent
ls -la /opt/VRTSvcs/bin/SRDF
#look for .*lock* files
hares -display dbms-tn-srdf
#
# the arguments are from the hares display above
#
perl -d -I /opt/VRTSvcs/lib ./open dbms-tn-srdf /usr/symcli slkprtn_db 2 0 0 ""
>n
>l
>c LINENUM
>p $var
>p @array
>q
/usr/symcli/bin/symdev -dynamic list
================================================================================
symdg list
symrdf -g $RDFGRP [query,split,failover,failback]
VCS lock file for SRDF - /opt/VRTSvcs/bin/SRDF/.VCS_SRDF_$resource
read on pages: 11,34,35 of SRDF agent
R2->R1 update back:
hares -action srdf_res_name update -sys dr-sys-that-owns-disks-now
R1s become UPDATED when done
================================================================================
hacf -cftocmd /etc/*vcs/conf/config -dest /tmp
more /tmp/main.cmd
================================================================================
hagrp -clearadminwait -fault GROUP -sys SYSTEM
================================================================================
#Do ping on LLT
#setup PING server
system1# /opt/VRTSllt/getmac /dev/hme:0
/dev/hme:0 00.AA.00.A2.63.B1
system1# /opt/VRTSllt/dlpiping -vs /dev/hme:0 &
#setup PING Client
system2# /opt/VRTSllt/dlpiping -vc /dev/hme:0 00.AA.00.A2.63.B1
00.AA.00.A2.63.B1 is alive
/opt/VRTSllt/lltdump -f /dev/qfe:0 -V -A -R
/opt/VRTSllt/lltshow -n 0 | more
system2# /opt/VRTSgab/getcomms #saves config stuff in /tmp
================================================================================
Things to do to make a node a VCS cluster member:
Do NOT allow "Stop-A" or "breaks" and then a "boot" on any VCS nodes
#update the password
haconf -makerw
hauser -update admin
haconf -dump -makero
- get a service name in DNS
- install 2nd & 3rd NIC for heartbeat and connect to HB VLANs
- eeprom "local-mac-address?=true"
- umount any local mounts to go under VCS, then remove their entries from /etc/vfstab
- stop any volumes to go under VCS
- deport any diskgroups to go under VCS
- get a veritas VCS license and any agent licenses
- create any accounts and propagate if they are not in NIS
- make sure LUN and zones are setup for shared disks
- VxFS 3.4 needs patch02
- VxFS needs an fsck before you mount it under VCS
more /etc/llthosts
0 munich
1 bern
2 rome
more /etc/llttab
#
# Sample /etc/llttab
#
# -- comment lines begin with pound, NO comments at end of line
# -- order of items is important, some must precede others
#
#
# the minimum required:
# -- set the node ID number
# -- link a network interface
# for verbose messages from lltconfig, add this line first in llttab
set-verbose 1
# set the node ID for this machine. may be a number, name, or filename.
# -- number is taken literally as a node ID
# -- a name is translated to a node ID via /etc/llthosts
# -- a filename will take the first word in the file and translate it
# via /etc/llthosts to a node ID
#
# the node ID must be in the range 0..(max-1), where max is defined in
# /kernel/drv/llt.conf as "nodes=max"
#
#set-node 1
# or
#set-node system1
# or
set-node /etc/nodename
# if multiple distinct and separate clusters are for some
# reason sharing to the same network, then each must have
# a unique cluster ID or the node IDs will conflict.
# the default cluster ID is 0
# (alternatively each cluster could use a unique SAP)
#
#set-cluster 44
set-cluster 140
# link a network interface below LLT
# format: link tag-name device-name:device-unit node-range link-type SAP MTU
#         [IP-address broadcast-address]
#
# -- tag-name
# a symbolic name used to reference this link in set-addr
# commands and lltstat output
# -- device-name:device-unit
# the DLPI STREAMS device for the LAN interface, and the unit number
# on that device, -or- the device /dev/udp for UDP/IP links
# -- node-range
# the range of nodes that should process this command. a dash '-'
# is the default for "all nodes". this is useful to use the same file
# on multiple nodes which have differing hardware.
# -- link-type
# the type of network. currently supported values: ether udp
# -- SAP (port)
# the SAP used to bind to the network link, or the UDP port if the
# link-type is udp. a dash '-' is the default. NOTE: each UDP link
# is required to use a unique port.
# -- MTU
# the maximum transmission size for packets on the network link.
# a dash '-' is the default.
# -- IP-address
# for links of link-type udp only. The IP address of this node on
# this link. Must be specified here since all UDP is funnelled through
# /dev/udp and there is no way to individually query like for DLPI
# links. Format is either a.b.c.d dot-notation, a hostname to lookup
# with gethostbyname(), or a filename from which the first word will
# be read and looked up with gethostbyname().
# -- broadcast-address
# for links of link-type udp only. The broadcast address of this
# node on this link. Format is a.b.c.d dot-notation.
#
#
# Solaris example
#link qfe0 /dev/qfe:0 - ether - -
link hme0 /dev/hme:0 - ether - -
link-lowpri ge0 /dev/ge:0 - ether - -
#
#
# UDP/IP example
#link lan1 /dev/udp - udp 0x2345 - 192.1.2.3 192.1.2.255
# link a network interface below LLT and send heartbeats over it,
# and send data over it as a last resort
#
#link-lowpri le0 /dev/le:0 - ether - -
# option to set the range of valid node IDs in the cluster
# the default is all nodes included (0-31)
# (see /kernel/drv/llt.conf "nodes=nnn" for the maximum)
#
#include 0-31
#
# the following will cause only nodes 0-7 to
# be valid for cluster participation
#
#exclude 8-31
# MAC addresses are set here manually for networks
# that do not support broadcasting
#
# format: set-addr node-id tag-name address
#
# set address for node 2 on link le0
#set-addr 2 le0 01:02:03:01:02:03
#
# set address for node 2 on link lan0
#set-addr 2 lan0 01:02:03:01:02:03
#
# set address for node 2 on link lan1 (link-type udp)
#set-addr 2 lan1 192.1.2.4
# set the warning level for console LLT kernel warnings
# level 0 is no warnings. the default level is 20.
#set-warn 0
# uncomment these two commands for networks
# that do not support broadcasting
#
# this one disables broadcasts that dynamically learn MAC addresses
#set-arp 0
#
# this one disables using broadcasts for heartbeats
#set-bcasthb 0
#
# options to modify the LLT timer intervals (in 1/100 second)
#
# send a heartbeat 2 times per second
#
#set-timer heartbeat:50
#
# send a heartbeat 1 time per second (for link-lowpri links)
#
#set-timer heartbeatlo:100
#
# mark a link to a peer down after 16 sec of missed heartbeats
# (peerinact must be larger than either heartbeat timer)
#set-timer peerinact:1600
#
# other timers
#
#set-timer oos:10
#set-timer retrans:10
#set-timer service:100
#set-timer arp:30000
# options to set the LLT flow control thresholds (in packets)
#
#set-flow lowater:40
#set-flow hiwater:80
#set-flow window:60
#
# NOTE: the "start" command is obsolete and ignored
# its functionality is now implicit at the end-of-file
#
# required to start the LLT protocol
#
start
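With this many comment lines it is easy to lose track of which directives are actually active. Below is a minimal sketch that prints only the live set-node, set-cluster and link entries from an llttab-style file. The function name llttab_summary is made up for illustration.

```shell
#!/bin/sh
# llttab_summary FILE
# Print the active (uncommented) node, cluster and link settings.
llttab_summary() {
    awk '
        $1 ~ /^#/ { next }                       # ignore comment lines
        $1 == "set-node"    { print "node " $2 }
        $1 == "set-cluster" { print "cluster " $2 }
        $1 == "link" || $1 == "link-lowpri" { print $1 " " $2 " " $3 }
    ' "$1"
}
```

Running it on each node and comparing the "cluster" line is a quick way to confirm that every llttab carries the same cluster ID.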
================================================================================
So you don't have to copy the main.cf around: (doesn't work in 5.X and newer)
source node:
hastart
all other nodes
hastart -stale
================================================================================
more /etc/gabtab
/sbin/gabconfig -c -n 3
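The number after -n is the count of systems that must be up before GAB seeds the cluster. Below is a minimal sketch that derives the gabtab line from an llthosts-style file, assuming every listed node should be present before seeding. The function name gabtab_line is made up for illustration.

```shell
#!/bin/sh
# gabtab_line LLTHOSTS
# Print a gabtab entry whose -n value equals the number of nodes listed
# in the given llthosts-style file ("ID NAME" per line).
gabtab_line() {
    count=$(awk 'NF >= 2 && $1 !~ /^#/ { n++ } END { print n + 0 }' "$1")
    if [ "$count" -lt 1 ]; then
        echo "no nodes found in $1" >&2
        return 1
    fi
    printf '/sbin/gabconfig -c -n %s\n' "$count"
}
```

With the three-node llthosts shown earlier in these notes (munich, bern, rome) this produces the same line as the gabtab above.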
================================================================================
vcsconfig
cfscluster config
cfscluster status
vcsmmdebug
vcsmm.conf
================================================================================
on unix host
hagui
package: load VRTSjre => VRTScscm
================================================================================
haconf -makerw
hauser -add guest
Enter New Password:
Enter Again:
User added with guest privileges.
To assign Administrators/Operators privileges at cluster or group level
use the following commands:
hauser -add dba -priv Operators
OR
hauser -add dba -priv Administrators
hauser -update Administrators|Operators -add
hagrp -modify Administrators/Operators -add
haconf -dump -makero
vxdctl -c mode #see who is master of the cfs
================================================================================
OS Setup
Assume an OS w/ basic environment. Have licenses from Veritas for correct version and product.
Load SFORARAC package (rsh enabled before) on all nodes in cluster (run from one, pushes to all)
Load maintenance pack MPX (rsh enabled before) on all nodes (run from one, pushes to all)
Configure CVM and CFS manually on just one node:
#/opt/VRTSvxfs/cfs/bin/cfscluster config
accept default timeout of 200
*** answer N to starting CVM ***
(IF ORACLE RAC) Update the /kernel/drv/vcsmm.conf file on all nodes: add the following line (as one continuous line, not wrapped).
#vi /kernel/drv/vcsmm.conf
name="vcsmm" parent="pseudo" slave_members=8192 instance=0;
Setup Coordinator Disks - fencing disks
Must be at least 3 drives, odd-number of them (3,5,7,...)
Standard disks
Make them the smallest disk size allowed (save disk space)
Mirror these disks
Do NOT write data to these disks
Must be in a separate diskgroup
Must have SCSI-3 persistent reservation enabled on all
All must be seen by all cluster nodes
Verify that ALL disks have passed SCSI-3 PR testing.
#vxfenadm -G all -f /etc/vxfentab
If they fail testing, then
#/opt/VRTSvcs/vxfen/bin/vxfentsthdw
From the primary cluster node, create a disk group called vxcoorddg. This group must contain an odd number of coordinator LUNs, with a minimum of 3 disks/LUNs.
Deport the disk group
# vxdg deport vxcoorddg
Import the disk group with the -t option so that it is not automatically imported when the systems are rebooted.
# vxdg -t import vxcoorddg
# vxdg -o groupreserve -o clearreserve -t import vxcoorddg
Deport the disk group. Deporting the disk group prevents the coordinator disks from being used for other purposes.
# vxdg deport vxcoorddg
Enter the following commands on each cluster node.
# echo vxcoorddg > /etc/vxfendg
# cat /etc/vxfendg
Edit /etc/VRTSvcs/conf/config/main.cf
add:
UseFence=SCSI3
default:
UseFence=none
haclus -value UseFence #list value in cluster
Start the LLT, GAB, VXFEN, VCSMM, LMX, and ODM drivers.
Start LLT
# cd /etc
# cd init.d
#/etc/init.d/llt.rc start
Starting LLT ...
Starting LLT done.
Start GAB
# /etc/init.d/gab start
Starting GAB ... Starting GAB done.
Start I/O fencing
# /etc/init.d/vxfen start
Starting vxfen..
Starting vxfen..
Done VCS FEN vxfenconfig
# NOTICE Driver will use SCSI-3 compliant disks.
(IF ORACLE RAC) Start the VCSMM driver
# ./vcsmm start
(IF ORACLE RAC) Start the LMX driver
# ./lmx start
STARTING LMX
(IF ORACLE) Start ODM
# umount /dev/odm
# ./odm start
Use gabconfig -a to check the status of the drivers. Ports a, b, d, and o should show membership on node 0
# gabconfig -a
# GAB Port Memberships
==========================================
Port a gen 4a1c0001 membership 01
Port b gen g8ty0002 membership 01
Port d gen 40100001 membership 01
Port f gen f1990002 membership 01
Port h gen d8850002 membership 01
Port o gen f1100002 membership 01
Port q gen 28d10002 membership 01
Port v gen 1fc60002 membership 01
Port w gen 15ba0002 membership 01
The output of the gabconfig -a command displays which cluster systems have membership with the modules that have been installed and configured thus far in the installation. The first line indicates that each system (0,1,n..) has membership with the GAB utility, which uses Port a. The ports listed, including port a, are configured for the following functions:
==========================================
Port Function
a GAB
b I/O fencing
d ODM (Oracle Disk Manager)
f CFS (Cluster File System)
h VCS (high availability daemon)
o VCSMM driver
q QuickLog daemon
v CVM (Cluster Volume Manager)
w vxconfigd (module for cvm)
u CVM membership joining
==========================================
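The port letters are easier to read with their functions next to them. Below is a minimal sketch that reads gabconfig -a output on stdin and prints port, function and membership, using the port table above. The function name gab_ports is made up for illustration, and the parsing assumes the "Port X gen N membership M" line layout shown in the sample output.

```shell
#!/bin/sh
# gab_ports: annotate `gabconfig -a` output (read from stdin).
# Prints "port<TAB>function<TAB>membership" for each membership line.
gab_ports() {
    awk '
        BEGIN {
            f["a"] = "GAB"
            f["b"] = "I/O fencing"
            f["d"] = "ODM (Oracle Disk Manager)"
            f["f"] = "CFS (Cluster File System)"
            f["h"] = "VCS (high availability daemon)"
            f["o"] = "VCSMM driver"
            f["q"] = "QuickLog daemon"
            f["v"] = "CVM (Cluster Volume Manager)"
            f["w"] = "vxconfigd (module for cvm)"
            f["u"] = "CVM membership joining"
        }
        $1 == "Port" && $3 == "gen" && $5 == "membership" {
            name = ($2 in f) ? f[$2] : "unknown"
            printf "%s\t%s\t%s\n", $2, name, $6
        }
    '
}
```

Typical use on a cluster node would be gabconfig -a | gab_ports.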
================================================================================
All nodes fenced off (partition-in-time)
Node 1 fails first, Node 0 fails before Node 1 is online, Node 1 is repaired and boots while node 0 is down, Node 1 cannot access coordinator disk because node 0's keys are on disks.
Verify node 0 is down
node 1: vxfenadm -g all -f /etc/vxfentab
see keys
node 1: /opt/VRTSvcs/rac/bin/vxfenclearpre
repair the faulted system
reboot all nodes in cluster
================================================================================
Symptom:
What is AutoDisable and how did the service groups become that way?
Solution:
VERITAS has made extensive modification to the features and behaviors of the newest release of VERITAS Cluster Server (VCS) for Unix, version 1.3.0. Most notable is the change to the group state "auto-disable." This is an examination of what auto-disable is, when it will be used and how to deal with it.
Briefly, auto-disable is a flag set by the VCS engine on a service group that will prevent that group from becoming on-line. Auto-disable is used as a way to prevent data corruption from occurring by avoiding a situation called split-brain. Split-brain is where data is being updated by two or more hosts simultaneously. With normal means of accessing storage through filesystems or volume management which is not cluster aware, this can very easily cause data to become corrupt. This could be the Volume Manager diskgroup metadata or filesystem metadata or regular data files or even database data files. Here is a simple example of how easy this state is to produce: Stop VCS on all nodes with the force option; this will leave the services running. Next start VCS on one node and online a service group and the data is instantly accessible to both hosts and can become invalid!
In VCS 1.0 through 1.1.x, auto-disable would only occur if VCS had failed on a system while a group was running, or if a system running the group left the cluster with an unreliable communication channel. This was very easy to produce; simply rebooting the node running the group would cause auto-disable to happen. In VCS 1.3.0, auto-disable will occur in any of the following conditions: the VCS engine (had) is not running, all the resources in the service group are not probed or the only functioning link from a node is a disk heartbeat. This means that if "hastop -local" is executed on a system, service groups which have that system in their system list will be auto-disabled. This does not apply to systems that have been powered off under VCS 1.3.0 as it would in VCS 1.1.2, due to the additional changes to reboot behavior.
Auto-disabling when the resources are not probed happens because VCS would not be able to determine the initial state of the resource when VCS is first started. This generally indicates a wide-spread configuration or system problem. If the system has a disk-based heartbeat and all other heartbeat links fail (normal links and low-pri links), then VCS will mark the group as auto-disabled. In these cases, the service group does not need to be running on the node with the problem; if this type of problem occurs on any node in the group, the auto-disable flag will be set. This makes it critical that all nodes in the cluster are healthy and fully functional. It also means that administrators need to be aware of this behavior during an extended period of maintenance on one node because that will auto-disable the group on all systems.
Dealing with an auto-disable situation is simple and is the same for versions 1.0 through 1.3.0. The command "hagrp -autoenable GROUP -sys SYSTEM" clears the auto-disable flag for the listed system. Important note: The system name to use is that of the system which caused the auto-disable state. For instance, if there is a service group called "A" running on node 1 and it has nodes 2 and 3 in its system list, and node 3 is not running VCS because of an upgrade, the command would be "hagrp -autoenable A -sys 3" to clear the auto-disable flag. The other option is to remove the affected system from the service group's system list. To achieve this, use "hagrp -modify GROUP SystemList -delete SYSTEM". For the above example, to remove node 3 from the system list for group A, the command would be "hagrp -modify A SystemList -delete 3", where node 1 has a priority of 0 and node 2 has a priority of 1.
The changes made to the auto-disable behavior in VCS 1.3.0 were put in place to increase the protection of the data when VCS does not completely know the states of the resources. It can be argued that the changes are on the conservative side, but when hundreds of gigabytes of data are being controlled by VCS, erring on the side of caution is wise.
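When several groups are flagged at once, the hagrp -autoenable calls can be generated rather than typed. Below is a minimal sketch that reads attribute output on stdin and prints one command per auto-disabled group/system pair without running anything. It assumes the input has the four-column "Group Attribute System Value" layout, as produced by something like hagrp -display -attribute AutoDisabled; that layout is an assumption to verify on your version. The function name autoenable_cmds is made up for illustration.

```shell
#!/bin/sh
# autoenable_cmds: read "Group Attribute System Value" rows on stdin and
# print a hagrp -autoenable command for every row with AutoDisabled = 1.
# The commands are only printed, never executed.
autoenable_cmds() {
    awk '
        $1 ~ /^#/ { next }                     # skip the header line
        $2 == "AutoDisabled" && $4 == 1 {
            printf "hagrp -autoenable %s -sys %s\n", $1, $3
        }
    '
}
```

Review the printed commands and confirm the group is not online on any other node before running them, since clearing the flag removes the split-brain protection described above.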
================================================================================
Check that local-mac-address? is set to true
Update the PATH and MANPATH Environment Variables for the root Unix account on
PATH=${PATH}:/usr/lib/vxvm/bin:/usr/bin/fs/vxfs:/opt/VRTSvxfs/sbin:/opt/VRTSvcs/bin:/opt/VRTSvcs/rac/bin:/opt/VRTSob/bin:/opt/VRTSvxfs/cfs/bin:/opt/VRTSllt
MANPATH=${MANPATH}:/opt/VRTS/man
export PATH MANPATH
Allow 'rsh' access for root between servers
name="vcsmm" parent="pseudo" slave_members=8192 instance=0;
================================================================================
m=merge
c=copy for original OS
n=new copy
m /kernel/drv/vcsmm.conf
m /etc/llttab
c /etc/llthosts
c /etc/gabtab
c /etc/vxfentab
c /etc/VRTSvcs/conf/config/main.cf
c /etc/vcsmmtab
c /kernel/drv/st.conf
c /kernel/drv/sd.conf
c /kernel/drv/lpfc.conf
c /etc/hostname.*
c /etc/hosts
c /etc/netmasks
m /etc/default/init
m /etc/default/login
c /etc/printers.conf
m /etc/system
c /.ssh/authorized_keys
c /.ssh/known_hosts
m /etc/auto_*
m /etc/vfstab
m /etc/dumpadm.conf
c /etc/ntp.conf
c /etc/nsswitch.conf
m /etc/inetd.conf dr-sun
m /etc/inet/ipsecinit.conf dr-sun
fence disk - import and setup keys
get new VX licenses
================================================================================
How to get a servicegroup to DEFAULTstart on one of the nodes:
#hagrp -modify grpnam AutoStartList nodename
****************************************************************
Process apache (
PathName = "/opt/apache/httpd"
Arguments = "-f /opt/apache/conf/httpd.conf"
)
Simple support for apache web-server (also in VCS WEB-edition)
***************************************************************
Verify your configuration:
# cd /etc/VRTSvcs/conf/config
# hacf -verify .
Start VCS services:
# /etc/rc2.d/S70llt start
# /etc/rc2.d/S92gab start
# /etc/rc3.d/S99vcs start
***************************************************************
To change the tcp port the VERITAS Cluster Server Cluster Monitor (hagui)
uses, a port number in the /etc/inet/services file on each system of the
cluster must be specified. That entry should look like:
vcs 25000/tcp vcs #
All the machines in the cluster will need to be restarted for the change to
take effect. When starting the hagui, right click on the cluster name in the
Cluster Monitor system window and set the port to reflect the changed port
number.
In this example 25000 was used, but any unused tcp port can be used with the
same results.
By default, the hagui uses port 14141.
****************************************************************
Ports needed. Flow is the direction in which the connection is opened (not data flow).
Port         Application                                                Flow     From                                       To
Required
25           SMTP notification emails                                   one way  all servers                                email server(s)
162          SNMP                                                       one way  all servers                                datacenter HP/OV server
8181         HTTP - web interface                                       one way  admin's servers and to operations servers  all servers in datacenters
8443         HTTPS - web interface                                      one way  admin's servers and to operations servers  all servers in datacenters
14141        Java admin application                                     bi       admin's servers                            all servers in datacenters
14155        WAC (wide area connector - for GCO - between datacenters)  bi       all servers at both sites                  all servers at both sites
14300        Administrative                                             bi       admin's servers                            all servers in datacenters
Optional
14153        Java admin application - simulator                         bi       admin's servers                            all servers in datacenters
15550-15558  simulator instances                                        bi       admin's servers                            all servers in datacenters
15560-15563  simulator WAC instances                                    bi       admin's servers                            all servers in datacenters
****************************************************************
How do I test a VERITAS Cluster Server (VCS) agent that doesn't appear to be working correctly?
Solution:
To activate agent debug messages:
# hatype -modify resource_type LogLevel info
To check the status of a resource:
# hares -display resource_name
To bring a resource online:
# hares -online resource_name -sys system_name
This causes the online entry point of the corresponding agent to be called.
To take a resource offline:
# hares -offline resource_name -sys system_name
This causes the offline entry point of the corresponding agent to be called.
To deactivate agent debug messages:
# hatype -modify resource_type LogLevel error
********************************************************************
Sybase agent times out when it is running the online script (on the takeover
host), fails to start Sybase and does not put anything useful in the VERITAS
Cluster Server (VCS) logs.
Solution:
The cause can be an incorrect interface setup in Sybase's
$SYBASE_HOME/interfaces file. By default, the Sybase installation uses the
interface available on the host, which may not be the interface that VCS is
failing over. Since the unavailable interface causes a Sybase timeout instead
of an error, nothing is logged in the VCS engine log.
From the VCS Enterprise Agent for Sybase 1.0.2:
"Transparent TCP/IP Failover
For transparent failover to Sybase clients, create an IP address as part of
the Sybase service group. This IP address must match the dataserver and
backup server entries in the $SYBASE_HOME/interfaces file. For information on
the format for adding entries to the $SYBASE_HOME/interfaces file, see the
Sybase SQL Server Installation and Configuration Guide."
*******************************************************************
The VERITAS Cluster Server (VCS) version 1.1 Sybase agent fails to recognize
that the database is running. The agent triggers a shutdown of the
resource and changes the state to FAILED.
Solution:
The VCS Sybase agent is very sensitive to the order of the arguments of the
data server process.
If the "-s" is visible as a data server argument but is preceded by a "-d"
string which also contains the data server name:
/opt/sybase/bin/dataserver -d/sybase//syst/master.dat -s -e...
you may experience problems with the agent monitoring the Sybase database
resources.
Put the "-s" in front of "-d" to solve this monitoring problem.
This is the default order when the files are generated by the Sybase
installation utility (Release 11.0.3, 11.5.1, 11.9.2). The order of the
argument list does not influence the behavior of the Sybase server daemons,
but it does affect the VCS agents.
******************************************************************
How and when to use the ProxyAgent for VERITAS Cluster Server (VCS).
The ProxyAgent is used to replicate the state of one resource in a service
group into another service group. For instance, it is normal to have IP
resources depend on NIC resources in service groups. Instead of having three
or four NIC resources testing the same network interface every sixty seconds,
a ProxyAgent can be used to replicate the state of one NIC resource. In terms
of system load, there is then one check to determine whether the network
interface is active, and the other service groups have a ProxyAgent
reflecting that state. If the NIC resource faults, the ProxyAgent faults on
that system as well, producing the same behavior. The ProxyAgent is best used
to replicate None resources. None means VCS has no control over either
starting or stopping the resource.
================================================================================
How to set up VERITAS Cluster Server node names that do not depend on the
system host name
Details:
A VERITAS Cluster Server (VCS) cluster can be set up using its own node name
that corresponds to a node ID.
For example, if the two systems are named king and queen (the output of
hostname or uname -n), the best approach is to use something other than king
and queen for the cluster node names, such as sysA for king and sysB for
queen. This way, if the host names king and queen need to be changed in the
future, the VCS cluster is not affected and the cluster node names remain
sysA and sysB. Here are the steps to accomplish this:
1. On all systems within the cluster, the /etc/llthosts file must have both
the node IDs and the node names. For example, if the node IDs are 0 and 1,
then /etc/llthosts should be:
0 sysA
1 sysB
2. All systems within the cluster must have an /etc/VRTSvcs/conf/sysname
file. This file must have the cluster node names defined for the system. In
this case, sysA for king and sysB for queen.
On system king, /etc/VRTSvcs/conf/sysname should contain only:
sysA
On system queen, /etc/VRTSvcs/conf/sysname should contain only:
sysB
NOTE: The sysname file must be in the conf directory where the VRTSvcs is
installed.
3. On all systems within the cluster, the /etc/llttab file must point to the
/etc/VRTSvcs/conf/sysname file for its "set-node" token. Here is a sample of
/etc/llttab:
set-cluster 1
set-node /etc/VRTSvcs/conf/sysname
link qfe0 /dev/qfe:0
link qfe1 /dev/qfe:1
link-lowpri hme0 /dev/hme:0
start
4. The /etc/VRTSvcs/conf/config/main.cf must reference the systems as sysA
and sysB. Here is a sample of /etc/VRTSvcs/conf/config/main.cf from a
cluster called royal:
include "types.cf"
cluster royal (
        UserNames = { root = pwxzyyZyykKo }
        CounterInterval = 5
        Factor = { runque = 5, memory = 1, disk = 10, cpu = 25,
                 network = 5 }
        MaxFactor = { runque = 100, memory = 10, disk = 100, cpu = 100,
                 network = 100 }
        )
system sysA
system sysB
group File_test (
        SystemList = { sysB, sysA }
        PrintTree = 0
        AutoStartList = { sysB, sysA }
        )
        FileOnOff ftest (
                PathName = "/tmp/file_test"
                )
5. Continue to follow the installation guide that came with the software to
configure Group Membership and Atomic Broadcast (GAB) and start VCS.
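The llthosts and sysname contents from steps 1 and 2 can be generated rather
than typed by hand on each node. This is a minimal sketch under stated
assumptions: stage_node_files is a hypothetical helper, and it writes into a
staging directory of your choice so the files can be reviewed before they are
copied to /etc/llthosts and /etc/VRTSvcs/conf/sysname.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with VCS).
# $1: staging directory to write into
# $2: VCS node name for the local host (for example sysA)
# remaining arguments: all node names, in node-ID order starting at 0
stage_node_files() {
    stage_dir=$1
    local_name=$2
    shift 2
    mkdir -p "$stage_dir" || return 1
    # Start with an empty llthosts, then add one "ID name" line per node.
    : > "$stage_dir/llthosts"
    id=0
    for name in "$@"; do
        echo "$id $name" >> "$stage_dir/llthosts"
        id=$((id + 1))
    done
    # sysname holds only the local node name.
    echo "$local_name" > "$stage_dir/sysname"
}
```

On king, for example: stage_node_files /tmp/vcs-stage sysA sysA sysB
(the staging path is only an illustration).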
================================================================================
Change a system in the SystemList from X to Y
hagrp -modify app-tr-g-sg SystemList -delete slkprwd21
hagrp -modify app-tr-g-sg SystemList -add slkprdb21 2
================================================================================
To enable debug logging for an agent type:
#Replace the value of typename with your agent type (DNS in this example).
#Output is saved in /var/VRTSvcs/log/${typename}_A.log
typename=DNS
haconf -makerw
hatype -modify $typename LogDbg DBG_AGINFO DBG_AGTRACE DBG_AGDEBUG
hatype -modify $typename LogDbg -add DBG_1
hatype -modify $typename LogDbg -add DBG_2
hatype -modify $typename LogDbg -add DBG_3
hatype -modify $typename LogDbg -add DBG_4
haconf -dump -makero
#To disable logging, you would do the following:
haconf -makerw
hatype -modify $typename LogDbg -delete -keys
haconf -dump -makero
#See the setting (default value shown)
hatype -display | grep $typename | grep -i LogDbg
....
DNS LogDbg
....
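The repeated "-add DBG_n" lines above can be produced with a loop. The sketch
below only prints the command sequence for review and runs nothing against
the cluster; print_logdbg_cmds is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands that enable debug
# logging for one agent type.
# $1: agent type name, for example DNS
print_logdbg_cmds() {
    typename=$1
    echo "haconf -makerw"
    echo "hatype -modify $typename LogDbg DBG_AGINFO DBG_AGTRACE DBG_AGDEBUG"
    for n in 1 2 3 4; do
        echo "hatype -modify $typename LogDbg -add DBG_$n"
    done
    echo "haconf -dump -makero"
}
```

Review the output of "print_logdbg_cmds DNS" first, and pipe it to sh on a
cluster node only if it matches what you intend to run.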
================================================================================
SNMP MIB: /etc/VRTSvcs/snmp/vcs.mib
HP OpenView NNM events file: /etc/VRTSvcs/snmp/vcs_trapd
HP/OV: xnmevents -merge vcs_trapd
SNMP-specific files
VCS includes two SNMP-specific files, vcs.mib and vcs_trapd, which are
created in /etc/VRTSvcs/snmp. The file vcs.mib is the textual MIB for the
built-in traps that are supported by VCS. Load this MIB into your SNMP
console to add it to the list of recognized traps. The file vcs_trapd is
specific to the HP OpenView Network Node Manager (NNM) SNMP console. It
includes sample events configured for the built-in SNMP traps supported by
VCS. To merge these events with those configured for SNMP traps:
# xnmevents -merge vcs_trapd
When you merge events, the SNMP traps sent by VCS by way of notifier are
displayed in the HP OpenView NNM SNMP console.
================================================================================
Log messages to the engine log from an agent Perl script:
system("/opt/VRTSvcs/bin/halog -add A \"Completed query for $Alias in $ns\"");
================================================================================
To encrypt a password for an agent configuration (like the oracle listener LsnrPwd): vcsencrypt -agent
To encrypt a VCS user password: vcsencrypt -vcs
================================================================================
What the agent uses to look for a ps string: ps -u $USER -o args
ps -u $USER -eo pid,comm
================================================================================
use VIPs and not hostnames
work with VCS framework
don't restart without using VCS or freezing the object within VCS
don't remove a mount, process, etc
don't do things locally
cron, /etc/inittab, /etc/rc.d (or you manage it)
tune on all systems together, patch all systems together
must do things globally
instead of /etc/filesystems, use VCS
================================================================================
Doc TOI (turn over information) - my notes, my docs, V FAQ, V docs
/Veritas/bin
veritas recovery scripts (adm_cfig.veritas - vxvm,vcs)
change registration for licenses (keep ALL licenses & register w/ veritas)
give URLs to veritas sites
VCS packages (include simulator)
include: simulator, CLI/App(windows GUI app, Motif)/Web
man vxintro, vxdctl, vxconfigd, vxconvert, vxdiskadm, vxdg, vxprint
VM/VxFS/Qio; cmds; CLI/GUI; difference in AIX, Solaris, HP/UX
cron jobs for defrag
================================================================================
- Restrict Access to Cluster Nodes to Authorized VCS Users
You must set the value of the cluster attribute AllowNativeCliUsers to the
default value of 0. This attribute is no longer supported.
- Use halogin command to Save Authentication for Running VCS Commands
When non-root users execute ha commands, they are prompted for their VCS user
name and password to authenticate themselves. Use the halogin command to save
the authentication information so that you do not have to enter your
credentials every time you run a VCS command.
The command stores authentication information in the user's home directory.
If you run the command for different hosts, VCS stores authentication
information for each host.
To log on to VCS:
# halogin
To end a session for any host:
# halogin -endsession hostname
To end a session for the local host:
# halogin -endsession localhost
To end all sessions:
# halogin -endallsessions
When you end all sessions, VCS prompts you for credentials every time you
run a VCS command.
================================================================================
List SG dependencies, e.g.: hagrp -dep ClusterService
================================================================================
hastop - new
EngineShutdown Attribute
enable - process all hastop commands (default, normal behavior)
disable - reject all hastop commands
disableclusstop - do not process the "hastop -all" command, but do others
promptclusstop - prompt before "hastop -all", not for others
promptlocal - prompt before "hastop -local", reject all others
promptalways - prompt for all hastop commands
================================================================================
Agent binaries: /opt/VRTSagents/ha/bin/*agent*
Agent type definitions: /etc/VRTSagents/ha/conf/*.cf
================================================================================
hastart [-v] [-version]
had [-v] [-version]
================================================================================
VCS 5.0 items
Bug in VCS 5.0: vxfenconfig "unable to unconfigure vxfen" - IGNORE it
Bug in VCS 5.0: if only PID files are used to monitor a process, a PID file
left behind by a system crash can point at a different process that
now has the same PID as was in the file
Custom agents: C++ via Forte Developer 6
/usr/lib/libvcsagfw.so -> libvcsagfw.so.2
If the agents use scripts, link to ScriptAgent: Script50Agent for VCS 5.0
Removing VRTSat - need to save credentials
vssat showbackuplist
tar/cp /var/VRTSatSnapShot directory
remove package
restore credentials
cd /var/VRTSatSnapShot/profile #or wherever you saved them
cp ABAuthSource /var/VRTSat
cp RBAuthSource /var/VRTSat
cp VRTSat.conf /etc/vx/vss
cd /var/VRTSatSnapShot
cp -r profile /var/VRTSat/.VRTSat
export NFS shares with FQDN (but don't use FQDN's elsewhere)
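The stale PID file issue listed in the VCS 5.0 items can be guarded against
by checking that the PID in the file still belongs to the expected command.
This is a sketch only, not the agent's own logic: pidfile_matches is a
hypothetical helper that reads a "pid comm" listing (such as the output of
ps -eo pid,comm) on stdin.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with VCS).
# $1: path to the PID file
# $2: expected command name
# stdin: "pid comm" lines, for example from: ps -eo pid,comm
# Returns 0 only if the PID from the file is listed with the expected name.
pidfile_matches() {
    pid_file=$1
    expected=$2
    [ -r "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    awk -v pid="$pid" -v want="$expected" '
        $1 == pid && $2 == want { found = 1 }
        END { exit !found }
    '
}
```

Example use, with an illustrative PID file path and command name:
ps -eo pid,comm | pidfile_matches /var/run/app.pid app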
================================================================================
Set up a restart before faulting (oracle listener)
From 4.0 on, you can set these values for an individual resource rather than
only for a resource type. The commands to change the RestartLimit for a
resource to 2 would be (replace <resource> with the resource name):
hares -override <resource> RestartLimit
hares -modify <resource> RestartLimit 2
If you want to remove the overridden value from the resource, you can do so by issuing:
hares -undo_override <resource> RestartLimit
================================================================================
Defining the remotecluster and heartbeat Cluster Objects
After running the GCO configuration wizard, add the remotecluster cluster object to define the IP address of the cluster on the secondary site, and the heartbeat object to define the cluster-to-cluster heartbeat. Refer to the examples in the steps below to make these changes:
1. On the primary site, enable write access to the configuration:
# haconf -makerw
2. Define the remotecluster and its virtual IP address. In this example, the remote cluster is rac_cluster2 and its IP address is 10.190.99.199:
# haclus -add rac_cluster2 10.190.99.199
3. Complete step 1 and step 2 on the secondary site using the name and IP address of the primary cluster (rac_cluster1 and 10.180.88.188).
4. On the primary site, add the heartbeat object for the cluster. Heartbeats monitor the health of remote clusters. VCS can communicate with the remote cluster only after you set up the heartbeat resource on both clusters. In this example, the heartbeat method is ICMP ping.
# hahb -add Icmp
5. Define the following attributes for the heartbeat resource: ClusterList lists the remote cluster. Arguments enables you to define the virtual IP address for the remote cluster. For example:
# hahb -modify Icmp ClusterList rac_cluster2
# hahb -modify Icmp Arguments 10.190.99.199 -clus rac_cluster2
6. Save the configuration and change the access to read-only:
# haconf -dump -makero
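Because the same sequence is run on both sites with the names and addresses
swapped, it can help to generate it from two parameters. The sketch below
only prints the commands from the steps above and runs nothing;
print_gco_cmds is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the remotecluster and heartbeat
# commands for one site.
# $1: remote cluster name
# $2: virtual IP address of the remote cluster
print_gco_cmds() {
    remote_clus=$1
    remote_ip=$2
    cat <<EOF
haconf -makerw
haclus -add $remote_clus $remote_ip
hahb -add Icmp
hahb -modify Icmp ClusterList $remote_clus
hahb -modify Icmp Arguments $remote_ip -clus $remote_clus
haconf -dump -makero
EOF
}
```

On the primary site this would be "print_gco_cmds rac_cluster2 10.190.99.199",
and on the secondary site "print_gco_cmds rac_cluster1 10.180.88.188".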
================================================================================
GCO - pings
ping of remote cluster service IP is via regular ping - no special port
symm ping is by "symrdf -sid SID ping"
Use "hahb -display" to list the info for that system (SID, Cluster IP)
================================================================================
Turn off secure user in VCS
-stop cluster
-remove "SecureCluster = 1" in main.cf
-rm %VRTSconfdir%/.secure
-start cluster
-open conf
-hauser -add admin
-close conf
Files that help with knowing about secure
-VRTSatlocal.conf
-vssconfig.log
-vxato.log
-VRTSat_broker.txt
================================================================================