================================================================================
http://ftp.support.veritas.com/pub/support/products/ClusterServer_UNIX/283995.pdf
http://seer.entsupport.symantec.com/docs #docs
================================================================================
Install logs: /opt/VRTS/install/logs
================================================================================
GCO port - 14155, open between any/all systems that have GCO
SG ClusterService must run (and must have this name - keyword, don't change its SG name) for GCO to work.
================================================================================
hagrp -switch globalgroup-sg -any -clus remote-clus-name
================================================================================
import shared diskgroups:
vxdg -s import DG
to make them NON shared
vxdg -Cf import DG
================================================================================
Samples:
FILE=/opt/VRTSvcs/bin/sample_triggers/[postonline,preonline,preoffline,postoffline]
cp /opt/VRTSvcs/bin/sample_triggers/FILE /opt/VRTSvcs/bin/triggers
chmod 755 /opt/VRTSvcs/bin/triggers/FILE
# run the trigger to test
hatrigger -postonline 0 slkprdm21 dbms-dm-g-sg
================================================================================
/opt/VRTSvcs/wizards/bin/nfs
/opt/VRTSvcs/wizards/bin/applications
================================================================================
#
#print out what is online, versus parsing the hastatus -sum
#
hagrp -display | grep ONLINE | sed -e 's/|//g' | awk '{print $1":"$3":"$4}'
#
# Get a group list from the main.cf
#
sed -n '/^group aqrydb01/,/^group/p' /etc/*vcs/conf/config/main.cf \
| grep -v '//' | sed '$d'
================================================================================
How to Configure the VERITAS Cluster Server GUI to work through a Web
Details:
The VERITAS Cluster Server (VCS) Web-based graphical user interface enables you to monitor the
cluster and perform basic cluster administration, including many of the same operations as the command-line interface and VCS Java Console. Below is a sample configuration:

Sample Configuration
The following sample illustrates a typical configuration of the ClusterService service group.
group ClusterService (
    SystemList = { vcssun5, vcssun6 }
    AutoStartList = { vcssun5, vcssun6 }
    OnlineRetryLimit = 3
    )
IP webip (
    Address = "162.39.9.85"
    NetMask = "255.255.255.0"
    Device = "hme0"
    )
Process VRTSweb (
    PathName = "/opt/VRTSvcs/bin/haweb"
    Arguments = "172.29.9.108 8181"
    )
VRTSweb requires webip
To log into this cluster using a Web browser, you would need to point the web browser at the IP address: 8181

Connecting to Cluster Manager (Web Console)
This graphical user interface works from a Web server on a single system in the cluster. This ensures high availability for Cluster Manager (Web Console). When a system is taken offline, the Web server fails over to another system, ensuring continuous access.
Cluster Manager (Web Console) can be accessed with the URL
http://IP_alias:8181/vcs
The variable IP_alias is the virtual IP address configured in the ClusterService service group and 8181 is the default VERITAS Web port. You must use a valid VCS user name and password to log on to the cluster. Refer to the VCS 2.0 user guide for more information.
================================================================================
cd /opt/VRTSvcs/install; ./installvcs sys1 sys2 -usessh -configure
installvcs -configfile #unattended install
installvcs -installonly #do not config
installvcs -configonly #do not install (previously installed)
================================================================================
LLT links can be added or removed while clients are connected. Shutting down GAB or the high-availability daemon, HAD, is not required.
To add a link
lltconfig -d device -t tag
To remove a link
lltconfig -u tag
Changes take effect immediately and are lost on the next reboot. For changes to span reboots you must also update /etc/llttab.
Note: LLT clients do not recognize the difference unless only one link is available and GAB declares jeopardy.
================================================================================
After a system or private network card is replaced in a VERITAS Cluster Server (VCS) cluster, GAB may refuse to start. One of the following messages may appear on the console or in /var/adm/messages:
GAB: Port a registration waiting for seed port membership
or
GAB: Port a closed
To avoid this problem, enter the following command on all remaining systems in the cluster, preferably after the old system or network card has been removed, and before the new system or network card is introduced:
# /sbin/lltconfig -a flush
If the problem has already occurred, shut down VCS by entering the following commands on any system in the cluster:
# haconf -dump -makero
# /opt/VRTSvcs/bin/hastop -all -force
Next, shut down GAB by entering the following command on each system in the cluster:
# /sbin/gabconfig -u
Run the following commands on each system:
# lltconfig -a flush
# sh -x /etc/rc2.d/S92gab start
# hastart -force
The lltconfig -a flush command flushes the ARP-like cache of private network MAC addresses contained within LLT. LLT can now learn the MAC address of the new system or network card. See lltconfig(1M) for more information regarding other -a options.
Note: You must use the -force option with the command hastart when restarting VCS.
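To spot the two GAB messages quoted above without reading the whole log, a small filter can help. This is only a sketch: gab_seed_problem is a made-up helper name (not a VCS command), and it assumes a grep that accepts -E (on older Solaris that may mean /usr/xpg4/bin/grep).

```shell
#!/bin/sh
# Sketch only: gab_seed_problem is a hypothetical helper, not part of VCS.
# Reads log text on stdin; exits 0 if either GAB start-up message quoted
# in the note above is present, non-zero otherwise.
# Assumes a POSIX grep that supports -E.
gab_seed_problem() {
    grep -E 'GAB: Port a (registration waiting for seed port membership|closed)' >/dev/null
}

# Example use on a node (log path taken from the note above):
#   gab_seed_problem < /var/adm/messages && echo "GAB seed problem logged"
```

If the check succeeds, the flush and restart sequence described above is the next thing to try; the helper itself changes nothing on the system.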
================================================================================
VCS on windows
net stop vcscomm
hastop
net stop gab
net stop llt
net start llt
net start gab
hastart
================================================================================
#Start VCS with only one node on singlenode/onenode:
gabconfig -c -x
Start as a single node cluster:
hastart -onenode
Permanent - in startup script (note: AIX has an automatic check for this and will start on a single node if the main.cf has only one system listed)
HASTART="/opt/VRTSvcs/bin/hastart -onenode"
================================================================================
VXFS: mincache=dsync,convosync=dsync,largefiles
================================================================================
hacf -verify /etc/VRTSvcs/conf/config
================================================================================
NEW: http://WEB-VIRTUAL:8181/cmc/
NEW: https://WEB-VIRTUAL:8443/cmc/
OLD: http://WEB-VIRTUAL:8181/vcs/index
OLD: https://WEB-VIRTUAL:8443/vcs/index
================================================================================
Combine many little clusters into one large cluster.
Combine HB's into similar VLANs (HB1 and HB2). Do so without incurring cluster outages: either shut down the clusters OR do a single HB network at a time until things get back into sync before doing the next.
ssh keys or rsh enable all nodes to all nodes
main.cf - combine and copy to all nodes (make sure to verify it).
all objects must have unique names
select one ClusterService IP for the entire cluster
select a cluster name that encompasses all sub-clusters
llttab - same cluster ID in all files
llthosts - all hosts listed w/ consistent index numbers
have DNS entries for other ClusterService IPs deleted, have the "living" one renamed to ${clustername}-adm (or something generic)
================================================================================
if auto-disable
on host where it is auto-disabled (or what last had it, or where you want it to start (as long as it is not seen on another node - avoid split brain))
hagrp -autoenable GROUP -sys HOST
================================================================================
LOGS:
/var/VRTSvcs/log/
  engine_[A,B].log
  hashadow_[A,B].log
  AGENTNAME_[A,B].log
/var/adm/streams (GAB, LLT, HB messages)
================================================================================
Agents:
/etc/VRTSvcs/conf/AGENTNAME/[online,offline,monitor]
haagent -stop $AGENT -sys `hostname`
haagent -start $AGENT -sys `hostname`
haagent -stop SRDF -sys `hostname`
ps -ef | grep SRDF
haagent -start SRDF -sys `hostname`
(The "open" script runs in /opt/*vcs/bin/SRDF and creates a lock file for each SRDF disk group that "learns" the type of SRDF it is dealing with.)
ps -ef | grep SRDF
see agent AND perl script (until perl finishes - runs the open)
root 524330 483830 0 18:14:17 pts/14 0:00 grep SRDF
root 536808 1 0 18:12:42 - 0:00 /opt/VRTSvcs/bin/SRDF/SRDFAgent -type SRDF
root 909540 536808 0 18:13:11 - 0:00 /opt/VRTSperl/bin/perl -I /opt/VRTSvcs/bin/SRDF/../../lib -S /opt/VRTSvcs/bin/SRDF/online dbms-dm-srdf /usr/symcli slkprdm_db 2 0 0
================================================================================
perl debug of an Agent
ls -la /opt/VRTSvcs/bin/SRDF #look for .*lock* files
hares -display dbms-tn-srdf
#
# the arguments are from the hares display above
#
perl -d -I /opt/VRTSvcs/lib ./open dbms-tn-srdf /usr/symcli slkprtn_db 2 0 0 ""
>n
>l
>c LINENUM
>p $var
>p @array
>q
/usr/symcli/bin/symdev -dynamic list
================================================================================
symdg list
symrdf -g $RDFGRP [query,split,failover,failback]
VCS lock file for SRDF - /opt/VRTSvcs/bin/SRDF/.VCS_SRDF_$resource
read on pages: 11,34,35 of SRDF agent
R2->R1 update back:
hares -action srdf_res_name update -sys dr-sys-that-owns-disks-now
R1s become UPDATED when done
================================================================================
hacf -cftocmd /etc/*vcs/conf/config -dest /tmp
more /tmp/main.cmd
================================================================================
hagrp -clearadminwait -fault GROUP -sys SYSTEM
================================================================================
#Do ping on LLT
#setup PING server
system1# /opt/VRTSllt/getmac /dev/hme:0
/dev/hme:0 00.AA.00.A2.63.B1
system1# /opt/VRTSllt/dlpiping -vs /dev/hme:0 &
#setup PING Client
system2# /opt/VRTSllt/dlpiping -vc /dev/hme:0 00.AA.00.A2.63.B1
00.AA.00.A2.63.B1 is alive
/opt/VRTSllt/lltdump -f /dev/qfe:0 -V -A -R
/opt/VRTSllt/lltshow -n 0 | more
system2# /opt/VRTSgab/getcomms #saves config stuff in /tmp
================================================================================
Things to do to make a node a
VCS cluster member:
Do NOT allow "Stop-A" or "breaks" and then a "boot" on any VCS nodes
#update the password
haconf -makerw
hauser -update admin
haconf -dump -makero
  1. get a service name in DNS
  2. install 2nd & 3rd NIC for heartbeat and connect to HB VLANs
  3. eeprom "local-mac-address?=true"
  4. umount any local mounts to go under VCS, then remove their entries from /etc/vfstab
  5. stop any volumes to go under VCS
  6. deport any diskgroups to go under VCS
  7. get a veritas VCS license and any agent licenses
  8. create any accounts and propagate them if they are not in NIS
  9. make sure LUN and zones are setup for shared disks
  10. VxFS 3.4 needs patch02
  11. VxFS needs an fsck before you mount it under VCS
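Step 4 in the list above (removing the vfstab entries for mounts that move under VCS) is easy to get half done. The sketch below checks whether a mount point is still listed. vfstab_has_mount is a made-up helper name, and it assumes the Solaris vfstab layout (mount point in field 3) and an awk that supports -v (nawk or a POSIX awk).

```shell
#!/bin/sh
# Sketch only: vfstab_has_mount is a hypothetical helper, not part of VCS or Solaris.
# Assumes the Solaris /etc/vfstab layout, where field 3 is the mount point.
# usage: vfstab_has_mount MOUNTPOINT [FILE]
# Exits 0 if MOUNTPOINT still has a non-comment entry in FILE, 1 otherwise.
vfstab_has_mount() {
    awk -v mp="$1" '
        $1 ~ /^#/ { next }        # skip comment lines
        $3 == mp  { found = 1 }   # field 3 = mount point
        END       { exit !found }
    ' "${2:-/etc/vfstab}"
}

# Example use (the mount point /app is an assumption for illustration):
#   vfstab_has_mount /app && echo "/app is still in /etc/vfstab"
```

The helper only reads the file; removing the entry is still a manual edit.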
more /etc/llthosts 0 munich 1 bern 2 rome more /etc/llttab # # Sample /etc/llttab # # -- comment lines begin with pound, NO comments at end of line # -- order of items is important, some must precede others # # # the minimum required: # -- set the node ID number # -- link a network interface # for verbose messages from lltconfig, add this line first in llttab set-verbose 1 # set the node ID for this machine. may be a number, name, or filename. # -- number is taken literally as a node ID # -- a name is translated to a node ID via /etc/llthosts # -- a filename will take the first word in the file and translate it # via /etc/llthosts to a node ID # # the node ID must be in the range 0..(max-1), where max is defined in # /kernel/drv/llt.conf as "nodes=max" # #set-node 1 # or #set-node system1 # or set-node /etc/nodename # if multiple distinct and separate clusters are for some # reason sharing to the same network, then each must have # a unique cluster ID or the node IDs will conflict. # the default cluster ID is 0 # (alternatively each cluster could use a uniqe SAP) # #set-cluster 44 set-cluster 140 # link a network interface below LLT # format: link tag-name device-name:device-unit node-range link-type SAP # MTU # [IP-address broadcast-address] # # -- tag-name # a symbolic name used to reference this link in set-addr # commands and lltstat output # -- device-name:device-unit # the DLPI STREAMS device for the LAN interface, and the unit number # on that device, -or- the device /dev/udp for UDP/IP links # -- node-range # the range of nodes that should process this command. a dash '-' # is the default for "all nodes". this is useful to use the same file # on multiple nodes which have differing hardware. # -- link-type # the type of network. currently supported values: ether udp # -- SAP (port) # the SAP used to bind to the network link, or the UDP port if the # link-type is udp. a dash '-' is the default. NOTE: each UDP link # is required to use a unique port. 
# -- MTU # the maximum transmission size for packets on the network link. # a dash '-' is the default. # -- IP-address # for links of link-type udp only. The IP address of this node on # this link. Must be specified here since all UDP is funnelled through # /dev/udp and there is no way to individually query like for DLPI # links. Format is either a.b.c.d dot-notation, a hostname to lookup # with gethostbyname(), or a filename from which the first word will # be read and looked up with gethostbyname(). # -- broadcast-address # for links of link-type udp only. The broadcast address of this # node on this link. Format is a.b.c.d dot-notation. # # # Solaris example #link qfe0 /dev/qfe:0 - ether - - link hme0 /dev/hme:0 - ether - - link-lowpri ge0 /dev/ge:0 - ether - - # # # UDP/IP example #link lan1 /dev/udp - udp 0x2345 - 192.1.2.3 192.1.2.255 # link a network interface below LLT and send heartbeats over it, # and send data over it as a last resort # #link-lowpri le0 /dev/le:0 - ether - - # option to set the range of valid node IDs in the cluster # the default is all nodes included (0-31) # (see /kernel/drv/llt.conf "nodes=nnn" for the maximum) # #include 0-31 # # the following will cause only nodes 0-7 to # be valid for cluster participation # #exclude 8-31 # MAC addresses are set here manually for networks # that do not support broadcasting # # format: set-addr node-id tag-name address # # set address for node 2 on link le0 #set-addr 2 le0 01:02:03:01:02:03 # # set address for node 2 on link lan0 #set-addr 2 lan0 01:02:03:01:02:03 # # set address for node 2 on link lan1 (link-type udp) #set-addr 2 lan1 192.1.2.4 # set the warning level for console LLT kernel warnings # level 0 is no warnings. the default level is 20. 
#set-warn 0
# uncomment these two commands for networks
# that do not support broadcasting
#
# this one disables broadcasts that dynamically learn MAC addresses
#set-arp 0
#
# this one disables using broadcasts for heartbeats
#set-bcasthb 0
#
# options to modify the LLT timer intervals (in 1/100 second)
#
# send a heartbeat 2 times per second
#
#set-timer heartbeat:50
#
# send a heartbeat 1 time per second (for link-lowpri links)
#
#set-timer heartbeatlo:100
#
# mark a link to a peer down after 16 sec of missed heartbeats
# (peerinact must be larger than either heartbeat timer)
#set-timer peerinact:1600
#
# other timers
#
#set-timer oos:10
#set-timer retrans:10
#set-timer service:100
#set-timer arp:30000
# options to set the LLT flow control thresholds (in packets)
#
#set-flow lowater:40
#set-flow hiwater:80
#set-flow window:60
#
# NOTE: the "start" command is obsolete and ignored
# its functionality is now implicit at the end-of-file
#
# required to start the LLT protocol
#
start
================================================================================
So you don't have to copy the main.cf around: (doesn't work in 5.X and newer)
source node: hastart
all other nodes: hastart -stale
================================================================================
#Start VCS with only one node on the one node:
gabconfig -c -x
================================================================================
more /etc/gabtab
/sbin/gabconfig -c -n 3
================================================================================
vcsconfig
cfscluster config
cfscluster status
vcsmmdebug
vcsmm.conf
================================================================================
on unix host hagui package:
load VRTSjre => VRTScscm
================================================================================
haconf -makerw
hauser -add guest
Enter New Password:
Enter Again:
User added with guest privileges.
To assign Administrators/Operators privileges at cluster or group level use the following commands:
hauser -add dba -priv Operators
OR
hauser -add dba -priv Administrators
hauser -update Administrators|Operators -add
hagrp -modify Administrators/Operators -add
haconf -dump -makero
vxdctl -c mode #see who is master of the cfs
================================================================================
OS Setup
Assume an OS w/ basic environment.
Have licenses from Veritas for correct version and product.
Load SFORARAC package (rsh enabled before) on all nodes in cluster (run from one, pushes to all)
Load maintenance pack MPX (rsh enabled before) on all nodes (run from one, pushes to all)
Configure CVM and CFS manually on just one node:
#/opt/VRTSvxfs/cfs/bin/cfscluster config
accept default timeout of 200
*** answer N to starting CVM ***
(IF ORACLE RAC) Update the /kernel/drv/vcsmm.conf file on all nodes, add the line (on one continuous line, not wrapped).
#vi /kernel/drv/vcsmm.conf
name="vcsmm" parent="pseudo" slave_members=8192 instance=0;
Setup Coordinator Disks - fencing disks
Must be at least 3 drives, odd-number of them (3,5,7,...)
Standard disks
Make them the smallest disk size allowed (save disk space)
Mirror these disks
Do NOT write data to these disks
Must be in a separate diskgroup
Must have SCSI-3 persistent reservation enabled on all
All must be seen by all cluster nodes
Verify that ALL disks have passed SCSI-3 PR testing.
#vxfenadm -G all -f /etc/vxfentab
If they fail testing, then
#/opt/VRTSvcs/vxfen/bin/vxfentsthdw
From the primary cluster node, create a disk group called vxcoorddg. This group must contain an odd number of coordinator LUNs, and a minimum of 3 disks/LUNs is needed.
Deport the disk group
# vxdg deport vxcoorddg
Import the disk group with the -t option so that it is not automatically imported when the systems are rebooted.
# vxdg -t import vxcoorddg
# vxdg -o groupreserve -o clearreserve -t import vxcoorddg
Deport the disk group.
Deporting the disk group prevents the coordinator disks from being used for other purposes.
# vxdg deport vxcoorddg
Enter the following commands on each cluster node.
# echo vxcoorddg > /etc/vxfendg
# cat /etc/vxfendg
Edit /etc/VRTSvcs/conf/config/main.cf
add: UseFence=SCSI3
default: UseFence=none
haclus -value UseFence #list value in cluster
Start the LLT, GAB, VXFEN, VCSMM, LMX, and ODM drivers.
Start LLT
# cd /etc
# cd init.d
#/etc/init.d/llt.rc start
Starting LLT ...
Starting LLT done.
Start GAB
# /etc/init.d/gab start
Starting GAB ...
Starting GAB done.
Start I/O fencing
# /etc/init.d/vxfen start
Starting vxfen..
Starting vxfen.. Done
VCS FEN vxfenconfig # NOTICE Driver will use SCSI-3 compliant disks.
(IF ORACLE RAC) Start the VCSMM driver
# ./vcsmm start
(IF ORACLE RAC) Start the LMX driver
# ./lmx start
STARTING LMX
(IF ORACLE) Start ODM
# umount /dev/odm
# ./odm start
Use gabconfig -a to check the status of the drivers. Ports a, b, d, and o should show membership on node 0
# gabconfig -a
# GAB Port Memberships
==========================================
Port a gen 4a1c0001 membership 01
Port b gen g8ty0002 membership 01
Port d gen 40100001 membership 01
Port f gen f1990002 membership 01
Port h gen d8850002 membership 01
Port o gen f1100002 membership 01
Port q gen 28d10002 membership 01
Port v gen 1fc60002 membership 01
Port w gen 15ba0002 membership 01
The output of the gabconfig -a command displays which cluster systems have membership with the modules that have been installed and configured thus far in the installation. The first line indicates that each system (0,1,n..) has membership with the GAB utility, which uses Port a.
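Checking the gabconfig -a output above by eye gets tedious once several drivers are involved. The sketch below pulls the port letters out of that output and reports which expected ports are missing. gab_ports and gab_missing_ports are made-up helper names, and the line layout ("Port X gen ... membership ...") is assumed to match the sample shown above.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical, not part of VCS.
# They read `gabconfig -a` style output on stdin and assume lines of the
# form "Port X gen NNNNNNNN membership NN", as in the sample above.

# Print one port letter per line for every port that reports a membership.
gab_ports() {
    awk '$1 == "Port" && $5 == "membership" { print $2 }'
}

# usage: gab_missing_ports "a b d o"
# Print each expected port that has no membership line on stdin.
gab_missing_ports() {
    required=$1
    present=$(gab_ports)
    for p in $required; do
        case " $(echo $present) " in
            *" $p "*) ;;                    # port found, nothing to report
            *) printf '%s\n' "$p" ;;        # port missing
        esac
    done
}

# Example use on a node:
#   gabconfig -a | gab_missing_ports "a b d o"
```

No output from gab_missing_ports means every listed port showed a membership line; lines in other states (for example jeopardy) are deliberately not counted.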
The ports listed, including port a, are configured for the following functions:
==========================================
Port  Function
a     GAB
b     I/O fencing
d     ODM (Oracle Disk Manager)
f     CFS (Cluster File System)
h     VCS (high availability daemon)
o     VCSMM driver
q     QuickLog daemon
v     CVM (Cluster Volume Manager)
w     vxconfigd (module for cvm)
u     CVM membership joining
==========================================
================================================================================
All nodes fenced off (partition-in-time)
Node 1 fails first, Node 0 fails before Node 1 is online, Node 1 is repaired and boots while node 0 is down, Node 1 cannot access coordinator disk because node 0's keys are on disks.
Verify node 0 is down
node 1: vxfenadm -g all -f /etc/vxfentab #see keys
node 1: /opt/VRTSvcs/rac/bin/vxfenclearpre
repair the faulted system
reboot all nodes in cluster
================================================================================
Symptom:
What is AutoDisable and how did the service groups become that way?
Solution:
VERITAS has made extensive modification to the features and behaviors of the newest release of VERITAS Cluster Server (VCS) for Unix, version 1.3.0. Most notable is the change to the group state "auto-disable." This is an examination of what auto-disable is, when it will be used and how to deal with it.
Briefly, auto-disable is a flag set by the VCS engine on a service group that will prevent that group from becoming on-line. Auto-disable is used as a way to prevent data corruption from occurring by avoiding a situation called split-brain. Split-brain is where data is being updated by two or more hosts simultaneously. With normal means of accessing storage through filesystems or volume management which is not cluster aware, this can very easily cause data to become corrupt. This could be the Volume Manager diskgroup metadata or filesystem metadata or regular data files or even database data files.
Here is a simple example of how easy this state is to produce: Stop VCS on all nodes with the force option; this will leave the services running. Next start VCS on one node and online a service group and the data is instantly accessible to both hosts and can become invalid! In VCS 1.0 through 1.1.x, auto-disable would only occur if VCS had failed on a system while a group was running, or if a system running the group left the cluster with an unreliable communication channel. This was very easy to produce; simply rebooting the node running the group would cause auto-disable to happen. In VCS 1.3.0, auto-disable will occur in any of the following conditions: the VCS engine (had) is not running, all the resources in the service group are not probed or the only functioning link from a node is a disk heartbeat. This means that if "hastop -local" is executed on a system, service groups which have that system in their system list will be auto-disabled. This does not apply to systems that have been powered off under VCS 1.3.0 as it would in VCS 1.1.2, due to the additional changes to reboot behavior. Auto-disabling when the resources are not probed happens because VCS would not be able to determine the initial state of the resource when VCS is first started. This generally indicates a wide-spread configuration or system problem. If the system has a disk-based heartbeat and all other heartbeat links fail (normal links and low-pri links), then VCS will mark the group as auto-disabled. In these cases, the service group does not need to be running on the node with the problem; if this type of problem occurs on any node in the group, the auto-disable flag will be set. This makes it critical that all nodes in the cluster are healthy and fully functional. It also means that administrators need to be aware of this behavior during an extended period of maintenance on one node because that will auto-disable the group on all systems. 
Dealing with an auto-disable situation is simple and is the same for version 1.0 through 1.3.0. Using the command "hagrp -autoenable GROUP -sys SYSTEM" will clear the auto-disable flag for the listed system.
Important note: The system name that should be used is that of the system which caused the auto-disable state. For instance, if there is a service group called "A" running on node 1 and it has nodes 2 and 3 in its system list, and node 3 is not running VCS for an upgrade, the command would be "hagrp -autoenable A -sys 3" to clear the autodisable.
The other option is to remove the affected system from the service group's system list. To achieve this, use "hagrp -modify GROUP SystemList -delete SYSTEM". For the above example, to change the system list for group A to remove node 3, the command would be "hagrp -modify A SystemList -delete 3", where node 1 has a priority of 0 and node 2 has a priority of 1.
The changes made to the auto-disable behavior in VCS 1.3.0 were put in place to increase the protection of the data when VCS does not completely know the states of the resources. It can be argued that the changes are on the conservative side, but when hundreds of gigabytes of data are being controlled by VCS, erring on the side of caution is wise.
================================================================================
Check that local-mac-address?
is set to true
Update the PATH and MANPATH Environment Variables for the root Unix account on
PATH=${PATH}:/usr/lib/vxvm/bin:/usr/bin/fs/vxfs:/opt/VRTSvxfs/sbin:/opt/VRTSvcs/bin:/opt/VRTSvcs/rac/bin:/opt/VRTSob/bin:/opt/VRTSvxfs/cfs/bin:/opt/VRTSllt
MANPATH=${MANPATH}:/opt/VRTS/man
export PATH MANPATH
Allow 'rsh' access for root between servers
name="vcsmm" parent="pseudo" slave_members=8192 instance=0;
Port  Function
a     GAB
b     I/O fencing
d     ODM (Oracle Disk Manager)
f     CFS (Cluster File System)
h     VCS (VERITAS Cluster Server: high availability daemon)
o     VCSMM driver
q     QuickLog daemon
v     CVM (Cluster Volume Manager)
w     vxconfigd (module for cvm)
u     CVM membership joining
================================================================================
m=merge c=copy for original OS n=new copy
m /kernel/drv/vcsmm.conf
m /etc/llttab
c /etc/llthosts
c /etc/gabtab
c /etc/vxfentab
c /etc/VRTSvcs/conf/config/main.cf
c /etc/vcsmmtab
c /kernel/drv/st.conf
c /kernel/drv/sd.conf
c /kernel/drv/lpfc.conf
c /etc/hostname.*
c /etc/hosts
c /etc/netmasks
m /etc/default/init
m /etc/default/login
c /etc/printers.conf
m /etc/system
c /.ssh/authorized_keys
c /.ssh/known_hosts
m /etc/auto_*
m /etc/vfstab
m /etc/dumpadm.conf
c /etc/ntp.conf
c /etc/nsswitch.conf
m /etc/inetd.conf dr-sun
m /etc/inet/ipsecinit.conf dr-sun
fence disk - import and setup keys
get new VX licenses
================================================================================
How to get a servicegroup to DEFAULTstart on one of the nodes:
#hagrp -modify grpnam AutoStartList nodename
****************************************************************
Process apache (
    PathName = "/opt/apache/httpd"
    Arguments = "-f /opt/apache/conf/httpd.conf"
    )
Simple support for apache web-server (also in VCS WEB-edition)
***************************************************************
Verify your configuration:
# cd /etc/VRTSvcs/conf/config
# hacf -verify .
6.
Start VCS services:
# /etc/rc2.d/S70llt start
# /etc/rc2.d/S92gab start
# /etc/rc3.d/S99vcs start
***************************************************************
To change the tcp port the VERITAS Cluster Server Cluster Monitor (hagui) uses, a port number in the /etc/inet/services file on each system of the cluster must be specified. That entry should look like:
vcs 25000/tcp vcs #
All the machines in the cluster will need to be restarted for the change to take effect. When starting the hagui, right click on the cluster name in the Cluster Monitor system window and set the port to reflect the changed port number. In this example 25000 was used, but any unused tcp port can be used with the same results. By default, the hagui uses port 14141.
****************************************************************
(Flow = connection flow, not data flow)
Need     | Port        | Application                                               | Flow    | From                                      | To
Required | 25          | SMTP notification emails                                  | one way | all servers                               | email server(s)
         | 162         | SNMP                                                      | one way | all servers                               | datacenter HP/OV server
         | 8181        | HTTP - web interface                                      | one way | admin's servers and to operations servers | all servers in datacenters
         | 8443        | HTTP - web interface                                      | one way | admin's servers and to operations servers | all servers in datacenters
         | 14141       | Java admin application                                    | bi      | admin's servers                           | all servers in datacenters
         | 14155       | WAC (wide area connector - for GCO - between datacenters) | bi      | all servers at both sites                 | all servers at both sites
         | 14300       | Administrative                                            | bi      | admin's servers                           | all servers in datacenters
Optional | 14153       | Java admin application - simulator                        | bi      | admin's servers                           | all servers in datacenters
         | 15550-15558 | simulator instances                                       | bi      | admin's servers                           | all servers in datacenters
         | 15560-15563 | simulator WAC instances                                   | bi      | admin's servers                           | all servers in datacenters
****************************************************************
How do I test a VERITAS Cluster Server (VCS) agent that doesn't appear to be working correctly?
Solution:
To activate agent debug messages:
# hatype -modify resource_type LogLevel info
To check the status of a resource:
# hares -display resource_name
To bring a resource online:
# hares -online resource_name -sys system_name
This causes the online entry point of the corresponding agent to be called.
To take a resource offline:
# hares -offline resource_name -sys system_name
This causes the offline entry point of the corresponding agent to be called.
To deactivate agent debug messages:
# hatype -modify resource_type LogLevel error
********************************************************************
Sybase agent times out when it is running the online script (on the takeover host), fails to start Sybase and does not put anything useful in the VERITAS Cluster Server (VCS) logs.
Solution:
The cause can be the incorrect setup of the interface in Sybase's $SYBASE_HOME/interfaces file. By default, the Sybase installation uses the interface available on the host, which may not be the interface that VCS is failing over. Since the unavailable interface causes a Sybase timeout instead of an error, nothing is logged in the VCS engine log.
From the VCS Enterprise Agent for Sybase 1.0.2, "Transparent TCP/IP Failover":
"For transparent failover to Sybase clients, create an IP address as part of the Sybase service group. This IP address must match the dataserver and backup server entries in the $SYBASE_HOME/interfaces file. For information on the format for adding entries to the $SYBASE_HOME/interfaces file, see the Sybase SQL Server Installation and Configuration Guide."
*******************************************************************
The VERITAS Cluster Server (VCS) version 1.1 Sybase agent fails to recognize that the database is running. The agent triggers a shutdown of the resource and changes the state to FAILED.
Solution:
The VCS Sybase agent is very sensitive to the order of arguments of the data server process.
If the "-s" is visible as a data server argument but is preceded by a "-d" string which also contains the data server name:
/opt/sybase/bin/dataserver -d/sybase//syst/master.dat -s -e...
you may experience problems with the agent monitoring the Sybase database resources. Put the "-s" in front of "-d" to solve this monitoring problem. This is the default order when the files are generated by the Sybase installation utility (Release 11.0.3, 11.5.1, 11.9.2). The order of the argument list does not influence the behavior of the Sybase server daemons, but it does have an impact on the VCS agents.
******************************************************************
How and when to use the ProxyAgent for VERITAS Cluster Server (VCS).
The ProxyAgent is used to replicate the state of one resource in a service group into another service group. For instance, it is normal to have IP resources depend on NIC resources in service groups. Instead of having three or four NIC resources testing the same network interface every sixty seconds, a ProxyAgent could be used to replicate the state of one NIC resource. In terms of system load there would be one check to determine if the network interface was active and the other service groups would have a ProxyAgent reflecting the state. In the event of a fault of the NIC resource the ProxyAgent would fault on that system as well, producing the same behavior. The ProxyAgent is best used to replicate None resources; None means VCS has no control over either starting or stopping.
================================================================================
How to set up VERITAS Cluster Server node names that do not depend on the system host name
Details:
A VERITAS Cluster Server (VCS) cluster can be set up using its own node name that corresponds to a node ID.
For example, if the two systems are named king and queen (the output of hostname or uname -n), then the best approach is not to use king and queen as the VCS node names, and instead to set up the VCS node names as sysA for king and sysB for queen. This way, if host names king and queen need to be changed to any other host names in the future, the VCS cluster will not be affected, and the cluster node names will remain sysA and sysB.

Here are the steps to accomplish this:

1. On all systems within the cluster, the /etc/llthosts file must have both the node IDs and the node names. For example, if the node IDs are 0 and 1, then /etc/llthosts should be:

0 sysA
1 sysB

2. All systems within the cluster must have an /etc/VRTSvcs/conf/sysname file. This file must have the cluster node name defined for the system. In this case, sysA for king and sysB for queen.

On the system king, /etc/VRTSvcs/conf/sysname should have just this:
sysA

On the system queen, /etc/VRTSvcs/conf/sysname should have just this:
sysB

NOTE: The sysname file must be in the conf directory where VRTSvcs is installed.

3. On all systems within the cluster, the /etc/llttab file must point to the /etc/VRTSvcs/conf/sysname file for its "set-node" token. Here is a sample of /etc/llttab:

set-cluster 1
set-node /etc/VRTSvcs/conf/sysname
link qfe0 /dev/qfe:0
link qfe1 /dev/qfe:1
link-lowpri hme0 /dev/hme:0
start

4. The /etc/VRTSvcs/conf/config/main.cf must reference the systems as sysA and sysB. Here is a sample of /etc/VRTSvcs/conf/config/main.cf from a cluster called royal:

include "types.cf"

cluster royal (
        UserNames = { root = pwxzyyZyykKo }
        CounterInterval = 5
        Factor = { runque = 5, memory = 1, disk = 10, cpu = 25, network = 5 }
        MaxFactor = { runque = 100, memory = 10, disk = 100, cpu = 100, network = 100 }
        )

system sysA
system sysB

group File_test (
        SystemList = { sysB, sysA }
        PrintTree = 0
        AutoStartList = { sysB, sysA }
        )

FileOnOff ftest (
        PathName = "/tmp/file_test"
        )

5.
Continue to follow the installation guide that came with the software to configure global atomic broadcast (GAB) and start VCS.
================================================================================
Change a system in the SystemList from X to Y

hagrp -modify app-tr-g-sg SystemList -delete slkprwd21
hagrp -modify app-tr-g-sg SystemList -add slkprdb21 2
================================================================================
To enable Logging -

#In your case, you would replace $typename with "DNS" (without the quotes).
#Output saved in /var/VRTSvcs/log/${typename}_A.log
typename=DNS
haconf -makerw
hatype -modify $typename LogDbg DBG_AGINFO DBG_AGTRACE DBG_AGDEBUG
hatype -modify $typename LogDbg -add DBG_1
hatype -modify $typename LogDbg -add DBG_2
hatype -modify $typename LogDbg -add DBG_3
hatype -modify $typename LogDbg -add DBG_4
haconf -dump -makero

#To disable logging, you would do the following:
haconf -makerw
hatype -modify $typename LogDbg -delete -keys
haconf -dump -makero

#See the setting (default value shown)
hatype -display | grep $typename | grep -i LogDbg
....
DNS LogDbg
....
================================================================================
SNMP MIB /etc/VRTSvcs/snmp/vcs.mib
SNMP MIB /etc/VRTSvcs/snmp/vcs_trapd
HP/OV: xnmevents -merge vcs_trapd

SNMP-specific files

VCS includes two SNMP-specific files: vcs.mib and vcs_trapd, which are created in /etc/VRTSvcs/snmp. The file vcs.mib is the textual MIB for built-in traps that are supported by VCS. Load this MIB into your SNMP console to add it to the list of recognized traps.

The file vcs_trapd is specific to the HP OpenView Network Node Manager (NNM) SNMP console. The file includes sample events configured for the built-in SNMP traps supported by VCS. To merge these events with those configured for SNMP traps:

# xnmevents -merge vcs_trapd

When you merge events, the SNMP traps sent by VCS by way of notifier are displayed in the HP OpenView NNM SNMP console.
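
A minimal sketch of a pre-check for the merge step above, assuming a POSIX shell. The function name check_vcs_snmp_files is hypothetical and is not part of VCS. It only tests that vcs.mib and vcs_trapd exist in the given directory (default /etc/VRTSvcs/snmp) and then prints the xnmevents command from these notes rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with VCS): check that the two
# SNMP-specific files named in these notes are present.
check_vcs_snmp_files() {
    dir="${1:-/etc/VRTSvcs/snmp}"
    missing=0
    for f in vcs.mib vcs_trapd; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    if [ "$missing" -eq 0 ]; then
        # Print the merge command only; run it yourself on the NNM host.
        echo "xnmevents -merge $dir/vcs_trapd"
    fi
    return "$missing"
}
```

Called with no argument, check_vcs_snmp_files looks in /etc/VRTSvcs/snmp. A non-zero return means at least one of the two files was not found.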
================================================================================
Log messages to the engine log from an agent Perl script:

system("/opt/VRTSvcs/bin/halog -add A \"Completed query for $Alias in $ns\"");
================================================================================
Encrypt an agent password (like oracle listener LsnrPwd): vcsencrypt -agent

To encrypt a password for an agent configuration: vcsencrypt -agent
To encrypt a VCS user password: vcsencrypt -vcs
================================================================================
What the agent uses to look for a ps string:

ps -u $USER -o args
ps -u $USER -eo pid,comm
================================================================================
use VIPs and not hostnames
work with VCS framework
done restart without using VCS or freezing object within VCS
done remove a mount, process, etc
don't do things locally
cron, /etc/inittab, /etc/rc.d (or you manage it)
tune on all systems together, patch all systems together
must do things globally
instead of /etc/filesystems, use VCS
================================================================================
Doc TOI (turn over information) - my notes, my docs, V FAQ, V docs
/Veritas/bin
veritas recovery scripts (adm_cfig.veritas - vxvm,vcs)
change registration for licenses (keep ALL licenses & register w/ veritas)
give URLs to veritas sites
VCS packages (include simulator)
include: simulator, CLI/App(windows GUI app, Motif)/Web
man vxintro, vxdctl, vxconfigd, vxconvert, vxdiskadm, vxdg, vxprint
VM/VxFS/Qio; cmds; CLI/GUI; difference in AIX, Solaris, HP/UX
cron jobs for defrag
================================================================================
- Restrict Access to Cluster Nodes to Authorized VCS Users

You must set the value of the cluster attribute AllowNativeCliUsers to the default value of 0. This attribute is no longer supported.
- Use halogin command to Save Authentication for Running VCS Commands

When non-root users execute ha commands, they are prompted for their VCS user name and password to authenticate themselves. Use the halogin command to save the authentication information so that you do not have to enter your credentials every time you run a VCS command. The command stores authentication information in the user's home directory. If you run the command for different hosts, VCS stores authentication information for each host.

To log on to VCS:
# halogin

To end a session for any host:
# halogin -end session

To end a session for the local host:
# halogin -end session local host

To end all sessions:
# halogin -endallsessions

When you end all sessions, VCS prompts you for credentials every time you run a VCS command.
================================================================================
List SG dependencies, e.g.:

hagrp -dep ClusterService
================================================================================
hastop - new EngineShutdown Attribute

enable - process all hastop commands (default, normal behavior)
disable - reject all hastop commands
disableclusstop - do not process the "hastop -all" command, but do others
promptclusstop - prompt before "hastop -all", not for others
promptlocal - prompt before "hastop -local", reject all others
promptalways - prompt for all hastop commands
================================================================================
Agent locations: /opt/VRTSagents/ha/bin/*agent*
Agent locations: /etc/VRTSagents/ha/conf/*.cf
================================================================================
hastart [-v] [-version]
had [-v] [-version]
================================================================================
VCS 5.0 items

Bug in VCS 5.0: vxfenconfig "unable to unconfigure vxfen" - IGNORE it
Bug in VCS 5.0: if using only PID files to monitor process: issue with PID file existing if a system crashes and may have a
different process that now has the same PID as was in the file

Custom agents: C++ via Forte Developer 6
/usr/lib/libvcsagfw.so -> libvcsagfw.so.2
If the agents use scripts, link to ScriptAgent: Script50Agent for VCS 5.0

Removing VRTSat - need to save credentials
vssat showbackuplist
tar/cp /var/VRTSatSnapShot directory
remove package
restore credentials
cd /var/VRTSatSnapShot/profile #or wherever you saved them
cp ABAuthSource /var/VRTSat
cp RBAuthSource /var/VRTSat
cp VRTSat.conf /etc/vx/vss
cd /var/VRTSatSnapShot
cp -r profile /var/VRTSat/.VRTSat

export NFS shares with FQDN (but don't use FQDNs elsewhere)
================================================================================
Setup a restart before faulting (oracle listener)

From 4.0 on, you can set these values for a resource rather than for a resource-type only. The commands to change the RestartLimit for a resource to 2 would be:

hares -override RestartLimit
hares -modify RestartLimit 2

If you want to remove the overridden value from the resource, you can do so by issuing:

hares -undo_override RestartLimit
================================================================================
Defining the remotecluster and heartbeat Cluster Objects

After running the GCO configuration wizard, add the remotecluster cluster object to define the IP address of the cluster on the secondary site, and the heartbeat object to define the cluster-to-cluster heartbeat. Refer to the examples in the steps below to make these changes:

1. On the primary site, enable write access to the configuration:
# haconf -makerw

2. Define the remotecluster and its virtual IP address. In this example, the remote cluster is rac_cluster2 and its IP address is 10.190.99.199:
# haclus -add rac_cluster2 10.190.99.199

3. Complete step 1 and step 2 on the secondary site using the name and IP address of the primary cluster (rac_cluster1 and 10.180.88.188).

4. On the primary site, add the heartbeat object for the cluster.
Heartbeats monitor the health of remote clusters. VCS can communicate with the remote cluster only after you set up the heartbeat resource on both clusters. In this example, the heartbeat method is ICMP ping.
# hahb -add Icmp

5. Define the following attributes for the heartbeat resource: ClusterList lists the remote cluster. Arguments enables you to define the virtual IP address for the remote cluster. For example:
# hahb -modify Icmp ClusterList rac_cluster2
# hahb -modify Icmp Arguments 10.190.99.199 -clus rac_cluster2

6. Save the configuration and change the access to read-only:
# haconf -dump -makero
================================================================================
GCO - pings

ping of remote cluster service IP is via regular ping - no special port
symm ping is by "symrdf -sid SID ping"
Use "hadb -display" to list the info for that system (SID, Cluster IP)
================================================================================
Turn off secure user in VCS
-stop cluster
-remove "SecureCluster = 1" in main.cf
-rm %VRTSconfdir%/.secure
-start cluster
-open conf
-hauser -add admin
-close conf

Files that help with knowing about secure
-VRTSatlocal.conf
-vssconfig.log
-vxato.log
-VRTSat_broker.txt
================================================================================
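For the main.cf step in the "Turn off secure user" notes above, a minimal sketch assuming a POSIX shell and sed. The function name strip_secure_cluster is hypothetical and is not a VCS command. It prints the given file without any line that consists only of "SecureCluster = 1", so the result can be reviewed before it replaces the real main.cf while the cluster is stopped.

```shell
#!/bin/sh
# Hypothetical helper (not a VCS command): print a main.cf with the
# "SecureCluster = 1" attribute line removed. The input file is not modified.
strip_secure_cluster() {
    # Allow optional whitespace around the attribute; leave all other lines as-is.
    sed '/^[[:space:]]*SecureCluster[[:space:]]*=[[:space:]]*1[[:space:]]*$/d' "$1"
}
```

One way to use it is to write the output to a scratch copy, for example strip_secure_cluster main.cf > main.cf.new, and compare the two files with diff before copying anything into place.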