This is not meant to be exhaustive; it highlights Unix troubleshooting with some tips. You will find much more detail in the Unix Doc (pdf). See the sections on Admin, OS commands, and redirection.
If you are unsure about a command - call or email a systems administrator. Of course you may also try the command on a test node first.
We have a wide range of Sun hardware using Sparc processors. Most nodes are multiprocessor. To list number and speed of processors: psrinfo -v.
Our nodes are reached via terminal consoles made by Perle. Only one user can be on a console at a time. They are logged into via telnet with a terminal type of vt100. Messages written to the console are lost if the console is not active (logged into), but we do not normally leave consoles logged in because we want to be able to access them from anywhere.
If a node crashes, and does not reboot, we want to catch register values on the console (/notes/crashed_system).
Ops can now do a normal reboot of a node when needed by permission of the Unix Oncall Admin using sudo. Steps as follows.
NOTE: init states are as follows:
If a node is hung, work closely with the Unix SA oncall.
MEMORY
Sendmail logs its information to syslog.
The syslog file (/var/log/syslog) rolls over depending on size/time-period, and the rolled files have a numeric extension: the higher the number, the older the file. An "ls -l /var/log/syslog*" shows the date each was last modified (added to).
We have an internal DNS and an external DNS. Internal is done by XXX, external by XXX for the DMZ and our ISP (XXX) slaves our domain.
(For Bourne (sh) and Korn (ksh) only. C-Shell (csh) - use the & and it will already disconnect from startup terminal.)
cmdprompt> nohup someprocess &
DNS
Filesystem and Files
Hardware
Mail
Performance
Printers
Terminals & Devices
Users and Processes
Software Items
Terminals & Devices
Talk with another user via an xterm.
talk user
talk user@host
See who is logged in and what tty.
who -lu
logins -a
Who is an active login, who is idle.
finger -i
Print an error message via buffer
fold | lp
STUFF TEXT
^D
Save an error message via buffer
cat > somefile
STUFF TEXT
^D
Share the output with another user.
CMD | tee /dev/pts/THEIR_TTY
CMD > /dev/pts/THEIR_TTY
Share the output with another user - Continuous.
script | tee /dev/pts/THEIR_TTY
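The sharing pattern above can be wrapped in a small function. This is only a sketch: share_output is a hypothetical helper name, and it assumes you have write permission on the other user's tty (they may need to run "mesg y" first). A plain file can stand in for /dev/pts/THEIR_TTY when trying it out.

```shell
# share_output TARGET CMD [ARGS...]
# Run CMD, show its output locally, and copy it to TARGET
# (a tty such as /dev/pts/THEIR_TTY, or a plain file for testing).
# share_output is a hypothetical helper name, not a system command.
share_output() {
    target=$1
    shift
    # Refuse early if the target exists but is not writable by us.
    if [ -e "$target" ] && [ ! -w "$target" ]; then
        echo "share_output: cannot write to $target" >&2
        return 1
    fi
    # 2>&1 sends error messages to the other user as well.
    "$@" 2>&1 | tee "$target"
}
```

Example: share_output /dev/pts/THEIR_TTY df -k /some/path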
Filesystems and Files
To see the permissions of a directory (without listing its contents).
ls -ld /some/dir
To see local mounted filesystems.
df -kl
To see where a directory is really mounted from.
Local filesystems start with /dev, NFS mounts show as node:, and memory-based filesystems show a text name (such as swap) or /proc.
df -k /some/path
cd /some/path; df -k .
To view the current filesystem size by directory.
du -sk * | sort -nr | head
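One gap in "du -sk *" is that the shell's * does not match names starting with a dot, so a large hidden directory never appears in the list. The sketch below adds patterns for hidden entries. top_usage is a hypothetical helper name, and the function assumes a POSIX sh or ksh.

```shell
# top_usage [DIR] [COUNT]
# List the COUNT largest entries directly under DIR, in KB, largest first.
# Unlike "du -sk *", the extra patterns also pick up hidden (dot) entries.
# top_usage is a hypothetical helper name.
top_usage() {
    dir=${1:-.}
    count=${2:-10}
    (
        cd "$dir" || exit 1
        # Patterns that match nothing are passed to du literally;
        # 2>/dev/null hides the resulting "No such file" complaints.
        du -sk ./* ./.[!.]* ./..?* 2>/dev/null | sort -nr | head -n "$count"
    )
}
```

Example: top_usage /var 5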
Some filesystem fills up, but du -sk does not help.
find . -mtime -1 #find files modified within the last day
find . -mtime -1 -ls
OR
find . -size +500 #find files over 500 blocks in size
find . -size +500 -ls
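The number given to -size is counted in 512-byte blocks, so +500 means "larger than about 250 KB". A small sketch for thinking in megabytes instead follows. mb_to_blocks and find_larger_than are hypothetical helper names, and the arithmetic syntax assumes a POSIX sh or ksh rather than the old Bourne shell.

```shell
# Convert megabytes to 512-byte blocks (1 MB = 1048576 bytes = 2048 blocks).
# mb_to_blocks and find_larger_than are hypothetical helper names.
mb_to_blocks() {
    echo $(( $1 * 2048 ))
}

# find_larger_than DIR MB
# List regular files under DIR bigger than MB megabytes, without crossing
# into other mounted filesystems (-xdev).
find_larger_than() {
    find "$1" -xdev -type f -size +"$(mb_to_blocks "$2")" -print
}
```

Example: find_larger_than /var 50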
A place to look if root fills up, but nothing else looks suspect.
find /dev -type f -ls # look for large file (bogus tar device/filename)
Normal OS logging dirs.
/var/adm
How to compress old logs, or files in a directory, when a filesystem needs emergency space.
Find files in current directory and below (recursive) that are older than 3 days and then compress them. You must have write privs to the directory and file in order to compress it.
find . -type f -mtime +3 -exec compress {} \;
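Before compressing anything in an emergency, it can help to preview exactly which files the age test selects. The sketch below is a dry run only. list_old_files is a hypothetical helper name.

```shell
# list_old_files DIR DAYS
# Print regular files under DIR last modified more than DAYS days ago.
# This is a dry run: nothing is compressed or changed.
# list_old_files is a hypothetical helper name.
list_old_files() {
    find "$1" -type f -mtime +"$2" -print
}

# Once the list looks right, the same test can drive the compression, e.g.:
#   find /some/dir -type f -mtime +3 ! -name '*.Z' -exec compress {} \;
# "! -name '*.Z'" skips files that are already compressed.
```

Example: list_old_files /some/dir 3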
Hardware
To see the hardware type and serial number.
/NFS-SHARE/admin/bin/suntype -v
To see the hardware layout and hardware errors (in addition to /var/adm/messages).
/usr/platform/sun4u/sbin/prtdiag -v
Messages and Errorlogs files.
/var/adm/messages
/var/log/syslog
Performance
How long a system has been up, and basic loading.
uptime
who -b #no loading stats
Memory usage and processor usage.
vmstat 2
vmstat -S 2
vmstat -s
Disk I/O usage.
iostat -xtnc 2
iostat -xnp 2
iostat -DMnpx 2
iostat -Een #disk types *** great command - remember ***
CPU usage.
mpstat 2
System usage - historical.
There are many options - do a 'man sar'. Similar to Patrol.
sar -A # list all available information
See process usage.
/NFS-SHARE/admin/bin/sps cpu #by percent cpu
/NFS-SHARE/admin/bin/sps rss #by percent memory
See swap usage.
swap -l
swap -s
User & Processes
Files with user information.
/etc/passwd
/etc/shadow
/etc/group
/etc/aliases
Commands to look at NIS information. NIS is on the internal network only - not on the DMZ
/usr/bin/ypmatch username passwd
/usr/bin/ypmatch username group
/usr/bin/ypcat passwd
/usr/bin/ypcat -k aliases
/usr/lib/sendmail -bv username
See user info.
finger username
See groups currently active.
groups
Processes
/usr/proc/bin/* (do a man on each one)
truss -aefp PID
/usr/openwin/bin/sdtprocess
ps -cafel OR /opt/admin/bin/sps (sort by CPU)
prstat -a (top like output in newer Solaris)
/usr/bin/prtmem
/usr/bin/memtool
/usr/proc/bin/pmap -x PID
/usr/openwin/bin/sdtprocess
ps -cafel OR /opt/admin/bin/sps (sort by RSS or SZ)
Email
Another reference file to look at is /notes/sendmail.faq.
Sendmail processes
The count should be greater than 1. We like to see it around 30. Run this on mailserver or the backup mailhost.
mailserver: ps -ef | grep -c sendmail
See number of messages in queue.
Each mail message takes approximately 3 lines, so to get a count of jobs in the queue:
mailserver: ls /var/spool/mqueue/qf* | wc -l
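On a very full queue, "ls qf*" can fail with an "arg list too long" error, and on an empty queue it prints an error instead of 0. A sketch that counts the same qf control files with find avoids both problems. count_queue is a hypothetical helper name, and the default path is the queue directory used above.

```shell
# count_queue [QUEUE_DIR]
# Count queued messages by counting qf* control files.
# count_queue is a hypothetical helper name; the default directory is
# the one used in the text.
count_queue() {
    qdir=${1:-/var/spool/mqueue}
    # wc -l may pad its output with spaces; tr removes them.
    find "$qdir" -type f -name 'qf*' -print | wc -l | tr -d ' '
}
```

Example: count_queue /var/spool/mqueue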
Find a user's activity.
mailserver: grep USERNAME syslog
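Because syslog rolls over, older activity may sit in the numbered files rather than the current one. A sketch that searches the current file and its rolled copies in one pass follows. grep_user_logs is a hypothetical helper name, and it assumes the rolled files are plain text with the same base name.

```shell
# grep_user_logs USERNAME [BASE]
# Search the current log and its rolled-over copies (BASE, BASE.0, BASE.1, ...)
# for USERNAME.
# grep_user_logs is a hypothetical helper name; BASE defaults to the
# syslog path used in this document.
grep_user_logs() {
    user=$1
    base=${2:-/var/log/syslog}
    # /dev/null as an extra file forces grep to prefix every hit with
    # its file name, even when only one log file exists.
    grep "$user" "$base"* /dev/null
}
```

Example: grep_user_logs USERNAME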
Watch for a user's activity.
mailserver: tail -f syslog | grep USERNAME # ^C when you are done
See OS errors (and some other mail messages not in syslog).
mailserver: tail -100 /var/adm/messages
Send a test email (you can have the "tail -f syslog" running too)
/usr/ucb/mail -v someuser@some.domain
>Subject: test
>testing
>.
Show aliases for username
mailserver: /usr/lib/sendmail -bv USERNAME
See what version of sendmail you are running (we hide the version on mailserver):
mconnect mailserver
^D exits
OR
telnet mailserver 25
^]
View stats - (man mailstats to learn about columns/entries)
mailserver: /bin/mailstats
OR
mailserver: cd /var/log/mailstat
then look for the date you want (mailstats saved to a file)
DNS & Names
Nslookup: to look up a host's IP address
nslookup hostname
OR
nslookup
> hostname
> ^D
Nslookup: to look up a domain's address
nslookup
> ls mydomain.com > /tmp/myfile
> ^D
more /tmp/myfile
Printers
print out known queues
lpstat -a
print out outstanding print jobs
lpstat -o [printername]
To find out the printer name a queue uses on a specific host
lpstat -a
find the queue name you want
IF you are using Sun/Unix printer software:
cd /etc/lp/printers;ls
grep dest= queue_name/configuration
#Example:
grep dest= myqueue/configuration
Options: protocol=bsd,dest=myqueue
#Use the dest=VALUE for printer DNS name
IF you are using HP printer software:
cd /etc/lp/interfaces;ls
grep PERIPH= queuename
#Example:
grep PERIPH= project
PERIPH=project
#Use the PERIPH=VALUE for printer DNS name
In both cases, the output of the grep gives the DNS name of the printer.
Sudo commands for printer management, use with direction of Unix admins
/NFS-SHARE/admin/bin/sudo cancel [printer, request-id, and/or user]
/NFS-SHARE/admin/bin/sudo enable queuename
/NFS-SHARE/admin/bin/sudo disable queuename
/NFS-SHARE/admin/bin/sudo lpmove request-id queuename
/NFS-SHARE/admin/bin/sudo lpmove queuename-orig queuename-new
Software
To use global bins with global libs, add to your environment: LM_LICENSE_FILE=/opt/software/workshop/licenses_combined;export LM_LICENSE_FILE
You may, or may not, want to change your LD_LIBRARY_PATH.
Sh & Ksh Info
How to leave a job running when you log out or you lose your window connection:
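A minimal sketch of one way to do this in sh or ksh uses nohup with the output sent to a log file of your choosing. run_detached is a hypothetical helper name, and the log path in the example is only an illustration.

```shell
# run_detached LOGFILE CMD [ARGS...]
# Start CMD so that it keeps running after logout (sh/ksh), with all
# output sent to LOGFILE instead of the default nohup.out.
# run_detached is a hypothetical helper name.
run_detached() {
    log=$1
    shift
    # < /dev/null detaches stdin so the job never waits on the terminal.
    nohup "$@" > "$log" 2>&1 < /dev/null &
    echo "started PID $! (output in $log)"
}
```

Example: run_detached /tmp/someprocess.log someprocess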
PC & Unix Interoperability Tools
Cygwin tools (Unix on Windows)
VMware - virtual machines on Windows
VNC - display Windows on Unix