Article jul2007.tar

Questions and Answers

Amy Rich

Q I have a Sun X4200 on which I'm trying to install Solaris 10. Previously, all of my Sun experience has been with the SPARC line, and this is my first foray into Sun's x64 market. With the SPARCs, I could connect a serial cable to the console port and hook it up to a console server.

I'm having some difficulty doing this with the X4200. No output appears on the serial console using the exact same kind of cable I'd use for the SPARC. Since this is PC hardware, I'm sure there's something obvious I'm missing, so I hope you have an easy solution.

A Instead of using the serial console, you can use the networked portion of the Integrated Lights Out Manager Service Processor. If you choose to stick with the serial console, though, here's how to get it working.

To begin, make sure you're using a correctly made serial cable:

http://docs.sun.com/source/819-1157-18/appd_pinouts.html

Connect that cable from the RJ-45 SERIAL MGT port on the server's back panel to your laptop's terminal emulation software or your terminal/terminal concentrator.

Configure your laptop's terminal emulation software or your terminal/terminal concentrator with the following specifications:

  • 8N1: eight data bits, no parity, one stop bit
  • 9600 baud
  • Disable hardware flow control (CTS/RTS)
  • Disable software flow control (XON/XOFF)

Once you have your terminal device configured, press the enter key to establish a connection between it and the ILOM SP. You'll see a login prompt that looks like:

SUNSPXXXXXXXXXXXX login:

The string SUNSP is the same for all SPs, and the XXXXXXXXXXXX is the unique MAC address of the SP.

Log in to the ILOM with the username root and the password changeme.

You'll see the SP default command prompt of ->, from which you can run various ILOM commands. You can now run CLI commands to configure user accounts, network settings, access lists, alerts, and more. Be sure that one of the first things you do is change the default password:

set /SP/users/root password=newpassword

To start the serial console, type:

cd /SP/console
start

You can switch back to the SP CLI from the serial console by pressing the esc key. You can log out of the ILOM by typing exit. For a full list of ILOM commands, take a look at the ILOM Administration Guide:

http://docs.sun.com/source/819-1160-13/

Q We have a pretty geriatric Netra T1 200 running Solaris 8 that's been serving faithfully as a firewall for years. Recently, it has started losing its brains and the time/date is getting stuck in a 2-second loop even though we're running NTP. This is the output from the date command, run once every second:

Sat Dec 30 19:00:00 EST 1967
Sat Dec 30 19:00:01 EST 1967
Sat Dec 30 19:00:02 EST 1967
Sat Dec 30 19:00:00 EST 1967
Sat Dec 30 19:00:01 EST 1967

Note that it appears to be before the epoch, even, which seems very strange. A few minutes later, it jumped to a date from a couple days ago, but still in the two-second loop:

Sun Feb 25 08:27:15 EST 2007
Sun Feb 25 08:27:16 EST 2007
Sun Feb 25 08:27:17 EST 2007
Sun Feb 25 08:27:15 EST 2007
Sun Feb 25 08:27:16 EST 2007

Of course, all of the logs timestamps are screwy, and I don't know if I can trust them. I think that the issue started happening on February 25th, but I can't be sure. Our backups started reporting weird things on the 26th, and the last sane login date I can see was from the 24th. We additionally have some filter logs from the 25th.

At first I suspected that we got hacked and someone corrupted one of the libraries or binaries, but I took our md5 database and compared it against what's on the system, and everything looks ok as far as I can tell. Nothing other than the date (and things relying on the date, of course) seems to be wrong, but the date loop is wreaking havoc with the applications.

I've tried killing and restarting NTP, making sure that the box was connected to a good peer. I also tried rebooting the box, setting the date by hand, and doing a full power cycle. Sometimes it seems to help for a bit, but the date always goes wonky again in a week or so.

Here's the machine's info in the hope that you can help:

# uname -a
SunOS myhost 5.8 Generic_108528-12 sun4u sparc SUNW,UltraAX-i2

# /usr/platform/sun4u/sbin/prtdiag -v
System Configuration:  Sun Microsystems  sun4u Netra T1 200 \
  (UltraSPARC-IIe 500MHz)
System clock frequency: 100 MHz
Memory size: 2048 Megabytes

========================= CPUs =========================

                     Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 0     0     0      500     0.2   13       1.4
 
========================= IO Cards =========================
      
     Bus   Freq
Brd  Type  MHz   Slot  Name                            Model
---  ----  ----  ----  ------------------------------  --------------
 0   PCI    66     5   network-pci108e,1101            SUNW,pci-eri
 0   PCI    66     5   usb-pci108e,1103.1
 0   PCI    66     8   scsi-glm/disk (block)           Symbios,53C896
 0   PCI    66     8   scsi-glm/disk (block)           Symbios,53C896
 0   PCI    66    12   network-pci108e,1101            SUNW,pci-eri
 0   PCI    66    12   usb-pci108e,1103.1           
 0   PCI    66    13   ide-pci10b9,5229/disk (block)

No failures found in System
===========================

========================= HW Revisions =========================

ASIC Revisions:
---------------
Cheerio: ebus Rev 1

System PROM revisions:
----------------------
CORE 1.0.4 2001/03/26 10:40

A I've seen this exact issue when the real time clock has died or is in the process of dying. To verify that this is your problem, drop to the OBP ok prompt and run watch-clock when you're experiencing the issue. It should tick once per second if the clock is functioning normally. If your RTC is broken, you can replace the motherboard to restore functionality. Your best bet is probably just to buy a new machine since used T1s are extremely cheap, or take this opportunity to spend a bit more and upgrade to something modern.

Q We've just patched a number of Solaris 10 machines of various SPARC architectures that all failed in the same way. Our audit turned up the fact that SMC was no longer running after one of the patch runs:

  # svcs -xv
  svc:/application/management/snmpdx:default (Sun Solstice \
     Enterprise Master Agent)
   State: offline since Wed Mar 14 5:01:22 2007
  Reason: Dependency svc:/application/management/seaport is absent.
     See: http://sun.com/msg/SMF-8000-E2
     See: man -M /usr/share/man/ -s 1M snmpdx
  Impact: This service is not running.

  # svcs -l svc:/application/management/seaport:default
  fmri         svc:/application/management/seaport:default
  name         net-snmp SNMP daemon
  enabled      true
  state        online
  next_state   none
  state_time   Wed Mar 14 5:01:15 2006
  alt_logfile  /etc/svc/volatile/ \
               application-management-seaport:default.log
  restarter    svc:/system/svc/restarter:default
  dependency   require_all/restart svc:/milestone/network (online)

When looking through the /etc/svc/volatile/application-management-seaport:default.log file, I see the following errors:

[ Wed Mar 14 5:01:06 Enabled. ]
[ Wed Mar 14 5:01:11 Executing start method \
  ("/usr/lib/sma_snmp/setseaport") ]  \
  chown: /etc/snmp/conf/snmpdx.reg: Read-only file system
[ Wed Mar 14 5:01:11 Method "start" exited with status 1 ]
[ Wed Mar 14 5:01:12 Executing start method \
  ("/usr/lib/sma_snmp/setseaport") ]
[ Wed Mar 14 5:01:15 Method "start" exited with status 0 ]

The permissions on /etc/snmp/conf/snmpdx.reg do include owner write, though:

ls -l /etc/snmp/conf/snmpdx.reg
-rw-r--r-- 1 root sys 1161 Mar 14 04:19 /etc/snmp/conf/snmpdx.reg

Google doesn't turn up anything useful, so I was hoping you might be able to shed some light on why /usr/lib/sma_snmp/seaport seems to be failing.

A It appears that you've installed patch 120272-05, which introduced bug id 6441943 and exhibits the same behavior you indicate. See:

http://sunsolve.sun.com/search/document.do?assetkey=urn:cds:docid:1-1-6441943-1

To correct the problem, install a newer revision of the patch from SunSolve. Currently the latest version is 120272-06:

http://sunsolve.sun.com/search/document.do?assetkey=1-21-120272-06-1

Q Recently, while installing a package from blastwave.org, some package (I'm not sure which, since it was doing a recursive dependency install of a number of packages) wiped out my symlink that points /opt/csw at /usr/local. I've been using the Solaris package repository at Blastwave for a couple years and have never run into this problem before, so I'm assuming that this is one bad apple among the bunch. How do I find out which package had the problem, and how do I fix it?

A Your symlink was removed because one of the packages you installed had an explicit directory entry for /opt/csw in the pkgmap file. This is generally a sign of poor packaging since most packages should use a relocatable BASEDIR root. However, the user guide at Blastwave (http://www.blastwave.org/userguide/) explains that this is a common issue with submitted packages and lists two suggestions if you don't wish to keep your packages local to the /opt/csw file system. Either mount your actual package file system as a loopback mount on /opt/csw or set the environment variable PKG_NONABI_SYMLINKS to true for the user who will be installing packages.

The Sun documentation on troubleshooting software package problems (http://docs.sun.com/app/docs/doc/817-0403/6mg741cag?a=view) includes the following information about the PKG_NONABI_SYMLINKS variable:

    In previous Solaris releases, there was no way to specify a symbolic link target in the pkgmap file when creating a software package. This meant a package or patch-related symbolic link was always followed to the source of the symbolic link rather than to the target of the symbolic link when a package was added with the pkgadd command. This created problems when upgrading a package or a patch package that needed to change a symbolic link target destination to something else.

    Now, the default behavior is that if a package needs to change the target of a symbolic link to something else, the target of the symbolic link and not the source of the symbolic link is inspected by the pkgadd command.

    Unfortunately, this means that some packages may or may not conform to the new pkgadd behavior. The PKG_NONABI_SYMLINKS environment variable might help you transition between the old and new pkgadd symbolic link behaviors. If this environment variable is set to true, pkgadd follows the source of the symbolic link. Setting this variable enables a non-conforming package to revert to the old behavior if set by the administrator before adding a package with the pkgadd command.

If you haven't done a pkgrm and want to identify the package(s) that created the /opt/csw directory, you can look for the directory entry in /var/sadm/install/contents with a simple grep:

grep "/opt/csw d" /var/sadm/install/contents

If you've already removed the package, you can grep for the same string in each of the package stream files, or, if you've unpacked the stream, you can look in each package's pkgmap file.

When you've identified the package that removed the symlink, you may want to contact the package maintainer.

Q On our internal networks, we're still running telnetd (yes, I know it's horribly unsecure, but it's a management edict) on Solaris 8. When trying to connect to one of our shell servers today, I received the following error:

telnetd: could not grant slave pty

What's a slave pty, and why is the server having trouble granting them? How do I go about fixing this?

A A pty or pts is a remote (pseudo) terminal as opposed to a directly connected one. Prior to Solaris 8, the default maximum number of ptys was 48 and could be adjusted by setting the variable pt_cnt variable in the file /etc/system and performing a reconfiguration reboot with reboot -- -r.

In Solaris 8 and above, the number of ptys is dynamically controlled by the syseventd and devfsadmd daemons started from /etc/rcS.d/S50devfsadm. If you look through the ps output and do not see /usr/lib/sysevent/syseventd, then you should start it using the init script. You can also run /usr/sbin/devfsadm by hand to generate additional pty files. Count the number of links in /dev/pts to see how many you have before and after: ls /dev/pts|wc -l.

Amy Rich has more than a decade of Unix systems administration experience in various types of environments. Her current roles include that of Senior Systems Administrator for the University Systems Group at Tufts University, Unix systems administration consultant, author, and charter member of LOPSA. She can be reached at: qna@oceanwave.com.