Questions and Answers

Amy Rich

Q I installed Solaris 10 on a Sun box that I had lying around in order to investigate using ZFS. I chose to give ZFS entire disk devices from an external disk pack instead of giving it individual disk slices from each disk. After I was done playing, I wanted to re-install Solaris 9 and slice up the disks, but it doesn't appear to work anymore. Apparently ZFS deleted the disk label and I can no longer slice it up for Solaris 9. Here's the command I ran to create the pool:

zpool create -f testpool mirror c2t0d0 c2t1d0

Now when I run format and try to partition the disk, it tells me:

Current partition table (unnamed): 
Total disk sectors available: 143358320 + 16384 (reserved sectors)

Part      Tag    Flag     First Sector         Size         Last Sector 
  0        usr    wm                34       68.36GB          143358320 
  1 unassigned    wm                 0           0               0   
  2 unassigned    wm                 0           0               0   
  3 unassigned    wm                 0           0               0   
  4 unassigned    wm                 0           0               0   
  5 unassigned    wm                 0           0               0   
  6 unassigned    wm                 0           0               0   
  8   reserved    wm         143358321        8.00MB          143374704

partition> 8
`8' is not expected.

Note that there is no slice 7 anymore. If I try to partition slice 0 to start at sector 0, I get the following error:

partition> 0 
Part      Tag    Flag     First Sector         Size         Last Sector
  0        usr    wm                34       68.36GB          143358320

Enter partition id tag[usr]: root 
Enter partition permission flags[wm]:
Enter new starting Sector[34]: 0
`0' is out of range.

Do I need to buy a new disk now that I've modified this one, or is there some way to fix this?

A Because you gave ZFS whole disks rather than slices, it installed what's known as an EFI (Extensible Firmware Interface) label on each disk, replacing the previous SMI label. See the following for more information on EFI labels and what they're used for:

http://docs.sun.com/app/docs/doc/817-5093/6mkisoq1k?a=view

If you're still running Solaris 10, first make sure you've removed your zpool with zpool destroy testpool. Then, to write an SMI label back to the disk, run format in expert mode:

format -e

Choose the disk you want to modify and set the label type back to SMI:

format> partition
partition> label 
[0] SMI Label 
[1] EFI Label 
Specify Label type[1]: 0 
Warning: This disk has an EFI label. Changing to SMI label will erase  
all current partitions. 
Continue? y 
Auto configuration via format.dat[no]? 
Auto configuration via generic SCSI-2[no]? yes

When you look at the partition table now, you'll see that it lists the SMI partitions 0 through 7 again. You can go back to slicing your disk as you did previously.
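
If you want a quick way to tell which label a disk carries before you start slicing, one rough approach is to look at the column headings of the partition table printout. The following is a minimal sketch, not part of the format utility: label_type is a hypothetical helper name, and it assumes you've saved the output of the partition menu's print command to a file. It relies on EFI tables being listed by sector (as in the printout in the question) and SMI tables being listed by cylinder.

```shell
# label_type: read a partition table printout on stdin and guess the label.
# Heuristic only: EFI printouts have "First Sector"/"Last Sector" columns,
# while SMI printouts have a "Cylinders" column.
label_type() {
    awk '
        /First Sector/ { efi = 1 }
        /Cylinders/    { smi = 1 }
        END {
            if (efi)      print "EFI"
            else if (smi) print "SMI"
            else          print "unknown"
        }
    '
}
```

With the printout saved as, say, table.txt (a made-up file name), you would run label_type < table.txt.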

Q I just upgraded a number of our departmental PowerBooks to Mac OS X 10.4.7. Since the upgrade, users are complaining that the Software Updater can't unpack software updates correctly. In the log file of one such machine, I see the following errors:

Aug 28 06:13:58 mymac Software Update[27884]: JavaScript error
"Undefined value" while running "__choice_su_visible"
Aug 28 06:13:58 mymac Software Update[27884]: __choice_su_visible 
returned error: Undefined value

If I download the update and apply it manually, it works just fine.

Google is unhelpful about how to correct this other than "do an archive install instead of an upgrade", which is completely unacceptable. Do you have any suggestions on how I might fix this issue?

A Whenever there are issues with an OS X upgrade or patch, I always suggest running the Disk Utility application to try to "repair" permissions (this essentially does a chmod and chown on various files and directories) on the boot volume. If that doesn't work, another step that commonly fixes issues is to reapply the Updater that broke the behavior (in your case, presumably the 10.4.7 Updater). I've successfully fixed the error you describe with those two steps.

Q I have a Solaris 8 machine that I'm trying to bring to the ok prompt, but every time I try, it asks me for a password. I've tried the root password, but that doesn't work. I'm stumped as to what this password might be, so how do I get around it?

A You don't mention what kind of hardware you have or exactly what the password prompt says, so there are a few possible scenarios. You could have a firmware password set; you could have an LOM, ILOM, or ALOM password set; or you could have an RSC password set. Each of these requires a different method to circumvent.

Let's take the firmware password first, since it's probably the easiest to deal with. Execute the following command as root from the running system:

eeprom security-mode

If it does not say:

security-mode=none

then you have an eeprom password set, and you can either turn down the security-mode to none or you can reset the password. To turn off firmware security entirely, run:

eeprom security-mode=none

To instead change the password (security-mode must be set to command or full for this to work), run:

eeprom security-password

It will prompt you for a firmware password twice.
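
The decision above can be sketched as a small shell function. This is only an illustration: security_advice is a hypothetical name, and it assumes you pass it the single line that eeprom security-mode prints. It needs a POSIX-style shell such as ksh or bash.

```shell
# security_advice: map the output of "eeprom security-mode" to a next step.
# The messages simply restate the options described in the text.
security_advice() {
    case "$1" in
        security-mode=none)
            echo "none: no firmware password, check LOM/ALOM/ILOM/RSC" ;;
        security-mode=command|security-mode=full)
            echo "set: run eeprom security-mode=none or eeprom security-password" ;;
        *)
            echo "unknown: unrecognized output" ;;
    esac
}
```

As root on the running system, you could call it as security_advice "$(eeprom security-mode)".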

If the firmware password is not your problem, then you either have an ILOM/LOM/ALOM password or an RSC password, depending on the type of hardware you're using. If you have a SunSolve account, look at InfoDoc 81146 to determine which type of console management software you have:

http://sunsolve.sun.com/search/document.do?assetkey=1-9-81146

If you have RSC, then you may need to install extra software to obtain the binaries you need to reset the password. Follow the directions at:

http://www.sun.com/servers/rsc_download_readme.html

to obtain and install the correct version of the RSC software for your hardware platform and OS version. Once you have the software installed correctly, run the following command to reset the admin password:

rscadm userpassword admin

If it claims that the username does not exist (possible if you just installed the software), then you can create a user, set its password, and give it privileges:

rscadm useradd admin 
rscadm userpassword admin 
rscadm userperm admin cuar

The privileges set above are:

c - Console permission: needed to toggle between LOM and console
    (possible only when LOM and console share port A).
u - User accounts permission: needed to create/manage LOM user accounts.
a - Administrative permission: needed to re/set LOM config variables.
r - Reset permission: needed to reset/power-cycle LOM.
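
If you end up handing out different permission strings to different users, a short helper can spell out what a given string grants. This is a sketch only: describe_perms is a hypothetical name, not part of rscadm, and it knows only the four letters listed above. It needs a POSIX-style shell such as ksh or bash.

```shell
# describe_perms: print one line per letter of a permission string like "cuar".
describe_perms() {
    perms=$1
    while [ -n "$perms" ]; do
        rest=${perms#?}            # everything after the first character
        letter=${perms%"$rest"}    # the first character on its own
        case "$letter" in
            c) echo "c: console" ;;
            u) echo "u: user accounts" ;;
            a) echo "a: administrative" ;;
            r) echo "r: reset" ;;
            *) echo "$letter: unknown" ;;
        esac
        perms=$rest
    done
}
```

For example, describe_perms cuar lists all four permissions in the order given.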

If your hardware has ALOM, then the software you need should already be installed as part of SUNWkvm. To reset the password, run the following command:

/usr/platform/`uname -i`/sbin/scadm userpassword admin

If you do not have the root password, you can instead erase all of the ALOM settings by following the instructions at:

http://www.sun.com/products-n-solutions/hardware/docs/html/819-3250-11/trouble_appx.html#pgfId-1003331

If your hardware has LOM, the predecessor to ALOM, then you'll need to install the LOMLite software available from the Solaris 8 Supplemental CD. After inserting it, perform the following:

cd /cdrom/cdrom0/Lights_Out_Management_2.0/Product 
pkgadd -d . SUNWlomm SUNWlomr SUNWlomu

You might then be able to bypass user security by modifying /etc/lom.conf or /platform/sun4u/kernel/drv/lom.conf to disable serial_security:

serial_security=0;

If you're working with an older machine, you might also have the option of setting a jumper on the motherboard to reset the LOM entirely.

If you're using x64 hardware with ILOM and a BIOS password, the default username is root and the default password is changeme. If it's been changed and you need to reset the password, it requires physical access to the machine to open the case and modify the physical jumper settings. Refer to your hardware guide to determine exactly which jumper you need to change. For example, this document describes the changes needed for the Sun Fire X4100 and X4200 class machines:

http://www.sun.com/products-n-solutions/hardware/docs/html/819-1157-15/power-bios.html#pgfId-1001019

Q I have a Sun V240 running Solaris 9. I'm currently using the default bge0 interface, but I want to replace it with a quad gigE ce0 card. I added the card and plumbed the interface, and everything looked good. I brought the interface up with a different IP and things still looked okay. When I unplugged bge0, though, the machine stopped responding. Here are the pertinent config files:

/etc/hosts:
  127.0.0.1 localhost
  10.1.1.1 host-bge0
  10.1.1.2 host-ce0


/etc/hostname.bge0:
  host-bge0

/etc/hostname.ce0:
  host-ce0

When I run ifconfig -a, it shows everything up and functioning as expected:

lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 3
        inet 127.0.0.1 netmask ff000000
bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4 
        inet 10.1.1.1 netmask ffffff00 broadcast 10.1.1.255 
        ether xx:xx:xx:xx:xx:xx 
ce0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 5
        inet 10.1.1.2 netmask ffffff00 broadcast 10.1.1.255 
        ether yy:yy:yy:yy:yy:yy

So why can't I get ce0 to function when I unplug bge0?

A Your netmask indicates that you're using a class C-sized network, which means that the IP addresses for bge0 and ce0 are on the same subnet. This is an unsupported configuration unless you're using IPMP. When you bring up ce0 after bge0, packets for that subnet leave the machine via bge0, regardless of which of the two IP addresses the remote machine originally contacted. This is because bge0 is listed first in the routing table and is considered authoritative for the 10.1.1.0/24 subnet. To verify this, look at the routing table by running netstat -nr.

The output should look something like the following (with 10.1.1.254 assumed as the default gateway):

Routing Table: IPv4
  Destination          Gateway            Flags  Ref   Use   Interface 
-------------------- -------------------- ----- ----- ------ --------- 
10.1.1.0             10.1.1.1             U         1   4699  bge0 
10.1.1.0             10.1.1.2             U         1      0  ce0
224.0.0.0            10.1.1.1             U         1      0  bge0 
default              10.1.1.254           UG        1  31150  
127.0.0.1            127.0.0.1            UH        4    705  lo0
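
Duplicate entries like the two 10.1.1.0 lines are easy to miss in a longer table. As a rough sketch, the hypothetical helper below (dup_routes is my name for it, not a Solaris command) reads netstat -nr output on standard input and prints any destination that appears on more than one interface. It assumes the six-column IPv4 layout shown above; on Solaris you may need to substitute nawk for awk.

```shell
# dup_routes: report destinations that are routed via more than one interface.
# Only lines with six fields and an interface-like last field (bge0, ce0, lo0)
# are considered, so headers and the default route are skipped.
dup_routes() {
    awk '
        NF == 6 && $6 ~ /^[a-z]+[0-9]+$/ {
            if (($1 in iface) && iface[$1] != $6)
                print $1 ": " iface[$1] " and " $6
            else
                iface[$1] = $6
        }
    '
}
```

You would run it as netstat -nr | dup_routes. For the table above it prints 10.1.1.0: bge0 and ce0.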

If you run snoop -d bge0 and snoop -d ce0 at the same time, you'll see packets coming into ce0 and responses to those packets leaving via bge0.

If you do a flash cutover instead of trying to have both devices up at the same time, ce0 should work just fine, assuming that there aren't hardware problems. Remove /etc/hostname.bge0 and reboot the machine. You may also run into an issue with the ARP cache on your switch, so give it a few minutes before deciding it didn't work (or clear the ARP cache if you have that capability).
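
To check ahead of time whether two addresses would collide in this way, you can apply the netmask to each and compare the results. The following is a minimal sketch under a few assumptions: ip_to_int and same_subnet are hypothetical helper names, the mask is given in the hexadecimal form that ifconfig prints (ffffff00), and the shell supports POSIX arithmetic (ksh93 or bash, not the old /bin/sh).

```shell
# ip_to_int: convert a dotted-quad IPv4 address to a decimal integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# same_subnet ADDR1 ADDR2 HEXMASK: succeed if both addresses have the
# same network part under the given mask.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    mask=$(( 0x$3 ))
    [ $(( a & mask )) -eq $(( b & mask )) ]
}
```

With the addresses from the question, same_subnet 10.1.1.1 10.1.1.2 ffffff00 succeeds, which is the situation that calls for either IPMP or a cutover.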

Amy Rich has more than a decade of Unix systems administration experience in various types of environments. Her current roles include that of Senior Systems Administrator for the University Systems Group at Tufts University, Unix systems administration consultant, author, and charter member of LOPSA. She can be reached at: qna@oceanwave.com.