Questions and Answers

Amy Rich

Q I have a JumpStart server and a bunch of clients running Solaris 9. Recently we expanded our JumpStart network from 192.168.1.0/24 to 192.168.0.0/23 because we were running out of addresses. When we made this change, JumpStart stopped working altogether. When I try to jump hosts on the new .0 network, I receive the following error (192.168.1.2 is our JumpStart server):

TFTP: Could not send to 192.168.1.2:36547 (Host is unreachable)

A snoop of the network shows the following interaction (192.168.0.10 is the client) during boot net - install or boot net -s:

OLD-BROADCAST -> (broadcast)  RARP C Who is 0:3:ba:xx:xx:xx ? 
192.168.1.252 -> 192.168.0.10 RARP R 0:3:ba:xx:xx:xx is \
  192.168.0.10,  192.168.0.10 
192.168.0.10 -> BROADCAST    TFTP Read "C0A8000A" (octet) 
192.168.1.252 -> 192.168.0.10 TFTP Data block 1 (512 bytes)

It's at this point that the TFTP error I mentioned above is echoed on the client machine.

Just in case it's something wacky on the server side, here are the entries in the pertinent files:

 
/etc/bootparams: 
client.my.domain root=jsserver.my.domain:/inst/cdrom/SunOS-5.9-sparc/ \
  Solaris_9/Tools/Boot install=jsserver.my.domain:/inst/cdrom/ \ 
  SunOS-5.9-sparc boottype=:in install_config=jsserver.my.domain:/ \
  inst/jumpstart rootopts=:rsize=32768  term=:vt100 tz=:US/ \
  Eastern jnet=192.168.0.0;255.255.254.0  keyboard=:us \ 
  display=:NONE monitor=:NONE pointer=:NONE \
  sysid_config=jsserver.my.domain:/inst/jumpstart/sysidcfg/ \
  SunOS-5.9 ns=:NONE 
    
/etc/ethers: 
0:3:ba:xx:xx:xx client.my.domain 

/etc/hosts: 
192.168.0.10        client.my.domain 
192.168.1.2         jsserver.my.domain 

/etc/netmasks: 
192.168.0.0     255.255.254.0

And ifconfig for the JumpStart interface on the server shows:

bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4 
inet 192.168.1.2 netmask fffffe00 broadcast 192.168.1.255

As far as I can tell, this should work, but if you see any errors in my configuration, I'd be grateful if you could point them out.

A You don't specify your exact hardware, but there was a known issue with patch 119234-01 (bug ID 6317169), which upgraded the firmware on the Sun V210/V240 machines to OBP 4.17.1. This upgrade apparently also broke JumpStarting across a supernet such as your /23. If you have this patch installed, try rolling back to OBP 4.16.4. I've also seen the same failure on machines where the patch was not installed but the firmware was already at a higher OBP level. Upgrading the OBP to 4.22.11 with patch 121683-01 might also work for you, but I don't believe the problem has actually been fixed in that version.

Your other option is to move away from using bootparams/RARP altogether and start using DHCP to jump your clients. You can use the Sun DHCP server or a third-party DHCP server, like ISC's, to store your client information.
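
As a quick sanity check on the addressing in the question, the following Python sketch (an illustration using the standard ipaddress module, not part of JumpStart) confirms that the client and server fall inside the same /23, that the netmask and broadcast address match the ifconfig output, and that the TFTP filename in the snoop trace is simply the client's IP address in hexadecimal. The variable names are my own.

```python
import ipaddress

# Addresses taken from the question.
net = ipaddress.ip_network("192.168.0.0/23")
client = ipaddress.ip_address("192.168.0.10")
server = ipaddress.ip_address("192.168.1.2")

# Both hosts should sit inside the expanded /23.
same_net = client in net and server in net

# ifconfig prints the netmask in hex; compare against "fffffe00".
netmask_hex = "%08x" % int(net.netmask)

# Broadcast address for the /23; compare against the ifconfig output.
broadcast = str(net.broadcast_address)

# The client requests a boot file named after its IP address in
# uppercase hex, which is the "C0A8000A" seen in the snoop trace.
tftp_name = "%08X" % int(client)

print(same_net, netmask_hex, broadcast, tftp_name)
```

If all of these agree with the server's configuration, as they do here, the addressing itself is consistent and the firmware becomes the more likely suspect.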

Q I just upgraded my laptop to Debian Unstable with the 2.6.17.1 kernel. This box sits on an internal private network and connects to the external network via an Apple Airport (second generation) and then a Solaris 8 machine running ipfilter 3.4.20. I'm seeing very odd behavior on this laptop when it tries to connect to outside machines. I can connect to, say, www.google.com, but not www.yahoo.com. I also can't ssh out from that laptop to other machines I have accounts on, but all other internal machines talk to the Internet just fine.

First, here is the ipfilter configuration file:

#!/sbin/ipf -f - 

# internal interface: hme0 192.168.1.1 
# external interface: hme1 publicIP 

# Block short packets fragmented too short to be real 
block in log quick all with short 

# By default, block and log everything.  

block in log on hme1 all 
block out log on hme1 all 
block out log on hme0 all 
block in log on hme0 all 

# Block loopback addressed packets going in/out of network interfaces 
# that aren't on the loopback interface 
block in log quick on hme1 from 127.0.0.0/8 to any 
block in log quick on hme1 from any to 127.0.0.0/8 
block in log quick on hme0 from 127.0.0.0/8 to any 
block in log quick on hme0 from any to 127.0.0.0/8 

# Allow packets to traverse the loopback interface 
pass in      quick on lo0 all 
pass out     quick on lo0 all 

# Deny reserved addresses but don't log them. 
block in quick on hme1 from 10.0.0.0/8 to any 
block in quick on hme1 from 172.16.0.0/12 to any 
block in quick on hme1 from 192.168.0.0/16 to any 

# Allow any other internal traffic 
pass in quick on hme0 from 10.1.1.0/24 to 10.1.1.0/24 
pass out quick on hme0 from 10.1.1.0/24 to 10.1.1.0/24 
pass out quick on hme1 from any to 192.168.100.1/24 

# Allow outgoing DNS requests from .2 and .3 
pass in quick on hme0 proto tcp/udp from 10.1.1.2 to any port = domain keep state 
pass in quick on hme0 proto tcp/udp from 10.1.1.3 to any port = domain keep state 
pass out quick on hme1 proto tcp/udp from publicIP to any port = domain keep state 
pass out quick on hme1 proto tcp/udp from 0/32 to any port = domain keep state 

# Allow NTP from any internal hosts to any external NTP server. 
pass in quick on hme0 proto tcp/udp from 10.1.1.0/24 to any port = 123 keep state 
pass out quick on hme1 proto tcp/udp from any to any port = 123 keep state 

# Allow incoming mail 
pass in quick on hme1 proto tcp from any to publicIP port = smtp keep state 
pass in quick on hme1 proto tcp from any to 0/32 port = smtp keep state 
pass out quick on hme1 proto tcp from 10.1.1.0/24 to any port = smtp keep state 

# Outgoing connections: SSH, WWW, NNTP, mail, whois 
pass in quick on hme0 proto tcp from 10.1.1.0/24 to any port = 22 keep state 
pass out quick on hme1 proto tcp from 10.1.1.0/24 to any port = 22 keep state 

pass in quick on hme0 proto tcp from 10.1.1.0/24 to any port = 80 keep state
pass out quick on hme1 proto tcp from 10.1.1.0/24 to any port = 80 keep state 
pass in quick on hme0 proto tcp from 10.1.1.0/24 to any port = 443 keep state 
pass out quick on hme1 proto tcp from 10.1.1.0/24 to any port = 443 keep state 

pass in quick on hme0 proto tcp from 10.1.1.0/24 to any port = nntp keep state 
block in quick on hme1 proto tcp from any to any port = nntp keep state 
pass out quick on hme1 proto tcp from 10.1.1.0/24 to any port = nntp keep state 

pass in quick on hme0 proto tcp from 10.1.1.0/24 to any port = smtp keep state 
pass in quick on hme0 proto tcp from 10.1.1.0/24 to any port = whois keep state 
pass out quick on hme1 proto tcp from any to any port = whois keep state 

# Allow ssh from offsite 
pass in quick on hme1 proto tcp from any to publicIP port = 22 keep state 
pass in quick on hme1 proto tcp from any to 0/32 port = 22 keep state 

# Allow ping out 
pass in quick on hme0 proto icmp all keep state 
pass out quick on hme1 proto icmp all keep state 

# allow auth out 
pass out quick on hme1 proto tcp from publicIP to any port = 113 keep state 
pass out quick on hme1 proto tcp from publicIP port = 113 to any keep state 
pass out quick on hme1 proto tcp from 0/32 to any port = 113 keep state 
pass out quick on hme1 proto tcp from 0/32 port = 113 to any keep state 

# ftp 
pass out quick on hme1 proto tcp from 10.1.1.0/24 to any port = ftp keep state 
pass out quick on hme0 proto tcp from any port = ftp-data to 10.1.1.0/24 \
  port > 1024 keep state 
pass in quick on hme1 proto tcp from 10.1.1.0/24 port = ftp-data to any \
  port > 1023 keep state 
pass in quick on hme0 proto tcp from 10.1.1.0/24 to any port = ftp keep state 
pass in quick on hme0 proto tcp from 10.1.1.0/24 port > 1023 to any \
  port > 1023 keep state 
pass out quick on hme1 proto tcp from 10.1.1.0/24 port > 1023 to any \
  port > 1023 keep state 

# return rst for incoming auth 
block return-rst in quick on hme1 proto tcp from any to any port = 113 flags S/SA 

# Log these: 
block return-rst in log on hme1 proto tcp from any to any flags S/SA 

# * return ICMP error packets for invalid UDP packets 
block return-icmp(net-unr) in proto udp all

Here's some information I've collected while trying to diagnose the problem:

1. Any connections from the Debian laptop that originate above port 1023 match the following rules, and a connection is successfully negotiated:

pass in quick on hme0 proto tcp from 10.1.1.0/24 port > 1023 \
  to any port > 1023 keep state 
pass out quick on hme1 proto tcp from 10.1.1.0/24 port > 1023 \
  to any port > 1023 keep state

2. Any connections originating below port 1024 start the connection negotiation, but the packets back to the gateway machine are blocked. Here's an ipf log file entry (there are several for each connection, as the remote end keeps trying to ack the packet) illustrating this for ssh:

Sep 10 20:39:59 gateway.my.domain ipmon[15972]: \ 
     [ID 702911 local0.warning] 20:39:58.894977 2x hme1 @0:2 b \ 
     externalIP[externalIP],22 -> 10.1.1.65,41300 PR tcp len 20 \ 
     660 -AP IN

Checking ipfstat -hio shows that the following rules were hit:

8 pass out quick on hme0 from 10.1.1.0/24 to 10.1.1.0/24 
4 block in log on hme1 from any to any 
15 block in log on hme0 from any to any 
15 pass in quick on hme0 from 10.1.1.0/24 to 10.1.1.0/24 
1 pass in quick on hme0 proto tcp from 10.1.1.0/24 to any \
  port = 22 keep state

There's obviously something odd going on between the Debian laptop and the ipfilter firewall where state is being ignored, but I can't figure out what. Do you have any suggestions for places to look or more debugging information I can provide?

A Linux kernels starting with 2.6.17 have had problems with TCP window scaling when some older firewall/NAT software (Cisco, ipfilter, etc.) sits in the path, because 2.6.17 began using a larger TCP window scaling factor on machines with more memory available. See the following URL at kerneltrap.org for a discussion of the issue:

http://kerneltrap.org/node/6723

In short, if the other end of the connection (say, www.yahoo.com) supports TCP window scaling with a larger scaling factor, you'll see packets blocked because ipfilter thinks they are bigger than the receiving end will accept. If TCP window scaling is not negotiated, or is offered at a smaller scaling factor (say, www.google.com), you won't see this issue. This makes the problem very hard to diagnose because some sites appear to fail arbitrarily while others work.
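
To make the arithmetic concrete, here is a deliberately simplified Python model of the mismatch. It is not ipfilter's actual code; the function names and the sample numbers are assumptions chosen for illustration. Per RFC 1323, the 16-bit window field in a TCP header is shifted left by the scale factor negotiated in the SYN exchange, so a filter that ignores the scale factor sees a far smaller window than the endpoints are really using.

```python
MAX_SHIFT = 14  # largest window scale shift allowed by RFC 1323


def effective_window(window_field, shift):
    """Window in bytes as the endpoints see it: header field << shift."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field is 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift must be between 0 and 14")
    return window_field << shift


def naive_filter_accepts(segment_len, window_field):
    """Hypothetical filter that ignores scaling and trusts the raw field."""
    return segment_len <= window_field


# Illustrative numbers only (assumptions, not taken from a real trace).
window_field = 46    # raw value carried in the TCP header
shift = 7            # scale factor negotiated during the handshake
segment_len = 1448   # payload size of one incoming segment

real_window = effective_window(window_field, shift)
fits_for_endpoints = segment_len <= real_window
fits_for_filter = naive_filter_accepts(segment_len, window_field)

print(real_window, fits_for_endpoints, fits_for_filter)
```

With these sample values the endpoints agree that the segment fits, while the naive filter judges it to be outside the window and drops it. With a shift of 0 both views coincide, which is consistent with some sites working while others fail.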

Turning off TCP window scaling will fix the problem but will result in slower throughput on high-latency and high-bandwidth connections. For more general information on TCP window scaling, see RFC 1323:

http://www.faqs.org/rfcs/rfc1323.html

In Linux, if tcp_window_scaling is on, the following command will return a 1 (0 if it's turned off):

cat /proc/sys/net/ipv4/tcp_window_scaling

You can turn off TCP window scaling temporarily by replacing the 1 in that file with a 0. You can permanently turn it off by adding the following to /etc/sysctl.conf:

net.ipv4.tcp_window_scaling=0
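
If you want to check a number of machines, the two tests above are easy to script. The following Python sketch is one way to do it; the function names are hypothetical, and the sysctl.conf parsing is a simplified assumption (it skips blank lines and lines starting with # or ;, and lets the last assignment win) rather than a full reimplementation of what sysctl does.

```python
from pathlib import Path

PROC_PATH = "/proc/sys/net/ipv4/tcp_window_scaling"
KEY = "net.ipv4.tcp_window_scaling"


def window_scaling_enabled(path=PROC_PATH):
    """Return True for 1, False for 0, or None if the file is missing."""
    try:
        text = Path(path).read_text()
    except FileNotFoundError:
        return None  # not Linux, or /proc is not mounted
    return text.strip() == "1"


def conf_disables_scaling(conf_text):
    """Return True if the sysctl.conf text sets the key to 0."""
    disabled = False
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line[0] in "#;" or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == KEY:
            disabled = value == "0"  # a later assignment overrides
    return disabled
```

For example, window_scaling_enabled() reports the running kernel's setting, and conf_disables_scaling(Path("/etc/sysctl.conf").read_text()) reports whether the change will persist across a reboot.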

The better option is to upgrade your version of ipfilter to the 4.x branch, currently 4.1.13, which is available from:

http://coombs.anu.edu.au/~avalon/ip_fil4.1.13.tar.gz

Q I've been trying to debug an issue with a new Sun V120. The box is attached to a console server, and I don't have physical access. When I try to send it a break, it ignores it. (I can't get to the OS, so I can't do an init 0 from there.) I've tried dropping to the LOM and running break from there as well with no effect. If I do a poweroff and then a poweron from the LOM, it just tries to boot from the diag device (the network). How can I get the machine to break out of its boot loop and drop to the ok prompt?

A For Sun V120 machines (and others with the necessary LOM/ALOM support) that are unable to boot and won't let you break out to the ok prompt, you can drop to the LOM prompt and run the following commands:

bootmode forth 
reset

When the reset finishes, the machine stops and enters the Forth interpreter before trying to boot. This is akin to pressing L1-F on a physically attached keyboard. At this point, you can run the following at the ok prompt to turn off auto-booting:

setenv auto-boot? false

Now drop back to the LOM prompt and run the following to return to the normal boot mode:

bootmode normal 
reset

With auto-boot? disabled, you can now choose to boot off the network, CDROM, other disks, etc. Eventually, when you fix/debug the issue, set the auto-boot? variable back to true.

Amy Rich has more than a decade of Unix systems administration experience in various types of environments. Her current roles include that of Senior Systems Administrator for the University Systems Group at Tufts University, Unix systems administration consultant, author, and charter member of LOPSA. She can be reached at: qna@oceanwave.com.