The first part of this post explains the steps that I took to set up an old Sun Blade 100 that I had lying around as an OpenBSD router / firewall / PPPoE client. Once set up, I experienced problems with my PPPoE connection frequently disconnecting - the second part of this post explains how I diagnosed the problem to be a kernel level PPPoE bug in OpenBSD and FreeBSD (because they share roughly the same source for kernel level PPPoE).
A Linksys WRT54GL running the Tomato firmware has served my home network for a long time but I started seeing my persistent connections disconnect regularly. The disconnects could have been blamed on the modem but looking at the WRT54GL’s uptime showed that it had also reset at the time of the disconnect. I thought that a replacement was in order - the WRT54GL was almost 5 years old at this point.
At around this time, I was also concerned about the security of this setup. I never had any problems with the Tomato firmware, but it bothered me that it was running such an old version of Linux and all the userland tools:
Linux unknown 2.4.20 #1 Sun Nov 29 06:53:09 PST 2009 mips GNU/Linux
As well, around this time, I had read a few posts on Hacker News about commodity routers being trivial to circumvent and gain access to. Because of this, I decided to build my own router.
Before switching to a new router, I ran a small speed test as a benchmark. The numbers themselves are not terribly important, but note that this test was performed over the wireless interface.
I unplugged everything, and plugged my laptop directly into the modem to check its settings:
02 days 17:21:10 (since last boot)
xDSL linestate up
(downstream: 4032 kbit/s, upstream: 800 kbit/s;
output Power Down: 19.0 dBm, Up: 11.5 dBm;
line Attenuation Down: 48.0 dB, Up: 29.0 dB;
snr Margin Down: 8.0 dB, Up: 13.0 dB)
Looks ok, but the stats are not great. 48 dB of attenuation is not good, and neither is 8 dB of SNR margin, according to this page on DSL line stats and their interpretation.
I chose to use an old Sun Blade 100 that I picked up at the Waterloo Surplus Sale for $10 a while ago as the router hardware. It doesn’t have great specifications, but a router doesn’t necessarily have to:
Processor: UltraSPARC IIe @ 500 MHz
RAM: 1.5 GB ECC
HDD: 20 GB
One neat feature of the OpenBoot firmware available on these Sun boxes is that it can perform diagnostics. I made sure to perform a diagnostic test on all the hardware before beginning this process, but unfortunately there is no equivalent to memtest86+ on SPARC systems.
I installed the latest version of OpenBSD SPARC64 (5.3 at the time of this writing) from http://ftp.openbsd.org/pub/OpenBSD/5.3/sparc64/. The instructions for netbooting the Sun Blade 100 are found in the INSTALL.sparc64 readme.
Note: if using old Sun hardware, make sure your machine is compatible with the OpenBSD/Sparc64 set.
In INSTALL.sparc64, go to the Net Boot or Diskless Setup Information: section. Also helpful: http://eradman.com/posts/sparc-netboot.html
For my Debian system, the following packages were required:
apt-get install bootparamd rarpd nfs-kernel-server
Note: I already had dnsmasq installed and listening on my ethernet port eth0, handing out IP leases in the 10.0.0.0/24 range.
Here is how I set up the relevant files on fischer:
/etc/bootparams:
sunblade100 root=10.0.0.1:/srv/tftp/ gateway=10.0.0.1:0xffffff00
/etc/hosts:
10.0.0.2 sunblade100
/etc/inetd.conf:
tftp dgram udp wait nobody /usr/sbin/tcpd /usr/sbin/in.tftpd -s /srv/tftp
/etc/exports:
/srv/tftp/ sunblade100(ro,sync,no_subtree_check)
/srv/nfs/ sunblade100(ro,sync,no_subtree_check)
/etc/ethers:
00:03:ba:18:26:9b sunblade100
In the directory /srv/tftp/, only two files are required:
1) A file named after the hex format of the Sun box’s IP which points to the netboot file:
soft link: 0A000002 -> sparc/OpenBSD/5.3/ofwboot.net
2) the OpenBSD kernel:
soft link: bsd.rd -> sparc/OpenBSD/5.3/bsd.rd
Once this configuration is done, bring up the interface:
root@fischer:/srv/tftp# ifup eth0=eth0router
root@fischer:/srv/tftp# echo 1 > /proc/sys/net/ipv4/ip_forward
Boot the Sun Blade 100. At the ‘ok’ prompt, type:
boot net bsd.rd
And follow the installation instructions.
First step:
~/.profile: export "PKG_PATH=ftp://ftp.openbsd.org/pub/OpenBSD/5.3/packages/sparc64/"
# pkg_add -v vim
# pkg_add -v screen
One of the first things that I did was md5 all files from root - to note what files I changed, and later for intrusion / rootkit detection:
# find / -type f -print0 | xargs -0 md5 | tee rootmd5_`date +"%s"`;
The functionality that this router has to provide is as follows:
The Sun Blade 100 comes with an onboard 10/100 LAN port which I configured to be the PPPoE device (device was labelled gem0 in dmesg):
/etc/hostname.pppoe0:
-inet6
inet 0.0.0.0 255.255.255.255 NONE \
pppoedev gem0 authproto pap \
authname 'xxx' authkey 'xxx' up
dest 0.0.0.1
!/sbin/route delete default
!/sbin/route add default -ifp pppoe0 0.0.0.1
!/sbin/pfctl -e -f /etc/pf.conf
/etc/hostname.gem0:
up
My /etc/hostname.pppoe0 follows the man 4 pppoe configuration almost exactly, with the exception of the following lines:
-inet6
!/sbin/route delete default
!/sbin/pfctl -e -f /etc/pf.conf
which perform the following additional configurations:
Reboot to reconfigure the network and confirm that the connection works with ifconfig pppoe0. It should have information filled in:
# ifconfig pppoe0
pppoe0: flags=28851<UP,POINTOPOINT,RUNNING,SIMPLEX,MULTICAST,NOINET6> mtu 1492
priority: 0
dev: gem0 state: session
sid: 0x156c PADI retries: 3 PADR retries: 0 time: 04:55:18
sppp: phase network authproto pap authname "xxx"
groups: pppoe egress
status: active
inet mm.mm.mm.mm --> nn.nn.nn.nn netmask 0xffffffff
Ping out to 8.8.8.8 to test that your routes are set up correctly.
Check hostname resolution with
# ping google.ca
Initially, hostname resolution didn’t work for me. If this doesn’t work for you, you will have to manually add public DNS IPs to your resolv.conf file.
/etc/resolv.conf:
nameserver 8.8.8.8
The relevant interfaces here are athn0, a PCI wireless card, and dc0, a PCI 10/100 ethernet card.
/etc/hostname.athn0:
inet 10.0.0.1
netmask 255.255.255.0
broadcast 10.0.0.255
media autoselect
mode 11g
mediaopt hostap
nwid fieldnotes
wpakey "xxx"
chan 11
powersave
up
/etc/hostname.dc0:
inet 172.16.0.1
netmask 255.255.255.0
broadcast 172.16.0.255
up
This sets up two different subnets for my LAN. You don’t necessarily have to do this; I just thought it would be neat. One thing that is possible with separate subnets is to limit or restrict traffic between the wired and wireless interfaces. I didn’t do this.
Next, configure dhcpd, the dhcp daemon that ships with OpenBSD:
/etc/dhcpd.conf:
option domain-name "my.domain";
shared-network wired {
    # dc0
    subnet 172.16.0.0 netmask 255.255.255.0 {
        option routers 172.16.0.1;
        option domain-name-servers 172.16.0.1;
        range 172.16.0.10 172.16.0.30;
    }
}
shared-network wireless {
    # athn0
    subnet 10.0.0.0 netmask 255.255.255.0 {
        option routers 10.0.0.1;
        option domain-name-servers 10.0.0.1;
        range 10.0.0.10 10.0.0.100;
    }
}
Then, reboot. Again, check that things are working with ifconfig athn0 and ifconfig dc0.
Connecting to the wireless or wired interfaces at this point should work - clients should receive an IP through the normal DHCP request methods.
To do this, edit pf.conf:
/etc/pf.conf:
ext_if = pppoe0
int_ifs = "{ dc0, lo0, athn0 }"
unroutable = "{ 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, \
192.168.0.0/16, 255.255.255.255/32 }"
wlan_net = "10.0.0.0/24"
lan_net = "172.16.0.0/24"
set loginterface $ext_if
set limit states 10000
set limit frags 500
# do not filter loopback interface
set skip on lo0
# scrub pppoe mss
match on $ext_if scrub (random-id max-mss 1440 reassemble tcp)
antispoof for lo0
# nat
match out on $ext_if inet from $wlan_net to any nat-to ($ext_if)
match out on $ext_if inet from $lan_net to any nat-to ($ext_if)
# rules
block return log on $ext_if all
# from pf.conf man page, examples section
block in from no-route to any
block in from urpf-failed to any
block in quick on $ext_if from any to 255.255.255.255
block in log quick on $ext_if from $unroutable to any
block out log quick on $ext_if from !($ext_if) to any
pass quick on $int_ifs all
## ICMP
pass out on $ext_if inet proto icmp from ($ext_if) to any icmp-type 8 code 0
pass in on $ext_if inet proto icmp from any to ($ext_if) icmp-type 8 code 0
## UDP
pass out on $ext_if inet proto udp from ($ext_if) to any
## TCP
pass out on $ext_if inet proto tcp from ($ext_if) to any
A couple of rule blocks here are directly from the man pf.conf page, and others are from some pf.conf example pages I found after searching:
Notes:
as explained before, I do not limit or restrict traffic between the wireless and wired LAN interfaces. I’m fairly certain that cross-subnet traffic works, because the pass quick on $int_ifs all rule lets the router forward it; the nat-to rules only match traffic leaving $ext_if.
the scrub pppoe mss line is taken from the man 4 pppoe page
using ($ext_if) instead of $ext_if means that pf will automatically handle changes to the external interface’s IP. This technically makes the line in /etc/hostname.pppoe0 which reinitializes pf redundant, but I’ve included both here.
None of this will work unless OpenBSD is instructed to actually forward packets:
sysctl net.inet.ip.forwarding=1
To make this a permanent setting, modify /etc/sysctl.conf by uncommenting the following line:
net.inet.ip.forwarding=1
Once this is done, you should be able to access the internet from clients.
One test that I performed is to nmap my IP from an external machine. This confirms that all ports are closed and firewalled correctly. Alternatively, visiting a web site that scans your IP like Shields Up! will also tell you whether your firewall is configured correctly.
The following entries must be appended to /etc/rc.conf.local:
# added manually
sendmail_flags=NO
dhcpd_flags="athn0 dc0" # for normal use: ""
apmd_flags=NO # for normal use: ""
ifstated_flags="" # for normal use: ""
Remember to check /var/log/* if anything goes wrong.
At this point, things were working for me. Connecting to smyslov (the Sun Blade 100) with fischer, I could ping out, and DNS resolution was occurring.
From fischer, I performed another speed test. Unfortunately, it was much slower, by about 74%:
nosuchuser@fischer:~$ ministat -w 80 speedtest_before speedtest_after
x speedtest_before
+ speedtest_after
+--------------------------------------------------------------------------------+
| + + xx |
| + + + xx |
|+ + +++ ++ + ++ xx xx x x|
| |____MA____| |__A__| |
+--------------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 15 2000160 2302560 2141280 2140704 73701.317
+ 15 352716 763980 548352 564916.8 130480.22
Difference at 95.0% confidence
-1.57579e+06 +/- 79242.8
-73.6107% +/- 3.70172%
(Student's t, pooled s = 105965)
With a single wget connection, I saw a download rate of ~100 KB/s with smyslov, whereas I had seen ~500 KB/s with the WRT54GL.
The problem was with how I had originally configured /etc/hostname.athn0. I had used the line
media OFDM54 mediaopt hostap
instead of
media autoselect
mode 11g
mediaopt hostap
The previous setting was throttling my connection. Interestingly enough, when I tested with multiple wget commands instead of one, the connection seemed to top out at ~430 KB/s, which is much more respectable and in line with what I saw with the WRT54GL.
smyslov’s load averages when at ~430 KB/s were also not bad to see:
0.13 0.18 0.09
I have since seen this configuration handle ~600 KB/s without problems.
I chose to install unbound so that I could have smyslov resolve DNS queries. With OpenBSD, it was as easy as calling pkg_add:
$ pkg_add -v unbound
I configured unbound mostly following the steps at http://blather.michaelwlucas.com/archives/580, but I disabled remote control. Also, with my setup, I needed to specify ‘interface: ‘ entries so that it would work with both subnets and localhost.
Once configured, add the following to /etc/rc.local:
echo 'starting unbound '
/usr/local/sbin/unbound
smyslov was put into service between my LAN and the internet on Wed Oct 16, and unfortunately my persistent connections still disconnected regularly.
This time, however, much more so:
Before:
After:
It also seemed to be getting worse.
OpenBSD’s ifconfig has the ability to enable excellent debugging information per interface simply by setting the debug flag on that interface:
ifconfig pppoe0 debug
I enabled the debug flag on the pppoe0 interface when the network was not noisy, and saw the following:
Nov 1 09:24:59 smyslov /bsd: pppoe0: lcp output <echo-req id=0xea len=8 00-00-00-00>
Nov 1 09:24:59 smyslov /bsd: pppoe0 (8864) state=3, session=0x3d22 output -> 00:90:1a:a3:7e:12, len=16
Nov 1 09:24:59 smyslov /bsd: pppoe0: lcp input(opened): <term-req id=0xbf len=4 00-00-00-...-00-00-00>
Nov 1 09:24:59 smyslov /bsd: pppoe0: lcp opened->stopping
The echo-req above carries a Magic-Number of 00-00-00-00. According to RFC 1661, that means my computer was outputting bad echo-req packets:
Magic-Number
The Magic-Number field is four octets, and indicates a number
which is very likely to be unique to one end of the link. A
Magic-Number of zero is illegal and MUST always be Nak'd, if it is
not Rejected outright.
This is due to the following code in sys/net/if_spppsubr.c, found at github here:
else if (sp->pp_phase >= PHASE_AUTHENTICATE) {
    unsigned long nmagic = htonl (sp->lcp.magic);
    sp->lcp.echoid = ++sp->pp_seq;
    sppp_cp_send (sp, PPP_LCP, ECHO_REQ,
        sp->lcp.echoid, 4, &nmagic);
}
This is a bug because the Sun Blade 100 uses the UltraSPARC IIe processor, which is a 64-bit big-endian processor. From the Wikipedia article on endianness:
Big-endian systems are systems in which the most significant byte of the word is stored
in the smallest address given and the least significant byte is stored in the largest. In
contrast, little endian systems are those in which the least significant byte is stored in
the smallest address.
The unsigned long is 64 bits in this context, but only 4 bytes are sent, starting at the address &nmagic. On a big-endian machine those are the 4 most significant bytes of the 64-bit value, and because the magic number only occupies the lower 32 bits, they are all zero:
$ cat bigendian.c
#include <stdio.h>
#include <stdint.h>

int main()
{
    int j;
    uint64_t i = 0x11335577;
    uint8_t * p = &i;
    for( j = 0; j < sizeof(unsigned long); j++)
        printf("%02x ", p[j]);
    printf("\n");
    return 0;
}
$ cc -o be bigendian.c
bigendian.c: In function 'main':
bigendian.c:8: warning: initialization from incompatible pointer type
$ ./be
00 00 00 00 11 33 55 77
The fix is to hold the magic number in an int32_t rather than an unsigned long:
$ diff /usr/src/sys/net/if_spppsubr.c.orig /usr/src/sys/net/if_spppsubr.c
4595c4595
< unsigned long nmagic = htonl (sp->lcp.magic);
---
> int32_t nmagic = htonl (sp->lcp.magic);
Compiling a custom OpenBSD kernel to test this out was very straightforward.
Following http://www.openbsd.org/faq/faq5.html:
1) Get the source (http://www.openbsd.org/faq/faq5.html#BldGetSrc):
export CVSROOT=anoncvs@anoncvs1.ca.openbsd.org:/cvs
# get -stable
cvs -d$CVSROOT checkout -rOPENBSD_5_3 -P src
# update -stable
cvs -d$CVSROOT up -rOPENBSD_5_3 -Pd
2) apply the patch
3) compile your architecture’s kernel (http://www.openbsd.org/faq/faq5.html#BldKernel):
# cd /usr/src/sys/arch/`machine`/conf
# config GENERIC
# cd ../compile/GENERIC/
# make clean && make
# make install
Reboot, and confirm that the new kernel has booted:
# uname -a
OpenBSD smyslov.my.domain 5.3 GENERIC#0 sparc64
/var/log/dmesg.boot:
OpenBSD 5.3-stable (GENERIC) #0: Tue Nov 5 19:03:41 EST 2013
toor@smyslov.my.domain:/usr/src/sys/arch/sparc64/compile/GENERIC
I booted the new kernel on Tue Nov 05, and the disconnects stopped happening as frequently:
They’re still happening, but not nearly as frequently, and as far as I can tell not because of the incorrect echo-req packets.
Note: from http://www.openbsd.org/sparc64.html:
The other architectures that OpenBSD supports have benefited because some kinds of bugs are exposed more often by the 64-bit big endian nature of UltraSPARC.
All 3 major BSDs use the same general if_spppsubr.c structure. OpenBSD uses unsigned long (bug), FreeBSD uses long (bug), but NetBSD uses int32_t. I expect NetBSD has been run on more architectures more frequently, and would have seen this endianness bug earlier because of this.
I’m in the process of reporting these bugs to both OpenBSD and FreeBSD.
smyslov is still performing with no problems, but its main drawback is that the box is quite noisy and requires about 40 watts to run. The main appeal of using the Sun Blade 100 was just to see if I could get it working, and to try using OpenBSD for the first time. I can see getting new hardware in the future that is quiet and uses far less power to run, but not yet - I’d like to run the Sun Blade 100 a little more.
Despite the bug, I had a very good experience with OpenBSD. It took a little bit of time to configure, but once configured it seems rock solid. It was also refreshing that it was so straightforward to configure - no surprises anywhere except when it was my fault. Debugging with tcpdump -i pppoe0 and ifconfig pppoe0 debug was a big help in diagnosing the endianness problem.
Additionally, like my previous experience with FreeBSD, OpenBSD custom kernel development, building, and booting was easy. I’ve tried before to compile and boot my own Linux kernel to no avail. With [Free/Open]BSD, it was as easy as make && make install && reboot. It’s probably as easy with Linux but I’ve never had that experience.