Bisecting Kernel Using PXE Boot
Recently I found that an Atheros AR5BXB64 wireless card in my Acer Aspire One AOA150 netbook stopped working properly in 13.04 with 3.8 kernel. The problem was that any file download over the wireless interface had a high chance of being corrupted. I was not able to download updates - every time something big (like a kernel upgrade) got downloaded, I got the file, but the checksum did not match. The card was able to inject copies of previous buffers into the stream in a way that still passed IP and TCP checksum verification.
This was exactly the same symptom we had when Active State Power Management (ASPM) L0s was enabled for that PCI-E device, when the hardware itself was not actually able to handle this state properly. It was fixed in 2009 but suddenly got broken in 3.8 again.
I filed a bug (LP:1137513) and, since I had a workaround, I just
continued using the netbook with a setpci
call in rc.local
until I was
asked whether any prior Raring kernels exhibited this behavior.
I booted 12.10 from a Flash drive and found that the card was working properly
there. Then I checked 3.7 kernels and found that the issue did not exist there
and appeared only in 3.8.0-rc1, which suggested there was indeed a regression
in the kernel. And the difference was that 3.7 and earlier kernels had
LinkCtl: ASPM Disabled
while 3.8+ had LinkCtl: ASPM L0s L1 Enabled
in the
lspci -vv
output.
Since I was not able to find any relevant commits that looked like they were
deliberately doing something to ASPM handling, the only way to proceed was to
run a binary search. Since it was possible to check whether the bug was there
or not just by checking lspci
, it could and should have been automated.
Bisect works by splitting the commits between known good and known bad revision in half with every iteration and testing the commit in between. Rinse and repeat until you hit a commit that made the switch from good to bad and that's where the things broke.
I had to test 5840 commits, with roughly 13 steps (213=8192, 212=4096). 13 manual reboots are 12 reboots too many so I went on to set up an environment that would do everything for me. Additionally I did not want to build the kernels on that netbook itself as it is pretty slow. The build host and the netbook should have been separated if I wanted to finish the task in a reasonable amount of time.
So, here's the plan:
Set up a PXE environment.
Find the minimal kernel config that exhibits the issue.
Create a custom initrd containing all the tools needed to detect whether an issue is there.
Create a script that can be reliably used by
git bisect
to make it all run automatically.
PXE
PXE is a Pre-boot Execution Environment which allows network boot without any local disk installation. It works by starting a DHCP client early in the boot process, obtaining the address of the TFTP server that hosts the boot images, downloading the image and passing the control over. TFTP is a Trivial File Transfer Protocol, a really simple connection-less and authentication-less protocol that's running over UDP.
In my case I already had a DHCP server which was running on my OpenWRT-based
router. I set up a virtual machine and had my DHCP server direct the queries
for the boot images to that virtual machine. This was added to my
/etc/dnsmasq.conf
on the router:
Then I switched to the "raring-server-i386" and installed atftpd
.
Ubuntu 12.04 Note
The version in 12.04 suffers from a bug that prevents
rlinetd
from being set up properly. If you happen to hit the bug, you can create the configuration manually by executing the following:
. /etc/default/atftpd sudo update-inetd --group BOOT --add \ "tftp dgram udp wait nobody /usr/sbin/tcpd /usr/sbin/in.tftpd $OPTIONS" unset OPTIONS
In order to run Linux with PXE you will need to use PXELINUX. The
required image, pxelinux.0
can be obtained after installing syslinux
package from /usr/lib/syslinux/pxelinux.0
. By default atftpd
uses
/srv/tftp
as a root for the shared filesystem, so pxelinux.0
should go
to /srv/tftp
and be readable by nobody
user.
Then I set up the netbook to boot from PXE. This will vary between different BIOS setup interfaces, but it boils down to enabling integrated LAN adapter, enabling boot ROM on that adapter and setting the network device as the first one to boot from. A wired connection to an IPv4 network is needed.
With pxelinux.0
in place I booted the netbook:
Intel UNDI, PXE-2.1 (build 082) Copyright (C) 1997-2000 Intel Corporation This Product is covered by one or more of the following patents: US5,307,459, US5,434,872, US5,732,094, US6,570,804, US6,115,776 and US6,327,625 Realtek PCI Express Fast Ethernet Controller Series v1.15 (090224) CLIENT MAC ADDR: 00 1E 68 E2 10 AE GUID: 40709CDA-A8E0-D411-8C6C-001E68E210AE CLIENT IP: 192.168.1.14 MASK: 255.255.255.0 DHCP IP: 192.168.1.1 GATEWAY IP: 192.168.1.1 !PXE entry point found (we hope) at 9D17:0109 via plan A UNDI code segment at 9D17 len 19CA UNDI data segment at 930A len A0D0 Getting cached packet 01 02 03 My IP address seems to be C0A8010E 192.168.1.14 ip=192.168.1.14:192.168.100.102:192.168.1.1:255.255.255.0 BOOTIF=01-00-1e-68-e2-10-ae SYSUUID=40709cda-a8e0-d411-8c6c-001e68e210ae TFTP prefix: Unable to locate configuration file Boot failed: press a key to retry, or wait for reset...
The line starting with "My IP address" is our pxelinux.0
working. Now we need
to put configuration and kernel in place.
You can just grab the kernel and initrd from existing installation and put them
in /srv/tftp
. In my case I put kernel to /srv/tftp/vmlinuz
and initrd to
/srv/tftp/initrd.img
. If you check /var/log/syslog
on the TFTP server, you
will see that pxelinux
wanted to get the configuration from the following
locations:
pxelinux.cfg/40709cda-a8e0-d411-8c6c-001e68e210ae pxelinux.cfg/01-00-1e-68-e2-10-ae pxelinux.cfg/C0A8010E pxelinux.cfg/C0A8010 pxelinux.cfg/C0A801 pxelinux.cfg/C0A80 pxelinux.cfg/C0A8 pxelinux.cfg/C0A pxelinux.cfg/C0 pxelinux.cfg/C pxelinux.cfg/default
If you are wondering, C0A8010E is 192.168.1.14 in hexadecimal notation.
So we need to put configuration at some location it is querying. I decided to
use the MAC address, 01:00:1e:68:e2:10:ae
and put the following configuration
to /srv/tftp/pxelinux.cfg/01-00-1e-68-e2-10-ae
(yes, the folder is named
pxelinux.cfg
):
You can reboot your device now and you will see that the kernel is loaded,
initrd.img
is loaded too but the boot breaks since the root device is
undefined. While you can put the root information from your grub configuration,
we don't actually need to do that. As we are going to build our own kernel with
minimal number of features, we will run everything from the intitial RAM drive,
initrd. I put the panic=10
option so that the kernel reboots after 10 seconds
in case it fails so that I would not need to shut the netbook down manually.
Kernel
Well, that was easy. I grabbed the git repository of Linus,
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
, checked out
v3.8-rc1 and ran make menuconfig
Since I was interested in ath5k module only I disabled loadable module support
and made ath5k a built-in module. I left quite a few features I did not
know/care about enabled which I believe has somehow increased the build time,
but in the end it took 3 minutes to build a kernel on my VM with 4 concurrent
builds (make -j 4
, the VM had 4 CPUs allocated)
initrd
When kernel boots with initrd, it mounts that image as a root device and runs
init
as PID 1 trying /sbin/init
, /etc/init
, /bin/init
and /bin/sh
. In
case /bin/sh
is handled by busybox
and it is running as PID 1, it attempts
to run /init
and /sbin/init
. So I needed to create a new initramfs that had
everything inside the /init script itself:
Setting up /dev, /proc, /sys and /tmp in case I need that for debugging.
Configure the network interface.
Execute the required test.
Wait for a reboot command.
Here's what I came up with, the first part is stolen from /init that ships with Ubuntu:
#!/bin/sh [ -d /dev ] || mkdir -m 0755 /dev [ -d /root ] || mkdir -m 0700 /root [ -d /sys ] || mkdir /sys [ -d /proc ] || mkdir /proc [ -d /tmp ] || mkdir /tmp mkdir -p /var/lock mount -t sysfs -o nodev,noexec,nosuid sysfs /sys mount -t proc -o nodev,noexec,nosuid proc /proc # Some things don't work properly without /etc/mtab. ln -sf /proc/mounts /etc/mtab ifconfig eth0 192.168.1.14 route add default gw 192.168.1.1 sleep 5 lspci -vv -d 168c:001c | nc 192.168.100.102 5555 nc -l -p 5555 && reboot -f
So this init script configures the wired interface, sends the output of lspci
to the host running the bisect script and listens to an incoming TCP connection
on port 5555 so that it would reboot the host when connection is terminated.
There is no need for anything fancier.
The initramfs contained the minimal set of applications and libraries, but you
will want to grab busybox
from a working system as it has a reboot
applet
which is missing from the version shipped with initrd by default on Ubuntu
systems. I needed to do a bit of tweaking before I got everything working as I
wanted so I had /bin/sh (symlink to /bin/busybox) originally in place of all
these lspci
calls. Also I used nc
from netcat-traditional
since it had
less dependencies.
ubuntu@raring-server-i386:~/minroot$ find ./init ./bin ./bin/sh ./bin/lspci ./bin/nc ./bin/bash ./bin/busybox ./lib ./lib/ld-linux.so.2 ./lib/libresolv.so.2 ./lib/libc.so.6 ./lib/libz.so.1 ./lib/libpci.so.3 ./dev ./dev/ram0 ./dev/console ./dev/null ubuntu@raring-server-i386:~/minroot$ find | cpio --format=newc -o | gzip > /srv/tftp/initrd.img 7806 blocks
Bisect script
git-bisect run expects the script to return 0 when this build is considered
a success or 1 when the build is considered a failure, so in our case a
successful build will have "ASPM Disabled" in the output. grep
returns 0 when
a string is found, 1 when not found and 2 when error, so we can just use that.
I called the script 'bisect-helper' and put it in my home directory on the VM.
#!/bin/bash cp ~/kernel.config ~/linux/.config cd ~/linux make olddefconfig make clean make -j 4 cp arch/i386/boot/bzImage /srv/tftp/vmlinuz # reboot netbook echo "reboot" | nc 192.168.1.14 5555 # when it goes up, it will send us a message nc -l 5555 | grep 'LnkCtl:\s*ASPM Disabled' exit $?
The script needs to be tested manually first to check whether it is actually producing the results expected:
ubuntu@raring-server-i386:~/linux$ git checkout v3.7 Checking out files: 100% (11706/11706), done. Previous HEAD position was 19f949f... Linux 3.8 HEAD is now at 2959440... Linux 3.7 ubuntu@raring-server-i386:~/linux$ ~/bisect-helper; echo $? ... Kernel: arch/x86/boot/bzImage is ready (#40) LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+ 0 ubuntu@raring-server-i386:~/linux$ git checkout v3.8 Checking out files: 100% (11706/11706), done. Previous HEAD position was 2959440... Linux 3.7 HEAD is now at 19f949f... Linux 3.8 ubuntu@raring-server-i386:~/linux$ ~/bisect-helper; echo $? ... Kernel: arch/x86/boot/bzImage is ready (#41) 1
Yep, looks like it is working. Now let the madness begin!
ubuntu@raring-server-i386:~/linux$ git bisect start v3.8-rc1 v3.7 Bisecting: 5840 revisions left to test after this (roughly 13 steps) [6be35c700f742e911ecedd07fcc43d4439922334] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ubuntu@raring-server-i386:~/linux$ time git bisect run ../bisect-helper running ../bisect-helper ... (an hour later) ... 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6 is the first bad commit commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6 Author: Taku Izumi <izumi.taku@jp.fujitsu.com> Date: Tue Oct 30 15:27:13 2012 +0900 PCI/ACPI: Request _OSC control before scanning PCI root bus This patch moves up the code block to request _OSC control in order to separate ACPI work and PCI work in acpi_pci_root_add(). Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> :040000 040000 ac58db59f5eaf541d5cb36197aaaf5dfe319ea37 d25e645d59da508e3a1bcb01509cbfb5c7a8239b M drivers bisect run success real 63m39.674s user 155m30.732s sys 27m33.772s
It took ~64 minutes to run 13 kernel builds and the offending commit was
discovered. I filed a bug, it got picked up by
Yinghai Lu, who found that this commit triggered an issue in another function
that differentiated between ASPM handling for non-hotplug and hotplug cases.
This was affecting iwlwifi as well as other modules which were using
pci_disable_link_state
call, so it was a real issue.
Yinghai Lu has posted the patch to the kernel-pci mailing list and it should be merged at some point.
Summary
I've always wanted to do git bisect on something, but I had no idea what to start with. Bisecting is not specific to git, there's a Bazaar plugin as well and it is fairly easy to use this technique with any other VCS.
Minimizing the kernel build time is important. Originally I spent nearly 2 hours fiddling with the kernel configuration until I removed most of the things that were completely irrelevant to the issue being debugged. Having a powerful machine to build the software is a must in case the actual test needs to be performed on a slower machine.
Booting linux with PXE involves quite a few moving parts (DHCP, TFTP, boot image, kernel and initrd) that may take a while to configure, but all the time invested in this configuration will eventually pay off.