We have several old (14+ years) servers with Tyan S3950 motherboards running Debian unstable with various recent kernels, such as 6.1.0-3-amd64.

Despite being on the bleeding edge with the unstable distro, this has been a very reliable setup. However, after a recent update we had to reboot a couple of servers, and both came up with NO network and no console keyboard, i.e., completely unusable.

I have been able to mount the failed servers' hard drives on another server and compare dmesg logs from the failed boots and the last successful boots. The failed boots did not detect the floppy (I told you they were old) and did not load the kernel piix4_smbus driver. This is what a good boot's dmesg log contains:

...

Floppy drive(s): fd0 is 1.44M
piix4_smbus 0000:00:02.0: SMBus Host Controller at 0x580, revision 0

...

A failed boot did not have the above lines and subsequently failed to detect and enable the network interfaces, the keyboard, or any other USB devices.
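The comparison described above can be done by hand with a diff, but a small script makes the gap easier to see. The following is only a sketch: it assumes the two dmesg logs have been saved as plain text, and it treats the first word of each message (after any timestamp) as the "source" of that message, which is a rough heuristic rather than anything dmesg guarantees. The function names are made up for illustration.

```python
import re
import sys

# Matches an optional dmesg timestamp such as "[    1.234567] " at line start.
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def message_sources(dmesg_text):
    """Return the set of first tokens (up to the first space or colon) per line."""
    sources = set()
    for line in dmesg_text.splitlines():
        line = _TIMESTAMP.sub("", line).strip()
        if not line:
            continue
        # "piix4_smbus 0000:00:02.0: SMBus ..." -> "piix4_smbus"
        # "Floppy drive(s): fd0 is 1.44M"       -> "Floppy"
        sources.add(re.split(r"[ :]", line, maxsplit=1)[0])
    return sources


def missing_in_bad_boot(good_text, bad_text):
    """List message sources that appear in the good log but not in the bad one."""
    return sorted(message_sources(good_text) - message_sources(bad_text))


if __name__ == "__main__":
    # Usage (file names are placeholders): script.py good-dmesg.txt bad-dmesg.txt
    with open(sys.argv[1]) as good, open(sys.argv[2]) as bad:
        for name in missing_in_bad_boot(good.read(), bad.read()):
            print(name)
```

Run against the two logs, this should surface entries such as Floppy and piix4_smbus if they are present only in the good boot. It will also list anything else that differs, so the output still needs a human eye.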

FWIW, the 3ware RAID drivers are still loading fine in the bad boots. So some drivers are still loading.

This problem might stem from the ongoing Debian move to a merged-/usr system, but I have been unable to figure out how to determine that. The only hint I have is that the last dbus package update, which we applied just before these failures, contained a warning about assuming that a system is fully merged-/usr: "In the case of dbus, the symptom when this assumption is broken is particularly bad (various key system services will not start)". But since this warning referred to a package dependency change and not to any specific code change, I haven't reported this as a dbus bug yet.

At this point, I am not even sure what to ask. I expect the process that loads drivers during boot is not looking in the right place. But I don't know where to look for the drivers and even if I find them, what do I change to tell the system to look in the right place to find them?

If you can't tell, I am kinda lost right now. ;-)

1 Answer

After several more days of research and testing, I found that the problem has nothing to do with the Debian switch to a merged-/usr system.

A few months ago we switched from initramfs-tools to tiny-initramfs because of a package conflict. However, we never actually tested tiny-initramfs, and it was creating initramfs.img files that lacked the drivers missing from the failed boots. That could very well be a config problem on our part.
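For anyone wanting to check an image for the same gap, here is a rough sketch. It assumes you already have a listing of the files inside the initramfs, one path per line (for example from the lsinitramfs tool shipped with initramfs-tools, assuming it can read your image), and that you know which module files you expect. The module names in the usage note are my assumption: as far as I know the piix4_smbus driver lives in the i2c-piix4 module and the floppy driver in floppy, but verify that for your kernel.

```python
import os


def normalize(name):
    """Kernel module names treat '-' and '_' as interchangeable."""
    return name.replace("-", "_")


def modules_in_listing(listing_text):
    """Collect module names from a file listing with one path per line."""
    found = set()
    for path in listing_text.splitlines():
        base = os.path.basename(path.strip())
        # Accept foo.ko and compressed variants like foo.ko.xz or foo.ko.zst.
        head, sep, tail = base.partition(".ko")
        if sep and (tail == "" or tail.startswith(".")):
            found.add(normalize(head))
    return found


def missing_modules(listing_text, wanted):
    """Return the wanted modules (normalized) that the listing does not contain."""
    found = modules_in_listing(listing_text)
    return sorted(normalize(w) for w in wanted if normalize(w) not in found)
```

Calling missing_modules(listing, ["floppy", "i2c-piix4"]) on the listing of a suspect image returns the names that are absent. An empty list means both module files are at least present in the image, though that does not prove they get loaded.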

Anyway, we finally managed to fix this by booting the dead servers from a live flash drive, mounting the dead server's RAID into a chroot environment, switching the dead server back to initramfs-tools (the package conflict is gone now), and then rebuilding the initramfs.img files. A good description of the process is in the Debian wiki at https://wiki.debian.org/RescueLive
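Roughly, the steps above come down to a short sequence of commands. The sketch below only builds and prints that sequence; it does not run anything. The device name /dev/sdX1 and the /mnt mount point are placeholders, not our actual values, and the exact commands we typed may have differed, so treat the wiki page as the authority. Note also that apt-get needs working network and DNS inside the chroot.

```python
import shlex


def rescue_plan(root_device, mountpoint="/mnt"):
    """Build the command list for a chroot repair; nothing is executed here."""
    plan = [["mount", root_device, mountpoint]]
    # Bind-mount the pseudo filesystems so tools inside the chroot can work.
    for pseudo in ("dev", "proc", "sys"):
        plan.append(["mount", "--bind", f"/{pseudo}", f"{mountpoint}/{pseudo}"])
    # Switch back to initramfs-tools, then rebuild the images for all kernels.
    plan.append(["chroot", mountpoint, "apt-get", "install", "initramfs-tools"])
    plan.append(["chroot", mountpoint, "update-initramfs", "-u", "-k", "all"])
    return plan


def format_plan(plan):
    """Render the plan as shell-quoted lines for review before running by hand."""
    return "\n".join(shlex.join(argv) for argv in plan)


if __name__ == "__main__":
    # /dev/sdX1 is a placeholder; substitute the real root filesystem device.
    print(format_plan(rescue_plan("/dev/sdX1")))
```

Printing the plan first and running each line by hand as root from the live system keeps a typo in the device name from doing damage.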
