The (relatively new) server that Fedia.io was running on, a Hetzner AX 162-R, died overnight. Hetzner tells me that the main board failed and had to be replaced. During the repair, the RAID set was corrupted, and the server would no longer boot.
Every single AX 162 (R or M) I’ve rented from Hetzner has now failed at least once. This was the last one I had. Moving fedia.io to a Dell server with the same specs was on my to-do list. I knew this was going to happen, but I didn’t get it done in time.
For those of you who have been following along, Fedia has been cursed from the beginning. The kbin software was a goddamned disaster. Fortunately, the mbin team spent an incredible amount of time and patience helping me sort out its many problems, nearly all of which are now fixed.
Apart from the occasional federation breakage caused by an as-yet-unknown bug, the main stability issue has been hardware. I have had excellent luck with Hetzner’s Dell servers, so I am hopeful that this is now fixed as well. The challenge is that the Dell server is quite expensive ($350 per month), so I will be looking for a more cost-effective way to host fedia.io, given its very small number of active users.
Do those specific servers have any specific hardware that would indicate why they fail so much? Do they cheap out on something?
Hetzner builds its own servers in a form factor that suits its rack space. There are lots of YouTube videos on that. I don’t know specifically why these are failing, but as an old-time engineer who knows the CPU they run, I would bet it has something to do with cooling. Either the main board is getting too hot (I monitor CPU temperature, and that seemed fine), or the cause is vibration from the larger CPU fan the chip certainly needs, or it is something about how the CPU loads the power supply, possibly creating electrical noise that eventually burns out capacitors or similar components.