Proxmox VM Misconfiguration Downtime

I’ve got a VM running a critical service on my Proxmox host. This VM doesn’t hold any real data to speak of; it just runs a few automated tasks that we rely on, so its virtual disk is rather small. At some point while working on the VM (likely while restoring a backup after a previous problem), I must’ve restored it with the virtual disk placed on storage mounted from my NAS. So the VM had been running off my NAS for a while, until one day I had to bring the NAS down for an extended period and the service went down with it. I spent a bit of time trying to get it back up before realizing that the disk was located on the NAS and the NAS wouldn’t be up for a while. Once the NAS was back up, I moved the disk back to the virtual host where all the other VMs run from… oops.
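A check like the one below might have caught this earlier. It is only a minimal sketch, not something I ran at the time: it assumes the usual Proxmox VM config format, where a disk line looks like `scsi0: <storage>:<volume>,size=...` in `/etc/pve/qemu-server/<vmid>.conf`. The storage name `nas-nfs`, the VM name, and the function name `disk_storages` are made up for illustration.

```python
import re

# Disk keys as they appear in a Proxmox VM config, e.g. scsi0, virtio1, sata2, ide0.
# (Assumption: EFI/TPM disks and other key types are ignored in this sketch.)
DISK_KEY = re.compile(r"^(scsi|virtio|sata|ide)\d+$")


def disk_storages(config_text):
    """Map each disk key to the storage ID that its volume lives on."""
    result = {}
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # Snapshot sections come after the live config; stop at the first one.
            break
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or not DISK_KEY.match(key):
            continue
        value = value.strip()
        volume = value.split(",", 1)[0]
        # Skip CD-ROM drives, empty drives ("none"), and raw device paths.
        if "media=cdrom" in value or ":" not in volume or volume.startswith("/"):
            continue
        result[key] = volume.split(":", 1)[0]
    return result


# Hypothetical config text for a small VM whose disk ended up on NAS storage.
example = """\
cores: 2
memory: 2048
name: automation
scsi0: nas-nfs:100/vm-100-disk-0.qcow2,size=8G
ide2: none,media=cdrom
net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0
"""

print(disk_storages(example))  # {'scsi0': 'nas-nfs'}
```

Running something like this against each VM’s config and comparing the result to the storage I expect would flag any disk that quietly ended up somewhere it shouldn’t be.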

The Side Effects of RAM Issues

I have been fighting failing parity checks for a few months now on my Unraid server. I looked into each disk, checked the SMART stats, and even thought I had found the culprit hard drive. I still had it in my array, but with no data on it, just in case. This all happened just before another set of problems arose. The VMs on my server started acting up and crashing, and eventually, when I logged into one VM, everything crashed due to memory problems. I ran memtest and discovered that one of my RAM sticks was the problem, and from there determined that it simply wasn’t seated properly. After reseating the RAM, everything started working properly again. Parity checks come back clean, there are no more kernel panics, and the VMs are running stably. One partially unseated RAM stick caused all those issues.

Docker Struggles

I was originally excited when I heard Docker was going to be included in the next release of Unraid. The concept behind it was solid and sounded like it would make managing my server easier. That was the case for months, before Docker started acting up. Now I’ve been working on a way to remove any need for Docker on my NAS, moving it to a VM or another server because of its instability. Issues I’ve run into include it not being able to stop running containers, start stopped containers, or create new containers, and it preventing Linux from shutting down. I could live with all of the above except the shutdown bug. It doesn’t just prevent the shutdown command from running; it prevents the kernel from shutting down at all, and it does so well after the user shells are all offline, so there’s no way to manually kill Docker to let the system shut down safely. This is exceptionally frustrating and has caused unclean shutdowns when I’ve lost power and even when I’m just doing maintenance, since the only way to restart when Docker does this is a hard reset. I’m not giving up hope on containers; I’m just going to be a bit more careful around Docker. Its marketing seems a lot rosier than the issues people have actually had with the software.