The side effects of RAM issues

I have been fighting failing parity checks for a few months now on my unRAID server. I looked into each disk, checked SMART stats, and even thought I had found the culprit hard drive. I kept it in the array but with no data on it, just in case. This all happened just before another set of problems arose: the VMs on my server started acting up and crashing, and eventually, while logging into one VM, everything crashed due to memory problems. I ran memtest and discovered that one of my RAM sticks was the problem, and from there determined that it simply wasn’t seated properly. After reseating the RAM, everything started working properly again. Parity checks come back clean, there are no more kernel panics, and the VMs are running stably. One partially unseated RAM stick caused all of those issues.

Docker Struggles

I was originally excited when Docker was going to be included in the next release of unRAID; the concept behind it was solid and sounded like it would make management of my server easier. That held true for months before Docker started acting up. Now I’ve been working on a way to remove any need for Docker on my NAS, moving it to a VM or another server because of its instability. Issues I’ve run into include Docker failing to stop running containers, failing to start stopped containers, failing to create new containers, and preventing Linux from shutting down. I could live with all of the above except the shutdown bug. It doesn’t just block the shutdown scripts; it prevents the kernel from shutting down at all, well after the user shells are offline, so there’s no way to manually kill Docker and let the system shut down safely. This is exceptionally frustrating and has caused unclean shutdowns when I’ve lost power and even when I’m just doing maintenance, since the only way to recover when Docker does this is a hard reset. I’m not giving up hope on containers, I’m just going to be a bit more careful around Docker; the project advertises itself quite well compared to the issues people have actually had with the software.

NAS Build

I started my original NAS build with inexpensive but decent consumer components, but by now it’s become a strange chimera of enterprise and consumer gear. The main goals: low power, quiet, and high storage density.

With those goals in mind, the main decision was the case: eight HDD bays were the minimum, and having a few 5.25″ bays allowed me to use a 5×3 cage to add more. From some research, another stack of HDD cages can also be added to the case with relative ease, bringing the total number of disks held to ~21.

Case: Fractal Design Define XL R2
CPU: Intel Xeon E3-1245 v2
Motherboard: ASRock Pro-4m
PSU: Antec BP550
RAM: G.Skill Ripjaws X (32GB total)
HBA: LSI 9201-16i
HDD: Various
NIC: Intel Pro/1000 VT, Chelsio dual port 10G SFP+
Extras: Norco 5 x 3.5″ HDD Cage

The HDD list is a bit eclectic; I have used whatever was on sale and cheap.

Brand Size Model
WD 2TB Green
WD 2TB Green
WD 2TB Green
Seagate 3TB Barracuda
WD 3TB Red
WD 3TB Green
Seagate 4TB Barracuda
HGST 6TB NAS
Toshiba 6TB x300
WD 500GB Black

The NAS is used to host a couple of virtual machines and Docker containers, and it runs unRAID as the operating system. The motherboard was picked for its low price and high expandability. The mATX board has 3 PCIe x16 slots, which will easily handle HBAs, 10Gb networking, etc.

The 5×3 cage added a good deal of storage density to the case. I replaced the original fan with a Noctua to reduce noise.

Here’s a shot of the 5×3 from the outside. It is not a hot-swap capable unit and has no backplane, which removes some complexity and potential failure points. I don’t need either feature, so this 5×3 cage cost very little. I did drill new mounting holes in the case to set it back about half an inch, which allows the fan to pull air in through the slots on the side of the case.

A bit dusty, but it keeps running. The HBA is mounted in a PCIe x16 slot.

The case is a giant Fractal Design Define XL R2. It is lined with sound insulation, making it a nice quiet obelisk.

Ansible Setup

Having been running short on time to maintain my servers, I decided to look into some automation on that front. I came across Ansible, which allows configuration and software installation on multiple servers to be managed using basic software that’s already installed: Python and SSH.

Setting up Ansible is the easy part: it just requires giving the Ansible host SSH key based access to all of the machines it will be managing. I set it up with root access to those machines so that it could do mass updates without prompting for dozens of passwords, and because I don’t have Kerberos or a domain based login system.

Prerequisites on the systems:

  • Software that must be installed before using Ansible on a system (install example below)
    • python (2.x)
    • python-apt (on Debian based systems)
    • aptitude (on Debian based systems)
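
On a Debian based system these can be pulled in with one apt command; a minimal sketch, assuming an older release where the python package still points at Python 2:
>sudo apt-get install python python-apt aptitude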

Installing Ansible on the host machine was as easy as yum install ansible, though for a more recent version, one can install it from the Git repository. This is covered in many other places, so it won’t be covered here.

Run the following commands on the machines that will be managed by Ansible:
>sudo su
>mkdir -p ~/.ssh
>touch ~/.ssh/authorized_keys
>chmod 700 ~/.ssh
>chmod 700 /home/administrator/.ssh

These commands create root’s .ssh directory if it doesn’t already exist and put an empty authorized_keys file in it. They then lock down the .ssh directories so that the SSH server will trust that the files within haven’t been tampered with. If the directory were left writable by group or all, sshd (with its default StrictModes setting) would refuse to use the keys and we couldn’t log in with them.

Run the following commands on the Ansible host:
>ssh-keygen -t rsa
>ssh-copy-id administrator@192.168.1.23

The first command generates a public/private key pair to use for SSH key based logins to the systems. The second copies the public key over to the machine we will be managing.

Run the following command as root on the machines that will be managed by Ansible:
>cp /home/administrator/.ssh/authorized_keys ~/.ssh/authorized_keys

This command copies the public key from the local administrator user to the root user, allowing Ansible to log in as root and manage the machine.
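
If sshd is picky about file modes, the copied file can also be locked down; an optional extra step, but a harmless one:
>chmod 600 ~/.ssh/authorized_keys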

Once SSH key based logins are enabled on all the machines, they will need to be added to the Ansible hosts file (/etc/ansible/hosts). This file tells Ansible the IP addresses of all the machines it should be managing. It is a basic text file and can easily be modified with nano. Add a group (a header of the form “[Group-Name]”) to the file with your hosts underneath it; an example is shown below.
>[rapternet]
>shodan ansible_ssh_host=192.168.2.1
>pihole ansible_ssh_host=192.168.2.2
>webserv ansible_ssh_host=192.168.2.3
>matrapter ansible_ssh_host=192.168.2.4
>unifi ansible_ssh_host=192.168.2.5

This adds the rapternet group with shodan, pihole, webserv, matrapter, and unifi servers in it.

One easy way to test the Ansible setup is to ping all of the machines:
>ansible -m ping all

This tests the basic setup of each machine with Ansible (Python 2.x installed, SSH connection working).
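
On a working setup the responses look roughly like the following (hostnames from the inventory above; the exact formatting differs between Ansible versions):
>shodan | SUCCESS => {
>    "changed": false,
>    "ping": "pong"
>}
>pihole | SUCCESS => {
>    "changed": false,
>    "ping": "pong"
>}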

As can be seen, every server reports that nothing has changed (which is expected for a simple ping) and that a pong response was sent back to the host. This is a successful test of the Ansible setup.
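
With the setup verified, the mass updates that motivated the root access can be run as an ad-hoc command; a rough sketch using Ansible’s apt module against the Debian based hosts in the rapternet group defined above:
>ansible rapternet -u root -m apt -a "update_cache=yes upgrade=dist"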
