I just wanted to get some feedback from the community on this subject. I have a customer I'm installing an Ignition system for, and it will have redundancy built in for the Gateway as well as the DB servers. The customer is asking us to implement some level of RAID, and I don't know enough to really give an opinion here. Having never implemented it before, even on larger and sometimes more critical operations, it seems like it may be an unnecessary level of redundancy. The OS is Linux (Ubuntu). The main concern is data corruption on power loss. I'd like to hear everyone's opinion on this matter.
I have strong opinions on this topic. (Quelle surprise!)
-
Raid (of any kind) will not save you from power loss data corruption, unless the disruption actually corrupts a sector's on-disk ECC. That's nearly impossible with modern storage, whether NVMe, SSD, or spinning rust. (On-board capacitors protect against incomplete writes.) Writes queued by the OS but not yet on all member devices are suspect, but modern journaling filesystems will catch and fix this on bootup. Similarly, modern databases use transaction logging to ensure consistency in their files.
-
That said, Ignition (and many daemon services) do non-transactional writes all the freaking time, so a short-term UPS connected to each server should be used to gracefully shut stuff down. (NUT, the Network UPS Tools package, is the Linux toolkit that will monitor your UPS for this purpose.) Do make sure the UPS is a continuous (double-conversion) inverter style, not a quick-switchover style. The latter can glitch your server, and will not protect from surges/spikes/brownouts.
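For anyone who hasn't used NUT before, the setup is roughly two config files plus a status check. This is a minimal sketch only; the UPS name `myups`, the `usbhid-ups` driver, and the password are assumptions you'd replace for your hardware:

```shell
# /etc/nut/ups.conf -- declare the UPS ("myups" and the driver are
# assumptions; usbhid-ups covers most USB-connected units):
#   [myups]
#   driver = usbhid-ups
#   port = auto
#
# /etc/nut/upsmon.conf -- shut the server down when the battery runs low
# ("primary" is spelled "master" on older NUT releases):
#   MONITOR myups@localhost 1 upsmon <password> primary
#   SHUTDOWNCMD "/sbin/shutdown -h +0"

# Then verify the monitor sees the UPS from the command line:
upsc myups@localhost ups.status
```

`upsc` reporting `OL` (on line) vs `OB` (on battery) is a quick way to confirm the chain is working before you trust it with a real outage.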
-
Meanwhile, raid does provide continued functionality when your storage devices start to fail. Spinning rust often starts to show climbing stats on replacement sectors when they are approaching disaster, but modern solid state technologies simply die. Abruptly.
I recommend at least mirroring your devices using Linux Software Raid (aka mdadm), and arranging for dual EFI system partitions. (So your broken mirrors won't prevent reboot from working.) I generally layer Linux's LVM on top of the primary raid mirror for management convenience.
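For the OP's benefit, the mirror-plus-LVM layering described above looks roughly like this. A sketch only; the device names (`/dev/sda2`, `/dev/sdb2`), array name, and sizes are assumptions to substitute for your own partitions:

```shell
# Create a two-device RAID1 mirror from two partitions
# (device names are assumptions -- use your own):
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2

# Layer LVM on top of the mirror for flexible volume management:
pvcreate /dev/md0
vgcreate vg0 /dev/md0
lvcreate -L 50G -n root vg0

# Record the array so it assembles at boot (Debian/Ubuntu paths):
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u
```

The dual EFI system partitions are handled separately (one plain FAT partition per disk, kept in sync), since EFI firmware doesn't understand md arrays.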
{ Full disclosure: I have a long history advocating for, and helping people with, Linux software raid. }
Also, if you are going to have your system operate as a hypervisor (which it can do very well), set up the raid and LVM at the hypervisor level, and pass LVs (logical volumes in LVM) to the guest systems as their dedicated storage. Maximum performance, and the guest OSs are protected without any extra configuration within.
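To make the hypervisor arrangement concrete, here is one way it might look with libvirt/KVM. This is an illustrative sketch, not Phil's exact setup; the guest name `ignition-gw`, the volume group `vg0`, and the size are all assumptions:

```shell
# Carve a dedicated logical volume out of the RAID-backed VG for a guest:
lvcreate -L 100G -n gw-disk vg0

# Hand the raw LV to the guest as a block device (libvirt example):
virsh attach-disk ignition-gw /dev/vg0/gw-disk vdb --persistent
```

The guest sees `/dev/vdb` as ordinary storage and needs no RAID awareness of its own, which is the point Phil makes above.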
Thank you for the information Phil!
+1 for Linux software RAID.
There’s no reason to use anything else anymore.
I've always just bought servers with hardware RAID. On Windows I've hated software RAID. I have set up ZFS on TrueNAS and like it quite a bit, but that's on a dedicated NAS rather than running it on the native OS.