Redundancy vs RAID, or both? Opinions

I am doing research toward proposing an Ignition system, and I need to form an opinion on how best to ensure data reliability. I'm more on the electrician side of SCADA engineering, though, so I need opinions from more knowledgeable people. We are a water/wastewater system, so loss of certain data could mean EPA violations, public notices, etc.

The system I am looking at would consist of 4 edge gateways connected to a central gateway. This way the edge gateways can provide short-term data backup for the central gateway.

Now, for the minimal cost it would take, I can't see why I wouldn't set up full redundancy at every edge with a second computer.

Full redundancy at the central gateway is what I will recommend, but it does bring a good amount of cost with it.

Going further, past redundancy, what about RAID setups? One of our systems has a RAID setup installed by a contractor I highly respect as knowledgeable. I believe it is RAID 1, with the OS on one disk and the program on another, and those mirrored to a second set of disks. I'm not super knowledgeable about it, though.

Is RAID still a good idea even with redundant machines, or is that getting a little silly at that point? RAID doesn't help if the entire computer dies, but I suppose a disk going bad is the most likely failure? Could RAID be a reasonable cost-saving alternative to a redundant central gateway?

Everything will be on a double-conversion UPS, along with a good backup policy, which I also need to come up with.
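For the backup-policy side, a simple offsite rotation of gateway backup files can be sketched in shell. The directory, host, and retention window below are all placeholder assumptions, not a recommendation for any specific layout:

```shell
#!/bin/sh
# Sketch: copy nightly Ignition gateway backup files (.gwbk) to a second
# machine and prune local copies older than 30 days.
# BACKUP_DIR and REMOTE are hypothetical placeholders -- adjust to your site.
BACKUP_DIR=/var/backups/ignition
REMOTE=backupserver:/srv/scada-backups/

# Mirror the backup directory to the remote host
rsync -a "$BACKUP_DIR"/ "$REMOTE"

# Delete local backups older than 30 days
find "$BACKUP_DIR" -name '*.gwbk' -mtime +30 -delete
```

Run it from cron on whatever schedule matches your data-retention requirements.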

What are the thoughts of the forum?

RAID + Redundancy, IMO.


So RAID 1 makes a lot of sense to me: just mirroring drives. What about some of the other setups, like RAID 1+0? What are the pros and cons in terms of data protection?

At the end of the day, server redundancy is the real backstop. If you have a drive failure, you want the backup server to take over, and monitoring tools to tell you that the primary needs fixing.

RAID on the gateway is relatively pointless; there is nothing on there, as far as data storage goes, that you need to hang onto. If you have a gateway backup of the current running system and you're running a redundant server, you can rebuild the primary on literally any platform with enough resources in about 15 minutes. By all means build on a platform with RAID to save yourself the hassle of reinstalling the OS after a failed disk.
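For concreteness, a two-disk software mirror on Linux can be sketched with mdadm. The device names are placeholders, and this assumes a setup where software RAID is acceptable (hardware RAID controllers would be configured in firmware instead):

```shell
# Sketch: create a two-disk software RAID 1 mirror with mdadm.
# /dev/sdb1 and /dev/sdc1 are placeholder partitions -- adjust to your hardware.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1

# Watch the initial sync (and later, rebuild) progress
cat /proc/mdstat

# Record the array so it assembles automatically at boot
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
```

After a disk failure you'd mark the failed member, swap the disk, and add the replacement with `mdadm --add`; the mirror rebuilds while the system stays up.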

RAID only mitigates some risks from disk failure. Redundant servers mitigate that risk as well as any other fault on one of the servers.

The database is where you want to look at your setup carefully. This is where all your critical data resides, and RAID and clustering are essential for keeping that system running. It is also ALWAYS on a separate server from the gateway, unless you use the Core Historian, which is unsuitable for critical data storage due to its lack of features.


My go-to RAID configuration for servers is RAID 10 (1+0), but with solid-state drives the striping (0) portion may not even be necessary unless you want really fast storage, so you could do RAID 1 if using all-flash storage. In the past, RAID 10 helped with write speeds for databases or anything needing fast writes: essentially you're distributing your writes, and reads, which are already distributed in RAID 1, are distributed even further in RAID 10. If you're using Linux, I've become a huge fan of ZFS for storage, but it is more complex and requires a lot of RAM for caching, which isn't cheap lately.
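The ZFS equivalents of those two layouts can be sketched with `zpool`. The pool name and device names are placeholders; a real deployment would use stable `/dev/disk/by-id/` paths:

```shell
# Sketch: a mirrored pool (the ZFS analogue of RAID 1).
# "tank" and the /dev/sdX names are placeholders.
zpool create tank mirror /dev/sdb /dev/sdc

# A striped-mirror layout (analogue of RAID 10): two mirrored pairs,
# with writes striped across them.
zpool create tank mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde

# Check pool health, and scrub periodically to catch silent corruption
zpool status tank
zpool scrub tank
```

The periodic scrub is one of ZFS's advantages over plain RAID 1: it verifies checksums on every block rather than only noticing errors on read.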

RAID only protects against drive failures; redundancy protects against entire computer/server failures and OS/software crashes (unless an Ignition configuration issue is the cause of the crash, in which case your redundant backup will most likely crash too).

Then comes network redundancy. Many hardware platforms provide some form of it. Rings are popular in various flavors, but the best I've seen is PRP combined with rings (LAN A and LAN B each being a ring). This requires hardware that supports PRP, or RedBox devices to connect non-PRP equipment to the PRP network, but it's very robust against network failures.

What are your experiences with reporting of degraded solid-state drives? It was my understanding that spinning disks offer better insight into failing sectors, whereas solid state "just stops".

The most insight I've ever seen was a notification that an SSD was reporting less than 1% of its lifespan remaining, which I know is just a calculated number, and that was on my home server. Most of the systems I've built have been shipped off and just run until we get a call from the customer.
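For what it's worth, SMART data on SSDs does expose wear indicators, though how meaningful they are varies by vendor. A sketch of pulling and reading one, with an illustrative attribute line since the real command needs root and a physical drive:

```shell
# On a live system you would run:
#   smartctl -A /dev/sda    (requires smartmontools and root; /dev/sda is a placeholder)
# Below, parse an illustrative SMART attribute line for an SSD wear indicator.
# The values are made up for the example.
line='177 Wear_Leveling_Count 0x0013 099 099 000 Pre-fail Always - 12'

# Field 4 is the normalized value; on many drives 099 suggests ~99% life remaining
echo "$line" | awk '{print $4}'
```

Scraping this on a schedule and alerting below a threshold is about the best warning you'll get before an SSD "just stops", and the exact attribute name and scaling differ between manufacturers.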

I've got a customer that supplies their own servers and hardware. For their standard Ignition installation (literally standard Ignition with redundancy: one database server, two domain controllers, three client VMs, and an engineering VM), they run a large all-flash SAN over a 100 Gbit storage network, with no local storage on the host machines and a separate NAS for backups. Super fast, and I'm certain expensive, but I haven't heard of them having any problems with any of it.
