In the 80s, shoulder pads were in, Cabbage Patch Kids were the country’s most popular toys, and Michael Jackson could be heard through the headphones of every Walkman in town. Computers, much less powerful than today’s, needed much more square footage to operate. Today, these are all considered vintage experiences, pieces of a decade that seem, while not ancient, certainly outdated. So much has changed, yet certain facets of this era seem surprisingly sticky.
One of those is the Redundant Array of Independent Disks (RAID), the classic way of protecting data from hard drive failure. RAID 5 and its cousin, RAID 6, have been the default way of protecting data (and system uptime) for cost-conscious organizations since the late 80s.
But if you’re still using RAID 5 or 6 today, it could be a disaster for your workplace. If you use or are considering RAID 5 or 6 with hard disk drives, stop right there. Here’s what you need to think about.
The influence of RAID
When they were first introduced, RAID 5 and 6 made sense, compensating for hard drive failures that were all too common at the time. By spreading data and parity information across a group of disks, RAID 5 could help you survive a single disk failure, while RAID 6 protected you from two failures.
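To make the parity idea concrete, here is a minimal sketch of how a single stripe survives the loss of one disk. It assumes plain XOR parity, which is the usual way RAID 5 parity is described; the block contents and the helper name xor_blocks are made up for illustration, and a real array also rotates parity across disks, which this sketch ignores.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks (hypothetical contents) plus one parity block.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing the disk that held the second data block.
survivors = [data_blocks[0], data_blocks[2], parity]

# XOR of everything that survived reproduces the missing block.
rebuilt = xor_blocks(survivors)
assert rebuilt == data_blocks[1]
```

Rebuilding a whole drive means repeating this for every stripe, reading all surviving disks end to end, which is why rebuild time grows with drive capacity.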
This worked well for a while. Drive speeds improved over time, and RAID adapters improved with them. But in the early 2000s, hard drives ran up against the laws of physics. Spinning a drive faster than 15,000 RPM was not feasible, but improvements in storage density didn’t stop. The industry continued to turn out larger capacity drives, and storage and system admins continued to use these denser drives in RAID configurations.
Unfortunately, the increase in storage density without a corresponding increase in overall performance meant rebuild times began to climb, because a rebuild has to reconstruct the entire contents of the failed drive. During the 80s, the average home computer only had a few megabytes of storage, and an enterprise computing system might have a few hundred megabytes, nothing close to the terabytes of information on hard drives today.
Which brings us to the problem: RAID just can’t keep up anymore.
The problem with RAID 5 and 6 today
In theory, we know that the limited speeds and high storage density of hard drives will slow rebuilds to a crawl. But what does it look like in practice? We tested RAID 5 rebuild times across a variety of storage devices, and the numbers are more staggering than you might realize:
With RAID 5, rebuilding a drive in a 2TB array (built from 500GB drives) can take as long as 134 hours, or more than five days, if you expect to keep using your system while the rebuild occurs. For a 40TB array (built from modern 10TB drives), that time turns into a whopping 4,200 hours, or nearly six months. RAID 6, because it maintains a second set of parity, can take even longer.
In either case, leaving your system unprotected for that long is a huge gamble. Unless your business is still using less than 2TB of hard drive storage (unlikely, in a modern organization), RAID simply is not built for you.
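The rebuild figures quoted above can be turned into an implied rebuild rate with a little arithmetic. This is a rough sketch, not a description of how the tests were run: it assumes the rebuild time is dominated by refilling one replacement drive, and that drive sizes are decimal (1GB = 1e9 bytes). The function name is hypothetical.

```python
def implied_rate_mb_per_s(drive_bytes, rebuild_hours):
    """Average rate at which one replacement drive is refilled, in MB/s."""
    return drive_bytes / (rebuild_hours * 3600) / 1e6

# Drive sizes and rebuild times quoted in the article.
# Decimal units are an assumption of this sketch.
cases = {
    "500GB drive, 134 h": (500e9, 134),
    "10TB drive, 4,200 h": (10e12, 4200),
}

rates = {}
for label, (size, hours) in cases.items():
    rates[label] = implied_rate_mb_per_s(size, hours)
    print(f"{label}: {hours / 24:.1f} days, about {rates[label]:.2f} MB/s")
# 500GB drive, 134 h: 5.6 days, about 1.04 MB/s
# 10TB drive, 4,200 h: 175.0 days, about 0.66 MB/s
```

Under these assumptions both cases work out to roughly 1 MB/s or less of rebuild progress, so a drive that is 20 times larger takes at least 20 times longer to rebuild.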
Modern solutions for modern hard drives
Still, system admins continue to implement RAID configurations because they’re familiar, even as hard drives grow larger and larger.
But if RAID isn’t the right choice for your organization, what options do you have? Luckily, as computers have advanced, so have the ways to protect data from hard drive failure. There are a few you may want to consider.
1. Application redundancy
Application redundancy allows the application that’s using the storage to manage redundancy. Typically, this means replicating transactions to a second server, and providing full clustering capabilities in addition to the standard data resiliency that RAID affords. These techniques are among the best for data protection. Examples include MySQL Master-Slave replication, Oracle Data Guard, and Microsoft SQL Server AlwaysOn Application cluster.
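As a toy illustration of replicating transactions to a second server, the sketch below has a primary node apply each write locally and then forward it to a replica before reporting success. Every class and method name here is hypothetical, and it assumes simple synchronous replication; it is not how MySQL, Data Guard, or SQL Server implement it.

```python
class Node:
    """A server holding a key-value store and the log of transactions applied to it."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.log = []

    def apply(self, txn):
        key, value = txn
        self.log.append(txn)
        self.data[key] = value


class Primary(Node):
    """Applies each transaction locally, then ships it to the replica."""

    def __init__(self, name, replica):
        super().__init__(name)
        self.replica = replica

    def commit(self, key, value):
        txn = (key, value)
        self.apply(txn)
        # Synchronous replication (an assumption of this sketch):
        # the commit only returns once the replica has applied the transaction.
        self.replica.apply(txn)


replica = Node("server-b")
primary = Primary("server-a", replica)
primary.commit("order:1", "pending")
primary.commit("order:1", "paid")

# If server-a's disks fail now, server-b already holds the same state.
assert replica.data == primary.data
```

The point of the design is that protection no longer depends on rebuilding a failed drive: the second server already has a complete copy.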
2. Erasure coding
Erasure coding breaks data into fragments encoded with rebuild information, and spreads the fragments across disks and/or systems. Typically, erasure coding allows for recovery from many different failure scenarios, and often employs a fail-in-place strategy for devices. It can have slow write performance, but remains the top contender for large data pools. It is typically used in object storage systems like Swift and Ceph, as well as the HGST ActiveScale Object Storage system.
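The fragment-and-rebuild idea can be sketched with a toy code in the spirit of Reed-Solomon. This is an illustration under stated assumptions, not the scheme used by Swift, Ceph, or ActiveScale: it works over the prime field of integers modulo 257, treats the k data values as points on a polynomial, and adds m extra points as parity. Any k surviving fragments are enough to rebuild the data. The names encode, decode, and _lagrange_eval are hypothetical, and the code needs Python 3.8 or later for the modular inverse.

```python
P = 257  # smallest prime above 255, so every byte value is a field element

def _lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse of den (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, m):
    """Return k data fragments plus m parity fragments as (index, value) pairs."""
    k = len(data)
    points = [(i + 1, byte) for i, byte in enumerate(data)]
    # Parity values can be 256, so they are kept as ints rather than bytes.
    parity = [(x, _lagrange_eval(points, x)) for x in range(k + 1, k + m + 1)]
    return points + parity

def decode(fragments, k):
    """Rebuild the k original byte values from any k surviving fragments."""
    if len(fragments) < k:
        raise ValueError("not enough fragments to rebuild the data")
    points = fragments[:k]
    return bytes(_lagrange_eval(points, x) for x in range(1, k + 1))

fragments = encode(b"RAID", m=2)           # 4 data + 2 parity fragments
survivors = [fragments[1], fragments[3], fragments[4], fragments[5]]  # two fragments lost
assert decode(survivors, k=4) == b"RAID"
```

With k = 4 and m = 2, the six fragments can sit on six different disks or nodes, and any two can fail without data loss, at a storage overhead of 6/4 = 1.5 times the original data.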
3. Software defined storage
This method is particularly popular, and typically uses general-purpose storage nodes built from common components to implement a SAN-style storage system. Usually, three or more storage nodes are used. It provides easy scaling and a deployment strategy familiar to SAN proponents. Examples include Swift and Ceph, SUSE Enterprise Storage, VMware vSAN, Microsoft Storage Spaces Direct, Nexenta, DataCore, and a score of others.
4. Solid state storage
Of course, the answer could come from changing your hard drive, rather than your data protection. We also ran tests for RAID 5 configurations using Flash SSDs (in blue below) and NVMe/PCIe devices (in green below). The NVMe/PCIe devices were measured with software RAID in Linux; they are not attached to the computer through a hardware RAID controller.
In these tests, the rebuild times drop to a fraction of capacity HDD rebuild times:
These devices also have no moving parts and offer built-in resiliency, giving a totally different performance standard from HDD. Where budget and capacity constraints allow, flash storage may be the best choice for many deployment scenarios.
Making the right data protection choice
Just as your childhood sweater from the 80s would no longer fit you today, hard drives have simply outgrown RAID configurations. There are more fitting options that work more efficiently and keep your data safer. The pull of tradition is difficult to overcome, but ultimately, you’ll be better off exploring some of the alternatives for your data resilience.