Recently, someone asked me whether our system requires hot spares, and after a moment of shock and confusion I couldn’t resist answering with a Star Wars quote: “Now there’s a term I’ve not heard in a long time.”
Turns out that there are newer storage systems out there that still rely on the dated and inefficient concept of RAID (or EC or replication) groups and hot spares. It’s very surprising because RAID with hot spares was invented in the 1980s (A Case for Redundant Arrays of Inexpensive Disks, Patterson et al., 1988) and the issues caused by both have been well understood since the late 1990s.
So, what’s the problem?
A RAID group is a set of drives that holds the data plus parity for the files stored on it. In the case of a replication group you often have a set of three drives that each hold a copy of the same data. This is all hunky-dory until one of the drives breaks.
When a drive goes bad you want to restore the lost data or redundancy as soon as possible to avoid data loss in case a second drive goes bad. To restore the data you have to read the remaining data from the other drives, recompute what is lost, and write it somewhere else (see hot spares in the next paragraph). This rebuild IO competes with user IO for as long as the rebuild runs, degrading performance for every user and application with data on the group.
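The “recompute what is lost” step is easiest to see with simple XOR parity, as used by RAID-5. Here is a minimal sketch with a hypothetical four-drive group (three data blocks plus one parity block); the drive contents are made-up placeholder bytes:

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]  # blocks on drives 0-2
parity = xor_blocks(*data)          # parity block on drive 3

# Drive 1 fails: reconstruct its block by reading and
# XORing all surviving blocks (data and parity).
rebuilt = xor_blocks(data[0], data[2], parity)
assert rebuilt == data[1]
```

Note that the rebuild had to read every surviving drive in the group, which is exactly where the performance impact comes from.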
Modern storage systems distribute data on a much smaller scale, ideally per file or object. If a drive breaks in such a system, the files/objects that lost redundancy or replicas are spread across the entire cluster. A rebuild now requires only a small amount of IO from each drive in the cluster, and the impact on users and applications is negligible. Even better, the larger the cluster, the better it absorbs drive failures.
Enter the Hot Spare
When the rebuild in the RAID (or EC, replication) group happens, the data from the lost drive needs to be stored somewhere. You can’t store multiple stripes or redundancy blocks on the same drive (failure domain), as that would lead to data loss when the drive dies. This is where the hot spare comes in. You add one or more drives to the RAID group that just sit there until one of the other drives breaks (that’s why it’s called a spare). It’s hot because it is instantly available, unlike a spare that a human would have to swap in.
The hot spare is a significant waste in two ways: its performance is not available to the storage users, and its capacity just sits there and can’t be used. The waste of space can be significant. If you use erasure coding with 8+3 and two hot spares, the space overhead goes from (8+3)/8=1.375 to (8+3+2)/8=1.625.
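The overhead arithmetic generalizes to any EC layout. A one-liner makes the raw-to-usable ratio easy to check for your own configuration:

```python
def space_overhead(data: int, parity: int, spares: int = 0) -> float:
    """Raw capacity divided by usable capacity for an erasure-coded
    group of 'data' + 'parity' drives plus idle hot spares."""
    return (data + parity + spares) / data

assert space_overhead(8, 3) == 1.375            # 8+3 EC alone
assert space_overhead(8, 3, spares=2) == 1.625  # 8+3 EC plus two hot spares
```

In this example the two idle spares add 25 percentage points of overhead on top of what the parity already costs.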
In contrast, a system that distributes on a file/object level across the cluster can use the performance of all drives and doesn’t waste the space by reserving hot spare capacity per RAID group.
So, the hot spare is a symptom of dated tech that limits the scalability of your storage, causes significant performance degradation during rebuilds, and costs you more money due to wasted resources.