At our new, quarterly “Tech Talk” event, co-sponsored by APA-NY and hosted at the NYC Hasselblad offices, one of the topics we covered in an in-depth presentation was RAID and backup systems. During the discussion we highlighted the very critical point that RAID is NOT Backup. This caused some degree of confusion, so we thought we’d use this blog post to dig a little deeper and provide some more clarity.
Preparing for the Worst
As photographers in the digital age, most of us are storing our photos on some sort of hard drive system. And, as we all know, hard drives can fail. They can develop mechanical failures. Or, something can go wrong with the electronics that causes data to be lost or corrupted. Even worse, the drives themselves could be lost, stolen or destroyed.
To account for the possibility of data loss, we develop backup systems and workflow routines. In computer terms, a backup is an exact copy of digital data stored in a different location. This backed-up data can be retrieved if the working copy is damaged or lost, which usually means corrupted data or a hard drive failure.
There are many ways to duplicate files and store them securely. We’ve posted many times about the 3-2-1 strategy, as well as about using Backblaze as a cloud-based solution for keeping offsite copies of your images.
So here is where we draw the line between a RAID system and keeping a set of backup files.
What is a RAID?
A RAID (Redundant Array of Independent Disks) is a bunch of hard drives cobbled together to form a single system. The features and capabilities of any given RAID system depend on the manufacturer and software involved. Suffice it to say, a RAID creates a system that is greater than the sum of its parts.
By way of example, let’s assume that we have so many photographs that we have to store everything on five different, individual hard drives. In this example we have no redundancy or fault tolerance. That is to say, if anything happens to any one of those drives, the data on that individual drive would be lost. We would also have to contend with having a data cable and/or power cord connected to each individual drive. This could lead to a problem of its own: we might not be able to access all of the data at the same time, since we cannot connect all the drives to one computer at once.
This is where a RAID comes in. In this case, a RAID 5* cobbles together all five of our example drives into a single system contained within a specialized enclosure. In other words, they work as if they were one hard drive. It has one data connection and one power cord. The RAID system then sprinkles little bits of data across all five drives. That is, every photo is divided into smaller chunks and distributed across all of the drives. The power of the RAID lies in its massive capacity and in its ability to keep working if (when) one of the five individual drives fails.
Because data is distributed across five drives, only part of the data would be lost if one of the drives developed a fault. Using something called parity (think of parity like a mathematical map of any given file), the system is able to rebuild the missing part from what remains on the other four drives, parity information included.
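For the curious, here is a minimal sketch of how parity can rebuild a missing piece. It assumes the simplest possible scheme, where the parity chunk is the XOR of four data chunks. The chunk contents and the `xor_blocks` helper are made up for illustration, and real RAID 5 systems work on much larger blocks and rotate the parity from drive to drive, so treat this as a toy model rather than how any particular enclosure does it.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four equal-sized data chunks, one per drive (hypothetical contents).
data_chunks = [b"PHOT", b"O_01", b".RAW", b"DATA"]

# The fifth drive holds the parity chunk for this stripe.
parity = xor_blocks(data_chunks)

# Simulate losing the drive that held chunk number 2.
lost_index = 2
survivors = [chunk for i, chunk in enumerate(data_chunks) if i != lost_index]

# XOR-ing the three surviving data chunks with the parity chunk
# gives back the chunk that was on the failed drive.
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt == data_chunks[lost_index])
```

Notice that the rebuild only works because four of the five pieces are still there. Nothing in this scheme produces a second, independent copy of the photo.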
What is a RAID Good For?
A question you might ask, then, is: why use a RAID at all? Why can’t I just use a bunch of hard drives and keep them all backed up? The simple answer is, you could do that but it would be an awful lot of work!
A RAID provides a large storage capacity that is not available from a single hard drive. It also provides redundancy, that is, the ability to recover when any one drive fails. One additional consideration I would add is that RAID systems are best used when your total storage needs exceed roughly 20TB of data.
Hard disk drives today are large and inexpensive enough that it is easy to purchase and manage a single large-capacity drive (8-12TB). However, the advantage of a RAID comes in when you need two to three (or more) times as much storage capacity.
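To put rough numbers on this, here is a small sketch of the usual RAID 5 capacity rule of thumb: one drive’s worth of space goes to parity, so the usable space is the number of drives minus one, times the drive size. The function name `raid5_usable_tb` is made up for this example, the five 8TB drives are just an assumption borrowed from the drive sizes mentioned above, and real-world usable figures come out somewhat lower once the system is formatted.

```python
def raid5_usable_tb(drive_count, drive_size_tb):
    """Approximate usable capacity of a RAID 5 built from equal-sized drives."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    # One drive's worth of space is taken up by parity information.
    return (drive_count - 1) * drive_size_tb


# Five 8TB drives (an assumed configuration, not a recommendation).
print(raid5_usable_tb(5, 8))  # 32
```

Under these assumptions, five 8TB drives give about 32TB of usable space, which sits above the roughly 20TB threshold mentioned earlier.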
Why RAID is Not Backup
Let’s take the above example of five hard drives in a RAID. We have redundancy (the ability to recover from failure) but we do not have backup (a separate, wholly independent copy of the data). The RAID system does not make an additional copy of the data to another device. The RAID simply distributes little bits of data across all of the drives in the RAID system to account for the possibility of part of the system failing. Therefore, we can see that RAID is indeed NOT backup.
How Do You Back Up a RAID?
There are two basic strategies for backing up a RAID system. The simple option is to use a cloud-based backup system like Backblaze. There are many volume storage services in the market, such as Amazon, iDrive, and Crashplan. We’re biased towards Backblaze for a number of reasons. However, the key issues are pricing and having a fast, stable internet connection. Using a cloud-based service does double duty by also keeping a second copy of your data “off site” in case of catastrophe.
The other option would be to copy all the data from the RAID to another RAID of equal or greater size, or to copy all of the data to a number of non-RAID hard drives. The former would be much simpler to manage but would have cost implications, as RAID systems can get expensive very quickly. The latter option would entail a lot more work, both in terms of time and hands-on management.
Purchasing and setting up a RAID system is something that needs to be carefully considered. Given the cost and technical details, it may not be the best solution for most people without some guidance. RAIDs are good for storing and accessing large volumes of data but are not backup systems themselves.
If you have any questions about this or would like help in choosing or setting up a RAID, please do contact us as this is part of our normal business offerings.
Just a quick note here to say that there are many deeper technical issues regarding RAID that I did not address in this post! RAID can be a complex subject and there are many ways to achieve solutions for creating redundancy and backup. The purpose of this post was to write a non-technical explanation of why RAID isn’t a backup solution. So, I intentionally left out a number of issues so as to stay focused on this one point.