Xsan

The Anatomy of a RAID

Originally posted at http://www.318.com/TechJournal

The acronym RAID can be misleading, as it has had more than one meaning over the years. RAID originally stood for redundant array of inexpensive disks. It is now also expanded as redundant array of independent disks, since not all RAID disks are inexpensive. RAID refers to a storage mechanism that uses multiple hard drives to share or replicate data among the drives. In some cases this means that data written to a single logical drive is stored on multiple physical drives so that the data is redundant; in others, RAID is used to maximize throughput by aggregating the speeds of the member drives. A key advantage of RAID is the ability to combine drives into an array with more capacity, reliability, speed, or a combination of these, than was affordably available in a single device.

Through the remainder of this article we will look at different types of RAID and what each can do. But before we look at RAIDs, let’s look at a JBOD. In a JBOD (Just a Bunch of Disks), which is often called a concatenated RAID in OS X, you can merge multiple drives into one volume. You can take 2 drives of 2 Terabytes each and 4 drives of 1 Terabyte each and merge them into one volume of 8 Terabytes. In this scenario you end up with no fault tolerance, but you are able to make use of low cost drives, such as LaCies, to create a single volume. However, if any of the drives in a JBOD fails, the full volume will fail. For that reason this type of setup is used primarily for volatile data, such as a disk-to-disk backup solution.
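To make the concatenation idea concrete, here is a minimal Python sketch of how a logical block could be located in a concatenated set. The function names `jbod_capacity` and `jbod_locate` are made up for this illustration, and sizes are counted in abstract blocks rather than real disk sectors.

```python
def jbod_capacity(drive_blocks):
    """Total size of a concatenated volume: the member sizes simply add up."""
    return sum(drive_blocks)


def jbod_locate(block, drive_blocks):
    """Map a logical block number to (drive index, block on that drive).

    Drives are filled one after another, so we walk the members in order
    and subtract each drive's size until the block falls inside one.
    """
    if block < 0:
        raise ValueError("block number must not be negative")
    for index, size in enumerate(drive_blocks):
        if block < size:
            return index, block
        block -= size
    raise ValueError("block is beyond the end of the volume")


# Two 2 TB drives and four 1 TB drives, counted here in whole terabytes.
drives = [2, 2, 1, 1, 1, 1]
print(jbod_capacity(drives))   # 8
print(jbod_locate(5, drives))  # (3, 0)
```

Because each block lives on exactly one drive and nothing is duplicated, losing any one member leaves a hole in the volume.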

A RAID0 is similar to a JBOD; however, RAID0 requires all drives in the array to be identical in size. Provided the drives are the same size, RAID0 offers the fastest speeds available in a RAID, because data is striped across all of the member drives. RAID0 volumes are often used for high definition video editing and for housing databases that require a lot of speed. RAID0 does not offer any redundancy of data. If one member of the array fails, then, just as with a JBOD, the volume will fail as well. However, RAID0 is a fast and inexpensive way to get large amounts of storage.
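The speed of RAID0 comes from spreading consecutive chunks of data across the members in round-robin order so that several drives work at once. A rough Python sketch of that mapping follows. The name `raid0_locate` is hypothetical, and the sketch assumes each logical block is exactly one stripe unit, whereas real arrays use a configurable stripe size.

```python
def raid0_locate(block, num_drives):
    """Map a logical block to (drive index, block on that drive).

    Consecutive blocks rotate through the drives, so a long sequential
    read or write touches every member of the array.
    """
    if num_drives < 1:
        raise ValueError("an array needs at least one drive")
    if block < 0:
        raise ValueError("block number must not be negative")
    return block % num_drives, block // num_drives


# Six consecutive blocks on a three-drive stripe set.
print([raid0_locate(b, 3) for b in range(6)])
# [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```

Every drive holds a slice of every large file, which is why the loss of a single member takes the whole volume with it.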

RAID1 is often known as a mirror. In RAID1, all data written to one disk is duplicated onto a second disk of identical size. In a mirror, if one member of the set fails, the volume remains accessible for read/write operations. RAID1 offers among the best protection against data loss available in RAID scenarios, but at the highest cost. For every byte of data stored on a RAID1 volume there must be an equal byte used for redundancy. As high end disks have become more and more expensive, the development of more complex RAID strategies helps to maximize our ability to make use of a variety of solutions.
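The behavior of a mirror can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class `Mirror` and its methods are invented for this example, the "disks" are in-memory lists, and rebuilding a replaced member is left out.

```python
class Mirror:
    """Toy two-way mirror: every write goes to both members."""

    def __init__(self, num_blocks):
        self.members = [[None] * num_blocks, [None] * num_blocks]
        self.failed = [False, False]

    def fail(self, index):
        """Simulate the loss of one member."""
        self.failed[index] = True

    def write(self, block, data):
        # Write to every member that is still healthy.
        for index, member in enumerate(self.members):
            if not self.failed[index]:
                member[block] = data

    def read(self, block):
        # Any healthy member can satisfy the read.
        for index, member in enumerate(self.members):
            if not self.failed[index]:
                return member[block]
        raise RuntimeError("both members have failed")


volume = Mirror(num_blocks=4)
volume.write(0, b"payroll")
volume.fail(0)          # first disk dies
print(volume.read(0))   # b'payroll'
```

The volume keeps working after one failure, but only half of the raw space ever holds unique data.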

In RAID3, one member of each array serves as a dedicated parity drive. This drive stores parity information calculated from each stripe on the other drives, and if one of those drives crashes, its contents can be rebuilt from the parity and the surviving drives. The parity also causes a slight loss of speed compared with a RAID0 volume of the same size. RAID0 in the truest sense of the word (no parity) would net you 100% of the raw space as usable space.
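Parity of this kind is conventionally computed with XOR, which is what makes it possible to recover a lost drive from the survivors. The Python sketch below shows the idea on a single stripe. The helper `xor_blocks` and the sample data are made up for illustration, and a real controller does this in hardware or in the driver on much larger blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three data drives, plus the parity drive.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Drive 1 crashes: XOR the surviving data with the parity to get it back.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```

The same trick works no matter which single drive is lost, including the parity drive itself, which can simply be recomputed from the data drives.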

In RAID5, parity information is striped across all of the drives, not just one, whereas in RAID3 it is stored on a dedicated parity drive. But even in RAID3 you shouldn’t be able to make the smallest drive the hot spare. In fact, you can typically only build a concatenated set out of drives of different sizes (which isn’t much of a RAID but more of a JBOD) unless you manually slim all drives down to the smallest drive size. A RAID5 + Hot Spare array of four 40GB drives ends up being a RAID5 volume of three 40GB drives once you pull one out as the hot spare, which gives you an 80GB volume. This nets a 33% loss of space across the three members. If all 4 drives were in the RAID you would get a RAID5 volume of 120GB, netting a 25% loss. If 5 drives at 40GB were in the RAID you would end up with a 160GB volume, resulting in only a 20% loss. And so on. Because the parity information is spread over all drives, any single drive can go down and its contents can be rebuilt from the data and parity stored on the remaining drives.
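The capacity figures here all follow from one rule: a RAID5 set gives up one drive's worth of space to parity, however many members it has. A small Python sketch of that arithmetic follows; the function names are hypothetical and the sketch assumes all members are the same size.

```python
def raid5_usable_gb(num_drives, drive_gb):
    """Usable space of a RAID5 set: one drive's worth goes to parity."""
    if num_drives < 3:
        raise ValueError("RAID5 needs at least three drives")
    return (num_drives - 1) * drive_gb


def raid5_loss_percent(num_drives):
    """Share of the members' raw space that is consumed by parity."""
    if num_drives < 3:
        raise ValueError("RAID5 needs at least three drives")
    return 100 / num_drives


for members in (3, 4, 5):
    print(members, raid5_usable_gb(members, 40), round(raid5_loss_percent(members)))
# 3 80 33
# 4 120 25
# 5 160 20
```

A hot spare is not counted as a member here, since it holds no data until it replaces a failed drive.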

RAID6 offers even more redundancy by writing two sets of parity information across the members of the array. This allows two drives in the RAID to crash without losing data. RAID6 costs more than most other RAID levels, both in RAID hardware and in hard drives, and so is used much more rarely.
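The extra cost in hard drives can be put in numbers by comparing a single-parity and a double-parity set built from the same disks. The Python sketch below does that; `usable_gb` and its `parity_drives` parameter are made up for this illustration, and the six 40GB drives are just an example.

```python
def usable_gb(num_drives, drive_gb, parity_drives):
    """Usable space when `parity_drives` drives' worth of space holds parity.

    parity_drives=1 models a single-parity set, parity_drives=2 models RAID6.
    """
    if parity_drives < 0 or num_drives <= parity_drives:
        raise ValueError("need more drives than parity overhead")
    return (num_drives - parity_drives) * drive_gb


# Six 40GB drives, first with one drive's worth of parity, then with two.
print(usable_gb(6, 40, parity_drives=1))  # 200
print(usable_gb(6, 40, parity_drives=2))  # 160
```

The second drive's worth of parity is the price paid for surviving two simultaneous failures.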