What is RAID?
RAID Overview
It usually happens at the worst possible moment. You have some critical data
you need for an application and boom - the hard disk crashes taking your
important data to oblivion with it. If you have a tape backup then at least you
can get to your data. But there is still the inconvenience and time wasted
rebuilding the data on a replacement disk. A common way of preparing for and
mitigating such eventualities as disks failing is to use a concept known as RAID
- Redundant Array of Independent Disks.
RAID technology allows you to have data written to multiple sets of disks at
the same time, thereby reducing the risk of an individual disk failure
destroying data. That is a rather simplistic definition of what RAID does. In
reality, over the years since RAID was introduced, the technology has gone
through multiple protocol revisions, enhancements and extensions to the point
that it can mean different things to different people depending on application
contexts.
RAID technology is divided into several protocols or levels
which, alone or combined, serve to increase data integrity, throughput,
availability and capacity. A RAID level basically specifies how disk sets are
arranged and the pattern in which data is written, read and verified for
integrity (for the RAID levels that support data integrity). RAID is not a
function of the disks themselves - rather it is implemented at the disk
controller level (hardware RAID) or in the operating system (software RAID).
Hardware RAID controllers are intelligent disk controllers, usually with a
dedicated microprocessor performing the complex RAID algorithms. Software RAID
on the other hand depends on the host system microprocessor to perform RAID
calculations and would therefore reduce the raw processing power available to
run applications. The benefit of hardware RAID over software RAID is that it
does not impinge on the host system microprocessor - leaving it to perform
regular computational tasks instead. Another advantage conferred by hardware
RAID circuitry is the ability to hot-swap or hot-plug failed disks from a RAID
array. For RAID levels that can survive one or more disk failures, being able to
replace the failed disk(s) while the system is still running is invaluable for
mission critical applications.
The following are the most commonly used RAID levels. Others exist but are
not in widespread use and are mostly dedicated to very specific application
scenarios.
RAID Levels
RAID 0
RAID 0, also known as Striping, is not redundant at
all and is not RAID in a pedantic sense. What RAID 0 does is to increase
performance significantly by breaking up data blocks into smaller equally sized
chunks which are then distributed across two or more physical disks. The
performance enhancement is brought about by the fact that data is being written
to or read from all disks in the array or RAID set nearly simultaneously as
opposed to sequentially from a single disk. In general, the higher the number of
disks in the RAID 0 array, the better the performance. Another advantage of RAID
0 is that the total capacity of all disks in the RAID set is available to use
for data storage. The performance and capacity advantages do come at a price
however. In fact, following on from our initial example in the introduction
above, a single disk failure in a RAID 0 array will render all data in the array
irretrievable - with the only recourse being restoration from backups. And as
there are now more disks being used simultaneously, the chances of a failure
occurring increase.
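The striping described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the function names `stripe` and `read_back`, the chunk size and the use of in-memory lists as "disks" are all hypothetical, and a real controller works on disk blocks rather than Python byte strings.

```python
def stripe(data: bytes, num_disks: int, chunk_size: int) -> list[list[bytes]]:
    """Split data into fixed-size chunks and deal them round-robin to the disks."""
    if num_disks < 2:
        raise ValueError("RAID 0 needs at least two disks")
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        disks[chunk_index % num_disks].append(data[offset:offset + chunk_size])
    return disks


def read_back(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by visiting the disks in the same round-robin order."""
    out = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                out.append(disk[row])
    return b"".join(out)


disks = stripe(b"ABCDEFGHIJ", num_disks=3, chunk_size=2)
# disks[0] holds AB and GH, disks[1] holds CD and IJ, disks[2] holds EF
original = read_back(disks)
```

Because every disk holds only part of each file, losing any one of the lists leaves gaps that the others cannot fill, which is why a single failure destroys the whole array.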
RAID 0 is excellent for applications that require the highest possible I/O
performance, provided that data which changes less often can be committed to
longer term storage at regular intervals. Applications such as image editing,
pre-press and digital rendering can benefit greatly from RAID 0 I/O performance
and capacity characteristics.
RAID 1
RAID 1 or Mirroring is the opposite of RAID 0.
Rather than chunking data bits and spreading them across two or more physical
disks, mirroring writes identical data bits to two or more physical disks so
that in the event of a disk failure, at least one disk in the RAID set still has
a complete copy of the data. RAID 1 confers true redundancy and is generally
achieved and implemented as a mirror of two disks. Mirrors can however be
created with disks numbering multiples of two. The main disadvantages of RAID 1
are cost and capacity. Cost doubles and capacity is halved in comparison to
a single disk non-RAID configuration. Performance on writes can also slightly
degrade as the data needs to be written at least twice for the write operation
to be considered complete.
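As a rough sketch of the mirroring behaviour just described, the toy class below keeps an identical copy of every block on each member disk and reads from whichever member is still working. The class name `Mirror`, its methods and the dictionary-per-disk model are assumptions made for illustration, not a description of any real controller.

```python
class Mirror:
    """Toy RAID 1 set: each disk is a dict mapping block number to bytes."""

    def __init__(self, num_disks: int = 2) -> None:
        if num_disks < 2:
            raise ValueError("a mirror needs at least two disks")
        self.disks: list[dict[int, bytes]] = [{} for _ in range(num_disks)]
        self.failed: set[int] = set()

    def write(self, block: int, data: bytes) -> None:
        # The write is complete only once every working disk has its copy.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block] = data

    def fail(self, index: int) -> None:
        """Simulate a disk failure by discarding that disk's contents."""
        self.failed.add(index)
        self.disks[index].clear()

    def read(self, block: int) -> bytes:
        # Any surviving member can satisfy the read.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                return disk[block]
        raise OSError("all mirror members have failed")


mirror = Mirror()
mirror.write(0, b"ledger")
mirror.fail(0)
survivor_copy = mirror.read(0)  # still readable from the second disk
```

The loop in `write` is also where the write penalty mentioned above comes from: the same data has to reach every member before the operation counts as finished.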
RAID 1 is excellent for applications where data integrity is absolutely
critical and the inconvenience of restoring from backups is to be avoided at all
cost. Accounting and financial applications are two typical application
scenarios where RAID 1 would be ideal.
RAID 0+1
This RAID level, as the name suggests, combines the attributes of RAID 0 and
RAID 1 to gain benefits of both levels; performance and redundancy. RAID 0+1
requires a minimum of four disks to implement and is a mirrored stripe set. That
is to say, a RAID 1 array is layered over two RAID 0 arrays. While getting the
performance benefits of RAID 0, RAID 0+1 increases reliability as well by
keeping a mirror of the striped data. Naturally, as multiple copies of the
data are kept, the cost of the solution is double that of a RAID 0 array. A major
disadvantage of this RAID level is that a single drive failure will cause the
array to become a RAID 0 array.
RAID 3
RAID 3 uses byte level striping with parity information stored on a dedicated
disk. RAID 3 has very high read and write data transfer rates and single disk
failures do not impact throughput significantly. RAID 3 stripes data blocks and
stores the striped information in the exact same location on the individual
disks that make up the array - so parallel I/O is not possible as data requests
require seeks on all disks simultaneously to the same position.
RAID 3 is excellent for media applications such as image editing, digital
pre-press and live streaming. The total capacity of a RAID 3 array is
sum(N-1), that is, the combined capacity of all but one of the N disks in the
set, and a minimum of three disks is required to implement it.
RAID 4
The RAID 4 algorithm is similar to RAID 3 except that striping is done at the
block rather than byte level. This has the advantage that a block request can be
serviced by a single disk if the controller supports that functionality. With
single disk block request serving, multiple block requests may be serviced
in parallel so long as the individual blocks reside on separate disks.
The total capacity of a RAID 4 array is sum(N-1), where N is the number of
disks in the set, and a minimum of three disks is required to implement it.
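To make the point about parallel block requests concrete, here is a minimal sketch of how logical blocks might map onto the data disks of a RAID 4 set. The layout (data on disks 0 to N-2, parity on disk N-1) and the function names `locate_block` and `can_read_in_parallel` are assumptions chosen for illustration; actual controllers may lay blocks out differently.

```python
def locate_block(block: int, num_disks: int) -> tuple[int, int]:
    """Return (data_disk, stripe) for a logical block in an N-disk RAID 4 set.

    Assumption: disks 0..N-2 hold data and disk N-1 is the dedicated parity disk.
    """
    if num_disks < 3:
        raise ValueError("RAID 4 needs at least three disks")
    data_disks = num_disks - 1
    return block % data_disks, block // data_disks


def can_read_in_parallel(block_a: int, block_b: int, num_disks: int) -> bool:
    """Two single-block reads can overlap only if they land on different disks."""
    disk_a, _ = locate_block(block_a, num_disks)
    disk_b, _ = locate_block(block_b, num_disks)
    return disk_a != disk_b


# Four disks: three for data, one for parity.
neighbours = can_read_in_parallel(0, 1, num_disks=4)  # blocks on disks 0 and 1
same_disk = can_read_in_parallel(0, 3, num_disks=4)   # both blocks on disk 0
```

In this model consecutive blocks sit on different disks, so `neighbours` is true, while blocks a full stripe apart share a disk and must be read one after the other.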
RAID 5
RAID 5 or Striping with Parity is implemented with
a minimum of three disks. In a typical three-disk RAID 5 array the data is
striped across two disks and parity information is written to the third. This
scheme is extended to any further number of disks in the array. For every stripe
of data that is written to disks on a RAID 5 array, a parity block is
calculated and stored on a different disk each time, in a round-robin fashion. The parity information is
therefore distributed and any disk in the array can fail and data can then be
restored from the remaining set of disks in the array using the parity
information. The total capacity available in a RAID 5 array is
sum(N - 1), where N is the
number of disks in the set.
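The parity scheme lends itself to a short worked sketch. Parity in RAID 5 is commonly computed as the XOR of the data blocks in a stripe, which is what the code below assumes; the rotation order in `parity_disk` is just one possible layout, and all function names are hypothetical.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe: int, num_disks: int) -> int:
    """Rotate the parity position from stripe to stripe (one possible layout)."""
    return (num_disks - 1 - stripe) % num_disks


def write_stripe(data_blocks: list[bytes], stripe: int) -> list[bytes]:
    """Lay out N-1 data blocks plus their parity block across N disks."""
    num_disks = len(data_blocks) + 1
    row = list(data_blocks)
    row.insert(parity_disk(stripe, num_disks), xor_blocks(data_blocks))
    return row


def rebuild(row: list[Optional[bytes]]) -> list[bytes]:
    """Recover the single missing block (marked None) from the survivors."""
    missing = row.index(None)
    survivors = [block for block in row if block is not None]
    repaired = list(row)
    repaired[missing] = xor_blocks(survivors)
    return repaired


row = write_stripe([b"\x0f\xf0", b"\x33\x55"], stripe=0)  # parity lands on disk 2
damaged = [row[0], None, row[2]]                          # disk 1 has failed
restored = rebuild(damaged)
```

XOR-ing the surviving blocks regenerates whichever block is missing, whether it held data or parity. With two blocks missing from the same stripe there is no longer enough information, which matches the single-failure limit of RAID 5.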
A RAID 5 array cannot handle more than a single disk failure without losing
data. If two disks fail within a short time of one another, i.e.
before the parity calculations have had time to rebuild the data blocks of the
first failed disk, then the array and its data will be lost. It is therefore
useful to have a dedicated hot-spare disk in the array so that a rebuild can
start immediately upon a disk failure. With such a configuration, in the event
of a single disk failure the RAID controller will rebuild the failed disk's data
onto the hot-spare disk using the data and parity information on the remaining
array members. Once the rebuild has finished, the array will operate as normal.
In terms of disks, a RAID 5 array is cheaper to implement than a RAID 0+1
array. RAID 5 data reads are also slightly faster than single (standalone) disk
reads. The main disadvantage of RAID 5 compared to RAID 0+1 or RAID 0 is that a
disk failure has medium to significant impact on throughput performance. RAID 5
also consumes a lot of resources in rebuild operations, meaning implementation
in software as opposed to a dedicated hardware controller would impact the host
system processor and applications more than is desired.
RAID 5 is an excellent option for general file and application servers,
database servers and web/email/news servers. For that reason, RAID 5 is the most
commonly deployed RAID level in network server environments.
RAID 6
RAID 6 is an extension of RAID 5 and provides added redundancy by using two
parity sets instead of one. The advantage here is that up to two disks can fail
in the array without compromising data integrity. RAID 6 requires a minimum of
four disks as opposed to three disks for RAID 5. As it requires quite powerful
computational resources, few hardware RAID controllers have this algorithm
implemented. However it is quite common in software RAID implementations that
make use of the host system processing facilities. The total storage capacity
for RAID 6 arrays is sum(N - 2) where
N is the number of disks in the set.
Like RAID 5, RAID 6 is great for database servers, file and print
applications and web and email serving. It provides excellent fault tolerance
at a capacity cost of two disks, which is one disk more than RAID 5 but, in
larger arrays, far less than the half of all disks that RAID 10 gives up.
RAID 10
RAID 10 is a nested RAID level and can be described as striped
mirroring. Like RAID 0+1, RAID 10 provides the benefits of both
resiliency and performance. Multiple RAID 1 arrays are grouped into a single
RAID 0 array and the striping of blocks is mirrored via the child arrays. A RAID
10 array can lose all but one drive in each of the child RAID 1 arrays without
compromising data integrity. However, if all the drives in one child RAID 1
array should be lost, the entire RAID 10 array will be compromised as would be
the case for a single drive loss in a RAID 0 array.
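The survival rule in the paragraph above can be written as a one-line check. This is a sketch only: the function name `raid10_survives` and the drive labels are made up for the example, and each child RAID 1 array is modelled simply as a list of drive names.

```python
def raid10_survives(mirrors: list[list[str]], failed: set[str]) -> bool:
    """True if every child RAID 1 array still has at least one working drive."""
    return all(
        any(drive not in failed for drive in mirror)
        for mirror in mirrors
    )


# Two child mirrors of two drives each, striped together.
mirrors = [["a1", "a2"], ["b1", "b2"]]

one_per_mirror = raid10_survives(mirrors, {"a1", "b2"})  # each mirror keeps a drive
whole_mirror = raid10_survives(mirrors, {"a1", "a2"})    # mirror "a" is gone
```

Two failures are survivable when they fall in different child mirrors, but the same number of failures inside one mirror takes the whole array down, so which drives fail matters as much as how many.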
RAID 10 is very popular for high transaction applications such as databases
as write speeds are very good with quite acceptable levels of data security and
integrity. The total capacity of a RAID 10 array is
sum(N/2), where N is the number of drives in the array and N is even,
assuming each child RAID 1 array is a two-disk mirror.
RAID 50
Like RAID 10, RAID 50 is a nested RAID level. It consists of striping (RAID
0) over two or more RAID 5 arrays. RAID 50 gives an added performance boost over
RAID 5 with the caveat of being twice as expensive (assuming two RAID 5 sets are
being combined into a RAID 50). RAID 50 provides better performance than RAID 5
with limited loss in capacity. RAID 50 is able to achieve high data transfer
rates as a result of the RAID 5 segments and good I/O rates for small requests
due to the RAID 0 striping layered over the RAID 5 segments.
RAID 50 suffers from a similar weakness to RAID 10 in that it cannot tolerate
the loss of a child set. If two drives within the same child RAID 5 set fail,
that set fails and brings down the entire RAID 50 array, resulting in the loss
of all data.
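The capacity figures and minimum disk counts quoted for the individual levels can be gathered into one small calculator. Treat it as a sketch under stated assumptions: all disks are the same size, mirrors are two-way for RAID 0+1 and RAID 10 (hence the four-disk minimum assumed for RAID 10), RAID 1 stores one disk's worth of data however many members it has, and the names `LEVELS` and `usable_capacity` are invented for the example. RAID 50 is left out because no capacity formula is given for it above.

```python
# level -> (minimum disks, function giving the number of disks' worth of data)
LEVELS = {
    "0": (2, lambda n: n),
    "1": (2, lambda n: 1),         # assumption: every member holds the same copy
    "0+1": (4, lambda n: n // 2),  # assumption: two-way mirror of stripe sets
    "3": (3, lambda n: n - 1),
    "4": (3, lambda n: n - 1),
    "5": (3, lambda n: n - 1),
    "6": (4, lambda n: n - 2),
    "10": (4, lambda n: n // 2),   # assumption: two-disk child mirrors
}


def usable_capacity(level: str, disks: int, disk_size: float) -> float:
    """Usable capacity of an array of identical disks, in the units of disk_size."""
    if level not in LEVELS:
        raise ValueError(f"unsupported RAID level: {level}")
    minimum, data_disks = LEVELS[level]
    if disks < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} disks")
    if level in ("0+1", "10") and disks % 2:
        raise ValueError(f"RAID {level} needs an even number of disks")
    return data_disks(disks) * disk_size


raid5 = usable_capacity("5", disks=4, disk_size=2.0)    # three disks' worth
raid6 = usable_capacity("6", disks=6, disk_size=4.0)    # four disks' worth
raid10 = usable_capacity("10", disks=8, disk_size=1.0)  # four disks' worth
```

Running the same disk count through several levels gives a quick view of the capacity traded away for each level's redundancy.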