I'd like to second everything Bill Todd said. And add a few things.
RAID has two functions: It enhances the reliability of your data (the
data is likely to be restorable even after a failure of part of the
system), and it enhances the availability of your data (the data can
be read at all times, even right after a fault, as long as the fault
is of the kind you're protected against). We'll have to look at
likely fault scenarios, and see what it would take to provide
reliability and availability at the level you need. You seem to say
that reliability is absolutely required, but availability is
secondary.
The highest probability of data loss in such a system is not hardware
failure, but brain failure. You cd into the archival file system, and
say "rm *". Or you're formatting some other filesystem, and pick the
wrong /dev/ entry, formatting your multi-terabyte array instead of the
USB stick. Big reliability problem. To guard against that, you need
to backup from your archive to some other file system. That other
file system should be either a dedicated backup solution (which can't
be written to), or disconnected most of the time (so it doesn't fall
prey to similar mistakes).
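
As a concrete sketch of "disconnected most of the time": a script along
the following lines, run by hand or from cron, keeps the backup volume
unmounted except during the copy. This is only an illustration: the
device name, mount point and paths are made up, and it assumes a Linux
box with rsync installed. A dedicated backup appliance or tape serves
the same purpose.

#!/usr/bin/env python3
"""Sketch: keep the backup volume offline except while copying."""
import subprocess
import sys

BACKUP_DEV = "/dev/backup_vg/backup"   # hypothetical name; offline otherwise
MOUNT_POINT = "/mnt/archive-backup"    # hypothetical mount point
ARCHIVE = "/archive/"                  # trailing slash: copy the contents

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def main():
    run(["mount", BACKUP_DEV, MOUNT_POINT])
    try:
        # --archive preserves ownership/timestamps; note that --delete
        # mirrors deletions too, so an "rm *" on the archive will propagate
        # at the next run; rotate between two targets if that worries you.
        run(["rsync", "--archive", "--delete", ARCHIVE, MOUNT_POINT + "/"])
    finally:
        # Unmount again, so a later "rm *" or mkfs on the wrong /dev entry
        # can't reach the copy.
        run(["umount", MOUNT_POINT])

if __name__ == "__main__":
    try:
        main()
    except subprocess.CalledProcessError as e:
        sys.exit(e.returncode)
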
The next biggest problem is not actual drive failure, but failure of
your RAID firmware or software. This is particularly true if your
RAID array is consumer-grade (although enterprise-grade RAID arrays
have seen their share of data loss events too). RAID data losses are
particularly common during rebuild. For that reason, I would keep the
rebuild simple (RAID 10) instead of RAID 6. This is particularly true
for low-end RAID solutions that don't have NVRAM: for complicated reasons,
it's difficult to do parity-based RAID (RAID 5, 6 etc.) without NVRAM,
which more often than not causes low-end parity-based RAID systems to
be buggy. So you might suddenly find your whole array to be
scrambled. Again, backup is your friend.
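
To make the NVRAM point a little more concrete, here is a toy Python
illustration (mine, not anything a real controller runs) of the
so-called write hole. RAID 5 parity is just the XOR of the data blocks
in a stripe; if the box loses power between writing new data and
writing the matching parity, the two no longer agree, and a later
rebuild quietly reconstructs stale data. NVRAM (or a journal) is what
lets real implementations make that two-step update effectively atomic.

# Toy model, not a real RAID implementation.

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks, which is how RAID 5 parity works."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# A 3-disk stripe: two data blocks plus parity.
d0 = b"old data 0000000"
d1 = b"old data 1111111"
parity = xor_blocks([d0, d1])
assert xor_blocks([d1, parity]) == d0   # parity lets us rebuild a lost block

# Update d0, but "crash" before the matching parity update is written.
d0 = b"new data 0000000"

reconstructed = xor_blocks([d1, parity])
print(reconstructed)   # still the old contents of d0: a rebuild would
                       # silently hand back stale data
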
Remember, it's not a backup until you have successfully restored it.
Part of such a storage solution must be exercising your emergency
procedures. Pull a drive from the RAID array, and watch it go through
rebuild, then reinsert the drive. Use "write long" to cause a parity
error, and make sure the array scrubs it out pretty fast (leaving
lingering errors unscrubbed is a recipe for disaster). Pretend that
your local array has died, and try to restore onto a spare array, and
make your restore procedures actually work.
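
If the array happens to be Linux md software RAID, the scrub part of
that drill can be scripted. A rough sketch (assuming the array is
/dev/md0 and this runs as root, say from a monthly cron job; hardware
controllers have their own vendor tools for the same job):

#!/usr/bin/env python3
"""Trigger a scrub of /dev/md0 and complain about lingering mismatches."""
import pathlib
import time

MD = pathlib.Path("/sys/block/md0/md")   # adjust for your array

def scrub_and_report():
    # Ask md to read every block and verify parity/mirror consistency.
    (MD / "sync_action").write_text("check\n")

    # Wait until the check finishes (sync_action drops back to "idle").
    while (MD / "sync_action").read_text().strip() != "idle":
        time.sleep(60)

    mismatches = int((MD / "mismatch_cnt").read_text())
    if mismatches:
        # Lingering mismatches are exactly the lingering errors mentioned
        # above: a later rebuild would reconstruct garbage from bad parity.
        print("WARNING: md0 reported", mismatches, "mismatched sectors")
    else:
        print("md0 scrub came back clean")

if __name__ == "__main__":
    scrub_and_report()
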
Speaking of hardware failures: As has been mentioned already, a
combination of a failure of a whole drive with a failure of a single
block is dangerously probable on 2TB-class disks. Recommending
single-fault-tolerant RAID 5 arrays this size would be irresponsible.
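
The back-of-the-envelope arithmetic behind that statement, in Python,
assuming the commonly quoted consumer-drive spec of one unrecoverable
read error (URE) per 1e14 bits read (check your drives' datasheet) and
a 6-drive RAID 5 rebuild:

URE_RATE = 1e-14      # unrecoverable read errors per bit read (assumed spec)
DISK_BYTES = 2e12     # one 2 TB drive
DISKS_READ = 5        # surviving drives read in full during a 6-drive rebuild

bits_read = DISK_BYTES * 8 * DISKS_READ
p_clean = (1 - URE_RATE) ** bits_read
print("Chance of at least one URE during the rebuild: "
      "{:.0%}".format(1 - p_clean))
# Roughly 55% with these assumptions, and a URE during a single-parity
# rebuild means the rebuild fails or silently loses that block.
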
But much more likely than any of these faults are systemic faults
which take the whole array out, nice redundancy and all. Ten
years ago, a 10TB array would have been the size of a few
refrigerators, would have been in a computer room with a raised floor,
with a fire suppression system, and its own Liebert power conditioner.
Management was done by a team of highly skilled storage admins, who
had gone to training classes at places like EMC, IBM, or StorageTek.
The array had no single point of failure (redundant power and cooling,
multiple power connections to multiple grids, batteries to power
through short outages). Today it fits on a desktop, people put their
bottles of beer on top of it, it's exposed to all manner of
environmental dangers (beginning with dust clogging the fans, up to
local fires), and it's likely to get fried by "Pakistani Gas and
Electric" (our local power company here in Silicon Valley). It has
probably just one power supply and one cooling fan - and those two
components are actually much less reliable than the disks themselves.
So more likely than a disk or block error is a complete failure of the
whole array, likely caused by an environmental problem (like local
fire in your office, or sprinklers going off by mistake). Again,
backup is your friend, and the backup better be in a different
building, far away (as was clearly demonstrated a few years ago,
putting your backup data center in the other tower of the World Trade
Center is insufficient). On the other hand, some of these systemic
failures (like fan or power supply failure) only leave your array
temporarily disabled (availability), and don't necessarily induce data
loss (reliability). Still, restoring from remote backup might be
faster than repairing your array.
Once you are protected against brain cramp, firmware faults, local
disaster, the protection against disk failure and disk data loss
becomes secondary. Let's get back to the distinction between
reliability and availability. Backup takes care of reliability. Do
you actually need continuous availability? If your power supply in
the RAID array fails, would it bother you if you had to wait 3 days
until the replacement has been shipped? And that's assuming that 3 years
from now you can still get spare parts for your disk enclosure (only
likely if you bought the enclosure from a name-brand vendor, Dell,
Sun, IBM, HP, EMC, NetApp ...). If your office catches fire, would it
bother you if it took 3 days to purchase a replacement disk array and
restore it from the remote backup? If you can handle a few days of
downtime, then I would suggest not wasting your money on RAID, and
instead investing it in better remote backup and more bandwidth. And if
you want to invest in RAID, pick RAID 10 - inefficient, but reliable
and simple.
--
Ralph Becker-Szendy ***@lr_dot_los-gatos_dot_ca_dot_us
735 Sunset Ridge Road; Los Gatos, CA 95033