Disk drives lack reliable failure model

You should absolutely not depend solely on RAID 5

In storage circles, much discussion has arisen from two very interesting papers on disk drive reliability, one from Google and one from Carnegie Mellon University, presented recently at FAST '07. Other columnists and bloggers, such as Frank Hayes and Robin Harris, have already done an excellent job of covering them. Rather than repeat the details, I'd like to consider what they imply for the service level commitments made on storage infrastructure.

In tiered storage architectures, distinctions among service levels are commonly based on attributes like performance and availability. Given the findings of these studies, it's worthwhile to review service levels and the design of supporting storage tiers.
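
Availability-based tiers are easier to compare when each percentage target is turned into an annual downtime budget. The Python sketch below does that arithmetic. The function name is illustrative, and the sketch assumes a 365-day year in which every minute of unavailability counts against the target.

```python
# Assumes a 365-day year; every minute of unavailability counts against the target.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability_pct: float) -> float:
    """Return the annual downtime (in minutes) permitted by an availability target."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    return MINUTES_PER_YEAR * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_budget_minutes(target)
        print(f"{target}% availability allows {minutes:,.1f} min ({minutes / 60:.2f} h) per year")
```

At three nines (99.9%), the budget is about 525.6 minutes, or roughly 8.8 hours, per year. A single extended outage can consume that in one go.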

Of the various findings, two stand out in this regard. The first is the lack of a reliable model for predicting failure. The Google study examined attributes such as age, temperature, access activity, and SMART diagnostic data in consumer-grade drives, and it found that many drives failed without any prior indication. The Carnegie Mellon (CMU) study does suggest that age is a factor in reliability, but one that becomes significant far sooner than expected: in as little as two years. So while the probability of a drive failing increases as it ages, the only meaningful actions from a service delivery perspective are to continue with regular technology refreshes (e.g. a three-year cycle) and perhaps to institute a process for recording and analysing disk failures, as these studies did, but tailored to your particular environment.
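
A minimal sketch of the in-house failure tracking suggested above: compute an annualised failure rate (AFR) for each year of drive age, expressed as failures per drive-year of exposure. The `DriveRecord` format and the `afr_by_age_year` function are hypothetical. A real log would be built from your own asset and incident records, and small fleets will give noisy per-year rates.

```python
from collections import defaultdict
from dataclasses import dataclass

DAYS_PER_YEAR = 365


@dataclass
class DriveRecord:
    """Hypothetical log entry for one drive."""
    days_in_service: int  # days until failure, or until the end of observation
    failed: bool          # True if the drive failed (was replaced) on its last day


def afr_by_age_year(records):
    """Return {age_year: AFR in percent}, computed as failures per drive-year.

    A drive observed for 500 days contributes 365 days of exposure to age
    year 0 and 135 days to age year 1.
    """
    exposure_days = defaultdict(int)
    failures = defaultdict(int)
    for rec in records:
        if rec.days_in_service < 1:
            raise ValueError("days_in_service must be at least 1")
        remaining, age_year = rec.days_in_service, 0
        # Spread the drive's service time across the age-year buckets.
        while remaining > 0:
            days = min(remaining, DAYS_PER_YEAR)
            exposure_days[age_year] += days
            remaining -= days
            age_year += 1
        if rec.failed:
            # The last day in service (day N) falls in age year (N - 1) // 365.
            failures[(rec.days_in_service - 1) // DAYS_PER_YEAR] += 1
    return {
        year: 100.0 * failures[year] / (exposure_days[year] / DAYS_PER_YEAR)
        for year in sorted(exposure_days)
    }


if __name__ == "__main__":
    log = [DriveRecord(730, False)] * 4 + [DriveRecord(365, True)]
    print(afr_by_age_year(log))  # prints {0: 20.0, 1: 0.0}
```

Counting exposure in drive-years, rather than dividing failures by fleet size, keeps the rate fair when drives enter and leave service at different times.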

Second, if you are committing to availability greater than three nines (99.9%), the CMU study confirms what you hopefully already know: you absolutely should not depend solely on RAID 5. The study found an increased likelihood of correlated failures among related drives. Combined with the increasingly long rebuild times required for the current crop of high-capacity drives, this creates a risk of data loss that should not be ignored. In fact, I would suggest implementing either replication or host-based volume manager mirroring to another storage system to support these availability levels. If that is not feasible, then within a single storage array you should consider mirroring (e.g. RAID 10, or RAID 51, which mirrors RAID 5 sets) or dual parity (e.g. RAID 6) to improve availability.
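
To make the rebuild-window risk concrete, the sketch below estimates the chance of losing data while one failed drive in a group is being rebuilt. It assumes independent failures at a constant rate derived from an annualised failure rate (AFR). The correlated failures reported in the CMU study mean real-world risk is likely higher than this model suggests. The model also ignores unrecoverable read errors during rebuild and the fact that a second failure lengthens the rebuild. The function names and example figures (8-drive group, 3% AFR, 24 and 48 hour rebuilds) are illustrative assumptions.

```python
import math

HOURS_PER_YEAR = 8760


def p_fail_within(hours: float, afr: float) -> float:
    """Probability that one drive fails within `hours`, assuming a constant
    failure rate derived from the annualized failure rate `afr` (0 <= afr < 1)."""
    if not 0.0 <= afr < 1.0:
        raise ValueError("afr must be in [0, 1)")
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    return 1.0 - math.exp(-rate_per_hour * hours)


def p_loss_during_rebuild(drives: int, parity: int, rebuild_hours: float, afr: float) -> float:
    """Probability of data loss while one failed drive in a group is rebuilt.

    parity=1 models RAID 5 (any further failure loses data); parity=2 models
    RAID 6 (two further failures are needed). Failures are assumed independent.
    """
    survivors = drives - 1
    p = p_fail_within(rebuild_hours, afr)
    # Binomial model: the group survives if fewer than `parity` survivors fail.
    p_survive = sum(
        math.comb(survivors, k) * p**k * (1.0 - p) ** (survivors - k)
        for k in range(parity)
    )
    return 1.0 - p_survive


if __name__ == "__main__":
    # Illustrative figures only: 8-drive group, 3% AFR, 24 h vs 48 h rebuild.
    for hours in (24, 48):
        r5 = p_loss_during_rebuild(8, 1, hours, 0.03)
        r6 = p_loss_during_rebuild(8, 2, hours, 0.03)
        print(f"{hours} h rebuild: RAID 5 {r5:.2e}, RAID 6 {r6:.2e}")
```

When the per-drive probability is small, doubling the rebuild window roughly doubles the RAID 5 figure. The RAID 6 figure stays far lower because two further failures are needed before data is lost.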

Disk drives are miraculous devices and, current headlines notwithstanding, they are incredibly reliable given what they do. But when you have hundreds or thousands of them spinning continuously, some failures are unavoidable. Understanding the risks, reviewing service commitments, and being prepared for the inevitable is a must.
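
The arithmetic behind "some failures are unavoidable" is simple. The sketch below assumes that each drive fails independently at the same annualised failure rate (AFR) and that failed drives are replaced promptly. The 3% AFR in the example is an illustrative assumption, not a figure taken from either study.

```python
def expected_failures(drives: int, afr: float, days: float = 365.0) -> float:
    """Expected drive failures over `days`, treating AFR as failures per
    drive-year and assuming failed drives are replaced promptly."""
    return drives * afr * days / 365.0


def p_at_least_one_failure(drives: int, afr: float) -> float:
    """Probability that at least one of `drives` fails within a year,
    assuming independent failures at the same annualized rate."""
    return 1.0 - (1.0 - afr) ** drives


if __name__ == "__main__":
    afr = 0.03  # illustrative assumption, not a figure from the studies
    for n in (10, 100, 1000):
        print(f"{n} drives: ~{expected_failures(n, afr):.1f} failures/year, "
              f"P(at least one) = {p_at_least_one_failure(n, afr):.3f}")
```

Under that assumed rate, a fleet of 1,000 drives should expect about 30 failures a year, or one every 12 days or so. Spares, rebuild procedures, and service commitments therefore need to plan for failures rather than hope to avoid them.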

Jim Damoulakis is chief technology officer of GlassHouse Technologies Inc., a leading provider of independent storage services. He can be reached at jimd@glasshouse.com.

Techworld UK - Technology - Business