Google's storage infrastructure - part 2

In-house file system and no RAID

In part one of this feature we learnt about Bigtable. Next we look at the file system infrastructure underneath it and aim to see if this is something enterprise datacentres could use.

No standard Windows or Unix/Linux product

Google isn't using a standard operating system and file system combination here. Its Linux servers run Google's own Google File System (GFS), a distributed file system, and we look at how efficient it is at locating and reading block-level data from disk.

Another Google paper states: 'the Google File System (is) a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.'

Google file system

(We quote extensively from this paper in this section.) GFS was conceived as the backend file system for Google's production systems. It provides a location-independent namespace, which enables data to be moved transparently for load balancing or fault tolerance.

It has been designed on the assumption that component failures are the norm rather than the exception. The file system consists of hundreds or even thousands of storage machines built from inexpensive commodity parts and is accessed by a comparable number of client machines.

The quantity and quality of the components virtually guarantee that some are not functional at any given time and some will not recover from their current failures. Google has seen problems caused by application bugs, operating system bugs, human errors, and the failures of disks, memory, connectors, networking, and power supplies. Therefore, constant monitoring, error detection, fault tolerance, and automatic recovery must be integral to the system.

Files are huge by traditional standards. Multi-GB files are common. Each file typically contains many application objects such as web documents. Google is regularly working with fast-growing data sets of many TBs comprising billions of objects, so it is unwieldy to manage billions of approximately KB-sized files even when the file system could support it. As a result, design assumptions and parameters such as I/O operation and block sizes were revisited. (GFS uses a chunk size of 64MB, much larger than typical file system block sizes.)
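To make the chunk size concrete, here is a minimal sketch of the arithmetic, assuming fixed 64MB chunks as quoted above. The function names are hypothetical illustrations and are not taken from any GFS interface.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64MB, the chunk size quoted in the article


def chunk_count(file_size: int) -> int:
    """Number of fixed-size chunks needed to hold file_size bytes."""
    return -(-file_size // CHUNK_SIZE)  # ceiling division


def locate(offset: int) -> tuple[int, int]:
    """Map a byte offset in a file to (chunk index, offset inside that chunk)."""
    return divmod(offset, CHUNK_SIZE)
```

Under these assumptions a 4GB file occupies only 64 chunks, which hints at why a large chunk size keeps the amount of per-file bookkeeping small.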

Most files are mutated by appending new data rather than overwriting existing data. (This characteristic will radically increase Google's disk capacity needs on its own.) Random writes within a file are practically non-existent. Once written, the files are only read, and often only sequentially. A variety of data share these characteristics.

Some may constitute large repositories that data analysis programs scan through. Some may be data streams continuously generated by running applications. Some may be archival data. Some may be intermediate results produced on one machine and processed on another, whether simultaneously or later in time.

Given this access pattern on huge files, appending becomes the focus of performance optimization and atomicity guarantees, while caching data blocks in the client loses its appeal.
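The append-then-read pattern can be illustrated with a toy, in-memory sketch. This is an assumption-laden illustration of an append-only file with atomic appends inside a single process; it is not the record append protocol GFS itself uses, and the class and method names are hypothetical.

```python
import threading


class AppendOnlyFile:
    """Toy append-only file: data is only ever added at the end."""

    def __init__(self) -> None:
        self._data = bytearray()
        self._lock = threading.Lock()  # makes each append atomic across threads

    def append(self, record: bytes) -> int:
        """Append a record and return the offset at which it was written."""
        with self._lock:
            offset = len(self._data)
            self._data += record
            return offset

    def read(self, offset: int, length: int) -> bytes:
        """Read length bytes starting at offset; existing bytes never change."""
        return bytes(self._data[offset:offset + length])
```

Because existing bytes are never overwritten, a reader that remembers an offset can always re-read the same record, which is one reason client-side caching matters less for this workload.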

The Google file system has been designed for a specific enterprise environment. Google is, in effect, running an absolutely massive but highly specialised set of applications, all of which are characterised by a high degree of parallelism.

No RAID

GFS data is stored in chunks. A GFS cluster is highly distributed and typically has hundreds of chunkservers spread across many machine racks. These chunkservers in turn may be accessed from hundreds of clients from the same or different racks. Communication between two machines on different racks may cross one or more network switches. Additionally, bandwidth into or out of a rack may be less than the aggregate bandwidth of all the machines within the rack. Multi-level distribution presents a unique challenge to distribute data for scalability, reliability, and availability.

The chunk replica placement policy serves two purposes: maximize data reliability and availability, and maximize network bandwidth utilization. For both, it is not enough to spread replicas across machines, which only guards against disk or machine failures and fully utilizes each machine’s network bandwidth. GFS must also spread chunk replicas across racks. This ensures that some replicas of a chunk will survive and remain available even if an entire rack is damaged or offline (for example, due to failure of a shared resource like a network switch or power circuit).

It also means that traffic, especially reads, for a chunk can exploit the aggregate bandwidth of multiple racks. On the other hand, write traffic has to flow through multiple racks, a trade-off Google makes willingly.
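The rack-spreading idea can be sketched as follows. This is a simplified illustration, assuming the only input is a map of chunkserver to rack; a real placement policy would weigh other factors such as disk utilisation and load. The function name and the server and rack labels are hypothetical.

```python
import random
from collections import defaultdict


def place_replicas(servers: dict[str, str], replicas: int = 3, rng=random) -> list[str]:
    """Choose chunkservers for one chunk, covering as many racks as possible.

    servers maps a chunkserver name to the name of the rack it sits in.
    """
    if replicas > len(servers):
        raise ValueError("not enough chunkservers for the requested replication")
    by_rack = defaultdict(list)
    for server, rack in servers.items():
        by_rack[rack].append(server)
    racks = list(by_rack)
    rng.shuffle(racks)  # avoid always favouring the same racks
    chosen: list[str] = []
    # Visit racks round-robin so distinct racks are used before any rack repeats.
    while len(chosen) < replicas:
        for rack in racks:
            candidates = [s for s in by_rack[rack] if s not in chosen]
            if candidates and len(chosen) < replicas:
                chosen.append(rng.choice(candidates))
    return chosen
```

With six servers spread evenly over three racks and the default of three replicas, each replica lands on a different rack, so losing one rack still leaves two copies available.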

Users can specify different replication levels for different parts of the file namespace. The default is three. The master clones existing replicas as needed to keep each chunk fully replicated as chunkservers go offline or detect corrupted replicas through checksum verification.
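The detect-and-clone cycle described here might look like the sketch below. It assumes a CRC-32 checksum over the whole chunk (via Python's `zlib.crc32`) purely for illustration; the article does not say which checksum GFS uses or at what granularity. The function names and server labels are hypothetical.

```python
import zlib


def is_intact(data: bytes, expected_crc: int) -> bool:
    """Checksum verification: does the stored data still match its checksum?"""
    return zlib.crc32(data) == expected_crc


def repair(replicas: dict[str, bytes], expected_crc: int,
           spares: list[str], target: int = 3) -> dict[str, bytes]:
    """Discard corrupted replicas, then clone a good one until target copies exist."""
    good = {srv: data for srv, data in replicas.items()
            if is_intact(data, expected_crc)}
    if not good:
        raise RuntimeError("no intact replica left to clone from")
    source = next(iter(good.values()))  # any intact copy will do
    for srv in spares:
        if len(good) >= target:
            break
        if srv not in good:
            good[srv] = source  # stands in for copying the chunk to a new server
    return good
```

In this sketch a chunk with one corrupted copy out of three is brought back to three intact copies by cloning onto the first available spare server.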

As disks are relatively cheap and replication is simpler than more sophisticated RAID approaches, GFS currently uses only replication for redundancy and so consumes more raw storage than other approaches.
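The raw storage cost of that choice can be estimated with simple arithmetic. The sketch below compares three-way replication with a single-parity RAID 5 group; the group size of eight disks is an assumption chosen for illustration, not a figure from Google, and the function names are hypothetical.

```python
def raw_needed_replication(usable_tb: float, copies: int = 3) -> float:
    """Raw capacity needed when every byte is stored `copies` times."""
    return usable_tb * copies


def raw_needed_raid5(usable_tb: float, disks_per_group: int = 8) -> float:
    """Raw capacity needed when one disk's worth per group holds parity."""
    return usable_tb * disks_per_group / (disks_per_group - 1)
```

For 100TB of usable data, three-way replication needs 300TB of raw disk, while the assumed eight-disk RAID 5 groups would need roughly 114TB. Replication buys simplicity and rack-level fault tolerance at the price of that extra capacity.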

The disk infrastructure Google uses will have been developed in conjunction with the file system and its cluster-based processing concepts, with Bigtable then built on this foundation. We'll look no further into the file system unless it's necessary to understand the disk infrastructure.

Disk infrastructure

Google's paper on drive failures stated: 'More than one hundred thousand disk drives were used for all the results presented here. The disks are a combination of serial and parallel ATA consumer-grade hard disk drives, ranging in speed from 5400 to 7200 rpm, and in size from 80 to 400 GB. All units in this study were put into production in or after 2001. The population contains several models from many of the largest disk drive manufacturers and from at least nine different models.'

They are deployed in rack-mounted servers and housed in professionally-managed datacenter facilities. Google runs its own burn-in process: 'Before being put into production, all disk drives go through a short burn-in process, which consists of a combination of read/write stress tests designed to catch many of the most common assembly, configuration, or component-level problems.'

So?

Google is building a data handling infrastructure that is probably the largest the world has ever seen, and one that is greatly different in scale and use from business data centres, even the largest ones.

Everything is layered, with each layer dependent upon features of the one beneath it and tuned to help the layers above it. In other words, you can't take out a layer of this infrastructure and use it on its own.

One reading of this is that the Google storage infrastructure is irrelevant as a model for businesses to use. A second is that Google could realistically provide software as a service. It will already have accumulated much experience from its initial Gmail offering, and rolling out its desktop productivity applications seems quite practical, even from this simplified review of its activities.

