ZFS - the future of file systems?

Sun's ZFS could presage a round of file system upgrades. On the other hand ...

ZFS - the Zettabyte File System - is an enormous advance in capability over existing file systems. It provides greater space for files, hugely improved administration and greatly improved data security.

It is available in Sun's Solaris 10 and has been made open source. The advantages of ZFS look so great that its use may well spread to other UNIX distributions and even, possibly and eventually, to Windows.

Techworld has mentioned ZFS before. Here we provide a slightly wider and more detailed look at it. If you want to have even more information then the best resource is Sun's own website.

Why is ZFS a good thing?
It has advantages over existing file systems in these areas:

- Scale
- Administration
- Data security and integrity

The key area is file system administration, followed by data security and file system size. ZFS started from a realisation that existing file system concepts had hardly changed since the early days of computing. Back then a computer knew about a disk which had files on it, and a file system related to a single disk. On today's PCs the file systems are still disk-based, with the Windows C: drive - A: and B: being floppy drives - and subsequent drives being D:, E:, etc.

To provide more space and bandwidth a software abstraction was added between the file system and the disks. It was called a volume manager and virtualised several disks into a volume.

Each volume has to be administered and growing volumes and file systems takes effort. Volume Manager software products became popular. The storage in a volume is specific to a server and application and can't be shared. Utilisation of storage is poor with any unused blocks on disks in volumes being unusable anywhere else.

ZFS starts from the concept that desktops and servers have many disks and that a good place to start abstracting this is at the interface between the operating system and the file system. Consequently ZFS delivers, in effect, just one volume to the operating system. We might imagine it as disk:. From that point ZFS delivers scale, administration and data security features that other file systems do not.

ZFS has a layered stack with a POSIX-compliant operating system interface, then data management functions and, below that, increasingly device-specific functions. We might characterise ZFS as being a file system with a volume manager, the data management function, included within it.

Data security
Data protection through RAID is clever but only goes so far. When data is written to disk it overwrites the current version of the data. According to ZFS people, there are instances of stray or phantom writes, mis-directed writes, DMA parity errors, disk driver bugs and accidental overwrites that the standard checksum approach won't detect.

In a mis-directed write, for example, the checksum is stored with the data block and is valid for that data block, but the data block shouldn't be there in the first place. The checksum is a disk-only checksum and doesn't protect against faults in the I/O path before the data gets written to disk.

If disks are mirrored then a block is simultaneously written to each mirror. If one drive or controller suffers a power failure then that mirror is out of synchronisation and needs re-synchronising with its twin.

With RAID if there is a loss of power between data and parity writes then disk contents are corrupted.
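
The failure described above can be shown with a few lines of arithmetic. This is a minimal sketch, assuming a hypothetical three-disk stripe (two data blocks and one XOR parity block); the names are illustrative and do not come from any RAID product.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings, as single-parity RAID does."""
    return bytes(x ^ y for x, y in zip(a, b))

# A consistent stripe: parity is the XOR of the two data blocks.
d0, d1 = b"AAAA", b"BBBB"
parity = xor_bytes(d0, d1)

# Update d0 in place, then "lose power" before the parity write happens.
d0 = b"ZZZZ"
# parity = xor_bytes(d0, d1)   # this write never reaches the disk

# The stripe no longer agrees with its parity.
stripe_is_consistent = xor_bytes(d0, d1) == parity

# If the disk holding d1 later fails, it is rebuilt from d0 and stale parity.
rebuilt_d1 = xor_bytes(d0, parity)
```

Here `rebuilt_d1` no longer equals the original `d1`, although no disk reported an error: the damage only shows up when the parity is actually needed.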

ZFS does things differently.

First of all it uses copy-on-write technology so that existing data blocks are not over-written. Instead new data blocks are written and their checksum stored with the pointer to them.

When a file write has been completed then the pointers to the previous blocks are changed so as to point to the new blocks. In other words the file write is treated as a transaction, an event that is atomic and has to be completed before it is confirmed or committed.
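
The two ideas above, never overwriting live data and keeping the checksum with the pointer rather than with the block, can be sketched in a few lines. This is a toy model under stated assumptions: `CowStore` and `BlockPointer` are hypothetical names, the "disk" is a dictionary, and SHA-256 stands in for whatever checksum ZFS actually uses.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockPointer:
    address: int    # where the block lives on the simulated disk
    checksum: str   # checksum of the block, kept with the pointer

class CowStore:
    """Hypothetical single-file store; not the ZFS on-disk format."""

    def __init__(self):
        self.disk = {}       # address -> bytes
        self.next_free = 0
        self.root = None     # pointer to the current version

    def write(self, data: bytes) -> None:
        address = self.next_free
        self.next_free += 1
        self.disk[address] = data    # new block; the old one is untouched
        new_ptr = BlockPointer(address, hashlib.sha256(data).hexdigest())
        self.root = new_ptr          # one pointer switch commits the write

    def read(self) -> bytes:
        data = self.disk[self.root.address]
        if hashlib.sha256(data).hexdigest() != self.root.checksum:
            raise OSError("checksum mismatch: block is not what was written")
        return data

store = CowStore()
store.write(b"version 1")
old_pointer = store.root
store.write(b"version 2")
old_block_survives = store.disk[old_pointer.address] == b"version 1"
```

Because the checksum travels with the pointer, a stray write that lands on the current block is caught on the next read, even if the stray data carries a perfectly valid checksum of its own.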

Secondly ZFS checks the disk contents looking for checksum/data mismatches. This process is called scrubbing. Any faults are corrected and a ZFS system exhibits what IBM calls autonomic computing capacity; it is self-healing.
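
A scrub pass can be pictured as follows. This is a minimal sketch, assuming a two-way mirror and assuming that repair means copying from whichever mirror still matches the expected checksum; the function and variable names are hypothetical.

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def scrub(mirror_a: dict, mirror_b: dict, expected: dict) -> list:
    """Check every block on both mirrors; repair bad copies from a good one.

    Returns the (mirror_name, block_id) pairs that were repaired.
    """
    repaired = []
    copies = {"a": mirror_a, "b": mirror_b}
    for block_id, good_sum in expected.items():
        good = [n for n, m in copies.items()
                if checksum(m[block_id]) == good_sum]
        bad = [n for n in copies if n not in good]
        if good:  # at least one intact copy exists to heal from
            for name in bad:
                copies[name][block_id] = copies[good[0]][block_id]
                repaired.append((name, block_id))
    return repaired

blocks = {0: b"alpha", 1: b"beta"}
expected = {block_id: checksum(data) for block_id, data in blocks.items()}
mirror_a, mirror_b = dict(blocks), dict(blocks)
mirror_b[1] = b"bXta"                      # silent corruption on one mirror
repaired = scrub(mirror_a, mirror_b, expected)
```

After the call, `mirror_b[1]` holds the good data again and `repaired` records which copy was fixed.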

Scale
ZFS uses a 128-bit addressing scheme and can store 256 quadrillion zettabytes. A zettabyte is 2 to the power 70 bytes, or about a billion TB. ZFS capacity limits are so far away as to be unimaginable. This is eye-catching stuff, but the capacity limits of 64-bit file systems are unlikely to become a practical problem for decades.
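
The figures above can be checked with integer arithmetic. This sketch assumes that the 128 bits address individual bytes and uses the binary definitions given in the text.

```python
ZETTABYTE = 2 ** 70    # bytes, binary definition used in the text
TERABYTE = 2 ** 40     # bytes

address_space = 2 ** 128                   # bytes addressable with 128 bits
zettabytes = address_space // ZETTABYTE    # 2**58 zettabytes
tb_per_zb = ZETTABYTE // TERABYTE          # 2**30, roughly a billion

# "256 quadrillion" holds when a quadrillion is read as the binary 2**50.
binary_quadrillions = zettabytes // 2 ** 50
```

`binary_quadrillions` comes out as 256. Read as a decimal number, 2 to the power 58 is closer to 288 quadrillion, so the headline figure depends on which definition of quadrillion is used.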

Administration
With ZFS all storage enters a common pool, called a zpool. Every disk or array added to ZFS disappears into this common pool. ZFS people characterise this storage pool as being akin to a computer's virtual memory.

A hierarchy of ZFS file systems can use that pool. Each can have its own attributes set, such as compression, a growth-limiting quota, or a set amount of space.
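
The pooled model can be sketched as a small accounting exercise. This is a toy model, not the ZFS API: the `Pool` class, its method names and the exact accounting rules are assumptions made for illustration.

```python
class Pool:
    """Hypothetical model of a shared storage pool."""

    def __init__(self):
        self.capacity = 0
        self.filesystems = {}

    def add_device(self, size: int) -> None:
        self.capacity += size    # the device disappears into the pool

    def create_fs(self, name: str, quota=None, reservation: int = 0) -> None:
        self.filesystems[name] = {"used": 0, "quota": quota,
                                  "reservation": reservation}

    def available(self, name: str) -> int:
        """Space this file system may still consume."""
        fs = self.filesystems[name]
        # Space claimed by the others: what they use or have set aside.
        others = sum(max(f["used"], f["reservation"])
                     for n, f in self.filesystems.items() if n != name)
        free = self.capacity - others - fs["used"]
        if fs["quota"] is not None:
            free = min(free, fs["quota"] - fs["used"])
        return max(free, 0)

    def write(self, name: str, size: int) -> None:
        if size > self.available(name):
            raise OSError("no space left for " + name)
        self.filesystems[name]["used"] += size

pool = Pool()
pool.add_device(100)
pool.add_device(100)
pool.create_fs("home", quota=50)         # growth-limiting quota
pool.create_fs("db", reservation=80)     # a set amount of space
home_free = pool.available("home")       # capped by the quota
db_free = pool.available("db")           # sees the whole pool
```

Adding a third device with `pool.add_device(100)` raises what `db` can use straight away, with no volume to grow and no file system to resize.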

I/O characteristics
ZFS has its own I/O system. I/Os have a priority with read I/Os having a higher priority than write I/Os. That means that reads get executed even if writes are queued up.

Write I/Os have both a priority and a deadline. The higher the priority, the sooner the deadline. Writes with the same deadline are executed in logical block address order so that, in effect, they form a sequential series of writes across a disk, which reduces head movement to a single sweep across the disk surface. What's happening is that random write I/Os are being transformed into sets of sequential I/Os to make the overall write I/O rate faster.
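
The ordering rules described above can be sketched as a pair of sorts. This is a simplified model, assuming that all queued reads go first and that a lower priority number means more urgent; `IoRequest` and `issue_order` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class IoRequest:
    kind: str        # "read" or "write"
    priority: int    # lower number = more urgent (assumed convention)
    deadline: int    # tick by which the request should be issued
    lba: int         # logical block address

def issue_order(requests):
    """Reads first; then writes by deadline, ties broken by block address."""
    reads = sorted((r for r in requests if r.kind == "read"),
                   key=lambda r: r.priority)
    writes = sorted((r for r in requests if r.kind == "write"),
                    key=lambda r: (r.deadline, r.lba))
    return reads + writes

queue = [
    IoRequest("write", 2, 20, 900),
    IoRequest("write", 2, 20, 100),
    IoRequest("read", 1, 5, 500),
    IoRequest("write", 1, 10, 700),
    IoRequest("write", 2, 20, 400),
]
order = [(r.kind, r.lba) for r in issue_order(queue)]
```

The three writes that share a deadline come out at block addresses 100, 400 and 900, a single sweep across the disk, even though they were queued in a scattered order.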

Striping and blocksizes
ZFS stripes files automatically. Block sizes are dynamically set. Blocks are allocated from disks based on an algorithm that takes into account space available and I/O counts. When blocks are being written, the copy-on-write concept means that a sequential set of blocks can be used, speeding up write I/O.

ZFS and NetApp's WAFL
ZFS is based in part on NetApp's Write Anywhere File Layout (WAFL) system. It has moved on from WAFL and now has many differences. This table lists some of them. But do read the blog replies, which correct some table errors.

There is more on the ZFS and WAFL similarities and differences here.

Snapshots unlimited and more
ZFS can take a virtually unlimited number of snapshots and these can be used to restore lost (deleted) files. However, they can't protect against disk crashes. For that, RAID and backup to external devices are needed.

ZFS offers compression, encryption is being developed, and an initiative is under way to make it bootable. The compression is applied before data is written, meaning that the write I/O burden is reduced and hence effective write speed is increased further.
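
The effect of compressing before writing can be illustrated by counting blocks. This is a rough sketch: `zlib` stands in for whatever algorithm ZFS applies, and the 4096-byte block size and the sample payload are assumptions chosen for the illustration.

```python
import zlib

BLOCK_SIZE = 4096   # assumed on-disk block size for the illustration

def blocks_needed(payload: bytes) -> int:
    """Number of whole blocks needed to hold the payload."""
    return -(-len(payload) // BLOCK_SIZE)   # ceiling division

# Repetitive data, such as log records, compresses well.
payload = b"timestamp=0 level=INFO msg=ok\n" * 2000
compressed = zlib.compress(payload)

raw_blocks = blocks_needed(payload)
compressed_blocks = blocks_needed(compressed)
```

Fewer blocks reach the disk for the same logical data, which is where the extra effective write speed comes from. Data that is already compressed, such as most media files, would show little or no gain.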

We may see Sun offering storage arrays with ZFS. For example, we might see a Sun NAS box based on ZFS. This is purely speculative, as is the idea that we might see Sun offering clustered NAS ZFS systems to take on Isilon and others in the high-performance, clustered, virtualised NAS area.

So what?
There is a lot of software engineering enthusiasm for ZFS and the engineers at Sun say that ZFS outperforms other file systems, for example the Solaris file system. It is faster at file operations and, other things being equal, a ZFS Solaris system will out-perform a non-ZFS Solaris system. Great, but will it out-perform other UNIX servers and Windows servers, again with other things being equal?

We don't know. We suspect it might but don't know by how much. Even then the popularity of ZFS will depend upon how it is taken up by Sun Solaris 10 customers and whether ports to Apple and to Linux result in wide use. For us storage people the ports that really matter are to mainstream Unix versions such as AIX, HP-UX and Red Hat Linux, and also SUSE Linux I suppose.

There is no news of a ZFS port to Windows, and Vista's own advanced file system plans have quite recently been scaled back.

If Sun storage systems using ZFS, such as its X4500 'Thumper' server, with ZFS-enhanced direct-attached storage (DAS), and Honeycomb, become very popular and are as market-defining as EMC's Centera product then we may well see ZFS spreading. But their advantages have to be solid and substantial with users getting far, far better file-based application performance and a far, far lower storage system management burden. Such things need proving in practice.

To find out for yourself try these systems out or wait for others to do so.


Comments

sswam said: This is incorrect; see here: http://zfsonlinux.org. The restriction due to the GPL is that you cannot distribute the linked binary module outside your own organization, but we can build ZFS from source. There is even an Ubuntu PPM package set now.

Philip said: Sorry, I must correct an error of omission. ZFS drivers cannot run in kernel space under Linux's GNU license. Obviously ZFS runs in kernel space on Solaris and OpenSolaris, neither of which uses GNU licenses.

Philip said: Sun specifically wrote its license for ZFS to prevent the ZFS drivers from operating within kernel space. ZFS must run in userland. While ZFS has been ported to Linux with the FUSE project, ZFS in userland means far too many context switches to produce excellent performance on Linux. Way to go, Sun. Oracle will certainly be much worse. Immediately after the Sun merger completed it worked steadily and hard to annoy Sun's built-up Open Source community.

charles said: This brief post on ZFS gives an idea of what ZFS is and its advantages over other file systems. ZFS is the future of file systems. Thanks for the document.

Piotr K. said: ZFS supports compression using lzma. Is there any RAID card that transparently compresses data written to the HDD?

Cade Foster said: OpenSolaris also has ZFS. As an OpenSolaris user and developer I have found ZFS very useful, e.g. the operating system can be patched/upgraded using a boot environment (BE), which is basically a ZFS-based snapshot of the patched/upgraded system, allowing rollbacks. So if the current patch is buggy or not desired you can always roll back to the previous operating system state. A BE can be created/deleted/managed on demand by the user. This is good technology.


