Agámi: Storing up for the future
How agámi’s new-style NAS/iSCSI product offers advantages over other storage boxes
Oh, look, here’s another one. New NAS (network-attached storage) boxes come onto the market regularly. The newer ones tend to have an iSCSI capability as well because it’s cheap and lets the maker claim both file and block storage. It’s a crowded market out there with no shortage of products. We need pretty good reasons to pay attention to yet another new NAS box.
The latest one comes from the cleverly named start-up agámi, which is Sanskrit for 'actions (of the present life) expected to bear fruit in future.'
In a nutshell, it has supercharged its CPU department, used SATA (serial ATA) drives and built its own NAS stack, complete with unique file system and built-in data protection features (replication, snapshots, RAID), on top of a Linux kernel. We’re looking at proprietary software executing on four 64-bit Opteron CPUs fetching files from up to 48 SATA drives.
Agámi says it can serve files at the rate of over 1GB/sec aggregate throughput. That’s great but there are lots of high-end NAS and clustered NAS suppliers who offer fast performance, so we asked agámi’s Paul Speciale, VP of product management, to position the company’s AIS (agámi Information Server) products against some other suppliers’ products: BlueArc, Isilon, Acopia, NetApp, EMC, and Sun.
He provided very clear and honest answers as you will see.
TW: How do the AIS products compare and contrast with BlueArc? I suspect slower performance but less cost and lower power consumption plus the built-in replication and snapshotting and iSCSI.
PS: BlueArc and agámi products deliver many common features for the NAS and unified storage space, and actually offer quite similar performance (fast throughput and high IOPS), but we do it through radically different architectures. This impacts TCO in a pretty big way.
To start, the agámi Information Server series of "unified" (NAS and IP SAN) appliances deliver the following properties:
• Over 1 Gigabyte/sec aggregate throughput (our AIS6000 systems can saturate 11 of their 12 Gigabit Ethernet ports)
• ~20,000 "file ops/sec" in our AIS6000 series (analogous to the SpecSFS benchmark mix, though we haven't yet officially published a SpecSFS result) using only 48 x 7,200 rpm SATA-II disk drives. BlueArc will always use 15K rpm FC drives for SpecSFS results (e.g., 104 x 15K rpm 72GB Fibre Channel drives for 50,858 ops/sec). So that's roughly twice the drives and twice the spindle speed (hence faster seeks and access times) for only 2.5x the ops/sec.
• Integrated "live" file system replication - transactionally consistent replication at the logical (not physical) layer for guaranteed data consistency
• Very Low Cost of Ownership: low CapEx (as low as $4K/TB list price for high-performance storage), and low OpEx through simplified management (no element level details to manage - i.e. automated volume/RAID management), lower power consumption (1300 watts for 24TB raw - hence less than 60 watts per TB, hence lower cooling demands), and finally very high density (almost 5TB per rack unit in our 24TB system AIS6124)
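The figures quoted in the list above are easy to sanity-check. The back-of-the-envelope sketch below simply recomputes the ratios from the numbers as given (the drive counts, wattage and capacity are taken from the text, not independently verified):

```python
# Back-of-the-envelope check of the figures quoted above.

# agámi AIS6000: ~20,000 file ops/sec from 48 x 7,200 rpm SATA drives
agami_ops, agami_drives, agami_rpm = 20_000, 48, 7_200
# BlueArc SpecSFS result cited: 50,858 ops/sec from 104 x 15K rpm FC drives
bluearc_ops, bluearc_drives, bluearc_rpm = 50_858, 104, 15_000

print(f"drive ratio:   {bluearc_drives / agami_drives:.2f}x")   # ~2.17x
print(f"spindle ratio: {bluearc_rpm / agami_rpm:.2f}x")         # ~2.08x
print(f"ops ratio:     {bluearc_ops / agami_ops:.2f}x")         # ~2.54x

# Power and density: 1,300 W for 24 TB raw in a 5U chassis
watts, tb, rack_units = 1_300, 24, 5
print(f"{watts / tb:.1f} W/TB, {tb / rack_units:.1f} TB/U")     # ~54 W/TB, ~4.8 TB/U
```

The arithmetic bears out the claims: under 60 watts per raw terabyte, and just under 5TB per rack unit.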
agámi has built the AIS using the following architecture:
• 19”, 5U rack-mount chassis
• 4 x AMD Opteron CPUs (64bit SMP configuration - with full shared memory)
• HyperTransport-based I/O to disk & network controllers (2 x 2.4GB/sec per 24 disks, 1 x 2.4GB/sec to the 12 Gigabit Ethernet ports)
• Dedicated SATA-II channel for each of the 48 SATA-II disks (no shared FC-AL loop or SCSI bus)
• 12GB RAM (read cache) & 2GB NVRAM (write cache)
On top of this hardware platform we have created a complete storage operating system called agámiOS. At its base we started with a stock 64-bit/SMP Linux kernel, heavily modified for resiliency (a modified MD layer for recoverability after system failures, including a power failure during a write) and with an optimised I/O path (redundant copies eliminated).
We then layer our own Systems Management Service (SMS) & virtual file system layer on this hardened Linux to provide a 64-bit, multi-threaded file system (agámiFS), with embedded live file system replication (agámiFSR) and automated failover (agámiHA) for non-stop operations.
The system also provides a full feature stack including NFS, CIFS (including mixed mode), NDMP, iSCSI, Snapshots (implemented as Copy-on-Write), and Quotas (user, group, quota tree). The Systems Management layer includes comprehensive thermal/environmental monitoring of all physical components (fans, power supplies, disk temperatures, sector failures, CPUs, etc.) as well as SMTP alerting and "call-home" to agámi support. The system may be managed through an HTTPS/browser-based interface or a Command Line Interface (CLI), and may be monitored over SNMP with full MIB-II support.
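Copy-on-write snapshots, as mentioned above, can be illustrated with a toy sketch (a hypothetical block-map model for illustration only, not agámiOS's actual data structures): the snapshot shares the live blocks until one is overwritten, at which point only the original copy of the changed block is preserved.

```python
class CowVolume:
    """Toy copy-on-write volume: a snapshot shares blocks with the
    live volume until a block is overwritten."""

    def __init__(self, blocks):
        self.blocks = dict(enumerate(blocks))  # block number -> data
        self.snapshots = []                    # each is {block number: old data}

    def snapshot(self):
        self.snapshots.append({})              # empty: nothing copied yet
        return len(self.snapshots) - 1

    def write(self, n, data):
        # Preserve the original block in any snapshot that hasn't saved it yet.
        for snap in self.snapshots:
            if n not in snap:
                snap[n] = self.blocks[n]
        self.blocks[n] = data

    def read_snapshot(self, snap_id, n):
        # A snapshot returns its preserved copy, else the shared live block.
        return self.snapshots[snap_id].get(n, self.blocks[n])

vol = CowVolume(["a", "b", "c"])
s = vol.snapshot()
vol.write(1, "B")                  # only block 1 gets a preserved copy
print(vol.read_snapshot(s, 1))     # "b" - the snapshot still sees the old block
print(vol.blocks[1])               # "B" - the live volume sees the new write
```

The appeal of this scheme is that taking a snapshot is instantaneous and consumes space only in proportion to subsequent changes.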
Our agámiFSR live file system replication works at the logical level, capturing file system and iSCSI "operations" in a transactionally consistent manner in our file system log (journal), and replicating them in an atomic manner (all or nothing), in time order, and efficiently (at the byte level of change-as per the application's write granularity), for best network efficiency and guaranteed consistency of the replica file system or target.
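The journal-based scheme described above can be sketched in miniature (a simplified model with invented names; the real agámiFSR protocol is of course far more involved): operations are captured in a log in time order, then shipped to the replica and applied as an all-or-nothing batch, in sequence.

```python
class ReplicatedLog:
    """Toy journal-based replicator: operations are logged in time order
    and applied to the replica as all-or-nothing (atomic) batches."""

    def __init__(self):
        self.journal = []        # (sequence number, operation) pairs
        self.replica = {}        # replica file system state: path -> bytes
        self.replicated = 0      # sequence number the replica has applied

    def log_write(self, path, offset, data):
        # Capture the operation at byte granularity, as the application wrote it.
        seq = len(self.journal)
        self.journal.append((seq, (path, offset, data)))

    def replicate(self):
        batch = self.journal[self.replicated:]
        # Stage the whole batch, in sequence order, then switch atomically:
        # the replica is never visible in a half-applied state.
        staged = dict(self.replica)
        for seq, (path, offset, data) in batch:
            content = staged.get(path, b"")
            # Simplified write model: overwrite from `offset` onward.
            staged[path] = content[:offset].ljust(offset, b"\0") + data
        self.replica = staged
        self.replicated = len(self.journal)

log = ReplicatedLog()
log.log_write("/a.txt", 0, b"hello")
log.log_write("/a.txt", 5, b" world")
log.replicate()
print(log.replica["/a.txt"])   # b'hello world'
```

Because only the logged byte ranges cross the wire, network traffic tracks the application's actual write granularity rather than whole blocks or volumes.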
Replication may be enabled "per" file system or iSCSI target, and both synchronous and asynchronous replication modes are supported. Each file system or target may also replicate to an independent target (different machines/IP addresses or physical locations).
In the current release, agámiFSR is "one-to-one" between file system or target pairs, but in the summer of this year (2007) we will enable two multi-site replication modes (one-to-many and cascading). In addition, we build our automated failover feature (agámiHA) on top of the synchronous version of agámiFSR. agámiHA provides a fully redundant (no single point of failure, i.e., no shared disk array) high-availability solution, with integrated heartbeating between file system pairs and automated, transparent failover (no need to remount or re-access data after failover).
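The heartbeat-driven failover just described follows a familiar pattern, sketched below as a toy (invented names and thresholds; agámiHA's actual protocol is not public in this detail): the standby tracks heartbeats from the primary and promotes itself once enough consecutive beats are missed.

```python
import time

class FailoverPair:
    """Toy heartbeat monitor: the standby promotes itself to primary
    when the peer misses enough consecutive heartbeats."""

    def __init__(self, interval=1.0, misses_allowed=3):
        self.interval = interval              # expected heartbeat period (s)
        self.misses_allowed = misses_allowed  # tolerated missed beats
        self.last_beat = time.monotonic()
        self.role = "standby"

    def heartbeat(self):
        # Called whenever a heartbeat arrives from the peer.
        self.last_beat = time.monotonic()

    def check(self, now=None):
        now = time.monotonic() if now is None else now
        if now - self.last_beat > self.interval * self.misses_allowed:
            self.role = "primary"             # take over serving data
        return self.role

pair = FailoverPair(interval=1.0, misses_allowed=3)
pair.heartbeat()
print(pair.check(pair.last_beat + 1.0))   # "standby" (within tolerance)
print(pair.check(pair.last_beat + 5.0))   # "primary" (3 beats missed)
```

The "transparent" part of agámi's claim, i.e. clients not needing to remount, would additionally require the survivor to assume the failed node's network identity, which this sketch does not attempt.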
Our agámiFS file system is intrinsically a distributed file system (we acquired IP in the distributed file system space when we started the company in 2003), and plan to enable this for Global Namespace capabilities in a release in early 2008.
Parallelised RAID reconstruction
To add to the reliability engineered into AIS, we have innovated with a modified RAID-5 scheme we term RAID5/ES (Enterprise Sparing), which speeds up RAID recovery on these high-density enterprise-class SATA drives. Enterprise-class SATA disk drives (e.g., Western Digital RAID Edition) provide densities up to 500GB today (750GB in Q3 2007), and simultaneously provide 1.2M hours MTBF, a 100 percent duty-cycle rating, and very low bit read error rates (1 in 10^15 bits read - an important criterion during a RAID rebuild, since a bit read error is more probable than a second drive failure), which puts these drives on a par with most high-end Fibre Channel drives in terms of reliability.
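The bit-error-rate point is worth quantifying. The rough sketch below estimates the chance of hitting an unrecoverable read error while rebuilding one failed drive in a RAID-5 group (the group size is an assumed example, and errors are treated as independent, which is a simplification):

```python
# Rough probability of an unrecoverable read error (URE) while rebuilding
# one failed drive in a RAID-5 group, assuming independent bit errors.

def rebuild_ure_probability(drives_read, drive_bytes, bit_error_rate):
    bits_read = drives_read * drive_bytes * 8
    return 1 - (1 - bit_error_rate) ** bits_read

surviving = 7                 # assumed: surviving drives read during rebuild
size = 500 * 10**9            # 500GB per drive, as cited above

# Enterprise SATA (1 in 10^15) vs a typical desktop-class rate (1 in 10^14):
for ber in (1e-15, 1e-14):
    p = rebuild_ure_probability(surviving, size, ber)
    print(f"BER {ber:.0e}: {p:.1%} chance of a URE during the rebuild")
```

Reading roughly 3.5TB to rebuild such a group, the enterprise-class rate keeps the URE risk down in the low single-digit percentages, an order of magnitude better than desktop-class drives, which is why the 10^-15 figure matters more than MTBF during a rebuild.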
On top of this, our RAID5/ES does logical RAID5 at the "partition" level: we subdivide large drives logically into smaller 50GB partitions, and then create RAID5 groups at the partition level. We then eliminate the use of dedicated spare disk drives (all disks in the array are active), and instead reserve a portion of each drive as "spare" capacity that can be used during a RAID rebuild. If a drive fails, we can then perform RAID reconstruction in parallel, with a degree of parallelism equal to the number of active partitions on the failed drive. This speeds up rebuilds dramatically - we have measured on the order of 3.5 hours to rebuild a full 400GB drive, for example, under no load.
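The parallel-rebuild idea can be sketched as follows (an illustrative model with an invented layout, not agámi's actual placement algorithm): each drive holds several 50GB partitions, each belonging to a different RAID-5 group spread across other drives, so a failed drive's partitions can be reconstructed concurrently onto spare space scattered around the array.

```python
import itertools

DRIVES = 48
PARTITIONS_PER_DRIVE = 8       # e.g. a 400GB drive split into 50GB slices

# Invented round-robin layout: spread each drive's partitions over
# distinct RAID-5 groups so a failed drive's slices land everywhere.
layout = {}                    # (drive, slot) -> RAID group id
for drive, slot in itertools.product(range(DRIVES), range(PARTITIONS_PER_DRIVE)):
    layout[(drive, slot)] = (drive + slot * 7) % DRIVES   # arbitrary spread

def rebuild_parallelism(failed_drive):
    """Partitions on the failed drive belong to distinct groups, so each
    can be rebuilt independently onto spare space on surviving drives."""
    groups = {layout[(failed_drive, s)] for s in range(PARTITIONS_PER_DRIVE)}
    return len(groups)

# A classic dedicated-spare design rebuilds as one serial stream onto one
# disk; here up to PARTITIONS_PER_DRIVE rebuilds proceed in parallel,
# reading from and writing to many spindles at once.
print(rebuild_parallelism(0))   # 8 concurrent partition rebuilds
```

The win comes from removing the single-spare-disk write bottleneck: both the reads and the writes of the rebuild fan out across the whole array.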
This contrasts dramatically with BlueArc, which uses a proprietary FPGA-based architecture and depends on high-end Fibre Channel HBAs and disk drives to deliver its high performance, thereby increasing cost. While BlueArc certainly delivers high performance, as well as tiered storage, it requires fast (15K rpm) disk drives for best performance and suffers lower performance when configured with lower-rpm, lower-cost drives. BlueArc therefore cannot match agámi’s price/performance curve (we ride on commodity components and enterprise-class SATA disk drive pricing for high performance), which gives us a dramatically lower BOM (bill of materials) cost for equal performance. In addition, our use of these higher-density spindles gives agámi advantages in density, power and cooling.
(Part two of this interview continues with comparisons of agámi versus Isilon, Acopia, NetApp, EMC and Sun's X4500 Thumper hybrid NAS product.)