Supercomputer to connect to 400PB of storage via Ethernet
5PB of nearline storage will run over 40Gbps Ethernet on Blue Waters
By Lucas Mearian | Computerworld US | Published: 17:03, 24 May 2012
The National Center for Supercomputing Applications in the US is rolling out a storage infrastructure that will include 380 petabytes of magnetic tape capacity and 25 petabytes of online disk storage made up of 17,000 SATA drives.
The massive storage infrastructure is designed to support one of the world's most powerful supercomputers, called Blue Waters. Commissioned by the National Science Foundation (NSF), Blue Waters is expected to have a peak performance of 11.5 petaflops, though the specification given by the NSF is for it to offer 1 petaflop of sustained computing power for applications.
The NCSA, which is run by the University of Illinois, has awarded Cray a contract to build the supercomputer. The system will run a Lustre parallel file system with more than 1 terabyte per second of throughput to its back end storage.
The Blue Waters project will create a 1 petaflop supercomputer to handle real-world science and engineering applications. Among other things, it will aid in understanding how the cosmos evolved after the Big Bang, help predict the courses of hurricanes and tornadoes, and play a role in the design of new materials at the atomic level.
The supercomputer will be made up of more than 235 Cray XE6 cabinets using 380,000 AMD Opteron 6200 Series x86 processors and more than 30 cabinets of a future version of the recently announced Cray XK6 supercomputer with 3,000 NVIDIA GPUs. The system will include 1.5PB of aggregate memory from 190,000 memory DIMMs.
In support of all that computing power, the NCSA is deploying 25PB of disk storage using Cray Sonexion storage systems. Sonexion is a rebranding of Xyratex storage arrays. The system offers up to 1TBps of aggregate bandwidth over 40Gbps Ethernet from Extreme Networks.
"We've been working heavily with networking vendors to make sure they're ready to go with the 40 gigabit Ethernet," said Michelle Butler, the NCSA's senior technical program manager in charge of storage and network engineering. "We're not the first to use 40Gbps Ethernet, but we're among only a few."
Key to using the 40Gbit Ethernet network is the ability to carve up the pipe into multiple 10Gbps Ethernet channels, to enable the NCSA to spread the fabric across multiple ports, Butler said. The Ethernet will be used to connect about 75 hosts.
The NCSA also selected a DataDirect Networks SFA 12K storage array, which delivers 100GBps of storage performance, to offload data to the "nearline" tape library system. The disk subsystem is scalable to 500PB of capacity, Butler said.
"That subsystem has to be able to offload that terabyte-per-second file system, so we needed a very large tape drive infrastructure to be able to offload that," she said.
Behind the primary storage are four Spectra Logic 17-frame T-Finity tape libraries that will house 366 IBM TS1140 enterprise-class tape drives, each rated at 240MB/sec. The libraries will offer an aggregate read/write rate of up to 2.2PB per hour.
"We actually evaluated either LTO-5 or LTO-6 and the TS1140. We didn't specify what tape drives, what libraries or anything. We wanted the vendors to have the freedom to propose multiple solutions to us," Butler said.
Butler said the NCSA chose the IBM tape drives over more popular midrange LTO drives because they offer superior performance. The TS1140 offers 240MB/sec. throughput compared to about 140MB/sec. for the LTO drives, she said.
In its request for proposals, Butler's team highlighted 10 to 15 requirements for storage vendors to meet. Among other things, they stipulated that the tape library would have to fit into a certain square footage, could not exceed certain power and cooling requirements, and should meet certain reliability and performance targets.
Butler said the goal for aggregate throughput on the tape library was 100GB/sec. Currently, it's right around 89.5GB/sec.
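As a rough check on those figures, the arithmetic can be sketched in a few lines of Python. The sketch multiplies the 366 drives by the 240MB/sec. native rate quoted above and estimates how many drives of each type a 100GB/sec. goal would take. It assumes decimal units (1GB = 1,000MB) and that every drive streams at its rated speed at once, which real workloads will not always achieve; the function names are illustrative only.

```python
import math

# Figures quoted in the article
DRIVE_COUNT = 366          # IBM TS1140 drives across the four libraries
TS1140_MB_PER_SEC = 240    # rated throughput per TS1140 drive
LTO_MB_PER_SEC = 140       # approximate throughput Butler cited for LTO
GOAL_GB_PER_SEC = 100      # aggregate throughput goal for the tape library


def aggregate_gb_per_sec(drives, mb_per_sec):
    """Aggregate rate if every drive streams at its rated speed (1GB = 1,000MB)."""
    return drives * mb_per_sec / 1000


def drives_needed(goal_gb_per_sec, mb_per_sec):
    """Smallest whole number of drives whose combined rated speed meets the goal."""
    return math.ceil(goal_gb_per_sec * 1000 / mb_per_sec)


rated = aggregate_gb_per_sec(DRIVE_COUNT, TS1140_MB_PER_SEC)
print(f"366 TS1140 drives at rated speed: {rated:.2f} GB/sec")
print(f"TS1140 drives needed for the goal: {drives_needed(GOAL_GB_PER_SEC, TS1140_MB_PER_SEC)}")
print(f"LTO drives needed for the goal: {drives_needed(GOAL_GB_PER_SEC, LTO_MB_PER_SEC)}")
```

Under these assumptions the rated aggregate comes to 87.84GB/sec., in the same range as the roughly 89.5GB/sec. Butler reported, and the drive counts show why the per-drive speed difference matters at this scale.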
The Cray supercomputer is connected to its tape library storage network via Mellanox IS5000 InfiniBand switches and ConnectX InfiniBand adapters. The switches use the InfiniBand QDR (Quad Data Rate) protocol, which offers up to 8Gbps of throughput per lane over as many as 12 I/O lanes. Butler said she wanted to use the higher-bandwidth version of InfiniBand, FDR, but Cray's systems didn't support it.
InfiniBand FDR (Fourteen Data Rate) offers up to 13.64Gbps of throughput per lane, or 163.6Gbps over 12 I/O lanes.
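The per-lane figures for QDR and FDR follow from each generation's signaling rate and line encoding. The Python sketch below shows that calculation. The signaling rates (10Gbps and 14.0625Gbps) and encodings (8b/10b and 64b/66b) come from the InfiniBand specifications rather than from Butler's comments, and the function name is illustrative only.

```python
LANES = 12  # widest link configuration mentioned in the article


def data_rate_gbps(signaling_gbps, payload_bits, encoded_bits):
    """Usable data rate per lane after line-encoding overhead."""
    return signaling_gbps * payload_bits / encoded_bits


# QDR: 10Gbps signaling per lane with 8b/10b encoding
qdr_lane = data_rate_gbps(10.0, 8, 10)
# FDR: 14.0625Gbps signaling per lane with 64b/66b encoding
fdr_lane = data_rate_gbps(14.0625, 64, 66)

for name, lane in (("QDR", qdr_lane), ("FDR", fdr_lane)):
    print(f"{name}: {lane:.2f} Gbps per lane, {lane * LANES:.1f} Gbps over {LANES} lanes")
```

Most of FDR's gain over QDR comes from the faster signaling, with the rest from the lighter 64b/66b encoding, which spends about 3% of the bits on overhead compared with 20% for 8b/10b.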
While the NCSA could have chosen products from any number of enterprise-class disk storage vendors for use in the supercomputer, Butler and her team felt the NCSA would receive better support if it all came from Cray.
"Lustre, as you probably know, is not the easiest thing to take care of and maintain, so we wanted to partner with one specific [vendor] for the software and hardware and have an appliance to do the failover and that hard stuff. And, we've been running Lustre here since 2003," Butler said. "So I understand [Cray] trying to simplify our system for us."