Benchmarking Amazon EC2: The wacky world of cloud performance
Drivers of cloud machines should keep a close watch on performance
By Peter Wayner | InfoWorld | Published: 16:24, 01 May 2013
Before turning to the world of cloud computing, let's pause to remember the crazy days of the 1970s when the science of the assembly line wasn't well understood and consumers discovered that each purchase was something of a gamble. This was perhaps most true at the car dealer, where the quality of new cars was so random that buyers began to demand to know the day a car rolled off the assembly line.
Cars built on Monday were said to be a bad risk because everyone was hung over, while cars slapped together on Friday suffered because everyone's mind was already on the weekend. The most paranoid buyers would scrutinize the calendar to see if the car emerged from the factory when midweek descents into bacchanalia like the World Series would fog the brains of everyone with a wrench.
Inconsistencies like that are largely forgotten by today's shoppers thanks to better testing, a greater devotion to quality, and a general exodus of manufacturing from places where people stay up too late drinking too much while watching too much sports. If anything, the modern factories spit out identical items with a consistency that borders on boring. When modern movies show an assembly line, it's often a charisma-free mechanism stamping out perfect duplicates.
Those memories of the 1970s came back to me when I started running benchmark tests on cloud computers. While computers are normally pumped out with such clonelike perfection that adjectives like "soulless" spring to mind, I started to discover that the clone metaphor may not be the best one for the machines available for rent from the compute clouds. The cloud machines are not all alike, produced with precision like the boxes that sit on our desks. They're a bit like those cars of the '70s.
I started learning this after several engineers from the cloud companies groused about the benchmark tests in my big survey of public clouds. The tests weren't sophisticated enough, I was told.
My cloud benchmark results were never meant to be a definitive answer, just a report on what the user might experience walking through the virtual door. Like Consumer Reports, I signed up for a machine, uploaded Java code, and wrote down the results. While the DaCapo benchmark suite has advantages such as processor independence (Java) and a wide collection of popular server-side code (Jython, Tomcat), the results were scattered. The only solution, I concluded, was to test the very code you plan to run because that's the only way to figure out which cloud machines will deliver the best performance for your particular mix of I/O, disk access, and computation.
One engineer didn't like this approach. He started asking pointed questions that were surprisingly complex. What was the precise methodology? Which operating system did the tests run upon? How often were the tests run? Did the tests control for the time of day?
Time of day? Yes, the engineer said. He really wanted to know whether we were paying attention to when we ran the tests. Sure, operating systems are an obvious source of performance differences, and using the latest drivers and patches always makes sense. But he also wanted to know the time of day.
This was new. CPUs have clocks that tick extremely quickly, but they'll stamp out computation at the same rate morning, noon, and night. Was it really important to watch the time? Yes, he said. In other words, the cloud machines may be more like a '70s-era Detroit assembly line than a Swiss watch.
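For readers who want to act on the engineer's questions, here is a minimal sketch in Python of what recording the answers might look like. It is not the harness used for the survey. The `workload` function is a hypothetical stand-in for whatever code you actually plan to run, and the names `run_trial` and `summarize_by_hour` are made up for illustration. Each trial notes the operating system and the UTC hour alongside the elapsed time, so repeated runs can later be compared by time of day.

```python
import platform
import statistics
import time
from collections import defaultdict
from datetime import datetime, timezone


def workload():
    """Hypothetical stand-in for the code you actually plan to run."""
    return sum(i * i for i in range(200_000))


def run_trial(fn):
    """Time one call of fn and record when and where it ran."""
    started = datetime.now(timezone.utc)
    t0 = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - t0
    return {
        "utc_hour": started.hour,       # time of day, as the engineer asked
        "os": platform.platform(),      # which operating system ran the test
        "seconds": elapsed,
    }


def summarize_by_hour(trials):
    """Group trial timings by UTC hour and report median and spread."""
    by_hour = defaultdict(list)
    for trial in trials:
        by_hour[trial["utc_hour"]].append(trial["seconds"])
    return {
        hour: {
            "runs": len(secs),
            "median": statistics.median(secs),
            "min": min(secs),
            "max": max(secs),
        }
        for hour, secs in sorted(by_hour.items())
    }


if __name__ == "__main__":
    # A handful of runs only shows the shape of the output; a real
    # comparison would schedule trials across many hours and days.
    trials = [run_trial(workload) for _ in range(5)]
    for hour, stats in summarize_by_hour(trials).items():
        print(hour, stats)
```

Scheduling a script like this to run around the clock (with cron, for example) and comparing the per-hour medians is one way to see whether the same machine behaves differently at different times of day.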