ExtraHop mines the network to glean operations intelligence
Old-school management tools are not tenable in today's highly virtualised environment
By John Dix | Network World US | Published: 10:30, 14 October 2013
Jesse Rothstein, who was the lead architect of F5's flagship product line, founded ExtraHop in 2007 to develop products that derive IT operations intelligence from data gleaned from the network. John Dix, Editor in Chief of Techworld's sister title Network World, recently caught up with Rothstein for an update on the company and what it has learned about things like virtual packet loss, the bane of highly virtualised environments.
How does your background at F5 help you at ExtraHop?
My co-founder Raja Mukerji and I were both at F5 for many years. And what we did at F5 was bring application awareness and application fluency to what was the load balancer, and that created a whole new product category called the application delivery controller. Over at ExtraHop, we leverage that same domain expertise in high-speed packet processing and application fluency, but we've brought it to a new space, much more on the IT operations side, and we're starting to call this IT operations intelligence.
Raja and I had conversations with IT organisations and people we'd worked with in the past, and it became apparent to us that megatrends like server virtualisation, where VMs spin up and spin down and jump across the data centre, and agile development, where we roll out new versions of applications every two weeks or every two days, were resulting in an unprecedented level of scale, complexity and dynamism. The previous generation of tools and technologies that companies use to manage these environments is no longer tenable, and that's if they have those tools at all. More often than not, companies just throw smart people at the problem of figuring out what's going on.
So I would say, No.1, the situation has become such that we're beyond the capability of just throwing smart people at the problem and pulling a few all-nighters and ordering pizza. And No.2, the previous generation of tools were built for much smaller environments that were not dynamic. Those tools basically start off as bricks, and you parachute in teams of sales engineers and systems engineers and consultants to configure them in order to provide the visibility you need. Then if the environment changes, rather than automatically detecting the changes, you have to rinse and repeat that process.
So we started with the notion that these IT megatrends were occurring, that we had the domain expertise to solve some of the problems around scale and dynamism, and that we could provide visibility into these environments.
What are you lumping into the current generation of tools?
This is a taxonomy I've been thinking about for a while. In enterprise IT there are four or so sources of data that you can use to derive some intelligence about your environment.
So No.1 we have machine data, and I'm using a term that Splunk popularised. Machine data includes log files, SNMP and WMI, and all of these data sources are largely unstructured. Splunk and others like them realised that enterprises are producing a lot of this unstructured machine data and not really doing anything with it. So they built a platform to index it, archive it, and analyse it to derive some intelligence from it.
I sometimes joke that it's been transformational in the same way as fracking has been in the energy market. What I mean by that is, the value was always there, but by applying new technology we can now access it and extract it. So I think one source of data in the IT environment is this unstructured machine data.
Another source is what I would call code-level instrumentation. And this is what traditional Application Performance Management is based upon. Wily (acquired by CA) really founded that market, but companies like DynaTrace and AppDynamics and even New Relic make use of code-level instrumentation. They have agents that instrument the Java JVM or the .NET common language runtime, and they can derive some intelligence and some performance metrics around how that service performs. Where are the hotspots and bottlenecks? What's it doing? These are very useful tools for developers who have intimate knowledge of the code and want to see how it runs in production.
The third source of data I call service checks. There are lots of facilities for doing this. If you're running some sort of synthetic transaction (basically a script mirroring common user actions), you can use internal checks, which is what HP's Mercury SiteScope and Nagios do today, or external service checks like Keynote or Compuware's Gomez. These give you a sense of whether your service or application is up or down and, to some degree, how it is performing. But there are some challenges with this approach: because these checks are periodic in nature, there's an inherent undersampling problem. That means that if you've got any sort of intermittent issue, you might very well miss it.
And finally the fourth fundamental source of data for intelligence is what we call wire data. That's everything on the network, from the packets to the payload of individual transactions. It is a very deep, very rich source of data. In fact, all indications are that wire data is at least one or two orders of magnitude larger than other sources of data, because there is just so much moving across our networks. And it's definitive. We know that a transaction completes if we can observe it completing on the wire and we can observe the peers in this conversation acknowledge that that transaction completed.
To a large degree wire data has been neglected. Yes, there have been products like network probes and packet sniffers for three decades or more, but I would say they only scratch the surface of what's available on the wire. At ExtraHop we founded the company on the premise that there is this tremendously rich, tremendously deep source of data on the wire, and by leveraging gains in processing power and storage capacity, that we could extract and analyse and derive intelligence from that data. It has required a completely different technology approach than you would do for any of the other sources of data. But it is, I believe, every bit as valuable.
I tell organisations that, as a best practice, they should probably have a product that is focused on each of these four sources. I wish I could say that there's one that does it all, but there isn't, because these do require pretty fundamentally different approaches.
APM providers argue they can see it all, embedded as they are in the applications. What are you providing they can't?
APM is really focused on code-level instrumentation, and there are probably three fundamental differences between us and APM. One is philosophical. We define the application differently. APM tends to define the application as the code running on a server, and they instrument that. At ExtraHop we define the application as the entire application delivery chain. That includes the client devices, the network transport, the front end, the middleware, the transaction queuing, back-end storage and even other ancillary services. It's a chain because if any one link fails, the entire application is down, and any one link can be a bottleneck. I can't tell you how many applications I've seen where the code is running fine but the application fails because of something like DNS resolutions not completing. That has to be considered part of the delivery chain.
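The "chain" argument can be sketched as a simple serial model, assuming every tier must be up for a transaction to succeed, latencies add up along the chain, and the slowest tier caps throughput. The tier names follow the list above, but the `Link` type, `chain_status` function and all numbers are hypothetical; this is an illustration of the reasoning, not ExtraHop's method.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    up: bool
    latency_ms: float    # time this tier adds to each transaction
    capacity_tps: float  # transactions per second this tier can sustain


def chain_status(chain: list[Link]) -> dict:
    """Summarise a serial delivery chain: it is available only if every
    link is up, and its throughput is capped by the weakest link."""
    down = [link.name for link in chain if not link.up]
    return {
        "available": not down,
        "failed_links": down,
        "latency_ms": sum(link.latency_ms for link in chain),
        "bottleneck": min(chain, key=lambda link: link.capacity_tps).name,
        "max_tps": min(link.capacity_tps for link in chain),
    }


# Hypothetical chain: the application code (middleware) is healthy,
# but DNS resolution is failing.
chain = [
    Link("client", True, 5, 10_000),
    Link("dns", False, 0, 50_000),
    Link("network", True, 20, 8_000),
    Link("front_end", True, 15, 2_000),
    Link("middleware", True, 30, 1_500),
    Link("storage", True, 10, 5_000),
]
status = chain_status(chain)
print(status["available"], status["failed_links"], status["bottleneck"])
# False ['dns'] middleware
```

Instrumenting only the middleware tier in this model would report healthy code, while the chain as a whole is down because of DNS.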
No.2 is audience. Traditional APM tends to be used more by developers who have intimate knowledge of the application code, whereas IT operations teams can get more out of our wire data analysis because it is focused on production-level systems. We answer the questions they care about most, like "What's happening right now? Did something change in my environment? Are transactions succeeding or failing? Is this better or worse than it usually is? What resources are people trying to access?"
And the third difference is custom applications versus off-the-shelf packaged applications. APM solutions are much more popular with organisations that are developing custom applications, because they're writing the code, the code is changing and they need to see how it's performing. We really sell to both. Yes, we absolutely are used by organisations that are writing custom applications, but we're also used by organisations that depend on packaged applications they don't know intimately but whose performance they still absolutely care about.
You guys deliver as an appliance, right?
Yes. We're sold as a physical or a virtual appliance.
And where do you plug in?
For us, we just take a copy of the network traffic with no overhead at all. We're not in line, we're out of line. And how we get a copy of the traffic really depends on the environment. Sometimes it's directly from one or more switches using a SPAN port or a VACL capture. Sometimes there is a whole aggregation-tapping layer that's in place. Some organisations even use some pretty advanced SDN techniques to get us traffic to analyse. At the end of the day, if we get a feed of the traffic, we can make sense of it.
But I want to stress that, even though we're a network deployment and we analyse what I'm calling the wire data, we're really answering questions about the health and performance of business-critical applications. So it's not just network teams that use an ExtraHop system. And that's an important distinction, because I see that confusion a lot.
Do you have a sweet spot in terms of customer size?
Our high-end physical appliances can support 20 gigabits per second of line-rate analysis and hundreds of thousands of transactions per second. So we have large enterprises and carriers that use multiple EH8000 appliances across the data centre, with an ExtraHop Central Manager to provide a unified view. Our initial customers were larger enterprises, but we're starting to see more adoption at mid-size organisations, because we also have virtual appliances that can analyse a gigabit per second of traffic and cost less than $10,000.