Five reasons agents are wrong for application performance management
Keep up with the changing needs of business development
By Jesse Rothstein | Network World US | Published: 12:40, 22 November 2011
Despite rapid evolution in the application performance management (APM) market, few enterprise IT organisations would say they have sufficiently solved their application performance problems. If anything, the complexity challenges posed by virtualisation, agile development practices, multi-tier application architectures and other IT mega-trends are outpacing the capabilities of legacy APM products.
In light of this, IT organisations must judiciously evaluate the effectiveness of technologies and management practices they use to manage application performance. One conventional APM practice that deserves some scrutiny is the derivation of application health and performance data from host-based instrumentation.
Most APM technologies rely on agents deployed on servers or within application components to gather diagnostic data. These agents typically perform bytecode instrumentation or call-stack sampling within the Java Virtual Machine (JVM) or the .NET Common Language Runtime (CLR), essentially applying profiling techniques common to software development tools.
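To make the mechanism concrete, the call-stack-sampling half of this technique can be sketched in a few lines of Python. This is a toy illustration only, not how a JVM or CLR agent is actually implemented (those hook the runtime through bytecode rewriting or profiling APIs); the function names here are invented for the example.

```python
import collections
import sys
import threading
import time

def sample_stacks(duration_s=0.5, interval_s=0.01):
    """Periodically snapshot every thread's call stack and count how
    often each function appears -- the core idea behind call-stack
    sampling profilers: functions that show up often are hot spots."""
    counts = collections.Counter()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        for frame in sys._current_frames().values():
            f = frame
            while f is not None:          # walk the whole stack
                counts[f.f_code.co_name] += 1
                f = f.f_back
        time.sleep(interval_s)
    return counts

def busy_worker(stop):
    # A deliberately hot function the sampler should catch.
    while not stop.is_set():
        sum(i * i for i in range(1000))

stop = threading.Event()
worker = threading.Thread(target=busy_worker, args=(stop,))
worker.start()
hot = sample_stacks()
stop.set()
worker.join()
print(hot.most_common(3))
```

Even this toy version hints at the trade-off the article describes: the sampler itself consumes CPU time on the monitored system, and how much depends on how often and how deeply it samples.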
Certainly, this practice can yield useful information for managing application performance, including memory usage and the frequency and duration of function calls. However, this legacy APM approach suffers from five inherent drawbacks that make it increasingly untenable in today's IT environments.
Susceptibility to changes in application code, architecture and environment
During test and development, software engineers often use profilers to locate hot spots and remove bottlenecks in their code. While annotated source code and deep call stacks are acceptable for developers, they are less useful to operations teams. In production, operations teams need to answer higher-level questions about application health and performance. To provide this view, agent-based APM tools require complicated configurations that are sensitive to changes in the application code, architecture or environment.
This limitation may not have been a serious problem in the static environments of the past, but today's applications undergo ongoing, iterative development, use loosely coupled multi-tier architectures, run on heterogeneous software and hardware platforms, and operate in virtualised environments where virtual machines are spun up, spun down and migrated across the data centre. With such rapid change at the application tier, host-based data gathering requires continual recertification and redeployment to ensure that it is functioning properly.
System and network overhead
APM vendors that rely on host-based data gathering claim that their approach imposes "minimal overhead" or "low overhead" on system performance, yet these vendors seldom offer guarantees. While the actual overhead incurred depends on the specificity of the data gathered and the application itself, less than 5% performance overhead is an optimistic general estimate.
Five percent overhead might be acceptable for some applications, but the problem is compounded when organisations use multiple monitoring tools, each with its own agent consuming system resources. In addition, host-based data gathering consumes bandwidth and creates significant noise on the network as data is sent to a central server, and it can perturb the very system being monitored.
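The compounding effect is easy to quantify under a simple, assumed multiplicative model in which each agent consumes a fraction of whatever capacity the previous agents left behind:

```python
def combined_overhead(per_agent_overheads):
    """Overheads compound multiplicatively under this model: each agent
    slows down the capacity that remains after the others, so the total
    is 1 - product(1 - o_i) rather than a simple sum."""
    remaining = 1.0
    for o in per_agent_overheads:
        remaining *= (1.0 - o)
    return 1.0 - remaining

# Three monitoring agents, each claiming "only 5% overhead":
total = combined_overhead([0.05, 0.05, 0.05])
print(f"{total:.1%}")
```

Under that model, three agents each costing 5% impose roughly 14.3% in total; if the agents contend for the same resources (CPU caches, network interfaces), the real figure can be worse.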
Deployment and maintenance burden
Host-based APM tools require agents or other collectors to be deployed and maintained on every system they monitor. These agents must be patched, updated and upgraded regularly, and recertified with each operating system service pack. Many IT organisations take this management burden for granted, treating it as a necessary evil.
Newer entrants to the APM market have rightfully targeted traditional vendors such as CA and HP for the extraordinary time and cost required to deploy their technologies. These newer vendors skirt the complexity problem by limiting the deployment options and the level of detail of the collected data. In essence, these low-cost, host-based APM tools trade specificity for simplicity.
Limited visibility of network performance
A strictly host-based view of application performance can provide only secondary indicators of network performance issues that affect application delivery. To compensate, most APM tools that rely on host-based data also offer a separate network-monitoring component.
Skewed end-user experience measurements within virtualised environments
In virtualised environments, the hypervisor schedules CPU time across multiple guest operating systems. As a side effect, each guest's perception of time is distorted: when a guest is not scheduled, time effectively stops from its perspective, and when it is rescheduled, the hypervisor must catch it up by advancing its clock rapidly.
This stopping and fast-forwarding of the clock produces incorrect measurements when a host-based agent attempts to time the end-user experience. In some cases, the agent can query the hypervisor to determine how long time was stopped. Although this workaround gives a rough sense of how inaccurate the metrics are, there is no definitive solution to the problem.
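An agent can at least detect the distortion by sampling a monotonic clock at fixed intervals and flagging intervals that overran badly; on a VM guest, large gaps correspond to periods when the guest was descheduled. The sketch below is an illustrative assumption, not a product API, and the thresholds are arbitrary.

```python
import time

def detect_time_jumps(expected_interval_s=0.01, samples=50, tolerance_s=0.05):
    """Sleep in fixed intervals and record any interval that overran
    the expected duration by more than the tolerance. On a descheduled
    VM guest, the hypervisor's catch-up shows up here as sudden large
    gaps -- the same gaps that skew an agent's latency measurements."""
    jumps = []
    prev = time.monotonic()
    for _ in range(samples):
        time.sleep(expected_interval_s)
        now = time.monotonic()
        gap = now - prev
        if gap - expected_interval_s > tolerance_s:
            jumps.append(gap)
        prev = now
    return jumps

print(detect_time_jumps())
```

On an unloaded physical host the list is usually empty; a busy guest would report the gaps, but knowing a measurement is wrong is not the same as correcting it.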
Not surprisingly, APM vendors that rely on host-based data differentiate themselves in their marketing claims by minimising these limitations. Thankfully, detailed health and performance information does not require gathering data directly from the hosts at all.
Recent gains in processing power and storage capacity have made feasible a network-based APM approach that performs deep, real-time analysis of application transactions as they pass over the wire. By reassembling application transactions from network traffic and analysing the application-layer details at Layer 7, network-based APM can give IT teams valuable insight into application performance, such as a particularly slow database procedure, a specific web server error or the method used in a transaction.
In addition, this approach extracts valuable network performance metrics from Layer 2 through Layer 4, providing industry-leading TCP analysis and other network-level information.
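As a rough illustration of the wire-data idea (the `Packet` type and function names below are invented for this sketch; real products reassemble full TCP streams and handle far more protocol detail), a passive analyser can parse Layer 7 fields and derive transaction timing without touching the host at all:

```python
from dataclasses import dataclass

@dataclass
class Packet:
    ts: float        # capture timestamp, in seconds
    payload: bytes   # reassembled TCP payload

def analyse_http_transaction(request: Packet, response: Packet):
    """Parse the HTTP request line and status line from captured
    payloads and compute server processing time from the capture
    timestamps -- no agent on the server involved."""
    method, path, _ = request.payload.split(b"\r\n", 1)[0].split(b" ", 2)
    _, status, _ = response.payload.split(b"\r\n", 1)[0].split(b" ", 2)
    return {
        "method": method.decode(),
        "path": path.decode(),
        "status": int(status),
        "server_time_s": response.ts - request.ts,
    }

req = Packet(ts=100.000,
             payload=b"GET /orders HTTP/1.1\r\nHost: shop.example\r\n\r\n")
resp = Packet(ts=100.250,
              payload=b"HTTP/1.1 500 Internal Server Error\r\n\r\n")
print(analyse_http_transaction(req, resp))
```

Because the timestamps come from the capture point rather than the guest's clock, this style of measurement also sidesteps the virtualised-time problem described above.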
Now is the time to reassess old assumptions. Already-pressed IT teams cannot afford to deal with the headaches associated with host-based instrumentation any longer. IT organisations owe it to themselves to consider new approaches to managing their application performance. Network-based APM offers an elegantly simple alternative that delivers comprehensive health and performance data while completely avoiding the problems inherent in gathering host-based performance data.
Jesse Rothstein is the CEO and co-founder of ExtraHop Networks