Three Myths About Big Data and APM

In my last blog post, I focused on how a big data approach can be applied to application performance monitoring (APM) to accelerate mean time to repair. Unlike data sampling or trigger-based capture, a big data approach to APM ensures that all the data is available when and as needed. APM big data can be used even in dynamic application environments that rely on microservices, virtualization, and cloud services.

Alongside the misconceptions about how useful big data really is for APM, myths are proliferating about whether it is even feasible. In this post, I will address three big myths about applying big data technology to APM.

What is APM big data?

APM data can be classified into two categories of measurements:

Data that describes the application workloads and informs how the application behaves/performs.
Data that describes the application environment, in the form of key metrics that show how the infrastructure responds to the demands of the application.

The size of APM big data is a byproduct of capturing raw, unfiltered data. Collecting, storing, analyzing, and accessing petabytes of data is an engineering feat, but the real value is that those petabytes are the complete, unfiltered record that we want to explore, understand, and draw insight from. Different aspects of big data apply to each category of measurements, but in both cases the end goal is a complete and correct understanding.

The tradeoff

Many APM products are not architected to support big data and instead force their customers to make tradeoffs between scale and data quality.

When APM goes for data quality and depth, it often can’t scale to support the application ecosystem…

…Or it scales by sacrificing data quality and depth.

APM vendors often address scalability requirements with sampling and trigger-based transaction tracing, by aggregating data, and by sacrificing call stack depth and detail. Or else they focus on just one or two applications, and deploy multiple analysis consoles.

You can see that these are unsatisfying options that do not provide the completeness of data needed to resolve performance problems, especially in ephemeral environments where resources are spun up and down based on demand and where shared dependencies mean that a failure in one component can negatively impact multiple transactions.
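
To make the tradeoff concrete, here is a minimal Python sketch built on invented assumptions: 100 synthetic transactions, exactly one of which is slow, and a hypothetical sampler that keeps every tenth transaction. The names and numbers are illustrative only and do not describe any particular APM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    txn_id: int
    duration_ms: float


def make_transactions(count=100, slow_id=37):
    """Build synthetic transactions; exactly one (slow_id) is slow."""
    return [
        Transaction(txn_id=i, duration_ms=4000.0 if i == slow_id else 50.0)
        for i in range(count)
    ]


def sample_every_nth(transactions, n=10):
    """Hypothetical sampler: keep only every n-th transaction."""
    return [t for t in transactions if t.txn_id % n == 0]


def find_slow(transactions, threshold_ms=1000.0):
    """Return the IDs of transactions slower than the threshold."""
    return [t.txn_id for t in transactions if t.duration_ms > threshold_ms]


all_txns = make_transactions()
sampled = sample_every_nth(all_txns)

print("full capture finds:", find_slow(all_txns))    # [37]
print("sampled capture finds:", find_slow(sampled))  # []
```

The sampled view is ten times smaller, but the one transaction worth investigating is not in it. Real samplers are usually smarter than this, yet the same gap appears whenever the interesting event is rare.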

Myth 1: not scalable

Capturing detailed transaction records for every user action and backend activity does generate a lot of data, but APM is no different than other industries that have engineered big data solutions. The same big data techniques and architecture are applicable to APM.

For example, highly optimized non-relational data stores, together with streaming and complex event processing architectures, are well known and widely applied in distributed systems. Federating the data processing and analytics across the APM components also helps distribute the workload, much as you would see in a Hadoop cluster.

With a three-tier processing approach, the in-app collectors do minimal work and pass the raw data to a local processor, which performs compression and some base analytics and then sends the data to the central engine. This scales far better than a two-tier approach in which in-app collectors tax the application or the backend system.
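
The three tiers can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any specific product works: the function names (`collect`, `process_locally`, `ingest_centrally`), the JSON record layout, and the choice of zlib for compression are all hypothetical.

```python
import json
import zlib
from collections import Counter


def collect(txn_id, method, duration_ms):
    """Tier 1 (in-app collector): do minimal work, just emit the raw record."""
    return {"txn_id": txn_id, "method": method, "duration_ms": duration_ms}


def process_locally(raw_records):
    """Tier 2 (local processor): base analytics plus compression."""
    calls_per_method = Counter(r["method"] for r in raw_records)
    summary = {
        "count": len(raw_records),
        "max_duration_ms": max(r["duration_ms"] for r in raw_records),
        "calls_per_method": dict(calls_per_method),
    }
    payload = json.dumps({"summary": summary, "records": raw_records})
    return zlib.compress(payload.encode("utf-8"))


def ingest_centrally(compressed_batch, store):
    """Tier 3 (central engine): decompress and persist every record."""
    batch = json.loads(zlib.decompress(compressed_batch).decode("utf-8"))
    store.extend(batch["records"])
    return batch["summary"]


raw = [
    collect(1, "checkout", 120.0),
    collect(2, "checkout", 95.0),
    collect(3, "search", 310.0),
]
central_store = []  # stand-in for the central data store
summary = ingest_centrally(process_locally(raw), central_store)

print(summary)
print(len(central_store), "records stored, none dropped")
```

The point of the split is that the expensive steps (aggregation, serialization, compression) run outside the application, while the central engine still receives every record rather than a sample.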

Myth 2: hard to manage

Continuously capturing all transactions in detail is far more resilient and easier to manage because it does not rely on a complex rules/inspection/trigger engine that must be configured and maintained. More data is certainly persisted, but managing storage is a solved problem at this point, and storage is a much cheaper resource than compute.

Myth 3: prohibitive overhead

Overhead is a key concern for any technology that sits in line with the application, and APM agents are no different. Continuously capturing transactions to enable big data APM requires different techniques for observing, recording, and persisting transactions so that the application does not suffer from overhead. Agents dynamically discover the application stack and record every critical method and application call in a transaction representation. Transactions are then compressed in flight and streamed through a three-tier distributed architecture.
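
As a rough illustration of keeping in-line work small, the Python sketch below records each call's name and duration and does nothing else on the application's path. It is a simplification under stated assumptions: the `traced` decorator and the `call_records` queue are hypothetical, and explicit decorators stand in for the dynamic discovery that a real agent would perform.

```python
import functools
import queue
import time

# Hypothetical hand-off buffer; a separate thread or process would drain it.
call_records = queue.SimpleQueue()


def traced(func):
    """Record the name and duration of each call with minimal in-line work."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            # Only enqueue here; compression and analysis happen elsewhere.
            call_records.put((func.__name__, elapsed_ms))
    return wrapper


@traced
def lookup_price(item):
    return {"book": 12.5}.get(item, 0.0)


@traced
def checkout(item, quantity):
    return lookup_price(item) * quantity


total = checkout("book", 2)

# Drain the buffer, as a downstream processor would.
recorded = []
while not call_records.empty():
    recorded.append(call_records.get())

print(total)
print([name for name, _ in recorded])
```

The inner call is recorded first because it finishes first. Everything costly, such as compression and streaming, is left to whatever drains the queue.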

Our approach

Riverbed’s Application Performance Monitoring solution is architected to support the requirements of big data. Our product, SteelCentral AppInternals, directly addresses enterprise scalability requirements with proprietary technology that captures and stores every transaction and its associated metadata, down to deep levels of the user code, along with system metrics at 1-second intervals – without impacting application performance. With this, you get immediate insight into application problems, even for infrequent or intermittent issues, and a cost-effective, scalable solution for all your application monitoring needs.

For more musings on big data APM, check out this white paper “Why Big Data is Critical for APM” or learn more about our SteelCentral AppInternals product—available as SaaS or on premises.
