Introduction

Application Performance Management (APM) tools have been around since circa 2000 when Java emerged as the backend of choice for web applications. Early APM tools provided runtime visibility into the Java “black box” to aid in finding and fixing performance bottlenecks and locating the source of memory leaks.

A lot has happened since then: technologies have come and gone, and application architecture has moved from monolithic to microservices, passing SOA (Service Oriented Architecture) along the way. Then came the rise of cloud computing, radically changing the speed at which applications can be deployed and scaled.

The capabilities required of APM solutions for the new world of containerized microservices environments have shifted since new technologies and techniques have introduced new challenges. An effective APM solution must keep up with the speed of deployment and the dynamic allocation and deallocation of compute resources. This Buyer’s Guide summarizes best practices and advice on the requirements you should consider when selecting an APM solution.

DON’T WAIT! The Time to Change APM is Now!

There is a distinct architectural shift underway as organizations migrate their applications from monolithic and SOA architectures to containerized and orchestrated microservices. The monitoring tools that provided so much value in managing previous architectures are not suitable for container and microservice-based applications. Here are some reasons why:

  • Traditional tools can’t handle ephemeral containers
  • Agent architecture is too heavy (memory and CPU overhead) for hosts that are densely packed with containers
  • Data processing and transmission is too slow (often taking many minutes to see and analyze data)
  • The data collected is not granular enough (metrics granularity of 10 seconds or longer and heavily sampled transaction traces)
  • They require constant manual configuration to stay in sync with dynamic environments
  • Their fixed data models were designed for monolithic and SOA applications, so they cannot model the more complex relationships found, for example, in Kubernetes

Now that you know the shortcomings of the legacy monitoring tools of the past 10 years, let’s explore the requirements for monitoring modern containerized (and even orchestrated) microservices applications.

Areas for Consideration

When assessing the capabilities of potential APM solutions, consider the requirements for both Day 1 and Day 2 operations. For example, a solution may require the inclusion of an additional library with the application code in order to capture data from certain types of language runtimes.

This is a Day 1 requirement: part of the initial setup, not an ongoing requirement. If your organization is already deploying multiple times per day, the extra library can be easily added as part of that workflow. The overall impact on developer time is minimal.

Day 2 operation requirements can be more arduous, as they are typically recurring tasks that have to be completed as part of every release. For example, if health signatures have to be manually updated every time there is a code change or configuration change, then the drain on DevOps resources is more significant.

When performing your APM capabilities assessment, apply weighting to the capabilities depending on whether their impact is a one-off Day 1 operation or a recurring Day 2 operation. It may be more advantageous overall to take a slight hit on Day 1 for smoother running on Day 2.
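The weighting described above can be sketched in a few lines of Python. This is only an illustration: the weights, the 0 to 5 scoring scale, the capability names, and the two example solutions are all assumptions made for the sketch, not figures from this guide.

```python
from dataclasses import dataclass

# Hypothetical weights: recurring Day 2 effort counts three times as much
# as one-off Day 1 effort. Adjust to reflect your own release frequency.
DAY1_WEIGHT = 1.0
DAY2_WEIGHT = 3.0


@dataclass
class Capability:
    name: str
    phase: str  # "day1" or "day2"
    score: int  # 0 (poor) to 5 (excellent), assigned by your evaluators


def weighted_score(capabilities):
    """Return the weighted average score on the same 0-5 scale."""
    weights = {"day1": DAY1_WEIGHT, "day2": DAY2_WEIGHT}
    total_weight = sum(weights[c.phase] for c in capabilities)
    if total_weight == 0:
        return 0.0
    return sum(weights[c.phase] * c.score for c in capabilities) / total_weight


# Solution A is easy to install but needs manual upkeep on every release.
solution_a = [
    Capability("agent install", "day1", 5),
    Capability("alert rule upkeep", "day2", 2),
]
# Solution B takes more setup but is largely automated afterwards.
solution_b = [
    Capability("agent install", "day1", 3),
    Capability("alert rule upkeep", "day2", 4),
]

print(weighted_score(solution_a))  # 2.75
print(weighted_score(solution_b))  # 3.75
```

With these assumed weights, the solution that costs more effort on Day 1 still scores higher overall because its recurring Day 2 burden is lower.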

Data Collection

The first thing to tick off your list of requirements is technical fit – ensuring that the APM solution is capable of monitoring all or most of the technologies you are currently using – and any others you may use in the future. For applications running on microservices, you’ll want to look for a solution that has proven it can keep up with new technologies as they emerge.

Instana has over a hundred sensors to provide the largest possible coverage of your environment – continually adding new sensors with every release.

The APM system should collect time-series metrics for all the stacks you are using including infrastructure components such as host, container engine (e.g. Docker), and orchestration (Kubernetes). The resolution of the data should be as high as possible and retained for a useful period.

Instana collects time-series metrics at a 1-second resolution which are retained for 24 hours before rolling up.
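To see why resolution matters, the following sketch averages a series of 1-second samples into a coarser bucket. The CPU values are invented for illustration, and simple averaging is an assumption; this guide does not describe how any particular product aggregates its data.

```python
from statistics import mean


def roll_up(samples, window):
    """Average consecutive 1-second samples into buckets of `window` seconds."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]


# Hypothetical CPU utilization (%), one value per second, with a 2-second spike.
cpu = [20, 20, 20, 20, 95, 95, 20, 20, 20, 20]

print(max(cpu))          # 95: the spike is visible at 1-second resolution
print(roll_up(cpu, 10))  # [35]: the same data averaged over 10 seconds
```

At 10-second granularity the spike to 95% is flattened into an unremarkable 35%, which is the kind of detail a coarse metric hides.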

There should be minimal lag between data collection and both dashboard display and evaluation by alert rules.

Instana uniquely uses stream processing on the data, resulting in a lag of just a few seconds.

End-to-end traces for every request should be collected, and KPIs provided for every endpoint. All endpoints should be automatically discovered without limits. Any form of trace sampling risks dropping important traces.

Instana traces every request end to end and analyzes the trace data in real time to provide Rate, Error, and Duration (RED) KPIs for every endpoint.
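As a rough illustration of what RED KPIs are, the sketch below derives them from a handful of call records. The record layout, endpoint names, and function name are assumptions for this example; it is not a description of how Instana computes its KPIs.

```python
from collections import defaultdict


def red_kpis(calls, window_seconds):
    """Compute Rate, Errors and Duration per endpoint.

    Each call record is a tuple of (endpoint, duration_ms, is_error).
    """
    grouped = defaultdict(list)
    for endpoint, duration_ms, is_error in calls:
        grouped[endpoint].append((duration_ms, is_error))

    kpis = {}
    for endpoint, records in grouped.items():
        count = len(records)
        errors = sum(1 for _, is_error in records if is_error)
        kpis[endpoint] = {
            "rate_per_s": count / window_seconds,
            "error_ratio": errors / count,
            "mean_duration_ms": sum(d for d, _ in records) / count,
        }
    return kpis


# Hypothetical calls observed during a 2-second window.
calls = [
    ("GET /cart", 100, False),
    ("GET /cart", 300, True),
    ("POST /checkout", 250, False),
    ("GET /cart", 200, False),
]
kpis = red_kpis(calls, window_seconds=2)
print(kpis["GET /cart"]["rate_per_s"])        # 1.5
print(kpis["GET /cart"]["mean_duration_ms"])  # 200.0
```

Because every call contributes to the result, an endpoint that is called only rarely still gets its own KPIs, which is not guaranteed when traces are sampled.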

A modern APM solution should not make any assumptions about the structure of the application architecture, e.g. Application -> Tier -> Node. If the architectural relationships cannot be accurately modeled, how can the applications be effectively monitored?

Instana automatically and continuously discovers the relationships between all monitored entities, which it records over time in the Dynamic Graph. Therefore, at any point in time, Instana knows what was talking to what and where it was running. This ensures that drill down from any dashboard always goes to the correct place and alert events are correctly correlated.

Installation and Configuration

Your DevOps team is already busy enough creating new functionality, coding fixes, and running the production environment. Installing a monitoring tool should be as simple as possible, requiring minimal manual effort. Ideally, you should also not have to rebuild your existing container images.

Instana uses a single-agent approach. In fact, you can deploy that single Instana agent to your Kubernetes cluster with one simple command:


helm install --name instana-agent --namespace instana-agent \
--set agent.key= \
--set agent.endpointHost=saas-eu-west-1.instana.io \
--set agent.endpointPort=443 \
--set zone.name=my-K8s-cluster \
stable/instana-agent

Instana also supports simple one-line installs for other environments such as Docker, PCF, AWS ECS, and AWS Lambda.

After starting, the Instana agent automatically and continuously detects what other processes are running and starts to monitor them – all with very little or zero configuration. This includes many language runtimes which are automatically instrumented. In many cases, the original container images will be automatically monitored without any modification. This frees up valuable time for your DevOps team to concentrate on core business objectives.

Automation is Vital

Microservices running in containers on a cloud platform enable agility for DevOps, providing a high-speed delivery pipeline for new functionality and fixes. As a consequence, the application environment is in a constant state of flux. Your APM solution must be able to remain in synchronization with the continual change with minimum manual intervention. There should be high levels of automation for detecting and monitoring new components, displaying and drilling down into collected data, and alerting on potential problems across all monitored technologies. The burden of manually keeping agent configurations, dashboards, and alert rules up to date will be too high, inevitably resulting in loss of synchronization between application and monitoring.

The Instana agent automatically and continuously discovers and monitors new technology stacks as they are deployed or scaled. The curated knowledge base of health signatures, along with intelligent algorithms, ensures that newly deployed technology is monitored as soon as its metrics start arriving, which is a few seconds after it starts up.

Instana - Enterprise Observability and APM for Cloud-Native Applications

Release Cycle Validation

Organizations are adopting techniques such as Agile, DevOps, and CI/CD, along with technologies like Cloud, Containers, and Microservices to deliver new features and fixes at increasing speed (or velocity). Without corrective feedback, that speed results in instability. The availability of timely and accurate feedback on the impact of any deployment is essential to your team’s ability to deliver better code faster.

Your monitoring system should have a minimum lag from data collection to visualization and alert generation. In addition, the data resolution must be high enough to provide the needed level of accuracy. The data lag must be in the order of seconds rather than minutes to allow for prompt rollback if needed. To ensure that outliers are not missed, there should be zero or minimal sampling of request traces, especially with the small dataset available immediately after a deployment. You also want to be able to annotate the timeline with release information.
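The risk that sampling poses to a small post-deployment dataset can be estimated with basic probability. The sketch below assumes each request is sampled independently at a fixed rate, which is a simplification; real samplers may use other strategies, and the function name and figures are illustrative only.

```python
def capture_probability(sample_rate, failing_requests):
    """Probability that at least one failing request is kept when each
    request is sampled independently with probability `sample_rate`."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    return 1.0 - (1.0 - sample_rate) ** failing_requests


# Hypothetical case: 5 failing requests shortly after a deployment.
print(round(capture_probability(0.01, 5), 4))  # 0.049 with 1% sampling
print(capture_probability(1.0, 5))             # 1.0 when every request is traced
```

Under these assumptions, a 1% sampling rate would catch at least one of the five failing requests only about 5% of the time, so the problem would most likely go unnoticed in the traces until far more requests had failed.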

Instana uses stream processing to deliver data in just a few seconds from collection to visualization and alerting. Metric data is collected with a 1-second resolution and every request is traced end to end. Integration with Jenkins and a simple REST API allow releases to be marked on the timeline and correlated in automatic root cause analysis, providing a clear indication of the positive or negative impact of each change.

End User Monitoring

Modern web pages are extremely complex, essentially being an entire application in their own right. Remember, those web pages are what your users perceive as the service you offer them. Having high-performance, highly scalable backend services is no help if the webpage rendering is slow and error-prone. Monitoring the real performance your users are experiencing in their browsers is the only way to know that your applications are delivering the appropriate service levels to your users.

Your APM solution should provide fully integrated End User Monitoring (EUM), including the ability to trace individual requests from the users’ browser all the way through the backend services. You’ll also want the ability to analyze EUM data to identify potential problems such as CDN performance, scripting errors, large media files, etc.

Instana includes fully integrated EUM as part of the APM solution at no extra cost.

Team Views

Part of a microservices strategy is having many small development teams that can work autonomously. The monitoring solution must be able to provide personalized views for these individual teams and other specialists in your organization. Without filtered views, it will be difficult for these groups to quickly find the information they require to work effectively.

Instana provides Application Perspectives: rule-driven filtered views based on entity tags. While a large number of tags are collected and assigned automatically, users can add extra tags manually.
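The general idea of a tag-based filtered view can be sketched as follows. The entity names, tag keys, and matching rule (every key in the rule must match exactly) are assumptions made for illustration and do not reflect the actual rule syntax of Application Perspectives.

```python
def matches(entity_tags, rule):
    """True when every key/value pair in the rule is present in the entity's tags."""
    return all(entity_tags.get(key) == value for key, value in rule.items())


def team_view(entities, rule):
    """Return the names of entities whose tags satisfy the rule, in input order."""
    return [name for name, tags in entities.items() if matches(tags, rule)]


# Hypothetical monitored entities and their tags.
entities = {
    "cart-service": {"team": "checkout", "env": "prod"},
    "payment-service": {"team": "checkout", "env": "staging"},
    "search-service": {"team": "discovery", "env": "prod"},
}

print(team_view(entities, {"team": "checkout"}))
# ['cart-service', 'payment-service']
print(team_view(entities, {"team": "checkout", "env": "prod"}))
# ['cart-service']
```

Because the view is defined by a rule rather than a fixed list, a newly deployed service carrying the right tags would appear in the team's view without anyone editing it.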

Root Cause Analysis

Microservice applications have many moving parts and highly complicated interdependencies between components. A small problem on one entity can cause a ripple effect across the entire application. The resulting deluge of alert notifications makes it impossible for operators to determine the root cause quickly. The APM solution should automatically perform root cause analysis and present a concise summary of the incident that differentiates cause from effect, enabling operators to rapidly fix the issue and maintain the highest quality of service.
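One simple way to separate cause from effect is to use the dependency relationships between entities. The sketch below is a deliberately naive heuristic with invented entity names; it is not the algorithm used by any particular product. It treats an alerting entity as a likely cause only if none of its direct dependencies are also alerting.

```python
def root_cause_candidates(depends_on, alerting):
    """Return alerting entities none of whose direct dependencies are alerting.

    `depends_on` maps an entity to the entities it calls or runs on.
    """
    alerting = set(alerting)
    return sorted(
        entity
        for entity in alerting
        if not any(dep in alerting for dep in depends_on.get(entity, ()))
    )


# Hypothetical dependency map and the set of entities currently raising alerts.
depends_on = {
    "web-frontend": ["cart-service", "search-service"],
    "cart-service": ["cart-db"],
    "search-service": [],
    "cart-db": ["host-7"],
}
alerts = ["web-frontend", "cart-service", "cart-db"]

print(root_cause_candidates(depends_on, alerts))  # ['cart-db']
```

Three alerts collapse into a single candidate here. The heuristic only looks at direct dependencies, so it depends on an accurate and current dependency map, which is hard to maintain by hand in a constantly changing environment.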

Utilizing the unique power of the Dynamic Graph, Instana automatically performs root cause analysis on incidents and sends out a concise summary of the events with links back to all the relevant information in the Instana dashboard.


Value for Money

The potential cost savings from reduced downtime and staff productivity improvements are the primary value components of an APM solution ROI (Return on Investment) analysis. The savings from increased staff efficiency are maximized as more people use the solution in their day-to-day work. This can only happen if the user interface of the product is easy to learn and use. Ease of use also assists with decreasing the amount of time it takes to find the root cause of incidents, driving additional cost savings from reduced downtime.
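The two value components named above can be combined into a basic ROI calculation. Every figure in this sketch is a placeholder chosen for round arithmetic, and the function and parameter names are hypothetical; substitute your own estimates.

```python
def apm_roi(downtime_hours_avoided, cost_per_downtime_hour,
            staff_hours_saved, staff_hourly_cost, annual_solution_cost):
    """Return ROI as a fraction: (total savings - cost) / cost."""
    if annual_solution_cost <= 0:
        raise ValueError("annual_solution_cost must be positive")
    savings = (downtime_hours_avoided * cost_per_downtime_hour
               + staff_hours_saved * staff_hourly_cost)
    return (savings - annual_solution_cost) / annual_solution_cost


# Placeholder inputs, not benchmarks: 10 hours of downtime avoided at 20,000
# per hour, 1,000 staff hours saved at 50 per hour, 100,000 annual cost.
roi = apm_roi(10, 20_000, 1_000, 50, 100_000)
print(roi)  # 1.5, i.e. a 150% return
```

The staff-hours term grows with the number of people who actually use the tool, which is why usability has such a direct effect on the result.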

Do not underestimate the importance of usability! And make sure you get feedback from across your staff. An easy-to-use solution gets more than 4x the acceptance and use outside of core teams. If the APM tool is only available to a small group of power users, the ROI of the solution will be quite low.

Organizations using Instana regularly have dozens of employees using Instana every day, with some of the biggest consumers having hundreds of active users per day.

APM Solution Selection Workflow

Business Stakeholders

To ensure alignment with the overall application, IT, and business strategy, be sure to include line-of-business managers throughout the selection process. The business should gain value from an APM solution via insights into business metrics.

Focus on Essential Features

As with any type of software, there are always pretty pictures, graphs, and neat bells and whistles, and we all love gadgets with bells and whistles. Make sure you keep your team’s focus on the critical capabilities or core features needed to operate effectively. These take priority over toys. Separate your requirements into two lists: core and extras. Before you even consider the nifty little extras, ensure that you’ve ticked off all the boxes for the necessary core features.
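The two-list approach can be expressed as a simple gate: a solution qualifies only if it covers every core requirement, and extras are counted only afterwards. The requirement names and the function name below are examples invented for this sketch.

```python
def evaluate(core, extras, supported):
    """Check a solution's supported features against core and extra requirements."""
    core_missing = sorted(set(core) - set(supported))
    extras_covered = sorted(set(extras) & set(supported))
    return {
        "qualifies": not core_missing,
        "core_missing": core_missing,
        "extras_covered": extras_covered,
    }


# Hypothetical requirement lists and one vendor's feature set.
core = ["kubernetes monitoring", "distributed tracing", "alerting"]
extras = ["custom themes", "3d topology map"]
vendor_features = ["kubernetes monitoring", "alerting", "3d topology map"]

result = evaluate(core, extras, vendor_features)
print(result["qualifies"])     # False
print(result["core_missing"])  # ['distributed tracing']
```

In this example the vendor offers an attractive extra but fails the gate, so the extra should not influence the decision.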

Skill Level of Existing Teams

Remember, APM solutions provide more tangible value the more individuals actually use the solution. Thus, you want to maximize your team’s participation. But APM can be difficult to learn how to use; that learning curve can be especially steep for legacy APM solutions.

Consider the skill levels of the team members who will be using the APM solution on a day-to-day basis and how steep that learning curve will be. You’ll want your APM solution to be intuitive to use, with easy navigation across its functional areas and clean, easy-to-read dashboards.

After all, any solution is of limited value if only a couple of experts in your organization can make sense of it. Get feedback from all users during the selection process. Rapid and broad adoption across all teams provides a much greater return on investment.

Integration

Most organizations have a rich legacy of systems. Which of these should the new APM solution work with? Check that integration is easily achievable.

Installation and Maintenance Effort

With Containers, Microservices, and Kubernetes technologies being so new, it’s unlikely your team is full of gurus. We all like learning new stuff, but doing so slows down implementation and makes ongoing maintenance fraught. Consider not only the learning curve required by each solution but also which use cases and functionality are automated, eliminating the need to teach non-experts how to do something. Keeping it simple to install and maintain also helps achieve quicker results and reduce stress.

Look Into Your Application Crystal Ball

Once you have established that a solution fulfills your current needs, gaze into the future and determine if the solution can continue to meet those needs as your organization evolves. Check that the proposed solution is under active (even continuous) development with a frequent release cycle of new features and fixes. Ensure that the solution’s roadmap vision for near, medium, and long-term capabilities matches your needs.

Trials and Tribulations

Having spent time watching vendor presentations and reading through product documentation, you should have a shortlist of two or three solutions. Now is the time to try them on for size. Ideally, you should be able to perform a self-service trial of the solutions without any interference from the vendors. If a vendor insists on holding your hand through the trial, it may be an indication of large professional services bills in the future and that the product is not as easy to use as their brochure suggests. Once you complete the self-service trials, you should have one or possibly two final candidates that you are confident meet your needs. Engage with the vendor to start a formal Proof of Concept to double-check your conclusion.

APM Solution Selection Workflow

To achieve Elite levels of DevOps efficiency, a high degree of automation is required in your software delivery pipeline, with every step being repeatable and consistent.

The monitoring step is vital to provide stabilizing feedback, ensuring that delivery speed does not compromise quality. However, the overall speed of the pipeline is only as fast as its slowest step. Make sure that the burden of manual monitoring does not slow you down: choose a monitoring solution that is highly automated and tailored to dynamic microservices environments.
