
Monitoring and observability requirements are continuing to adapt to the rapid advances in public cloud, containers, serverless, microservices, and DevOps and CI/CD practices. As new technology and development processes become mainstream, enterprise adoption increases, bringing its own set of security, scalability, and manageability needs. We sat down with Stephen Elliot, Vice President of Management Software and DevOps at IDC, to discuss where the market is headed, how legacy vendors will need to adapt, and how customers can get ahead of these trends to gain a competitive advantage.
What key changes are you seeing companies make to better position themselves for digital business?
Business and technology executives are focusing on improving customer experience by transforming their internal processes and products through software innovation and a focus on digital-first, direct-to-consumer models.
Teams are relying more and more on data and powerful analytics to make faster, data-driven decisions. They are now able to collect and reuse data from across operations and the business to drive more product innovation, which in turn drives more revenue and higher profitability. Work smarter and faster with data. Innovation is about speed.
Many organizations are creating a dedicated CDO role to help accelerate the adoption of modern cloud technologies such as containers, microservices, and serverless, and DevOps and CI/CD processes.
The importance and specific responsibilities of a chief digital officer (CDO) vary from one organization to another, and can take one of several forms:
- Independent of the CIO
- Report up to the CIO
- CIO assumes the role of CDO and brings in a new CIO
As with most transitions, organizations must maintain their legacy model to keep the proverbial lights on. Meanwhile, they must invest aggressively to prepare for the future and create the freedom to adopt new technology frameworks, new team structures, and even new cultures.
In some organizations it’s more effective to have completely separate teams and executives to better focus on the specific speed and quality objectives of each group. This offers them the opportunity to invest in new tools, new processes, and new operating models that best match their team needs.
In large enterprises it can be particularly difficult to get traditional teams to think differently; many folks don’t necessarily want to change. Success requires realistic timelines and a change management approach that addresses every component: people, process, technology, culture, and business thinking.
Large enterprises typically have hybrid and multi-cloud strategies that require different architectures from on-prem deployments. Legacy infrastructure will continue to require dedicated support, as will each public cloud deployment.
Enterprises are doing a balancing act: their traditional heritage estate was built on a foundation of risk mitigation, but they’re now being asked to move faster when selecting tools, processes, and technologies, many of which have only been around for a few short years.
What are the biggest barriers to success for these new digital initiatives?
Companies need to find and resolve issues in their business logic and applications more quickly. They need access to data, and monitoring and observability are a huge source of valuable real-time operational data; they’re a heads-up display for how the business is being managed.
To resolve issues faster, modern agile organizations need three capabilities:
- The ability to collect and consolidate data in real-time
- The ability to analyze data in real-time to find the root cause
- Collaboration, to make sure everyone is on the same page about what the problem is and how to solve it
By enabling teams with data access, organizations can drive more informed business decisions across both IT and line of business, including executives.
Consolidating data from traditionally disparate sources such as business metrics, infrastructure metrics, traces, application performance data, and logs is critical to providing a comprehensive view of modern environments and how they affect applications and consumers.
Modern teams now span a wider set of responsibilities, including Developers, Infrastructure Platform Engineers, IT Operations, Site Reliability Engineers (SRE), Cloud Architects, and even business units. These teams need easy-to-use tools, preferably a single tool, to help them identify and solve problems across unified and commonly understood data. And because these groups have varying levels of technical depth, tools that either automate or augment analytics and deliver results in a streamlined fashion can help teams get to the right answer faster. Single pane of glass, single source of truth.
In addition to real-time data, analytics, and collaboration, these tools should also address the following:
- In-depth data collection and support for high-cardinality analytics
- Root-cause identification and directed troubleshooting
- Auto scaling and auto remediation capabilities
- Integration with CI/CD tool chain
How important are analytics for Monitoring and Observability solutions?
Monitoring and APM vendors are all investing more and more in analytics; however, some are investing more heavily in R&D and have comprehensive, well-defined product roadmaps. Customers need to do their due diligence to clearly understand what capabilities each vendor actually provides. This means digging into the patents and then seeing first-hand how real-time processing, machine learning, AI, and high-cardinality analytics are incorporated into actual product capabilities. Ultimately, the goal of analytics is to enhance and accelerate problem identification and resolution.
Everyone is talking about speed and real-time, and there’s no doubt that it’s becoming more important as companies adopt modern application development, deployment, and management frameworks and transition their applications and platforms to run on the public cloud, microservices, containers, and serverless. The speed and frequency at which applications can now be deployed make it more important to collect and analyze performance data in real-time. It’s important for organizations to understand that speed is increasingly becoming a competitive advantage for the business, and performance management is a great opportunity for organizations to extend that advantage.
Has the workflow for problem and issue resolution changed?
The core premise of solving product and application issues is the same. You first need to know when there is an issue—and you need to know as soon as possible. Once you’ve been alerted, you need to analyze the data to determine the source of the problem. Finally, after you have determined the root cause, you must resolve it as quickly as possible.
The most recent update to this workflow is the addition of automation—particularly auto scaling or in more advanced scenarios, auto remediation. Automated problem detection is also helping to make manual resolution quicker by augmenting the capabilities of ops and DevOps teams—helping them focus on the root cause and avoid alert storms. In addition, we are seeing tighter integration with ITSM workflows that enables better communication with customers from support centers.
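To make that workflow concrete, here is a minimal Python sketch of a detect, analyze, and remediate loop. The latency window, error counts, threshold, and scale-out step are hypothetical stand-ins for whatever your monitoring platform and orchestration APIs actually provide.
```python
# Minimal sketch of a detect -> analyze -> remediate loop, for illustration only.
# The latency window, error counts, and scale-out step are hypothetical stand-ins
# for whatever your monitoring platform and orchestration APIs actually provide.
from typing import Dict, Sequence


def p95(latencies_ms: Sequence[float]) -> float:
    """Rough 95th-percentile latency for one evaluation window."""
    ordered = sorted(latencies_ms)
    idx = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[idx]


def detect(latencies_ms: Sequence[float], threshold_ms: float = 500.0) -> bool:
    """Step 1: know there is an issue as soon as possible."""
    return bool(latencies_ms) and p95(latencies_ms) > threshold_ms


def analyze(errors_by_service: Dict[str, int]) -> str:
    """Step 2: a naive root-cause guess, the service reporting the most errors."""
    return max(errors_by_service, key=errors_by_service.get)


def remediate(service: str, replicas: int) -> int:
    """Step 3: auto-scale the suspect service; escalate to a human if it recurs."""
    print(f"scaling {service} from {replicas} to {replicas + 1} replicas")
    return replicas + 1


if __name__ == "__main__":
    window = [120, 180, 950, 1100, 140]       # request latencies (ms) in one window
    errors = {"checkout": 42, "catalog": 3}   # errors per service in the same window
    if detect(window):
        remediate(analyze(errors), replicas=3)
```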
What are some of the shortcomings of traditional APM tools?
Organizations need to learn what new monitoring and observability capabilities are available and whether new streaming architectures and analytics will be requirements for them. The market offers a few options to solve these problems.
As application development and monitoring requirements change, begin to think about real-time in terms of seconds, not minutes. As you increase the elasticity and scale of your cloud, consider a monitoring solution that offers analytics designed to scale alongside your applications, while still providing real-time responsiveness. Ask yourself, are you using the right set of tools to meet your ultimate goals? If not, then begin investigating and investing in tools that will solve your future needs.
Are consolidated monitoring tools more effective at issue resolution?
Consolidated monitoring tools provide a significant improvement in the speed and accuracy of identifying and resolving application issues. Although many organizations are satisfied with their current solution, especially relative to where they were three years ago, they need to prepare for the 10-100X increase in speed, scale, and complexity that containers and serverless will bring. And it’s not just the tools that need to change, but also team structures, skill sets, development processes, and business expectations.
Consolidated monitoring tools are able to collect more in-depth data from more places and apply analytics in real-time to identify and resolve problems more quickly. It’s important to understand the underlying product architecture; analytics built on a streaming architecture can process tremendous amounts of data from thousands of resources and services and then build accurate models that help your team find the true root cause of issues. Furthermore, intuitive dashboards powered by the same platform should provide each team with customized views of the same single source of data.
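As a rough sketch of what streaming analytics can mean in practice, the snippet below keeps a rolling window per metric stream and flags values that deviate sharply from the recent baseline. The window size and three-sigma rule are illustrative choices, not a description of any particular vendor’s implementation.
```python
# Rough sketch of streaming analytics: keep a rolling window per metric stream
# and flag values that deviate sharply from the recent baseline. The window size
# and three-sigma rule are illustrative, not any particular product's design.
from collections import defaultdict, deque
from statistics import mean, stdev


class StreamingDetector:
    def __init__(self, window_size: int = 60, min_history: int = 30):
        self.windows = defaultdict(lambda: deque(maxlen=window_size))
        self.min_history = min_history

    def observe(self, series: str, value: float) -> bool:
        """Ingest one data point; return True if it looks anomalous for its series."""
        window = self.windows[series]
        anomalous = False
        if len(window) >= self.min_history:
            mu, sigma = mean(window), stdev(window)
            anomalous = sigma > 0 and abs(value - mu) > 3 * sigma
        window.append(value)
        return anomalous


detector = StreamingDetector()
for i in range(100):
    detector.observe("checkout.latency_ms", 100.0 + (i % 5))   # steady baseline
print(detector.observe("checkout.latency_ms", 900.0))          # True: a clear spike
```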
Organizations need to better understand their operational data requirements and how the tools they are using collect and sample data. We are now in a world where everyone wants more data, all the time—but that’s not necessarily the right strategy for most companies. It’s getting more difficult to find the needle in the haystack. A few modern monitoring solutions are now capable of capturing 100% of operational data, but without powerful analytics, this can introduce noise and bog down decision making. If you require complete visibility into your environment, consider partnering with a vendor that offers powerful analytics for automating the discovery and resolution of issues in real-time.
Not all data is created equal. The most important determination of whether you have the right data is this: do you have the data that matters at that moment, for the specific business outcome you’re solving for? For example, some traditional APM vendors use random down-sampling of trace data, which can create blind spots and lost data in microservices environments. It’s important to consider how your current metrics and APM tools collect and process data and whether they will be able to meet your organization’s scale, accuracy, and quality requirements as your environment grows.
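A quick back-of-the-envelope simulation shows why random down-sampling can hide rare failures. The 1% sample rate and 0.1% error rate below are made-up numbers for illustration only.
```python
# Back-of-the-envelope illustration of why random (head-based) down-sampling of
# traces can hide rare failures. The 1% sample rate and 0.1% error rate are
# made-up numbers for the example, not measurements of any real system.
import random

random.seed(0)
TOTAL_REQUESTS = 100_000
SAMPLE_RATE = 0.01    # keep 1% of traces, decided at the start of each request
ERROR_RATE = 0.001    # 0.1% of requests fail

errors_total = errors_visible = 0
for _ in range(TOTAL_REQUESTS):
    failed = random.random() < ERROR_RATE
    kept = random.random() < SAMPLE_RATE
    errors_total += failed
    errors_visible += failed and kept

print(f"failures that occurred: {errors_total}")
print(f"failures visible in the sampled traces: {errors_visible}")
# Typically only ~1% of the failed traces survive sampling, so an entire class
# of failures can be invisible to the APM tool.
```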
What APM approach is best suited for microservices environments?
Distributed tracing is becoming an increasingly important data source for cloud architectures.
The level of granularity that distributed tracing provides is very valuable in and of itself, and especially so when it is analyzed alongside a broader set of metrics data to provide a more comprehensive application view. The open source instrumentation powering distributed tracing is another area that enterprises will have to consider implementing and managing as part of their adoption of microservices.
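As an example of what that instrumentation can look like, here is a minimal manual-tracing sketch that assumes OpenTelemetry as the open source instrumentation layer (one common choice, not necessarily what your stack uses); the service and span names are invented for the example.
```python
# Minimal manual-instrumentation sketch using the OpenTelemetry Python API and SDK
# (assumed here as the open source instrumentation; requires the opentelemetry-api
# and opentelemetry-sdk packages). Service and span names are invented examples.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Export finished spans to stdout; a real deployment would export to a collector.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")


def place_order(order_id: str) -> None:
    # One parent span per request, with child spans for each downstream call,
    # so latency can be attributed to individual service hops.
    with tracer.start_as_current_span("place_order") as span:
        span.set_attribute("order.id", order_id)
        with tracer.start_as_current_span("charge_payment"):
            pass  # call the payment service here
        with tracer.start_as_current_span("reserve_inventory"):
            pass  # call the inventory service here


place_order("order-123")
```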
Ultimately, the monitoring and observability segment is undergoing a tremendous shift to address the future needs of enterprise cloud customers. As the cloud, containers, and even serverless continue to evolve and become the new norm, start investigating modern application monitoring capabilities to see if your existing vendor will be able to meet your future requirements.
Thanks,
Mike Klaczynski