At this year’s .conf25, we are defining what it means to be digitally resilient in the AI era. Splunk provides the world’s most powerful platform to unify all of your machine data in one place with Splunk platform, power the agentic security operations center (SOC) with Splunk Security, and drive agentic AI-powered observability with Splunk Observability. These tools help you correlate data faster, gain smarter insights, and collaborate better.
To accelerate this mission and usher in this new chapter of agentic AI-powered observability, we announced several key capabilities for proactive, AI-powered detection and investigation of business-impacting issues, including:
To safeguard the health of your AI agents and infrastructure, and to manage their performance, quality, safety, and cost, we also launched:
Finally, we announced a massive payload of new innovations to help ITOps and engineering teams quickly correlate telemetry data across applications, infrastructure, and networks; monitor and troubleshoot three-tier and microservices environments; and prioritize the resolution of critical, business- or user-impacting issues — all from a single console. These capabilities include:
Agentic AI is rewriting the rules for what it means to build a leading observability practice. As agentic AI and AI-assisted coding continue to emerge, application services will be delivered with less and less human involvement, and will run more and more autonomously.
This means observability has to evolve as well. We've had embedded AI helping you for years, and we are excited to announce agentic AI-powered capabilities across the Splunk Observability portfolio: AI-driven alert correlation, episode summarization, and AI agents for detection, troubleshooting, and remediation. Together, they help teams understand, troubleshoot, and resolve business-impacting incidents faster.
Whenever something goes sideways — a server goes down or an application service starts acting up — alerts kick off and can then start to cascade across upstream and downstream systems. With all this noise, it can be difficult to understand what’s happened, much less fix the issue. Event iQ in Splunk IT Service Intelligence (ITSI) helps to reduce alert fatigue, so everyone — from beginners to experts — can spend less time managing alerts and more time keeping critical services running smoothly.
Event iQ delivers AI-driven alert correlation that groups related alerts and highlights critical incidents that require immediate attention. It suggests interesting fields that can be used to automatically correlate events, making it easier and faster for teams to connect the dots and focus on what matters the most.
Once related alerts have been grouped together into an “episode,” you can see what the problem is, when it started, and which systems are involved — all in a single glance, with no need to jump between tabs or hunt for missing information. Everything is explained in plain language, so you always know why alerts were grouped the way they were. Read the blog to learn more.
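To make the idea of field-based alert correlation concrete, here is a minimal Python sketch. It is not Event iQ's actual algorithm, which this post does not describe. It simply groups alerts that share a value for one chosen field and arrive within a time window of the previous alert in the same group. The names `Alert` and `group_into_episodes`, the `service` field, and the 300-second window are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    timestamp: float                              # seconds since some epoch
    fields: dict = field(default_factory=dict)    # e.g. {"service": "checkout"}
    message: str = ""


def group_into_episodes(alerts, key_field, window_seconds=300.0):
    """Group alerts that share fields[key_field] into episodes.

    An alert joins the most recent episode for its key only if it arrives
    within window_seconds of that episode's latest alert. Alerts that lack
    the field each form their own single-alert episode.
    """
    episodes = []        # list of lists of Alert, in order of first alert
    latest_episode = {}  # key value -> index of its most recent episode
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = alert.fields.get(key_field)
        if key is None:
            episodes.append([alert])
            continue
        idx = latest_episode.get(key)
        if idx is not None and alert.timestamp - episodes[idx][-1].timestamp <= window_seconds:
            episodes[idx].append(alert)
        else:
            episodes.append([alert])
            latest_episode[key] = len(episodes) - 1
    return episodes
```

Calling `group_into_episodes(alerts, "service")` returns a list of episodes, each a time-ordered list of alerts. A real correlation engine would weigh many fields and learned patterns. The sketch only shows why a well-chosen field plus a time window already removes much of the noise.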
ITSI already reduces alert noise by correlating and grouping related alerts into episodes. Now, using advanced AI, Splunk ITSI can pull together all the key information about an episode — what happened, when it started, the most important events, and even a first look at what caused the problem — into a single, easy-to-read AI-generated summary. This means teams can go from clicking through ten tabs to seeing the most relevant details and suggested next steps in just one click. Not only does this save time and reduce confusion, it also helps newer users get more value from the platform right away. Plus, summaries can be shared easily with others or added to tools like ServiceNow, so everyone stays in the loop. In short, Episode Summarization helps teams work smarter, respond faster, and keep their critical services running smoothly. ITSI Episode Summarization is in Alpha.
Request access at the Splunk Voice of the Customer site.
Pinpointing the cause of an outage or performance problem in cloud-native environments typically involves a lot of manual effort on the part of engineering teams as they cycle through dashboards, search for the right metrics or logs, and hop between multiple screens and tools to cobble together reasonable hypotheses. The AI troubleshooting agent in Observability Cloud acts like an assistant SRE, automatically sifting through all the service and infrastructure data, identifying whether your application or infrastructure is at fault, and surfacing the most likely root causes — all in plain language, within your existing workflow.
Instead of spending time jumping in and out of screens, tabs, and tools, teams get a ranked list of probable causes, clear impact analysis, and actionable recommendations right when and where they need them, turning hours of manual investigation into insights in minutes.
When an alert comes in, the agent analyzes the surrounding context, from recent deployments and Kubernetes events to historical incidents and patterns from previous fixes. It also provides a concise root cause analysis (RCA) so teams can act confidently and quickly, reducing downtime and keeping services running smoothly. The AI troubleshooting agent in Observability Cloud will be in Alpha in December.
When anomalies or health rule violations occur in an n-tier or hybrid environment, the AI troubleshooting agent for AppDynamics goes beyond simply flagging an issue. It explains what’s happening, why it’s happening, and how to fix it. You get clear, concise root cause summaries right where you need them without bouncing between dashboards or guessing which metric matters most.
The result? Teams resolve issues faster and with greater accuracy, even if they aren’t system experts. In Alpha now, the AI troubleshooting agent for AppDynamics removes knowledge barriers, guides investigations with just one click, and presents actionable insights in the context of real-time data.
On one hand, AI is a great enabler — including, as we’ve discussed above, in service of a new set of observability practices. On the other hand, AI itself needs oversight — whether that’s for the new infrastructure stack that it runs on, or the large language models providing outputs in response to user and system prompts, or the AI agents and AI-enabled applications leveraging them.
AI-enabled applications and AI agents are being built and deployed on a new stack, which includes not only low-level infrastructure components like GPUs, but also large language models (LLMs), vector databases, and AI frameworks and libraries. As with any other stack, this new infrastructure requires deep visibility, so that ITOps and engineering teams can ensure their systems are reliable, secure, and scalable.
With AI Infrastructure Monitoring, teams can use data-dense dashboards and detectors to surface trends, patterns, and outliers and to correlate application health with underlying AI infrastructure performance. These views allow teams to proactively pinpoint noisy neighbors, alert on operational metrics from AI apps and services, troubleshoot resource contention and unmet workload demands, and ultimately mitigate security, performance, cost, and reputational risks. AI Infrastructure Monitoring is generally available (GA). Learn more about AI Infrastructure Monitoring.
Typically, as engineering teams leverage LLMs to interpret data, reason, and complete multi-step tasks as part of AI agents or AI-enabled applications, they learn that the LLMs have the potential to produce low-quality, misleading, or incorrect outputs or responses, decreasing customer trust, degrading end-user experiences, and increasing the cost of development.
With AI Agent Monitoring in Observability Cloud, ITOps and engineering teams can proactively identify and troubleshoot erroneous, inaccurate, or undesirable responses by LLMs (and the agents they power) to ensure applications are performing reliably, securely, and as intended. Teams can track established metrics for transactions, error rates, and performance and drill into interactions to pinpoint issues that impact application behavior like failed LLM or tool calls. Through an integration with Cisco AI Defense, teams can also gain a unified view of critical security risks — including prompt injection, data leakage, and harmful content — across agents and LLMs. This enhanced visibility is delivered alongside detailed insights into consumption, cost, performance, and quality measures for models and agents, empowering AI SRE and MLSecOps teams to manage both risk and operational efficiency from a single pane of glass.
AppDynamics LLM Monitoring provides a unified interface to monitor AI application performance, integrations, and supporting infrastructure in real time. It enables teams to track LLM usage for compliance with internal and regulatory policies, reducing risk. Teams gain visibility into resource consumption and costs, allowing for optimization and efficient allocation. Streamlined health and performance monitoring helps detect and resolve issues before they impact users or business applications. Customizable dashboards and business journey mapping improve insight into AI agent impact. AppDynamics LLM Monitoring empowers organizations to confidently manage, scale, and maintain the health of mission-critical AI initiatives.
AI Agent Monitoring in Observability Cloud will be in Alpha in November. AppDynamics LLM Monitoring is generally available. Learn more about AI Agent Monitoring in Observability Cloud and AppDynamics LLM Monitoring here.
The burgeoning 'Internet of Agents' demands a new era of observability. In July, Outshift, Cisco's innovation engine, donated AGNTCY to the Linux Foundation, laying the groundwork for a future where agents from any vendor collaborate seamlessly. Committed to empowering IT Operations (ITOps) and engineering teams with open standards, Splunk is advancing this vision by actively contributing to AGNTCY, a Linux Foundation project, and partnering with OpenTelemetry (OTel) leaders.
Splunk is advancing OTel semantic conventions by establishing a standardized, extensible telemetry framework that facilitates seamless data collection across various AI environments. By integrating AGNTCY’s Metrics Compute Engine, Splunk’s platform delivers vendor-neutral, scalable, and interoperable observability across diverse AI environments. This capability transforms raw telemetry into actionable insights, providing both foundational and advanced performance metrics — such as factual accuracy in agentic applications.
The foundation for digital and AI resilience is unified observability — comprehensive visibility into the health and performance of all your applications, infrastructure, and networks, and the ability to easily correlate signals across your entire estate to understand their relationships. While this visibility is valuable, its true power emerges when it is interpreted in the context of your users and business objectives. For example, are your customers able to browse a catalog? Can they start checkout and complete payment? If not, how is that affecting your bottom line? Is it a usability problem, or something that stems from an underlying application or infrastructure issue? These are the questions we help you answer with the new capabilities in this third, massive area of product innovations.
In a sea of alerts and metrics, understanding what truly matters to the business can feel impossible. Business Insights helps engineering and business teams focus on bottom-line impact by showing, in real time, how application performance affects critical business processes. With this release, Business Insights enables teams to visualize long-running processes — such as loan or credit approvals — and connect technical performance to business KPIs in a single, unified journey view. Going beyond traditional SLA metrics, it provides a clear view of revenue impact so teams can focus on the issues that matter most. Business Insights is currently in Alpha. Request access at the Splunk Voice of the Customer site.
Many teams run both traditional n-tier applications and cloud-native services, and they need application performance monitoring (APM) that bridges both worlds. We’re releasing new capabilities in Splunk Observability Cloud to strengthen APM for cloud-native applications, and extend support for hybrid environments — building on AppDynamics’ proven expertise in monitoring traditional three-tier applications.
Highlights include:
Together, these capabilities deliver a single APM solution for organizations building and operating hybrid or microservices-centric applications.
Balancing the workload of remediating code vulnerabilities while meeting schedules for delivering product innovations can be difficult. With Secure Application for Splunk Observability Cloud, engineering teams can rely on the assurance that every critical vulnerability ticket in their backlog represents a genuine, validated risk to their running applications.
Unlike traditional static vulnerability and threat detection tools, Secure Application integrates security directly into the observability framework, using the same Splunk Observability agent that you already have in place. This innovation bridges the gap between security findings and development realities. By seamlessly incorporating application security into your observability workflows and enriching runtime findings with comprehensive security intelligence, you can prioritize critical vulnerabilities based on their real-time exploitability and direct impact on applications, without the mental gymnastics of context switching between tools while meeting your SLAs.
With Secure Application, vulnerabilities are mapped directly to the specific microservices they affect in the Observability Cloud APM service map. This application context helps security teams differentiate between vulnerabilities in libraries and genuine exploitable risks in your production environment and ensures the list of vulnerabilities you receive has been filtered for noise. You can quickly prioritize these vulnerabilities based on their real-time exploitability and direct impact on applications, allowing you to patch and remediate issues before they create negative effects for end users or compromise security.
Application vulnerability detection is in Alpha. Request access at the Splunk Voice of the Customer site.
Historically, customers of Splunk Observability Cloud have leveraged Database Query Performance in APM and built-in dashboards for databases in Infrastructure Monitoring to gain visibility into query and host performance. However, many have asked for a unified view and deeper insights into the root causes of slow queries.
Database Monitoring is a powerful new solution to address these asks. Built on OpenTelemetry, it enables application and database teams to quickly troubleshoot inefficiencies using rich query-level insights such as wait time, wait categories, CPU time, memory usage, and execution plans. By correlating database performance with application services and infrastructure metrics, it empowers teams to prioritize what matters most while uncovering inefficiencies to reduce contention and control costs.
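As an illustration of how query-level wait data can point to what matters most, the following Python sketch totals wait time per query and reports each query's dominant wait category. It is an assumption-laden example, not the product's data model or API: the `(query_id, wait_category, wait_ms)` sample shape and the function name `summarize_waits` are hypothetical.

```python
from collections import defaultdict


def summarize_waits(samples):
    """Rank queries by total wait time.

    samples: iterable of (query_id, wait_category, wait_ms) tuples
             (a hypothetical shape chosen for this sketch).
    Returns a list of (query_id, total_wait_ms, dominant_category),
    sorted with the highest total wait first.
    """
    per_query = defaultdict(lambda: defaultdict(float))
    for query_id, category, wait_ms in samples:
        per_query[query_id][category] += wait_ms

    summary = []
    for query_id, by_category in per_query.items():
        total = sum(by_category.values())
        # The category that contributed the most wait time for this query.
        dominant = max(by_category, key=by_category.get)
        summary.append((query_id, total, dominant))

    summary.sort(key=lambda row: row[1], reverse=True)
    return summary
```

Sorting by total wait and naming the dominant category gives a first answer to "which query should we look at, and is it waiting on locks, I/O, or something else?" before digging into execution plans.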
This release supports Microsoft SQL Server, with additional database types to follow. Database Monitoring is in Alpha. Request access at the Splunk Voice of the Customer site.
We are proud to announce that Splunk Observability Cloud expects to achieve FedRAMP Moderate authorization in the coming months. With this milestone, Splunk Observability Cloud will offer the security standards that government agencies need to securely manage sensitive government data.
This FedRAMP Moderate authorization also empowers public and government agencies to accelerate and advance their cloud transformation — across on-premises, hybrid, and cloud-native environments — so they can efficiently meet the growing demand for citizen services, strengthen their cloud and cybersecurity capabilities, and deliver seamless digital experiences.
Comprehensive visibility into user interactions, together with rapid issue identification, resolution, and continuous improvement within a single solution, helps government agencies enhance public trust and consistently deliver reliable citizen services.
Application experience relies on a complex web of networks (CDNs, DNS, the Internet) and external services like third-party APIs and cloud platforms, often beyond IT’s control. The Splunk Observability Cloud Real User Monitoring (RUM) Integration with ThousandEyes unifies global real-user experience data from Splunk with network insights from ThousandEyes in a single view. Engineering teams can quickly pinpoint whether issues stem from the application or network, streamline troubleshooting, and resolve incidents faster, ultimately improving performance and user experience. This integration is available at no additional cost for joint Splunk Observability Cloud and ThousandEyes customers and is currently in Alpha. Request access at the Splunk Voice of the Customer site.
Engineering and product teams share responsibility for delivering great user experiences and driving business outcomes. Digital Experience Analytics (DEA) extends existing Real User Monitoring (RUM) capabilities by providing engineering, product, and design teams deep visibility into user behavior, intent, and sentiment. Unlike siloed DEA tools, it correlates behavioral data with application performance data, helping teams understand the user experience impact of application incidents and troubleshoot them faster. By using the same OpenTelemetry instrumentation agent as RUM, it also simplifies setup and reduces overhead. In its initial release, DEA introduces Conversion Funnel analysis with detailed session replay at each drop-off point, making it easier to identify friction and take action. Digital Experience Analytics is currently in Alpha. Request access at the Splunk Voice of the Customer site.
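For readers unfamiliar with conversion funnel analysis, here is a minimal Python sketch of the underlying arithmetic. It is not how DEA computes funnels. The session format (an ordered list of event names per session), the step names, and the functions `funnel_counts` and `drop_off_rates` are hypothetical.

```python
def funnel_counts(sessions, steps):
    """Count how many sessions reached each funnel step, in order.

    sessions: iterable of event-name lists, one list per user session.
    steps:    ordered funnel steps, e.g. ["browse", "checkout", "payment"].
    A session counts toward step i only after it has hit steps 0..i-1.
    """
    counts = [0] * len(steps)
    for events in sessions:
        position = 0  # index of the next funnel step this session must hit
        for event in events:
            if position < len(steps) and event == steps[position]:
                counts[position] += 1
                position += 1
    return counts


def drop_off_rates(counts):
    """Fraction of sessions lost between each pair of consecutive steps."""
    rates = []
    for previous, current in zip(counts, counts[1:]):
        rates.append(0.0 if previous == 0 else 1 - current / previous)
    return rates
```

The step with the highest drop-off rate is the natural place to start watching session replays, since that is where the most users abandon the flow.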
When users can’t accomplish tasks in your apps, accurately diagnosing the issue — determining whether it’s because of a crash, an application error, or a confusing flow — requires an understanding of what users actually experienced. That’s why we’re introducing Browser and Mobile Session Replay in AppDynamics, and extending Session Replay in Observability Cloud from browser-only to include both browser and mobile. Session Replay captures user interactions as dynamic, video-like journeys, paired with rich session metadata, giving teams clear visibility into what users experienced when issues occur. With this context, teams can go beyond troubleshooting performance problems to also evaluate usability, identify friction points, and uncover hidden UX challenges. Learn more about Session Replay in AppDynamics and Observability Cloud.
AppDynamics customers now have a single agent to collect telemetry data for use in either Splunk AppDynamics or Observability Cloud. This agent enables them to avoid costly and disruptive changes in their deployment and integration pipelines, while evaluating or transitioning monitoring to Observability Cloud.
It contains both AppDynamics Agent code and Splunk OpenTelemetry code, and is deployed in the same way as any other AppDynamics Agent. Simply update your existing agents and latent OpenTelemetry functionality will be added as a normal update. You can update via the Smart Agent or whatever means you use today.
We’re offering three modes:
If you made it this far into this blog post, thank you. Our teams have worked hard over the past year to deliver this massive set of new capabilities for unified observability, observability for AI, and agentic AI-powered observability. We are excited to have you try them out, and eagerly await your feedback through our Voice of the Customer program. .conf25 marked the beginning of a new era for observability at Splunk and Cisco, and we are delighted to have the opportunity to partner with you on the journey into the agentic AI future.
Many of the products and features described herein remain in varying stages of development and will be offered on a when-and-if-available basis. The delivery timeline of these products and features is subject to change at the sole discretion of Cisco, and Cisco will have no liability for delay in the delivery or failure to deliver any of the products or features set forth in this document.