Monitor LLM and agent performance with AI Agent Monitoring in Splunk Observability Cloud

AI is everywhere. The rise of AI has transformed the pace of software development, content creation, customer support, and other business workflows and functions across industries. But the democratization of AI, for all its benefits, also gives rise to slop: inauthentic, inaccurate, low-quality, and sometimes harmful outputs. The frequency and severity of these issues depend heavily on the strength and stability of the underlying AI infrastructure. That’s why we are introducing Observability for AI capabilities, specifically AI Infrastructure Monitoring and now AI Agent Monitoring in Splunk Observability Cloud, to provide visibility and protection across your AI stack.

Evaluating Model and Agent Quality and Behavior With AI Agent Monitoring

The non-deterministic and generative behavior of LLMs can lead to outputs with frequent inaccuracies, biases, and fallacies. These issues can ultimately lead to decreased customer trust, poor end-user experiences, and increased costs. Understanding the dependencies and interactions of these systems is critical for pinpointing the root cause of such issues and of service degradation. To build reliable and trusted AI, teams need deep, unified visibility across the AI stack so they can correlate business problems with the performance, quality, security, and cost/usage metrics generated by agent interactions and tool calls (as well as by AI infrastructure components) across on-prem, hybrid, and cloud environments. AI Agent Monitoring provides just that.

Extending the troubleshooting and monitoring capabilities of Splunk Application Performance Monitoring (APM), AI Agent Monitoring helps teams build trust with their agentic applications. With AI Agent Monitoring, ITOps and engineering teams can pinpoint and correlate the root cause of unreliable or degraded AI agent and model performance. By integrating APM with AI Agent Monitoring, users can easily troubleshoot both AI and non-AI applications with trace-level visibility.

Because Splunk delivers a unified observability experience, teams gain insight into frontend performance and can deliver seamless user experiences through best-in-class application, infrastructure, and digital experience monitoring, with correlated business insights, app security, and network observability at their fingertips. When combined with the Splunk Platform’s log analytics, telemetry pipeline management, and event intelligence, this unified experience lets teams analyze ludicrous amounts of machine data, view logs from the Splunk Platform in context within Splunk Observability Cloud, and bring observability insights directly into the Splunk Platform.

AI Agent Monitoring is also built on industry standards, OpenTelemetry and Cisco AGNTCY, to provide visibility into AI agents without vendor lock-in.
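
To make the OpenTelemetry foundation concrete, here is a minimal sketch of manually instrumenting a single LLM call with the OpenTelemetry Python SDK, attaching attributes from the (still-incubating) gen_ai semantic conventions. The model name, token counts, and exporter configuration are illustrative assumptions; in practice, an auto-instrumentation library or the Splunk Distribution of OpenTelemetry typically captures these spans for you.

```python
# Minimal sketch (not Splunk's implementation): emit one span for an LLM call
# over OTLP. Requires the opentelemetry-sdk and opentelemetry-exporter-otlp
# packages; the endpoint and any auth headers come from standard OTEL_* env vars.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("demo.agent")

def call_llm(prompt: str) -> str:
    """Wrap a model call in a span that records gen_ai attributes."""
    with tracer.start_as_current_span("chat gpt-4o-mini") as span:
        span.set_attribute("gen_ai.operation.name", "chat")
        span.set_attribute("gen_ai.request.model", "gpt-4o-mini")  # assumed model
        response_text = "...model output..."                       # placeholder for the real client call
        span.set_attribute("gen_ai.usage.input_tokens", 42)        # illustrative token counts
        span.set_attribute("gen_ai.usage.output_tokens", 128)
        return response_text
```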

View a Complete List of All the Agents in Your Environment on the AI Agents Page

On the AI agents page, teams can view the aggregate and individual performance, cost, and security metrics of all the agents in their environment. Tracking key metrics such as total requests, error count and error rate, latency, token counts (total, input, and output) and their respective costs, quality score, and risks gives teams a clear, high-level overview of the health and efficiency of each agent. With this comprehensive, searchable list of AI agents, teams know which agents exist, can identify those with critical health, and understand where attention is needed.

Teams can get out-of-the-box historical trend analyses of performance metrics, token utilization and cost attribution, and quality and risk indicators for each agent. This visibility helps teams establish baselines, detect outliers, and make data-driven cost and resource optimization decisions. Users can also set alerts on any of these metrics in order to quickly detect and troubleshoot issues.
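
As a rough illustration of alerting on these metrics, the sketch below creates a detector through the Splunk Observability Cloud REST API. The metric name gen_ai.agent.errors, the realm, the threshold, and the rule details are assumptions for illustration only; substitute the metric names your agents actually report and your own notification settings.

```python
# Sketch: create a detector that fires when an assumed agent error metric
# stays above a threshold. Requires the requests package and an org access
# token in the SFX_ACCESS_TOKEN environment variable.
import os
import requests

REALM = "us1"                           # assumed realm
TOKEN = os.environ["SFX_ACCESS_TOKEN"]

program_text = """
errors = data('gen_ai.agent.errors').sum().publish(label='errors')
detect(when(errors > 10, lasting='5m')).publish('agent_error_rate_high')
""".strip()

detector = {
    "name": "AI agent error rate too high",
    "programText": program_text,
    "rules": [
        {"detectLabel": "agent_error_rate_high", "severity": "Critical", "notifications": []},
    ],
}

resp = requests.post(
    f"https://api.{REALM}.signalfx.com/v2/detector",
    headers={"X-SF-TOKEN": TOKEN, "Content-Type": "application/json"},
    json=detector,
)
resp.raise_for_status()
print("Created detector:", resp.json().get("id"))
```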

Connect Detailed Agent Interactions and Performance to User Impact Using the AI Trace Data Page

Teams can also view related AI trace data to see which quality issues or risks have appeared most often over time and to drill into detailed user interactions. From LLM prompts (inputs) and responses (outputs) by trace ID to their respective dates and quality issues such as hallucinations, biases, sentiment, and toxicity, the AI trace data page enables teams to pinpoint and investigate LLM-specific problems and reduce reputational or operational impact. Splunk AI Agent Monitoring leverages LLM-as-a-judge evaluators to measure performance.
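
For readers unfamiliar with the pattern, LLM-as-a-judge simply means asking a second model to grade an agent’s output. The sketch below shows the general idea, not Splunk’s evaluators: the prompt template, scoring dimensions, and scale are illustrative assumptions, and the judge callable is whatever thin wrapper you use around your model provider’s API.

```python
# General LLM-as-a-judge sketch (illustrative only, not Splunk's implementation):
# a "judge" model scores a prompt/response pair on assumed quality dimensions.
import json
from typing import Callable

JUDGE_TEMPLATE = """You are an impartial evaluator. Given a user prompt and a
model response, rate the response from 1 (worst) to 5 (best) on each dimension
and return JSON: {{"factual_accuracy": n, "coherence": n, "toxicity": n}}.

Prompt: {prompt}
Response: {response}"""

def evaluate_response(prompt: str, response: str,
                      judge: Callable[[str], str]) -> dict:
    """Ask the judge model to score one prompt/response pair.

    `judge` is any callable that sends text to an LLM and returns its reply,
    e.g. a small wrapper around your provider's chat-completion API.
    """
    verdict = judge(JUDGE_TEMPLATE.format(prompt=prompt, response=response))
    return json.loads(verdict)  # e.g. {"factual_accuracy": 4, "coherence": 5, "toxicity": 1}
```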

Analyze Agent Workflows, Tool Calls, Span Details, and More on the Trace View Page

Trace view also provides visibility into span details, the runtime and memory usage of tool calls, and agent workflows and execution paths, helping teams detect performance bottlenecks and optimize resources and costs.

Detect and Mitigate AI Risks With Cisco AI Defense

Finally, AI Agent Monitoring will soon enable teams to detect and mitigate AI (LLM, agent, and tool) risks, misuse, drift, leakage, and threats via an integration with Cisco AI Defense. This added security layer will help teams protect their AI applications against real-time threats with bi-directional guardrails, blocking prompt injections, sensitive data exfiltration, harmful content, and various other risks. By complying with AI security standards, teams can build and deploy trustworthy AI apps and systems and prevent breaches to maintain operational resilience.

Additional Support for Tracking the Health, Availability, and Usage of AI Infrastructure With AI Infrastructure Monitoring

Traditional infrastructure has transformed. AI infrastructure now includes new components like graphics processing units (GPUs), large language models (LLMs), vector databases, and AI frameworks and libraries. Managing AI workloads across these new components, as well as more traditional components like compute, networking, memory, and storage, demands more resources and higher costs than ever before. This complexity will only continue to grow as teams require better training and inferencing, reduced latency, informed decision-making, higher-quality outputs, and more reliable models and agents.

That’s why we launched AI Infrastructure Monitoring last year. As of November 2025, teams can now use AI Infrastructure Monitoring to view data-dense dashboards and detectors for Nvidia NIMs, Milvus and Pinecone vector databases, LiteLLM proxy services, GCP VertexAI applications, Cisco AI PODs, and more.

These dashboards provide GPU-related metrics such as GPU utilization and power consumption, as well as “tokenomics” metrics like time-to-first-token and estimated token costs, to assess the utilization and workload efficiency of hosted AI infrastructure like Cisco AI PODs.
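
For intuition on the tokenomics metrics, here is a back-of-the-envelope sketch of how estimated token cost and time-to-first-token can be derived from token counts and timestamps. The per-1K-token prices are placeholders, not actual provider or Splunk rates.

```python
# Illustrative "tokenomics" math; the per-1K-token prices are placeholders.
def estimated_cost(input_tokens: int, output_tokens: int,
                   input_price_per_1k: float = 0.001,
                   output_price_per_1k: float = 0.002) -> float:
    """Estimated dollar cost of one request from its token counts."""
    return (input_tokens / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k

def time_to_first_token(request_sent_s: float, first_token_s: float) -> float:
    """Seconds from sending the request to receiving the first streamed token."""
    return first_token_s - request_sent_s

# Example: 1,000 prompt tokens and 500 completion tokens
print(f"${estimated_cost(1000, 500):.5f}")   # -> $0.00200 with the placeholder rates
```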

Because Cisco AI PODs are pre-validated, full-stack infrastructure hardware solutions that include Cisco UCS servers, Nexus 9000 Series switches, and integrated software components such as Cisco Intersight, AI Infrastructure Monitoring also populates those dashboards with key metrics like UCS fan speed, host temperature, and host power.

With this end-to-end visibility, teams can quickly find the offending AI infrastructure components that impact stability, cost, availability, and security, and correlate them with business health and usage trends to help mitigate performance and reputational risks.

Splunk & AGNTCY: Unlocking the Future of AI Observability

AGNTCY, a Linux Foundation project, is building an open, interoperable Internet of Agents—the foundational infrastructure that enables AI agents to collaborate across any framework or vendor. By defining shared protocols, identity systems, and discovery services, AGNTCY is shaping a future where agents interact seamlessly and securely. With over 80 members—including Cisco, Google, RedHat, Dell, and Oracle—AGNTCY unites industry leaders to develop, standardize, and maintain the critical components that make multi‑agent systems work in production.

Aligning with our commitment to empower ITOps and engineering teams with open standards, Splunk continues to contribute to AGNTCY to ensure customers benefit from consistent, vendor-neutral telemetry capture for large language models (LLMs) and agentic applications. Splunk also continues to leverage components of AGNTCY’s Metrics Compute Engine to provide advanced quality metrics such as factual accuracy and coherence, and foundational metrics like latency and error rates. Read this blog to learn more about Splunk and AGNTCY.

Get Started Today

Set up AI Agent Monitoring today and read this blog to learn more about Splunk observability for AI.
