Why AI Monitoring Is the Next Big Test for Observability Leaders
If your AI systems aren’t observable, they’re a liability. As organizations rush to infuse AI into their products and operations, IT and engineering leaders face a turning point: either master the intricacies of AI monitoring or risk letting their most critical digital assets run unchecked.
ITOps and engineering teams are already spread thin. According to Splunk’s State of Observability 2025 research, 59% of respondents say tool sprawl negatively impacts team morale. AI monitoring adds yet another area of responsibility, and 47% of respondents say monitoring AI workloads makes their jobs harder. But the ability to understand and capture LLM data is crucial, especially as AI’s impact extends across the business.
Managing business risks during the shift to autonomous AI
AI systems, like traditional digital systems, generate a wealth of telemetry data at every layer of their architecture. Whether deployed on-premises in purpose-built data centers or consumed as cloud services, each layer, from the physical GPU up to the application interface, can (and should) be instrumented for observability. For executives, this instrumentation provides the primary audit trail required for regulatory compliance and the foundation for internal AI governance.
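To make “from the physical GPU up” concrete, here is a minimal sketch of instrumenting the lowest layer of the stack: polling GPU utilization and memory through NVIDIA’s NVML bindings for Python. It assumes NVIDIA hardware with the nvidia-ml-py package installed, and the emit_metric helper is a hypothetical stand-in for whatever pipeline ships your telemetry.

```python
# Minimal sketch: polling the physical GPU layer with NVIDIA's NVML bindings.
# Assumes NVIDIA hardware and the nvidia-ml-py package; emit_metric() is a
# hypothetical stand-in for your telemetry pipeline.
import pynvml

def emit_metric(name: str, value: float, tags: dict) -> None:
    # Hypothetical sink; replace with your metrics exporter.
    print(f"{name}={value} {tags}")

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        tags = {"gpu": i}
        emit_metric("gpu.utilization_pct", util.gpu, tags)
        emit_metric("gpu.memory_used_bytes", mem.used, tags)
finally:
    pynvml.nvmlShutdown()
```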
Without deep visibility, the “black box” nature of AI systems creates a new category of operational liability that traditional IT frameworks are not equipped to manage.
AI can go off the rails in novel and sometimes unexpected ways. We’ve seen this in public incidents ranging from brand-damaging and offensive hallucinations (for example, when Google Gemini depicted the United States’ founding fathers as people of color) to erroneous deals that significantly impact business operations, like a car dealership’s AI-powered chatbot that mistakenly agreed to sell a new Chevrolet Tahoe for $1.
As organizations move from “human-in-the-loop” (where a person reviews and approves AI decisions) to “human-on-the-loop” (where humans monitor but do not intervene in every decision), and eventually to “human-out-of-the-loop” (where AI operates autonomously), the potential impact of unmonitored errors grows exponentially. At machine speed, small issues can quickly snowball into significant business risks that threaten your brand equity and bottom line.
Essential metrics for measuring AI performance and ROI
As noted earlier, 47% of respondents say monitoring AI workloads makes their jobs harder. This complexity creates a productivity gap where high-value engineering talent is consumed by manual troubleshooting rather than innovation. Simply put, AI introduces more telemetry and more noise, and it operates at a scale and speed beyond human capability.
Cloud AI services add further complexity. Each provider — AWS, Azure, Google Cloud — offers its own telemetry APIs (like AWS CloudWatch or Azure Monitor). Correlating data across these environments provides the flexibility to move workloads based on cost or performance, and to avoid being trapped in a single provider’s ecosystem. At the application layer, new forms of telemetry emerge, such as “guardrail” software that monitors for model hallucinations or rates the quality of AI-generated outputs.
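To illustrate what pulling from a single provider’s telemetry API looks like, here is a hedged sketch that uses boto3 to read a model-endpoint latency metric from AWS CloudWatch. The endpoint name “my-llm-endpoint” is hypothetical, and in practice the same pattern would be repeated against Azure Monitor or Google Cloud Monitoring before correlating the results.

```python
# Minimal sketch: pulling model-endpoint metrics from AWS CloudWatch with
# boto3 so they can be correlated with telemetry from other clouds.
# Assumes AWS credentials and a SageMaker endpoint named "my-llm-endpoint"
# (hypothetical).
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/SageMaker",
    MetricName="ModelLatency",
    Dimensions=[
        {"Name": "EndpointName", "Value": "my-llm-endpoint"},
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,                # 5-minute buckets
    Statistics=["Average"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```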
Traditional metrics such as cost, performance, and utilization remain crucial, but new signals are emerging that directly impact the bottom line:
- Hallucination rates: How often is the model generating inaccurate or nonsensical outputs?
- Quality monitoring: Are the outputs meeting business or ethical standards?
- Tokenomics: What is the cost per token (input/output), and how does this impact the overall ROI? (See the quick calculation below.)
- Model inventory: Can you prove which models are in use, and respond to compliance or legal requests?
These metrics not only ensure operational stability but are increasingly vital for compliance and risk management. For example, maintaining an accurate inventory of all models in use can be critical for corporate governance if legal teams need to verify that the organization isn’t running certain models that have security vulnerabilities or licensing issues.
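The tokenomics question above is straightforward to operationalize. Below is a minimal sketch of a cost-per-request calculation from token counts; the per-million-token prices are illustrative placeholders, not any provider’s actual rates.

```python
# Minimal sketch of "tokenomics": computing the dollar cost of an LLM call
# from its token counts. Prices below are illustrative placeholders.
INPUT_PRICE_PER_M = 3.00    # USD per 1M input tokens (placeholder)
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M output tokens (placeholder)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single LLM call given its token counts."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: 1,200 prompt tokens and 350 completion tokens.
print(f"cost per request: ${request_cost(1_200, 350):.4f}")
```

Aggregated across millions of requests, this per-call figure is what turns raw usage telemetry into an ROI conversation.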
Best practices for scaling AI observability and governance
Organizations need to get a handle on AI monitoring to reap the full value of AI. By treating observability as a matter of business risk and governance rather than a purely technical concern, leaders can ensure that AI remains a productive asset rather than an unmanaged liability. Here are some best practices to consider:
Collaborate and share data across functions. Traditionally, observability falls under the office of the CTO, but those lines are blurring as security, legal, and compliance teams take a stake in AI oversight. Building these bridges ensures that data flows where it is needed and prevents compliance from becoming a bottleneck that slows down your AI deployment.
Lean on OpenTelemetry. Open standards like OpenTelemetry have become the de facto choice for instrumenting the full AI deployment chain, offering visibility across the entire stack. OpenTelemetry provides a unified framework for collecting, processing, and exporting telemetry data from diverse sources, making it easier to achieve consistent observability in complex, multi-layered environments. It also enables organizations to collect metadata about the AI model they’re running, making it easier to inventory.
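For example, here is a minimal sketch of wrapping an LLM call in an OpenTelemetry span so model metadata flows into that inventory. It uses the opentelemetry-sdk Python package with a console exporter for simplicity; the gen_ai.* attribute names follow OpenTelemetry’s generative-AI semantic conventions, and call_llm is a hypothetical client function.

```python
# Minimal sketch: wrapping an LLM call in an OpenTelemetry span so model
# metadata (the basis of a model inventory) flows into your observability
# stack. The gen_ai.* attributes follow OpenTelemetry's generative-AI
# semantic conventions; call_llm() is a hypothetical client function.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("ai.inference")

def call_llm(prompt: str) -> str:
    return "..."  # stand-in for a real model client

with tracer.start_as_current_span("llm.chat") as span:
    # Record which model served the request.
    span.set_attribute("gen_ai.request.model", "example-model-v1")
    span.set_attribute("gen_ai.system", "example-provider")
    answer = call_llm("Summarize this quarter's incident reports.")
    span.set_attribute("gen_ai.usage.output_tokens", len(answer.split()))
```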
Automate detection and response to AI-specific anomalies. AI workloads generate new types of signals like hallucination rates or user trust metrics that require continuous and automated monitoring. By leveraging automation, organizations can quickly detect deviations or compliance issues and trigger immediate alerts or remediation workflows. This acts as a digital circuit breaker that catches errors before they escalate. Proactive automation ensures that monitoring keeps pace with the speed and complexity of AI, reducing manual effort and improving resilience.
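As a minimal sketch of what that digital circuit breaker might look like, the snippet below trips an alert when the rolling hallucination rate crosses a threshold. The is_hallucination evaluator and alert sink are hypothetical stand-ins for a real guardrail model and your paging or remediation workflow, and the window and threshold values are illustrative.

```python
# Minimal sketch of a "digital circuit breaker": trip when the rolling
# hallucination rate crosses a threshold. is_hallucination() and alert()
# are hypothetical stand-ins for a guardrail evaluator and a paging or
# remediation workflow.
from collections import deque

WINDOW = 200       # evaluate over the last 200 responses (illustrative)
THRESHOLD = 0.05   # trip at a 5% hallucination rate (illustrative)

recent = deque(maxlen=WINDOW)
tripped = False

def is_hallucination(response: str) -> bool:
    return False  # stand-in for a guardrail model or rule-based check

def alert(message: str) -> None:
    print(f"ALERT: {message}")  # stand-in for paging/remediation

def record(response: str) -> None:
    global tripped
    recent.append(is_hallucination(response))
    rate = sum(recent) / len(recent)
    if len(recent) == WINDOW and rate >= THRESHOLD and not tripped:
        tripped = True  # halt traffic routing until a human resets
        alert(f"hallucination rate {rate:.1%} exceeded {THRESHOLD:.1%}")
```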
Unlock deeper insights and actionable strategies. Download the State of Observability 2025 report to stay ahead in the era of AI-driven monitoring.