AI Is Amazing, the Bills Are Not: Why Tokenomics Is the New FinOps

Observability Paul Lacey

Key takeaways

  1. AI costs are rising fast as agentic AI uses more tokens, making flat-rate and low-cost experimentation harder to sustain.
  2. Enterprises are moving some AI workloads from public cloud and frontier models to private infrastructure and smaller AI models to reduce costs.
  3. Full-stack AI observability helps teams track token usage, infrastructure costs, and business value so AI stays effective and affordable.

For three years, AI ran like a developer preview. Uncapped usage. Flat-rate plans. Inference priced well below cost. That era is over.

Uber burned through its entire agentic coding budget in just four months. Microsoft rolled back internal Claude Code licenses across non-engineering groups and steered thousands of developers toward smaller models. One firm left its usage uncapped and ran up a $500 million bill in a single month. These aren’t exceptions. They’re a collective wake-up call.

We’ve seen this before. The euphoria of limitless compute in the early 2010s. The cloud was empowering. Then we got the bill. The decade ended with enterprises strongly recoiling from runaway cloud invoices. On-prem had another moment.

We called it cloud repatriation. Almost a decade later, it’s coming back. This time with AI. Organizations around the world are starting to move workloads off frontier models to infrastructure they own and control. The key question to answer: do we really need a frontier model here, or could we get an acceptable result at 50% lower cost with an SLM?

The Free Trial Is Ending

AI spend now tracks with tokens, and usage is increasingly hard to predict. Per-token prices keep falling, yet consumption climbs far faster: enterprise AI spend tripled in a single year even as prices dropped more than 90%. Goldman Sachs expects token use to grow 24 times by 2030. Cheaper tokens, bigger bills.
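The arithmetic behind "cheaper tokens, bigger bills" can be sketched in a few lines. This is a back-of-the-envelope illustration only: it treats spend as price times volume and assumes the spend and price figures above cover the same period, which the article does not state. The function name is hypothetical.

```python
def implied_volume_growth(spend_multiple: float, price_drop: float) -> float:
    """Spend = price x volume, so volume multiple = spend multiple / price multiple.

    spend_multiple: how many times larger total spend became (3.0 = tripled).
    price_drop: fractional fall in per-token price (0.90 = down 90%).
    """
    price_multiple = 1.0 - price_drop
    return spend_multiple / price_multiple


# Assumption: both figures describe the same window of time.
growth = implied_volume_growth(spend_multiple=3.0, price_drop=0.90)
print(f"Implied token volume growth: about {growth:.0f}x")
```

Under those assumptions, tripled spend on prices that fell 90% implies token volume grew roughly thirtyfold, which is why falling unit prices have not translated into smaller bills.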

A chatbot query costs a few hundred tokens. An agent has a completely different profile. It plans, calls tools, checks its own work, and loops for minutes at a time. Every loop means more tokens. A single request can burn 100k tokens. A research spike can cost thousands of dollars.
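To make the difference concrete, here is a minimal sketch comparing one chatbot turn with an agent loop. The prices, token counts, and step count are illustrative assumptions, not any provider's actual rates, and the model of an agent (each step re-sends a context that grows by the previous step's output) is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens (assumed value)
    output_per_million: float  # USD per 1M output tokens (assumed value)


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost in USD of a single model call."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


def agent_run(steps: int, initial_context: int, output_per_step: int,
              pricing: Pricing) -> tuple[int, float]:
    """Total tokens and cost of an agent loop.

    Simplifying assumption: every step re-sends the whole context, and each
    step's output is appended to the context for the next step.
    """
    total_tokens = 0
    total_cost = 0.0
    context = initial_context
    for _ in range(steps):
        total_tokens += context + output_per_step
        total_cost += request_cost(context, output_per_step, pricing)
        context += output_per_step
    return total_tokens, total_cost


# Hypothetical prices and workload shapes, chosen only for illustration.
pricing = Pricing(input_per_million=3.0, output_per_million=15.0)
chat_cost = request_cost(input_tokens=300, output_tokens=200, pricing=pricing)
agent_tokens, agent_cost = agent_run(
    steps=20, initial_context=2_000, output_per_step=500, pricing=pricing
)
print(f"chat turn: ${chat_cost:.4f}")
print(f"agent run: {agent_tokens:,} tokens, ${agent_cost:.2f}")
```

Because the context is re-sent on every loop, input tokens grow roughly with the square of the step count. Doubling the number of steps in this sketch more than doubles the cost.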

That is why AI broke the model faster than SaaS ever did. A web service has a fairly predictable cost per request. An autonomous agent does not. It can decide, on its own, to do ten times more work than you planned for. You only find out when you get the bill.

The providers are repricing to match. In 2026, Anthropic pushed agent workloads off flat-rate subscriptions and onto metered credits billed at API rates. OpenClaw users woke up to bills upwards of $1k/day. Google similarly retired its open-source Gemini CLI for flat-rate plans, pushing fleet workloads toward paid API keys. The cheap paths that made early experiments painless are closing, one by one, and what is left is the meter.

So, Teams Are Bringing AI Home

Faced with that math, AI leaders are doing what their finance peers have done for years. They’re bringing AI workloads back to infrastructure they control under fixed capex costs. 86% of CIOs now plan to move at least some workloads from public cloud back to private cloud or on-premises infrastructure, the highest share on record. Cost is the top reason.

The tools are ready. Open-weight models like the Llama and Qwen families now trail the frontier by months, not years. And they run on hardware you can buy. Apple’s new fully loaded M5 Max MacBook Pro can run a 70-billion-parameter model on your desk. Put that capability in your own racks, and a steady enterprise workload could cost far less to run than it does to rent.
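Whether owning beats renting depends on volume. The sketch below estimates the monthly token volume at which amortized hardware plus operating cost matches a per-token API bill. Every number is a placeholder assumption, and the model deliberately ignores hardware throughput limits, utilization, and any quality gap between models.

```python
def monthly_api_cost(tokens_per_month: float, price_per_million: float) -> float:
    """Rented: cost scales linearly with token volume."""
    return tokens_per_month / 1_000_000 * price_per_million


def monthly_owned_cost(hardware_capex: float, amortization_months: int,
                       monthly_opex: float) -> float:
    """Owned: straight-line amortized capex plus power, space, and staff."""
    return hardware_capex / amortization_months + monthly_opex


def break_even_tokens(hardware_capex: float, amortization_months: int,
                      monthly_opex: float, price_per_million: float) -> float:
    """Monthly token volume above which owning is cheaper than renting."""
    owned = monthly_owned_cost(hardware_capex, amortization_months, monthly_opex)
    return owned / price_per_million * 1_000_000


# Hypothetical inputs: $360k of hardware over 36 months, $5k/month to run it,
# compared against a blended API price of $5 per million tokens.
threshold = break_even_tokens(
    hardware_capex=360_000, amortization_months=36,
    monthly_opex=5_000, price_per_million=5.0,
)
print(f"Owning wins above about {threshold / 1e9:.1f}B tokens per month")
```

A steady workload above the threshold favors owned hardware. A spiky or small one may not, and the result is only meaningful if the owned hardware can actually serve that volume.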

The Path Home Is Not Paved

But this freedom comes with a new challenge. The total cost of AI moves from provider bills to infrastructure data feeds. Tracing utilization to business value becomes impossible without deep agent observability and intelligent outcome scoring.

The industry is rising to this challenge. In June 2026, the Linux Foundation announced the Tokenomics Foundation, working alongside the FinOps Foundation, to set open standards for AI cost management. Observability best practices for building accurate, low-cost evals and guardrails are rapidly maturing. Tokenomics is becoming the FinOps of this era. It is shaping up to be a defining skill for the teams that run AI well.

From Silicon to Evals

This is where owning the whole stack means leverage. The ability to trace information from hardware utilization through agent behavior gives AI teams the tools they need to optimize spend, capacity, and ultimately business ROI. It puts true AI trust and security within reach by unifying data across every layer. It puts token consumption next to output quality and infrastructure performance, giving teams the ability to monitor combined performance across Cisco AI PODs, NVIDIA NIMs, and Milvus vector DBs.

These signals can be paired with custom quality metrics in Splunk Agent Observability that describe agent behavior across a number of dimensions. Teams can weigh cost against performance and see which workloads can safely run on open-weight models at a fraction of the cost. Real-time guardrails stop a runaway agent before it becomes a budget story. The invoice may have disappeared, but your visibility does not have to.
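As a rough illustration of the guardrail idea, the sketch below wraps an agent loop in a hard token and step budget. It is a generic pattern, not Splunk's implementation or API. `TokenBudget`, `BudgetExceeded`, `run_agent`, and the fake `looping_step` are all hypothetical names, and a real agent step would report the token usage returned by the model call.

```python
class BudgetExceeded(RuntimeError):
    """Raised when an agent run exceeds its token or step allowance."""


class TokenBudget:
    def __init__(self, max_tokens: int, max_steps: int) -> None:
        self.max_tokens = max_tokens
        self.max_steps = max_steps
        self.used = 0
        self.steps = 0

    def charge(self, tokens: int) -> None:
        """Record one step's usage and stop the run if a limit is crossed."""
        self.used += tokens
        self.steps += 1
        if self.used > self.max_tokens:
            raise BudgetExceeded(
                f"{self.used:,} tokens used, limit {self.max_tokens:,}"
            )
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"{self.steps} steps, limit {self.max_steps}")


def run_agent(step_fn, budget: TokenBudget) -> int:
    """Call step_fn until it reports done; step_fn returns (done, tokens_used)."""
    while True:
        done, tokens = step_fn()
        budget.charge(tokens)
        if done:
            return budget.used


def looping_step():
    """Stand-in for a runaway agent: never finishes, burns 30k tokens a step."""
    return False, 30_000


budget = TokenBudget(max_tokens=100_000, max_steps=50)
try:
    run_agent(looping_step, budget)
    stopped = False
except BudgetExceeded as exc:
    stopped = True
    print(f"Agent halted: {exc}")
```

The check runs after each step is charged, so a run can overshoot the limit by at most one step. A stricter variant would estimate the next step's cost and refuse to start it.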

That full-stack view is also what unblocks the build-out. The hard part of an AI data center is not standing up GPUs; it is running them at a cost and quality you can see and defend. When the hardware, the network, and the observability come from one place, you can bring workloads home and still answer the only question that matters: what did this cost, and was it worth it?

Keep AI Amazing and the Bills Boring

AI earns its keep when you can see what it costs and what it returns. Bring the steady workloads home, run them on hardware you control, and hold them to the same discipline as every other line of variable spend.

The shift is already underway, and the economics will not reverse. The teams that come out ahead pair hardware they own with one honest view of what each workload costs, from the token down to the GPU. Get that right and repatriation stops being a cost-cutting scramble and becomes a lasting advantage.

To go deeper, download The Agentic Shift: Redefining Observability for the AI Era.

FAQ

What is cloud repatriation for AI?

It is moving AI workloads from rented public cloud and per-token model APIs to open-weight models running on infrastructure you own or control, usually to cut cost and gain predictability for steady, heavy workloads.

Why are companies repatriating AI now?

Agentic workloads spend unpredictably, and providers are repricing toward metered billing, so per-token costs keep surprising teams. 86% of CIOs plan to move some workloads back from public cloud, with cost the top driver.

What is the hardest part of running AI on your own hardware?

Two things. High-bandwidth memory is sold out through 2026, and there is no cloud invoice to measure spend, so you have to track GPU, memory, and token cost yourself to know what a workload really costs.
