Continuing our journey into understanding and addressing threats against locally hosted LLMs and MCP servers, we are going to put the pieces of the puzzle together. In this blog, we will develop MITRE ATLAS detections based on logs obtained from Ollama hosting one of the most popular open source LLMs (Llama), used in conjunction with a Model Context Protocol (MCP) server and Promptfoo, a security testing and red team tool that targets LLMs.
This blog covers the three main elements needed to develop LLM and MCP detections using Splunk:
The Adversarial Threat Landscape for Artificial Intelligence Systems (ATLAS) is a knowledge base of adversary tactics and techniques against AI-enabled systems, based on real-world attack observations and realistic demonstrations from AI red teams and security groups.
One of the main challenges everyone in the industry currently faces is: how do we test these things? What should we test for? Can we even see it? MITRE ATLAS is definitely a necessary tool for those tasked with monitoring and securing AI technologies such as LLMs and MCP servers.
The MITRE ATLAS framework is composed of 15 tactic categories broken down into techniques. Security analysts and AI developers or implementers should be made aware of potential, realistic threats targeting AI-driven systems. This knowledge facilitates threat assessments and internal red teaming exercises, allowing for a deeper understanding of actual adversary behaviors in the real world and helping identify suitable mitigation strategies.
It is crucial to document and report unique, real-world attacks on AI-enabled systems executed by adversaries, as this valuable information can inform future defensive measures and enhance overall system security. Cisco AI Defense has done a great job putting together and explaining the similarities of different AI security taxonomies.
The AI Security and Safety Taxonomy introduced by Cisco presents a comprehensive framework encompassing both security and safety concerns related to AI applications in an integrated manner. The emphasis on AI security involves safeguarding sensitive data and essential computing resources from unauthorized intrusions or attacks, while prioritizing AI safety addresses the potential hazards stemming from unintended consequences, such as design flaws or misuse, which may result in financial losses, reputational damage, and legal complications.
To address these risks effectively, Cisco advocates a holistic, end-to-end solution that integrates automated model and application validation for identifying vulnerabilities with runtime protection to enforce safety guidelines during deployment. The taxonomy offers a detailed overview of various security and safety threats, along with clear explanations, practical instances, and alignments with established standards like NIST, MITRE ATLAS, and the OWASP Top 10 for LLM Applications. Cisco's AI Defense solution embodies this framework, offering tools for continuous vulnerability assessment and real-time enforcement of safety measures.
You can read more about Cisco’s approach to AI’s security taxonomy here. It is important to understand that many of these approaches are evolving and likely will change in the near future. In this blog we are going to focus on TTPs from Mitre ATLAS.
A dockerized lab developed by the author to operationalize the execution of attacks (via Promptfoo) against local LLMs and MCP servers. This framework uses Ollama plus an MCP server as targets, and it is preconfigured to send debug RAW logs to a Splunk instance. The main components of the lab include:
With the Splunk MCP LLM SIEMulator dockerized lab we can operationalize attacks against local Large Language Model (LLM) and Model Context Protocol (MCP) servers by employing Promptfoo as an attack vector. This architecture utilizes Ollama and an MCP server as primary targets while sending debug RAW logs to a Splunk instance for analysis.
The lab's essential component, Splunk Core, acts as the central hub for log management and detection development.
Splunk is the core of our lab: the logs are sent here, and from here we can further manage them in order to create feasible detections. As stated above, the logs are sent in RAW format from the Docker containers into Splunk via HEC. This was half of the challenge.
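To make this step concrete, here is a minimal sketch of how a container's raw log line can reach Splunk over HEC's raw endpoint. The hostname, port, and token below are placeholders for illustration, not the lab's actual values, and the example only builds the request rather than sending it:

```python
import urllib.request

def build_hec_raw_request(splunk_host: str, token: str, raw_event: str,
                          sourcetype: str = "ollama:raw") -> urllib.request.Request:
    """Build an HTTP Event Collector request for Splunk's raw endpoint.

    /services/collector/raw accepts the event body verbatim, which matches
    how the lab forwards unparsed container logs.
    """
    url = (f"https://{splunk_host}:8088/services/collector/raw"
           f"?sourcetype={sourcetype}")
    return urllib.request.Request(
        url,
        data=raw_event.encode("utf-8"),
        headers={"Authorization": f"Splunk {token}"},
        method="POST",
    )

# Placeholder host and token -- never ship real credentials in code.
req = build_hec_raw_request(
    "splunk.local",
    "00000000-0000-0000-0000-000000000000",
    'time=2024-05-01T00:00:00Z level=INFO msg="completion request"',
)
# urllib.request.urlopen(req)  # would actually send; requires a reachable HEC
```

In the lab this forwarding is handled by the docker configuration itself; the snippet only shows the shape of the HEC call.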
The MCP/Ollama logs are not very friendly when it comes to parsing. To make these logs more actionable for detections, the author wrote Python scripts to turn them into JSON. Here are some of the properties and challenges of these logs:
The above are some of the challenges faced with Ollama logs; as this framework evolves, hopefully a more stable, structured, and verbose format will be implemented. As of the writing of this blog, the author had to develop specific scripts to turn these logs into a structured format (JSON) that was used in building the detections.
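To illustrate the kind of conversion involved, here is a minimal Python sketch. The field layout is an assumption for illustration (Ollama emits both key=value server lines and pipe-delimited GIN access lines, and formats vary by version); the author's actual scripts may differ:

```python
import json
import re

# key=value pairs such as: time=... level=INFO msg="completion request"
KV_PATTERN = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')
# client IP embedded in pipe-delimited GIN-style access lines
GIN_IP = re.compile(r'\|\s+(\d+\.\d+\.\d+\.\d+)\s+\|')

def ollama_line_to_json(line: str) -> str:
    """Turn one raw Ollama log line into a JSON string for Splunk ingestion."""
    event = {key: value.strip('"') for key, value in KV_PATTERN.findall(line)}
    ip = GIN_IP.search(line)
    if ip:
        event["client_ip"] = ip.group(1)
    event["raw"] = line.rstrip("\n")  # keep the original line for forensics
    return json.dumps(event)
```

Running the converted output through Splunk's JSON sourcetype gives every detection below named fields (msg, client_ip, etc.) to match against instead of raw string scraping.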
Once we have the infrastructure in place, with logs from Ollama and the MCP server flowing, we can proceed to perform our attacks against the LLM and MCP using Promptfoo. As seen in the next screenshots, the attacks can be executed from the command line once the lab is operational and running.
Promptfoo allows an operator to execute different types of prompt-based attacks or OWASP Top 10 vulnerability probing against API endpoints on the Ollama server and MCP server. This open source tool uses two approaches for testing and securing LLM applications.
With the above capabilities the author proceeded to run several EVAL and Red Team runs against the lab LLM and MCP server. Once we had ingested the logs and converted them into JSON format, we re-uploaded them into Splunk and were able to start developing detections for MITRE ATLAS TTPs. The following are two examples of MITRE ATLAS TTP detections for Ollama and two examples for the MCP server. (The detections were mapped to the OWASP Top 10 for LLM Applications as well.)
An adversary may craft malicious prompts as inputs to an LLM that cause the LLM to act in unintended ways. These "prompt injections" are often designed to cause the model to ignore aspects of its original instructions and follow the adversary's instructions instead.
MITRE ATLAS AML.T0051 - OWASP Top 10 LLM01:2025
SPL code
index=ollamaparsedjsonlogs
| eval client_ip=case(
    isnotnull(client) AND match(client,"^\d+\.\d+\.\d+\.\d+$"), client,
    match(raw,"\|\s+(\d+\.\d+\.\d+\.\d+)\s+\|"), replace(raw,".*\|\s+(\d+\.\d+\.\d+\.\d+)\s+\|.*","\1"),
    1==1, "internal_process")
| eval has_injection_keywords=if(match(msg,"ignore|forget|disregard|override|system|SYSTEM|bypass|jailbreak"),1,0)
| eval has_role_manipulation=if(match(msg,"you are now|act as|pretend to be|roleplay|developer mode"),1,0)
| eval has_instruction_override=if(match(msg,"previous instructions|above instructions|ignore.*instructions|new instructions"),1,0)
| eval has_encoding_injection=if(match(error,"codec|charmap|decode|encode"),1,0)
| eval has_format_injection=if(match(msg,"<|>|\[|\]|\{|\}|```|---"),1,0)
| eval prompt_length=if(isnotnull(prompt) AND prompt!="", tonumber(prompt), 0)
| eval is_completion=if(match(msg,"completion request"),1,0)
| bucket _time span=10m
| stats sum(has_injection_keywords) as injection_keywords,
        sum(has_role_manipulation) as role_manipulation,
        sum(has_instruction_override) as instruction_override,
        sum(has_encoding_injection) as encoding_injection,
        sum(has_format_injection) as format_injection,
        sum(is_completion) as completion_attempts,
        dc(prompt_length) as prompt_diversity,
        values(client_ip) as all_clients
        by _time
| where injection_keywords>0 OR role_manipulation>0 OR instruction_override>0
    OR encoding_injection>0 OR format_injection>0 OR completion_attempts>20
| mvexpand all_clients
| rename all_clients as client_ip
| where match(client_ip,"^\d+\.\d+\.\d+\.\d+$")
| sort _time, client_ip
| table _time, client_ip, injection_keywords, role_manipulation, instruction_override,
        encoding_injection, format_injection, completion_attempts, prompt_diversity
The above query detects potentially suspicious or malicious prompt injection attempts by searching for:
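For quick iteration before committing patterns to SPL, the same keyword heuristics can be prototyped in plain Python. The regexes below simply mirror the match() clauses of the detection above (with case-insensitive matching for convenience); they are heuristics for illustration, not a complete prompt-injection signature set:

```python
import re

# These patterns mirror the SPL detection's match() clauses.
INJECTION = re.compile(
    r"ignore|forget|disregard|override|system|bypass|jailbreak", re.I)
ROLE_MANIPULATION = re.compile(
    r"you are now|act as|pretend to be|roleplay|developer mode", re.I)
INSTRUCTION_OVERRIDE = re.compile(
    r"previous instructions|above instructions|ignore.*instructions|new instructions", re.I)

def score_prompt(msg: str) -> dict:
    """Flag a message the same way the SPL detection flags a log event."""
    return {
        "injection_keywords": int(bool(INJECTION.search(msg))),
        "role_manipulation": int(bool(ROLE_MANIPULATION.search(msg))),
        "instruction_override": int(bool(INSTRUCTION_OVERRIDE.search(msg))),
    }

def is_suspicious(msg: str) -> bool:
    """True if any heuristic fires, matching the SPL's OR-ed where clause."""
    return any(score_prompt(msg).values())
```

Keyword lists like these produce false positives on benign text containing words like "system", which is why the SPL aggregates counts over 10-minute windows instead of alerting on single events.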
Adversaries may target different AI services to send useless queries or computationally expensive inputs to increase the cost of running services at the victim organization.
MITRE ATLAS AML.T0034 - OWASP Top 10 LLM10:2025
SPL code
index="ollamaparsedjson" sourcetype="json_no_timestamp"
| search "*gpu*" OR "*cpu*" OR "*memory*" OR "*compute*" OR "*resource*" OR "*cuda*" OR "*vram*"
| eval resource_usage=case(
    match(_raw, "(?i)(gpu|cuda|vram)"), "gpu_usage",
    match(_raw, "(?i)cpu.*usage"), "cpu_usage",
    match(_raw, "(?i)memory.*usage"), "memory_usage",
    match(_raw, "(?i)compute"), "compute_usage",
    true(), "other")
| where resource_usage!="other"
| bin _time span=1h
| stats count by _time, host, resource_usage, name
| where count > 5
| sort -count
The above search detects cost harvesting by looking, in one-hour intervals per host, at the number of logs referencing GPU, CPU, memory, or compute usage, targeting workloads that consume lots of resources, as well as patterns or spikes.
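The ordered case() precedence in that search can be mirrored in Python for offline testing against sample log lines; this is an illustrative sketch for validating the classification logic, not part of the published detection:

```python
import re

# Order matters: it mirrors the SPL case() precedence, so a line mentioning
# both "cuda" and "compute" classifies as gpu_usage, as it does in the search.
RESOURCE_PATTERNS = [
    ("gpu_usage", re.compile(r"gpu|cuda|vram", re.I)),
    ("cpu_usage", re.compile(r"cpu.*usage", re.I)),
    ("memory_usage", re.compile(r"memory.*usage", re.I)),
    ("compute_usage", re.compile(r"compute", re.I)),
]

def classify_resource(raw_line: str) -> str:
    """Return the resource_usage label the SPL case() would assign."""
    for label, pattern in RESOURCE_PATTERNS:
        if pattern.search(raw_line):
            return label
    return "other"
```

Running sample exported events through this helper is a cheap way to confirm the buckets behave as intended before tuning the `count > 5` threshold in Splunk.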
Adversaries may target different AI services to send useless queries or computationally expensive inputs to increase the cost of running services at the victim organization, in this case via the malicious use of MCP.
SPL code
index="mcp_prompt_foo_jsonl" category="TIMER" message="timeout callback*"
| eval timeout_value_ms=tonumber(replace(message,"timeout callback ",""))
| eval timeout_severity=case(
    timeout_value_ms >= 30000, "CRITICAL",
    timeout_value_ms >= 10000, "HIGH",
    timeout_value_ms >= 5000, "MEDIUM",
    timeout_value_ms >= 1000, "LOW",
    true(), "INFO")
| bucket _time span=5m
| stats count as timeout_events,
        min(timeout_value_ms) as min_timeout_ms,
        max(timeout_value_ms) as max_timeout_ms,
        avg(timeout_value_ms) as avg_timeout_ms,
        sum(eval(if(timeout_severity="CRITICAL",1,0))) as critical_count
        by _time, host, pid
| eval time_window=strftime(_time, "%Y-%m-%d %H:%M")
| eval resource_exhaustion_risk=case(
    critical_count > 10, "SEVERE",
    critical_count > 5, "HIGH",
    critical_count > 3, "MEDIUM",
    true(), "LOW")
| where resource_exhaustion_risk IN ("HIGH", "SEVERE")
| table time_window, host, pid, timeout_events, critical_count, avg_timeout_ms, resource_exhaustion_risk
The above search detects processes with excessive or severe timeout callbacks related to the use of the Model Context Protocol, which can indicate:
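The severity and risk thresholds used by the search can be expressed as a small Python helper, which is handy for unit testing threshold changes before editing the SPL; the numbers below simply restate the case() statements in the detection:

```python
def timeout_severity(timeout_ms: float) -> str:
    """Bucket a timeout-callback value using the detection's thresholds."""
    if timeout_ms >= 30000:
        return "CRITICAL"
    if timeout_ms >= 10000:
        return "HIGH"
    if timeout_ms >= 5000:
        return "MEDIUM"
    if timeout_ms >= 1000:
        return "LOW"
    return "INFO"

def exhaustion_risk(critical_count: int) -> str:
    """Rate a 5-minute window by its count of CRITICAL timeouts."""
    if critical_count > 10:
        return "SEVERE"
    if critical_count > 5:
        return "HIGH"
    if critical_count > 3:
        return "MEDIUM"
    return "LOW"
```

Because the detection only alerts on HIGH and SEVERE windows, tuning these two sets of thresholds directly controls its noise level.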
MITRE ATLAS AML.T0029 - OWASP Top 10 LLM10:2025
Adversaries may target AI-enabled systems with a flood of requests for the purpose of degrading or shutting down the service. Since many AI systems require significant amounts of specialized compute, they are often expensive bottlenecks that can become overloaded. Adversaries can intentionally craft inputs that require heavy amounts of useless compute from the AI system.
SPL code
index="mcp_prompt_foo_jsonl" category="NET"
| eval connection_action=case(
    like(message, "%destroy%"), "DESTROY",
    like(message, "%close%"), "CLOSE",
    like(message, "%shutdown%"), "SHUTDOWN",
    like(message, "%emit close%"), "EMIT_CLOSE",
    like(message, "%SERVER%"), "SERVER_ACTION",
    true(), "OTHER")
| eval action_severity=case(
    connection_action="DESTROY", "CRITICAL",
    connection_action="SHUTDOWN", "HIGH",
    connection_action="CLOSE" OR connection_action="EMIT_CLOSE", "MEDIUM",
    connection_action="SERVER_ACTION", "LOW",
    true(), "INFO")
| bucket _time span=1m
| stats count as net_events,
        dc(connection_action) as action_types,
        values(connection_action) as actions,
        values(action_severity) as severities,
        sum(eval(if(action_severity="CRITICAL",1,0))) as critical_actions,
        sum(eval(if(action_severity="HIGH",1,0))) as high_actions,
        dc(pid) as affected_processes
        by _time, host
| eval time_window=strftime(_time, "%Y-%m-%d %H:%M")
| eval availability_risk=case(
    critical_actions > 5, "SERVICE_DISRUPTION",
    high_actions > 3, "POTENTIAL_DISRUPTION",
    net_events > 20, "HIGH_ACTIVITY",
    true(), "NORMAL")
| eval impact_level=case(
    availability_risk="SERVICE_DISRUPTION", "CRITICAL",
    availability_risk="POTENTIAL_DISRUPTION", "HIGH",
    availability_risk="HIGH_ACTIVITY", "MEDIUM",
    true(), "LOW")
| where impact_level IN ("MEDIUM", "HIGH", "CRITICAL")
| sort -critical_actions, -net_events
The above detection monitors network-related connection events (shutdowns, closes, destroys) in MCP logs and surfaces possible patterns in:
As seen in this blog, it is possible to develop detections by operationalizing an environment that resembles or replicates real conditions using the same technologies in a dockerized lab. All of these technologies are evolving, and many of their features are certainly going to change, specifically the format and verbosity of logs.
With these types of tools we can now approach these threat and vulnerability scenarios against local LLMs and MCP servers by taking the available logs in their current form, ingesting them into Splunk in a workable manner, and developing detections.
To learn more about Splunk Threat Research Team and to access the tools and security content, please visit research.splunk.com.
The world’s leading organizations rely on Splunk, a Cisco company, to continuously strengthen digital resilience with our unified security and observability platform, powered by industry-leading AI.
Our customers trust Splunk’s award-winning security and observability solutions to secure and improve the reliability of their complex digital environments, at any scale.