May 27, 2025

10 Minute Read

How To Use Splunk To Monitor Security of Local LLMs (Part I)

By Rod Soto

With the recent increase in use and popularity of Large Language Models (LLMs), many organizations are still trying to approach the use of this technology. In this blog, we approach the different log sources that can be potentially used to monitor a local LLM. Local LLMs are an alternative way of approaching generative AI without risking sending sensitive proprietary information to the main providers of this technology which are cloud based and presumed to use customer’s data for their own model training. One of the most popular frameworks to deploy LLMs locally is Ollama, which is a lightweight open source framework that can run Large Language Models locally, such as Meta’s LLAMA, or make API calls to ChatGPT and other online LLMs.

What is an LLM?

An LLM is a computer program designed to process, understand, and generate human-like language. These models are trained on vast amounts of text data, which allows them to learn patterns, relationships, and nuances of language. This training enables the model to:

A. “Understand”: The LLM can comprehend natural language input, such as sentences or conversations.

B. “Generate”: It can produce human-like text based on a given prompt or context.

LLMs are often used for various tasks, including:

Text summarization: Summarizing long pieces of text into concise, informative summaries.
Language translation: Translating text from one language to another.
Text completion: Predicting the next word in a sentence based on context.
Dialogue systems: Engaging in conversations and responding to user inputs

Foundational & Frontier Models

While both terms refer to broad, general-purpose language models that can be fine-tuned for various downstream tasks, they have slightly different connotations.

Foundational Model

A foundational model is a type of large-scale, pre-trained LLM that serves as a building block for many other models. It's often designed to capture a wide range of linguistic knowledge and can be used as a starting point for specialized models in various domains (e.g., natural language understanding, text classification, sentiment analysis). The idea behind a foundational model is that it can provide a strong foundation for multiple downstream applications, hence the name.

One example of a foundational model is BERT (Bidirectional Encoder Representations from Transformers). BERT is a foundation model developed by Google in 2018. It's a pre-trained language model based on the transformer architecture, which was initially designed for natural language processing (NLP) tasks like language translation and text classification.

Frontier Model

A frontier LLM, also known as a "frontier" or "leading-edge" model, is a term popularized by some researchers and AI enthusiasts. It refers to the state-of-the-art, cutting-edge large language models that are pushing the boundaries of what's possible in NLP. These models are often characterized by their massive scale (e.g., billions of parameters), advanced architectures (e.g., transformer-XL, Longformer), and impressive performance on a wide range of tasks.

ChatGPT is considered a "frontier model" in the field of artificial intelligence, as it represents a cutting-edge large language model with advanced capabilities in text generation, translation, and complex language tasks, placing it at the forefront of current AI technology development, particularly due to its ability to generate human-like text and engage in interactive conversations; this designation is often attributed to its underlying model, GPT-4, developed by OpenAI.

While both terms describe broad, general-purpose LLMs, there is a subtle distinction:

A foundational model emphasizes its utility as a building block for multiple downstream applications.
A frontier LLM highlights its cutting-edge nature, emphasizing its current state-of-the-art status in the field.

Multimodal LLMs

A multimodal LLM is a type of artificial intelligence model that can process and understand multiple forms of data, such as text, images, videos, audio, and other sensory inputs. Unlike traditional LLMs that only handle text-based input, multimodal LLMs integrate multiple modalities to create more comprehensive representations of the world.

Some applications of multimodal LLMs include:

Image-text retrieval: searching for images based on text descriptions or vice versa.
Question answering: responding to questions by integrating visual, auditory, and textual context.
Visual storytelling: generating narratives that incorporate images, videos, and audio.
Emotion recognition: recognizing emotions from a combination of facial expressions, speech patterns, and body language.

Some of the current popular multimodal LLMs include GPT4 Vision, Flamingo, Dall-E, and CLIP.

The use of these models is now very popular and becoming prevalent as AI is taking hold in pretty much every day use from TVs, phones, watches, computers, glasses, cars and soon we will witness the embodiment of AI from home assistants to robots, drones and many other devices. This is why it is important to look at possible risks and threats against this technology.

Risks of LLMs in the Enterprise

The use of these technologies comes with a number of risks which many enterprises need to consider and manage as they adopt LLMs.

Data Security Risks

When using these online LLMs it is important to understand that data leakage of sensitive information is possible. This has been reported recently where an organization found data leakage containing internal company information. This may lead to unauthorized access to confidential information and inadvertent disclosure of trade secrets. It is clear that these companies operate in cloud infrastructure and as such the data is stored in external, publicly exposed and likely multi-tenant cloud environments.

Accuracy Issues & Business Impact Risks

LLMs are known for “Hallucinations”, which consists of generating information that appears plausible yet is incorrect and factually wrong. This may lead to wrong business insights, inconsistent responses when using this technology and misleading analytical results. Without checking the information generated from these technologies organizations are also at risk of over reliance on information that is probably factually wrong and incomplete, lack of output transparency (as a customer you really do not get algorithm transparency) and this may lead to biased decision making and potential damage to brand reputation.

Operational Risks

Some of the things that have been noted and experienced as these technologies are being used extensively by many individuals and organizations is the degradation of service quality, uncertainty of operational costs and the loss of data and control once input into online LLM platforms.

Technical Vulnerabilities

As any other application running on cloud environments, these technologies are affected by supply chain vulnerabilities, web application and infrastructure vulnerabilities. There are also specific threats to these technologies which will be covered later in this blog. A recent campaign where a malicious extension associated with a popular LLM tool that creates images led to a compromise of a Fortune 500 organization.

Compliance Issues

This type of risks are clear as the storage and leakage of confidential, sensitive information presents risks of violation of data protection laws, breach of compliance standards, copyright infringement and legal penalties for misuse.

Available Local LLMs

As we have outlined some clear risks of the use of Online LLMs, there are also a number of organizations and individuals that are approaching the adoption of LLMs by using locally hosted frameworks that allow to run models without interacting with online platforms, eliminating some of the risks associated with Online LLMs.

There are several ways of deploying LLMs locally; two popular open source frameworks that can provide local LLM functionality are GPT4ALL and Ollama. With these frameworks you can deploy popular LLM models that can run from laptops, computers and servers and then run them locally. In this blog we will focus on two very popular LLM models LLAMA and DEEPSEEK.

GPT4ALL

https://www.nomic.ai/gpt4all

Includes native chat applications for Windows, MacOs and Linux and features LocalDocs plugin for document interaction. Allows model training and features REST API endpoints.

LM Studio

https://lmstudio.ai/

Supports a wide range of models, including Vicuna, Alpaca, LLaMa, Falcon, Starcoder, and GPT-2. Offers robust completion and chat endpoints, with acceleration via CUDA, OpenCL, cuBLAS, and Metal for high performance.

Known for extensive model compatibility and performance, making it suitable for users who want flexibility and speed.

Llama.cpp

https://github.com/ggml-org/llama.cpp

An open-source project focused on efficient inference of LLMs on local hardware. Runs on MacOS, Windows, and Linux, and is popular for its lightweight footprint and community support.

Ollama

https://ollama.com

Creates an isolated environment for LLM deployment, can be used in Desktop environments and works best with dedicated GPU's. Supports multiple model versions and operates completely offline. Supports Llama 3.2+ for NLP tasks, Mistral for code generation, allows model creation capabilities.

It is important to note that Llama 3+ models are considered the closest open source models to an actual FRONTIER model (ChatGPT, Claude). In this blog we will focus on the Ollama framework as it has become one of the most used frameworks to adopt LLMs locally.

Simple LLM Use Cases for Cybersecurity

Report and Incident Summarization

As seen in the above screenshot we can feed a PDF (in this case Mandiant APT1 report pdf) and ask llama3.1 to summarize it. This functionality can be used to effectively summarize and share incident reports or create actual reports from data like many companies are already doing with incident reports, tickets and customer updates.

Severity Rating & Alert Investigations

Another use for a LLM in security is to evaluate security events to determine threat levels, assign ratings, and prioritize incident response activities. This functionality is currently being applied by many organizations in detection of threats such as phishing, or malicious code.

The above are just two examples of some of the current ongoing research and uses of these new technologies. There are many other ongoing initiatives including generative AI use for vulnerability exploitation and defense.

Interacting with LLMs

The simplest way to interact with a LLM is by inputting language into it. This inputting can come from a GUI, keyboard, phone, camera, etc. The name of this structured input into a LLM is called a “prompt." A prompt is usually composed of the following:

Instruction: A command or query directing the model.
Context: Additional background information that gives model context for its response.
Format Specification: Specific output or specific style of answer.

Prompts can be structured in different ways but there are some known types of prompts:

Zero-shot: Asking a question without providing details, instructions or examples.

Few-shot: Include a few examples to guide model

Input: The sky is blue
Output: color
- Input: grass is green
Output: Color
- Input: Blood is Red
  - Output: [response]

Chain of Thought (CoT): This type of prompt guides language models to break down their reasoning into step by step intermediate steps before providing a final answer.

Input: How many apples are left if you start with 7 apples and give 2 to your friend?

Answer: Let’s solve this step by step

Start with 7 apples
I give 2 apples away
Therefore 7 - 2 = 5 apples

Answer: the answer is 5 apples

Input: How many apples are left if you start with 12 apples and give 4 to your friend?

There are many other types or techniques of prompting but in general these are some of the most common ones.

It is important to note that as prompts are the main way to communicate with the model they are of course the main medium of attack as prompt interfaces are input fields that can be used to insert malicious code or even craft prompts that may allow attackers to take advantage of the model.

Threats Affecting a LLM

Now that we have reviewed the basics of LLMs, let’s review some of the items that relate to threats of a LLM. The best way to understand these threats is by looking at the current efforts to outline the risks and threats to these technologies. Some of the most relevant efforts are OWASP Top 10 for LLMs and Mitre ATLAS.

Security Frameworks & Attack Scenarios

With many companies using LLMs online and locally there are now some initiatives that are focused on LLM security; the most relevant at the moment are OWASP and MITRE ATLAS. Splunk SURGe has previously covered OWASP Top 10 LLMs.

OWASP

h ttps://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/

A community driven effort to address and highlight security issues specific to AI Applications. The top 10 security issues detail specific threats affecting LLMs.

LLM01:2025 Prompt Injection

A prompt that alters the LLM behavior or output in unintended ways.

LLM02:2025 Sensitive Information Disclosure

Failure to sanitize or prevent sensitive information when outputting responses.

LLM03:2025 Supply Chain

Biased outputs, security breaches or system failures due to compromised integrity of supply chain.

LLM04:2025 Data & Model Poisoning

Data in pre-training, fine-tuning or embedded is manipulated to introduce vulnerabilities, backdoors or biases.

LLM05:2025 Improper Output Handling

Insufficient validation, sanitization and handling of outputs before interacting with other components or systems. (XSS, CSRF, SSRF)

LLM06:2025 Excessive Agency

Excessive Agency refers to the vulnerability that arises when an LLM, despite being designed to assist or inform, takes on a level of authority or control that is not intended. This can happen in various ways, but ultimately results in the model performing actions that are unexpected, ambiguous, or even malicious.

LLM07:2025 System Prompt Leakage

System prompts or instructions used to steer behavior of the model can also contain sensitive information that was not intended to be discovered.

LLM08:2025 Vector and Embedding Weaknesses

This vulnerability is related to the use of Retrieval Augmented Generation (RAG). Malicious manipulation of vectors and embeddings that can be generated, stored or retrieved injecting harmful content, manipulating model output or access sensitive information.

LLM09:2025 Misinformation

LLM produces false or misleading information that appears credible.

LLM10:2025 Unbounded Consumption

Risk of depleting resources, disruption of service, or theft of intellectual property via the execution of excessive uncontrolled inferences.

MITRE Adversarial Landscape for AI Systems (ATLAS)

A framework that contains a knowledge base of adversary tactics and techniques based on real-world attack observations and realistic demonstrations from AIred teams and security groups.

https://atlas.mitre.org/

This framework as of January 2025, breaks down 14 tactics, 91 techniques, and 26 mitigation strategies. Some of the key adversarial techniques contained in this framework include:

Access and Control Techniques

LLM Plugin Compromise
Proxy Model Creation
Model Backdoors

Model Manipulation Techniques

Data Poisoning
Model Extraction
Model Evasion

Information Gathering Techniques

ML Model Discovery
Public Artifact Collection
LLM Prompt Manipulation

This framework contains defensive strategies to protect against AI System attacks. One particular area which we are addressing in this blog is related to detection and monitoring which includes:

Model Protection
- Monitor Model extraction attempts
- Implement detection mechanisms for data poisoning attempts
- Deploy systems to identify adversarial examples and model evasion techniques
Operational Security
- Conduct Regular threat hunting to detect malicious activity early
- Use EDR/XDR technologies
- Implement proactive detection capabilities for identifying attack patterns.

In the next part of How to Use Splunk to Monitor Security of Local LLMs, we’ll look at log data with Splunk to defend against threats directed to local LLMs. We will also break these threats in three categories:

Threats against systems that run LLMs
Threats while the LLM is in use
Threats related to the development of Models

Rod Soto

Worked at Prolexic, Akamai, Caspida. Won BlackHat CTF in 2012. Co-founded Hackmiami, Pacific Hackers meetup and conferences.

Artificial Intelligence 4 Min Read

How to Use Splunk to Monitor Security of Local LLMs (Part II)

Learn how to use Splunk to monitor and defend your local Large Language Models (LLMs).

Artificial Intelligence 9 Min Read

Using Splunk to Monitor the Security of MCP Servers

Learn how to use Splunk to monitor MCP Server security.

Artificial Intelligence 4 Min Read

AI at Splunk: Trustworthy Principles for Digital Resilience

Building AI responsibly is one thing, but embedding trust into every aspect of our AI strategy is another entirely – and that’s what Splunk sets out to do. Kriss Deiglmeier and Hao Yang explain more in this blog.

About Splunk

The world’s leading organizations rely on Splunk, a Cisco company, to continuously strengthen digital resilience with our unified security and observability platform, powered by industry-leading AI.

Our customers trust Splunk’s award-winning security and observability solutions to secure and improve the reliability of their complex digital environments, at any scale.

Learn more about Splunk

Subscribe to our blog

Get the latest articles from Splunk straight to your inbox.

Connect with Splunk on X

Follow @Splunk

Connect with Splunk on Instagram

Follow @Splunk

See Splunk Perspectives blog for execs

Get Perspectives

How To Use Splunk To Monitor Security of Local LLMs (Part I)

What is an LLM?

Foundational & Frontier Models

Foundational Model

Frontier Model

Multimodal LLMs

Risks of LLMs in the Enterprise

Data Security Risks

Accuracy Issues & Business Impact Risks

Operational Risks

Technical Vulnerabilities

Compliance Issues

Available Local LLMs

GPT4ALL

LM Studio

Llama.cpp

Ollama

Simple LLM Use Cases for Cybersecurity

Report and Incident Summarization

Severity Rating & Alert Investigations

Interacting with LLMs

Threats Affecting a LLM

Security Frameworks & Attack Scenarios

OWASP

LLM01:2025 Prompt Injection

LLM02:2025 Sensitive Information Disclosure

LLM03:2025 Supply Chain

LLM04:2025 Data & Model Poisoning

LLM05:2025 Improper Output Handling

LLM06:2025 Excessive Agency

LLM07:2025 System Prompt Leakage

LLM08:2025 Vector and Embedding Weaknesses

LLM09:2025 Misinformation

LLM10:2025 Unbounded Consumption

MITRE Adversarial Landscape for AI Systems (ATLAS)

Related Articles

How to Use Splunk to Monitor Security of Local LLMs (Part II)

Using Splunk to Monitor the Security of MCP Servers

AI at Splunk: Trustworthy Principles for Digital Resilience

About Splunk

Subscribe to our blog

Connect with Splunk on X

Connect with Splunk on Instagram

See Splunk Perspectives blog for execs