With the recent increase in use and popularity of Large Language Models (LLMs), many organizations are still trying to approach the use of this technology. In this blog, we approach the different log sources that can be potentially used to monitor a local LLM. Local LLMs are an alternative way of approaching generative AI without risking sending sensitive proprietary information to the main providers of this technology which are cloud based and presumed to use customer’s data for their own model training. One of the most popular frameworks to deploy LLMs locally is Ollama, which is a lightweight open source framework that can run Large Language Models locally, such as Meta’s LLAMA, or make API calls to ChatGPT and other online LLMs.
An LLM is a computer program designed to process, understand, and generate human-like language. These models are trained on vast amounts of text data, which allows them to learn patterns, relationships, and nuances of language. This training enables the model to:
A. “Understand”: The LLM can comprehend natural language input, such as sentences or conversations.
B. “Generate”: It can produce human-like text based on a given prompt or context.
LLMs are often used for various tasks, including:
While both terms refer to broad, general-purpose language models that can be fine-tuned for various downstream tasks, they have slightly different connotations.
A foundational model is a type of large-scale, pre-trained LLM that serves as a building block for many other models. It's often designed to capture a wide range of linguistic knowledge and can be used as a starting point for specialized models in various domains (e.g., natural language understanding, text classification, sentiment analysis). The idea behind a foundational model is that it can provide a strong foundation for multiple downstream applications, hence the name.
One example of a foundational model is BERT (Bidirectional Encoder Representations from Transformers). BERT is a foundation model developed by Google in 2018. It's a pre-trained language model based on the transformer architecture, which was initially designed for natural language processing (NLP) tasks like language translation and text classification.
A frontier LLM, also known as a "frontier" or "leading-edge" model, is a term popularized by some researchers and AI enthusiasts. It refers to the state-of-the-art, cutting-edge large language models that are pushing the boundaries of what's possible in NLP. These models are often characterized by their massive scale (e.g., billions of parameters), advanced architectures (e.g., transformer-XL, Longformer), and impressive performance on a wide range of tasks.
ChatGPT is considered a "frontier model" in the field of artificial intelligence, as it represents a cutting-edge large language model with advanced capabilities in text generation, translation, and complex language tasks, placing it at the forefront of current AI technology development, particularly due to its ability to generate human-like text and engage in interactive conversations; this designation is often attributed to its underlying model, GPT-4, developed by OpenAI.
While both terms describe broad, general-purpose LLMs, there is a subtle distinction:
A multimodal LLM is a type of artificial intelligence model that can process and understand multiple forms of data, such as text, images, videos, audio, and other sensory inputs. Unlike traditional LLMs that only handle text-based input, multimodal LLMs integrate multiple modalities to create more comprehensive representations of the world.
Some applications of multimodal LLMs include:
Some of the current popular multimodal LLMs include GPT4 Vision, Flamingo, Dall-E, and CLIP.
The use of these models is now very popular and becoming prevalent as AI is taking hold in pretty much every day use from TVs, phones, watches, computers, glasses, cars and soon we will witness the embodiment of AI from home assistants to robots, drones and many other devices. This is why it is important to look at possible risks and threats against this technology.
The use of these technologies comes with a number of risks which many enterprises need to consider and manage as they adopt LLMs.
When using these online LLMs it is important to understand that data leakage of sensitive information is possible. This has been reported recently where an organization found data leakage containing internal company information. This may lead to unauthorized access to confidential information and inadvertent disclosure of trade secrets. It is clear that these companies operate in cloud infrastructure and as such the data is stored in external, publicly exposed and likely multi-tenant cloud environments.
LLMs are known for “Hallucinations”, which consists of generating information that appears plausible yet is incorrect and factually wrong. This may lead to wrong business insights, inconsistent responses when using this technology and misleading analytical results. Without checking the information generated from these technologies organizations are also at risk of over reliance on information that is probably factually wrong and incomplete, lack of output transparency (as a customer you really do not get algorithm transparency) and this may lead to biased decision making and potential damage to brand reputation.
Some of the things that have been noted and experienced as these technologies are being used extensively by many individuals and organizations is the degradation of service quality, uncertainty of operational costs and the loss of data and control once input into online LLM platforms.
As any other application running on cloud environments, these technologies are affected by supply chain vulnerabilities, web application and infrastructure vulnerabilities. There are also specific threats to these technologies which will be covered later in this blog. A recent campaign where a malicious extension associated with a popular LLM tool that creates images led to a compromise of a Fortune 500 organization.
This type of risks are clear as the storage and leakage of confidential, sensitive information presents risks of violation of data protection laws, breach of compliance standards, copyright infringement and legal penalties for misuse.
As we have outlined some clear risks of the use of Online LLMs, there are also a number of organizations and individuals that are approaching the adoption of LLMs by using locally hosted frameworks that allow to run models without interacting with online platforms, eliminating some of the risks associated with Online LLMs.
There are several ways of deploying LLMs locally; two popular open source frameworks that can provide local LLM functionality are GPT4ALL and Ollama. With these frameworks you can deploy popular LLM models that can run from laptops, computers and servers and then run them locally. In this blog we will focus on two very popular LLM models LLAMA and DEEPSEEK.
Includes native chat applications for Windows, MacOs and Linux and features LocalDocs plugin for document interaction. Allows model training and features REST API endpoints.
Supports a wide range of models, including Vicuna, Alpaca, LLaMa, Falcon, Starcoder, and GPT-2. Offers robust completion and chat endpoints, with acceleration via CUDA, OpenCL, cuBLAS, and Metal for high performance.
Known for extensive model compatibility and performance, making it suitable for users who want flexibility and speed.
https://github.com/ggml-org/llama.cpp
An open-source project focused on efficient inference of LLMs on local hardware. Runs on MacOS, Windows, and Linux, and is popular for its lightweight footprint and community support.
Creates an isolated environment for LLM deployment, can be used in Desktop environments and works best with dedicated GPU's. Supports multiple model versions and operates completely offline. Supports Llama 3.2+ for NLP tasks, Mistral for code generation, allows model creation capabilities.
It is important to note that Llama 3+ models are considered the closest open source models to an actual FRONTIER model (ChatGPT, Claude). In this blog we will focus on the Ollama framework as it has become one of the most used frameworks to adopt LLMs locally.
As seen in the above screenshot we can feed a PDF (in this case Mandiant APT1 report pdf) and ask llama3.1 to summarize it. This functionality can be used to effectively summarize and share incident reports or create actual reports from data like many companies are already doing with incident reports, tickets and customer updates.
Another use for a LLM in security is to evaluate security events to determine threat levels, assign ratings, and prioritize incident response activities. This functionality is currently being applied by many organizations in detection of threats such as phishing, or malicious code.
The above are just two examples of some of the current ongoing research and uses of these new technologies. There are many other ongoing initiatives including generative AI use for vulnerability exploitation and defense.
The simplest way to interact with a LLM is by inputting language into it. This inputting can come from a GUI, keyboard, phone, camera, etc. The name of this structured input into a LLM is called a “prompt." A prompt is usually composed of the following:
Prompts can be structured in different ways but there are some known types of prompts:
Zero-shot: Asking a question without providing details, instructions or examples.
Few-shot: Include a few examples to guide model
Chain of Thought (CoT): This type of prompt guides language models to break down their reasoning into step by step intermediate steps before providing a final answer.
Input: How many apples are left if you start with 7 apples and give 2 to your friend?
Answer: Let’s solve this step by step
Answer: the answer is 5 apples
Input: How many apples are left if you start with 12 apples and give 4 to your friend?
There are many other types or techniques of prompting but in general these are some of the most common ones.
It is important to note that as prompts are the main way to communicate with the model they are of course the main medium of attack as prompt interfaces are input fields that can be used to insert malicious code or even craft prompts that may allow attackers to take advantage of the model.
Now that we have reviewed the basics of LLMs, let’s review some of the items that relate to threats of a LLM. The best way to understand these threats is by looking at the current efforts to outline the risks and threats to these technologies. Some of the most relevant efforts are OWASP Top 10 for LLMs and Mitre ATLAS.
With many companies using LLMs online and locally there are now some initiatives that are focused on LLM security; the most relevant at the moment are OWASP and MITRE ATLAS. Splunk SURGe has previously covered OWASP Top 10 LLMs.
https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/
A community driven effort to address and highlight security issues specific to AI Applications. The top 10 security issues detail specific threats affecting LLMs.
A prompt that alters the LLM behavior or output in unintended ways.
Failure to sanitize or prevent sensitive information when outputting responses.
Biased outputs, security breaches or system failures due to compromised integrity of supply chain.
Data in pre-training, fine-tuning or embedded is manipulated to introduce vulnerabilities, backdoors or biases.
Insufficient validation, sanitization and handling of outputs before interacting with other components or systems. (XSS, CSRF, SSRF)
Excessive Agency refers to the vulnerability that arises when an LLM, despite being designed to assist or inform, takes on a level of authority or control that is not intended. This can happen in various ways, but ultimately results in the model performing actions that are unexpected, ambiguous, or even malicious.
System prompts or instructions used to steer behavior of the model can also contain sensitive information that was not intended to be discovered.
This vulnerability is related to the use of Retrieval Augmented Generation (RAG). Malicious manipulation of vectors and embeddings that can be generated, stored or retrieved injecting harmful content, manipulating model output or access sensitive information.
LLM produces false or misleading information that appears credible.
Risk of depleting resources, disruption of service, or theft of intellectual property via the execution of excessive uncontrolled inferences.
A framework that contains a knowledge base of adversary tactics and techniques based on real-world attack observations and realistic demonstrations from AIred teams and security groups.
This framework as of January 2025, breaks down 14 tactics, 91 techniques, and 26 mitigation strategies. Some of the key adversarial techniques contained in this framework include:
Access and Control Techniques
Model Manipulation Techniques
Information Gathering Techniques
This framework contains defensive strategies to protect against AI System attacks. One particular area which we are addressing in this blog is related to detection and monitoring which includes:
In the next part of How to Use Splunk to Monitor Security of Local LLMs, we’ll look at log data with Splunk to defend against threats directed to local LLMs. We will also break these threats in three categories:
The world’s leading organizations rely on Splunk, a Cisco company, to continuously strengthen digital resilience with our unified security and observability platform, powered by industry-leading AI.
Our customers trust Splunk’s award-winning security and observability solutions to secure and improve the reliability of their complex digital environments, at any scale.