Recently we released the Splunk App for Data Science and Deep Learning (DSDL) v5.2.0. This update introduced new features for integrating large language models (LLMs) and retrieval-augmented generation (RAG). With DSDL v5.2.0, users can easily perform LLM prompts, vector searches, RAG, and function calling directly from the app's dashboards. The features come with predefined scripts within the DSDL container, allowing organizations to quickly start using LLMs with their own knowledge data and indexed data in Splunk.
However, if analysts want to customize workflows to better suit their needs—such as asking an LLM to select a vector collection and then perform RAG based on that choice—they'll need to modify the Python scripts in the container or create complex SPL commands with multiple Fit commands. This approach can be complex and lacks a simple, visual interface for end users to design and test custom workflows, which can limit the practical application of the LLM-RAG features.
Fortunately, Splunk SOAR stands out as a key player in the field of automation and orchestration, providing an intuitive user interface for designing and testing workflows. In this blog, we will demonstrate how to leverage SOAR as a platform for designing and executing agentic AI workflows using the functionalities offered by DSDL.
The architecture of our proposal is illustrated in the figure below, where agentic workflows are created as SOAR playbooks using basic building blocks called utilities, such as LLM Prompt, Vector Search, and Function Calling. These utilities are powered by custom functions within SOAR, which make FastAPI calls to the DSDL container to perform the respective operations.
In addition to the GenAI utilities, existing SOAR tools can also be integrated into the playbooks, enabling actions based on the LLM's observations and decisions. This integration unlocks vast potential for developing sophisticated agentic AI workflows.
In the rest of this blog, we will delve into the details of creating custom functions and provide a playbook example.
Custom functions on SOAR allow users to define Python functions that run as utilities, the basic building blocks of a playbook. In this crossover, we've implemented four custom functions for agentic AI workflows: LLM Prompt, Vector Search, Function Calling, and LLM Decision Making.
The first three functionalities are supported in DSDL 5.2.0, while for the LLM Decision Making function we added a new script to the DSDL container. (This script will be built in and available in the next DSDL release.) To use these functionalities in SOAR, we implement each custom function as a FastAPI call to the DSDL container that invokes the corresponding script. The input parameters of each custom function include the parameters required by the DSDL script, as well as the endpoint and API token of the DSDL container's FastAPI service.
Below is example code for the LLM Prompt custom function.
```python
def llm_prompt(query=None, model_name=None, api_endpoint=None, api_token=None, llm_service=None, system_prompt=None, **kwargs):
    ############################ Custom Code Goes Below This Line #################################
    import json
    import phantom.rules as phantom
    import requests
    import csv
    from io import StringIO

    url = f"{api_endpoint}/fit"
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json"
    }
    if system_prompt:
        system_prompt = system_prompt.replace('"', '').replace('\n', '')
    else:
        system_prompt = "You are an expert Q&A system that is trusted around the world. Always answer the query using the provided context information and reasoning as detailed as possible"
    data = {
        "data": f"text\n\"{system_prompt}\"",
        "meta": {
            "options": {
                "model_name": "llm_rag_ollama_text_processing",
                "params": {
                    "algo": "llm_rag_ollama_text_processing",
                    "llm_service": llm_service,
                    "model_name": model_name,
                    "prompt": query
                }
            }
        }
    }
    # Send POST request
    outputs = requests.post(url, headers=headers, json=data, verify=False).json()
    df_data = outputs['results']
    df_data = StringIO(df_data)
    csv_reader = csv.DictReader(df_data)
    for row in csv_reader:
        outputs["llm_response"] = row['Result']
    assert json.dumps(outputs)
    return outputs
```
The request is sent to the FastAPI endpoint of the DSDL container. The payload includes the name of the script under the key "algo" along with other input parameters. Once the result is returned, it is parsed and assigned to the output variable "llm_response" of the custom function.
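The parsing step can be isolated into a small helper. The sketch below assumes the same response shape as in the custom function shown earlier: a CSV string under the `results` key with a `Result` column, of which the last row is kept.

```python
import csv
from io import StringIO

def extract_llm_response(outputs):
    """Pull the 'Result' column out of the CSV payload that the DSDL
    /fit endpoint returns under the 'results' key, keeping the last row."""
    response = None
    for row in csv.DictReader(StringIO(outputs["results"])):
        response = row["Result"]
    return response
```

Separating the parsing logic this way makes the custom function easier to unit-test outside of SOAR before pasting it into a playbook.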
The other custom functions are implemented in a similar fashion. All of the custom functions are available in this GitHub repo. The required input parameters, as well as the algorithm names, can be found in the DSDL 5.2.0 documentation.
NOTE: The parameter "llm_service" used in this example is newly introduced and planned for the next DSDL release. For the current parameter requirements, please refer to the Fit command in the DSDL 5.2.0 documentation.
Based on the custom functions, we have created an example playbook of an agentic AI workflow with multiple decision points, illustrated in the figure below.
This playbook processes natural language queries and routes them to different agents based on the use case of the request.
It covers three scenarios: (1) Splunk queries that require real-time data from the environment, (2) Splunk queries that require static product knowledge, and (3) queries about the Buttercup store.
The first LLM Decision Making block of the playbook determines whether the query is related to Splunk (Case 1 or 2) or Buttercup store (Case 3). For Splunk-related queries, the workflow routes them to the left side, where a second LLM Decision Making block identifies whether the query requires real-time data (Case 1) or static knowledge data (Case 2).
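The two decision points can be pictured as a small routing function. Below is a minimal sketch of that logic; `decide` is a hypothetical stand-in for an LLM Decision Making call that returns one of the offered options, and the branch names are illustrative.

```python
def route_query(query, decide):
    """Route a query through the playbook's two decision points.

    `decide(question, options)` stands in for an LLM Decision Making
    call and is expected to return one of the offered options.
    """
    topic = decide(query, ["splunk", "buttercup"])
    if topic == "buttercup":
        return "buttercup_agent"        # Case 3
    context = decide(query, ["realtime_data", "knowledge_data"])
    if context == "realtime_data":
        return "function_calling"       # Case 1
    return "splunk_knowledge_rag"       # Case 2
```

In the actual playbook, each `decide` call maps to an LLM Decision Making block, and each returned branch name corresponds to the downstream blocks described in the three cases below.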
Case 1:
If real-time data is needed, the Function Calling block is executed, and the LLM uses Splunk search tools to gather the necessary context. The LLM then generates a final answer based on the outputs from these tools.
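Conceptually, the Function Calling block maps each tool name requested by the LLM to an executable, runs it, and feeds the output back for the final answer. Here is a minimal sketch; the `TOOLS` registry and its stubbed `list_indexes` entry are hypothetical placeholders for the real Splunk search tools.

```python
# Hypothetical tool registry; list_indexes stands in for the real
# Splunk search tool that the Function Calling block exposes to the LLM.
TOOLS = {
    "list_indexes": lambda: ["main", "_internal", "_audit"],
}

def run_tool_call(tool_call):
    """Execute one tool call requested by the LLM so its output can be
    fed back as context for generating the final answer."""
    name = tool_call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**tool_call.get("arguments", {}))
```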
Case 2:
When knowledge about Splunk is required, the LLM selects the appropriate vector collection from the following options: splunk_platform_knowledge, splunk_enterprise_security_knowledge and splunk_itsi_knowledge. Each collection contains product-specific knowledge, and a vector search is conducted based on the LLM's chosen collection. The query and vector search results are then sent to the LLM to create a summarized answer.
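One way to phrase that choice as an LLM Decision Making input is a short selection prompt listing the candidate collections. This sketch shows the idea; the exact wording is illustrative, not the DSDL script's.

```python
SPLUNK_COLLECTIONS = [
    "splunk_platform_knowledge",
    "splunk_enterprise_security_knowledge",
    "splunk_itsi_knowledge",
]

def build_collection_prompt(query, collections):
    """Build a decision prompt asking the LLM to name exactly one
    vector collection for the given query."""
    return (
        "Choose the single most relevant vector collection for the query below.\n"
        f"Collections: {', '.join(collections)}\n"
        f"Query: {query}\n"
        "Answer with the collection name only."
    )
```

The collection name returned by the LLM is then passed to the Vector Search custom function, and the search results go back to the LLM for the summarized answer.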
Case 3:
If the first decision block routes the query to the Buttercup agent, the LLM selects the most relevant vector collection from buttercup_dev_knowledge and buttercup_support_tickets. A vector search is performed, and the LLM answers the query based on the search results.
In this playbook example, the output of each step is recorded in notes associated with the input event on SOAR. Next, let’s explore three examples of how this workflow operates.
Example 1:
In the first example, the input query is: "What indexes are there in my Splunk?"
The results of the workflow execution are illustrated in the figure below. The left side displays the final answer from the LLM, while the right side outlines the steps of the workflow execution.
Based on LLM decisions, the query was routed to the Function Calling block. The list_indexes() tool was executed, and the LLM generated the final answer based on the output from this tool.
Example 2:
In the second example, the input query is: "What CLI commands in Splunk platform show service ports?"
Based on LLM decisions, the query was routed to the Splunk agent and then the knowledge_data context type. The splunk_platform_knowledge collection was then selected and searched against. The LLM generated the final answer based on the output from the vector search.
Example 3:
In the third example, the input query is: "Has there been payment issues and how were they resolved?"
Based on LLM decisions, the query was routed to the Buttercup agent and then the buttercup_support_tickets collection was selected and searched against. The LLM generated the final answer based on the output from the vector search.
The above examples demonstrate how this agentic AI workflow, created on SOAR, handles different scenarios based on natural language input and orchestrates various tools to gather context for accurate LLM generation. With SOAR, creating the playbook was simple and intuitive, and the playbook could be tested at each step and exported for sharing.
In this blog, we explored how SOAR can serve as a platform for designing and executing agentic AI workflows using the functionalities provided by DSDL. Additionally, DSDL can enhance SOAR playbooks by improving decision-making and data enrichment processes. For instance, in security use cases, LLM agents can gather the latest threat intelligence, assess the urgency of incidents, and use SOAR tools to take action. The integration of DSDL and SOAR unlocks a wide range of possibilities for using AI capabilities within your SOAR workflows.
Acknowledgement:
I would like to extend my gratitude to my collaborators, Mitchell Chan and Philipp Drieger, for their contributions to project development and use case discoveries. Special thanks to Hidekazu Fujimori for generously sharing his expertise in SOAR.