From Data Pipes to AI Intelligence: The Evolution of Splunk-AWS Integrations
Artificial Intelligence | Sahil Gupta, Greg Ainslie-Malik
Key takeaways
- Splunk and AWS have evolved from basic data collection to fully integrated, AI-powered operations that bring advanced analytics into everyday workflows.
- New capabilities in Splunk’s AI Toolkit allow teams to use powerful AWS AI models directly and get faster, more relevant insights with real-world context.
- This integration helps organizations work more efficiently by speeding up analysis, reducing response times, and scaling AI without added complexity.
This blog was developed with contributions from Alan Peaty at Amazon Web Services.
I'm a Splunk specialist working daily with observability and security platforms, often bridging AWS cloud infrastructure with enterprise analytics. Splunk's AI Toolkit 5.6.4 introduced direct Amazon SageMaker inference and Amazon Bedrock Knowledge Base Retrieval-Augmented Generation (RAG) integration—capabilities that cap a decade-long evolution from basic log ingestion to AI-native operations. This blog traces that journey, highlights the breakthroughs, and shares practical implementation insights for teams like mine.
Splunk-AWS Integration Evolution Timeline
The Foundation: Data Ingestion Era (2012–2020)
The Splunk-AWS partnership began in 2012 when Splunk joined the AWS Partner Network (APN). By 2013, Splunk Cloud launched natively on AWS infrastructure, establishing the foundation for hybrid cloud visibility. The Splunk Add-on for AWS enabled pull-based collection from AWS APIs, and push-based streaming arrived in December 2017 when Amazon Data Firehose added Splunk's HTTP Event Collector (HEC) as a delivery destination. Together these enabled comprehensive ingestion of Amazon CloudTrail audit logs, Amazon CloudWatch metrics, Amazon VPC Flow Logs, and AWS Config data—but machine learning stayed siloed within Splunk's boundaries.
Key Limitation: No native ML interoperability. Data scientists built models in SageMaker but had no straightforward path to operationalize them in Splunk environments. Splunk handled analytics; AWS managed infrastructure. They remained separate domains.
ML Expansion: Bridging the Gap (2018–2025)
The Machine Learning Toolkit (MLTK) evolved significantly during this phase. Splunk introduced the MLPClassifier neural network algorithm in MLTK version 3.4.0, expanding the toolkit's deep learning capabilities. MLTK 4.x versions followed, introducing the Smart Forecasting Assistant and expanding algorithm support.
The critical breakthrough came in February 2023 with MLTK 5.4.0, which enabled upload and inference of externally trained ONNX models—an open source standard for ML model interchange supported by TensorFlow, PyTorch, Keras, and MATLAB. Data scientists could now train models in SageMaker, export them as ONNX files, and manually upload them to Splunk for inference. This was progress, but the workflow remained cumbersome: opset compatibility issues, version management and manual upload overhead, and inference execution that still bottlenecked on Splunk search heads.
AWS integrations expanded in parallel: Amazon SageMaker Canvas launched in November 2021 as a no-code ML development service, and by 2024 AWS had published best practices for using SageMaker Canvas with Splunk data to build and deploy models. Early Bedrock integration emerged before formal documentation, and MLTK 5.6.0 (May 2025) provided the first official Amazon Bedrock support—enabling foundation model calls for generative AI directly within SPL searches.
Persistent Challenges: Complex model inference overloaded Splunk search heads. Bedrock outputs provided generic responses lacking organizational context. Model training pipelines remained ML-scientist-centric, not operationalized for production security and ops workflows.
The AI-Native Leap: Splunk AI Toolkit 5.6.4 Breakthroughs (November 2025)
Released in November 2025, Splunk AI Toolkit 5.6.4 fundamentally reshapes the architecture, turning Splunk from an ML consumer into an inference orchestrator.
Product Evolution Note: Starting with version 5.6.3 (September 2025), Splunk formally renamed the Machine Learning Toolkit (MLTK) to the Splunk AI Toolkit, reflecting the platform's evolution from traditional ML algorithms to comprehensive AI orchestration including generative AI, RAG, and cloud-native inference capabilities.
SageMaker Inference Endpoints: Unlimited Scale for Custom Models
For the first time, organizations can invoke SageMaker-hosted models directly via SPL without model export or manual upload. The apply command now supports runtime=sagemaker, enabling real-time inference on arbitrarily complex models (TensorFlow, PyTorch, proprietary algorithms) while offloading ML inference compute to AWS managed infrastructure.
Technical Setup:
1. Deploy the custom model to a SageMaker endpoint.
2. Create an IAM role with sagemaker:InvokeEndpoint and sts:AssumeRole permissions.
3. Register the endpoint in AI Toolkit > Models > +SageMaker tab.
4. Configure feature mappings (JSON/CSV), batch size (1–10,000), and the OpenAPI schema.
5. Invoke via SPL:

| apply sg_model_name runtime=sagemaker features="feature1,feature2,feature3"

For step-by-step AWS instructions, visit the AWS GitHub repo.
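To make the mechanics concrete, here is a minimal Python sketch of what the apply command does behind the scenes: serialize the mapped feature fields into the endpoint's expected content type, call SageMaker's InvokeEndpoint API, and join the predictions back onto the results. The payload and response shapes are assumptions for illustration—match them to your own endpoint's inference schema.

```python
import json

def build_csv_payload(rows, features):
    """Serialize events into the CSV body a SageMaker endpoint expects."""
    return "\n".join(",".join(str(row[f]) for f in features) for row in rows)

def parse_predictions(body):
    """Parse a JSON inference response of the assumed form {"predictions": [...]}."""
    return json.loads(body)["predictions"]

# With boto3, the call itself would resemble the following (endpoint name
# "sg_model_name" mirrors the SPL example above and is a placeholder):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   resp = runtime.invoke_endpoint(
#       EndpointName="sg_model_name",
#       ContentType="text/csv",
#       Body=build_csv_payload(rows, ["feature1", "feature2", "feature3"]),
#   )
#   preds = parse_predictions(resp["Body"].read())

rows = [{"feature1": 0.1, "feature2": 4, "feature3": 7.5}]
payload = build_csv_payload(rows, ["feature1", "feature2", "feature3"])
```

The point of the split is that the compute-heavy inference runs on AWS managed infrastructure; Splunk only marshals features out and predictions back in.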
Impact: Predictions flow directly into dashboards and alerts. GPU-intensive inference can scale on demand without search head provisioning. Organizations achieve 70% faster model operationalization cycles.
Bedrock Knowledge Base RAG: Context-Aware Intelligence
Bedrock foundation models gain organizational context through Retrieval-Augmented Generation (RAG) connected to Bedrock Knowledge Bases. The ai command's new kb_id parameter transforms generic LLM responses into precise, environment-specific guidance.
Technical Requirements:
- Splunk AI Toolkit 5.6.4 with the Python for Scientific Computing (PSC) add-on, version 4.2.4 or higher
- Bedrock Knowledge Base with runbooks, incident post-mortems, and internal procedures ingested
- apply_ai_commander capability assigned to the user role
- IAM credentials with the AmazonBedrockLimitedAccess policy plus Knowledge Base read access (bedrock:Retrieve permissions)
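As a sketch of the last requirement, the Knowledge Base read access can be expressed as a minimal IAM policy like the one below (shown as a Python dict; the region, account ID, and Knowledge Base ARN are placeholders to scope to your own resources):

```python
import json

# Minimal IAM policy sketch granting Knowledge Base read access.
# Resource ARN is a placeholder; replace with your Knowledge Base ARN.
kb_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["bedrock:Retrieve"],
            "Resource": "arn:aws:bedrock:us-east-1:123456789012:knowledge-base/FZITBEFHT1",
        }
    ],
}
print(json.dumps(kb_read_policy, indent=2))
```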
SPL Example:
| inputlookup alert_entries.csv
| table timestamp source message
| ai prompt="find and summarize resolution specific to the Alert: {message}"
kb_id=FZITBEFHT1
Without kb_id: "Restart the pod and check logs."
With kb_id: "Log into bastion (IP: 10.x.x.x), execute /usr/bin/deploy-rollback.sh, notify Slack #incidents, create Jira ticket INFRA-1234."
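Under the hood, the RAG step sketched above amounts to: substitute the alert's {message} field into the prompt, query the Knowledge Base for matching runbook passages, and let the foundation model answer grounded in those passages. A rough Python approximation follows; the response shape mirrors Bedrock's RetrieveAndGenerate output but is simplified, and the model ARN is a placeholder.

```python
import json

def render_prompt(template, event):
    """Substitute event fields like {message} into the prompt template."""
    return template.format(**event)

def extract_answer(response_json):
    """Pull the generated text out of a RetrieveAndGenerate-style response."""
    return json.loads(response_json)["output"]["text"]

# With boto3, the grounded call would resemble:
#   import boto3
#   client = boto3.client("bedrock-agent-runtime")
#   resp = client.retrieve_and_generate(
#       input={"text": prompt},
#       retrieveAndGenerateConfiguration={
#           "type": "KNOWLEDGE_BASE",
#           "knowledgeBaseConfiguration": {
#               "knowledgeBaseId": "FZITBEFHT1",
#               "modelArn": "<your-model-arn>",  # placeholder
#           },
#       },
#   )

prompt = render_prompt(
    "find and summarize resolution specific to the Alert: {message}",
    {"message": "pod CrashLoopBackOff in payments namespace"},
)
```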
Impact: Reduces Mean Time To Resolution (MTTR) by embedding institutional knowledge directly into incident response workflows.
Real-World Impact for Enterprise Teams
Security Operations: RAG-powered intelligence transforms Splunk alerts into auto-resolved incidents. A detection for "lateral movement" triggers the ai command with your environment's Knowledge Base, returning: bastion IP, approval workflow links, escalation Slack channel, and incident template—all embedded in an automated response.
Operations Engineering: SageMaker endpoints enable real-time anomaly detection at scale. Stream CloudWatch metrics through custom SageMaker neural network classifiers; predictions feed directly into Amazon EventBridge auto-remediation workflows. No search head GPU provisioning required.
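The hand-off from prediction to auto-remediation can be sketched as follows. The event source and detail-type names are hypothetical; wire them to whatever your EventBridge remediation rules actually match on.

```python
import json

def build_remediation_event(metric, score, threshold=0.9):
    """Return an EventBridge entry when an anomaly score crosses the threshold."""
    if score < threshold:
        return None
    return {
        "Source": "custom.splunk.anomaly",   # assumed source name
        "DetailType": "AnomalyDetected",     # assumed detail-type
        "Detail": json.dumps({"metric": metric, "score": score}),
    }

# With boto3, high-scoring predictions would be published as:
#   import boto3
#   events = boto3.client("events")
#   entry = build_remediation_event("cpu_utilization", 0.97)
#   if entry:
#       events.put_events(Entries=[entry])
```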
Cloud Economics: Pay-per-inference via SageMaker vs. fixed Splunk infrastructure overhead. During peak fraud detection periods (Black Friday, cyber events), SageMaker scales elastically while Splunk search heads remain optimized for querying and aggregation—aligning costs with actual demand.
Conclusion
The Splunk-AWS story—13 years from the APN partnership in 2012 to AI-native operations in 2025—demonstrates how platforms evolve together. What began as operational visibility became ML-enabled analytics and now approaches true intelligence orchestration. Organizations no longer choose between Splunk's data platform and AWS's ML infrastructure; they integrate them seamlessly, offloading inference to AWS while maintaining Splunk as their operational command center.
The story is still being written. What's next?