Detecting Credit Card Fraud Using SMLE
Organizations lose billions of dollars to fraud each year. For instance, the financial services sector projects losses to reach $40 billion per year in the next 5-7 years unless financial institutions, merchants, and consumers become more diligent about fraud detection and prevention. Splunk delivers integrated enterprise fraud management software that quickly defines behavior patterns and protects enterprise information from malicious actors. In this blog post, we’ll explore an ML-powered solution using the Splunk Machine Learning Environment to detect fraudulent credit card transactions in real time. Using out-of-the-box Splunk capabilities, we’ll walk you through how to ingest and transform log data, train a predictive model using open source algorithms, and predict fraud in real-time against transaction events.
ML-Powered Fraud Detection
Traditionally, organizations have deployed rule-based fraud detection systems. However, with advances in fraudulent methods, these rule-based systems which require manual set-up and longer processing times have led to poor customer experience and weakened the security. An ML-based approach significantly simplifies or even automates the process and addresses the challenges of evolving, more nuanced fraud patterns.
Build A Solution With SMLE
Let’s explore a supervised machine learning approach to building a fraud classification solution using the Splunk Machine Learning Environment (SMLE). We’ll break down the solution into four steps:
- Ingest data from data sources of choice with Splunk’s SPL2 query language
- Extract features and transform data using convenient SPL2 operators
- Use popular and easy to integrate open source ML algorithms to build a classification model
Extract insights from the results and build a dashboard for monitoring and detection.
The summary statistics confirm that the dataset contains labelled examples of fraudulent transactions.
What we’ll do next is identify which of the signals in each transaction record are significant in determining the probability of fraud. To do this, we’ll derive a correlation between pairs of features and plot a heatmap to give us visual cues.
Based on what we see in the heatmap, We see in the heatmap that the Class feature shows a high degree of correlation (lighter colors indicate higher correlation). This way, we can drop a few columns to simplify our dataset and set it up for our training step.
Step 3: Build a Classification Model Using the Historical Data We’ll use Scikit-learn’s RandomForestClassifier algorithm to build a classification model that classifies a transaction as fraudulent or not learning from the supervised dataset we’ve analyzed so far. In the example below, we show a Python snippet that implements this step. The model we build can be published and exported to the broader Splunk ecosystem as we’ll see in step 4.
Step 4: Predict Fraud Finally, we’ll deploy the trained model into an SPL2 pipeline that is capable of reading logs from a variety of data sources (S3, Splunk indexes, etc.) and applying the trained model to predict fraudulent transactions in real time. The snippet below shows the outcome of this step in the form of a table.
Beyond the capabilities we’ve demonstrated so far, Splunk’s AI/ML platform provides capabilities to build dashboards to identify these fraudulent transactions and set up automated workflows to ensure models are kept up to date with recently ingested data. Using SMLE’s MLOps capabilities, the deployed models and their associated operations can be managed and monitored from a single pane of glass.
End-to-End Solution with SMLE
We’ve demonstrated a solution for fraud detection above using SMLE (Splunk ML Environment), a platform for building and deploying ML at scale from within the Splunk ecosystem. By extending the features of Splunk that customers love with a suite of data science and operations capabilities, SMLE allows Splunk users and data scientists to collaborate on building solutions that involve a combination of SPL and ML libraries.
We are excited to offer a beta version of SMLE — sign up now to get started! To learn more about offerings and announcements from Splunk Machine Learning, check out these Splunk Blogs posts.
Wrapping Up
You got to see how to build a simple, real-time fraud detection solution with SPL2 and Scikit-learn in the SMLE platform. With a combination of powerful and easy-to-use SPL2 operators and flexibility of popular programming languages like Python, SMLE allows users to construct entire workflows with a sequence of SPL2 and ML operations. Stay tuned as we explore further use cases and highlight the range of capabilities that SMLE has to offer.
Interested in trying SMLE? Sign up for our beta program!
Curious about more SMLE use cases? Check out our last use case walkthrough on improving DevOps workflows and identifying anomalies on the stream.
This Splunk blog post was co-authored by Vinay Sridhar (main author), Senior Product Manager for Machine Learning, and Mohan Rajagopalan, Senior Director of Product Management for Machine Learning.
----------------------------------------------------
Thanks!
Mohan Rajagopalan
Related Articles

Announcing the General Availability of Splunk POD: Unlock the Power of Your Data with Ease

Introducing the New Workload Dashboard: Enhanced Visibility, Faster Troubleshooting, and Deeper Insights

Leading the Agentic AI Era: The Splunk Platform at Cisco Live APJ

Dashboard Studio: Token Eval and Conditional Panel Visibility

Introducing Resource Metrics: Elevate Your Insights with the New Workload Dashboard

Powering AI Innovation with Splunk: Meet the Cisco Data Fabric

Remote Upgrader for Windows Is Here: Simplifying Fleet-Wide Forwarder Upgrades

Dashboard Studio: Spec-TAB-ular Updates
