At the end of last year, I wrote about using Splunk to address the Know Your Customer (KYC) use case, a regulation that applies to financial services institutions in many countries. The last part of the regulation requires continuous monitoring of your customers' interactions and transactions.
In any bank, there are many types of transactions covering areas such as core banking, ATM, wire transfers, credit card use, and payments. Every application involved in these activities produces its own time series log data that is used for troubleshooting, security tracking, and analytics. Let's revisit the example I presented last year. Suppose we are only monitoring a core banking feature for deposits and withdrawals for each customer. The simplest possible representation of this is the example table from the previous KYC blog.
What I suggested last year is to use the Splunk stats command to find the average amount per account ID and then flag any account whose transactions are more than N standard deviations from that account's own average. For instance, if account ID 123 usually transacts around 50 and then suddenly transacts 10,000, this outlier would easily be found. We could do this exercise on paper for one account, but with a million accounts, monitoring each account separately requires continuous, automated monitoring. We can then collect the outliers per account in a risk index and score them accordingly for further analysis.
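As a sketch, that per-account outlier search could look like the following SPL, where the sourcetype `core_banking` and the fields `account_id` and `amount` are hypothetical names standing in for your own data:

```
sourcetype=core_banking
| eventstats avg(amount) AS avg_amount, stdev(amount) AS stdev_amount BY account_id
| eval num_stdevs=if(stdev_amount > 0, abs(amount - avg_amount) / stdev_amount, 0)
| where num_stdevs > 3
```

Here eventstats keeps every raw transaction alongside its account's average and standard deviation, so the where clause can flag transactions more than three standard deviations from that account's own norm rather than from a global average.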
Easily Operationalizing the Approach
Everything I suggested above still applies, but we recognize that not everyone knows Splunk's Search Processing Language (SPL) or how to effectively collect this generated data per entity into a Splunk index.
Fortunately, Splunkers Rupert Truman and Josh Cowling created a free Splunkbase application called the Splunk App for Behavior Profiling, which can automate the KYC use case as long as we have the data for each functional banking domain. To continue the discussion with our example, let’s use their app, which is web driven. The only SPL I’ll use is to search for all events for a given sourcetype. My data is fictitious and several years old, but it still illustrates the point.
On the web page, after searching for the data within a time range, we pick a field to group by (in this case, the unique customer name) and the field to monitor for outliers (here, the amount field). The page automatically shows sample results for the search and the fields in question before we continue.
Next, we pick a statistical function for the amount field (average) and split it by each unique customer. We can also compute the average over time buckets, such as every hour or day.
Finally, we save this as a rule to collect the data to find the average amount per customer over a given time period as a scheduled search.
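The app builds this scheduled search for you, but conceptually the rule amounts to something like the sketch below, where the `customer` field, the `core_banking` sourcetype, and the `kyc_profile_summary` index are hypothetical names:

```
sourcetype=core_banking
| bin _time span=1d
| stats avg(amount) AS avg_amount BY customer, _time
| collect index=kyc_profile_summary
```

The bin command buckets events by day, stats computes the per-customer daily average, and collect writes the results into a summary index each time the schedule fires.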
After the data is automatically collected into a summary index, we can use the web interface workflow to score the indicators for standard deviation outliers; the scores go to a scoring index so the entities can be stack ranked. This automation can be repeated for each functional domain in the FSI world, such as ATM, credit cards, payments, and wire transfers, which makes continuous monitoring an easier task. The app also provides screens to drill down and investigate any particular entity, which in our case is the customer. There is even a review section to mark whether an entity's risk scores have been reviewed, which is useful for compliance checks.
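To make the scoring step concrete, here is a hedged SPL sketch of what such an indicator boils down to: read the summary data back, measure how far each entity sits from its own baseline, and persist the ranked scores (the `kyc_profile_summary` and `kyc_risk_scores` index names and the `avg_amount` field are hypothetical; the app generates its own equivalents):

```
index=kyc_profile_summary
| eventstats avg(avg_amount) AS baseline, stdev(avg_amount) AS spread BY customer
| eval risk_score=if(spread > 0, abs(avg_amount - baseline) / spread, 0)
| where risk_score > 3
| sort - risk_score
| collect index=kyc_risk_scores
```

The sort gives the stack ranking, and writing to a dedicated scoring index is what lets the drill-down and review screens work from one place.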
This part of KYC is set up and ready to go thanks to this app.
Rupert and Josh's app also has screens for using machine learning (e.g., a probability density function) to find outliers across all entities, without requiring in-depth data science knowledge. You may ask: why not use machine learning to find anomalies within each customer's own set of transactions? This is a matter of practicality, because machine learning typically builds a model from a dataset and then applies it to future data. Building a million models for a million customers is probably overkill. A more maintainable approach is to cluster customers into segments by a feature such as transaction amount. Some customers will cluster around average amounts of 50, others around 500, and some around 500,000 as their typical amounts. One can then build a model per cluster and find outliers per cluster rather than per individual customer. This scales better and is an order of magnitude more manageable.
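As a sketch of the clustering idea, with the Splunk Machine Learning Toolkit installed one could segment customers by their typical amount and then model each segment separately; the `kyc_profile_summary` index and the field names here are hypothetical:

```
index=kyc_profile_summary
| stats avg(avg_amount) AS typical_amount BY customer
| fit KMeans typical_amount k=3
| stats count, avg(typical_amount) AS segment_avg BY cluster
```

The fit KMeans command assigns each customer a cluster field; a density-based model such as MLTK's DensityFunction could then be fit per cluster instead of per customer, which is what keeps the number of models small.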
The KYC use case is an important banking regulation, and continuous monitoring is its most vital part. What we discussed here is an easier way to operationalize monitoring each customer's transactions, and hence their behavior, for outliers. The Splunk App for Behavior Profiling can be used for a variety of FSI use cases where one is looking for anomalies across a set of entities or within each entity's own history.