In a world of increasing regulation, liquidity concerns, and uncertainty in the banking business caused by the economic climate or unforced errors, it is more important than ever that risk management departments, auditors, and compliance departments have timely access to the reports and data that drive their decisions. Saying that you have enough data points is like saying you have enough security. Having enough data is not the answer; the goal is always to use what is available efficiently and in a timely manner to become more compliant and reduce risk.
In the past, we wrote about ways to mitigate reputational risks from a security perspective, which was, hopefully, helpful. Today, I would like to elevate that approach by suggesting ways to incorporate risk scores or assessments into any report, giving decision makers one more data point for making correct decisions. Although the examples and subject matter focus on the financial services industry, the concepts, as always, transcend industry. Today’s blog entry will illustrate the concept of monitoring any type of business transaction, assigning a risk score to the aggregate of the transactions, and showing the results to the relevant departments.
Before I get to my point, it may be good to start with a simple example to illustrate the concept. Suppose you are running a payment service and want to monitor the time series data generated every time a customer makes a payment. Moreover, you want to monitor the top customers in terms of the number of payments they make in a week. This sounds simple enough, since a stats count by customer command in Splunk Processing Language (SPL) can easily do it.
At first glance, this seems mundane, as it simply shows a list of the customers who made the most payments. Now suppose a control is in place stating that the top 10 customers on this report are each expected to make at least six payments per week as a sign that the business is healthy; anything less signals a danger to the business. In our example, the indicator would register a big red risk score next to the report.
Now the report becomes more meaningful to decision makers, as it not only has a status indicator next to it but also shows a number or percentage that indicates the degree of the issue. In practice, Splunk may also send an alert about the risk condition to interested parties, leading to a more proactive response to the business issue.
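To make the control concrete, here is a minimal sketch in Python rather than SPL. It is not how Splunk computes anything; the customer names, the function name top_customer_risk, and the choice to express risk as the share of top customers below the control are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical payment events: one customer ID per payment made this week.
payments = ["acme"] * 9 + ["globex"] * 7 + ["initech"] * 4 + ["umbrella"] * 2

def top_customer_risk(events, top_n=10, min_payments=6):
    """Return the top customers by payment count and a risk percentage.

    Risk here is the share of top customers that made fewer than
    `min_payments` payments (an assumed definition, not a Splunk one).
    """
    top = Counter(events).most_common(top_n)
    if not top:
        # No payments at all: treated as maximum risk (an assumption).
        return top, 100.0
    below = sum(1 for _, count in top if count < min_payments)
    return top, 100.0 * below / len(top)

top, risk_pct = top_customer_risk(payments)
print(top)
print(f"risk score: {risk_pct:.0f}%")
```

With this sample data, two of the four customers fall below six payments, so the report would carry a 50% risk score next to it.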
Payment Response Times
Sometimes the risk condition above is caused by the economic climate, but at other times it results from operational issues. Suppose the payment system’s response time fluctuates wildly from day to day, or even from hour to hour. This may drive away potential customers or lead existing ones to delay their payments. With the application response time data in hand, one can simply issue a Splunk timechart avg(ResponseTime) as ResponseTime for any time period and use the timewrap command to compare that period with the same period from the previous day. This may show the variation and lead to a root cause for the risk issue. Clearly, in the picture below, there were unusual spikes the day before, around 3:30 PM in the four-hour window. We could calculate risk here using deviation from the mean, but simply seeing the dashboard already tells us much.
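For readers who want the deviation-from-the-mean idea spelled out, the following is a small Python sketch, not SPL. The hourly averages, the function name spike_hours, and the threshold of two standard deviations are all hypothetical.

```python
from statistics import mean, pstdev

def spike_hours(avg_by_hour, threshold=2.0):
    """Return the hours whose average response time sits more than
    `threshold` population standard deviations above the series mean."""
    values = list(avg_by_hour.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # perfectly flat series: nothing to flag
    return [hour for hour, v in avg_by_hour.items() if (v - mu) / sigma > threshold]

# Hypothetical hourly average response times in milliseconds.
yesterday = {"12:00": 210, "13:00": 205, "14:00": 215, "15:00": 900,
             "16:00": 220, "17:00": 208, "18:00": 212, "19:00": 206}
today = {"12:00": 210, "13:00": 205, "14:00": 215, "15:00": 218,
         "16:00": 220, "17:00": 208, "18:00": 212, "19:00": 206}

print("yesterday:", spike_hours(yesterday))
print("today:", spike_hours(today))
```

The number of flagged hours, or the size of the largest deviation, could then feed a risk score for the report.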
Up to this point, what we have shown are what I call “IT based risks”. These are very important, but they are part of operations and not necessarily something that anyone outside the company bearing the risk would know about. Risks that have to do with regulations are much more visible because governments are involved. For example, suppose sanctions are imposed on a country, and it is forbidden to perform wire transfers to that country. These government regulations come under outside scrutiny, as auditors will want time series reports on where money was transferred. Using Splunk to track the destination of each wire transfer from the time series events generated by the application could produce something like this on a map.
Just looking at the dashboard would tell a compliance department about violations. Alerts could also be triggered as a violation happens. In the past, we would have been left with only the report of the wire transfers and the map, but in the spirit of today’s blog entry, there is now a risk score associated with the report. The risk score percentage could be based on the number of violations in the last time period, down to the second when a violation happened. This helps the team decide more quickly what to do next (such as lock down the account and investigate the code that allowed the transfer or the threat list that omitted the sanctioned entity) rather than wait for an auditor to view this days later.
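One way to picture such a score is the Python sketch below, which is an illustration and not Splunk functionality. The field names, the placeholder country code "XX", and the decision to score violations as a percentage of transfers in the last hour are assumptions.

```python
from datetime import datetime, timedelta

# Placeholder code only; a real deployment would use a maintained sanctions list.
SANCTIONED = {"XX"}

def sanctions_risk(transfers, now, window=timedelta(hours=1)):
    """Return (risk %, violations) for wire transfers inside the time window.

    Risk is the share of recent transfers sent to a sanctioned destination
    (an assumed definition for illustration).
    """
    recent = [t for t in transfers if now - window <= t["time"] <= now]
    violations = [t for t in recent if t["dest_country"] in SANCTIONED]
    pct = 100.0 * len(violations) / len(recent) if recent else 0.0
    return pct, violations

now = datetime(2024, 1, 1, 12, 0)
transfers = [
    {"id": 1, "time": now - timedelta(minutes=50), "dest_country": "US"},
    {"id": 2, "time": now - timedelta(minutes=30), "dest_country": "XX"},
    {"id": 3, "time": now - timedelta(minutes=20), "dest_country": "DE"},
    {"id": 4, "time": now - timedelta(minutes=5), "dest_country": "JP"},
    {"id": 5, "time": now - timedelta(hours=3), "dest_country": "XX"},  # outside window
]

pct, violations = sanctions_risk(transfers, now)
print(f"risk score: {pct:.0f}%, violating transfer ids: {[t['id'] for t in violations]}")
```

Because a single violation may already be unacceptable, a business might prefer to jump straight to 100% on the first violation; the percentage form is just one option.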
Another risk score that is just as relevant is one that monitors the stability of the business. Again, an example shows this more clearly. Suppose we had a credit card Glass Table that shows Key Performance Indicators (KPIs) for the business of credit card payments. A Glass Table is a Splunk concept: an image with KPIs placed next to any relevant part of the image. One of the Glass Table’s KPIs could be a risk score calculated from fraud attempts. To show an example of this, I borrowed a Glass Table from Splunker Marc Serieys and added a risk score.
Notice the risk percentage calculated for the initialization of the transaction; in this case, it is based on the fraud alerts in the last hour. Not only would a compliance or risk department want to know about this, but the fraud department would also be very interested in this report.
To shift industries, I borrowed another Glass Table from Marc and added a risk score KPI for a claims processing report.
Observe that the risk score is a percentage again, but this time it is affected by the number of outstanding claims that have not been processed in the last 48 hours. In this situation, the compliance department would like to know the number of claims that missed SLA deadlines, while the risk department would like to know the risk associated with the issue.
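As a rough sketch of how such a claims score might be derived, consider the Python below. It is hypothetical: the field names received_at and processed_at, and the definition of risk as the share of open claims older than 48 hours, are assumptions and not taken from the Glass Table.

```python
from datetime import datetime, timedelta

def claims_sla_risk(claims, now, sla=timedelta(hours=48)):
    """Return (number of open claims past the SLA, risk % among open claims)."""
    open_claims = [c for c in claims if c["processed_at"] is None]
    missed = [c for c in open_claims if now - c["received_at"] > sla]
    pct = 100.0 * len(missed) / len(open_claims) if open_claims else 0.0
    return len(missed), pct

now = datetime(2024, 1, 10, 9, 0)
claims = [
    {"received_at": now - timedelta(hours=60), "processed_at": None},  # missed SLA
    {"received_at": now - timedelta(hours=10), "processed_at": None},  # still in time
    {"received_at": now - timedelta(hours=72), "processed_at": now - timedelta(hours=30)},
    {"received_at": now - timedelta(hours=50), "processed_at": None},  # missed SLA
    {"received_at": now - timedelta(hours=2), "processed_at": None},   # still in time
]

missed_count, sla_risk_pct = claims_sla_risk(claims, now)
print(f"claims past SLA: {missed_count}, risk score: {sla_risk_pct:.0f}%")
```

The count serves the compliance department, while the percentage is the figure the risk department would watch.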
Putting it All Together
The examples above attach risk scores to the kinds of reports that the Financial Services Industry is used to reviewing. The scores can be computed in near real-time and can also trigger alerts based on threshold conditions. To make this more seamless, a Security Orchestration, Automation and Response (SOAR) system can automate responses to these alerts with playbooks to mitigate issues and further reduce risks. How you calculate the risk score is up to you, as Splunk allows arbitrary math to be performed on metrics. Ways to derive or influence risk scores could include:
- Values that fall several standard deviations outside the norm
- Ratios of what is considered a bad metric over a good metric
- Exponentially increasing scores (1, 2, 4, 8, etc.) added for similar events, so that the most frequent violations bubble up in reports
- Strong deviations from the mean, median, or mode
- A summation of inbound and outbound metrics to make sure the difference between the sums is positive. Liquidity use cases come to mind.
- A machine learning model of what is considered good, applied to a real dataset to show deviation from what is considered good
We can keep going and add as many ways to do the math as needed, but it’s ultimately up to you to choose and calculate the appropriate risk score for the situation.
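A few of the ideas in the list above can be sketched in a handful of lines. The Python below is only an illustration with made-up function names and numbers; the ratio is computed as bad over the total of bad and good so that it stays between 0 and 100, which is a variant of the plain bad-over-good ratio mentioned above.

```python
from statistics import mean, pstdev

def zscore(value, history):
    """How many population standard deviations `value` is from the mean of `history`."""
    sigma = pstdev(history)
    return 0.0 if sigma == 0 else (value - mean(history)) / sigma

def bad_share(bad, good):
    """Bad events as a percentage of all events (bounded variant of bad/good)."""
    total = bad + good
    return 0.0 if total == 0 else 100.0 * bad / total

def exponential_score(occurrences):
    """Add 1, 2, 4, 8, ... for each repeat of a similar event."""
    return sum(2 ** i for i in range(occurrences))

def net_flow(inbound, outbound):
    """Inbound minus outbound totals; a non-positive result is a liquidity warning."""
    return sum(inbound) - sum(outbound)

print(round(zscore(130, [100, 110, 90, 100]), 2))
print(bad_share(5, 15))
print(exponential_score(4))
print(net_flow([100, 50], [120]))
```

Any of these could be scaled or combined into a single percentage for the dashboard; which one fits depends on the control being monitored.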
I envision this being applied to three broad areas where risk scores will help the business.
- IT Compliance
- Internal Compliance
- Regulations Compliance
IT Compliance deals with the internal rules that the IT department puts into place to protect the business. An example is that all desktops must have antivirus software installed and running. The number of desktops that are not in compliance, or the percentage of them, may represent a risk score. Internal Compliance is the set of controls that are in place to protect the firm. For instance, all bank employees must sign a code of conduct. Regulations, of course, are all the government mandated rules that must be followed. This means reports must be created and readily available to show auditors that each compliance directive is being followed. The risk scores help the internal department analyze which regulations have risky behavior associated with them. It is better to discover this upfront and fix it before it becomes unfixable.
How do we put this all together, as it seems like a series of disparate reports? I propose the creation of a Compliance Fusion Center, virtual or physical, that the compliance and risk departments have access to at any time. This is a room full of monitors that show the summarized reports affecting all phases of compliance, each with an associated risk score that is calculated by the business for self-governance. Alerting and automation can be built into the process.
In an industry where regulations increase frequently, creating corresponding new vectors of risk for missing compliance mandates, using time series data to create reports in near real-time elevates the game for compliance departments. Rather than waiting hours or days, decisions can be made quickly, and risks can be mitigated. By associating risk scores with FSI transactions, we create a single pane of glass for viewing not only the KPIs of the transaction, but also any perceived risks that may be associated with the activity. This ultimately helps the business and makes risk management more manageable. Splunk products, due to their very nature of collecting time series data at scale and providing just-in-time dashboards, alerting, and automation, are in the perfect position to take advantage of this proposal.