Financial services customers often use Splunk for the usual indexing, searching, reporting, and analysis of any type of textual IT data. This may include monitoring devices, investigating login attempts, making sure an application is up and running, or centrally searching across various log files. As users have become more familiar with the power of Splunk, they have started to use it to monitor, investigate, and report on the business aspects of their operations. What follows is a non-exhaustive discussion of use cases where customers in financial services can extend their use of Splunk. I hope it provides insight into getting more value from your data, which is often a theme of my blogs.
I have already mentioned in past blog entries that Splunk is being used to track the status of a trade. In fact, the status of a trade includes not only its position in the business life cycle but also, possibly, which application last touched it. This provides enormous insight for troubleshooting and application capacity planning. For instance, Splunk’s transaction command, which is used to group trades by patterns or IDs, returns as a by-product the total duration from the first event to the last event in each grouping. This calculated duration field can be interpreted as one kind of latency for the trade. Suppose we were to use Splunk to graph the average latency per trade over time using the duration field. After creating this report, we may take some time to rewrite parts of the system and then graph the average duration over time again to see whether the rewrite improved the overall results. An example graph may look like this:
As you can see, not only did the rewrite not make a difference, but the average duration after the rewrite also appears to have gotten a little worse. At first it may seem that the developers did not do a good job. However, we may want to use the same technique to graph the average volume of trades over the same time period. If volume has gone up and the average duration has gotten worse, then the next step may be to look at hardware and OS resources. The Splunk Unix/Linux or Windows App may have been running on these machines at the same time, and it would provide similar graphical views into CPU and memory utilization. Within minutes, an investigator can conclude that more volume is using more machine resources, which means more capacity may be needed to lower the overall latency. Splunk’s versatility in providing both business metrics (average latency and average volume) and operational metrics (CPU and memory usage) from the data it indexes enhances the original use case, which began with searching for the status of a trade.
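The duration calculation described above can be illustrated outside Splunk with a minimal Python sketch. It is not how the transaction command is implemented; it only shows the idea of grouping events by trade ID, taking the time from the first event to the last, and then averaging per hour alongside the trade volume. The event tuples, trade IDs, and application names are hypothetical sample data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical events: (timestamp, trade ID, application that logged the event).
events = [
    ("2010-03-01 09:00:00.000", "T1", "order-entry"),
    ("2010-03-01 09:00:00.250", "T1", "risk-check"),
    ("2010-03-01 09:00:00.900", "T1", "execution"),
    ("2010-03-01 09:00:01.000", "T2", "order-entry"),
    ("2010-03-01 09:00:01.400", "T2", "execution"),
    ("2010-03-01 10:15:00.000", "T3", "order-entry"),
    ("2010-03-01 10:15:02.000", "T3", "execution"),
]


def parse(ts):
    """Parse the sample timestamp format into a datetime."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")


def trade_durations(events):
    """Group events by trade ID; return {trade_id: (first_time, duration_seconds)}."""
    times = defaultdict(list)
    for ts, trade_id, _app in events:
        times[trade_id].append(parse(ts))
    return {
        trade_id: (min(stamps), (max(stamps) - min(stamps)).total_seconds())
        for trade_id, stamps in times.items()
    }


def hourly_latency_and_volume(events):
    """Return {hour: (average duration in seconds, number of trades)}."""
    buckets = defaultdict(list)
    for first, duration in trade_durations(events).values():
        hour = first.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(duration)
    return {hour: (mean(ds), len(ds)) for hour, ds in sorted(buckets.items())}


for hour, (avg, volume) in hourly_latency_and_volume(events).items():
    print(hour, f"avg duration {avg:.3f}s", f"trades {volume}")
```

Keeping average duration and trade count in the same hourly bucket makes it easy to see whether latency rose along with volume, which is the comparison suggested above.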
Speaking of the status of a trade, a more general question would be the status of a FIX (Financial Information eXchange) order. In financial systems, a protocol called FIX was created to optimize electronic trading by standardizing the messages used to exchange securities and by supporting low-latency transactions. Without getting into the details of how FIX works, I’ll instead provide an example of how it could be used in Splunk. The contents of FIX messages can be indexed into Splunk, and Splunk can then provide ad-hoc search capabilities for status, troubleshooting, reports, and alerting. Each FIX message is rather cryptic for ordinary humans to comprehend. Here’s a sample:
8=FIX.4.2 | 9=178 | 35=8 | 49=PHLX | 56=PERS | 11=ATOMNOCCC9990900 | 52=20071123-05:30:00.000 | 20=3 | 150=E | 39=E | 55=MSFT | 167=CS | 54=1 | 38=15 | 40=2 | 44=15 | 58=PHLX EQUITY TESTING | 59=0 | 47=C | 32=0 | 31=0 | 151=15 | 14=0 | 6=0 | 10=128 |
Not only is the numerical tag=value notation an obscure format to grok, but there are also hidden binary ^A characters in the message that Splunk substitutes as \x1 in the web presentation of the message. A discussion of how to extract these fields is in Splunk Answers. Speaking of Splunk Answers, user Glenn Sinclair, from the UK, was kind enough to let me test his custom-written Splunk command, which he called translatefix. It takes FIX events as input at search time and translates them to a mnemonic, human-readable format more suitable for the average layperson. This lets Splunk give business meaning to the numeric tag=value notation, giving operations people at financial centers a powerful capability to interpret their data. Here’s an example of a pie chart I created using Glenn’s translatefix command.
Notice that this provides human-readable text for the FIX protocol tag=value pairs. I used it to find the top country currency conversions (the derived Symbol field) that were in my sample data. By indexing FIX messages, Splunk provides enhanced visibility to financial services operations that use this protocol.
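To make the idea of translating tag=value pairs concrete, here is a minimal Python sketch. It is not Glenn's translatefix command, whose implementation is not shown in this post; it is only an illustration that assumes fields are separated by the ^A (0x01) character and uses a small, hand-picked subset of FIX 4.2 tag names. The function name translate_fix and the lookup tables are hypothetical.

```python
SOH = "\x01"  # the FIX field delimiter, displayed as ^A by many tools

# Small subset of FIX 4.2 tag names; a real translator needs the full dictionary.
TAG_NAMES = {
    "8": "BeginString",
    "35": "MsgType",
    "49": "SenderCompID",
    "56": "TargetCompID",
    "11": "ClOrdID",
    "55": "Symbol",
    "54": "Side",
    "38": "OrderQty",
    "40": "OrdType",
    "44": "Price",
}

# Readable names for a few enumerated values (again, only a subset).
VALUE_NAMES = {
    "35": {"8": "ExecutionReport"},
    "54": {"1": "Buy", "2": "Sell"},
    "40": {"1": "Market", "2": "Limit"},
}


def translate_fix(raw):
    """Turn a raw FIX message into a {field name: readable value} dict.

    Tags and values missing from the lookup tables are passed through as-is.
    """
    fields = {}
    for pair in raw.split(SOH):
        if not pair:
            continue  # skip the empty piece after the trailing delimiter
        tag, _, value = pair.partition("=")
        name = TAG_NAMES.get(tag, tag)
        fields[name] = VALUE_NAMES.get(tag, {}).get(value, value)
    return fields


# A shortened version of the sample message above, joined with the real delimiter.
raw = SOH.join(
    ["8=FIX.4.2", "35=8", "49=PHLX", "56=PERS", "55=MSFT", "54=1", "38=15", "40=2", "44=15"]
) + SOH

for name, value in translate_fix(raw).items():
    print(f"{name}={value}")
```

Passing unknown tags through unchanged keeps the sketch safe for messages that use fields outside the small lookup table.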
SWIFT and Payments
Once orders have moved out of the front and middle office, payments may have to be sent to clearing systems. One of the best-known examples of payment interaction is the SWIFT (Society for Worldwide Interbank Financial Telecommunication) Gateway. SWIFT operates as a financial messaging network to exchange messages between banks and other financial institutions. Messages are sent to queues that are picked up in different locations worldwide. The duration of a “transaction” is no longer measured in mere milliseconds, and worldwide networks introduce a latency of their own. As before, Splunk can be used to trace the flow of messages and reduce the mean time to resolution for complex troubleshooting. A vanilla diagram of how Splunk can be used at the institution sending messages to SWIFT is depicted below:
Because each message is indexed into Splunk, along with application data, business statistics may also be gathered from the same data, such as:
- Average messages sent per day
- Number of payments cleared per day
- Number of payments not cleared per day, which could trigger alerts. (A Splunk report can provide green, yellow, and red light indicators for these numbers.)
- Total cost of commission for payments
- Top institutional customers
Notice how Splunk, as a reporting system, can provide reports and dashboards for this data, again giving greater visibility to the enterprise with the same tool that is used for operations support.
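As a rough illustration of the cleared and not-cleared counts and the traffic-light indicator mentioned in the list above, here is a small Python sketch. The record layout, the status values, and the yellow and red thresholds are all assumptions made for the example; they do not come from SWIFT or from any Splunk report.

```python
from collections import Counter

# Hypothetical payment records; "day" and "status" are made-up field names.
payments = (
    [{"day": "2010-03-01", "status": "cleared"}] * 10
    + [{"day": "2010-03-02", "status": "cleared"}] * 8
    + [{"day": "2010-03-02", "status": "pending"}] * 2
    + [{"day": "2010-03-03", "status": "cleared"}] * 37
    + [{"day": "2010-03-03", "status": "pending"}] * 3
)


def daily_clearing_summary(payments, yellow=0.05, red=0.10):
    """Count cleared and not-cleared payments per day and assign a light.

    The yellow and red thresholds are illustrative fractions of payments
    not cleared, not industry standards.
    """
    cleared, not_cleared = Counter(), Counter()
    for payment in payments:
        bucket = cleared if payment["status"] == "cleared" else not_cleared
        bucket[payment["day"]] += 1

    summary = {}
    for day in sorted(set(cleared) | set(not_cleared)):
        total = cleared[day] + not_cleared[day]
        ratio = not_cleared[day] / total
        if ratio >= red:
            light = "red"
        elif ratio >= yellow:
            light = "yellow"
        else:
            light = "green"
        summary[day] = {
            "cleared": cleared[day],
            "not_cleared": not_cleared[day],
            "light": light,
        }
    return summary


for day, row in daily_clearing_summary(payments).items():
    print(day, row)
```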
As all these messages are flowing through different electronic trading systems, let us not forget that humans can also play a role in this complex web of algorithmic trading, FIX orders, and payments. For instance, Splunk can easily track the average volume of trades initiated by a trader in a given department.
Notice how the orange line has considerably higher peaks than the other lines. This may indicate nothing at all. However, if the same orange line consistently shows the same pattern in lockstep in past data, then it may be worth investigating. This, along with many other statistics that are in the data, can be used for a variety of investigations:
- Efficiency Analysis
- Fraud Tracking
- Insider Trading Suspicion
- Compliance Reporting
- HR Inquiries
The data as a whole can provide insight into behavior tracking and alerting, which again improves the efficiency of the enterprise.
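The kind of peak that stands out in a per-trader chart can also be found programmatically. The Python sketch below counts trades per trader per hour and flags any trader whose count is more than a chosen multiple of the median for that hour. The trader names, the hourly buckets, and the factor of 2 are hypothetical, and a flag here only means "worth a look", not evidence of a problem.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical trades as (hour, trader) pairs.
trades = (
    [("09", "alice")] * 3 + [("09", "bob")] * 4 + [("09", "carol")] * 12
    + [("10", "alice")] * 4 + [("10", "bob")] * 3 + [("10", "carol")] * 5
)


def flag_peaks(trades, factor=2.0):
    """Return (hour, trader, count) where count exceeds factor times the hourly median."""
    by_hour = defaultdict(dict)
    for (hour, trader), count in Counter(trades).items():
        by_hour[hour][trader] = count

    flagged = []
    for hour, per_trader in sorted(by_hour.items()):
        typical = median(per_trader.values())
        for trader, count in sorted(per_trader.items()):
            if count > factor * typical:
                flagged.append((hour, trader, count))
    return flagged


print(flag_peaks(trades))
```

Comparing each trader with peers in the same hour, rather than with a fixed number, keeps busy market periods from flagging everyone at once.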
As in past blog entries, I have tried to show scenarios in which indexed data can help in decision making by using Splunk to produce business metrics. Not all of these use cases are deployed by financial institutions today, but I hope that the vision of this article gives you a road map for employing these strategies in your own business. We are going beyond IT search here to the everyday use of Splunk for operational intelligence in financial services.