Splunk Demo for Operational Intelligence

Watch this demo to see how data can be leveraged across multiple business areas to investigate issues, optimize operations, and obtain greater Operational Intelligence.


Video Transcript


Splunk is the platform for machine data that can be applied to many different solution areas. But what is machine data? If machine data has one thing in common, it's that it comes in many different formats. Some of it's structured like these key-value pairs, and a lot of it is unstructured, or not yet structured. And all of this different machine data has applications across all of these multiple solution areas.

I just got an email from my exec. It looks like there's something he wants me to review urgently. Looks like we may have a problem. Let's investigate. We'll start by looking at the executive view. This is just a dashboard that gives us a high-level view into what's going on across all of our solution areas, so we can see how applications are performing, what kind of security events are going on.

But it looks like he was very concerned about our SLAs. Are we violating one of our SLAs? So we'll click and drill down into our IT ops page and see what's going on. So on the left, we can see how our resources, our network, storage, et cetera, are performing, and on the right, we can see how our applications are performing. And based on how those are performing, are we meeting SLAs?

It looks like we're having an issue with our web store. We're failing a lot of transactions, and our customers cannot be happy with this response time. Let's drill down and see what's going on. Something's happening with our storage. We can see one of our storage servers, our NetApp for our database, has extremely high latency. Since it's for our database, let's see how the database application is performing.

We can see a direct correlation. When the NetApp got slow, we had a couple of queries running that were really expensive. What were these queries and who was running them? Using Splunk Stream, we can actually take data off of the wire so we can see what queries were running and who's running them without affecting the database. So we can see these top two queries are taking a long time to run, and it looks like a bad user is trying to exfiltrate data.

This is a security concern, so let's jump over to our security view and figure out what's going on and where it's coming from. From the security view, I can see the authentication activity, notable events, which can be anything like nine failed passwords followed by a success, or a malicious attack on our infrastructure. And on the right, the network activity.
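The "nine failed passwords followed by a success" rule mentioned above can be sketched in a few lines. This is only an illustration of the idea, not how Splunk implements notable events; the event fields (`user`, `src`, `action`) and the function name `find_notable_logins` are assumptions made for the example.

```python
from collections import defaultdict

def find_notable_logins(events, threshold=9):
    """Return (user, src) pairs where at least `threshold` consecutive
    failures are immediately followed by a success.

    `events` is assumed to be a time-ordered list of dicts with the
    hypothetical keys "user", "src", and "action".
    """
    streak = defaultdict(int)   # consecutive failures per (user, src)
    notable = []
    for ev in events:
        key = (ev["user"], ev["src"])
        if ev["action"] == "failure":
            streak[key] += 1
        elif ev["action"] == "success":
            if streak[key] >= threshold:
                notable.append(key)
            streak[key] = 0     # a success resets the failure streak
    return notable

# Made-up sample data: nine failures, then a success, from one source.
sample = [{"user": "admin", "src": "203.0.113.7", "action": "failure"}] * 9
sample.append({"user": "admin", "src": "203.0.113.7", "action": "success"})
sample.append({"user": "alice", "src": "198.51.100.4", "action": "success"})

print(find_notable_logins(sample))
```

Only the source with the long failure streak is reported; an ordinary successful login is ignored.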

So we can see that there are a lot of attacks going on, and we can see exactly where they're occurring. It looks like there's a big circle over here. We're being attacked by a specific country. It looks like they're doing brute force and SQL injection attacks. Let's drill down into this attack by clicking on this top notable event, the SQL injection attack, and look at the raw data.

These are the raw events, and it looks like an IPS has flagged an HTTP SQL injection attack from this source IP. So let's go ahead and click on this IP and open up a new search and see what this specific host is doing, and sure enough, we can see there are a lot of access logs. So they must be hitting some website or endpoint. Let's see exactly where they're attacking.

Let's extract a new field from these events. Here's the URL they're hitting, so let's just select that and say this is their attack vector. Now that we've created that field, we should be able to run the search and see this data in our events. So I have a new field on the left-hand side called extract, so anywhere someone's running a select statement, they're doing a SQL injection attack and I can see exactly what endpoints they're attacking.
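The field extraction just described can be pictured as applying a pattern to each raw event at search time rather than when the data is stored. The sketch below assumes a common web access log layout; the field name `attack_vector`, the regular expression, and the sample log lines are all hypothetical and are not taken from the demo.

```python
import re
from urllib.parse import unquote_plus

# Assumed access-log layout: ... "GET /path?query HTTP/1.1" status bytes
REQUEST_RE = re.compile(r'"(?:GET|POST) (?P<attack_vector>\S+) HTTP/[\d.]+"')

def extract_attack_vector(raw_event):
    """Pull the requested URL out of one raw event, or None if absent."""
    match = REQUEST_RE.search(raw_event)
    return unquote_plus(match.group("attack_vector")) if match else None

def endpoints_with_select(raw_events):
    """Return the endpoint of every event whose URL contains a SELECT."""
    hits = []
    for raw in raw_events:
        url = extract_attack_vector(raw)   # field is computed at search time
        if url and "select" in url.lower():
            hits.append(url.split("?", 1)[0])
    return hits

# Made-up events for illustration only.
raw_events = [
    '203.0.113.7 - - [01/Jan/2024:10:00:00 +0000] '
    '"GET /mobile/cart?id=1%27+UNION+SELECT+*+FROM+users HTTP/1.1" 200 512',
    '203.0.113.7 - - [01/Jan/2024:10:00:01 +0000] '
    '"GET /mobile/home HTTP/1.1" 200 128',
]

print(endpoints_with_select(raw_events))
```

Because the raw events are never modified, a new field like this can be defined after the data has already been collected.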

And it looks like they're attacking our mobile website. Now Splunk is extracting these fields on the fly. That's how I'm able to add a field at any time. Since they're only attacking our mobile endpoint, let's go ahead and look at our mobile application and see if we can figure out if we have a vulnerability in our mobile app. We can start by looking at our application delivery view, which shows us how development is going versus operations.

So in development, we care about our staging, our production, our builds, and our tests. So how are our tests going? What devices have we had a lot of errors on? It looks pretty much the same across the board, but what's having the most errors? It looks like it's a particular version or phone. Since this was having a lot of errors in our test and build environments, let's see if it's having any issues in our production environment.

We can see that they're definitely having an issue. Using data from Splunk MINT, mobile intelligence, we can see that performance for our users has gotten a lot slower. The latency has increased and the error rate has also increased. Let's see if it relates to the app version.

Who is using this version 4.6 that we know has had a lot of errors in tests? Well, let's click on the app version 4.6 and Splunk can show us exactly who is using this application and what they're doing. So here we can see that malicious user and exactly what they're doing. They're actually performing SQL injection through the mobile endpoint. That's not good.

I wonder if all this latency and the increase in errors are affecting our customer sentiment. Let's take a look at our business analytics dashboard and see how our customers are perceiving this. So from the business analytics page, we can not only see our customer experience and drill down, but we can also see our product analytics and our revenue in real time. Let's drill down, though, on customer experience.

Here I can see the affected users and their value. Some of my customers are more valuable than others, so let's just see which high-value customers are affected. Now, we can directly contact these high-value customers and let them know, hey, we're on it. We know there's an issue.

But I also notice that our operational costs seem to have increased a little bit at a certain point in time, and I think around this time is when the database load increased, so let's drill down. Most of our operational costs actually come from power usage in our facility, so let's drill down to the Internet of Things dashboard so we can see from a power perspective what's going on.

Right away, I can see that we have a hotspot in our data center in rack 2, and we can also see that that rack, rack 2, is using more power than anything else and skyrocketed in power at the same time everything else happened. Let's click on this rack and see exactly what systems are running in that rack. There's our NetApp, which we know spiked, and our SQL database.

So I can actually drill down to the raw data from here and see every piece of data that's being generated by everything that's in this rack. I can see all of the data being generated, not just by the servers themselves and the server logs in this rack, but also the applications running on top of those servers and the sensors embedded into the racks and the rooms themselves.

So a simple search of rack equals 2 brought all this data back for me. And nowhere in this data does it actually say rack equals 2. So Splunk was able to enrich this data and allow me to search on something like rack equals 2 and see that data. These 5,000 events are still really a lot to look through, so let's try the patterns tab and narrow this down.
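One common way to search on a value that never appears in the raw data is to join each event against a lookup table at search time. The transcript does not say which mechanism the demo uses, so the sketch below is only an assumption: the host-to-rack table, the host names, and the `enrich` and `search` helpers are all hypothetical.

```python
# Hypothetical lookup table mapping each host to its rack.
RACK_LOOKUP = {"netapp-db-01": "2", "sql-db-01": "2", "web-01": "1"}

def enrich(event, lookup=RACK_LOOKUP):
    """Return a copy of the event with a 'rack' field added from the lookup."""
    enriched = dict(event)               # leave the stored event untouched
    enriched["rack"] = lookup.get(event["host"])
    return enriched

def search(events, **criteria):
    """Return enriched events whose fields match every keyword criterion."""
    results = []
    for ev in map(enrich, events):
        if all(ev.get(field) == value for field, value in criteria.items()):
            results.append(ev)
    return results

# Made-up events; note that none of them mentions a rack.
events = [
    {"host": "netapp-db-01", "raw": "latency_ms=950 volume=db_vol"},
    {"host": "web-01", "raw": "GET /mobile/home 200"},
    {"host": "sql-db-01", "raw": "query_time=42s"},
]

rack2 = search(events, rack="2")
print([ev["host"] for ev in rack2])
```

The stored events stay as they were; the rack field exists only in the search results.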

Instead of looking at 5,000, 6,000 events, now I only have several patterns to look through, and right away, I can see that there is my high latency, and right below it, there's our hacker doing the malicious query. So we could have just viewed the patterns tab instead of going through all those dashboards. Very quickly, we were able to find what was happening to our infrastructure and why, and that it was of malicious intent.
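The idea behind collapsing thousands of events into a handful of patterns can be illustrated by masking the parts of each event that vary and counting what remains. This is a deliberately simplified stand-in, not the algorithm behind Splunk's patterns tab; the masking rule, the `to_pattern` and `top_patterns` names, and the sample events are assumptions for the example.

```python
import re
from collections import Counter

def to_pattern(raw):
    """Mask numeric values after '=' so similar events share one pattern."""
    return re.sub(r"=\d+(?:\.\d+)?\b", "=<num>", raw)

def top_patterns(raw_events, n=5):
    """Return the n most frequent patterns with their event counts."""
    return Counter(to_pattern(raw) for raw in raw_events).most_common(n)

# Made-up events for illustration only.
raw_events = [
    "latency_ms=950 host=netapp-db-01",
    "latency_ms=1020 host=netapp-db-01",
    "latency_ms=875 host=netapp-db-01",
    "query user=mallory rows=50000",
    "query user=mallory rows=72000",
]

for pattern, count in top_patterns(raw_events):
    print(count, pattern)
```

Five events reduce to two patterns here, one for the latency readings and one for the suspicious queries.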

So we were able to see how Splunk can take raw machine data, allow you to search, analyze, and visualize that data, solve problems, and see how it's affecting the business, and importantly, this machine data isn't only valuable to one area of the business. It's valuable to multiple areas of the business. Splunk is the platform for machine data. The platform for machine data is what allows this data to be used for true operational intelligence.