Watch the Web Demonstration that includes a basic overview and demonstration of Splunk software.
Thank you for joining me today. My name is Kelly Kitagawa and I'm a sales engineer here at Splunk. What we're going to talk about today is an overview of Splunk, along with a demo. In terms of our agenda, we'll start with talking about what machine data is, why you care about it, followed by use cases of Splunk, and then, for the majority of the time, I'm going to be giving you guys a demo. After the demo, we will come back to the presentation and talk about the awesome Splunk community and how easy it is for you guys to get started with Splunk on your own.
On your commute to work or while you're running errands around town, I know that all of us are experiencing the connectedness of our world and how that connectedness has changed everything. We are now living in a world of machine data, and every industry, every business everywhere is experiencing the effects of digitization and change. Our world is in the midst of a massive change, and this technical renaissance or digital transformation has only just begun, and it's accelerating. Add to this, customer and end user expectations have never been higher.
So what is this machine data? It's the operational data coming from your servers, storage, applications, users, customers, and cell phones-- everything that is running your business. Whether it's on-premise, in the cloud, or both, machine data is the fastest growing, most complex, but also the most valuable area of big data, and organizations are creating and capturing more data every day. But these massive streams of data come in an array of unpredictable formats that are difficult to process and analyze in a timely manner by traditional methods.
So what can you do with this data? Well, everything you might expect, delivering service in IT ops, security, application development. But what you might not expect is that once you start correlating events from all of these data sources into one place, you'll see that it contains a categorical record of user behavior, cyber security risks, application behavior, service levels, and customer experience. And what if you can do all this in real time? You can respond more quickly to events that matter. It's all in there; you just have to listen to the data.
So what is Splunk's take on machine data? Our mission statement is to make machine data accessible, usable, and valuable to everyone, and this overarching mission is what drives our company and our product priorities. Now, let's talk about machine data and the operational excellence and business value that it can bring to your business.
So how it works. First, we start with your data sources, and on the left-hand side, you can see that your data sources could be on premise, or in a cloud like AWS or Azure, public or private. And we take your data sources and we bring them into Splunk. Now the thing about that is that Splunk can index any type of machine data that you have. As long as it's human-readable, we can bring in any data from any source. We are a universal machine data platform. And along with that, we can also ingest any volume.
The way that we license our product is how much data in gigabytes you're bringing into Splunk within a 24-hour period. So our smallest customers are ingesting 1 gigabyte of data per day, and our largest customers are ingesting petabytes of data per day. And once you have your data in Splunk, you can do ad hoc searching and agile reporting and analytics.
Machine data is complex and unstructured by nature, and tapping into it requires a new approach. The problem is that traditional database and analysis systems were engineered to force-fit the data into a predefined schema to answer specific questions really well. But unfortunately, this limits access to new data and to the new questions that can be asked of it.
So Splunk takes a disruptive approach by storing the data in its raw, original format and creates a schema at the last possible moment, when the question is asked. So because of this, there are no limits to the questions that can be asked of this data. We call this schema on the fly, or simply schema on read.
Search-- you can search by keyword, field value pairs, or interactively explore your data.
Universal indexing-- this enables you to ingest any type of machine data, as I mentioned before. Data is never normalized or reduced to fit a predefined schema.
Now let's talk about the Splunk portfolio. At the bottom, you'll see the data sources, some of which may look familiar from the earlier slides, and Splunk is your platform for operational intelligence. We offer that in two different flavors: Splunk Enterprise, which is our on-premise product, and Splunk Cloud, which is a fully hosted, SaaS-based solution.
And another key to extending the value of Splunk is the over 1,100 apps found on splunkbase.com, which is our app store. These are developed by Splunk, by our partners, and by our customers. And our premium solutions from Splunk apply real-time intelligence and rich domain-specific functions to manage your security posture, IT operations, and more.
Now, let's talk about the use cases for Splunk. We're investing heavily in solutions that make it easy for you to meet your goals across IT operations, application delivery, security, fraud, and compliance, business analytics, and lastly, internet of things and industrial data. Our customers typically start with Splunk to solve one specific problem and then expand from there to address a broad range of use cases.
First, let's start with IT troubleshooting and app delivery. Thousands of our customers use Splunk as the backbone of their IT operations to quickly troubleshoot IT issues and outages, improve system uptime, and support strategic initiatives like DevOps and continuous delivery practices. And with Splunk, you can reduce the mean time to resolution, monitor end-to-end services, because IT teams have visibility across the entire stack and developers can see real-time production data without having to access the actual production systems.
Now, on security compliance and fraud, over 40% of our customers leverage Splunk to secure their organization from brute-force attacks, to track down malicious viruses, investigate internal breaches, and analyze internal employee risk, and more. We're seeing from our customers that Splunk functions as a security nerve center, enabling them to centralize and standardize their security operations from detection, to investigation, to reporting.
And we say it's like a security nerve center because security organizations are putting Splunk at the center of everything, but also because of its ability to signal and orchestrate all of the components of your security stack, such as your firewalls, your web proxy, anti-virus, threat intelligence feeds, and so on. And regardless of the vendor, the form factor, or deployment architecture, we are the centerpiece that helps bring the security stack together and make them all smarter and work in tandem.
It's all about collaboration, and no other SIEM can come close to this. And with all these security capabilities and thousands of successful customers, it's been an honor to be recognized by Gartner as a leader in the SIEM Magic Quadrant for the last four years. Our rapid ascent reflects the customer traction we have and the value we deliver to our customers. With 40% year-over-year growth, we are the fastest-growing SIEM vendor in the market.
Now, most customers do use Splunk for IT or security, or both, but there are many other use cases, and I'd like to share two of the most important, business analytics and IoT. Machine data offers immense insights to marketers, product managers, business owners, for optimizing customer experiences, product definitions, or end-to-end business processes. And IoT is the hottest new frontier to benefit from big data analytics. Splunk customers are collecting data from control systems and sensors to monitor, secure, and optimize operations from the factory floor or warehouse to planes, trains, and automobiles.
All right, now let's get to the fun part. I'm going to give you guys a live demonstration of Splunk. On the screen here, you'll see the same areas that we discussed earlier in the presentation, and at the bottom, I'm going to review the architecture. If I scroll down here, you'll see the data sources at the very bottom. Some of them may look familiar to you. We have Linux machines, Windows boxes, maybe your Cisco firewall, and the way that the data is getting brought into my environment here is using a universal forwarder.
Now, the universal forwarder is one of our most common ways to get data into Splunk, and it is simply a very lightweight piece of software that sits on your server or your syslog box, et cetera. All it does is forward data from your data sources to the Splunk Indexer. Now, the Indexer is where we store your data. And then the last piece is the Search Head. The Search Head is where the search processing is done, and it's where the users interact with Splunk. So when a user types in a search, the Search Head figures out where to grab that data from.
Another way that you can get data into Splunk is to have Splunk listen on a specific port or have Splunk monitor a specific file or directory. The third way is through our HTTP Event Collector, which lets your applications and devices send events directly to Splunk over HTTP or HTTPS using a token, without installing any software on the sending side. And the last way, the easiest way, is to simply drag and drop files directly into Splunk, like your CSV files, maybe some ZIP files, et cetera.
All right, now that we've talked about how to get data into Splunk, let's go ahead and start using it. So I'm going to start off with the basic search interface. Here you'll see the search bar, which is really similar to Google, where you can just start typing in words-- error, or failure, et cetera-- and you'll see that it will start auto-suggesting some terms, which is one of our new features in 6.5 that just came out in September.
But before we start searching, I want to talk a little bit more about the interface. And if I click on the Data Summary button in the middle, you'll see a summary of all of the different Hosts, Sources, and Source Types. The Host is which machine my data is coming from. The Source is where my data is coming from. And the Source Type is what type of data I have in here. So you'll see that I have Cisco data, I have my web server logs, we have access combined, I have some mobile data in here. And I'll be able to correlate all that data in one place.
And at the bottom here, I like showing this. This is the search history. So this will show you all of the recent commands that you may have done, so if you wanted to go back the next day and say, what was that search that I did a couple of days ago? Then you can simply go back here and click Add Search, and it will add it straight into your search bar.
OK, now let's get searching. I'm going to use a scenario of where we're on the web operations team, and we're in charge of managing our company's e-commerce store. And we work for a company called Buttercup Games. And I have been tasked to understand what's been going on with our website. We've been seeing some failures. We've been getting some reports that people haven't been able to load the page. So let's start diving in a little bit more.
Now, one of the things I want to start with is our HTTP status codes. We know that a status code of 200 equals a success. We know that a status code in the 400s is a client error, and we know that status codes in the 500s are server errors. So let's start by looking at some of the errors. I know that 404 is a really common one, so I'm just going to say 404. Or we can just say, maybe, 400. Or we can even use a wildcard and say 4-star, and that will catch 4 followed by anything-- all of the 400-level codes.
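As a rough sketch, the term and wildcard searches I just described might look like this in SPL (the `access_combined` sourcetype is an assumption based on the web server logs mentioned earlier):

```spl
sourcetype=access_combined 404
sourcetype=access_combined 4*
```

The first line matches events containing the literal term 404; the second uses the wildcard to match 400, 403, 404, and so on.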
And then I'm going to select the time range, and I'm just going to say the last four hours. And then I'm going to hit the Magnifying Glass, and now I'm brought to my search page. So it looks like, based on that simple term search, we have 264,805 events. You can see that count right underneath the search bar. And right underneath the search bar, we have what's called a histogram.
Now the histogram shows a distribution of our events over time. And on the right-hand side, it will show you what each little green column means. And in this scenario, it looks like we have one minute per column. And the histogram is really great to show a spike in events or a drop in events. It's really easy to see it in a nice, visual way. And if we select on it, we can select on just that specific time window if we want, or we can deselect it.
Now let's scroll down and start looking at what the raw data looks like. So what is Splunk doing when we start doing this search? When I start typing in 404 or 400, Splunk is going through all of my data that matches those terms. So I'm going to click that little arrow, and what Splunk does is it breaks down all of these pieces of information, and we break them down into key and value pairs. The key in this example would be host, the value is websphere01, and so on. And you'll see all these different pairs that we've brought out.
And so without having to write any kind of extraction rules or anything like that, Splunk is, out of the box, pulling these key value pairs out for me. And on the left-hand side, you'll see that Splunk has turned all of these into what we call interesting fields.
So Host, Source, and Source Type will always be assigned to your data. If your data does not have a host or a source, then Splunk is going to automatically assign one for you. So you have the field there in blue, and the gray number after it is the number of distinct values for that field.
And then you'll see we have all kinds of interesting fields. I see that we have Action, and once I click on Action, it will show me a nice preview of what the data looks like. It looks like 39% of the events have the value "allowed". And if we wanted, we could go through and look at Product to see which products have been purchased.
You know what? I'm looking at the data, and it looks like it's catching more than the status codes I want. It looks like it's catching some kind of time stamp. So let's refine our search a little bit more. Let's say we wanted to catch all of the errors, right? So all of the errors will be everything that's not a success. So now I'm going to use a key value pair search and say "status not equal to 200."
So what that's going to do is bring back everything that's not 200, which is going to be the errors, and that's what I really want to look for. Now, based off that search, it looks like I have 7,623 events, and I'm going to do it just for the last 60 minutes.
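For reference, that key value pair search is just the field-value expression typed into the search bar-- something like this, with the sourcetype name assumed for illustration:

```spl
sourcetype=access_combined status!=200
```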
Now I want to get a visualization of my status codes. I'm going to scroll down, and without typing anything else into the search bar, I'm just going to click Top Values. Once I click Top Values, you'll see that the search bar at the top has been automatically updated with some specific terms.
Now this is using the Top command, which is one of the many search processing language commands that we have within Splunk, and it's automatically outputted a visualization for me. On the left hand side where it says Bar Chart, you'll see that I have many other different visualizations to choose from. I can do a pie chart. But for this example, I'm going to keep it as a bar chart.
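Under the hood, clicking Top Values simply appends the top command to the search we already had. The generated SPL would look roughly like this (sourcetype assumed for the sketch):

```spl
sourcetype=access_combined status!=200 | top status
```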
Now I'm going to save this out as a dashboard. So how do I do that? In the top right-hand corner where it says Save As, we have three different choices. We have Report, and what that does is it saves your search as a report, and you can have a report scheduled to run at a specific time. You also have a Dashboard Panel, which is what we're going to click. And then the third one is Alert, and we'll talk about Alert in a little bit.
But let's save this as a dashboard panel. I'm going to call this our Web Operations Team Dashboard. And the Panel Title, I'm going to call it Top Error Codes. And then I'm going to click Save.
I'm not going to view our dashboard yet, because I want to create a couple more searches before we go to that. So, simply using the Back button in our browser, we can go back to the search that we had before.
All right, so we did one of top error codes, but I also want to do top error codes over time. So I can go back down to Status field, and I can use one of the simple reports they have already for me. I'm going to click Top Values By Time.
Let's do an area graph. So I'm going to click Line Chart, and then I'm going to change it to Area Chart. This is a nice visual representation as well. And you'll see from the legend on the right-hand side, if you hover over one of the status codes, the graph will be updated, also. Or if you hover over the actual graph, you'll see that the legend will then be reflected from that, too. And also, if you actually click on the graph, it will take you to the raw information.
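The Top Values By Time report is generated with the timechart command. A sketch of what that generated SPL might look like (sourcetype assumed for illustration):

```spl
sourcetype=access_combined status!=200 | timechart count by status
```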
So I'm going to save that one out, too. I'm going to click Save As in the top right and choose Dashboard Panel, but since I already have an existing dashboard, I'm going to click Existing and select the Web Operations Dashboard. This panel I'm going to title Error Codes Over Time.
Now we just created two. One was using the Top command. Another one was using the Time Chart command. Well, what if I just wanted to get a total count of all of the status codes and how many times they've occurred? I can just use the command Stats Count By Status. And here you'll see all the status codes and the number of times these status codes have occurred within the last 60 minutes.
But let's say I only care about the ones where I've gotten a status code where the count is greater than 500. You know, we want to look for something that's abnormal. And it looks like I have status code 404 and 503 that I'm getting a lot of errors here.
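Putting those two steps together, the search described here might look like this in SPL-- a stats aggregation followed by a where filter (sourcetype assumed for the sketch):

```spl
sourcetype=access_combined status!=200
| stats count by status
| where count > 500
```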
So I'm going to save this out as an alert. Just like when I saved it as a dashboard, I'm going to go back to Save As and click Alert. Now, let's call this alert Error Codes. Here, we can have the alert scheduled to run at a specific time, or you can define a cron schedule and make it very custom, or you can have the alert run in real time.
Now, also, we have trigger conditions. Trigger conditions determine when this alert is going to fire-- in this case, if the number of results is greater than 500.
Now, Trigger Actions. Once this alert goes off, what do you want Splunk to do with it? We have lots of different options here. In my environment, I have the HipChat-enabled one. And what that does is that will message our HipChat room. HipChat is our messaging platform that we use internally at Splunk. So it'll message a group HipChat room or a specific person and let them know this alert has gone off.
Well, let's look at that email option at the bottom here. It's pretty simple in the sense that, if this alert goes off, send an email. You can have it go to a group alias, and it will also tell you why the alert was triggered. So I like to select Trigger Condition, and I also like to attach a CSV, so the email will actually include the raw logs that come with the alert.
And then another one that we want to look at is running a script. This one is really powerful. You can write a script in your preferred language-- Python, PowerShell, Ruby, whichever your heart desires-- and Splunk will pass the variables from this search into your script, allowing you to run whatever kind of script you want.
Let me give you a scenario. What about a port scan scenario? Maybe you're on the network team or something like that, and you want to get notified if there is a specific IP address hitting a certain number of ports x number of times. You could have this search in Splunk say, if this IP address is hitting these ports x number of times, then run this script. So what Splunk is going to do is pass that variable, which is going to be the IP address, plug it into your script, and your script can say, "Log in to my firewall and blacklist this IP."
So here's where we can really start seeing some proactive monitoring, things that you can do ahead of time before something actually happens. And again, running a script can be really powerful. And there are lots of these different trigger actions that you can find on splunkbase.com. We have several different ones, like the Skype one, one for sending SMS, and so on. And that's pretty much it for alerting.
Now, I'm going to do one more search before we save off this dashboard and finish out. So let's say we have our chart, right? We have where the status is not equal to 200, where we have all the errors. I'm going to do a simple stats count, and I'm going to use a Spark Line by URI path.
So I'm still using the Stats command, but there's another variable that I can use in the Stats command called Spark Line. So Spark Lines are inline charts that appear within table cells in your search results, and they're designed to display time-based trends associated with the primary key of each row. So it's a nice visual way to see your data.
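A sketch of the sparkline search being described, with the sourcetype and the uri_path field name assumed for illustration:

```spl
sourcetype=access_combined status!=200
| stats sparkline count by uri_path
| sort - count
```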
So what I'm really doing with this search is showing, of the pages that have been hit, how many error codes I have for each one, with a nice visual representation using the Spark Line. So I'm going to go ahead and save this last dashboard panel, and I'm going to call it Web Page Visits With The Most Errors.
Now, let's go ahead and view our dashboard. Now here we have our top error codes, which we had before. Then we did it by time. And lastly, our web page visits with the most errors.
Now, what if I wanted to drag and drop some of these around? All I do is click the Edit button, and I can simply drag these panels around and put them side by side. I can also easily rename them or change the visualization to something else. It's that simple to use these dashboards.
I can also export these dashboards as a PDF and have them scheduled to run out at a specific time frame. So let's say your boss asked you to send him a report on what the environment looks like at 6:00 AM every morning. You can have those PDFs scheduled to run out at 6:00 AM, or whichever time you'd like.
Now there's one more thing I want to show you guys, and this is another one of the new features that we came out with in 6.5. Now, some of you might be a little bit intimidated by the search processing language, by some of those commands that we used. So we introduced Table Datasets. Table Datasets are a way to both prep and analyze your data with tables. It's a structured view of the data that you can create, edit, and analyze without having to use the search processing language, or SPL. So whether you're a Splunk specialist or an occasional user, you can appreciate the power and the simplicity of this new feature.
So I'm going to click on Datasets in the menu at the top, and we're going to go ahead and create a new dataset over in the top right-hand corner. So you can either create a new dataset from scratch, or you can use an existing dataset, or you can also start from a Splunk search.
In today's example, we're going to create our own data set from scratch. So I'm going to click on Indexes And Source Type. So what I'm going to do is first select the index that I want that has the data that I'm looking at, and we're going to use the same data in the Access Combined Logs. So I'm going to select Access Combined, hit OK, and then I select the fields that I want to use.
So I want to play out this scenario where we're still on Buttercup Games, we're still on the web operations team, but people are asking, which products on our website are people purchasing through mobile devices? So we really want to get a picture of which mobile devices people are using to visit our website.
So to get at the mobile purchases, I'm going to select Action. We want to see what people have purchased through mobile devices. I'm going to select Bytes, and I'm going to select Device-- I can also just hit Search and start typing. I also want to select Status again, and the last one I'll do is Product. OK, and then I'm going to hit Done.
Now, here's what our data looks like in an arranged fashion. So similar to how you might use Excel, you can select a row or a column and hit Filter. All we care about are the purchase events, so I'm going to filter the Action column down to only the things that were purchased. I only care about the successful purchases.
So I filtered by Purchase only, and now I click over on the device column, and it looks like I have Linux, but I know that's Android. So I can select the column and hit Clean, and I can replace values. So I'm going to replace Linux with Android. Hit Apply, and you'll see that all of the Linux devices have been changed to Android.
And you'll notice on the left-hand side, you'll see that there's these little layers. And these are all the actions that I've taken. But let's say, for example, I wanted to undo that. I can hit the X next to Replace Values, and it'll go back to be what it was before.
But I'm going to go back now. You'll see that I have the product and the status. But let's say I'm a business user and I don't really know what the status codes mean-- I can join in descriptions from a lookup table. So I'm going to select the Status column, click Add New, and click Join From Lookup.
Now I'm going to select the lookup table that I want to choose. I have one called HTTP Status. And I'm going to join it from the Status field, and I'm going to concatenate it to the Status field. And the field that I want to pull in for my lookup table is Status Description. And then hit OK and hit Apply. So now you'll see that I have the Status and the Status Description.
A lot of people use these lookup tables for, maybe, Salesforce IDs, or employee databases-- so you can put in names-- or even the simple scenario where you have ZIP codes and you want to know which states they map to, et cetera. Another thing that I can do is rename a field. Let's say I wanted Device to be renamed to Mobile Device. So I click Edit in the top left-hand corner, select Rename, call it Mobile Device, and hit Apply.
Now, I showed you before on the left-hand side where it lists the commands and everything that I'm doing. But next to the Commands tab, where it says SPL, you will see all of the SPL that's being generated underneath. So again, you don't have to know SPL, but this can also be a great way to learn-- "Hey, how do I do this in SPL?"-- if you wanted to do it yourself.
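To give a feel for what shows up under that SPL tab, the generated search behind steps like the purchase filter, the lookup join, and the rename might look roughly like this (the lookup table name and field names here are assumptions for the sketch):

```spl
sourcetype=access_combined action=purchase
| lookup http_status status OUTPUT status_description
| rename device AS "Mobile Device"
```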
So we have a dataset that we feel pretty good about. I'm going to click on Summarize Fields at the top in the middle. The Summarize Fields page shows a view of the table with analytical information about its fields. So you can see things like top value distributions, null value percentages, numeric value statistics, and more.
It looks like I have some null values in Mobile Device, so I'm going to go back to Preview Rows. I'm going to select Mobile Device, filter out by Is Not Null. So I, again, am going to cleanse my data to make sure that I have a good data set that either myself or my business users can work with. Now it looks like 100% of my data looks pretty good, so I'm going to save this, and I'm going to call this Purchases On Mobile Devices. Hit Save.
Now, this is where we get to the visualizations. I've created a dataset, and I'm going to select only the data from the last 30 days. Now, I'm going to add the device and hit Add To Table. This is really similar to Pivot in Microsoft Excel, if you've ever used that before. And the column that I'm going to use is Product. So it's showing me, for each device, which products have been purchased.
So now that I have this table on the left-hand side, that black or gray bar, I'm in the Chart View. Now I'm going to select on the Visualizations. I can select a line chart, or I like the area graphs here. And now, once I have the visualization that I'd like, I can click Save As, and we can save it off just like we did with the other dashboards.
All right, let's review what I demonstrated in Splunk. The first thing we did was we searched our data. We started with basic term searches and then moved to key value pair searches. The other thing we did was we created an alert and we discussed triggered actions like running a script or sending an email.
And lastly, we created some intuitive dashboards to help out our web operations team. We created these dashboards using the UI from the Fields menu, clicking through Common Reports. We also created dashboards typing out some of the Search Processing Language or SPL commands. And lastly, we created dashboards using one of our new features, Table Datasets, without having to know any SPL.
So why do our customers choose Splunk? Here are some of the top reasons. Number one, fast time to value. Splunk can be downloaded and installed in minutes, and if that's not fast enough, you can get a cloud instance in seconds. Splunk can also ingest any data from any source at any scale. We're a universal machine data platform, and with our schema-on-the-fly approach, you can ask any question of your data.
Visibility across your entire stack. Because you can ingest this data from any source, you can quickly gain visibility across all of them and correlate it in one place. And lastly, one platform supporting many use cases. You may have a point problem to solve, and that's usually where our customers begin with Splunk, but you then realize that with this platform, you can address company-wide analysis needs.
And we're proven at over 12,000 customers in over 110 different countries. You're joining a worldwide community of organizations across virtually every industry, and more than 80% of the Fortune 100 companies are using Splunk. And our passionate and vibrant community is extremely enthusiastic and ready to answer questions and share best practices. On splunkbase.com, we have over 1,100 apps that are created by our partners, by our customers, and some by Splunk as well. We also have Splunk Answers, which is the go-to place for your questions and answers, and you can also participate in meet-ups and user groups, contribute to our forums, or attend local SplunkLive! events to hear from your peers.
And the best part is that Splunk is really easy to try and deploy. We have multiple options for you to get started. We have a free two-week cloud trial, and we have a free software download which comes with a 500 megabyte per day license-- and that product that you download off our website is the exact same product that can scale to ingest petabytes of data per day. And lastly, we offer a one-week trial instance of our premium applications, Enterprise Security and IT Service Intelligence.
And with that, thank you so much for joining me today, and happy Splunking.