If you are here, you have probably already read the first part of this series and want some help to quickly get value from your NetFlow data: building trend analysis and advanced analytics on long-term data (i.e., months), in addition to playing with real-time data.
You can take advantage of Splunk’s flexible schema-on-read architecture to exploit your real-time data from the very first moment you get the data in. You can also use Splunk’s Data Model Acceleration to get maximum performance on longer-term data and enjoy a schema-on-write experience. You’ll get the best of both worlds! On top of that, you can configure it in a couple of clicks with the help of Splunk’s Common Information Model (CIM).
This second part is based on joint work that Raúl Marín and I published on his blog: NetFlow traffic ingestion with Splunk Stream and an Independent Stream Forwarder: Part 2. To help you through this journey, I built an app (sample dashboards for NetFlow) with a couple of dashboards and visualizations that you can use as examples of the insights you can get once NetFlow traffic is indexed in Splunk.
For this journey, we will assume that we have a Splunk deployment with the Splunk Stream app and NetFlow traffic being indexed in Splunk, preferably from a NetFlow traffic generator. If that is not the case, please have a look at part one of this series.
Ready to get to the next level? Let’s conquer Everest camp 2!
Step 1 – Deploy NetFlow sample dashboards app
First of all, you need to download the 'sample dashboards for NetFlow' app from GitHub. This is a sample app that I built with a couple of dashboards containing several visualizations of NetFlow traffic, which was ingested into Splunk using the open-source NetFlow traffic generator mentioned in part 1 of this series. Follow the installation steps explained in the app README. The 'sample dashboards for NetFlow' app assumes that you are indexing NetFlow (from a generator or a network) to an index called netflow_index. If this isn't the case, you'll need to modify the app's SPL accordingly.
After the 'sample dashboards for NetFlow' app is installed, you will be able to access it like any other app deployed in Splunk.
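Before opening the dashboards, it can help to confirm that NetFlow events are actually arriving in netflow_index. The search below is a minimal sketch, not part of the app: it assumes the index name used throughout this post and simply counts recent events per sourcetype and host. With Splunk Stream you would typically expect to see the stream:netflow sourcetype, but check your own data if it differs.

```spl
index=netflow_index earliest=-15m
| stats count by sourcetype, host
```

If the search returns no rows, the dashboards will be empty too, so it is worth revisiting the ingestion setup from part 1 before continuing.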
Step 2 – Install required Splunkbase apps
To enjoy the schema-on-write experience and some cool visualizations, we need to install the following apps from Splunkbase:
- Splunk Common Information Model (CIM) app
- Splunk Sankey Diagram - Custom Visualization
- Force Directed App for Splunk
To install Splunk Sankey Diagram - Custom Visualization and Force Directed App for Splunk, download each app, extract it to $SPLUNK_HOME/etc/apps on your Search Head, and then restart Splunk. You can also install both apps from Splunkbase through the web UI on the Search Head by clicking “+ Find More Apps” on the left-hand side of the landing page.
Step 3 – Set up data model acceleration
Now we will configure data model acceleration to get the desired schema-on-write experience for long-tail data. For details on the fields managed by the Network Traffic data model in Splunk CIM, see the Common Information Model Add-on manual. To perform the configuration, follow these steps:
1) Click on Datasets, filter by Network traffic, and choose Network Traffic > All Traffic. Then click on Manage and select Edit Data Model.
2) Before configuring acceleration, add an index constraint to the data model. This is not strictly necessary in a demo/dev environment, but in a production environment you should set an index constraint to optimize the performance of data model acceleration. By default, the constraint is: (`cim_Network_Traffic_indexes`) tag=network tag=communicate. Click on Edit to the left of the constraint definition and set the constraint to index=netflow_index (`cim_Network_Traffic_indexes`) tag=network tag=communicate so that only netflow_index is accelerated. Then click on the green Save button.
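To sanity-check the new constraint, you could run it as an ordinary search before relying on it. This is only a sketch: it assumes the CIM add-on is installed (so the macro resolves), that your events carry the network and communicate tags, and that the index is named netflow_index as in this post.

```spl
index=netflow_index (`cim_Network_Traffic_indexes`) tag=network tag=communicate earliest=-1h
| stats count by sourcetype
```

If this returns nothing while a plain index=netflow_index search does return events, the tags are the most likely culprit, and the data model would have nothing to accelerate.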
3) Now let’s review how the CIM maps the NetFlow traffic fields contained in netflow_index. Click on Datasets, filter by Network traffic, and click on Network Traffic > All Traffic.
4) This leads you to a table view of the Network Traffic > All Traffic mappings to the fields contained in the NetFlow events from the netflow_index index. Note that many of the fields are left blank, as the Network Traffic > All Traffic model is broader than the set of fields provided by NetFlow.
5) Now let’s explore how the CIM model extracts fields at search time. Click on Datasets, filter by Network traffic, choose Network Traffic > All Traffic, and click on Investigate in Search. You will be redirected to Search in Splunk, where a list of fields in grey appears after the JSON event data. Those are the fields that are compatible with the CIM model and that will be extracted for fast long-tail searches once the data model is accelerated. The field protoid is not in the list of extracted fields, but since some of the dashboards in the 'sample dashboards for NetFlow' app need it, you have to modify the CIM data model manually and add it in the next step.
6) Click on Datasets, filter by Network traffic, and choose Network Traffic > All Traffic. Click on Manage and select Edit Data Model. Then click on the Add Field button and select Auto-Extracted.
7) Splunk will help you by proposing a number of fields that can be extracted from netflow_index and added to the data model, to be accelerated later. Choose the one we were looking for, protoid, and set it as number and optional. Click on Save.
8) Click on Datasets, filter by Network traffic, choose Network Traffic > All Traffic, and click on Investigate in Search to look again at the list of fields in grey after the JSON event data. Note that the protoid field is now included!
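The same check can be done directly in SPL. The sketch below assumes the standard CIM internal names (Network_Traffic for the data model and All_Traffic for the root dataset) and that protoid was added as described in the previous steps; fields returned this way are prefixed with the dataset name.

```spl
| datamodel Network_Traffic All_Traffic search
| stats count by All_Traffic.protoid
```

A non-empty result with one row per protocol number suggests the new field is being populated by the data model.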
9) Now let’s finally set up data model acceleration. Click on Datasets, filter by Network traffic, and choose Network Traffic > All Traffic. Then click on Manage and select Edit Data Model.
10) On the top right of the menu, click on Edit and select Edit Acceleration.
11) Then, enable acceleration by clicking the white box next to Accelerate and set the Summary Range according to the range of data you want to search. Set it to 3 months. In the advanced settings, you can define other parameters such as Backfill Range, Max Summarization Search Time, etc., but we won't modify these this time. If you want to know more about configuring data model acceleration, have a look at the Accelerate Data Models documentation. Finally, click on Save.
12) After performing the previous configuration steps, check that the acceleration icon next to the Network Traffic data model has turned yellow, signalling that acceleration is turned on.
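Besides the icon, you can check from the search bar whether summaries are actually being built. The following is a hedged sketch that assumes the standard CIM data model name Network_Traffic; with summariesonly=true, tstats returns results only from the accelerated summaries and ignores raw data that has not been summarized yet.

```spl
| tstats summariesonly=true count from datamodel=Network_Traffic.All_Traffic by _time span=1h
```

Summaries take some time to build, so an empty result right after enabling acceleration is not necessarily an error. Running the search again a little later should show the hourly buckets filling in.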
Step 4 – Explore the dashboards app
Cool! Now you can click on Dashboards in the NetFlow sample dashboards app and begin playing with the two existing dashboards:
- @demo: NetFlow Dashboards: here you will find examples with long-tail data using Splunk’s tstats command, which exploits the accelerated data model we configured previously to obtain extremely fast results from long-tail searches. Note that tstats is used with the parameter summariesonly=false so that the search generates results from both summarized data and raw data. That allows us, in one dashboard, to seamlessly enjoy the real-time experience of schema on read against raw data plus the long-tail performance of schema on write against accelerated data model summaries.
- @demo: Hosts: here you will find examples using real-time data, based on Splunk’s stats command running against the raw real-time data indexed in Splunk.
Splunk’s stats command calculates aggregate statistics, such as average, count, and sum, over the result set of a raw-data search. Splunk’s tstats command performs very similar operations, but over the indexed fields stored in tsidx files. Those indexed fields can come from normal index data, tscollect data, or accelerated data models. Splunk Enterprise creates a separate set of tsidx files for data model acceleration and uses them as summaries of the data returned by the data model.
Splunk’s tstats command is faster than stats because tstats only reads the indexed fields, whereas stats examines the raw data. The trade-off is that tstats can only search fields that exist as indexed fields.
To explore the SPL contained in each dashboard, just click on Open in Search at the bottom of the panel of interest and you will see the search that created the panel.
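To make the difference concrete, here is the same question ("how many flows per protocol in the last 24 hours?") asked both ways. These are illustrative sketches rather than the searches shipped in the app; they assume the netflow_index index, the protoid field, and the standard CIM data model name Network_Traffic. The first search reads raw events with stats:

```spl
index=netflow_index tag=network tag=communicate earliest=-24h
| stats count by protoid
```

The second asks the accelerated data model through tstats. With summariesonly=false it falls back to raw data for any time range that has not been summarized yet:

```spl
| tstats summariesonly=false count from datamodel=Network_Traffic.All_Traffic where earliest=-24h by All_Traffic.protoid
```

On a small demo data set the two may feel similar, but the gap tends to grow with the time range, which is why the long-tail dashboard relies on tstats.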
@demo: NetFlow Dashboards
Find here some snapshots of the panels included in this dashboard:
Credit to Matthieu Araman for his technical guidance on this topic.
Credit to Matt Olson for his guidance and support in the publication of this blog series.
Awesome, we have reached Everest camp 2.
From here, we can build on this foundation using trend analysis and ML to get predictive and to identify anomalies - helping capacity management, informing routing and peering strategies, and protecting our networks by finding bad actors and nefarious activities. The possibilities are almost endless, and we’ll explore more Splunky NetFlow goodness in future instalments.
Do not miss part 1 of this blog series: Splunking NetFlow with Splunk Stream - Part 1: Getting NetFlow data into Splunk.