Metrics and logs go together like cookies and milk. Metrics tell you when you have a problem, and logs/events often tell you why that problem happened. But it’s always been harder than it needed to be to get both types of data onto a single screen, especially when the sysadmins using the tools aren’t necessarily daily experts in managing those monitoring platforms.
In this post we’ll walk you through the simple steps for getting Linux metrics and logs into Splunk for analysis with the Splunk App for Infrastructure (SAI), a free app that sits on top of Splunk Enterprise or Splunk Cloud. If you missed the overview post, The Splunk App for Infrastructure: Getting Started with Metrics & Logs Together for Easy Infrastructure Monitoring, here’s the short version: SAI is designed for IT admins and sysadmins who just want an easy button to monitor metrics and logs so they can get back to the other tasks that fill their day. SAI provides prescriptive guidance for getting data in, along with preconfigured dashboards that work out of the box.
Before you begin, install both the Splunk App for Infrastructure and the Splunk Add-on for Infrastructure. The add-on creates some indexes and configures sourcetypes and other components. Both can be installed easily from the Splunk web interface. See the SAI documentation for minimum requirements and detailed installation steps.
If you are installing in a distributed Splunk environment, the Splunk App for Infrastructure should be installed on the search head, and the Splunk Add-on for Infrastructure should be installed on the indexers.
What the HEC?
For Linux systems, metrics are most easily captured via collectd, an open-source agent for collecting system and application performance metrics. Splunk provides a packaged distribution of collectd. Collectd then forwards the metrics to the Splunk HTTP Event Collector (HEC), a high-throughput collection engine that makes sending many types of data across networks easy and safe.
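collectd’s write_splunk plugin handles the HEC side for you, but it can help to see what a single metric looks like on the wire. The Python sketch below builds (without sending) a request in the documented HEC JSON format for one metric: `event` set to `metric`, with `metric_name`, `_value`, and any dimensions under `fields`. It is not the exact payload write_splunk produces; `build_hec_metric_request`, the host name, and the token are hypothetical placeholders.

```python
import json
import time
import urllib.request


def build_hec_metric_request(server, token, metric_name, value, dimensions,
                             host, port=8088, index="em_metrics"):
    """Build, but do not send, a POST carrying one metric to Splunk HEC.

    Hypothetical helper for illustration; collectd's write_splunk plugin
    builds its own requests.
    """
    payload = {
        "time": time.time(),
        "event": "metric",        # tells HEC this is a metric, not a log event
        "host": host,
        "index": index,           # must be an index the HEC token may write to
        "fields": {
            "metric_name": metric_name,
            "_value": value,
            **dimensions,         # dimensions travel as extra key/value fields
        },
    }
    return urllib.request.Request(
        f"https://{server}:{port}/services/collector",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",   # HEC token auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen()` would send it; against a server with a self-signed certificate you would also need a suitable SSL context.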
If you need an alternative to collectd and HEC, you can also send metrics to SAI using the Universal Forwarder (UF) and the Splunk Add-on for Unix and Linux (TA-Nix). A later post in this series covers that approach in more detail.
Before you get started, configure HEC for use with SAI. This one-time step enables HEC to receive data destined for SAI. In this process you will create a token for HEC and tell it the index to which metrics should be sent.
In our case, we are going to use the standard em_metrics index, though you can use whatever metrics index you want. Just be aware that if you use a different index you will need to update the sai_metrics_indexes macro so your metrics index is included in SAI searches.
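If you do send metrics to a different index, the sai_metrics_indexes macro has to include it. The sketch below renders what such a macro could look like in macros.conf syntax, assuming the definition is a simple OR of `index=` terms; the shipped definition may differ, and `linux_metrics` and `sai_metrics_macro_stanza` are hypothetical names. In practice you would compare against the existing macro and edit it in Splunk Web under Settings > Advanced search > Search macros.

```python
def sai_metrics_macro_stanza(indexes):
    """Render a macros.conf-style stanza for the sai_metrics_indexes macro.

    Hypothetical helper: assumes the macro is an OR of index= terms, which
    may not match the definition SAI actually ships.
    """
    # Drop blanks and duplicates while keeping the caller's order.
    unique = list(dict.fromkeys(name.strip() for name in indexes if name.strip()))
    if not unique:
        raise ValueError("at least one metrics index is required")
    # Parentheses keep the OR clause intact when the macro is combined
    # with other search terms.
    clause = " OR ".join(f"index={name}" for name in unique)
    return f"[sai_metrics_indexes]\ndefinition = ({clause})\n"
```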
What Do You Want To Collect?
Just like when you go to the supermarket, it’s a good idea to figure out what you need before you get there so you don’t forget anything. Likewise, know what you want to monitor before going through this next bit. It’s easy enough to remove everything and redo it, but who has time for that? Let’s look at how to configure SAI to collect the logs and metrics you want. It’s as easy as 1-2-3.
1. Open the Splunk App for Infrastructure and click on Add Data in the menu.
2. The left pane lists the available platforms for data sources. Make sure Linux is selected.
Follow the 1-2-3 steps for adding data as shown in the form.
3. Click Customize to select the metrics and logs sources you want to collect data for.
4. For this example, we will collect all of the available metrics. Notice that cpu and uptime are grayed out: they are selected by default and cannot be deselected. As for the logs, we are using CentOS, so we uncheck the predefined log files in /var/log because they don’t exist on this system.
5. We do, however, want to add some custom sources. Click + Add Custom Source, then add the following logs one at a time, selecting a source type for each from the list:
- /var/log/messages (sourcetype: linux_messages_syslog)
- /var/log/secure (sourcetype: linux_secure)
- /var/log/audit (sourcetype: linux_audit)
6. Click Save when you are finished.
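Each custom source pairs a file path with a sourcetype. On a Universal Forwarder, that pairing is normally expressed as a monitor stanza in inputs.conf. The sketch below renders such stanzas from the selections above; it assumes, but does not confirm, that the Easy Script configures the forwarder this way, and the real script may add settings (such as a target index) that are omitted here. `monitor_stanzas` is a hypothetical helper. Note that /var/log/audit is a directory, which a monitor input reads recursively by default.

```python
# Custom sources chosen in step 5, mapped to their sourcetypes.
CUSTOM_SOURCES = {
    "/var/log/messages": "linux_messages_syslog",
    "/var/log/secure": "linux_secure",
    "/var/log/audit": "linux_audit",
}


def monitor_stanzas(sources):
    """Render inputs.conf [monitor://...] stanzas for a path -> sourcetype map."""
    stanzas = []
    for path, sourcetype in sources.items():
        # "monitor://" plus an absolute path yields three slashes in a row.
        stanzas.append(f"[monitor://{path}]\nsourcetype = {sourcetype}\n")
    return "\n".join(stanzas)
```

Calling `print(monitor_stanzas(CUSTOM_SOURCES))` shows three stanzas, starting with `[monitor:///var/log/messages]`.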
What Are The Dimensions?
The next field in the Configure Integrations form is Dimensions. Dimensions are like tags: key/value pairs that add contextual metadata to the metrics you collect. Dimensions let you group, split by, and filter when troubleshooting or analyzing metrics in SAI. The collectd distribution provided by Splunk includes a write_splunk plugin that attaches dimensions to the data it sends to the Splunk HEC.
Dimensions use the format key:value; for example, env:prod or location:nyc. The write_splunk collectd plugin creates five dimensions by default. These cannot be removed and provide out-of-the-box context.
7. Add some dimensions. We are going to use location: nyc and role: web server. One thing to note: when you type the colon after the key name, the form will appear to add the dimension without a value. Don’t panic; just type the value and press Enter, and the key and value will merge into a single dimension.
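The key:value convention is easy to check before you type dimensions into the form. The sketch below splits each entry on its first colon and rejects entries with a missing key or value. Trimming the spaces around the colon, so that `location: nyc` becomes `location` and `nyc`, is an assumption about how the form normalizes input, and `parse_dimensions` is a hypothetical helper.

```python
def parse_dimensions(entries):
    """Parse "key:value" strings into a dict, rejecting malformed entries."""
    dims = {}
    for entry in entries:
        # Split on the first colon only, so values may themselves contain colons.
        key, sep, value = entry.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            raise ValueError(f"dimension must look like key:value, got {entry!r}")
        dims[key] = value
    return dims
```

For the dimensions above, `parse_dimensions(["location: nyc", "role: web server"])` returns `{"location": "nyc", "role": "web server"}`.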
Where Are We Going?
As mentioned above, we are going to use HEC to collect the metrics. Logs will be sent via a Universal Forwarder (UF). But we don’t have a UF installed, and we don’t have collectd installed either. Patience, grasshopper (a seriously dated TV show reference for anyone under 50). Trust me, we will get to that.
Continue completing the Configure Integrations form by describing where the agents should send their data to Splunk.
8. Specify the hostname or IP address of the Splunk server you want to send the data to.
9. Configure the following settings for HEC and the Universal Forwarder:
- 8088 is the default HEC port, but use whichever port you configured for HEC.
- 9997 is the recommended receiver port.
- Set the location where the script will install the Universal Forwarder. Unless you’re a rebel who likes to install things in non-default directories, leave the Forwarder location as is.
- Finally, copy and paste the HEC token you generated earlier.
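The connection settings in this part of the form boil down to a few values from which the agents’ endpoints are derived. The sketch below models them with a dataclass; the class name, field names, and the example host are hypothetical, /opt/splunkforwarder is the usual default forwarder location on Linux, and the HEC URL assumes HEC has SSL enabled (its default).

```python
from dataclasses import dataclass


@dataclass
class AgentSettings:
    """Hypothetical model of the connection settings in the Add Data form."""
    splunk_host: str
    hec_token: str
    hec_port: int = 8088                          # HEC default port
    receiver_port: int = 9997                     # forwarder receiving port
    forwarder_home: str = "/opt/splunkforwarder"  # default UF location on Linux

    def __post_init__(self):
        # Reject obviously invalid port numbers early.
        for port in (self.hec_port, self.receiver_port):
            if not 1 <= port <= 65535:
                raise ValueError(f"invalid port: {port}")

    def hec_url(self):
        # Where collectd sends metrics; assumes HEC uses SSL.
        return f"https://{self.splunk_host}:{self.hec_port}/services/collector"

    def receiver_address(self):
        # host:port the Universal Forwarder sends logs to.
        return f"{self.splunk_host}:{self.receiver_port}"
```

Keeping both endpoints derived from one host value mirrors the form: you enter the Splunk server once, and metrics and logs go to different ports on it.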
Next, we will deselect the following two items in the form for our example:
10. Deselect Authenticated install: this only applies to Ubuntu or Debian systems. Since we are using CentOS, we don’t need it.
11. Deselect Use SSL: wget doesn’t trust self-signed certificates by default, so you might want this disabled if your Splunk server uses one. If you are using Splunk Cloud, keep it selected.
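To see why a self-signed certificate matters, the Python sketch below prepares a request to HEC’s health endpoint (/services/collector/health) with certificate verification either on (the default) or off, the latter being roughly what wget’s --no-check-certificate option does. It is an analogy, not what the Easy Script runs, and `hec_health_request` is a hypothetical name. With verification off, the client cannot tell a genuine server from an impostor, so leave it on wherever the certificate is trusted.

```python
import ssl
import urllib.request


def hec_health_request(splunk_host, port=8088, verify_tls=True):
    """Return (request, ssl_context) for HEC's health endpoint without sending."""
    ctx = ssl.create_default_context()   # verifies certificate and hostname
    if not verify_tls:
        # Accept a self-signed certificate. check_hostname must be turned off
        # before verify_mode can be set to CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    req = urllib.request.Request(
        f"https://{splunk_host}:{port}/services/collector/health")
    return req, ctx
```

To actually probe the server, pass both values on: `urllib.request.urlopen(req, context=ctx, timeout=5)`.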
Last But Not Least
If you have Docker containers running on the host that were not deployed with an orchestration tool such as Docker Swarm, Kubernetes, or OpenShift, you can collect those metrics too. These metrics will get merged with the host system and displayed as one entity in SAI.
12. If you have Docker containers running on the host and you want to collect metrics, select Monitor Docker containers. Otherwise deselect it.
Are We There Yet?
Not yet, but we will be momentarily. All that’s left to do is copy the Easy Script that was generated based on the values you configured in this form.
A few last-minute checks:
13. Verify that you have a user with root privileges on the host you will be monitoring.
14. Verify the following dependencies (based on your operating system):
- wget (CSWwget for Solaris)
- apt-get (Debian, Ubuntu)
- yum (Red Hat, CentOS, Fedora)
- zypper (SUSE, openSUSE)
- pkgutil (Solaris)
- internet access
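Checks 13 and 14 can be scripted if you plan to roll the Easy Script out to many hosts. The sketch below reports whether it is running as root and whether `wget` and the expected package manager are on PATH; the distro keys and `missing_prerequisites` are hypothetical, it looks for a `wget` binary even on Solaris (where it comes from the CSWwget package), and it does not test internet access.

```python
import os
import shutil

# Package manager expected for each OS family listed in step 14.
PACKAGE_MANAGERS = {
    "debian": "apt-get", "ubuntu": "apt-get",
    "redhat": "yum", "centos": "yum", "fedora": "yum",
    "suse": "zypper", "opensuse": "zypper",
    "solaris": "pkgutil",
}


def missing_prerequisites(distro, which=shutil.which, euid=None):
    """Return a list of unmet prerequisites (an empty list means ready).

    `which` and `euid` can be injected for testing; by default the real
    PATH lookup and effective user ID are used. Internet access is not checked.
    """
    manager = PACKAGE_MANAGERS.get(distro.lower())
    if manager is None:
        raise ValueError(f"unsupported distro: {distro}")
    problems = []
    if euid is None:
        euid = os.geteuid()          # 0 means root (POSIX only)
    if euid != 0:
        problems.append("not running as root")
    for tool in ("wget", manager):
        if which(tool) is None:
            problems.append(f"missing {tool}")
    return problems
```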
15. Now click the copy icon next to the generated Easy Script to copy it.
16. Open a terminal and switch to the user with root privileges. Paste and run the script on the host you want to monitor.
That’s it! Now you can use the same command on any system you wish to monitor, and data will start flowing.
If all goes well, you will see new entities appear within about 5 minutes. You can keep an eye on the bottom of the Add Data page, which will refresh every minute.
Or you can go to the Investigate menu option at the top of the screen and look for new entities in the list. You can also see the current status, last time data was collected from each entity, and the dimensions for each entity.
Oops, I Need To Do It Again
Since no one is perfect, there may be a case where you need to redo things. Maybe you forgot to add a dimension that would have been helpful, or you might have forgotten a custom log that you wanted to monitor.
NOTE: If you need to adjust dimensions, they can be edited in the write_splunk plugin section of the /etc/collectd.conf file.
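If you would rather script that edit, the sketch below adds a dimension line just before the closing tag of the write_splunk block. It assumes dimensions appear as `Dimension "key:value"` lines inside `<Plugin write_splunk>` ... `</Plugin>`; check your own /etc/collectd.conf to confirm the syntax before relying on it. `add_dimension` is a hypothetical helper, and collectd needs a restart to pick up the change.

```python
def add_dimension(conf_text, key, value):
    """Insert a Dimension line at the end of the write_splunk plugin block.

    Assumes dimensions are written as `Dimension "key:value"` lines inside
    <Plugin write_splunk> ... </Plugin>; verify against your own file first.
    """
    lines = conf_text.splitlines()
    inside = False
    for i, line in enumerate(lines):
        # Normalize case and quotes so <Plugin "write_splunk"> also matches.
        tag = line.strip().replace('"', "").lower()
        if tag.startswith("<plugin write_splunk"):
            inside = True
        elif inside and tag == "</plugin>":
            # Add the new dimension just before the closing tag.
            lines.insert(i, f'    Dimension "{key}:{value}"')
            return "\n".join(lines) + "\n"
    raise ValueError("no complete <Plugin write_splunk> block found")
```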
SAI makes it super simple to remove things so you can reconfigure and try again. Click the Remove tab above the Easy Script on the Add Data page. You don’t need to fill out the form again, even if you navigated away and came back. Just copy the Remove script and run it, as the same user, on each host you need to redo.
Get to Work!
Data is now flowing into SAI, so it’s time to put it to work. Upcoming posts in this series will walk you through common use cases in SAI, but honestly, SAI is pretty easy to figure out. You can probably navigate on your own from here!