
Splunk Connect for Syslog: Turnkey and Scalable Syslog GDI - Part 3

In Part 1 and Part 2 of this series, we explored the design philosophy behind Splunk Connect for Syslog (SC4S), the goals of the design, and the new HEC-based transport architecture, as well as the rudiments of high-level configuration. We'll now turn our attention to the specifics of SC4S configuration, including a review of the local (mounted) file system layout and the areas in which you'll be working. 

Configuration Levels and Filesystem Layout

Configuration of SC4S occurs at five main levels, outlined below:

Configuration Level: Global and Required Settings
Files: env_file
Typical Use:
  • Required SC4S configuration
  • Unique listening ports
  • Alternate destinations
  • Kernel settings
  • Debug/Development settings

Configuration Level: Splunk Metadata
Files: splunk_metadata.csv*
Typical Use:
  • Set/override Splunk metadata
  • Set event output format

Configuration Level: Vendor/Product Event Categorization
Files: vendor_product_by_source.*
Typical Use:
  • Set log path based on incoming hostname or IP address
  • Add contextual indexed fields

Configuration Level: Compliance Overrides (Event sub-filters)
Files: compliance_meta_by_source.*
Typical Use:
  • Override Splunk metadata for subset of events
  • Add indexed fields to subset of events

Configuration Level: Platform Extension (Log Paths)
Files: lp-<vendor>.conf.tmpl, d_<destination>.conf, s_<source>.conf
Typical Use:
  • Add new data source to SC4S
  • Add new destination to SC4S
  • Add new collection method (source) to SC4S

We will now look at each of these in turn, but before we do, let’s take a look at the SC4S Container architecture and configuration directory in detail.

The container is configured to mount a local set of directories so that configuration will remain persistent between SC4S restarts.  This directory has the following structure:

Directories and files shown in red are part of a standard SC4S configuration and cover the required components such as the URL and Token for the Splunk HEC endpoint (which is the default destination for SC4S traffic) as well as configuration for Splunk metadata (index, etc.). Files contained in the blue directories contain configurations for locally configured collection methods (sources), output destinations and, most commonly, log paths (filters) for data sources that SC4S does not support out of the box.  Creation of log paths will be covered in Part 4 of this blog series.

For the rest of this installment, we will focus on the files and directories in red above. Let’s start with the most common (and required) configuration: setting environment variables in the env_file.

Global and Required Settings (env_file)

Configuration of SC4S begins with setting the required and commonly used optional parameters for basic SC4S operation, which is done via the env_file. The file contains, as its name implies, environment variables that are used by the container shell at startup, and later consulted by the templating process (described in detail in Part 4 of this blog series) to create the final syslog-ng configuration that drives SC4S. Here is an example of a simple, but in many cases totally sufficient, env_file.

You will see that most of these variables make sense simply by their name.  The variables here range from the required (Splunk HEC URL and token) to typical optional variables that are outlined in the documentation. The optional ones specified above are commonly used, however, and merit some explanation:

  • SC4S_DEST_SPLUNK_HEC_TLS_VERIFY: This is a critical variable to get right, as the default in SC4S is for high security. If this variable is not specified (or set to “yes”), the default is to verify the server cert to the HEC endpoint. In many cases, particularly when first configuring SC4S, the required certificate file is not in the required location (this is documented here), and traffic will not flow to the HEC endpoint. In such cases, it is best to start out with this variable set to “no” and confirm that data is flowing to the HEC endpoint before tightening down the security for production.
  • SC4S_USE_REVERSE_DNS: This can be very helpful for events in which the hostname is not set in the payload (or is set to an IP address). However, ensure that your reverse DNS resolution (for example, via a caching nameserver) is fast; otherwise you’ll see events land in Splunk up to minutes after they’re collected.
  • SC4S_LISTEN_JUNIPER_NETSCREEN_TCP_PORT: You may have several of these “LISTEN” variables in your file; they are documented per source or technology. They indicate to SC4S that you will be listening on a specific port (or ports) for that specific device class. Be sure not to specify the same port or overlap port ranges for multiple devices; the ports must be unique to each type. There will be a startup error if SC4S detects this.
  • SC4S_SOURCE_STORE_RAWMSG: This setting is extremely useful when debugging event flow and/or developing a new log path, where a raw, “on the wire” syslog message is required. Be sure to turn this off in production, however, as it essentially doubles the resource usage throughout the pipeline (internal memory of SC4S, disk queues, and potentially the Splunk index.)
  • SC4S_DEST_GLOBAL_ALTERNATES: This variable specifies a list of alternate destinations (besides HEC) for all traffic. These alternates can be specified per-source as well. SC4S-supplied (d_hec_debug and d_archive) as well as locally defined destinations can be specified in the list.
  • SC4S_SOURCE_UDP_SO_RCVBUFF: This variable and the following “sockets” variable are critical for UDP buffer tuning. Because UDP is a “send and forget” protocol, the receiver cannot tell the sender to “back off.” It is therefore essential to match SC4S server resources to the expected traffic load to avoid dropped packets. Often, this variable (specified in bytes) must be set to 32 MB or more. The kernel must be tuned similarly or a warning will be issued at startup.
  • SC4S_SOURCE_LISTEN_UDP_SOCKETS: This is another UDP variable that instructs the kernel to set up parallel listening sockets for UDP traffic. This can be used to significantly improve the scalability of the underlying hardware for UDP traffic.

This list is by no means complete; the documentation covers the full complement of available variables that can be set in this file.
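Because overlapping “LISTEN” ports cause a startup error, it can be handy to sanity-check an env_file before restarting the container. The following is only a rough sketch in Python, not part of SC4S: the sample values are made up, SC4S_LISTEN_EXAMPLE_DEVICE_TCP_PORT is a hypothetical variable invented for the illustration, and the script assumes ports are written as a single number or a comma-separated list.

```python
import re

# Hypothetical env_file content. The values are illustrative, and
# SC4S_LISTEN_EXAMPLE_DEVICE_TCP_PORT is an invented variable used
# only to demonstrate a port conflict.
SAMPLE_ENV_FILE = """\
# Start with TLS verification off, tighten for production
SC4S_DEST_SPLUNK_HEC_TLS_VERIFY=no
SC4S_USE_REVERSE_DNS=no
SC4S_LISTEN_JUNIPER_NETSCREEN_TCP_PORT=5000
SC4S_LISTEN_EXAMPLE_DEVICE_TCP_PORT=5000
SC4S_SOURCE_UDP_SO_RCVBUFF=33554432
"""

# Matches names such as SC4S_LISTEN_<DEVICE>_TCP_PORT or ..._UDP_PORT
LISTEN_VAR = re.compile(r"^SC4S_LISTEN_(\w+)_(TCP|UDP)_PORT$")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def find_port_conflicts(env):
    """Return {(protocol, port): [variables]} for ports claimed more than once."""
    claims = {}
    for key, value in env.items():
        match = LISTEN_VAR.match(key)
        if not match:
            continue
        protocol = match.group(2)
        # Assumption: multiple ports are given as a comma-separated list
        for port in value.split(","):
            port = port.strip()
            if port:
                claims.setdefault((protocol, int(port)), []).append(key)
    return {claim: names for claim, names in claims.items() if len(names) > 1}


if __name__ == "__main__":
    for (protocol, port), names in find_port_conflicts(parse_env(SAMPLE_ENV_FILE)).items():
        print(f"{protocol} port {port} is claimed by: {', '.join(names)}")
```

Note that the receive buffer value in the sample, 33554432, is simply 32 MB expressed in bytes (32 × 1024 × 1024).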

Context Customizations

The env_file configuration alone will suffice for many deployments, but there are three key areas that file does not cover:

  • Splunk metadata (index, host, source, and sourcetype) overrides
  • Categorization of data based on originating hostname or IP address/CIDR block
  • Sub-categorization of events (for compliance needs such as PCI)

The context directory contains lookups and syslog-ng configuration stubs that allow all of these customizations to be made.  

Splunk Metadata (splunk_metadata.csv)

We will now dive into this section with a few examples, starting with the files that most deployments will need, since most enterprises have unique indexing requirements: the splunk_metadata.csv* files, typically located in the /opt/sc4s/local/context directory (diagram). The purpose of these files is to assign Splunk metadata appropriately based on a key (first column) that is set in the log path for that data source. These files work similarly to the way the “default” and “local” conf file directories do in Splunk. The “example” file is analogous to the default conf file in Splunk and should not be edited directly. Indeed, the example file in the local directory is only a copy of the version held inside the container, placed there purely for reference. Here is a portion of that file:

You can scan through this file to determine what index (and other metadata) will be set by default in SC4S (these entries are also documented). The format of this file is:

vendor_product, metadata, value

where vendor_product is an arbitrary key specified by the author of the log path using the convention of the vendor (e.g. cisco) and product (e.g. asa) separated by an underscore. In most cases the key is lower case; the exceptions are CEF and LEEF sources where these values are derived from the event and are not an arbitrary selection by the log path author.  For all SC4S-supported data sources, the keys (and index/sourcetype defaults) are documented in the source section of the documentation. The full list of available overrides is available here and includes alternate output templates (which govern how an event will appear in Splunk) as well as traditional metadata. Only one metadata item can be set per row; if more than one metadata item for the same source needs to be set or overridden, the same key can be used in multiple rows.

If a particular metadata item needs to be overridden (most commonly the index), the splunk_metadata.csv file (without the example extension) can be edited to override one or more entries. The local (non-example) file can also be used to specify default metadata for new log paths (we will see this in Part 4 of the blog series):

Similar to local .conf files in Splunk, this one is much shorter than the default. It simply shows an override of the Cisco ASA index, as well as a new entry for the “StealthINTERCEPT” device that does not exist in the example file at all (as it is not supplied OOTB with SC4S). This last entry is for our new log path that we’ll cover in Part 4.
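To make the default/local layering concrete, here is a small Python sketch that models the behavior described above: rows from the local file replace rows in the example file that share the same key and metadata item, and new keys are simply added. This is a model of the described behavior, not SC4S code. All row values are illustrative, and the stealthbits_stealthintercept key in particular is a guess at what such a key might look like.

```python
import csv
import io

# Illustrative "example" (default) rows in vendor_product, metadata, value form
EXAMPLE_CSV = """\
cisco_asa,index,netfw
cisco_asa,sourcetype,cisco:asa
juniper_netscreen,index,netfw
"""

# Hypothetical local file: one override plus one entry for a new log path.
# The key and index names here are assumptions made for the illustration.
LOCAL_CSV = """\
cisco_asa,index,asa_prod
stealthbits_stealthintercept,index,netids
"""


def load_rows(text):
    """Read three-column rows into {(vendor_product, metadata): value}."""
    table = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        key, metadata, value = (cell.strip() for cell in row)
        table[(key, metadata)] = value
    return table


def effective_metadata(example_text, local_text):
    """Layer local rows over example rows, one metadata item per row."""
    merged = load_rows(example_text)
    merged.update(load_rows(local_text))
    return merged


def metadata_for(merged, vendor_product):
    """Collect every metadata item set for a single key."""
    return {item: value for (key, item), value in merged.items() if key == vendor_product}


if __name__ == "__main__":
    merged = effective_metadata(EXAMPLE_CSV, LOCAL_CSV)
    print(metadata_for(merged, "cisco_asa"))
```

In this model the Cisco ASA index is replaced by the local value while its sourcetype is left at the default, which mirrors the “one metadata item per row” rule.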

Vendor/Product Event Categorization (Context Filters)

When events arrive into SC4S, they must be categorized properly so that the proper sourcetype and other Splunk metadata can be applied. If an event is captured over its own unique TCP or UDP port, that categorization is easy. More often, however, the event must be captured over the default UDP port 514, which is historically reserved for syslog traffic and will undoubtedly receive events from many different device types. In this case the contents of the event (payload) will be examined to route the event to the proper log path, which further parses the event and assigns proper metadata. While this works for a majority of device types, there is a small but significant set of devices for which the payload is not unique enough to categorize it with certainty.  

To account for this, SC4S provides a means to categorize (filter) events based on sending hostname or IP/CIDR address, well before any payload examination is done. The filter relies on the fact that one characteristic of an incoming event is known with certainty (the source IP address) and another with near certainty (the hostname) prior to deep payload inspection. When the filter (rule) “fires” on an event with a specific hostname and/or source IP address, a lookup file is consulted to set an internal variable to a string unique to the vendor and product type. This variable is in turn used as a (separate) filter in a log path created for that specific vendor/product combination, ensuring the event is routed to the correct log path regardless of the payload contents. We will examine the link between the context filter, the lookup file, and the log path filter in an example below.

First, a caution: this is the first level of customization that requires some rudimentary understanding of syslog-ng syntax, as the rules themselves and the lookup they consult are “live” syslog-ng configurations which are exposed in the local directory tree. If these files are misconfigured, the syslog-ng startup parser will flag the errors, note them at startup, and abort. But fear not, there are well-populated example files that will serve as a template for most use cases, and we will walk through them now. 

Let’s dig in. The primary context files are located in /opt/sc4s/local/context, and consist of two files:

vendor_product_by_source.conf
vendor_product_by_source.csv

The use of these files differs from the typical Splunk “default/local” convention that is used for splunk_metadata.csv. Using these files properly requires that they be complete (in other words, the “example” files are not consulted at all). Therefore, it is best to copy the example files to files with the regular extensions before you start using/editing them.

Here is the vendor_product_by_source.conf file:

You will see that it is a series of syslog-ng filters that can filter on any syslog-ng “macro” (field), but host and netmask are the only ones used because (as discussed earlier) these are the only fields which can be used with certainty before the payload is processed further. These filters (rules) are checked very early in the data path through SC4S, just after the rudimentary syslog parsers are run and before any payload filtering.

In this example, highlighted at line 11 is a filter for Juniper Netscreen devices with hostnames of jpns-* (the “glob” type indicates that this is a wildcard, and not a regex). The netmask in this case is commented out, but if it is known it can be easily applied. You will see that this file gives no indication as to what categorization should be applied — this is where the second (lookup) file, vendor_product_by_source.csv, comes in:

This lookup has the same 3-column structure as the splunk_metadata.csv file used for Splunk metadata, but the fields and values set are quite different and do not directly reference any Splunk metadata. The secret to these files is how the lookup key is used: you will see that the filter name in the conf file (line 11) will match one or more keys in the csv lookup (line 7).  When the filter in the conf file “fires”, this link will set an internal field called fields.sc4s_vendor_product. This variable is in turn used in a simple filter that checks for the value of that variable in a log path created for that specific vendor/product combo, in this case Juniper Netscreen:

You may ask, why not just use the original filter in the main log path (shown immediately above) rather than the indirect approach using an internal field? The reason is to allow the administrator to set hostname/CIDR filters using a minimal amount of (repeatable) syslog-ng code, and more importantly to not require access to the main log path, which is inaccessible inside the container. In this case, line 43 in the main log path (hidden inside the container) never needs to change, but instead is driven by the filters and lookups in the vendor_product_by_source.* files.
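The chain from context filter to lookup to log path can be modeled in a few lines of Python. To be clear, this is a simulation of the flow described above and not how SC4S is implemented (the real filters are syslog-ng configuration). The filter names, the example_device entry, and the 192.0.2.0/24 network are assumptions made for the illustration; only the jpns-* glob and the sc4s_vendor_product field come from the discussion above.

```python
import fnmatch
import ipaddress

# Model of vendor_product_by_source.conf: filter name -> host glob and/or CIDR.
# Filter names and the example_device rule are hypothetical.
FILTERS = {
    "f_juniper_netscreen": {"host_glob": "jpns-*", "netmask": None},
    "f_example_device": {"host_glob": None, "netmask": "192.0.2.0/24"},
}

# Model of vendor_product_by_source.csv: (filter name, field, value)
LOOKUP = [
    ("f_juniper_netscreen", "sc4s_vendor_product", "juniper_netscreen"),
    ("f_example_device", "sc4s_vendor_product", "example_device"),
]


def filter_fires(rule, host, source_ip):
    """True if the hostname glob or the source network matches."""
    if rule["host_glob"] and fnmatch.fnmatchcase(host, rule["host_glob"]):
        return True
    if rule["netmask"]:
        return ipaddress.ip_address(source_ip) in ipaddress.ip_network(rule["netmask"])
    return False


def categorize(host, source_ip):
    """Set fields from the lookup for every filter that fires."""
    fields = {}
    for name, rule in FILTERS.items():
        if filter_fires(rule, host, source_ip):
            for key, field, value in LOOKUP:
                if key == name:
                    fields[field] = value
    return fields


def netscreen_log_path_matches(fields):
    """Stand-in for the simple filter inside the Juniper Netscreen log path."""
    return fields.get("sc4s_vendor_product") == "juniper_netscreen"


if __name__ == "__main__":
    fields = categorize("jpns-fw01", "10.1.1.1")
    print(fields, netscreen_log_path_matches(fields))
```

The point of the indirection shows up in netscreen_log_path_matches: it only ever checks the field value, so adding or changing hostnames and networks touches the two local files and never the log path itself.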

Compliance Overrides (Event Subfilters)

There is a second set of context files that are configured similarly to the primary context filters described above. The primary difference is where the filter operation takes place: deep inside a log path, long after the vendor/product combination has been determined. This set allows the administrator to configure overrides that are, again, primarily based on hostname/CIDR block, but that can also use other syslog-ng macros (fields) that might have been created internally in the log path itself. These overrides (sub-filters) can be very useful in compliance use cases, where a subset of a given device type (sourcetype) must be tagged or redirected.

Again, there are two files in the same /opt/sc4s/local/context/ directory:

compliance_meta_by_source.conf
compliance_meta_by_source.csv

As with the first set of vendor/product context files discussed above, it is best to copy over the “example” versions first before you start. This set of files typically is far shorter than the first set as well, so it may be preferable to simply use the example files as a reference.

Here are the two compliance_meta_by_source files:


You will indeed see that this is dummy data, and also note one other interesting characteristic: the second column in the csv file is not limited to Splunk metadata but can be set to any syslog-ng macro. In fact, it can be used to set an arbitrary indexed field=value pair (e.g. compliance=pci) that can be sent to Splunk, which can be very useful for compliance needs.  As part of this flexibility, however, Splunk metadata entries need to have the internally-used .splunk. prefix applied (e.g. .splunk.index) in order to override the index. These details are covered in the configuration section of the documentation.
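As a rough illustration of what such an override does to an event, the Python sketch below treats an event as a dictionary of syslog-ng macro names and applies every lookup row whose filter matches the source address. It is a model only: the f_pci_segment filter name, the network, and the index values are invented, while the .splunk.index prefix convention and the compliance=pci indexed field follow the description above.

```python
import ipaddress

# Model of compliance_meta_by_source.conf: filter name -> CIDR block.
# The filter name and network are hypothetical.
COMPLIANCE_FILTERS = {
    "f_pci_segment": "192.0.2.0/24",
}

# Model of compliance_meta_by_source.csv: (filter name, macro, value).
# Splunk metadata uses the .splunk. prefix; other macros become indexed fields.
COMPLIANCE_ROWS = [
    ("f_pci_segment", ".splunk.index", "pci"),
    ("f_pci_segment", "fields.compliance", "pci"),
]


def apply_compliance_overrides(event, source_ip):
    """Return a copy of the event with every matching row applied."""
    updated = dict(event)
    address = ipaddress.ip_address(source_ip)
    for name, cidr in COMPLIANCE_FILTERS.items():
        if address in ipaddress.ip_network(cidr):
            for key, macro, value in COMPLIANCE_ROWS:
                if key == name:
                    updated[macro] = value
    return updated


if __name__ == "__main__":
    # Metadata as it might look after the log path has already run
    event = {".splunk.index": "netfw", ".splunk.sourcetype": "cisco:asa"}
    print(apply_compliance_overrides(event, "192.0.2.7"))
```

Events from outside the matching network pass through unchanged, so only the compliance-relevant subset of a sourcetype is re-indexed or tagged.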

Having completed all configuration steps for existing data sources, we will turn to our final task in the next installment (Part 4). We will cover the process of extending the SC4S platform with an exploration of the templating process, and the creation of new “log paths” to include data sources not supported “out of the box.” This will be the deepest level of SC4S customization; though syslog-ng experience is helpful, the templating process makes this exercise far easier, as you will see!


Splunk Connect for Syslog Community

Splunk Connect for Syslog is fully Splunk supported and is released as Open Source. We hope to drive a thriving community that will help with feedback, enhancement ideas, communication, and especially log path (filter) creation! We encourage active participation via the git repos, where formal requests for feature inclusion (especially log paths/filters), bug tracking, etc. can be conducted. Over time, we envision far less “local filter” activity as more and more of the community’s efforts are encapsulated in the container’s OOTB configs.

Splunk Connect for Syslog Resources

There are many resources available to enable your success with SC4S!  In addition to the main repo and documentation, there are many other resources available:

We wish you the best of success with SC4S.  Get involved, try it out, ask questions, contribute new data sources, and make new friends!

Posted by Mark Bonsack

Mark Bonsack is a Principal Sales Engineer at Splunk, and is responsible for Strategic Accounts in the Southwest US region. During his 9-year Splunk career, he has developed a particular interest in data acquisition (AKA "GDI") and has guided Splunk's largest customers in this area. He is a “Brady Bunch” dad: 2 girls of his own, 2 by marriage; all the same age like the Bradys. His professional beach volleyball career didn’t work out, but he is usually on the winning team during competitions at Splunk team-building events...
