Splunk Connect for Syslog: Turnkey and Scalable Syslog GDI - Part 4

Previous installments of this series have given you the overview and configuration details you need to ingest any source that is supported by Splunk Connect for Syslog and configure customizations and overrides that match your enterprise. This leaves one key capability of SC4S that we have not yet covered, and that is extending the platform itself.

In this installment, we'll walk through the configuration of an entirely new data source – one that SC4S does not address out of the box. Let’s dive into the task of adding support for a new data source to SC4S!

SC4S Development Overview

Prerequisite Tasks

Prior to launching an effort to create a log path (filter) for a new device, you’ll want to gather answers to some key questions:

  • Is the device entirely new, or is it a relative of an existing one (such as a new Cisco device that is based on IOS)? If the latter, it is best to raise an issue on GitHub so that the Splunk development team can add this device/format to the existing vendor/device family support. SC4S is developed rapidly, so even a few weeks’ time will see new device support out of the box.
  • Is the device or data source common in industry? Though we now have support for most of the common data sources, by no means does SC4S cover everything. If the device is commonly used (e.g. it is a firewall from a known vendor vs. syslog from an internally developed app), then it might be best to also raise an issue (and perhaps contribute code/PR) rather than embark on a totally custom effort.
  • Has the community developed a solution for the data source? Asking on the relevant Slack channels is always a good first step.
  • Is there an existing Splunk App and/or TA for this device? This is a critical question to answer, as it governs the output format (to Splunk) needed to “match up” with what the TA expects. Check Splunkbase and vendor documentation for confirmation.
  • Is there good documentation on the output format the device uses? Most vendors do not document the “on-the-wire” format (the raw syslog message) properly or at all, so you will have to collect this on your own (which we will cover in this section).

The answers to the above questions will guide you as you extend the platform with a new log path. Let’s start with some background on what the overall structure of a syslog-ng configuration looks like, as an understanding of this will be necessary for crafting a log path.

Syslog-ng config file structure

Here is the overall structure of a syslog-ng configuration file. The syslog-ng configuration file syntax, itself a programming language, offers a myriad of ways to “skin the cat.” But all follow this same basic scheme: sources, filters, parsers/rewrites, and destinations are declared, and log paths tie them together.

In SC4S, most of this is abstracted via the configuration mechanisms described in Part 3, which frees the administrator from understanding the nuances of the syslog-ng syntax. But the structure of a log path will become immediately apparent when developing a new one, and an understanding of how the parts fit together is crucial.

Syslog-ng event flow

In a typical syslog-ng configuration — including SC4S — there will be several log paths, one for each “flavor” of event (device). These event formats are typically set by the vendors themselves and should comply with published (RFC 3164 or RFC 5424) syslog standards, but many have deviations from these standards which must be taken into account in the log paths.

Events flow from top to bottom in the final config file, with each one getting tested by the filters in each log path to see if that event “belongs” there. Though the diagram above shows filters as a separate entity, in reality all stages (except for the final destination directive) act as a filter (or test). This means the source block itself acts as a filter; if the source block says, “Collect on UDP port 5000” and the event shows up on UDP port 514, that log path will not be used for that event. Similarly, message parsing can also act as a filter (if desired) and exclude events from that log path if the parsing fails. If the event “survives” this long in a log path, it is ultimately sent to one or more destinations of the administrator’s choice.

There is nothing preventing an event from matching more than one log path, but we discourage this in SC4S and indeed configure each log path with a flag to terminate processing once the event is successfully processed by that log path. In this case the first match wins, and this is the only place in syslog-ng where log path filenames matter: log paths are processed in lexicographical order (essentially alphabetically) by filename. Internally, SC4S uses appropriate filenames to force certain log paths to be ahead of (or behind) all others, forcing a “winner” should more than one log path “fire” based on the filtering alone. This technique is used only for fallback (catchall) and “null queue” log paths, so it should not affect any log paths developed in the field.
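The "first one wins" behavior can be illustrated with a small Python sketch. This is not how syslog-ng is implemented; the LogPath class, the filenames, and the matching predicates are all hypothetical and only model lexicographical ordering combined with a terminate-on-match flag.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LogPath:
    filename: str                    # hypothetical config filename
    matches: Callable[[dict], bool]  # stands in for the source/filter/parser tests


def route(event: dict, log_paths: list[LogPath]) -> Optional[str]:
    """Return the filename of the first log path that accepts the event."""
    # Log paths are evaluated in lexicographical order of their filenames.
    for path in sorted(log_paths, key=lambda p: p.filename):
        if path.matches(event):
            # Models the terminate-on-match flag: stop at the first match.
            return path.filename
    return None


paths = [
    # A catchall gets a name that sorts after every other log path.
    LogPath("zz-fallback.conf", lambda e: True),
    LogPath("lp-vendor_b.conf", lambda e: e.get("program") == "vendor_b"),
    LogPath("lp-vendor_a.conf", lambda e: e.get("program") == "vendor_a"),
]

print(route({"program": "vendor_b"}, paths))  # lp-vendor_b.conf
print(route({"program": "unknown"}, paths))   # zz-fallback.conf
```

Because the catchall only sees events that no earlier log path claimed, its position in the sort order is what makes it a fallback rather than a competitor.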

We will now explore a key feature of SC4S, which is fundamental for log path creation. Remember early in Part 3 where we discussed the environment variables in env_file? If we specify a HEC endpoint URL and token with an environment variable there, how does that get translated into the syslog-ng syntax, which effectively must be “hard coded”? The answer is templating, a key part of abstracting the underlying syntax from the administrator, and a key part of making log paths far easier to create.

Log Path Templates (gomplate)

Syslog-ng syntax is very strict, and while it is close to a full programming language, it is missing some key constructs, in particular the ability to interact with the running environment and adapt its configuration based on conditional testing of environment variables. When syslog-ng is started, its configuration must already be fixed. Therefore, SC4S needs a mechanism to dynamically build this fixed configuration just prior to the launch of syslog-ng. Enter “gomplate”, a renderer for Go templates.

The templating process allows for environment variables to dictate the final syslog-ng configuration used by SC4S. Consider this environment variable:

SC4S_SOURCE_UDP_SO_RCVBUFF=33554432

How does the value of this variable work its way into the final config? The key is templating; the syslog-ng source config inside the container is not hard coded, but instead looks like this for the UDP receive buffer:

so-rcvbuf({{getenv "SC4S_SOURCE_UDP_SO_RCVBUFF" "1703936"}})

Everything inside the double curly brace pairs is part of the template (which itself is its own language) and is used to conditionally insert configuration elements based on environment variable settings. The result is the following final configuration that replaces the default value (1703936) with the value from the variable:

so-rcvbuf(33554432)
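For readers less familiar with templating, the substitution above can be mimicked in a few lines of Python. This is only a sketch of the idea, not the gomplate implementation; the helper names are made up, and the fallback for an empty variable is an assumption about how the template's getenv behaves.

```python
import os


def getenv(name: str, default: str = "") -> str:
    """Mimic the template call getenv "NAME" "default"."""
    # Assumption: an unset or empty variable falls back to the default.
    return os.environ.get(name) or default


def render_so_rcvbuf() -> str:
    """Render the one templated option shown above."""
    return f'so-rcvbuf({getenv("SC4S_SOURCE_UDP_SO_RCVBUFF", "1703936")})'


os.environ.pop("SC4S_SOURCE_UDP_SO_RCVBUFF", None)
print(render_so_rcvbuf())  # so-rcvbuf(1703936)

os.environ["SC4S_SOURCE_UDP_SO_RCVBUFF"] = "33554432"
print(render_so_rcvbuf())  # so-rcvbuf(33554432)
```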

The example above is a simple substitution; indeed, more complex conditional replacements can also be made:

{{- if or (conv.ToBool (getenv "SC4S_ARCHIVE_GLOBAL" "no")) (conv.ToBool (getenv "SC4S_ARCHIVE_CISCO_ASA" "no")) }}
    destination(d_archive);
{{- end}}

This construct inserts the alternate archive destination into the configuration if either of the ARCHIVE variables is set to “yes”. If neither variable is set in the env_file, or both are set to “no”, the text between the “if/end” conditional is not inserted into the configuration.
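The same conditional logic, expressed as a rough Python sketch. The function names are hypothetical, and the set of strings treated as true is an assumption about how conv.ToBool behaves; check the gomplate documentation before relying on it.

```python
def to_bool(value: str) -> bool:
    # Assumption: mirrors the true-ish strings accepted by conv.ToBool.
    return value.strip().lower() in {"1", "t", "true", "yes"}


def archive_lines(env: dict[str, str]) -> list[str]:
    """Return the config lines the if/end block would contribute."""
    global_on = to_bool(env.get("SC4S_ARCHIVE_GLOBAL", "no"))
    asa_on = to_bool(env.get("SC4S_ARCHIVE_CISCO_ASA", "no"))
    return ["    destination(d_archive);"] if (global_on or asa_on) else []


print(archive_lines({}))                                 # []
print(archive_lines({"SC4S_ARCHIVE_CISCO_ASA": "yes"}))  # ['    destination(d_archive);']
```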

We will now turn our attention to the process of creating a log path, which makes heavy use of the templating process described above. But there are a few items we must take care of prior to writing the log path which will aid us in the process. A critical step — after determining that a log path is indeed necessary by checking the “Prerequisite Tasks” at the beginning of this section — is obtaining a suitable raw event to work with.

Raw Data Collection

You may have experience with collecting raw data samples when configuring SC4S, particularly if events land in Splunk with the wrong metadata (sourcetype, etc.). This task is critical for log path development and should be the first technical step taken. A number of options exist for this, the two most common being tcpdump and SC4S itself. The details are covered in the SC4S documentation; they are briefly summarized below and reviewed in our walkthrough:

  • Run tcpdump with appropriate options on the SC4S host and look for the telltale <PRI> string (e.g. <134>) and the text that follows. The message will likely occur between “garbage” characters, but should be clear enough to pick out, particularly the most important part which is the header.
  • Set the variable SC4S_SOURCE_STORE_RAWMSG=yes and look for the RAWMSG field in Splunk when a sample event is sent to SC4S. When you view the event in Splunk (which should have one of the two fallback sourcetypes – sc4s:fallback or nix:syslog) you will see other fields as well; these will be very useful when developing our log path.
  • To “play back” a raw event (either gathered from the device itself or supplied out of band), simply run
    echo "<contents of RAWMSG>" > /dev/udp/<SC4S IP addr>/514
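The /dev/udp redirection above is a bash feature. Where bash is not available, the same playback can be done with a short Python script. This is a minimal sketch: the function name and the sample message are made up, and the address and port must be replaced with those of your SC4S instance.

```python
import socket


def send_syslog_udp(raw_event: str, host: str, port: int = 514) -> int:
    """Send one raw syslog message as a single UDP datagram; return bytes sent."""
    payload = raw_event.encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical sample; substitute the contents of RAWMSG and your SC4S address.
    sample = "<134>Jan 01 00:00:00 example-host example-app: test message"
    send_syslog_udp(sample, "127.0.0.1", 514)
```

UDP gives no delivery confirmation, so a successful send only means the datagram left the host; confirm arrival by searching for the event in Splunk.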

We now have the data we need to walk through an example log path. Let's dive in!

SC4S Log Path Development Walkthrough

Log Path Development: Initial Process

There are two types of log paths in SC4S: "Simple" and "Traditional". Simple log paths can be used when the device is capable of sending on a unique port and minimal, protocol-only parsing is sufficient to determine a single sourcetype for the event. These can be configured entirely via environment variables, and do not require the development of a dedicated log path. More details on simple log paths can be found in the SC4S documentation.

On the other hand, if the device family requires multiple sourcetypes (e.g. Palo Alto), a traditional log path with more comprehensive parsing must be developed. As part of our walkthrough, we will determine if our new device can be supported with a simple log path, or whether a traditional one needs to be developed.

We will use the Stealthbits StealthINTERCEPT product as an example for our new log path. The configuration of the App and the device for syslog operation is typical in that a raw, "on the wire" sample is not provided. Therefore, tcpdump or SC4S must be used to obtain a raw sample as outlined above.  

The steps for creating a log path that will support the StealthINTERCEPT device family are as follows:

  • Use the raw sample to see what the event looks like in Splunk, with minimal processing and no dedicated log path applied.  
  • In consultation with the vendor and Splunk App/TA documentation, determine whether the minimal fields provided by the initial syslog protocol decode phase will allow a simple log path to be used, or whether a traditional log path will need to be developed.
  • If a simple log path will suffice, configure the unique port and metadata entries appropriately. The log path will then be created automatically at SC4S startup.
  • Otherwise, create a custom log path for the new device based on the provided example template.

Let's begin by looking at the raw sample in Splunk (either by listening to a real device or using the "echo" command outlined above):

Send the event to SC4S (edited for brevity):

echo "<37>$(date '+%b %d %H:%M:%S').986 stealth-host" \
  'StealthINTERCEPT - Authentication Auth failed -' \
  'PolicyName="StealthDEFEND for AD" Domain="TDOMAIN"' \
  'Server="TDOMAIN-DC" ServerAddress="10.2.8.55" Perpetrator="MarkB"' \
  'ClientHost="AP34.TEST.COM" ClientAddress="10.2.8.55"' \
  'TargetHost="TDOMAN-DC.TEST.COM" TargetHostIP="10.135.33.7"' \
  > /dev/udp/sc4s.test.com/514

Here is the event in Splunk; you can see that RAWMSG is turned on:

A couple of things stand out here:

  • The message has a valid <PRI> string, a timestamp, and what appears to be a hostname in the header. That bodes well for the event being at least minimally RFC-compliant. To verify this, you can check the indexed fields in Splunk; indeed, this event is RFC 3164-compliant.
  • There is another field (syslog-ng "macro") called PROGRAM that could be very useful in a filter – or perhaps as a sourcetype for the event? A quick check of the Stealthbits documentation shows that this field indeed can vary – so it could be useful to assign a sourcetype to the event. Another quick check of the TA shows that SC4S will need to deliver more than one sourcetype. So, this rules out our simple log path, and a traditional log path will need to be developed. Let's continue:
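As an aside on the <PRI> string noted in the first observation: the number encodes the syslog facility and severity as facility * 8 + severity. A small Python sketch (the function name is made up) decodes the value from the sample event and from the <134> example mentioned earlier.

```python
import re


def decode_pri(raw_event: str) -> tuple[int, int]:
    """Return (facility, severity) from the leading <PRI> of a syslog message."""
    match = re.match(r"<(\d{1,3})>", raw_event)
    if match is None:
        raise ValueError("message does not start with a <PRI> string")
    pri = int(match.group(1))
    # PRI = facility * 8 + severity
    return divmod(pri, 8)


print(decode_pri("<37>Jan 01 00:00:00 stealth-host StealthINTERCEPT - ..."))  # (4, 5)
print(decode_pri("<134>..."))                                                  # (16, 6)
```

Facility 4 with severity 5 is a security/authorization message at "notice" level, which is consistent with an authentication failure event; facility 16 with severity 6 is local0 at "informational" level.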

Log Path Development Walkthrough

Now that we know that we need to develop a traditional log path, where do we start? Recall the directory structure outlined above. You'll see the directory

/opt/sc4s/local/config/log-paths

There, you will see two files:

lp-example.conf
lp-example.conf.tmpl

The files in the log-paths directory (along with those in the other directories under config) are all "live" syslog-ng configuration files and are included in the overall configuration when the container (and underlying syslog-ng process) is run. Therefore, the syntax must be perfect or the whole affair will fail to start. For this reason, many of the .conf files are not edited directly, but rather through their .tmpl variants. In addition to variable substitution and conditional substitutions discussed in Part 3, the templating process allows us to abstract complex parts of the log path that need not be exposed (and configured) by the administrator.

Let's take a look at the lp-example.conf.tmpl file. Don't worry if the specific example file in your particular SC4S release differs from the one used for these screenshots. The example file periodically changes as SC4S is enhanced and refined, and may well have been simplified by the time you read this. Focus on the overall structure while reviewing the steps below; they will remain consistent regardless of the specifics of the file.

You will see that, first of all, the file is heavily commented for instruction. The number of executable lines that result will actually be quite small. Second, though you are unlikely to have a context-aware text editor (e.g. Sublime in "C" language mode) on your SC4S server, it helps to use one off-box for syntax checking when initially creating your template file from the example file above.

Let's walk through the steps needed to convert this "example" file into a log path that works with a real device, in this case our "Stealthbits" example. The following steps will prepare the new log path for customization specific to the device:

  1. The first thing to do (and this is a biggie): Copy the example file to a new template filename that will be used for your log path (e.g. lp-stealthbits.conf.tmpl). Do not edit the example file directly!
  2. Note the difference between "gomplate" (templating) comments, and regular syslog-ng ones (which have traditional bash-style hashtags). In most cases, you will retain the syslog-ng comments, but templating comments will rarely be used.
  3. The first section (lines 1-26) is largely an instructional comment block that focuses on a key aspect of how to use the example file: Key phrases (e.g. LOCAL_EXAMPLE) in this file will be replaced with a phrase that represents the vendor and product we are interested in. In accordance with the "vendor_product" convention used throughout SC4S, we will use the phrase STEALTHBITS_INTERCEPT.
  4. Similarly, replace all occurrences of the lower-case version of the same string (local_example) with stealthbits_intercept.
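Steps 1, 3, and 4 above are mechanical enough to script. The following Python sketch copies the example template to a new file and performs both replacements without touching the original. The function name is made up, and the demonstration uses a throwaway directory with a one-line stand-in for the example file rather than the real one.

```python
import tempfile
from pathlib import Path


def create_template(example: Path, target: Path, vendor_product: str) -> None:
    """Copy the example template to a new file with the key phrases replaced."""
    if target.exists():
        raise FileExistsError(f"{target} already exists; refusing to overwrite")
    text = example.read_text(encoding="utf-8")
    # Replace the upper-case phrase first, then the lower-case one.
    text = text.replace("LOCAL_EXAMPLE", vendor_product.upper())
    text = text.replace("local_example", vendor_product.lower())
    target.write_text(text, encoding="utf-8")  # the example file is left untouched


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in content; the real example file is far longer.
    example = Path(tmp) / "lp-example.conf.tmpl"
    example.write_text("source(s_LOCAL_EXAMPLE); filter(f_local_example);\n", encoding="utf-8")
    target = Path(tmp) / "lp-stealthbits.conf.tmpl"
    create_template(example, target, "stealthbits_intercept")
    print(target.read_text(encoding="utf-8"))
```

On a real SC4S host, the two paths would point at lp-example.conf.tmpl and the new template file in /opt/sc4s/local/config/log-paths.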

After these string replacements, we will now look at each section in turn. Careful examination of the snippets in the sections below will show the string replacements compared to the full screenshot of the unaltered example file above.  

In accordance with the outline at the beginning of this section, we will dissect this file into four main sections:

  • Source
  • Filter
  • Parsing/metadata
  • Destination

Source

Let's start with the "Source" portion of the log path. Starting at line 28, we see:

These two templated lines do most of the work for you, courtesy of the templating process. All that is needed to create a custom source declaration for your device is the main STEALTHBITS_INTERCEPT string, which forms the "root" of the environment variables used to set unique listening ports, and a "parser" value, set to match the high-level structure of events (if you don't know, use "common"). Keep in mind this section will create the source declaration – i.e. the function that is called from within the log path as we'll see below.

Filter

Next, we will explore the beginning of the log path itself, starting at line 32 (line 34 truncated):

You will see that each event will take one of two parallel pathways through a filter "junction", which then "merge" after all "channel" elements are traversed. In the top channel, the newly created source called s_STEALTHBITS_INTERCEPT is checked. If an event arrives on that source (unique port), it is passed through to the remainder of the log path with no further filtering. If, on the other hand, the event arrives over the default s_DEFAULT port (typically UDP or TCP 514), you can see that there is an additional filter (f_stealthbits_intercept) that must also match, or the event will not be "allowed in" to this log path. This filter can be declared (just like the custom source) immediately above the log{} stanza in the log path file, or it can be included in a separate file as shown below. Like the source, it is just a function with the name f_stealthbits_intercept:

Take a look at lines 2 and 3. Remember the raw message screenshot in Splunk (above)? Look for the field PROGRAM. This is an example of how the initial parsing pass of syslog-ng can be extremely useful for building filters in log paths, and lines 2 and 3 show how this field ("macro" in syslog-ng parlance) is checked to see if it matches the two values shown. You'll also see in the following section how the same macro can be very useful when assigning sourcetypes – which the TA will expect.
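The junction logic can be summarized in a few lines of Python. This is a conceptual model only, not syslog-ng code: the function and the event dictionary are hypothetical, port 5015 is an arbitrary choice for the dedicated port, and the PROGRAM value is an assumption taken from the text of the sample event rather than from the actual filter.

```python
def enters_log_path(event: dict, dedicated_port: int, programs: set[str]) -> bool:
    """Model the two-channel junction at the top of the log path."""
    # Channel 1: the event arrived on the dedicated source (unique port).
    if event["port"] == dedicated_port:
        return True
    # Channel 2: the event arrived on the default source, so the filter must match.
    return event["port"] == 514 and event.get("program") in programs


programs = {"StealthINTERCEPT"}  # assumed value; the real filter checks two values
print(enters_log_path({"port": 5015, "program": "anything"}, 5015, programs))         # True
print(enters_log_path({"port": 514, "program": "StealthINTERCEPT"}, 5015, programs))  # True
print(enters_log_path({"port": 514, "program": "sshd"}, 5015, programs))              # False
```

The asymmetry is deliberate: a dedicated port is already an unambiguous statement about where the event came from, while traffic on the shared default port has to prove it belongs.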

Parsing/Metadata

In this phase of the development, the full complement of the syslog-ng config programming language can be brought to bear. While you can extensively parse the full event payload and even go as far as complete field extraction a la Splunk itself, it is best to limit the parsing to just the Splunk metadata that will need to be sent along with the event to be indexed. This metadata includes the normal index, time, host, source, and sourcetype. Note that time is included in this list; we want to ensure this is properly parsed before it gets to Splunk, as timestamp processing is bypassed (by default) with the /event HEC endpoint used by SC4S.
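To make the point about time concrete, here is a rough Python sketch of the kind of JSON payload the HEC /event endpoint accepts, with the timestamp already converted to epoch seconds. It is not what SC4S emits verbatim; the function name, the source value, and the sample arguments are made up, and the year and UTC time zone are assumptions because an RFC 3164 timestamp carries neither.

```python
import json
from datetime import datetime, timezone


def hec_event(ts: str, year: int, host: str, sourcetype: str, index: str, message: str) -> str:
    """Build a JSON payload in the shape accepted by the HEC /event endpoint."""
    # RFC 3164 timestamps carry no year or time zone; both are assumed here (UTC).
    parsed = datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")
    epoch = parsed.replace(tzinfo=timezone.utc).timestamp()
    return json.dumps({
        "time": epoch,
        "host": host,
        "source": "sc4s",  # hypothetical source value
        "sourcetype": sourcetype,
        "index": index,
        "event": message,
    })


print(hec_event("Mar 01 12:00:00", 2020, "stealth-host", "example:sourcetype", "main", "Auth failed"))
```

If the time zone assumption is wrong, every event lands in Splunk offset by a fixed number of hours, which is why the timestamp deserves attention in the log path rather than after the fact.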

Here is the "guts" of the log path, where this metadata assignment is done:

Several rewrite functions are made available to the developer as shown, so even this section can be "plug and play" for most log paths. The defaults for all Splunk metadata are set using the rewrites on lines 56 and 61 – and which rewrite is used is dependent on the value of the macro PROGRAM. Again, the initial syslog-ng parsing has been put to good use here. You can see that the sourcetype is set when these functions are called, but none of the other metadata is. This is because the other metadata (host, time, and source) are typically set at ingest time (in the source declaration), and do not need to be specifically set (or overridden) here.

But what about the index? We typically don't want to default that, so where is that set? It is done on line 66 – which is the parser that consults the splunk_metadata.csv file discussed in Part 3. The sole argument passed in that function is the key that the developer assigns (again, using the vendor_product convention). Similarly, the compliance_meta_by_source.* files are referenced in the parser called on line 70, which is the last lookup consulted before the event is sent out to one or more destinations, described next.
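Conceptually, the lookup is a keyed override applied on top of the defaults set by the rewrite. The Python sketch below models that idea only. It assumes a three-column key,field,value layout for the CSV, and the index name, sourcetype, and function names are invented for illustration; consult Part 3 and the SC4S documentation for the actual file format.

```python
import csv
import io


def load_overrides(csv_text: str) -> dict[str, dict[str, str]]:
    """Parse key,field,value rows into {key: {field: value}}."""
    overrides: dict[str, dict[str, str]] = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        key, field, value = (cell.strip() for cell in row)
        overrides.setdefault(key, {})[field] = value
    return overrides


def apply_metadata(defaults: dict[str, str], key: str, overrides: dict) -> dict[str, str]:
    """Start from the log path defaults and overlay any overrides for the key."""
    return {**defaults, **overrides.get(key, {})}


# Hypothetical file contents and default values.
sample_csv = "stealthbits_intercept,index,netauth\n"
defaults = {"index": "main", "sourcetype": "example:sourcetype"}
print(apply_metadata(defaults, "stealthbits_intercept", load_overrides(sample_csv)))
# {'index': 'netauth', 'sourcetype': 'example:sourcetype'}
```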

Destination

After having all variables set (including several indexed fields derived from the initial syslog-ng parsing), the event is ready to be sent to one or more destinations. These destinations are heavily controlled by environment variables, which in turn means several templating constructs. This section of the file is shown below:

The final step in preparation for sending out is the setting of the output template, shown on line 75. An appropriate default template (based largely on what the TA expects) is chosen. These templates are all documented and are constructed from the various syslog-ng macros (PROGRAM, MESSAGE, etc.). This default can be overridden via splunk_metadata.csv if desired.

Finally, in lines 81 through 102, environment variables are consulted and appropriate destinations are added to the final config file. The log path then ends with two flags – one to tell syslog-ng to flow-control TCP traffic if necessary (UDP cannot be flow controlled), and the other to cause the event to not enter any other log path, but instead to terminate further processing.

Final Output

So what does this all look like when the template file is passed through the gomplate templating engine? Here is the output after template processing of the example above, with the following variables

SC4S_LISTEN_STEALTHBITS_INTERCEPT_TCP_PORT=5015
SC4S_DEST_GLOBAL_ALTERNATES=d_hec_debug

set in the env_file:

First off, you will see all the "curly brace" gomplate code is now gone throughout. The file starts off with the source declaration, which was only 3 lines of "gomplate" code and has expanded to several lines (11-48) in the final output. The source declaration handles much of the initial metadata and preparation of indexed fields, as well as the setup of the listening socket on TCP port 5015 based on the env_file setting. The filtering and parsing sections (lines 50-87) pass through the templating engine relatively unchanged, while the destination section (which was several lines of gomplate code) is effectively reduced to 3 lines of code in the final output (lines 91-93) and includes the d_hec_debug destination, again as a result of the env_file setting.

Final Result in Splunk

Here is the final result as it appears in Splunk. Note that the output format is no longer JSON (as is the case with "fallback" events) but is simply the original event minus the header (<PRI> string, host, and timestamp). This is what most TAs expect.

Development Tips

Here are some tips that are helpful during development:

  • Ensure you configure the source device (or capture logs) using the format the vendor and/or TA author recommends. This is critical prior to beginning any development work, as it likely will affect even the early syslog protocol parsing. In addition, the setting(s) on the source device should be documented in the relevant SC4S source document (if adopted as a supported source).
  • If you plan on writing log paths for several custom sources and wish to contribute back to the community, consider using a full IDE with the pytest suite that is provided on GitHub. It is a good practice to write the test before the code, so you know whether your log path is functionally correct and doesn't impact the rest of the code base.
  • Test your new log path as early as you can. After simply replacing the initial LOCAL_EXAMPLE strings and creating your filter, test with some events to see if the log path is being entered and is assigned at least one sourcetype.
  • To send a raw event to SC4S, run this simple command on any Linux box:
    echo "<raw event text>" > /dev/udp/<sc4s host IP or name>/514
    • Having the raw event is critical; the entire log path/filtering depends on this being correct.  Do not make any assumptions; collect from a real device!
  • When development is coming down to just a few details, it is often helpful to "exec in" to the container and simply "HUP" the syslog process to see the effects of your changes. In this case (and this is the only case) you will be editing conf files directly and, when satisfied, "backing in" the changes to the tmpl files. "Exec'ing" in also allows you to view what the supported log paths look like; there's some great stuff in there you can copy for your own log paths – date parsers, csv/kv parsers etc. that can be very useful.
  • If you are sure you made a change and nothing happens when SC4S is restarted, you can be pretty sure you edited the conf file directly and not the tmpl file. Remember, conf files are ephemeral and get rebuilt at every restart.

We realize the above is a "whirlwind tour" and many details were glossed over, particularly the nuances of the syslog-ng configuration syntax itself. The community is here to help you with any questions or design challenges you may have! Good luck!


Splunk Connect for Syslog Community

Splunk Connect for Syslog is fully Splunk supported and is released as Open Source. We hope to drive a thriving community that will help with feedback, enhancement ideas, communication, and especially log path (filter) creation! We encourage active participation via the git repos, where formal requests for feature inclusion (especially log paths/filters), bug tracking, etc. can be conducted. Over time, we envision far less “local filter” activity as more and more of the community’s efforts are encapsulated in the container's OOTB configs.

Splunk Connect for Syslog Resources

There are many resources available to enable your success with SC4S!  In addition to the main repo and documentation, there are many other resources available:

We wish you the best of success with SC4S.  Get involved, try it out, ask questions, contribute new data sources, and make new friends!

Posted by Mark Bonsack

Mark Bonsack is a Principal Sales Engineer at Splunk, and is responsible for Strategic Accounts in the Southwest US region. During his 9-year Splunk career, he has developed a particular interest in data acquisition (AKA "GDI") and has guided Splunk's largest customers in this area. He is a “Brady Bunch” dad: 2 girls of his own, 2 by marriage; all the same age like the Bradys. His professional beach volleyball career didn’t work out, but he is usually on the winning team during competitions at Splunk team-building events...
