TIPS & TRICKS

Syslog-ng and HEC: Scalable Aggregated Data Collection in Splunk

While significant development continues on newer data collection mechanisms, such as pub/sub systems (Kafka, Google Cloud Pub/Sub) and other API- and REST-driven methods, traditional syslog still plays a major role in IT Operations. Many network and infrastructure devices, operating systems, and applications continue to use the syslog standard for system and application logging. Over the years, the Splunk community has contributed many best practices for ingesting syslog with Splunk. I will briefly review these best practices in the overview of a new aggregated data collection architecture below, but I recommend that you refer to the links below for more detail on the topic, in particular on why it is best not to send “514” traffic directly to Splunk:

  • A good overview of Splunk and syslog, which includes other (necessary) items such as proper DNS setup and naming conventions
  • Another overview on syslog-ng and Splunk, with the emphasis on setting up syslog-ng and the attendant Universal Forwarder
  • An older posting on the rudiments of syslog-ng, with a good line-by-line explanation of the config file

Challenges of high-volume data collection from aggregated data sources

Many organizations have set up syslog-ng (or rsyslog) installations that collect a significant portion of the “data center exhaust” typical of large data centers. The rising data volume from these centralized syslog-ng servers has introduced a new challenge: effectively distributing the centrally collected data across multiple Splunk indexers.

Background:

In a typical Universal Forwarder (UF) setup, there is a “many-to-one” relationship between the forwarders and the indexers, with potentially many thousands of forwarders sending data to the indexers using “AutoLB” and the Splunk-to-Splunk (S2S) protocol. This is an effective technique when a large number of forwarders supply a significantly smaller set of indexers, in the familiar “pyramid” architecture:

This breaks down, however, when instead of many forwarders at the bottom, there are only one or two – resulting in an “inverse pyramid”:

This forces a single UF to “spray” the data evenly across all indexers, which becomes extremely challenging as the data rate rises. Whenever the UF’s AutoLB load-balancing algorithm selects an indexer as its destination, that indexer is effectively “drinking from a firehose.” This hurts performance in two ways: 1) the indexer absorbing the “extreme” ingest load cannot fully participate in searches while it does so, and 2) data ends up unevenly distributed across the indexers, which in turn prevents them from participating evenly in searches.

A new data collection architecture

Ryan Faircloth has blazed the trail here with an excellent blog post that introduces a new aggregated data collection architecture, and he has also provided a Python “destination” script for rsyslog (the other of the two common open-source syslog packages). The new architecture changes the one above in a significant way: instead of having a single UF attempt to “AutoLB” across all indexers, it omits the UF layer entirely. In this revision, the rsyslog server (with the attendant script) writes directly to the indexers via the Splunk HTTP Event Collector (HEC). This allows a traditional load balancer, which is optimized for HTTP traffic, to distribute the data far more effectively than AutoLB can over the S2S protocol (below).
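To make the difference concrete, here is a toy Python simulation under deliberately simplified assumptions. The single forwarder’s AutoLB is modeled as sending everything to one randomly chosen indexer and re-picking every 30 seconds, while the HTTP load balancer is modeled as plain round-robin over one POST per second. Neither model is Splunk’s or any load balancer’s actual algorithm, and the 8 indexers, 600-second run, and 16-second window are arbitrary illustrative values. The model also leaves out the many-forwarder case, where AutoLB’s random choices average out across thousands of senders.

```python
import random
from collections import Counter


def autolb_targets(num_indexers, seconds, switch_every, rng):
    """Indexer receiving the single forwarder's data in each 1-second tick.

    Toy model: send everything to one randomly chosen indexer and re-pick
    every `switch_every` seconds (not Splunk's actual implementation).
    """
    targets = []
    current = rng.randrange(num_indexers)
    for t in range(seconds):
        if t > 0 and t % switch_every == 0:
            current = rng.randrange(num_indexers)
        targets.append(current)
    return targets


def round_robin_targets(num_indexers, seconds):
    """Indexer receiving each once-per-second POST behind a round-robin balancer."""
    return [t % num_indexers for t in range(seconds)]


def peak_share(targets, window):
    """Largest fraction of any sliding window's traffic sent to one indexer."""
    worst = 0.0
    for start in range(len(targets) - window + 1):
        counts = Counter(targets[start:start + window])
        worst = max(worst, max(counts.values()) / window)
    return worst


if __name__ == "__main__":
    rng = random.Random(42)
    autolb = autolb_targets(8, 600, switch_every=30, rng=rng)
    rr = round_robin_targets(8, 600)
    print("AutoLB peak share per 16 s window:", peak_share(autolb, 16))
    print("Round-robin peak share per 16 s window:", peak_share(rr, 16))
```

Under these assumptions, the round-robin peak share is exactly 2/16 = 0.125. AutoLB reaches 1.0 (one indexer takes everything) for any window that falls inside a single 30-second slice.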

 

 

syslog-ng Integration with HEC

While Ryan’s script and blog focus on rsyslog (which is included in most Linux distros), others prefer syslog-ng for its easier-to-understand configuration and, as you’ll see shortly, its potentially tighter integration with HEC. If you are not familiar with setting up HEC on the Splunk side, here is a good introduction. Let’s now look at how to use HEC with syslog-ng, and specifically how to set up syslog-ng “destinations” that send data directly to the indexers. This can optionally bypass local disk storage altogether, though there are still sound reasons to keep writing to disk.

Here is a basic syslog-ng.conf file, which is broken into 5 basic parts:

 

# Global Options
options {
        # sync (40);
        time_reopen (10);
        time_reap(5);
        long_hostnames (off);
        use_dns (no);
};

# Sources
source s_syslog {
        udp(ip(0.0.0.0) port(514));
        tcp(ip(0.0.0.0) port(514));
};

# Destinations
destination d_checkpoint { file("/var/log/data/checkpoint" create_dirs(yes)); };
destination d_asa { file("/var/log/data/asa/$HOST.log" create_dirs(yes)); };
destination d_all { file("/var/log/data/all.log" create_dirs(yes)); };

# Filters
filter f_checkpoint     { host("10\.64\.8\.79") and match("kernel" value("PROGRAM")); };
filter f_asa            { match("%ASA" value("MESSAGE")); };

# Log
log { source(s_syslog); filter(f_checkpoint); destination(d_checkpoint); };
log { source(s_syslog); filter(f_asa); destination(d_asa); };

 

The first section, as the name implies, sets global parameters (such as buffer sizes, whether DNS resolution is used, etc.) that apply throughout. The next three sections (source, destination, and filter) define where logs are collected from, where they can be sent, and which messages match which criteria. The last section, the “log” section, ties sources, filters, and destinations together into the paths that route the traffic. It is this arrangement, along with the large number of destination choices and comprehensive filtering, that makes syslog-ng so powerful, especially when combined with the standard log file input of the Universal Forwarder. Again, the basics of setting up syslog-ng with traditional forwarding, using the concepts above, are outlined in earlier blogs.
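The following Python sketch is a deliberately simplified model of how the log statements above route messages. Each log statement is treated as a (source, filter, destination) tuple, and a message is delivered to every statement whose filter matches. The filters mirror f_checkpoint and f_asa as unanchored regular-expression searches. The model ignores syslog-ng features such as the final and fallback flags, so treat it as an illustration rather than syslog-ng’s implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    host: str      # ${HOST}
    program: str   # ${PROGRAM}
    message: str   # ${MESSAGE}


def f_checkpoint(m):
    # host("10\.64\.8\.79") and match("kernel" value("PROGRAM"))
    return bool(re.search(r"10\.64\.8\.79", m.host)) and bool(re.search(r"kernel", m.program))


def f_asa(m):
    # match("%ASA" value("MESSAGE"))
    return bool(re.search(r"%ASA", m.message))


# One (source, filter, destination) tuple per log { } statement.
LOG_PATHS = [
    ("s_syslog", f_checkpoint, "d_checkpoint"),
    ("s_syslog", f_asa, "d_asa"),
]


def route(source, msg):
    """Destinations that a message arriving on `source` is delivered to."""
    return [dest for src, flt, dest in LOG_PATHS if src == source and flt(msg)]


if __name__ == "__main__":
    samples = [
        Message("10.64.8.79", "kernel", "drop rule 5"),
        Message("10.1.1.1", "", "%ASA-6-302013: Built outbound TCP connection"),
        Message("10.2.2.2", "sshd", "Accepted publickey for admin"),
    ]
    for m in samples:
        print(m.host, "->", route("s_syslog", m))
```

In this model, the third sample reaches no destination, because d_all is defined but no log statement references it. The regexes are unanchored here, so a host such as 110.64.8.79 would also match f_checkpoint. Anchoring the pattern with ^ and $ is stricter.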

The http() and program() destinations in syslog-ng

The traditional UF setup with syslog-ng uses the standard file() destination shown above. However, syslog-ng offers a vast array of destinations besides files, and one of the newer ones (version 3.7 and above) is perfect for Splunk and HEC: http(). It is a simple matter to add the following to the syslog-ng.conf file, which specifies an HTTP destination:

 

destination d_http1 {
    http(url("http://172.17.0.3:8088/services/collector/raw")
        method("POST")
        user_agent("syslog-ng User Agent")
        user("user")
        password("00000000-0000-0000-0000-000000000000")
        headers("X-Splunk-Request-Channel: FE0ECFAD-13D5-401B-847D-77833BD77131")
        body("${ISODATE} ${HOST} ${MSG}")
    );
};

 

This specifies a destination pointing at the “raw” HEC endpoint, with the HEC token supplied in the “password” argument. When combined with appropriate sources and optional filters, selected syslog traffic will be routed directly to the Splunk indexers via HEC, bypassing the forwarding layer entirely. The body, of course, is the actual event data, which can be formatted with syslog-ng’s full complement of macros and templates. Source, sourcetype, and index can also be set for the raw endpoint, as query-string parameters on the URL (for example, ?sourcetype=syslog&index=main).
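For illustration, here is a Python sketch of roughly the HTTP request that a raw-endpoint destination like d_http1 produces. It builds the request with urllib but does not send it. The sketch makes three assumptions:

- HEC accepts HTTP Basic auth with the token as the password and ignores the user name, which is what the user()/password() pair in the config appears to rely on.
- The raw endpoint accepts sourcetype and index as query-string parameters.
- The function name build_raw_request is made up for this example.

```python
import base64
import urllib.parse
import urllib.request


def build_raw_request(base_url, token, channel, body, sourcetype=None, index=None):
    """Build (but do not send) a POST to the HEC raw endpoint."""
    params = {k: v for k, v in (("sourcetype", sourcetype), ("index", index)) if v}
    url = base_url + "/services/collector/raw"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    # Equivalent of user("user") + password("<token>") in the syslog-ng config.
    creds = base64.b64encode(f"user:{token}".encode()).decode()
    headers = {
        "Authorization": f"Basic {creds}",
        "X-Splunk-Request-Channel": channel,
        "User-Agent": "syslog-ng User Agent",
    }
    return urllib.request.Request(url, data=body.encode("utf-8"),
                                  headers=headers, method="POST")


if __name__ == "__main__":
    req = build_raw_request(
        "http://172.17.0.3:8088",
        "00000000-0000-0000-0000-000000000000",
        "FE0ECFAD-13D5-401B-847D-77833BD77131",
        "2017-06-01T12:00:00+00:00 sender.computer.org Test message",
        sourcetype="syslog", index="main",
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it.
```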

Now, here’s where it gets fun. You can also use the “event” HEC endpoint:

 

destination d_http2 {
    http(url("http://172.17.0.3:8088/services/collector/event")
        method("POST")
        user_agent("syslog-ng User Agent")
        user("user")
        password("00000000-0000-0000-0000-000000000000")
#       headers("X-Splunk-Request-Channel: FE0ECFAD-13D5-401B-847D-77833BD77131")
        body("{ \"time\": ${S_UNIXTIME},
                \"host\": \"${HOST}\",
                \"source\": \"${HOST_FROM}\",
                \"sourcetype\": \"mysourcetype2\",
                \"index\": \"main\",
                \"event\":  \{ \"message\": \"${MSG}\",
                               \"msg_header\": \"${PROGRAM}:${PID}\",
                               \"severity\": \"${LEVEL}\",
                               \"eventSource\": \"${.SDATA.example.eventSource}\",
                               \"date_time\": \"${ISODATE}\" \}
              }")
    );
};

 

Here, the body contains the full JSON payload expected by the “event” endpoint. Note that the headers() option is commented out in this case, because the metadata is now passed in the body and the channel UUID is not needed. This destination is particularly flexible, as individual portions of the full RFC 5424 message can be passed as fields in the “event” section. Also note that the host on which the message originated (host) can be, and often is, different from the directly connected host from which the message was received (source). This is common when there are intermediate syslog-ng servers in the architecture. To test the destination, send the following syslog message. The line is a fragment of a single-quoted shell string, and the backquoted date command inserts the current UTC timestamp:

<165>1 '`date -u +"%Y-%m-%dT%H:%M:%SZ"`' sender.computer.org evententry - ID47 [example iut="3" eventSource="Application" eventID="1011"] Test message

The resulting output in Splunk is shown below:
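To see how that test message maps onto the d_http2 fields, here is a Python sketch that parses it and builds the corresponding event payload. It relies on these assumptions:

- The RFC 5424 parser is simplified. It handles a single line, no escaped quotes inside structured data, and timestamps only in the exact form the test command produces.
- Severity names are printed as syslog-ng’s ${LEVEL} prints them.
- The timestamp and the relay address 10.0.0.5 are made up.

The sketch uses json.dumps so that quotes or backslashes in the message are escaped. Plain string interpolation, as in the body() template above, does not do that, and a double quote inside ${MSG} would produce invalid JSON.

```python
import json
import re
from datetime import datetime, timezone

SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

# <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA [MSG]
HEADER = re.compile(
    r"^<(\d{1,3})>(\d+) (\S+) (\S+) (\S+) (\S+) (\S+) (-|(?:\[.*?\])+)(?: (.*))?$"
)

# Hypothetical fixed timestamp in place of the shell `date` substitution.
TEST_LINE = ('<165>1 2017-06-01T12:00:00Z sender.computer.org evententry - ID47 '
             '[example iut="3" eventSource="Application" eventID="1011"] Test message')


def decode_pri(pri):
    """PRI = facility * 8 + severity; returns (facility, severity)."""
    return pri // 8, pri % 8


def parse_sd(sd):
    """Return {sd_id: {param: value}} for the structured-data part."""
    result = {}
    for sd_id, params in re.findall(r'\[(\S+)((?:\s+[^\s=]+="[^"]*")*)\]', sd):
        result[sd_id] = dict(re.findall(r'([^\s=]+)="([^"]*)"', params))
    return result


def to_hec_event(line, host_from, sourcetype="mysourcetype2", index="main"):
    m = HEADER.match(line)
    if m is None:
        raise ValueError("not an RFC 5424 message")
    pri, _ver, ts, host, app, procid, _msgid, sd, msg = m.groups()
    _facility, severity = decode_pri(int(pri))
    sdata = parse_sd(sd) if sd != "-" else {}
    when = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    pid = "" if procid == "-" else procid  # a nil PROCID yields an empty ${PID}
    return json.dumps({
        "time": int(when.timestamp()),      # ${S_UNIXTIME}
        "host": host,                       # ${HOST}
        "source": host_from,                # ${HOST_FROM}
        "sourcetype": sourcetype,
        "index": index,
        "event": {
            "message": msg or "",
            "msg_header": f"{app}:{pid}",   # ${PROGRAM}:${PID}
            "severity": SEVERITIES[severity],
            "eventSource": sdata.get("example", {}).get("eventSource", ""),
            "date_time": ts,
        },
    })


if __name__ == "__main__":
    print(to_hec_event(TEST_LINE, host_from="10.0.0.5"))
```

For the test message, PRI 165 decodes to facility 20 (local4) and severity 5 (notice), since 165 = 20 × 8 + 5.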

Keep in mind that the event HEC endpoint, which looks great in Splunk and extracts fields automatically, may not be appropriate if traditional syslog-based TAs are used, such as those for Cisco or Palo Alto. In that case, simply use the raw HEC endpoint and craft a message body consistent with what the TA expects (using an appropriate syslog-ng template; usually just the date, host, and message).

There are also two main caveats to using either the raw or event-based http() destinations above. The first is that the http() destination cannot use SSL unless a Java plugin is used on the syslog-ng side; this is well documented in the syslog-ng admin guide. The second, more critical caveat is that each syslog message is sent in its own HTTP POST request, which creates scale issues at high volumes. For cases where scalability is paramount (surely one of the main drivers, if not the only one, for this exercise), we must offload the HTTP processing to an external script that can send multiple messages to the HEC endpoint in a single POST request (batch mode). Syslog-ng again makes this simple. It can be configured to send messages to external scripts via the program() destination, in a manner very similar to rsyslog:

 

destination d_http3 {
    program("/usr/local/bin/omsplunkhec.py 00000000-0000-0000-0000-000000000000 172.17.0.3 --sourcetype=syslog_tcp --index=main --ssl --ssl_noverify"
        template("host=${HOST} <${PRI}>${DATE} ${HOST} ${MSG}\n")
    );
};

 

In this case, we use the program() destination (rather than http()) to pipe the messages to an external script that reads them from stdin. The omsplunkhec.py script referenced in this destination is freely available in Ryan’s Bitbucket repository. It can be run from either rsyslog or syslog-ng without changes, and it provides a number of “knobs” (batch size, etc.) for tuning message processing in high-volume, highly centralized aggregation. Here, source, sourcetype, index, etc. are all passed as arguments to the script, and again the message body can use all of syslog-ng’s macro and template features. Lastly, SSL is fully supported.
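For readers who want to see the general shape of such a script, here is a hypothetical Python sketch of a program() consumer. It is not omsplunkhec.py and does not reproduce its options; the script name, argument order, and batch limits are made up for illustration. It reads one templated message per line from stdin and groups the lines into batches by count and size. Each batch goes out as one POST to the raw endpoint, using HEC’s standard "Authorization: Splunk <token>" header.

The sketch assumes the sourcetype’s line-breaking rules split the newline-joined batch back into individual events. It omits several things a production script needs:

- TLS options
- retries
- an X-Splunk-Request-Channel header, which you should add if your token uses indexer acknowledgment
- time-based flushing

```python
import sys
import urllib.parse
import urllib.request


def batches(lines, max_events=500, max_bytes=512 * 1024):
    """Yield lists of lines, flushing before either limit would be exceeded."""
    batch, size = [], 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        n = len(line.encode("utf-8")) + 1  # +1 for the newline separator
        if batch and (len(batch) >= max_events or size + n > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(line)
        size += n
    if batch:
        yield batch  # flush the remainder at EOF


def post_batch(url, token, batch):
    """Send one batch as newline-separated events to the raw endpoint."""
    body = ("\n".join(batch) + "\n").encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Authorization": f"Splunk {token}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def main(argv):
    # Hypothetical usage: hec_batch.py <token> <base_url> <sourcetype> <index>
    token, base_url, sourcetype, index = argv[1:5]
    query = urllib.parse.urlencode({"sourcetype": sourcetype, "index": index})
    url = f"{base_url}/services/collector/raw?{query}"
    for batch in batches(sys.stdin):
        post_batch(url, token, batch)


if __name__ == "__main__":
    if len(sys.argv) == 5:
        main(sys.argv)
    else:
        print("usage: hec_batch.py <token> <base_url> <sourcetype> <index>",
              file=sys.stderr)
```

Because this loop flushes only on size limits or EOF, a quiet source can leave messages waiting in a partial batch. A real consumer would also flush on a timer.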

Here is the output in Splunk from the same syslog message used above, but this time processed using the d_http3 program() destination:

You’ll note that one of the “casualties” of batch mode via program() is the loss of automatic host metadata. With the event-based destination above (d_http2), you can set this per event via the syslog-ng ${HOST} macro. In batch mode, you will need to employ a traditional per-event host override via props/transforms on the indexers. Alternatively, use the technique shown above (the host=${HOST} prefix in the template) to pass the originating host as part of the payload, where the value can be derived via traditional field extraction (for example, into an original_host field). The omsplunkhec.py script could also be enhanced to provide an event mode (currently it uses only the “raw” HEC endpoint); that is left as an exercise for the reader.
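As a starting point for that exercise, the following hypothetical Python sketch shows an event-mode batch body that restores per-event host metadata. It assumes the HEC event endpoint’s documented support for several JSON event objects stacked back-to-back in one POST body, with no enclosing array. It parses lines shaped like the d_http3 template’s host=${HOST} prefix. The function names are illustrative and are not part of omsplunkhec.py.

```python
import json


def parse_program_line(line):
    """Split a line shaped like the d_http3 template into (host, rest)."""
    prefix, _, rest = line.rstrip("\n").partition(" ")
    if not prefix.startswith("host="):
        raise ValueError("expected a line starting with host=")
    return prefix[len("host="):], rest


def event_batch_body(records, sourcetype, index):
    """Stack one HEC event object per (host, unix_time, message) record."""
    return "".join(
        json.dumps({
            "time": when,
            "host": host,  # per-event host; a raw-mode batch shares one set of metadata
            "sourcetype": sourcetype,
            "index": index,
            "event": message,
        })
        for host, when, message in records
    )


if __name__ == "__main__":
    lines = [
        "host=fw01 <134>Jun  1 12:00:00 fw01 kernel: drop rule 5\n",
        "host=asa02 <166>Jun  1 12:00:01 asa02 %ASA-6-302013: Built\n",
    ]
    records = []
    for i, line in enumerate(lines):
        host, rest = parse_program_line(line)
        records.append((host, 1496318400 + i, rest))  # hypothetical times
    print(event_batch_body(records, sourcetype="syslog", index="main"))
```

The resulting body would be POSTed to /services/collector/event instead of the raw endpoint. Each stacked object carries its own host, so no props/transforms override is needed.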

Tying it all together

With the earlier best practices on syslog-ng and Splunk as a foundation, we can now add direct communication with the Splunk indexers via HEC to the configuration, using the http() and/or program() destinations. Keep in mind that traditional file-based destinations can still be used for some or all data. As noted previously, there are arguments for writing both to the local filesystem and directly to Splunk over the network. Below is a complete syslog-ng config file that includes these elements:

 

# Global Options (many are missing here; check your needs carefully)
options {
        # sync (40);
        time_reopen (10);
        time_reap(5);
        long_hostnames (off);
        use_dns (no);
};

# Sources
source s_syslog {
        udp(ip(0.0.0.0) port(514));
        tcp(ip(0.0.0.0) port(514));
};

# File-based Destinations
destination d_checkpoint { file("/var/log/data/checkpoint" create_dirs(yes)); };
destination d_asa { file("/var/log/data/asa/$HOST.log" create_dirs(yes)); };

# HEC-based destinations

destination d_checkpoint_hec {
    http(url("http://172.17.0.3:8088/services/collector/raw")
        method("POST")
        user_agent("syslog-ng User Agent")
        user("user")
        password("00000000-0000-0000-0000-000000000000")
        headers("X-Splunk-Request-Channel: FE0ECFAD-13D5-401B-847D-77833BD77131")
        body("${ISODATE} ${HOST} ${MSG}")
    );
};

destination d_asa_hec {
    http(url("http://172.17.0.3:8088/services/collector/event")
        method("POST")
        user_agent("syslog-ng User Agent")
        user("user")
        password("00000000-0000-0000-0000-000000000000")
#       headers("X-Splunk-Request-Channel: FE0ECFAD-13D5-401B-847D-77833BD77131")
        body("{ \"time\": ${S_UNIXTIME},
                \"host\": \"${HOST}\",
                \"source\": \"datasource\",
                \"sourcetype\": \"cisco:asa\",
                \"index\": \"main\",
                \"event\":  \{ \"message\": \"${MSG}\",
                               \"msg_header\": \"${PROGRAM}:${PID}\",
                               \"severity\": \"${LEVEL}\",
                               \"eventSource\": \"${.SDATA.example.eventSource}\",
                               \"date_time\": \"${ISODATE}\" \}
              }")
    );
};

destination d_checkpoint_hec_batch {
    program("/usr/local/bin/omsplunkhec.py 00000000-0000-0000-0000-000000000000 172.17.0.3 --sourcetype=checkpoint --index=main --ssl --ssl_noverify"
        template("host=${HOST} <${PRI}>${DATE} ${HOST} ${MSG}\n")
    );
};

destination d_all { file("/var/log/data/all.log" create_dirs(yes)); };

# Filters
filter f_checkpoint     { host("10\.64\.8\.79") and match("kernel" value("PROGRAM")); };
filter f_asa            { match("%ASA" value("MESSAGE")); };

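# Log
log { source(s_syslog); filter(f_checkpoint); destination(d_checkpoint); };
log { source(s_syslog); filter(f_checkpoint); destination(d_checkpoint_hec); };
# For batched delivery, use d_checkpoint_hec_batch in place of d_checkpoint_hec:
# log { source(s_syslog); filter(f_checkpoint); destination(d_checkpoint_hec_batch); };
log { source(s_syslog); filter(f_asa); destination(d_asa); };
log { source(s_syslog); filter(f_asa); destination(d_asa_hec); };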

 

Hopefully this discussion has shown the power and flexibility of syslog-ng when paired with HEC, as well as given you the foundation for further exploration and testing for your large-scale aggregated data needs. There are many “knobs” to tweak, both in syslog-ng and Ryan’s batch script, and I have by no means explored the limits of scale with this setup!

 

 

Posted by Mark Bonsack

Mark Bonsack is a Staff Sales Engineer at Splunk, responsible for Named Accounts in the “SoCal” region. He has an extensive background in Security and Network/IT Operations from stints at some of the largest technology companies, as well as more than a few 10-person startups. He is a “Brady Bunch” dad: 2 girls of his own and 2 by marriage, all the same age like the Bradys. His professional beach volleyball career didn’t work out, but he is usually on the winning team during competitions at Splunk team-building events...
