TIPS & TRICKS

Reliable syslog/tcp input – splunk bundle style

Important Update as of 6/5/2020: Splunk has released Splunk Connect for Syslog (SC4S), a solution for syslog data sources. More information can be found in our blog post here.


Wanted to drop this someplace for feedback.
Splunk is often hooked up to syslog(ng) or tcp ports.
Customers then shoot data as fast as they can at splunk.

 

You can have splunk buffer inputs or have the sender buffer, but in many cases this is less than optimal – it's usually not a good idea to rely on sender-side buffering.

As an interesting alternative you can use a splunk bundle to catch data off the network port, spew it to a file (or files), and have splunk tail those files at its leisure. If splunk can keep up, it will be seconds before you can search the data. If you get a huge burst, no problem – the bundle will just go to disk and splunk will be right behind. Furthermore, if someone wanted to restart splunk ( or splunk were to crash – yes, it happens ) then again, the data just goes to disk.

The advantage in making/using these scripted input bundles (same mechanism as monitoring and imap) is that the code is usually in scripts ( perl, sh, python, ruby ), so on-the-fly mods in the field are easier than filing a support enhancement request with splunk and waiting for someone to compile it into the product. I often give the ( perhaps poor ) analogy that our scripted input bundles are to splunk what cgi was to early webservers. They are a great place to do anything, and when we see enough of them in the field we can better build it into the server.

This bundle will use scripted input to listen on a port and cut files of up to a given size. The bundle is configured to keep up to a certain number of these files before deleting the oldest. For example, you can configure it to listen on port 9999 and make up to 5 files of 800M in size before deleting the oldest data. Then splunk can tail the directory where the files are being spun. If there is a burst, or splunk is stopped / goes down, the data will be there when splunk frees up.
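To make the rolling concrete, here is what the spool directory might look like once a couple of files have rolled. The timestamped names follow the naming scheme used in the script at the end of this post; the owners, sizes and dates are purely illustrative:

    $ ls -lh $SPLUNK_HOME/var/spool/tcptofile
    -rw-r--r--  1 splunk splunk 800M Jun  1 10:42 rolling-netcat.log.2009-06-01-10:12:03
    -rw-r--r--  1 splunk splunk 800M Jun  1 11:07 rolling-netcat.log.2009-06-01-10:42:17
    -rw-r--r--  1 splunk splunk 312M Jun  1 11:30 rolling-netcat.log.2009-06-01-11:07:55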

Installing / Configuring is easy:

  • Download this, untar it, and drop it in $SPLUNK_HOME/etc/bundles
  • Configure the inputs.conf
    • Change nc -l -p 9999 to whatever port you want
    • Source / sourcetype are not important here – we will create new files
    • Change the tail:: stanza to the location where you want to pick up the spun files
    • Change the path variable at the top to spew where you want
    • Change maxfilesize and blocksize to control the size of the files to roll – keep in mind that the rolled file size is maxfilesize * blocksize
    • Change numfilestokeep to control how many of the rolled files to keep
    • Save and restart…
  • Here is the inputs.conf contents.

    [script://./bin/rolling-netcat -l -p 9999]
    interval = 30
    sourcetype = syslog
    source = syslog-tcp
    disabled = false
    [tail://$SPLUNK_HOME/var/spool/tcptofile]
    sourcetype = syslog
    disabled = false
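    For reference, the bundle ends up laid out roughly like this (a sketch only – in the actual tarball, inputs.conf may sit in a local/ or default/ subdirectory rather than at the top of the bundle):

        $SPLUNK_HOME/etc/bundles/tcptofile/
            inputs.conf
            bin/rolling-netcat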

    Configure where and how much to spew by editing the file etc/bundles/tcptofile/bin/rolling-netcat
    I know it sucks to edit the source of bundles and we are working on fixing this ASAP….

    The following lines are near the top of the file:

    path="$SPLUNK_HOME/var/spool/tcptofile"reconnect_timeout=10
    maxfilesize="28k"
    blocksize="8k"
    numfilestokeep=5
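    To see how the m*n sizing from the checklist works out, here is the arithmetic with the values shown above (a sketch of the formula – the shipped bundle's defaults may differ from the values shown here):

        # size of each rolled file = maxfilesize (dd count, blocks) * blocksize (dd bs, bytes per block)
        #   28k blocks * 8k bytes ≈ 224M per file (assuming dd treats "k" as 1024)
        # disk kept on hand ≈ numfilestokeep * size of each rolled file
        #   5 * 224M ≈ 1.1G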

    The default is to cons up 400M files and 5 of them before rm-ing the oldest.
    The default is to also write to $SPLUNK_HOME/var/spool/tcptofile – the script does not create this directory if it's not there. Maybe it should.
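    Until then, it's worth creating the spool directory by hand before the first run (one line, using the default path above):

        mkdir -p "$SPLUNK_HOME/var/spool/tcptofile"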

To Troubleshoot:
The most likely problem is that the netcat line in the inputs.conf is wrong, since netcat changes from platform to platform.
The best thing to do before trying is to man nc on your host and correct the “nc -l -p 9999” part.
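For example, the listen syntax differs between the two common netcat flavors, so check which one your platform ships (just the usual pair of variants, not an exhaustive list):

    nc -l -p 9999     # traditional / GNU-style netcat: -p names the listen port
    nc -l 9999        # BSD / macOS netcat: the listen port is a bare argument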

The way to test and fix is…..

  • Test the nc command by hand by running “nc -l -p 9999” and make sure it “connects”
  • If that works, then run the script ./rolling-netcat by hand
  • Then do the opposite nc side: run nc localhost 9999
  • It should connect – if it does not, then rolling-netcat is configured wrong – if it does connect, start typing.
  • Check the file where you should be spewing – the stuff you typed in should be there 😉
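Putting those steps together, a quick end-to-end check might look something like this (two terminals, with $SPLUNK_HOME set and the nc flags adjusted for your platform):

    # terminal 1: from the bundle's bin directory, run the script by hand
    cd "$SPLUNK_HOME/etc/bundles/tcptofile/bin"
    ./rolling-netcat -l -p 9999

    # terminal 2: connect to the listener and type a line or two
    nc localhost 9999

    # then check the spool directory – the rolled file should contain what you typed
    ls -l "$SPLUNK_HOME/var/spool/tcptofile"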

Like I said, if you have problems, it's most likely the “nc -l -p 9999” part, and you should try that on the command line.
Again, not sure what folks think about this. I like it as it's simple and external to splunk – wasn't sure if I was alone.

Let me know what you think.
e

Here is the content of the script for easier review:

#!/bin/sh
set -u

# This script will take input from a port and spew it to file(s)
# It will write $numfilestokeep files of size $blocksize * $maxfilesize
# Once it rolls $numfilestokeep files it will rm the oldest
path="$SPLUNK_HOME/var/spool/tcptofile"   # where to write the data
reconnect_timeout=10                      # how often to retry if the incoming stream stops
maxfilesize="28k"                         # number of blocks to write to a file before rolling
blocksize="8k"                            # size of each block – blocksize * maxfilesize = size of each rolled file
numfilestokeep=5                          # how many rolled files to keep – rolled files beyond this number get removed

self="`basename $0`"
if [ $# -eq 0 ]
then
    echo 1>&2 "usage: $self {netcat arguments}"
    exit 1
fi

# send this script's own chatter to a log file and read nothing from the terminal
exec > "$self.err.log" 2>&1 < /dev/null

# producer: keep (re-)running netcat with the arguments we were given,
# waiting $reconnect_timeout seconds before each restart
(
    while :
    do
        echo 1>&2 "$self: `date`: (re-)starting netcat"
        nc "$@" || :
        sleep $reconnect_timeout
    done
) \
| (
    # consumer: cut the stream into rolled files and prune the oldest
    while :
    do
        file="$path/$self.log.`date '+%Y-%m-%d-%H:%M:%S'`"
        echo 1>&2 "$self: `date`: logging to file $file"
        dd bs=$blocksize count=$maxfilesize of="$file"
        numfiles=`ls -1ast "$path/$self.log."* | wc -l`
        extra=`expr $numfiles - $numfilestokeep`
        if [ $numfiles -gt $numfilestokeep ]; then ls -1t "$path/$self.log."* | tail -n $extra | xargs rm -f; fi
        [ -s "$file" ] || break
    done
)

echo "$self: `date`: erroring out"
