When deploying Splunk in the wild, you have to decide whether “to forward, or not to forward”. The decision comes down to many factors, but the typical answer is to use the forwarder. In this blog, I’ll detail that decision process so you can decide for yourself.
First, let’s quickly explain what a forwarder does. If you already know, skip to the next paragraph. Splunk can perform four basic functions: searching, indexing, forwarding, and acting as a deployment server. When Splunk is set up as a forwarder, it reads in the raw data and sends it to a Splunk indexer. In the latest version of Splunk, we offer an additional software package dedicated to forwarding only, called the Universal Forwarder. All forwarders accept input through all the standard methods: file/directory monitoring, network input (TCP/UDP), scripted input, and file system change monitoring. Feel free to read the more official documentation here: http://docs.splunk.com/Documentation/Splunk/latest/Deploy/Aboutforwardingandreceivingdata. Now that we understand what a forwarder does, let’s look at when and where we might want to use one.
File/Directory Monitoring and scripts: As an example, assume we want to monitor HTTP access logs and error logs on a server, and possibly OS-level information as well. In most use cases, my recommendation would be to use the Splunk forwarder. The forwarder makes data delivery to Splunk both resilient and reliable. When monitoring files, Splunk is excellent at keeping track of what it has already indexed. Additionally, it can run scripts (such as ones that collect top or ps output) and ingest what they produce. So why is it reliable and resilient? Well, let’s assume that you have a network outage between servers OR you perform ‘maintenance’. If the forwarder-to-indexer connection is lost, Splunk will queue the data. For file or directory monitoring, Splunk stops sending data and keeps track of where it left off. Once the connection is reestablished, Splunk resumes sending from that point. For scripted input data, Splunk queues that as well, up to a limit. Another consideration for running a forwarder is real-time searching. You cannot get real-time data or searching if you are loading the data at timed intervals (think of secure copies run every minute or every hour).
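To make the “pick up where it left off” idea concrete, here is a minimal Python sketch of offset-based checkpointing. This is not how the Splunk forwarder is implemented internally. The function names, the JSON checkpoint file, and the `send` callback are all hypothetical and exist only to illustrate the behavior described above: stop when the connection drops, remember the position, and resume from it later.

```python
import json


def load_offset(checkpoint_path):
    """Return the saved byte offset, or 0 if no checkpoint exists yet."""
    try:
        with open(checkpoint_path) as f:
            return json.load(f)["offset"]
    except FileNotFoundError:
        return 0


def save_offset(checkpoint_path, offset):
    """Persist the byte offset of the last line that was sent successfully."""
    with open(checkpoint_path, "w") as f:
        json.dump({"offset": offset}, f)


def forward_new_lines(log_path, checkpoint_path, send):
    """Send lines added since the checkpoint; advance it only on success.

    `send` is a hypothetical callback that returns True when the line was
    delivered and False when the connection to the indexer is down.
    """
    offset = load_offset(checkpoint_path)
    with open(log_path, "rb") as log:
        log.seek(offset)
        while True:
            line = log.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a line that is still being written
            if not send(line.decode("utf-8", "replace").rstrip("\n")):
                break  # connection lost: stop here and retry from this line
            offset = log.tell()
    save_offset(checkpoint_path, offset)
    return offset
```

Calling `forward_new_lines` again after an outage re-reads the checkpoint and continues with the first line that was never delivered, so nothing is skipped and nothing is sent twice.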
Network Devices: For this next example, let us consider monitoring routers, switches, and firewalls. These devices can be a challenge since you typically don’t have the luxury of installing a forwarder on the device itself. It is also much easier to make a config change so that they send syslog output over UDP to a ‘collector’. A lot of people consider sending directly from their devices to the Splunk indexer via a UDP network input. While this is easy to set up, it provides no queueing: if the connection to the indexer is lost, the event will never show up. A second option is to use an intermediate forwarder. This keeps things simple since every device sends to a single server AND the forwarder has a queue in case its connection to the indexer is severed. Well, what if I need to restart or upgrade the forwarder? In this case, you will lose data during that period. To solve this, you can use a tool like syslog-ng to persist the data to disk. Basically, syslog-ng acts as the collector and writes the incoming data out to files. These files then get ingested by the Splunk forwarder. While there is still potential to lose data if syslog-ng is turned off, the reality is that you should not need to tweak syslog-ng once it is up and running.
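The collector pattern is simple enough to sketch in a few lines of Python. This is only a stand-in to show the idea of persisting UDP syslog to per-device files; in practice you would use syslog-ng as described above. The directory `/var/log/remote`, the port `5514`, and the names `append_event`, `SyslogToDisk`, and `serve` are assumptions chosen for illustration.

```python
import os
import socketserver


def append_event(base_dir, host, payload):
    """Append one syslog datagram to a per-host file and return its path."""
    os.makedirs(base_dir, exist_ok=True)
    path = os.path.join(base_dir, f"{host}.log")
    text = payload.decode("utf-8", "replace").rstrip("\r\n")
    with open(path, "a", encoding="utf-8") as f:
        f.write(text + "\n")
    return path


class SyslogToDisk(socketserver.BaseRequestHandler):
    """Write every received datagram to disk, one file per sending device."""

    base_dir = "/var/log/remote"  # hypothetical directory a forwarder monitors

    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        payload, _sock = self.request
        append_event(self.base_dir, self.client_address[0], payload)


def serve(port=5514):
    """Listen for UDP syslog on the given (assumed) port until interrupted."""
    with socketserver.UDPServer(("0.0.0.0", port), SyslogToDisk) as server:
        server.serve_forever()
```

Because the events land in ordinary files, the forwarder can be restarted or upgraded at any time and will simply catch up from the files afterwards.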
So how do I choose to forward, or not? Since I always prefer to have the forwarder installed, these are the questions I start with:
- Can I install a forwarder on this system?
- Do I need to distribute (load balance) data across many indexers?
- Are there data sets only obtainable through scripts run locally (e.g., non-WMI events, local OS-level data)?
- Do I need to parse the data locally (e.g., to anonymize it)?
If the answer is yes to any of the above questions, then I would steer towards installing a forwarder. There are some cases where it can be easier to go without one, but the advantages of a forwarder typically outweigh the extra effort. To summarize, the key points to take away are:
- Splunk keeps track of files and directories very well
- Splunk Forwarders can queue input data
- Forwarders perform data load balancing to many Splunk Indexers (provides faster searching and availability benefits)
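As a rough illustration of the load balancing point, the Python sketch below rotates through a list of indexers and skips any that are unreachable. This post does not describe how the forwarder actually chooses an indexer, so treat the round-robin strategy here as a simplification; the `IndexerPool` class, the `is_up` callback, and the host names are hypothetical.

```python
import itertools


class IndexerPool:
    """Rotate through indexers, skipping any that are currently down."""

    def __init__(self, indexers):
        if not indexers:
            raise ValueError("at least one indexer is required")
        self._indexers = list(indexers)
        self._cycle = itertools.cycle(self._indexers)

    def next_available(self, is_up):
        """Return the next indexer for which is_up() is true, or None.

        `is_up` is a hypothetical health check supplied by the caller.
        """
        # Try each indexer at most once per call so we never loop forever.
        for _ in range(len(self._indexers)):
            candidate = next(self._cycle)
            if is_up(candidate):
                return candidate
        return None  # every indexer is down: the caller should queue the data


# Example usage with made-up host names.
pool = IndexerPool(["idx1:9997", "idx2:9997", "idx3:9997"])
down = {"idx2:9997"}
targets = [pool.next_available(lambda host: host not in down) for _ in range(3)]
```

With `idx2` marked as down, data is spread across the remaining two indexers, and a `None` result is the signal to fall back on the forwarder-style queueing discussed earlier.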