June 09, 2020

Making the Collection of Centralised S3 Logs into Splunk easy with Lambda and SQS

By Paul Davies

Got multiple AWS data sources in the same S3 bucket but struggle with efficient SNS notifications based on prefix wildcards? Well, struggle no more, we’ve got your back.

Many of our customers have a centralised S3 bucket that collects logs from multiple sources and accounts. For example, all Config, CloudTrail and access logs may be routed into one central bucket for an organisation. The key prefix for these log objects generally makes it easy to navigate by account and log type. For example, the object keys are generally of the format:

Bucket/AWSLogs/account number/logtype/region/year/month/day/log
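As a rough illustration of how that key layout can be picked apart, here is a minimal sketch. The `LogKey` type and `parse_log_key` function are hypothetical names made up for this example, and it assumes the object key itself starts at `AWSLogs/` (the bucket name is not part of the key).

```python
from typing import NamedTuple, Optional


class LogKey(NamedTuple):
    """Fields recovered from a centralised log object key (illustrative only)."""
    account: str
    log_type: str
    region: str
    date: str  # year/month/day as it appears in the key


def parse_log_key(key: str) -> Optional[LogKey]:
    """Split AWSLogs/<account>/<logtype>/<region>/<year>/<month>/<day>/<log>.

    Returns None for keys that do not follow this layout.
    """
    parts = key.split("/")
    if len(parts) < 8 or parts[0] != "AWSLogs":
        return None
    account, log_type, region, year, month, day = parts[1:7]
    return LogKey(account, log_type, region, f"{year}/{month}/{day}")


example = parse_log_key(
    "AWSLogs/123456789012/CloudTrail/eu-west-1/2020/06/09/example.json.gz"
)
print(example)
```

Note that the account number and log type sit in the middle of the key, not at the start.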

To collect these logs into Splunk, one of the best practice approaches is to use the Splunk Add-On for Amazon Web Services with the “SQS Based S3” input. This input essentially relies on an SNS notification on the bucket together with an SQS queue: the add-on reads the queued messages to identify new files in the bucket, which it then reads into Splunk.

Although this is a very scalable solution, a challenge arises with this logging method when more than one source of logs is being dropped into a bucket, such as CloudTrail and Config. This is because the notifications can only be triggered with a wildcard at the tail end of the prefix, such as /bucket/account/*. With a centralised logging bucket it is therefore not possible to set up one single notification for all CloudTrail logs in the bucket, as this would require the notification to be set on bucket/AWSLogs/*/CloudTrail/*, which is not valid.
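To make the limitation concrete, here is a small sketch that models a prefix filter as a plain "starts with" test, which is the behaviour described above. The function name `prefix_filter_matches` is hypothetical; it is a model of the filtering, not an AWS API.

```python
def prefix_filter_matches(key: str, prefix: str) -> bool:
    """Model a notification prefix filter: only the start of the key is compared."""
    return key.startswith(prefix)


cloudtrail_key = "AWSLogs/111111111111/CloudTrail/eu-west-1/2020/06/09/a.json.gz"
config_key = "AWSLogs/111111111111/Config/eu-west-1/2020/6/9/b.json.gz"

# A prefix for one account matches every log type in that account.
print(prefix_filter_matches(cloudtrail_key, "AWSLogs/111111111111/"))
print(prefix_filter_matches(config_key, "AWSLogs/111111111111/"))

# Narrowing to one log type only works when the account number is spelled out,
# which is why each account ends up needing its own notification.
print(prefix_filter_matches(cloudtrail_key, "AWSLogs/111111111111/CloudTrail/"))
print(prefix_filter_matches(cloudtrail_key, "AWSLogs/222222222222/CloudTrail/"))
```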

A way around this, of course, is to set up multiple notification topics, corresponding SQS queues and an add-on input for each account and log type, which over time can become complicated and difficult to maintain. For example, 100 accounts with 3 log types each would result in 300 SNS topics, 300 SQS queues (each with its own dead-letter queue) and 300 add-on inputs.

There is, however, a much easier approach that can be taken using Lambda functions. Instead of having separate SNS notifications for each account, one SNS topic for the whole bucket can trigger a Lambda function via an SQS queue. The function in turn “routes” each notification into other SQS queues depending on the log source, and each of those queues is linked to an add-on input of the correct “sourcetype”. Using this approach, one bucket can hold multiple accounts and sourcetypes without the need for a large setup of SNS topics, SQS queues and add-on inputs. With the same example above of 100 accounts and 3 log types, only 1 SNS topic would be needed, with only 4 SQS queues (each having a dead-letter queue).
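The difference in scale between the two setups can be worked through with a few lines of arithmetic. This is only a sketch: the function names are made up for the example, and the add-on input count for the routed setup (one per log type) is an assumption based on each routed queue feeding one input.

```python
def per_account_setup(accounts: int, log_types: int) -> dict:
    """One topic, queue, dead-letter queue and input per account and log type."""
    n = accounts * log_types
    return {
        "sns_topics": n,
        "sqs_queues": n,
        "dead_letter_queues": n,
        "addon_inputs": n,
    }


def routed_setup(log_types: int) -> dict:
    """One topic for the bucket, one queue feeding the Lambda function,
    plus one routed queue per log type. The account count no longer matters."""
    queues = 1 + log_types
    return {
        "sns_topics": 1,
        "sqs_queues": queues,
        "dead_letter_queues": queues,
        "addon_inputs": log_types,  # assumption: one input per routed queue
    }


print(per_account_setup(accounts=100, log_types=3))
print(routed_setup(log_types=3))
```

Adding a new account leaves the routed setup unchanged, whereas the per-account setup grows by one topic, queue and input for every log type.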

(It is also possible to go directly from SNS into a Lambda function, which saves one SQS queue. However, if the function fails there is then no way to retrieve the SNS notification, whereas with a queue in between the notification would still be held in the queue.)
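When the function is fed through the SQS queue, the object details arrive wrapped in layers: an SQS record whose body is the SNS envelope, whose Message is the S3 event. The sketch below unwraps them. It assumes raw message delivery is not enabled on the SNS subscription, and `extract_object_keys` is a hypothetical helper name, not taken from the sample function.

```python
import json
from urllib.parse import unquote_plus


def extract_object_keys(sqs_event: dict) -> list:
    """Return (bucket, key) pairs from an SQS-triggered Lambda event.

    Assumes each SQS body is an SNS envelope (JSON) whose "Message"
    field is the S3 event notification (also JSON).
    """
    found = []
    for record in sqs_event.get("Records", []):
        sns_envelope = json.loads(record["body"])
        s3_event = json.loads(sns_envelope["Message"])
        # Test notifications carry no "Records", so default to an empty list.
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys in S3 event notifications are URL-encoded.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            found.append((bucket, key))
    return found


# Hand-built example event containing only the fields used above.
s3_event = {"Records": [{"s3": {
    "bucket": {"name": "central-logs"},
    "object": {"key": "AWSLogs/123456789012/CloudTrail/eu-west-1/2020/06/09/a.json.gz"},
}}]}
sqs_event = {"Records": [{"body": json.dumps({"Message": json.dumps(s3_event)})}]}
print(extract_object_keys(sqs_event))
```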

Central Logging Account

Detailed instructions on how to set this up are available on GitHub here, along with a sample Lambda function.

The sample function covers a use case where 3 different sources may be present in an S3 bucket. It uses function environment variables to set the queue names for each of the different sources, as well as a default queue for any other object that is put there. The function can also take an exclusion list environment variable to “ignore” certain objects that may be copied into the bucket but do not need to be sent to Splunk.
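The routing decision described above might look something like the following sketch. It is not the sample function from GitHub: the environment variable names (`CLOUDTRAIL_QUEUE`, `CONFIG_QUEUE`, `ELB_QUEUE`, `DEFAULT_QUEUE`, `EXCLUDE_CONTAINS`), the choice of three sources and the `choose_queue` helper are all assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

# Hypothetical mapping: path segment in the object key -> environment
# variable holding the destination queue name for that source.
SOURCE_ENV_VARS = {
    "CloudTrail": "CLOUDTRAIL_QUEUE",
    "Config": "CONFIG_QUEUE",
    "elasticloadbalancing": "ELB_QUEUE",
}


def choose_queue(key: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Pick the destination queue for an object key, or None to ignore it."""
    # Comma-separated substrings; any key containing one of them is skipped.
    exclusions = [p for p in env.get("EXCLUDE_CONTAINS", "").split(",") if p]
    if any(pattern in key for pattern in exclusions):
        return None
    for segment, variable in SOURCE_ENV_VARS.items():
        if f"/{segment}/" in key and variable in env:
            return env[variable]
    # Anything unrecognised goes to the default queue (None if it is unset).
    return env.get("DEFAULT_QUEUE")


settings = {
    "CLOUDTRAIL_QUEUE": "cloudtrail-queue",
    "CONFIG_QUEUE": "config-queue",
    "DEFAULT_QUEUE": "default-queue",
    "EXCLUDE_CONTAINS": "CloudTrail-Digest",
}
print(choose_queue("AWSLogs/123456789012/CloudTrail/eu-west-1/2020/06/09/a.json.gz", settings))
print(choose_queue("AWSLogs/123456789012/CloudTrail-Digest/eu-west-1/2020/06/09/d.json.gz", settings))
```

In a real function, the original notification would then be forwarded unchanged to the chosen queue so the add-on input can read it as if it had come straight from the bucket.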

Other use cases may be added to the function, such as sending to different queues based on account numbers. This could enable logs from certain groups of accounts to be sent to different Splunk indexes for security or retention requirements.
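As a sketch of that idea, the account number can be read from the object key and looked up in a table of queues. The account numbers, queue names and the `queue_for_account` helper below are all hypothetical, and the key is assumed to start with `AWSLogs/<account number>/`.

```python
from typing import Mapping

# Hypothetical grouping of accounts; each queue would feed an add-on input
# that writes to a different Splunk index.
ACCOUNT_QUEUES = {
    "111111111111": "restricted-accounts-queue",
    "222222222222": "restricted-accounts-queue",
    "333333333333": "long-retention-queue",
}


def queue_for_account(key: str, account_queues: Mapping[str, str],
                      default_queue: str) -> str:
    """Route on the account number, the second segment of AWSLogs/<account>/..."""
    parts = key.split("/")
    if len(parts) > 1 and parts[0] == "AWSLogs":
        return account_queues.get(parts[1], default_queue)
    return default_queue


print(queue_for_account(
    "AWSLogs/333333333333/Config/eu-west-1/2020/6/9/b.json.gz",
    ACCOUNT_QUEUES,
    "default-queue",
))
```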

Happy Splunking,

Paul

Paul Davies

Paul is an Architect in EMEA, responsible for working closely with Splunk customers and partners to help them deliver solutions relating to ingesting data or running Splunk in the cloud. Previously, Paul worked at Cisco as a BDM for big data solutions, an Enterprise Architect at Oracle, and Consultant at Hitachi.
