Welcome back! I hope you are returning after reading the first part of this blog and not just jumping to the end.
In this second part of our two-part blog series, we will continue to step through how to configure Splunk’s Federated Search for Amazon S3 against AWS WAF logs. If you haven’t read the first part, you will have missed all of the AWS pieces that need to be in place before setting up Splunk. Disclaimer out of the way, let’s jump straight back into the driving seat and pick up where we left off.
There are two major parts to setting up Splunk Federated Search.
4. On the next page fill out the following items:
Example screenshot of filled in data:
5. Once all of those settings are filled in, click the Generate Policy button.
6. Copy the policy for the Glue Data Catalog resource policy.
7. Navigate back to your Glue tab and click the Catalog Settings section in the left-hand menu.
WARNING: If you already have an existing policy in here, please take care to merge the policies together.
8. Paste or merge in the policy from Splunk and click Save.
9. Navigate back to your Splunk Federated screen and expand and copy the S3 bucket policy.
10. Navigate to the S3 bucket tab you had open for AWS WAF logs. Go back to the top-level folder and select the Permissions tab.
WARNING: Again, this bucket will most likely already have a policy. Take care to merge the policy from Splunk into the existing policy.
11. Copy in or merge the policy into the S3 bucket policy. Click Save when happy and go back to the Splunk screen.
12. Now that we have set up the permissions for Splunk, tick both of the agree buttons and click Save.
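The two merge warnings in the steps above can be sketched in Python. This is a minimal illustration, assuming both policies are standard JSON policy documents with a `Statement` element; the function name `merge_policies`, the Sids, the role ARN, and the bucket name are all hypothetical, and the real statements come from your existing policy and from the policy Splunk generates.

```python
import copy
import json


def _statements(policy: dict) -> list:
    # "Statement" may be a single object or a list of objects.
    value = policy.get("Statement", [])
    return [value] if isinstance(value, dict) else list(value)


def merge_policies(existing: dict, generated: dict) -> dict:
    """Return a new policy holding the statements of both documents."""
    merged = copy.deepcopy(existing)  # never mutate the caller's policy
    combined = _statements(merged)
    for stmt in _statements(generated):
        if stmt not in combined:  # skip exact duplicates on re-runs
            combined.append(copy.deepcopy(stmt))
    merged["Statement"] = combined
    merged.setdefault("Version", generated.get("Version", "2012-10-17"))
    return merged


# Hypothetical policy already attached to the WAF log bucket.
existing_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "ExistingWafDelivery",
        "Effect": "Allow",
        "Principal": {"Service": "delivery.logs.amazonaws.com"},
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::aws-waf-logs-example/*",
    }],
}

# Hypothetical stand-in for the policy copied from the Splunk screen.
splunk_policy = {
    "Version": "2012-10-17",
    "Statement": {
        "Sid": "ExampleSplunkRead",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:role/example-role"},
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::aws-waf-logs-example",
            "arn:aws:s3:::aws-waf-logs-example/*",
        ],
    },
}

merged = merge_policies(existing_policy, splunk_policy)
print(json.dumps(merged, indent=2))
```

The sketch only appends statements and drops exact duplicates. It does not resolve conflicting statements, so review the merged JSON before pasting it into the Glue or S3 console.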
This should now have created a federated provider. Let’s now create our federated index for AWS WAF logs.
Sample screenshot with fields filled in:
Click Save when happy.
You should now have created both the federated provider and federated index. It's time now to see if everything is working!
| sdselect * from federated:aws_waf_logs_index limit 100
Note: It may take a minute or two for the permissions to propagate properly. After that, you should see a result like the following example:
That’s it! You have now configured a connection between Splunk and AWS to search your AWS WAF logs in place.
Before calling it a day let’s quickly discuss the ‘So What’ part of this blog.
When I talk to Splunk users about Federated Search, the biggest challenge is often what I call phase three on the learning path.
When thinking about how to use Federated Search for something useful, we need to remember the key reason why this feature exists.
It is not designed for real-time threat hunting or data analysis. It's designed for infrequent or ad-hoc searching of data.
Before I move into common use cases for Federated Search, I quickly want to mention how it's charged.
Splunk Federated Search uses what we call a Data Units Scanned charging model. This is not specific to Splunk; many vendors in the market use this type of model. Effectively, you pre-purchase a chunk of data (think of it like a prepaid phone plan) where you get ‘x’ number of terabytes to consume before you have to buy more.
Each time you run a search against your data, it eats into that prepaid amount depending on how much data you scan inside the Amazon S3 bucket. It's important to remember that what counts is the data scanned, not the data returned. So if you run an inefficient search across an entire S3 bucket, you will consume a lot of your license. This blog won’t go into the details of how to be efficient with searches or how to optimize them (I will save that for the course mentioned in part 1 of this blog!).
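To make the prepaid model concrete, here is a small Python sketch of the arithmetic. Everything in it is an assumption for illustration: the `ScanBudget` class is hypothetical, the 10 TB allowance and the scan sizes are invented, and the 1 TB = 1024 GB conversion may not match how your license measures data units.

```python
from dataclasses import dataclass


@dataclass
class ScanBudget:
    """Tracks a prepaid allowance of scanned data, in terabytes."""
    purchased_tb: float
    used_tb: float = 0.0

    def charge(self, scanned_gb: float) -> float:
        """Record one search and return the remaining allowance in TB."""
        # Only the volume scanned is charged; rows returned play no part.
        self.used_tb += scanned_gb / 1024  # assumed conversion
        return self.purchased_tb - self.used_tb


budget = ScanBudget(purchased_tb=10)

# A narrow search over one day of logs, hypothetically 20 GB scanned.
remaining_after_narrow = budget.charge(scanned_gb=20)

# An unbounded search over a hypothetical 4 TB bucket.
remaining_after_full = budget.charge(scanned_gb=4096)

print(f"After narrow search: {remaining_after_narrow:.2f} TB left")
print(f"After full-bucket search: {remaining_after_full:.2f} TB left")
```

With these made-up numbers, the single unbounded search consumes about two hundred times as much of the allowance as the narrow one, which is the point the prepaid phone plan analogy is making.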
Now that we understand the licensing model, let’s continue on to the ‘so what’ and the use cases.
Three main use cases help explain the ‘so what’ of Federated Search:
Use Case One: Forensic Investigations
Remember, there is a difference between threat hunting and forensic investigations. The former is continually looking for anomalous behavior, something that should be flagged for (you guessed it) the latter: forensic investigations!
Generally, when you have an investigation, you have narrowed it down to a point in time, for example a particular week, month, or even day. This type of search is a great use case for Federated Search because you are being very specific and narrow in your search. It ticks the two major boxes for searching in place:
This means you have been very efficient, using data (and your license) only for a specific point in time.
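One way to reason about how narrow a search is: count the time partitions it would touch. The Python sketch below assumes the logs sit under hour-level, date-partitioned prefixes of the form `<base>/YYYY/MM/DD/HH/`. The helper `hourly_prefixes`, the base path, and the dates are hypothetical. Whether a search really limits its scan to these prefixes depends on how the Glue table is partitioned and how the search is written.

```python
from datetime import datetime, timedelta


def hourly_prefixes(base: str, start: datetime, end: datetime) -> list[str]:
    """List the hour-level prefixes that overlap the window [start, end)."""
    prefixes = []
    # Round the start down to the top of its hour.
    current = start.replace(minute=0, second=0, microsecond=0)
    while current < end:
        prefixes.append(f"{base}/{current:%Y/%m/%d/%H}/")
        current += timedelta(hours=1)
    return prefixes


# Hypothetical base prefix for one web ACL's logs.
BASE = "AWSLogs/111122223333/WAFLogs/us-east-1/example-acl"

# An investigation narrowed to under two hours on one afternoon.
incident = hourly_prefixes(
    BASE, datetime(2024, 3, 5, 14, 20), datetime(2024, 3, 5, 16, 5)
)

# The same question asked across the whole month.
whole_month = hourly_prefixes(
    BASE, datetime(2024, 3, 1), datetime(2024, 4, 1)
)

print(len(incident), "hourly partitions for the incident window")
print(len(whole_month), "hourly partitions for the full month")
```

Under these assumptions the incident window touches 3 hourly partitions and the full month touches 744, which gives a rough sense of how much a narrow time range can shrink the amount of data scanned.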
Use Case Two: Historical Analytics
You could use Federated Search to generate statistics over your historical data. You might be storing data in Amazon S3 over very long periods of time and want to generate a report based on a subset of that data, pulling out trends in specific information.
Again, this is a great and efficient way to do it, because the data has not been sitting in your data platform the entire time. Although this might not be an ad-hoc search like the one described in use case one, it is still very efficient because you run it infrequently on a subset of data.
Use Case Three: Data Enrichment
This is another good and common use case. Perhaps you have a large set of data stored for historical purposes in Amazon S3 and you would like to supplement a Friday or monthly report with details from that data.
This is a perfect way to do that. You can run the search, return the results, and add them to an existing report or search to supply additional data, show trends, and so on.
Hopefully this two-part blog helped you:
Hope you enjoyed the blog. See you in the next one!