I had an interesting meeting recently where lots of great ideas were shared. One of them was how useful it would be if you could analyze the data from your backup systems. Imagine you have a backup for each of your endpoints, such as desktops, laptops, or production machines in manufacturing. You would be able to track and review problems as well as security incidents, creating reports on the fly to gain insight into your environment.
At Splunk, our endpoints are backed up with CrashPlan from Code42. Code42 offers a cloud version as well as on-premises appliances. So if a laptop is broken or stolen, we can easily restore all the data that lived on the lost endpoint to a brand-new machine. That is the core reason to have CrashPlan.
To provide that capability, the backup agent sits on the endpoint and monitors which files are created, modified, or deleted. This information includes a lot of metadata so that only changed data needs to be backed up. That machine-generated data is highly valuable, already exists, and can be used for other use cases. A few of them could include:
- You need to find out who has installed which version of an application on your endpoints, but you do not run endpoint inventory tools. This could help you report on which versions of, say, Adobe Reader (reader.exe) are deployed on which hosts. You could even track in real time how they are updated.
- You have a virus infection in your network and want to identify which host was infected first and how the malware got in. Because file changes are tracked, you can pin down which file carried the virus and which host saw it first. You can also check which other machines in your network have the same file sitting there, potentially dormant, or where the antivirus is out of date and needs attention.
- You want to verify which sensitive documents exist outside your file shares, and on which endpoints. Based on the filename or MD5 hash, you can find out with a single search which systems hold the file.
- If an endpoint device is stolen, was there sensitive data on it that could be made public? Thanks to the data from the endpoint, you can check what was stored on it, and you can even prove with evidence that the data existed on that endpoint previously but was then deleted.
- You recognize some malware, but it keeps deleting itself. Thanks to data retention, you keep a record of activities, so you can review whether a specific file was ever on an endpoint. You can even use threat intelligence feeds and IOCs (indicators of compromise) to correlate file activity information and check for malicious activity as well as infected or compromised hosts.
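As a rough illustration of that last idea, here is a minimal Python sketch that correlates file-change events with a list of known-bad hashes. The event schema (host, path, md5, ts) and all the sample data are assumptions made up for illustration, not the actual CrashPlan export format:

```python
from collections import defaultdict

# Assumed event schema -- not the actual CrashPlan export format.
events = [
    {"host": "laptop-01",  "path": "C:/tmp/invoice.exe",  "md5": "aabbcc01", "ts": "2016-03-01T09:12:00"},
    {"host": "laptop-02",  "path": "C:/tmp/invoice.exe",  "md5": "aabbcc01", "ts": "2016-03-01T11:47:00"},
    {"host": "desktop-07", "path": "D:/docs/report.docx", "md5": "ddeeff02", "ts": "2016-03-02T08:00:00"},
]

# Known-bad hashes from a hypothetical threat intelligence feed.
ioc_hashes = {"aabbcc01"}

def find_ioc_hits(events, ioc_hashes):
    """Group IOC-matching file events by hash, earliest sighting first."""
    hits = defaultdict(list)
    for event in events:
        if event["md5"] in ioc_hashes:
            hits[event["md5"]].append(event)
    for matched in hits.values():
        matched.sort(key=lambda e: e["ts"])  # earliest timestamp first
    return dict(hits)

for md5, matched in find_ioc_hits(events, ioc_hashes).items():
    first = matched[0]
    print("IOC %s: first seen on %s at %s" % (md5, first["host"], first["ts"]))
    print("  also present on: %s" % [e["host"] for e in matched[1:]])
```

The same grouping and sorting would of course be a one-line search in Splunk; the sketch just makes the logic explicit.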
Beyond those use cases, you can also review information about the operation of the backup tool itself. For example:
- Verify who is accessing which backup files. If you keep a lot of protected data on your network drives, you should regularly review who is accessing what in the backup environment.
- Review capacity across the full backup environment and identify which users are taking up the most space.
- Create operational insights, such as how often data is restored or how many unique users are accessing the solution, so that as a project manager you can prioritize the service and track the rollout within your environment.
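To sketch what such operational reporting might look like, here is a small Python example that computes per-user storage, restore counts, and unique active users from backup log entries. The field names (user, action, bytes) and the sample data are assumptions for illustration, not the real Code42 log schema:

```python
from collections import Counter

# Hypothetical backup log entries -- not the real Code42 log schema.
logs = [
    {"user": "alice", "action": "backup",  "bytes": 500000000},
    {"user": "bob",   "action": "backup",  "bytes": 1200000000},
    {"user": "alice", "action": "restore", "bytes": 30000000},
    {"user": "carol", "action": "restore", "bytes": 10000000},
]

def backup_metrics(logs):
    """Compute per-user stored bytes, restore count, and unique users."""
    usage = Counter()
    restores = 0
    users = set()
    for entry in logs:
        users.add(entry["user"])
        if entry["action"] == "backup":
            usage[entry["user"]] += entry["bytes"]
        elif entry["action"] == "restore":
            restores += 1
    return {
        "top_users": usage.most_common(),  # heaviest consumers first
        "restores": restores,
        "unique_users": len(users),
    }

metrics = backup_metrics(logs)
print("unique users:", metrics["unique_users"])
print("restores:", metrics["restores"])
print("top user by stored bytes:", metrics["top_users"][0])
```

In practice you would feed these numbers from the indexed backup logs rather than an in-memory list, but the aggregations are the same.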
I hope that this blog has provided some insight and ideas for what’s possible from a single data source.