There are times when data within events contains sensitive information. This could be Social Security numbers, credit card numbers, dates of birth, an employee’s salary information, etc. If the data is in the clear when it gets sent to Splunk, it is indexed as is, and any person in a role that has access to that data can search on it. To prevent this, Splunk has an out-of-the-box feature to mask sensitive data. For instance, a Social Security number may end up looking like xxx-xx-xxxx within a search. The administrator can either use Splunk’s built-in sed-like syntax to replace sensitive strings or use a regular expression in a transforms.conf file to accomplish this.
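To make the masking idea concrete, here is a minimal Python sketch of what a sed-like replacement does to an event. It is only an illustration of the effect, not Splunk’s own mechanism; the pattern, the function name mask_ssn, and the sample event are assumptions made up for this example.

```python
import re

# Hypothetical pattern for Social Security numbers written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_ssn(event: str) -> str:
    """Replace every SSN-shaped string in the event with a fixed mask."""
    return SSN_PATTERN.sub("xxx-xx-xxxx", event)


# Made-up sample event, used only to show the substitution.
print(mask_ssn("name user100 ssn=123-45-6789 b shoes"))
```

Once the value has been replaced like this, the original digits are gone for good, which is exactly the limitation the rest of this entry works around.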
This works great when you never want anyone to have access to the data’s sensitive contents. However, there are cases where it would be useful for an authorized person to retrieve the original contents at search time, such as when searching for an employee’s salary or date of birth. To accomplish this, the sensitive data has to be encrypted in the index and decrypted by an authorized search. I’ll spend the rest of this blog entry describing how to do this with a simple approach I wrote that is available for download from Splunk’s add-on page.
Let’s start with example data such as this:
Tue Oct 6 13:23:36 EDT 2009 name user100 creditcard=32345465659888 b shoes
Tue Oct 6 13:23:38 EDT 2009 name user101 creditcard=32345465659889 b pens
Tue Oct 6 13:23:39 EDT 2009 name user102 creditcard=32345465659873 b pets
Notice that the credit card number is in the clear. The requirement is that this piece of data be encrypted before indexing. In the encrypt/decrypt Splunk add-on, there is a small Python utility that can be run against any text file to encrypt any single string of data using DES with a symmetric key. The usage is:
python encryptfield.py <filename> <regex with grouping> <8 character key>
In this example, it would look like this:
python encryptfield.py credit.log "creditcard=(\\d+)" DESCRYPT
This would produce a file with an extension of en.txt, such as credit.log.en.txt. In the example provided, the produced file looks like this:
Tue Oct 6 13:23:36 EDT 2009 name user100 creditcard=BTXoXxBF/i//Izi1uYJEKA== b shoes
Tue Oct 6 13:23:38 EDT 2009 name user101 creditcard=BTXoXxBF/i9OBE/y2eNIWw== b pens
Tue Oct 6 13:23:39 EDT 2009 name user102 creditcard=BTXoXxBF/i/qE+IQ+tS98Q== b pets
What the Python program did was take the string matched by the regular expression’s group, run a publicly available DES encryption on it using the provided key, and then send the resulting data through a base64 utility so that the final result would be printable in ASCII text. Since source code is provided, you can change the behavior of the program to encrypt multiple strings or use triple DES instead of DES for the encryption. What the user would do next is pass credit.log.en.txt to Splunk to index.
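The match, encrypt, and base64 steps described above can be sketched in a few lines of Python. This is not the add-on’s source: the names toy_cipher and encrypt_field are hypothetical, and because DES is not in the Python standard library, a repeating-key XOR stands in for it so the sketch runs on its own. XOR like this is not secure, and the output will not match the encrypted values shown above.

```python
import base64
import re
from itertools import cycle


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """Stand-in for DES: repeating-key XOR. NOT secure, illustration only."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def encrypt_field(line: str, pattern: str, key: str) -> str:
    """Replace the first capture group of pattern with base64(cipher(group))."""
    match = re.search(pattern, line)
    if match is None:
        return line  # lines without the field pass through unchanged
    secret = match.group(1).encode("utf-8")
    encrypted = toy_cipher(secret, key.encode("utf-8"))
    # base64 keeps the encrypted bytes printable so the event stays plain text
    token = base64.b64encode(encrypted).decode("ascii")
    start, end = match.span(1)
    return line[:start] + token + line[end:]


line = "Tue Oct 6 13:23:36 EDT 2009 name user100 creditcard=32345465659888 b shoes"
print(encrypt_field(line, r"creditcard=(\d+)", "DESCRYPT"))
```

Only the text inside the parentheses of the regular expression is replaced, which is why the creditcard= label and the rest of the event stay readable and searchable.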
In this case, the requirement is that authorized users have the capability to decrypt the encrypted field at search time. To accomplish this, I provided another Python program called decrypt that can be used to get the original data back. Instructions are provided in the download on how to register this as a Splunk search command and make it part of a role. Now, if the user has access to this sourcetype, is authorized to run the decrypt command, and knows the symmetric key to decrypt the string, they can run the following search from Splunk Web or from the command line:
sourcetype="credit" |decrypt "creditcard=([^\s]*)" DESCRYPT
This will create a new field at search time called decryptedField that you can use for further searching. It will contain the original credit card number. Notice that the same key that was used to encrypt (DESCRYPT) is used to decrypt. Here’s another example that prints out the decryptedField in columns, although practically speaking, you’ll probably end up searching for the decrypted value for a particular event, such as one that matches a user name.
sourcetype="credit" |decrypt "creditcard=([^\s]*)" DESCRYPT|fields + decryptedField
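For readers curious about what happens to each event at search time, here is a rough Python sketch of the reverse steps: pull out the captured group, base64-decode it, and decrypt it with the same key. It is not the decrypt command from the download and does not use Splunk’s search command interface; decrypt_field and toy_cipher are hypothetical names, and a repeating-key XOR again stands in for DES so the sketch is self-contained (it is not secure).

```python
import base64
import re
from itertools import cycle


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """Stand-in for DES: repeating-key XOR. NOT secure, illustration only."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def decrypt_field(event: str, pattern: str, key: str):
    """Return the decrypted first capture group, or None if absent or malformed."""
    match = re.search(pattern, event)
    if match is None:
        return None
    try:
        raw = base64.b64decode(match.group(1), validate=True)
    except ValueError:  # the captured text was not valid base64
        return None
    # A symmetric key means the same key reverses the encryption.
    return toy_cipher(raw, key.encode("utf-8")).decode("utf-8", errors="replace")


# Build a sample encrypted event with the same toy cipher, for demonstration only.
token = base64.b64encode(toy_cipher(b"32345465659888", b"DESCRYPT")).decode("ascii")
event = "Tue Oct 6 13:23:36 EDT 2009 name user100 creditcard=" + token + " b shoes"

# Mimic the search-time field described above.
fields = {"decryptedField": decrypt_field(event, r"creditcard=([^\s]*)", "DESCRYPT")}
print(fields)
```

In this sketch, a user who supplies the wrong key still gets a value back, but it is unreadable rather than the original number, so knowledge of the key matters just as much as the role.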
With this add-on, you now have the capability to encrypt your data before indexing it and decrypt it at search time using roles and symmetric keys. Because the source code is provided, you could, of course, use other approaches such as public and private keys for encryption and decryption, but that may end up adding complexity to an approach that is rather simple to administer and use. Hopefully, this distribution provides one possible approach to handling this requirement. Happy Splunking.