Anonymize data with sed

This documentation applies to the following versions of Splunk: 4.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4, 4.0.5, 4.0.6

You can anonymize your data at index time by using a sed script to replace or substitute strings in it.

Most Unix users are familiar with sed, a utility that reads a file and modifies the input as specified by a list of commands. Splunk lets you use sed-like syntax in props.conf to anonymize your data.

Note: Edit or create a copy of props.conf in $SPLUNK_HOME/etc/system/local.

Define the sed script in props.conf

In a props.conf stanza, use SEDCMD to indicate a sed script:

[<stanza_name>]
SEDCMD-<class> = <sed script>

The stanza_name must identify the host, source, or sourcetype whose events you want to anonymize or transform.

The sed script applies only to the _raw field at index time. Splunk currently supports the following subset of sed commands: replace (s) and character substitution (y).

Note: You must restart Splunk for the changes you made to props.conf to take effect.

Replace strings with regex match

The syntax for a sed replace is:

SEDCMD-<class> = s/<regex>/<replacement>/flags

  • regex is a Perl regular expression.
  • replacement is a string to replace the regex match and uses "\n" for back-references, where n is a single digit.
  • flags can be either "g", to replace all matches, or a number n, to replace only the nth match.
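The effect of the two flag forms can be sketched outside Splunk. The following Python is only an illustration, not Splunk's implementation: sed_replace is a hypothetical helper, Python's re module stands in for Perl regular expressions (the two differ in some details), and treating a missing flag as "first match only" is an assumption borrowed from standard sed.

```python
import re

def sed_replace(text, pattern, replacement, flag=""):
    """Hypothetical helper imitating s/<pattern>/<replacement>/<flag>."""
    regex = re.compile(pattern)
    if flag == "g":
        # "g": replace every match.
        return regex.sub(replacement, text)
    # A number n: replace only the nth match.
    # Assumption: no flag behaves like standard sed (first match only).
    nth = int(flag) if flag else 1
    for count, match in enumerate(regex.finditer(text), start=1):
        if count == nth:
            # expand() resolves \1-style back-references in the replacement.
            return text[:match.start()] + match.expand(replacement) + text[match.end():]
    return text  # fewer than n matches: leave the text unchanged

event = "a=1 a=2 a=3"
print(sed_replace(event, r"a=(\d)", r"b=\1", "g"))  # b=1 b=2 b=3
print(sed_replace(event, r"a=(\d)", r"b=\1", "2"))  # a=1 b=2 a=3
```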

Example

Let's say you want to index data containing social security numbers and credit card numbers. At index time, you want to mask these values so that only the last four digits are evident in your events. Your props.conf stanza may look like this:

[source::.../accounts.log]
SEDCMD-accounts = s/ssn=\d{5}(\d{4})/ssn=xxxxx\1/g s/cc=(\d{4}-){3}(\d{4})/cc=xxxx-xxxx-xxxx-\2/g

Now, in your accounts events, social security numbers appear as ssn=xxxxx6789 and credit card numbers appear as cc=xxxx-xxxx-xxxx-1234.
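To check the two expressions before you put them in props.conf, you can try them on a sample line. The sketch below uses Python's re module as a stand-in for Splunk's regex engine, which is an approximation. The sample event and the mask_event helper are made up for illustration.

```python
import re

# The two replace rules from the SEDCMD-accounts example, applied in order.
RULES = [
    (r"ssn=\d{5}(\d{4})", r"ssn=xxxxx\1"),
    (r"cc=(\d{4}-){3}(\d{4})", r"cc=xxxx-xxxx-xxxx-\2"),
]

def mask_event(raw):
    """Apply each rule to the raw event text, replacing all matches ("g")."""
    for pattern, replacement in RULES:
        raw = re.sub(pattern, replacement, raw)
    return raw

# Hypothetical event, not taken from real data.
sample = "user=jdoe ssn=123456789 cc=1234-5678-9012-3456"
print(mask_event(sample))
# user=jdoe ssn=xxxxx6789 cc=xxxx-xxxx-xxxx-3456
```

In the credit card rule, group 1 is the repeated (\d{4}-) prefix, so the last four digits are captured by group 2. That is why the replacement uses \2 rather than \1.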

Substitute characters

The syntax for a sed character substitution is:

SEDCMD-<class> = y/<string1>/<string2>/

This replaces each occurrence of a character in string1 with the character at the same position in string2.

Example

Let's say you have a file you want to index, abc.log, and you want to substitute the capital letters "A", "B", and "C" for every lowercase "a", "b", or "c" in your events. Add the following to your props.conf:

[source::.../abc.log]
SEDCMD-abc = y/abc/ABC/

Now, if you search for source="*/abc.log", you should not find the lowercase letters "a", "b", and "c" in your data at all. Splunk substituted "A" for each "a", "B" for each "b", and "C" for each "c".
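The character-by-character mapping of the y command behaves much like Python's str.translate. The sketch below is only an analogy to show the expected result, not how Splunk performs the substitution. The substitute_chars helper and the sample text are made up for illustration.

```python
def substitute_chars(text, string1, string2):
    """Imitate y/<string1>/<string2>/: map each character in string1
    to the character at the same position in string2."""
    if len(string1) != len(string2):
        raise ValueError("string1 and string2 must be the same length")
    return text.translate(str.maketrans(string1, string2))

# Hypothetical event text; only a, b, and c are changed.
print(substitute_chars("abcabc xyz", "abc", "ABC"))
# ABCABC xyz
```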

Revision: 207 | Community content licensed under Creative Commons