Splunk Data Anonymizer

This documentation does not apply to the most recent version of Splunk.

This documentation applies to the following versions of Splunk: 2.1, 2.2, 2.2.1, 2.2.3, 2.2.6

The anonymizer combs through sample log files or event files and replaces identifying data (usernames, IP addresses, domain names, etc.) with fictional values that keep the same word length and the same event type, both locally and at Splunk Base. For example, it may turn the string user=billg@microsoft.com into user=carol@adalberto.com. This lets Splunk users share log data without revealing confidential or personal information from their networks.


The anonymized file is written to the same directory as the source file, with ANON- prepended to its filename. For example, /tmp/messages will be anonymized as /tmp/ANON-messages.
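The naming rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here; the function name anonymized_path is hypothetical and is not part of Splunk.

```python
import os.path


def anonymized_path(source_path, prefix="ANON-"):
    """Return the path where the anonymized copy is expected to appear.

    Follows the rule described above: same directory as the source,
    with the prefix prepended to the filename.
    """
    directory, filename = os.path.split(source_path)
    return os.path.join(directory, prefix + filename)


# On a POSIX system this prints /tmp/ANON-messages
print(anonymized_path("/tmp/messages"))
```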


Simple method

# splunk anonymize filename

This is the easiest way to anonymize a file using the anonymizer tool's defaults, as shown in the session below. Note that you currently need to have $SPLUNK_HOME/bin as your current working directory; this will be fixed in an incremental release.


# cp -p /var/log/messages /tmp
# cd $SPLUNK_HOME/bin
# splunk anonymize /tmp/messages
Getting timestamp from: /opt/paul207/splunk/lib/python2.4/site-packages/splunk/timestamp.config
Processing files: ['/tmp/messages']
Getting named entities
        Processing /tmp/messages
Adding named entities to list of public terms: Set(['secErrStr', 'MD_SB_DISKS', 'TTY', 'target', 'precision ', 'lpj', 'ip', 'pci', 'hard', 'last bus', 'override with idebus', 'SecKeychainFindGenericPassword err', 'vector', 'USER', 'irq ', 'com  user', 'uid'])
        Processing /tmp/messages for terms.
        Calculating replacements for 4672 terms.
===================================================
Wrote dictionary scrubbed terms with replacements to "/tmp/INFO-mapping.txt"
Wrote suggestions for dictionary to "/tmp/INFO-suggestions.txt"
===================================================
Writing out /tmp/ANON-messages
Done.
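If you want to script the session above, a minimal Python sketch might look like the following. It assumes that the splunk executable lives in $SPLUNK_HOME/bin and can be invoked by its full path; the function name run_anonymizer and its runner parameter are hypothetical helpers for illustration, not part of Splunk.

```python
import os
import subprocess


def run_anonymizer(filename, splunk_home, runner=subprocess.run):
    """Run `splunk anonymize filename` with $SPLUNK_HOME/bin as the working directory."""
    bin_dir = os.path.join(splunk_home, "bin")
    # Assumption: the executable is called by its path inside bin_dir.
    argv = [os.path.join(bin_dir, "splunk"), "anonymize", filename]
    # cwd=bin_dir mirrors the `cd $SPLUNK_HOME/bin` step of the session.
    return runner(argv, cwd=bin_dir, check=True)


# Example call (requires a Splunk installation):
# run_anonymizer("/tmp/messages", os.environ["SPLUNK_HOME"])
```

The runner parameter exists only so the command line can be inspected or tested without actually launching Splunk.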

Advanced method

You can customize the anonymizer by telling it what terms to anonymize, what terms to leave alone, and what terms to use as replacements. The advanced form of the command is shown below.


# splunk anonymize filename public_terms private_terms name_terms dictionary timestamp_config 'branding'

  • filename

Default: None


Path and name of the file to anonymize.


  • public_terms

Default: $SPLUNK_HOME/bin/public-terms.txt


A list of locally-used words that will not be anonymized if they are in the file. It serves as an appendix to the dictionary file. Below is a sample entry.


2003 2004 2005 2006 abort aborted am apr april aug august auth
authorize authorized authorizing bea certificate class com complete
  • private_terms

Default: $SPLUNK_HOME/bin/private-terms.txt


A list of words that will be anonymized if found in the file, because they may denote confidential information. Below is a sample entry.


erik
susan
passw0rd


  • name_terms

Default: $SPLUNK_HOME/bin/names.txt


A global list of common English personal names that Splunk uses to replace anonymized words. It always replaces a word with a name of exactly the same length, to keep each event's data pattern the same. Splunk uses each name in name_terms once to replace a character string of equal length throughout the file. After it runs out of names, it switches to randomized character strings, while still mapping each replaced pattern to a single anonymized string. Below is a sample entry.


aaron          
abbey          
abbie          
abby           
abdul
  • dictionary

Default: $SPLUNK_HOME/bin/dictionary.txt


A global list of common words that will not be anonymized, unless overridden by entries in the private_terms file. Below is a sample entry.


algol
ansi
arco
arpa
arpanet
ascii
  • timestamp_config

Default: $SPLUNK_HOME/lib/python2.4/site-packages/splunk/timestamp.config


Splunk's built-in file that determines how timestamps are parsed.


  • branding

Default: splunk


A single-quoted string to use frequently as a replacement for anonymized words, to give the file a recognizable identity.
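To make the interaction between the term lists more concrete, here is a minimal Python sketch of the replacement rules as described above: words in the dictionary stay untouched unless they are listed as private, each name is used once for a word of the same length, and random strings of the same length take over when the names run out. It is not Splunk's actual implementation. The function build_replacements and its parameters are hypothetical, public terms are assumed to behave like dictionary entries, and the branding string and the handling of numbers are left out.

```python
import random
import string


def build_replacements(terms, names, public=(), private=(), dictionary=(), seed=0):
    """Map each term that should be scrubbed to a replacement of equal length."""
    rng = random.Random(seed)
    # Assumption: private terms override both the dictionary and the public list.
    keep = (set(dictionary) | set(public)) - set(private)

    # Group the names by length so each one is handed out only once.
    pool = {}
    for name in names:
        pool.setdefault(len(name), []).append(name)

    mapping = {}
    for term in terms:
        if term in mapping or term.lower() in keep:
            continue  # already mapped, or a word that stays readable
        candidates = pool.get(len(term), [])
        if candidates:
            mapping[term] = candidates.pop(0)
        else:
            # No unused name of this length is left: fall back to random letters.
            mapping[term] = "".join(rng.choice(string.ascii_lowercase) for _ in term)
    return mapping


mapping = build_replacements(
    terms=["erik", "susan", "abort", "erik", "kevin"],
    names=["abby", "aaron", "abbey"],
    dictionary=["abort"],
)
print(mapping)  # {'erik': 'abby', 'susan': 'aaron', 'kevin': 'abbey'}
```

Because every term is looked up in the same mapping, a given word is replaced by the same string wherever it occurs, which is what keeps the event patterns intact.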


Output Files

Splunk's anonymizer function creates three new files in the same directory as the source file.


  • ANON-filename

The anonymized version of the source file.


  • INFO-mapping.txt

This file contains a list of which terms were anonymized into which strings. Below is a sample entry.


Replacement Mappings
--------------------
kb900485 --> LO200231
1718 --> 1608
transitions --> tstymnbkxno
reboot --> SPLUNK
cdrom --> pqyvi
  • INFO-suggestions.txt

A report of terms found in the file that, based on their appearance and frequency, you may want to add to private-terms.txt or to public-terms.txt for more accurate anonymization of your local data. Below is a sample entry.


Terms to consider making private (currently not scrubbed):
['uid', 'pci', 'lpj', 'hard']
Terms to consider making public (currently scrubbed):
['jun', 'security', 'user', 'ariel', 'name', 'logon', 'for', 'process', 'domain', 'audit']
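If you want to review the replacements programmatically, the mapping file can be read with a short Python sketch like the one below. It assumes that every mapping sits on its own line in the form "original --> replacement", as in the sample entry above; the function name parse_mapping is hypothetical.

```python
def parse_mapping(lines):
    """Parse 'original --> replacement' lines into a dict, skipping header lines."""
    mapping = {}
    for line in lines:
        original, separator, replacement = line.rstrip("\n").partition(" --> ")
        if separator:  # header and rule lines contain no separator
            mapping[original] = replacement
    return mapping


sample = """Replacement Mappings
--------------------
kb900485 --> LO200231
1718 --> 1608
reboot --> SPLUNK
""".splitlines()

for original, replacement in parse_mapping(sample).items():
    # Each replacement is expected to have the same length as the original.
    print(original, replacement, len(original) == len(replacement))
```

With a real file, the same function can be applied to the open file object, for example parse_mapping(open("/tmp/INFO-mapping.txt")).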