Fixing Scripted Inputs in Tiered Deployments

The Splunk App for Microsoft Exchange has a useful lookup named ad_username. It takes the various forms you can use to log on to a domain (like DOMAIN\user and user@domain.com) and normalizes them. It then normalizes all the user aliases as well, so adrian.hall is the same as ahall, which is the same as adrian. This is really useful when you are dealing with domain accounts in a support role: you don’t have to know how a user logged in, only what their official username is.
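To make the normalization concrete, here is a minimal Python sketch of the idea. It is not the app's actual ad_username.py: the function name normalize_username and the in-memory ALIASES table are hypothetical stand-ins for whatever the real script builds from its data files.

```python
# Hypothetical alias table for illustration only; the real lookup
# reads its alias data from CSV files rather than a hard-coded dict.
ALIASES = {
    "ahall": "adrian.hall",
    "adrian": "adrian.hall",
}


def normalize_username(raw, aliases=ALIASES):
    """Reduce a logon string to a single canonical username."""
    name = raw.strip().lower()
    if "\\" in name:
        # DOMAIN\user -> user
        name = name.split("\\", 1)[1]
    if "@" in name:
        # user@domain.com -> user
        name = name.split("@", 1)[0]
    # Map a known alias to its canonical form; otherwise keep the name.
    return aliases.get(name, name)


for logon in ("DOMAIN\\ahall", "adrian@domain.com", "adrian.hall"):
    print(logon, "->", normalize_username(logon))
```

All three logon strings in the loop resolve to the same canonical name, which is the behavior the lookup is meant to provide.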

AD_Username is a scripted lookup written in Python that lives in the bin directory of the application directory. It relies on two files in the local directory, domain_aliases.csv and active_directory.csv. In a single-box environment, the ad_username.py script finds these files, loads them and uses them to do its job. Normalization happens normally. All is good in the world.

But what happens in a multi-tier environment, where the indexers are separated from the search heads? The search head puts the ad_username.py script into a replication bundle and passes that to the indexers to execute as part of the pipeline. The lookup then happens on the indexer, not the search head. Unfortunately, the dependent CSV files aren’t on the remote indexers, so the script has no data to work with. At best, you get wrong results; if the script requires those files, it could break completely.
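One way to turn silently wrong results into a visible error is for a script to check for its data files before doing any work. The following is a minimal sketch under stated assumptions, not the app's actual code: the helper name load_required_csvs is hypothetical, and only the two file names come from the description above.

```python
import csv
import os

# File names taken from the article; everything else here is illustrative.
REQUIRED_FILES = ("domain_aliases.csv", "active_directory.csv")


def load_required_csvs(directory, names=REQUIRED_FILES):
    """Return {file name: list of row dicts}; raise if any file is absent."""
    missing = [
        name for name in names
        if not os.path.isfile(os.path.join(directory, name))
    ]
    if missing:
        # Fail loudly instead of continuing with empty lookup data.
        raise FileNotFoundError(
            "missing lookup data in %s: %s" % (directory, ", ".join(missing))
        )
    tables = {}
    for name in names:
        with open(os.path.join(directory, name), newline="") as handle:
            tables[name] = list(csv.DictReader(handle))
    return tables
```

A script that calls a loader like this at startup would stop with a clear message on an indexer that never received the CSV files, rather than returning usernames that were never normalized.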

Fortunately, the fix is relatively easy. One of the configuration files you can use is distsearch.conf, whose job is to tell Splunk how to handle distributed searches. In the Splunk App for Windows Infrastructure, for example, we blacklist the tSessions lookup (which tends to run to gigabytes of data) so that it is not replicated. This improves performance. However, you can also whitelist files, and that is the technique used here to ensure that the dependent files are replicated properly.

Try this simple stanza in the distsearch.conf:

[replicationWhitelist]
ad_username = ...(domain_aliases|active_directory).csv

Place this stanza in Splunk_for_Exchange/local/distsearch.conf (for 2.x releases) or splunk_app_microsoft_exchange/local/distsearch.conf (for 3.x releases). This will add the domain_aliases.csv and active_directory.csv files to the replication bundle for ANY search that is executed within the context of the app. Now you can use the ad_username lookup as normal. In addition, panels that rely on ad_username (normally under the User Behavior menu) will work as intended.
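If you want a rough feel for which files a whitelist pattern like this would pick up, the Python sketch below approximates the matching. It rests on several assumptions: that the leading ellipsis in the stanza is Splunk's three-dot "..." wildcard (which matches across directories), that "*" stays within one path segment, that "." is a literal dot, and that the pattern must match the whole path. The function name and the sample paths are hypothetical, and this is not Splunk's own matcher.

```python
import re


def splunk_pattern_to_regex(pattern):
    """Approximate a Splunk-style path pattern as a compiled Python regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            out.append(".*")        # '...' crosses directory boundaries
            i += 3
        elif pattern[i] == "*":
            out.append("[^/]*")     # '*' stays within one path segment
            i += 1
        elif pattern[i] == ".":
            out.append(r"\.")       # '.' is treated as a literal dot
            i += 1
        else:
            out.append(pattern[i])  # pass through regex syntax such as (a|b)
            i += 1
    return re.compile("".join(out))


matcher = splunk_pattern_to_regex("...(domain_aliases|active_directory).csv")

# Hypothetical bundle-relative paths, for illustration only.
for path in (
    "apps/splunk_app_microsoft_exchange/local/domain_aliases.csv",
    "apps/splunk_app_microsoft_exchange/local/active_directory.csv",
    "apps/splunk_app_microsoft_exchange/local/other_lookup.csv",
):
    print(path, bool(matcher.fullmatch(path)))
```

Under these assumptions, only the first two paths match; the third does not, so it would be unaffected by this whitelist entry.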

Of course, you could just upgrade to the latest (v3.0.1) version of the Splunk App for Exchange and get the same effect. The Splunk App for Exchange now has a free downloadable trial, so trying this functionality is easier than ever.

Posted by Splunk
