EXCERPT FROM “EXPLORING SPLUNK: SEARCH PROCESSING LANGUAGE (SPL) PRIMER AND COOKBOOK”. Kindle/iPad/PDF available for free, and hardcopy available for purchase at Amazon.
You need to find transactions with specific field values.
A general search for all transactions might look like this:
sourcetype=email_logs | transaction userid
Suppose, however, that we want to identify just those transactions where there is an event that has the field/value pairs to=root and from=msmith. You could use this search:
sourcetype=email_logs | transaction userid | search to=root from=msmith
The problem here is that you are retrieving all events from this sourcetype (potentially billions), building up all the transactions, and then throwing 99% of the data right into the bit bucket. It is both slow and painfully inefficient.
You might be tempted to reduce the data coming in as follows:
sourcetype=email_logs (to=root OR from=msmith) | transaction userid | search to=root from=msmith
Although you are no longer retrieving every event from the given sourcetype, this search has two problems. The first problem is fatal: you are getting only a fraction of the events needed to solve your problem. Specifically, you are retrieving only events that contain to=root or from=msmith, so you are missing all the other events that could make up the transaction. For example, suppose this is what the full transaction should look like:
10/15/2012 10:11:12 userid=123 to=root
10/15/2012 10:11:13 userid=123 from=msmith
10/15/2012 10:11:14 userid=123 subject="serious error"
10/15/2012 10:11:15 userid=123 server=mailserver
10/15/2012 10:11:16 userid=123 priority=high
The above search will not get the third event, which has subject, the fourth, which has server, or the fifth, which has priority, so it will not be possible for Splunk to return the complete transaction.
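To see concretely why the pre-filter breaks the transaction, the following Python sketch models the five sample events as dictionaries and applies the same to=root OR from=msmith condition before grouping by userid. It simulates the logic only and is not Splunk code; the names events, prefilter, and group_by_userid are illustrative assumptions.

```python
from collections import defaultdict

# The five sample events from the transaction above, as plain dictionaries.
events = [
    {"time": "10:11:12", "userid": "123", "to": "root"},
    {"time": "10:11:13", "userid": "123", "from": "msmith"},
    {"time": "10:11:14", "userid": "123", "subject": "serious error"},
    {"time": "10:11:15", "userid": "123", "server": "mailserver"},
    {"time": "10:11:16", "userid": "123", "priority": "high"},
]

def prefilter(evts):
    """Model of the base search: keep events with to=root OR from=msmith."""
    return [e for e in evts if e.get("to") == "root" or e.get("from") == "msmith"]

def group_by_userid(evts):
    """Rough stand-in for `transaction userid`: group events by userid."""
    groups = defaultdict(list)
    for e in evts:
        groups[e["userid"]].append(e)
    return dict(groups)

full = group_by_userid(events)
partial = group_by_userid(prefilter(events))

print(len(full["123"]))     # events in the complete transaction
print(len(partial["123"]))  # events that survive the pre-filter
```

In this model the pre-filtered group keeps only the to and from events, which is the incomplete transaction described above.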
The second problem with the search is that to=root might be very common and you could actually be retrieving too many events and building too many transactions.
So what is the solution? There are two methods: using subsearches and using the searchtxn command.
Your goal is to get all the userid values for events that have to=root or from=msmith. Pick the rarer condition to get the candidate userid values as quickly as possible. Let’s assume that from=msmith is rarer:
sourcetype=email_logs from=msmith | dedup userid | fields userid
Now that you have the relevant userid values, you can search for just those events that contain these values and more efficiently build the transaction:
... | transaction userid
Finally, filter the transactions to make sure that they have to=root and from=msmith (it’s possible that a userid value is used for other to and from values):
... | search to=root AND from=msmith
Putting this all together, with the first search as a subsearch passing the userid to the outer search:
sourcetype=email_logs [ search sourcetype=email_logs from=msmith | dedup userid | fields userid ] | transaction userid | search to=root from=msmith
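The three steps of the subsearch approach (seed the candidate userid values, pull every event for those values, then filter the resulting transactions) can be modeled in a few lines of Python. This is a minimal sketch of the logic under stated assumptions, not how Splunk executes the search: the sample events and the function names seed_userids, build_transactions, and matches are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: three users, only userid 123 matches both conditions.
events = [
    {"userid": "123", "to": "root"},
    {"userid": "123", "from": "msmith"},
    {"userid": "123", "subject": "serious error"},
    {"userid": "369", "from": "msmith"},
    {"userid": "369", "to": "jdoe"},
    {"userid": "576", "to": "root"},
    {"userid": "576", "from": "akim"},
]

def seed_userids(evts):
    """Subsearch step: distinct userid values of events with from=msmith."""
    seen = []
    for e in evts:
        if e.get("from") == "msmith" and e["userid"] not in seen:
            seen.append(e["userid"])
    return seen

def build_transactions(evts, userids):
    """Outer search step: keep all events for the seed userids, grouped by userid."""
    txns = defaultdict(list)
    for e in evts:
        if e["userid"] in userids:
            txns[e["userid"]].append(e)
    return dict(txns)

def matches(txn, field, value):
    """A transaction matches if any of its events carries field=value."""
    return any(e.get(field) == value for e in txn)

candidates = seed_userids(events)
txns = build_transactions(events, candidates)
result = {uid: t for uid, t in txns.items()
          if matches(t, "to", "root") and matches(t, "from", "msmith")}

print(candidates)      # userids found by the subsearch step
print(sorted(result))  # userids whose transactions pass the final filter
```

Note that userid 576 is never examined at all, because it has no from=msmith event, while userid 369 is examined and then rejected by the final filter. That final filter is why the closing search to=root from=msmith is still needed.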
The searchtxn (“search transaction”) command does the subsearch legwork for you. It searches for just the events needed to build a transaction. Specifically, searchtxn computes the transitive closure of the field values needed for the transaction, runs the searches needed to find the matching events, runs the transaction search, and finally filters the transactions to the specified constraints. This matters most when you unify events by more than one field, where the subsearch solution becomes tricky. searchtxn also determines which seed condition is rarer to get the fastest results. Thus, your search for email transactions with to=root and from=msmith simply becomes:
| searchtxn email_txn to=root from=msmith
But what is email_txn in the above search? It refers to a transaction-type definition that has to be created in a Splunk configuration file, transactiontypes.conf. In this case, transactiontypes.conf might look like:
[email_txn]
fields = userid
search = sourcetype=email_logs
Running the searchtxn search automatically runs this search first:
sourcetype=email_logs from=msmith | dedup userid
The result of that search gives searchtxn the list of userid values to operate on. It then runs another search:
sourcetype=email_logs (userid=123 OR userid=369 OR userid=576 …) | transaction name=email_txn | search to=root from=msmith
The second search returns the needle-in-the-haystack transactions, built only from events whose userid was found by the first search.
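The way a list of userid values turns into the expanded second search can be illustrated with simple string building. The sketch below only reproduces the shape of the search text shown above; it is not searchtxn's actual implementation, and the helper names or_clause and expanded_search are hypothetical.

```python
def or_clause(field, values):
    """Render values as a parenthesized OR expression, e.g. (f=a OR f=b)."""
    return "(" + " OR ".join(f"{field}={v}" for v in values) + ")"

def expanded_search(base, field, values, txn_name, constraints):
    """Assemble: base search, OR clause, named transaction, final filter."""
    return (f"{base} {or_clause(field, values)}"
            f" | transaction name={txn_name}"
            f" | search {constraints}")

query = expanded_search(
    base="sourcetype=email_logs",
    field="userid",
    values=["123", "369", "576"],
    txn_name="email_txn",
    constraints="to=root from=msmith",
)
print(query)
```

The base search comes from the search setting of the transaction type, and the field comes from its fields setting, which is why both must be defined in the configuration stanza.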
Note: If the transaction command’s field list had more than one field, searchtxn would automatically run multiple searches to get a transitive closure of all values needed.
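A transitive closure over several fields means repeating the lookup until no new field values turn up: events found through one field may introduce values of another field, which in turn lead to more events. The following Python sketch models that loop under stated assumptions. The second field msgid, the sample events, and the function transitive_closure are hypothetical, and the code illustrates the idea rather than searchtxn's internals.

```python
def transitive_closure(events, fields, seed):
    """Return indexes of all events linked to the seed events via shared field values."""
    known = {f: set() for f in fields}  # values discovered so far, per field
    selected = set()                    # indexes of events pulled in so far
    frontier = [i for i, e in enumerate(events) if seed(e)]
    while frontier:
        for i in frontier:
            selected.add(i)
            for f in fields:
                if f in events[i]:
                    known[f].add(events[i][f])
        # One more "search": unselected events that share any known value.
        frontier = [
            i for i, e in enumerate(events)
            if i not in selected and any(e.get(f) in known[f] for f in fields)
        ]
    return sorted(selected)

# Hypothetical events unified by two fields, userid and msgid.
events = [
    {"userid": "123", "from": "msmith"},             # seed event
    {"userid": "123", "msgid": "A1"},                # links userid 123 to msgid A1
    {"msgid": "A1", "to": "root"},                   # reachable only through msgid
    {"msgid": "A1", "server": "mailserver"},         # reachable only through msgid
    {"userid": "999", "msgid": "Z9", "to": "root"},  # unrelated transaction
]

linked = transitive_closure(events, ["userid", "msgid"],
                            seed=lambda e: e.get("from") == "msmith")
print(linked)  # indexes of the events that belong to the seeded transaction
```

Here the to=root event carries no userid at all, so a single subsearch on userid would never find it. It is reached only on a later pass, after msgid A1 has been discovered.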
Explore using multiple fields with the searchtxn command. If you’re interested in getting the relevant events and don’t want searchtxn to actually build the transactions, use eventsonly=true.