
Almost every tech maven and maestro begins their Splunk experience by pointing Splunk at their syslog dump. The results can be spectacular as you end up with thousands of different sourcetypes. (Admit it, many of you have done this at least once.)
Your log data is all in Splunk and you can search across all of it, but it is hard to shape the data into anything meaningful. Because you don’t know how any of the content is related or what it represents, the best you can do is a simple term search.
So how do we find the meanings hidden in the data and make it more accessible to more people?
In Splunk we identify different logical forms of data as sourcetypes. They are a powerful key to making all the automated magic happen in searches and reports. We use sourcetypes to trigger field extractions, lookups and many other data knowledge related features.
Sourcetypes do three very important things in Splunk.
- The data in a source data stream may contain many different types of information. Sourcetypes give you a way to simply identify the events in those data streams as unique types.
- Sourcetypes make it easier to filter out large swaths of unwanted data in a search and focus on one type of information, for example: > sourcetype=iis
- Sourcetypes give you a place to hang different parsing settings, transforms, lookups and extractions to enrich and clarify the meaning buried in the information.
Splunk has auto-typing to pick out sourcetypes from incoming data streams. Auto-typing works wonderfully in many cases but does not always do well identifying unusual log types, such as proprietary error message files and smallish dump files. When this happens, Splunk gives up and creates sourcetypes that look like this:
postfix-too_small, snmpd-too_small, …-too_small
This can result in a large number of meaningless sourcetypes being generated which do not help you in searching the data at all. So the answer is to explicitly set sourcetypes using inputs.conf, props.conf and transforms.conf. The techniques to do this are well documented by Splunk.
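Before assigning sourcetypes explicitly, it can help to see how many of these auto-generated names you already have. The following is a minimal Python sketch, assuming the `-too_small` suffix shown in the first example above and a hypothetical list of sourcetype names that you have exported yourself; it is not part of Splunk.

```python
# Suffix seen on auto-generated sourcetypes such as "postfix-too_small".
TOO_SMALL_SUFFIX = "-too_small"


def find_too_small(sourcetypes):
    """Return the sourcetype names that end in the suffix, sorted."""
    return sorted(s for s in sourcetypes if s.endswith(TOO_SMALL_SUFFIX))


# Hypothetical list of sourcetype names, for illustration only.
observed = ["iis", "postfix-too_small", "syslog", "snmpd-too_small"]
candidates = find_too_small(observed)
print(candidates)
```

Each name in the resulting list is a candidate for an explicit sourcetype assignment in your configuration.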
Check out “Why Sourcetypes Matter”.
Also check out the excellent blog post from Vi Ly titled “Sourcetypes Gone Wild”.
Now you know everything you need to know to create a sourcetype.
So how should I name my source type?
So you’re sitting in front of the keyboard and you’re thinking: what should I call the kind of data that I am loading into Splunk?
Sourcetype names are literals, so you can use any convention. Let’s start with sourcetypes named “Huey”, “Dewey” and “Louie”. This allows me to write the following search using the Splunk app “Say for Splunk”.
> search (sourcetype="Huey" AND status="sleepy") AND (sourcetype="Dewey" AND status="sleepy") AND (sourcetype="Louie" AND status="sleepy") | say "everything is all-right"
That creates a wonderful entry for playing Splunk Jeopardy, but I can only guess what data is identified by those hard-working sourcetypes. Today let’s look at one proposal for naming sourcetypes.
In this example, the semantics for naming the sourcetype would be as follows.
<Technology>:<Technology Tier>:<Sourcetype>
This example might produce sourcetypes that look like this:
WebSphere:app:performance_bus
WebSphere:app:performance_data
WebSphere:app:performance_pres
It also allows you to search across a whole technology tier like this:
> sourcetype="WebSphere:app:*"
Example values for the naming convention might include:
Technology: WebSphere, ApacheTomcat, OracleApps, SQLServer, etc.
Technology Tier: Net, App, OS, FW, IP, STO, etc.
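A convention only helps if everyone applies it the same way. As a rough illustration, here is a small Python sketch that builds names in the colon-separated form used in the examples above. The function name, the set of allowed tiers and the character rule are all assumptions made for this sketch, not anything Splunk enforces.

```python
import re

# Tiers taken from the example values above; treat this set as an assumption.
ALLOWED_TIERS = {"net", "app", "os", "fw", "ip", "sto"}

# Hypothetical rule: each part uses only letters, digits, "_" or "-",
# so the ":" separator stays unambiguous.
PART_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def build_sourcetype(technology, tier, name):
    """Join the three parts into "<Technology>:<Technology Tier>:<Sourcetype>"."""
    tier = tier.lower()  # the examples above write the tier in lowercase
    if tier not in ALLOWED_TIERS:
        raise ValueError(f"unknown technology tier: {tier!r}")
    for part in (technology, tier, name):
        if not PART_PATTERN.fullmatch(part):
            raise ValueError(f"invalid sourcetype part: {part!r}")
    return ":".join((technology, tier, name))


result = build_sourcetype("WebSphere", "App", "performance_bus")
print(result)
```

Rejecting a stray ":" inside a part keeps every name at exactly three fields, which is what makes a wildcard on the first two fields predictable.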
In the Microsoft world, the version of the OS or application is important contextual information for understanding logs. For that reason Adrian Hall, who wrote our excellent Splunk App for Microsoft Exchange, uses the version number in the sourcetype naming convention, like this:
<OS>:<Version>:<Sourcetype>
This example might produce sourcetypes that look like this:
MSWindows:2008R2:IIS
MSWindows:2003:IIS
MSExchange:2007:Mailbox-Usage
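One benefit of putting the version in a fixed position is that the name can be taken apart again mechanically. The following Python sketch parses the three example names above; the tuple type and function name are hypothetical and exist only for this illustration.

```python
from collections import namedtuple

# Hypothetical record type for the three colon-separated fields.
SourcetypeName = namedtuple("SourcetypeName", ["platform", "version", "name"])


def parse_sourcetype(value):
    """Split "<OS>:<Version>:<Sourcetype>" into its three fields."""
    parts = value.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected three non-empty fields: {value!r}")
    return SourcetypeName(*parts)


examples = [
    "MSWindows:2008R2:IIS",
    "MSWindows:2003:IIS",
    "MSExchange:2007:Mailbox-Usage",
]
parsed = [parse_sourcetype(s) for s in examples]

# Which versions do we have IIS logs for?
iis_versions = sorted(p.version for p in parsed if p.name == "IIS")
print(iis_versions)
```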
What approaches have you used?
Next time we’ll talk about a different approach to this question using the Splunk Common Information Model and how these two conventions could be integrated.
Check out “Understand and use the Common Information Model”.
Credit for these concepts goes to Adrian Hall, Vi Ly and John Folkers. Splunk has the most awesome technical resources.
----------------------------------------------------
Thanks!
David Clawson