App Design Patterns - Creating Indexes

One of the most common design anti-patterns seen in apps submitted to the Splunk App Certification program is defining indexes for the app's data at development time. This article reviews the issues with that design and recommends a design pattern that is more robust and maintainable.

Problem

Apps should be primarily concerned with acquiring data, transforming it into knowledge objects, and providing tools for users to act on that data. Index definitions exist to specify where and how data is stored and who can access it. When an app specifies index configurations and builds them into searches, it is mixing different concerns.  By mixing knowledge management and storage management the app loses some cohesion.  

Discussion

Apps should be environment agnostic

Splunk Enterprise indexes allow administrators to perform several critical tasks:

  1. Control the location, on disk, of index files.
  2. Manage data retention.
  3. Manage access control.

Most index definitions packaged with apps are some variation of the example below. By providing this configuration, we've chosen the file locations for indexed data. Using the $SPLUNK_DB variable is a reasonable default, but most administrators will still need to validate the setting for their environment.

[app_index]
homePath   = $SPLUNK_DB/app_index/db
coldPath   = $SPLUNK_DB/app_index/colddb
thawedPath = $SPLUNK_DB/app_index/thaweddb

Data retention for an index can be specified as a maximum size in megabytes or as an age in seconds, after which data is frozen. A developer could provide what they feel are reasonable defaults, but there are significant pitfalls in doing so. Without knowing how much data will arrive or how much storage the user has, these defaults could freeze data too early, or keep data that could have been frozen and fill up storage devices.

Access control is managed in authorize.conf, which can grant roles access to specific indexes. Without knowledge of the user's organizational structure, restricting access to data accurately is challenging, if not impossible.
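
As a rough illustration of what is at stake, an administrator's own settings might look like the sketch below. The setting names (frozenTimePeriodInSecs, maxTotalDataSizeMB, srchIndexesAllowed, srchIndexesDefault) are standard indexes.conf and authorize.conf settings. The values and the role name sales_analyst are assumptions chosen purely for illustration.

```ini
# indexes.conf (administrator-owned; all values are illustrative assumptions)
[app_index]
homePath   = $SPLUNK_DB/app_index/db
coldPath   = $SPLUNK_DB/app_index/colddb
thawedPath = $SPLUNK_DB/app_index/thaweddb
# Freeze events older than 90 days (90 * 86400 seconds)
frozenTimePeriodInSecs = 7776000
# Freeze the oldest data once the index grows past roughly 100 GB
maxTotalDataSizeMB = 100000

# authorize.conf (hypothetical role name)
[role_sales_analyst]
srchIndexesAllowed = app_index
srchIndexesDefault = app_index
```

A developer who does not know the target environment has no sound basis for choosing any of these values.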

Highly coupled

Many developers create indexes to increase the app's search performance and to isolate its data. Assuming that nothing else will be put into the index created by their app gives the developer assurances about their data. The app's dashboards and saved searches then look like this:

[sample_search]
# This is a hypothetical search that is looking for errors in data indexed by a mod input in the app
search = index=app_index earliest=-1d error

While this search is performant, it is brittle. Consider this scenario: A developer is writing an app to allow users to ingest data into Splunk from the developer's API. The user has a single corporate Splunk software instance and two departments, Sales and Marketing, that both subscribe to the developer's service with different accounts. The Splunk administrator configures two inputs with different credentials to get the data, and stores the data in two indexes, app_index_sales and app_index_marketing.

The data is imported just fine, but none of the searches or dashboards work, because they are all hardcoded to the index app_index that the developer provided in the app. To fix this, the administrator now needs to change every search, both in the saved searches and in the dashboards. Making these changes creates parallel copies of the original searches and dashboards in the local/ directory. Now, every time the developer releases a new version of the app, the customization must be done again. Ouch.
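
The administrator's side of this scenario might be sketched as follows. The modular input name my_input and the instance names are hypothetical, and the credential settings are omitted.

```ini
# local/inputs.conf as configured by the administrator (hypothetical names)
[my_input://sales]
index = app_index_sales

[my_input://marketing]
index = app_index_marketing
```

Neither stanza writes to app_index, so a search hardcoded to index=app_index returns nothing.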

Caveats

Performance

Despite the issues listed above, including indexes in a search does have one significant advantage: performance. By limiting the number of indexes covered by a search, you can get significantly better performance from the search itself. In this case the solution is not to define the index ourselves, but to give the administrator options. Our recommended solution is to use macros, which we will discuss in further detail later.

Also, remember that the most efficient way to accelerate a search is to bound it with earliest and latest times.
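
One way to bound a saved search, shown here as a sketch rather than a recommendation, is the pair of dispatch time settings in savedsearches.conf. The stanza reuses the hypothetical sample_search from earlier; the one-day window is an assumption.

```ini
# savedsearches.conf: same hypothetical search, time range set via dispatch settings
[sample_search]
search = index=app_index error
dispatch.earliest_time = -1d
dispatch.latest_time = now
```

Inline time modifiers such as earliest=-1d in the search string itself achieve a similar effect.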

Internal development

Developers creating apps that will only be used in their own Splunk software deployment, either on-premises or in the cloud, may choose to specify indexes. Their knowledge of the environment allows a certainty that is not available to a developer preparing to publish on Splunkbase. Additionally, we strongly encourage developers to keep all of their configuration under source control, so that index definitions are tracked and managed.

Even with increased information about the environment, we still recommend following the design pattern described in this article. It ensures that knowledge objects are abstracted from choices about how the data is physically stored on disk, and it increases the maintainability of the code in the event of infrastructure changes.

Structure

This solution substitutes the direct inclusion of indexes in searches with a macro that can be modified once by an administrator to reflect their environment.  By moving all of the index references to a macro we gain a single point of modification, abstract the index search out of the individual searches, and enable highly performant searches. The app developer does not define the index at design time, but instead provides the macro for an administrator to add an index reference later.

Refactoring to a solution

Last, we will discuss how to refactor an app with existing index definitions to the recommended pattern.

1. Remove index specification from inputs

Replace index specifications in data inputs with source type and/or event type specifications.   We want to ensure that we are setting metadata that can be used to reliably find the information later in our searches.  

The inputs.conf specification changes from this:

[my_input]
index=app_index
  
[my_input://instance]
api_endpoint=https://api.widgets.com/

To this:

[my_input]
sourcetype=widgets_api
  
[my_input://instance]
api_endpoint=https://api.widgets.com/

2. Create a macro to support limiting searches

Administrators may want to optimize search performance for their data by adding an index to the search. A macro allows them to do this in a single step. If you are refactoring your app, prefill the macro definition with the existing (legacy) index so that searches still find the historical data it contains. Don't forget to update your documentation to explain to administrators how to improve performance with this macro.

Note: Be careful to think about how you use the OR or AND operators in this macro to ensure you get the proper results back. Splunk uses an implicit AND between components of a search. The choice of AND or OR will also affect the placement of the macro in your search commands.

Sample macro for an app refactoring:

[macro_widgets_index]
definition = index=app_index OR
description = "This macro is used to limit searches for the widgets.com app."

If you are doing new development, you can provide an empty definition that the administrator can customize later.

Sample macro for a new app

[macro_widgets_index]
definition = ""
description = "This macro is used to limit searches for the widgets.com app."

An administrator can customize the macro to specify an index, restricting the scope of the search to increase performance. The biggest difference between this version and the refactoring version is that the search implicitly uses AND rather than the explicit OR used above.

Sample macro for performance

[macro_widgets_index]
definition = index=app_index
description = "This macro is used to specify an index for searches in the widgets.com app."
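
For the two-department scenario described earlier, an administrator's override might look like the sketch below. The index names come from that scenario. Placing the override in local/macros.conf and wrapping the OR in parentheses are assumptions about how an administrator might choose to write it.

```ini
# local/macros.conf (administrator override; a sketch, not a required form)
[macro_widgets_index]
# Parentheses make the grouping of the two indexes explicit
definition = (index=app_index_sales OR index=app_index_marketing)

# A search written as:
#   `macro_widgets_index` sourcetype=widgets_api error
# would then expand to:
#   (index=app_index_sales OR index=app_index_marketing) sourcetype=widgets_api error
```

Because the change lives in one macro, every search and dashboard that references it picks up both indexes at once.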

Warning: Splunk event types do not support macro expansion. If an app uses or depends on event types, and event types are critical to the performance or security management of data, index definitions will need to be manually added to each event type by administrators after they define indexes for data.
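
For example, an administrator who has created an index for the app's data might edit an event type by hand along these lines. The event type name widgets_errors is hypothetical, and the index and sourcetype are the ones used in this article's examples.

```ini
# local/eventtypes.conf (hypothetical event type; index added manually)
[widgets_errors]
search = index=app_index sourcetype=widgets_api error
```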

3. Update searches

Now that we have macros in place, we can update our searches to look for items either in our legacy index or by our newer sourcetype definition. The OR in the refactoring macro is what lets a single search match both. Our saved searches change from:


[sample_search]
# This is a hypothetical search that is looking for errors in data indexed by a mod input in the app
search = index=app_index earliest=-1d error

To:

[sample_search]
# This is a hypothetical search that is looking for errors in data indexed by our input.
search = (`macro_widgets_index` sourcetype=widgets_api) error earliest=-1d

4. Move indexes.conf

The last step of this refactoring is to remove the indexes.conf from the app.  Ideally in new development you will never create the file, but if it already exists you are going to need to help your users make some changes:

  1. Have your users move indexes.conf from the default folder to the local folder or to a system folder before they upgrade. This ensures that the index continues to be tracked and that its contents remain available.
  2. Remove the indexes.conf file from your default directory.
  3. Consider providing users with a sample copy of your old indexes.conf in your documentation, in case they didn't read the upgrade instructions and lost the index definition.
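
A sample to include in your documentation could be as simple as the legacy stanza itself, shown here with the index name used throughout this article. The file location in the comment is an assumption; substitute your app's actual directory name.

```ini
# $SPLUNK_HOME/etc/apps/<your_app>/local/indexes.conf
[app_index]
homePath   = $SPLUNK_DB/app_index/db
coldPath   = $SPLUNK_DB/app_index/colddb
thawedPath = $SPLUNK_DB/app_index/thaweddb
```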

Depending on your deployed base, you may want to implement these changes incrementally over several versions.

Conclusions

Indexes allow administrators to manage data storage locations, retention, and access control. Making those choices requires knowledge of the specific details of a Splunk software installation and should be left to the administrator. Developers should write environment-agnostic software that works in any environment. Adding index criteria to searches increases coupling and decreases cohesion by tying knowledge objects to their physical storage location. Developers can still gain the performance benefits of limiting a search by index by providing a macro that administrators can configure to match their index configurations. This macro abstracts the index out of the search and provides a single point of customization for the administrator, making the software perform better while remaining easy to maintain.

Posted by Andy Nortrup

New father, husband, Army veteran, and import to Seattle. 