Top 3 Best Practices for Configuring Splunk ITSI in a Large-Scale Environment

Part of the beauty of Splunk IT Service Intelligence (ITSI) is that it provides users with flexible models of their entities and services. Additionally, Splunk ITSI can scale to support monitoring of thousands of services and tens of thousands of entities.

This blog post provides a sample of best practices for configuring a large-scale Splunk ITSI deployment. It's NOT a complete list of Splunk ITSI configuration guidelines; check out the Splunk ITSI Documentation for more in-depth information about that.

Best Practice #1: Focus on the KPIs that matter

It's not good to have so many KPIs in a single service that you can barely keep track of them all. I’ve seen cases where a customer configured more than 50 KPIs in a single service. How do you effectively monitor and troubleshoot a service when that many KPIs are involved?

Part of the beauty of Splunk ITSI is that it makes it easy to focus on what matters in your environment. So spend time crafting and fostering the KPIs that you really care about and want to measure. You’ll save yourself time troubleshooting later.

So what is the recommended number of KPIs for a single service?
It’s best to have no more than 20 KPIs per individual service—more than enough to capture the key metrics you care about (like CPU, IO, disk free, and response time).
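
One way to audit existing services against this guideline is sketched below. It assumes you have already exported a mapping of service names to KPI titles; the `kpis_by_service` argument and the sample service names are hypothetical and are not part of any ITSI API.

```python
MAX_KPIS_PER_SERVICE = 20  # guideline from this post


def services_over_kpi_guideline(kpis_by_service, limit=MAX_KPIS_PER_SERVICE):
    """Return {service name: KPI count} for services above the guideline."""
    return {
        name: len(kpis)
        for name, kpis in kpis_by_service.items()
        if len(kpis) > limit
    }


# Hypothetical export: service name -> list of KPI titles.
sample = {
    "web-frontend": ["CPU", "IO", "Disk Free", "Response Time"],
    "legacy-billing": [f"KPI {i}" for i in range(52)],
}

for name, count in services_over_kpi_guideline(sample).items():
    print(f"{name}: {count} KPIs (guideline is {MAX_KPIS_PER_SERVICE})")
```

Services flagged this way are candidates for trimming down to the KPIs you actually act on.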

Best Practice #2: Use entity rules to filter to the entities you care about within your service

Entity rules within a service ensure that you’re dynamically filtering to the entities that matter in your environment. Make the rules specific enough that they match only the entities you care about for that service. If a service's entity rules match tens of thousands of entities, it becomes difficult to monitor the entities of interest, and internal operations can slow down.

Recommendation:
Splunk ITSI does not limit the number of matching entities for a service. However, be mindful of the performance implications when a single service matches a large number of entities.

Best Practice #3: Use shared base searches to power multiple KPIs

In Splunk ITSI, shared base searches are recommended to minimize the overall search load at the Splunk Enterprise level.

When configuring a shared base search, use the following guidelines to decide how many KPIs a single shared base search should power:

  1. Go to the search inspector and check the search execution stats. If the shared base search is scheduled to run every minute but the actual search execution takes longer than a minute, the next scheduled search will be skipped. This will cause delayed KPI alert values and health score results, and means you have too many KPIs tied to a single shared base search. Try reducing the number of KPIs.

  2. A single shared base search produces more rows of results when more services and entities are involved. Splunk ITSI transports the processed search results to the ITSI summary index (itsi_summary) through alert_actions.conf. By default, the system transports only 50,000 rows of results to the summary index. This may not be enough in a large-scale environment, and the shortfall can show up as unexplained “N/A” results on your service or KPI tiles. If that happens, increase the value of the following setting in limits.conf:

    max_action_results = <integer>
    * The maximum number of results to load when triggering an alert action.
    * Default: 50000

How do I calculate the limit?
limit = (number of KPIs * number of entities for each service) + (number of services * 2)

Example: A shared base search is powering 5,000 KPIs across 500 services. Each service is matching 10 entities.

limit = (5000 * 10) + (500 * 2) = 51000
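
The formula can also be evaluated in a few lines of code. The following is a minimal sketch; the function name and parameter names are illustrative only, and the default of 50,000 is the one quoted earlier in this post.

```python
DEFAULT_MAX_ACTION_RESULTS = 50_000  # default quoted in this post


def required_max_action_results(num_kpis, entities_per_service, num_services):
    """Apply the post's formula: (KPIs * entities per service) + (services * 2)."""
    return num_kpis * entities_per_service + num_services * 2


# Numbers from the example: 5,000 KPIs, 10 entities per service, 500 services.
needed = required_max_action_results(5000, 10, 500)
print(needed)  # 51000

if needed > DEFAULT_MAX_ACTION_RESULTS:
    print("Default max_action_results is too low for this base search")
```

In this example the estimate exceeds the default by 1,000 rows, so the default would truncate the results.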

How do I know how many KPIs are associated with a single shared base search?
Starting with Splunk ITSI version 4.0.x, the ITSI Health Check dashboard provides these statistics.

Warning: Increasing this limit to a very large number can have a negative impact on the overall system, as more memory must be allocated to support the increased number of search results.

Recommendation:
Again, Splunk ITSI does not limit the number of KPIs that can be powered by a single shared base search. Use the above recommendations to decide how many KPIs should share a single base search in your environment.
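
The first guideline above, comparing search run time against the schedule interval, can be sketched as a simple check. This assumes you have copied recent run durations (in seconds) from the search inspector by hand; the function name and the sample durations are hypothetical.

```python
def find_overruns(run_durations_s, interval_s=60.0):
    """Return the run durations that exceed the schedule interval."""
    return [d for d in run_durations_s if d > interval_s]


# Hypothetical run times, in seconds, for a base search scheduled every minute.
durations = [41.2, 58.9, 63.5, 72.0, 55.1]
overruns = find_overruns(durations)

if overruns:
    print(
        f"{len(overruns)} of {len(durations)} runs exceeded the interval; "
        "consider moving some KPIs to another base search"
    )
```

Any run that takes longer than the interval risks causing the next scheduled run to be skipped, so even occasional overruns are worth investigating.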

Conclusion

Splunk IT Service Intelligence provides actionable insight into the performance and behavior of your IT operations, making it easy to effectively monitor your environment and provide value for your business. It allows you to see across silos and services for easier collaboration and real-time information about your IT and business health. By leveraging the best practices and recommendations in this blog post, you can successfully configure Splunk ITSI in a large-scale environment to meet the demands of your business.

This blog post is a collaborative work by Kan Wu, Keegan Dubbs, and Elizabeth Snyder.

Posted by Kan Wu

Kan has more than 15 years of experience in building distributed systems. He loves what he does and hopes you love what he builds.
