Adaptive thresholding uses machine learning techniques to analyze historical data, sometimes displayed as a histogram, for patterns that help define the normal state of your environment. You configure threshold values that determine the current status of each KPI, which in turn drives more meaningful alerts. The simplest scheme is binary thresholding, which yields an either/or outcome: a KPI is either normal or it is not. Most thresholding, however, is applied on a graded scale with several severity levels.
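To make the idea concrete, here is a minimal sketch of deriving a threshold from historical data and applying it as a binary (normal/critical) check. It assumes a simple mean-plus-standard-deviations rule. The function names, the sample data, and the multiplier of 3 are illustrative assumptions, not ITSI's implementation.

```python
from statistics import mean, stdev


def adaptive_threshold(history, num_stdevs=3.0):
    """Return an upper bound for 'normal', derived from historical values.

    Assumption: 'normal' means within num_stdevs sample standard
    deviations above the historical mean.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    return mean(history) + num_stdevs * stdev(history)


def kpi_status(value, threshold):
    """Binary thresholding: a value is either normal or critical."""
    return "critical" if value > threshold else "normal"


# Hypothetical response times (ms) observed during the same hour on past days.
history = [100, 102, 98, 101, 99, 100, 103, 97]
threshold = adaptive_threshold(history)

print(threshold)                   # upper bound learned from history
print(kpi_status(101, threshold))  # close to the historical pattern
print(kpi_status(180, threshold))  # far outside the historical pattern
```

Because the threshold is recomputed from recent history, it follows the environment's own definition of normal instead of relying on a fixed number chosen by hand.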
To better understand how this process works, you need to familiarize yourself with a few ITSI concepts — like service health scores, KPIs, and dependent services (sometimes called subservices):
- KPI (key performance indicator): a benchmark of the performance of a particular service.
- Service health score: a weighted average of the severity values of a service's KPIs and its subservices.
- Subservice: any service on which the configured service depends.
These concepts form a hierarchy. Each service in your environment receives a health score that is calculated from the status of the KPIs and subservices you define for that service. Every KPI requires a threshold configuration. ITSI, which continuously monitors KPI statuses and health scores, allows for six different severities: normal, critical, high, medium, low and info (informational). When KPI severity for a service reaches a defined level in tandem with changes in that service’s health score, it indicates a potential problem and triggers an alert.
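The relationship between severities, health scores, and alerts can be sketched roughly as follows. The numeric score assigned to each severity, the weights, and the alert rule (critical KPI plus a drop of at least 20 points) are hypothetical values chosen for illustration; ITSI's actual calculation may differ.

```python
# Hypothetical severity scores (higher is healthier); not ITSI's actual values.
SEVERITY_SCORE = {"normal": 100, "low": 70, "medium": 50, "high": 30, "critical": 0}


def health_score(components):
    """Weighted average over (severity, weight) pairs.

    Each pair represents one KPI or subservice. Entries with the
    'info' severity are skipped so they do not affect the score.
    """
    scored = [(SEVERITY_SCORE[sev], w) for sev, w in components if sev != "info"]
    total_weight = sum(w for _, w in scored)
    if total_weight == 0:
        return 100.0  # assumption: nothing to score counts as healthy
    return sum(score * w for score, w in scored) / total_weight


def should_alert(kpi_severity, previous_score, current_score, min_drop=20):
    """Alert only when a KPI is critical AND the health score dropped."""
    return kpi_severity == "critical" and (previous_score - current_score) >= min_drop


before = health_score([("normal", 5), ("normal", 3), ("info", 1)])
after = health_score([("critical", 5), ("normal", 3), ("info", 1)])

print(before, after)
print(should_alert("critical", before, after))
```

Requiring both conditions reflects the "in tandem" rule described above: a single critical KPI on a low-weight component, with little change in the health score, would not raise an alert in this sketch.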
When configuring thresholds and alerts, it’s best to keep thresholding simple. Below are a few best practices:
- Determine which severities you’ll use: While ITSI allows six severity levels, you don’t have to — nor should you — use all of them. Consider using just “normal” and “critical” severities until you feel you have a handle on adaptive thresholding techniques.
- Keep severity definitions consistent: You need to decide what each severity means for your organization and ensure that every KPI is thresholded to the same definitions. (“Critical,” for example, could mean a KPI status will immediately trigger an alert, whereas “high” could indicate a KPI is just outside normal boundaries.) Clearly defined severities help ensure consistent KPI thresholding across different teams and make alert generation and remediation processes more manageable as your ITSI efforts scale.
- Don’t threshold every KPI: You will have KPIs that you won’t want to threshold because:
  - They’re being monitored by other tools.
  - They don’t indicate problems.
  - They don’t produce consistently reliable results.
  - You just don’t know how to threshold them.
In these cases, you can choose the “info” severity for all KPI results without impacting the service’s health score.
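These practices can be expressed as a small configuration check. The sketch below assumes a team has agreed to use only the "normal" and "critical" severities. The KPI names, limits, and helper functions are hypothetical and do not correspond to any ITSI API.

```python
# Severities the organization has agreed to use (assumption for this sketch).
ALLOWED_SEVERITIES = {"normal", "critical"}

# Hypothetical KPI names and limits. None means "do not threshold this KPI".
KPI_THRESHOLDS = {
    "cpu_utilization_pct": {"critical": 90},
    "error_rate_pct": {"critical": 5},
    "queue_depth": None,
}


def validate(config):
    """Return the KPIs whose thresholds use severities outside the agreed set."""
    return [
        name
        for name, levels in config.items()
        if levels is not None and not set(levels) <= ALLOWED_SEVERITIES
    ]


def classify(name, value, config=KPI_THRESHOLDS):
    """Classify a KPI value; unthresholded KPIs always report 'info'."""
    levels = config[name]
    if levels is None:
        return "info"
    if value >= levels["critical"]:
        return "critical"
    return "normal"


print(validate(KPI_THRESHOLDS))
print(classify("cpu_utilization_pct", 95))
print(classify("error_rate_pct", 1))
print(classify("queue_depth", 10000))
```

Keeping the allowed severities in one place makes it easier to check that different teams threshold their KPIs against the same definitions.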