Wouldn’t it be great to peek into the future and find answers to the problems that you’re facing today? This may sound like science fiction, but many companies currently possess this capability, and they are creating strategies around it to strengthen their monitoring and analytical capabilities.
One way to do this is time series forecasting, a statistical method. You can take advantage of its insights, using techniques like anomaly detection, to gain:
- A deeper understanding of your systems
- The ability to alert as soon as potential problems arise
What is time series forecasting?
Time series forecasting is a way to forecast or predict behaviors based on historical, timestamped data.
For example, take a look at the time series data below. The data is charted and binned at 1-hour granularity. It shows a cyclical pattern: traffic begins ramping up at 8PM, peaks at midnight, and then dwindles away entirely by 9AM.
As you can see, the volume of traffic is not identical from day to day. For example, Mondays usually have much less traffic than every other day except Tuesday, which appears to be flat. This means that you must account for both the hour of the day and the day of the week in order to make a proper forecast.
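One simple way to account for both factors is to average historical values per (day of week, hour) slot and use that average as the forecast. The sketch below is a minimal illustration under that assumption, not the method used to produce the chart; the function names and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def build_baseline(samples):
    """Average historical values per (weekday, hour) slot.

    samples: iterable of (datetime, value) pairs, one per hourly bin.
    """
    slots = defaultdict(list)
    for ts, value in samples:
        slots[(ts.weekday(), ts.hour)].append(value)
    return {slot: mean(values) for slot, values in slots.items()}


def forecast(baseline, ts):
    """Expected value for a timestamp, or None if its slot was never observed."""
    return baseline.get((ts.weekday(), ts.hour))


# Hypothetical history: two weeks of hourly bins, busy from 8PM to 9AM,
# with Mondays carrying half the traffic of other days.
start = datetime(2024, 1, 1)  # a Monday
history = []
for h in range(14 * 24):
    ts = start + timedelta(hours=h)
    base = 100 if ts.hour >= 20 or ts.hour < 9 else 10
    history.append((ts, base * (0.5 if ts.weekday() == 0 else 1.0)))

baseline = build_baseline(history)
next_monday_midnight = datetime(2024, 1, 15, 0)
next_tuesday_midnight = datetime(2024, 1, 16, 0)
print(forecast(baseline, next_monday_midnight))
print(forecast(baseline, next_tuesday_midnight))
```

Keying the baseline on both weekday and hour keeps a quiet Monday from dragging down the expectation for the rest of the week.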
Benefits of time series forecasting
Time series forecasting has many advantages, and the greatest one is the ability to forecast expected behavior at any given point in the future.
For example, one of your teams may be tasked with budgeting for storage needed for the next six months. This may not seem too hard if you’re dealing with a handful of servers, but what happens when you have tens of thousands of hosts with multiple volumes?
Another advantage of time series forecasting is the ability to forecast customer traffic over the coming days and then alert whenever the actual traffic deviates from the expected volume. This can be used to quickly identify the onset of a denial-of-service attack or a drop in traffic levels, either of which could point to potential problems within your system.
Finding a pattern
It may be difficult to see a usable pattern in the image above, so let’s overlay the data to bring out a pattern that’s more apparent to the human eye. The timewrap command is a perfect solution for this since it will overlay each day with a 24-hour time period on the x-axis.
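To make the overlay idea concrete outside of Splunk, the sketch below reshapes a flat series of hourly samples into one 24-slot row per calendar day, so each day can be plotted against the same 0 to 23 hour axis. It only approximates what an overlay does and is not the timewrap implementation; `overlay_by_day` and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def overlay_by_day(samples):
    """Group hourly samples into one 24-slot series per calendar day.

    Returns {date: [value_at_hour_0, ..., value_at_hour_23]},
    with None in any hour that has no sample.
    """
    days = defaultdict(lambda: [None] * 24)
    for ts, value in samples:
        days[ts.date()][ts.hour] = value
    return dict(days)


# Hypothetical data: three days of hourly values.
start = datetime(2024, 1, 1)
samples = [
    (start + timedelta(days=d, hours=h), d * 100 + h)
    for d in range(3)
    for h in range(24)
]

overlaid = overlay_by_day(samples)
for day, row in overlaid.items():
    print(day, row[:4])  # first four hours of each overlaid day
```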
Using anomaly detection on time series forecasts
Time series forecasting will allow you to peek into the future and know what behavior to expect at any point in time. This will enable you to:
- Apply dynamic (adaptive) thresholds to your predicted behavioral trends and overlay the present and past values. If the present value falls outside of the established thresholds, then you can consider it an anomaly.
- Score the severity of the anomaly and compare the severity at past points in time in order to determine how abnormal that particular data point is.
(Read our anomaly detection introduction.)
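As a rough illustration of both points, the sketch below derives limits from the values seen at the same hour on previous days and scores a new observation by how many standard deviations it sits from the historical mean. The mean plus or minus k standard deviations rule is an assumption chosen for simplicity, and the function names and numbers are hypothetical.

```python
from statistics import mean, stdev


def dynamic_limits(history, k=3.0):
    """Return (lower, upper) limits as mean +/- k sample standard deviations."""
    center = mean(history)
    spread = stdev(history)
    return center - k * spread, center + k * spread


def severity(history, observed):
    """Distance of observed from the historical mean, in standard deviations."""
    center = mean(history)
    spread = stdev(history)
    if spread == 0:
        return 0.0 if observed == center else float("inf")
    return abs(observed - center) / spread


# Hypothetical request counts seen at the same hour on previous days.
same_hour_history = [98, 102, 100, 97, 103]
lower, upper = dynamic_limits(same_hour_history)

observed = 140
is_anomaly = not (lower <= observed <= upper)
print(is_anomaly, round(severity(same_hour_history, observed), 2))
```

Because the limits are recomputed per time slot, a value that is normal at midnight can still be flagged at noon.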
How to build dynamic limits
Hard thresholds will not work with cyclical data, since the values vary relative to the time of day and day of week, which could lead to:
- Lots of false alerts
- Useless noise that people will quickly learn to ignore
Instead, you can determine how to construct dynamic limits using the following p Chart formula:
“Introduction to Statistical Quality Control” - Douglas C. Montgomery
In the formula above, the UCL represents the upper control limit while the LCL represents the lower control limit.
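For reference, the three-sigma p chart limits are commonly written as follows. The symbols are assumptions taken from the usual textbook form: \bar{p} is the average proportion across samples, n is the sample size, and CL is the center line.

```latex
\[
\mathrm{UCL} = \bar{p} + 3\sqrt{\frac{\bar{p}\,(1-\bar{p})}{n}}, \qquad
\mathrm{CL} = \bar{p}, \qquad
\mathrm{LCL} = \bar{p} - 3\sqrt{\frac{\bar{p}\,(1-\bar{p})}{n}}
\]
```

A negative LCL is usually set to 0, since a proportion cannot fall below zero.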
You’ll need to establish your standard baseline before you can calculate these limits. The standard baseline can be developed from the output of the timewrap command that we discussed above, which we used to overlay daily values on top of one another.
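Once a baseline exists, the limits are a short calculation. The sketch below assumes the monitored metric is a proportion (here, a hypothetical fraction of failed requests at the same hour across overlaid days) and applies the standard three-sigma p chart limits. The sample size and data are invented for illustration.

```python
from math import sqrt


def p_chart_limits(p_bar, n, k=3.0):
    """Control limits p_bar +/- k * sqrt(p_bar * (1 - p_bar) / n).

    p_bar: baseline proportion, n: sample size per bin.
    """
    half_width = k * sqrt(p_bar * (1.0 - p_bar) / n)
    lcl = max(0.0, p_bar - half_width)  # a proportion cannot go below 0
    ucl = min(1.0, p_bar + half_width)  # or above 1
    return lcl, ucl


# Hypothetical baseline: failure rate at 9PM on each overlaid day.
failure_rates_9pm = [0.04, 0.05, 0.06, 0.05]
requests_per_bin = 400  # assumed sample size n for each hourly bin

p_bar = sum(failure_rates_9pm) / len(failure_rates_9pm)
lcl, ucl = p_chart_limits(p_bar, requests_per_bin)
print(round(lcl, 4), round(ucl, 4))
```

Repeating this per hour of the day yields limits that rise and fall with the cycle instead of a single hard threshold.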
Using a time series forecast to predict disk usage
Predicting future disk usage is another good use case for time series forecasting. You can plot out the disk usage per day and leverage the streamstats command to identify the previous day's value.
If you have enough data points, you will also be able to identify the slope of the usage trend. You can use this to predict the point where the trend line will intersect with the total capacity and plot that point on the x-axis. Since the x-axis represents time, this will give you a good idea of how long you have before a server runs out of disk space.
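The slope and intersection can be estimated with an ordinary least-squares line through the daily usage values. The sketch below assumes one sample per day and roughly linear growth; the function names and figures are hypothetical.

```python
def fit_line(ys):
    """Least-squares slope and intercept for ys sampled at x = 0, 1, 2, ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def days_until_full(daily_used_gb, capacity_gb):
    """Days after the last sample until the fitted line reaches capacity.

    Returns None when usage is flat or shrinking, since the line never
    intersects the capacity. Needs at least two samples.
    """
    slope, intercept = fit_line(daily_used_gb)
    if slope <= 0:
        return None
    day_full = (capacity_gb - intercept) / slope
    return max(0.0, day_full - (len(daily_used_gb) - 1))


# Hypothetical volume: five daily readings in GB on a 1000 GB disk.
usage = [500, 510, 520, 530, 540]
print(days_until_full(usage, 1000))
```

Running this per volume across all hosts turns the six-month storage budget into a sorted list of the disks that will fill up first.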
Time series forecasting is statistics at work
Time series forecasting is not magic; it’s a statistical technique that uses historical, timestamped data to predict how the data will likely behave at some point in the future.
The original version of this article was written by Steve Koelpin. Steve is a former Splunk professional services consultant and 5x Splunk Trust MVP. He specializes in Splunk IT Service Intelligence, Splunk Machine Learning Toolkit, and general Splunk development. While not behind the keyboard, he is best known as dad.
This posting does not necessarily represent Splunk's position, strategies or opinion.