Monitoring ICS with Splunk: SCADA, Historians, and Alarms, Oh My!

As many of our customers already know, Splunk can be used to solve thousands of different use cases, spanning IT operations, security and compliance, and monitoring of the systems that run much of the world’s critical infrastructure. We're continually looking for new ways to help our customers solve their challenges, and we're proud to announce the Splunk Essentials for Industrial Control Systems (ICS) Monitoring and Diagnostics.

What is ICS?

Industrial Control Systems (ICS), sometimes called SCADA outside of the industry, are responsible for keeping critical infrastructure such as electric grids, oil & gas refineries, wastewater treatment plants and nuclear facilities running safely and continuously. Manufacturing and transportation management systems also rely on ICS to produce their products, reduce waste and optimize their operations. Much of the world's production of goods and services relies on ICS in some form.

What Are the Primary Concerns for ICS Operators?

One of the key things to understand about ICS is that one concern stands above all others: SAFETY. An ICS must not injure people, damage the environment, threaten critical infrastructure or damage its own equipment.

In Splunk teams' conversations with our customers, three common areas of focus have stood out:

  1. Keeping systems running and reducing downtime
  2. Protecting ICS from cybersecurity threats
  3. Optimizing their processes to reduce waste in terms of time, maintenance or product

What Concern Does This Essential Focus On?

The primary focus of the Splunk Essentials for ICS Monitoring and Diagnostics is monitoring ICS to reduce downtime. Keeping systems running means operators know what is happening at all times and, as a result, can make good decisions about protecting people and the environment while keeping the business generating revenue. Downtime must be kept to a minimum, and the keys to doing that are proactively identifying issues, diagnosing problems, and responding effectively and efficiently. These are all areas where Splunk can be, and already is being, used by our customers.

What is a Splunk Essential and What Categories of Use Cases are Covered?

Splunk Essentials provide a series of use cases that are common across industries and technologies. They're aimed at helping our customers understand ways in which Splunk can help them solve their problems, and at giving them a “head start” on solving those problems.

In the Splunk Essentials for ICS Monitoring and Diagnostics, the following categories of use cases are covered:

Alarms and Event Management
ICS are designed to monitor and operate critical processes, and they generate alarms and events based on process conditions: for example, when a critical piece of equipment shuts down, when temperatures move outside normal operating ranges, or when product quality is low. Because these systems generate a large number of alarms, the goal of alarm and event management is to ensure that operators receive only important alarms and are not inundated with nuisance alarms. Splunk’s built-in analytics and investigative capabilities can be applied to alarm data to help companies meet their alarm management compliance requirements and keep operators from being bombarded with unnecessary alarms.
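To make alarm flood and nuisance alarm analysis more concrete, here is a minimal Python sketch. It is not Splunk SPL and not the searches shipped in the Essentials app; the `Alarm` record, the tag names, the fixed ten-minute windows and the threshold of more than 10 alarms per window are all illustrative assumptions, so substitute whatever limits your own alarm management practice defines.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alarm:
    # Hypothetical alarm record: when it fired and which tag raised it.
    timestamp: datetime
    tag: str


def flood_windows(alarms, window=timedelta(minutes=10), threshold=10):
    """Return the start of each fixed window holding more than `threshold` alarms."""
    if not alarms:
        return []
    start = min(a.timestamp for a in alarms)
    # Integer index of the window each alarm falls into, counted per window.
    counts = Counter((a.timestamp - start) // window for a in alarms)
    return [start + i * window for i, n in sorted(counts.items()) if n > threshold]


def top_nuisance_tags(alarms, n=3):
    """Rank tags by how often they fire; frequent repeaters are nuisance candidates."""
    return Counter(a.tag for a in alarms).most_common(n)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 8, 0)
    # Twelve alarms from one tag within six minutes, then one unrelated alarm.
    alarms = [Alarm(t0 + timedelta(seconds=30 * i), "PUMP_101_HI") for i in range(12)]
    alarms.append(Alarm(t0 + timedelta(hours=1), "TANK_3_LO"))
    print(flood_windows(alarms))         # [datetime.datetime(2024, 1, 1, 8, 0)]
    print(top_nuisance_tags(alarms, 1))  # [('PUMP_101_HI', 12)]
```

Fixed windows keep the sketch short but can split a flood that straddles a window boundary; a sliding window would catch those cases at the cost of a little more bookkeeping.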

Each ICS is composed of many working parts, and redundancy is a key component. As a result, it's important that redundant systems run the same versions of software and services with the same configuration. In practice, small changes are often not applied across all machines, and inconsistencies accumulate over time, potentially causing downtime or failure. Splunk can help by alerting operations staff when program versions or configurations are not consistent across systems, reducing the risk of downtime.
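A consistency check like this boils down to comparing each setting across redundant hosts and flagging the odd ones out. The Python sketch below assumes a hypothetical inventory that has already been collected (host name mapped to settings such as a software version or a configuration file hash); the host and setting names are made up, and how the data reaches Splunk is out of scope here.

```python
from collections import Counter


def find_drift(inventory):
    """Flag hosts whose value for a setting differs from the majority.

    inventory maps host -> {setting: value}, e.g. software versions or config
    file hashes. Returns {setting: {host: value}} for every disagreeing host.
    A setting missing on a host is treated as the value None.
    """
    settings = {key for config in inventory.values() for key in config}
    drift = {}
    for key in sorted(settings):
        values = {host: config.get(key) for host, config in inventory.items()}
        # The most common value across hosts is taken as the reference.
        majority, _ = Counter(values.values()).most_common(1)[0]
        outliers = {h: v for h, v in values.items() if v != majority}
        if outliers:
            drift[key] = outliers
    return drift


if __name__ == "__main__":
    inventory = {  # hypothetical hosts and settings
        "scada-a": {"hmi_version": "7.2.1", "poll_interval_s": 2},
        "scada-b": {"hmi_version": "7.2.1", "poll_interval_s": 2},
        "scada-c": {"hmi_version": "7.1.9", "poll_interval_s": 2},
    }
    print(find_drift(inventory))  # {'hmi_version': {'scada-c': '7.1.9'}}
```

Treating the majority value as the reference suits larger groups of servers; for a simple primary/backup pair, any entry in the result means the two disagree, and which host gets listed just reflects dictionary order.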

Database and Data Replication
ICS operators are responsible for maintaining critical services like energy, water, and transportation, which need to be available and running at all times. When catastrophic events such as floods, earthquakes or hurricanes strike, operators may need to switch operations to different physical sites or machines at a moment’s notice, which depends on databases and data being correctly replicated to those sites. Splunk helps operators verify that this failover capability is working properly and alerts them when problems might exist, so that operations are ready at all times.
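One simple readiness signal is replication lag: how far each standby's last applied change trails the primary. The Python sketch below assumes you already have those timestamps from your database or historian; the 30-second tolerance and the standby names are hypothetical, not values from the Essentials app.

```python
from datetime import datetime, timedelta


def replication_issues(primary_ts, standby_ts, max_lag=timedelta(seconds=30)):
    """List standbys that may not be ready to take over from the primary.

    primary_ts: time of the latest change committed on the primary.
    standby_ts: maps standby name -> time of the latest change it has applied,
                or None if it has reported no replication status.
    Returns (standby, reason) tuples, sorted by standby name.
    """
    issues = []
    for name, ts in sorted(standby_ts.items()):
        if ts is None:
            issues.append((name, "no replication status reported"))
        elif primary_ts - ts > max_lag:
            issues.append((name, f"lagging by {primary_ts - ts}"))
    return issues


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    status = {  # hypothetical standby databases
        "backup-site-db": now - timedelta(seconds=5),
        "dr-site-db": now - timedelta(minutes=4),
        "lab-db": None,
    }
    for issue in replication_issues(now, status):
        print(issue)
    # ('dr-site-db', 'lagging by 0:04:00')
    # ('lab-db', 'no replication status reported')
```

Reporting a missing status separately from excessive lag matters here, because a standby that has gone silent is arguably a bigger failover risk than one that is merely behind.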

Monitoring System Health
For ICS operators, uptime is critical not only for safety but also for generating revenue. While some ICS provide limited alerting for specific conditions, unknown and untested problems can still occur on a customer’s system. Alerts may also be vague and lack the detail support staff need to respond quickly. Using Splunk Enterprise’s investigation and custom alerting capabilities, operators can understand how often problems occur and what is causing system failures, and can provide meaningful context around those failures.
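Answering "how often" and "why" can start with a per-component summary of failure events. This Python sketch assumes the events have already been extracted as hypothetical `(timestamp, component, cause)` tuples; it counts failures per component, computes the mean interval between them and tallies their causes. The component names and causes are illustrative.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def failure_summary(events):
    """Summarize failure events per component.

    events: iterable of (timestamp, component, cause) tuples.
    Returns {component: {"count": int, "mean_interval": timedelta or None,
                         "causes": {cause: int}}}; mean_interval is None
    when a component has failed only once.
    """
    by_component = defaultdict(list)
    for ts, component, cause in events:
        by_component[component].append((ts, cause))

    summary = {}
    for component, items in by_component.items():
        items.sort()  # chronological order
        times = [ts for ts, _ in items]
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        summary[component] = {
            "count": len(items),
            "mean_interval": sum(gaps, timedelta()) / len(gaps) if gaps else None,
            "causes": dict(Counter(cause for _, cause in items)),
        }
    return summary


if __name__ == "__main__":
    t0 = datetime(2024, 3, 1)
    events = [  # hypothetical components and causes
        (t0, "RTU-7", "comm timeout"),
        (t0 + timedelta(days=2), "RTU-7", "comm timeout"),
        (t0 + timedelta(days=6), "RTU-7", "power supply"),
        (t0 + timedelta(days=3), "historian", "disk full"),
    ]
    rtu = failure_summary(events)["RTU-7"]
    print(rtu["count"], rtu["mean_interval"], rtu["causes"])
    # 3 3 days, 0:00:00 {'comm timeout': 2, 'power supply': 1}
```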

Troubleshooting and Investigation
ICS operators depend on several other technologies to operate and manage various aspects of their systems. Problems with individual networks, communication circuits and firewalls can cause a loss of connectivity to the field devices and sites that need to be monitored. Splunk allows operators to focus on troublesome communication paths and understand issues in other technologies that may be affecting connectivity.
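To find the most troublesome communication paths, one approach is to pair each "down" event with the next "up" event on the same path and total the outage time. The Python sketch below assumes a hypothetical `(timestamp, path, state)` event format and made-up path names; repeated "down" events while a path is already down are ignored, and outages still open at `end` are counted up to that time.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def outage_report(events, end):
    """Total downtime and outage count per communication path.

    events: (timestamp, path, state) tuples with state "down" or "up".
    end: time at which still-open outages are closed off.
    Returns [(path, outages, downtime)] sorted by downtime, worst first.
    """
    down_since = {}
    outages = defaultdict(int)
    downtime = defaultdict(timedelta)  # missing paths start at zero downtime
    for ts, path, state in sorted(events):
        if state == "down" and path not in down_since:
            down_since[path] = ts
            outages[path] += 1
        elif state == "up" and path in down_since:
            downtime[path] += ts - down_since.pop(path)
    for path, since in down_since.items():
        downtime[path] += end - since
    return sorted(((p, outages[p], downtime[p]) for p in outages),
                  key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    t0 = datetime(2024, 5, 1, 0, 0)
    events = [  # hypothetical paths
        (t0, "substation-12/radio", "down"),
        (t0 + timedelta(minutes=20), "substation-12/radio", "up"),
        (t0 + timedelta(hours=1), "pump-station-4/fiber", "down"),
        (t0 + timedelta(hours=1, minutes=2), "pump-station-4/fiber", "up"),
        (t0 + timedelta(hours=2), "substation-12/radio", "down"),
    ]
    for path, count, total in outage_report(events, end=t0 + timedelta(hours=3)):
        print(path, count, total)
```

In this example the radio path to substation 12 ranks first with two outages totalling 80 minutes, while the fiber path has a single two-minute drop.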

What's Next?

Interested in seeing what Splunk can do? Download the Splunk Essentials for ICS Monitoring and Diagnostics from Splunkbase.

Posted by Chris Duffey