“Resilience” is hip, but it is not new. In fact, its first known use, by Sir Francis Bacon in Sylva sylvarum; or, A naturall historie, dates back to 1626. Today, the Oxford English Dictionary defines resilience as “the quality or fact of being able to recover quickly or easily from, or resist being affected by, a misfortune, shock, illness, etc.; robustness; adaptability.”
U.S. government agencies, such as the Department of Homeland Security, and industry alike, including Splunk, focus on building resilience given our critical dependence on Information and Communications Technology (ICT) and the pervasiveness of disruptions caused by bad actors, glitches, or natural (or manmade) disasters.
Director of the Cybersecurity and Infrastructure Security Agency (CISA) Jen Easterly recently called out Ukraine’s approach to cybersecurity. In an August 9 blog, she writes, “Ukraine has demonstrated an impressive ability to quickly respond to, and effectively restore its critical infrastructure, despite facing barbaric kinetic attacks. It is critical for the United States to take inspiration from Ukraine's successes and proactively fortify its defenses and improve its response and recovery mechanisms.”
There are several models for building resilience. Most align with Easterly’s model discussed in that same blog:
First, organizations must identify their most critical functions and assets, define dependencies that enable the continuity of these functions, and consider the full range of threats that could undermine functional continuity.
Second, organizations must perform dedicated resilience planning, determining the maximum downtime acceptable for customers, developing recovery plans to regain functional capabilities within the maximum downtime, and testing those plans under real-life conditions.
Finally, organizations must be prepared to regularly adapt to changing conditions and threats. This starts with fostering a culture of continuous improvement based on lessons learned and evolving cross-sector risks.
Each step is critical, but the foundation of success is measuring and monitoring performance and the time it takes to recover or rebound. Businesses and governments alike need telemetry to understand the status of network operations and identify anomalous activity. This capability encompasses more than detecting security issues. It also includes monitoring for, and telling apart, technical glitches, such as hardware or software failure, and physical disruptions, such as weather or an explosion.
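To make the measurement idea concrete, here is a minimal sketch in Python of how time to detect and time to recover could be computed and checked against a maximum acceptable downtime. Everything in it is an assumption for illustration: the `Incident` record, the `summarize` function, and the sample timestamps are hypothetical, and real figures would come from an organization's own telemetry. Time to recover is measured here from the start of the disruption, not from the moment of detection.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical record shape; real telemetry would supply these timestamps.
    name: str
    started: datetime    # when the disruption began
    detected: datetime   # when monitoring flagged it
    recovered: datetime  # when the function was restored


def mean_minutes(deltas):
    """Average a series of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60


def summarize(incidents, max_downtime):
    """Report mean time to detect, mean time to recover, and any incident
    whose total downtime exceeded the acceptable maximum."""
    return {
        "mean_time_to_detect_min": mean_minutes(i.detected - i.started for i in incidents),
        "mean_time_to_recover_min": mean_minutes(i.recovered - i.started for i in incidents),
        "exceeded_max_downtime": [
            i.name for i in incidents if i.recovered - i.started > max_downtime
        ],
    }


# Illustrative data only: one technical glitch and one security event.
incidents = [
    Incident("disk failure", datetime(2023, 9, 1, 9, 0),
             datetime(2023, 9, 1, 9, 10), datetime(2023, 9, 1, 10, 0)),
    Incident("phishing intrusion", datetime(2023, 9, 2, 14, 0),
             datetime(2023, 9, 2, 14, 30), datetime(2023, 9, 2, 17, 0)),
]

report = summarize(incidents, max_downtime=timedelta(hours=2))
print(report)
```

With the sample data, the second incident is flagged because its three hours of downtime exceed the two-hour limit. Tracking these numbers across repeated exercises is one way to see whether recovery plans are actually improving.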
A critical gap is the lack of exercises within enterprises to test resilience capabilities. In 1997, the U.S. government sponsored an Eligible Receiver exercise designed to test its ability to identify, detect, and respond to a series of digital and physical infrastructure events. The results were eye-opening. There was widespread confusion as NSA’s red team had little difficulty disrupting systems. Given enterprise dependence on digital infrastructure, resilience exercises are warranted. These need not be as complex as past USG exercises, but they should test critical digital infrastructure dependencies. Many companies maintain “follow the sun” operations, which means tests would need to encompass overseas operations too.
The world has changed dramatically since 1997. Back then, the word “cyber” was new to almost everyone, and it was the era of dial-up modems operating at 56 kilobits per second. Since 1997, the U.S. government has run a series of Cyber Storm exercises, the most recent in 2022.
Put simply, enterprise digital infrastructure needs the equivalent of an EKG heart monitor and a pacemaker. The former evaluates the status of networks to identify potential problems quickly, and the latter keeps operations on track. This is not an academic exercise for enterprises, given the SEC’s recent rule requiring them to report an “incident” that is material to investors within four business days, along with a requirement to catalog monitoring capabilities. A key question becomes how fast enterprises can detect and remediate a digital event.
Given the dramatic changes and enterprise dependency on digital infrastructure over the past 25 years, it is time for enterprises to begin regular resilience tests too.