Availability describes the proportion of time a device, service or other piece of IT infrastructure is usable, and whether it can be reached at all. Because availability, or system availability, indicates whether a system is operating normally and how effectively it can recover from a crash, attack or other failure, it is considered one of the most essential metrics in information technology management. It is a constant concern: an entire business may be unable to operate if an essential piece of hardware or a critical service is down.
Any number of business processes and internal and external factors can impact availability, making it particularly difficult to achieve. Denial of service attacks, hardware and IT service failure and even natural disasters can all impact availability and extend the mean time to repair (MTTR). A problem on a third-party service provider’s shared cloud server can cascade downstream to impact another organization’s availability. And in any IT environment, numerous devices and services interact — and an issue with a single device or service can cause a wide-scale outage. For example, if a key database is corrupted, a critical web server may become unavailable, even if the underlying hardware, operating system and network have not been impacted.
Availability is commonly expressed as a percentage, calculated as:

Availability = ((Total Service Time) – (Downtime)) / (Total Service Time)
This metric can also be expressed as a specific measure of time. For example, if Server X has a stated (or promised) availability of 99.999% — known in the industry as ‘five nines’ — over the previous month, it could have been down for at most about 26 seconds during that month.
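The formula and the ‘five nines’ figure above can be sketched in a few lines of code. This is a minimal illustration, not a production monitoring tool; the function names and the 30-day month are assumptions made for the example.

```python
def availability(total_service_time: float, downtime: float) -> float:
    """Availability as a fraction: (total service time - downtime) / total service time."""
    return (total_service_time - downtime) / total_service_time

def max_downtime(availability_pct: float, period_seconds: float) -> float:
    """Maximum downtime (in seconds) allowed by a given availability percentage."""
    return period_seconds * (1 - availability_pct / 100)

SECONDS_PER_MONTH = 30 * 24 * 3600  # assuming a 30-day month

# 'Five nines' over a 30-day month permits roughly 26 seconds of downtime.
print(round(max_downtime(99.999, SECONDS_PER_MONTH), 2))  # → 25.92
```

Running the same arithmetic for 99.9% (‘three nines’) yields about 43 minutes of permitted downtime per month, which shows how quickly each additional nine tightens the budget.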
In this article, we’ll examine how enterprises can achieve high levels of availability in a variety of operating environments, as well as the benefits and costs of doing so.