Storage monitoring, or storage performance monitoring, is the practice of tracking the performance, availability and overall health of physical and virtual storage devices. Data storage (provided by vendors such as Dell, EMC, Microsoft as well as countless others) is the backbone of all types of computing endeavors, whether you’re working on a spreadsheet, checking your email or playing a video game on your Xbox over the internet. In fact, there is virtually no mainstream computing operation that does not require access to some form of storage.
While it is an essential technology component of IT infrastructure, storage can be problematic. Network storage is typically the slowest part of a computing environment, and, depending on the specific hardware technology, it can create significant computational bottlenecks, particularly when many users are attempting to access the same data at once. (This kind of overload is one of the primary ways in which distributed denial of service (DDoS) attacks operate.) Storage devices, particularly traditional hard disk drives, are prone to failure as their moving parts wear down. Storage devices will also eventually become full, requiring expansion or upgrades, often on an ongoing basis. In short, every computing environment requires high performance and availability from storage that is inherently prone to bottlenecks, failures and capacity limits. The best practice for managing these risks is to implement storage monitoring software or a storage monitoring tool.
It’s also important to note that in the modern enterprise, storage is increasingly virtualized, abstracting the physical location of data away from the user via a cloud platform. Storage virtualization can be used to archive or back up data, web or application services, or (increasingly) for web-based email and productivity suites such as Google Docs.
In this article, we’ll discuss the different types of enterprise storage, how storage monitoring works and what to look for in a storage monitoring solution.
Network-Attached Storage (NAS) vs Storage Area Network (SAN)
NAS (network-attached storage) and SAN (storage area network) are the two primary forms of on-premises networked storage. The names sound quite similar, which often leads to confusion, but they are separate technologies.
NAS refers to a hardware device that is connected to an enterprise’s local network. A NAS device typically works over Ethernet or a similar wired connection and is designed to be simple to set up and cost effective. If a business runs low on storage space, it can obtain a NAS device and plug it into the network so it becomes available to everyone. NAS devices can contain multiple drive bays that include failover technology like mirroring or RAID, although some NAS devices are simple enough to be repurposed for home use.
Rather than representing a single device, a SAN is a network of storage devices. These devices connect to a dedicated network separate from the Ethernet LAN, generally using the storage-centric Fibre Channel technology to transfer data to client computers. Because SAN is more expensive and complex than NAS, it is often reserved for applications where low latency and zero downtime are critical. Video editing and surveillance video recording are common applications for SAN, demanding high throughput and low latency for the massive amount of data being transferred. Because this data is transferred on its own private network, SAN is able to avoid congestion on the LAN and maintain a sustained, high rate of data transfer.
In summary, SAN is much faster, highly scalable and designed for high-end operations, but it is more costly and requires significant expertise to manage, along with a private Fibre Channel network. NAS, on the other hand, is an inexpensive, simple technology that operates over your existing LAN and can easily be set up with a laptop.
The Role of Cloud Storage Monitoring
Cloud storage monitoring, such as AWS storage monitoring, is the process of observing, reviewing and managing storage systems within cloud infrastructure, generally implemented through automated monitoring software that provides access and control over the entirety of the cloud infrastructure from one centralized location. Automated performance monitoring techniques can gauge availability and analyze performance, tracked by metrics that can include the number and type of users, database performance, server response times, resource levels, system and process performance, as well as security issues and other disruptions. Administrators can review the operational status and health of cloud servers and components and look for system disruptions or suspicious activities. The ability to continuously evaluate these metrics can give organizations insight into storage system issues or vulnerabilities long before bigger and more damaging problems arise.
Considerations for Monitoring Storage Performance
Storage performance comes down to answering a few key questions about how your storage devices are running, as well as network performance and other performance issues. These include:
- Are users waiting too long for data to be read from or written to the storage device? (Is the SAN fast enough?)
- Is data being lost in transit due to congestion or error?
- Do storage devices have ample resources to operate without being constrained?
- If problems emerge, is it easy to discover the root cause and the proper solution?
- Is the system approaching a capacity limit?
Storage Monitoring Performance Metrics & KPIs
Storage monitoring uses metrics to answer the above questions. The key storage monitoring performance metrics include the following:
- Latency (read and write): Latency is a core metric that measures how responsive a storage device is to requests. This is generally measured in milliseconds, tracking the time it takes for data to be read from or written to the disk.
- Throughput (read and write): Throughput is a close neighbor of latency, measuring the number of bytes read from or written to the device per second. Throughput will change based on demand for data transfers, but the goal in monitoring throughput is to ensure that a device is not regularly “maxed out” at its highest throughput rate.
- IOPS (input/output operations per second): IOPS, closely related to throughput, is a widely referenced metric that gauges how many reads and writes a device is successfully completing each second. It is the analyst’s job to verify that the actual IOPS metric is reasonably close to the specified IOPS for each device and to ensure that it is not degrading over time, which could indicate a larger problem.
- Utilization: This metric refers to the SAN’s CPU utilization, measuring how much time is being spent processing various storage requests. Utilization that climbs above roughly 50% for more than a few seconds could indicate a problem with the SAN.
- Queue length (read and write): The queue length (also called queue depth) of a storage device is the number of input and output requests pending at any given moment. Because a disk can generally handle only one operation at a time, some level of queueing is normal, although this number should always be small (under three at any given time). Queue length is also correlated with latency — high queue length but low latency means that your storage devices are successfully keeping up with high levels of demand.
- Capacity available: This is a simple metric that measures storage resources — specifically how much empty space is available on a storage device. There’s no universal guideline for an acceptable level of free, available capacity, but generally, when available capacity drops below 20%, it’s likely time to upgrade your SAN.
All of these metrics can be sampled in real time, analyzed as an average over a certain time frame (the last hour, for example) and graphed in visualizations over longer periods of time. The IT department monitors these statistics and watches for any troubling trends that might be developing. High-quality storage monitoring software as part of your regular management software suite will make this easier by using intelligence to inform the analyst as to whether something requires attention.
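As a rough illustration of how the metrics above relate to one another, the sketch below derives IOPS, throughput and average latency from two snapshots of cumulative device counters. The `CounterSample` fields are hypothetical names chosen for this example (real devices and operating systems expose similar counters under their own names), and reads and writes are combined for brevity even though they are usually tracked separately.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Cumulative device counters at one instant (hypothetical field names)."""
    timestamp_s: float   # time of the sample, in seconds
    read_ops: int        # reads completed so far
    write_ops: int       # writes completed so far
    read_bytes: int      # bytes read so far
    write_bytes: int     # bytes written so far
    io_time_ms: float    # total time spent on completed requests, in ms


def derive_metrics(prev: CounterSample, curr: CounterSample) -> dict:
    """Turn two counter snapshots into per-interval rates."""
    interval_s = curr.timestamp_s - prev.timestamp_s
    if interval_s <= 0:
        raise ValueError("samples must be in increasing time order")

    ops = (curr.read_ops - prev.read_ops) + (curr.write_ops - prev.write_ops)
    nbytes = (curr.read_bytes - prev.read_bytes) + (
        curr.write_bytes - prev.write_bytes
    )
    io_time_ms = curr.io_time_ms - prev.io_time_ms

    return {
        "iops": ops / interval_s,
        "throughput_bytes_per_s": nbytes / interval_s,
        # Average time per completed request; 0.0 if the device was idle.
        "avg_latency_ms": io_time_ms / ops if ops else 0.0,
    }


# Two made-up snapshots taken 10 seconds apart.
before = CounterSample(0.0, 1000, 500, 10_000_000, 5_000_000, 3000.0)
after = CounterSample(10.0, 3000, 1500, 50_000_000, 25_000_000, 9000.0)
metrics = derive_metrics(before, after)
```

Working from counter deltas rather than instantaneous readings is one common design choice: the same two snapshots yield all three rates, and averaging over a longer window only requires snapshots that are further apart.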
Common Best Practices for Storage Monitoring
Here are some of the most widely-cited best practices for storage monitoring in your environment:
- Understand your organization’s data usage patterns: Does your organization frequently read and write small files or does it infrequently read and write large files? Is the organization subjected to periods of burst traffic (e.g., over the holidays)? The way your organization uses data can impact the storage architecture you choose and affect your analysis of monitoring metrics.
- Use storage management tools that offer a centralized dashboard: This is especially critical for visualizing usage if your data is stored in multiple locations. If you have numerous storage products from different vendors, carefully consider your management tool’s compatibility with all of them.
- Use monitoring statistics to reconsider how storage is being used: Frequently, IT departments that roll out storage monitoring find that some systems are heavily taxed and some are barely touched. Optimizing the locations in which datasets are stored spreads workloads out among various devices, providing better overall performance.
- Ask if compression can be used to improve performance: Compression can also improve your available storage capacity.
- Don’t just analyze past performance; predict future trends: Your storage management tool should use trend data to spotlight decaying performance and predict potential device failures.
- Create a future plan for capacity expansion: Use storage metrics to look ahead and plan for capacity expansion down the road.
- Monitor backup devices: Backups should be monitored in the same way as primary storage devices.
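To make the idea of predicting future trends concrete, here is a minimal sketch that fits a straight line to daily capacity readings and estimates how many days remain before a device fills up. The function name, the daily sampling interval and the assumption of linear growth are all illustrative choices for this example; real usage often grows in bursts, so a result like this is a planning hint rather than a prediction.

```python
from typing import Optional, Sequence


def days_until_full(used_gb_by_day: Sequence[float],
                    capacity_gb: float) -> Optional[float]:
    """Estimate days until capacity is reached, using a least-squares line.

    used_gb_by_day holds one reading per day, oldest first.
    Returns None when usage is flat or shrinking (no projected fill date).
    """
    n = len(used_gb_by_day)
    if n < 2:
        raise ValueError("need at least two daily readings")

    mean_x = (n - 1) / 2
    mean_y = sum(used_gb_by_day) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(used_gb_by_day))

    slope = sxy / sxx                     # growth in GB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x   # fitted usage on day 0

    day_full = (capacity_gb - intercept) / slope
    # Count from the most recent reading; never report a negative number.
    return max(0.0, day_full - (n - 1))


# Made-up readings: 10 GB of growth per day on a 1,000 GB volume.
remaining = days_until_full([500, 510, 520, 530, 540], capacity_gb=1000)
```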
Storage Monitoring Tools
Storage monitoring tools have matured in recent years and those specifically designed for the enterprise now offer a broad set of features. Products will likely fit into one or more of these categories:
- NAS Monitoring: These are generally simpler, more basic tools designed to analyze NAS products.
- SAN Monitoring: Like SANs themselves, these tools are enterprise-class products that can monitor a SAN environment, usually supporting multiple vendors’ products.
- Virtual Storage Monitoring: If you have resources in the cloud, you’ll need to ensure your storage monitoring solution can keep tabs on them.
- Network Monitoring: Many tools integrate the monitoring of physical storage and the broader network (or treat storage monitoring as an add-on to a network monitoring tool).
- Application Monitoring: As with network monitoring, some storage monitoring tools can also analyze application performance, particularly as it relates to input/output operations.
Benefits of Storage Monitoring
A properly implemented storage monitoring solution offers myriad benefits to the enterprise. Here are some of the primary advantages:
- Better visibility: Storage monitoring offers improved visibility to IT staff, typically through a unified interface or dashboard that provides a visualization of how storage operations are performing both in real time and in the long run.
- Better performance: Optimized storage systems give end users reduced latency, fewer performance bottlenecks and increased throughput, ultimately improving user productivity through a more responsive storage system.
- Higher uptime levels: Storage monitoring enhances alerting capabilities and gives IT the ability to discover errors as they are developing and before they become critical problems.
- Reduced risk of data loss: Equipment failures often result in some level of data loss, depending on how regularly backups are performed. When these failures are avoided, the chances of data loss are reduced.
- Better capacity planning: A real-time look at available storage gives the IT department more advanced warning when it comes to storage capacity expansion planning.
- Lower cost of ownership: When storage runs smoothly, fewer crises emerge, resulting in less overtime, fewer emergency hardware purchases and less money spent on the storage function.
Common Challenges for Storage Monitoring
In complex environments, storage monitoring can become more challenging than simply ensuring hard drives are operating according to specifications and aren’t becoming full. Some of the possible challenges that a storage monitoring team might face include:
- Very large data stores: The IT team has to constantly manage multiple terabytes of unstructured data across a sprawling storage infrastructure, affecting bandwidth and availability. Despite these challenges, they also have to ensure reliability.
- Increased user mobility: Legions of employees working remotely during the pandemic created a new and particularly thorny challenge: How do you monitor and manage storage on users’ laptops and smartphones? Reliably available data has become an even bigger issue in light of the vast number of people who continue to access it remotely.
- The need for enhanced security: The rapid rise in the number and severity of malicious attacks has increased the need for comprehensive data security. Encryption makes data safer, but it also impacts performance, necessitating the development of new strategies to ensure responsiveness.
- The cloud: Data moving from on-premises storage devices to virtualized cloud environments masks visibility into storage systems.
Dealing with Data Storage Growth
The following are ways you can reduce and minimize the explosive growth of storage data in your environment.
- Create flexible data retention policies: Retention policies and schedules set in accordance with your needs and priorities allow you to determine how long to hold onto information based on the type of data being stored, business and regulatory needs and other criteria. This flexibility can also extend to specific types of files (e.g. MP3s, video files, etc.), which will help companies avoid storing employees' personal music or video files when data is backed up.
- Integrate with existing platforms: Many storage providers are also integrating cloud storage tiering directly into their products, which is known as native integration. With native integration, you don’t have to purchase or maintain additional software, so there’s less infrastructure to manage. You also won’t require additional servers to support the software — everything you need comes in the operating system from the storage provider itself.
- Implement a centrally managed storage solution supporting multiple locations: Backups can be challenging if your company is highly distributed with numerous offices around the country or globe. Having a centralized backup system in one place allows companies to manage backups from anywhere and continuously support their distributed environments, saving time and reducing IT resources.
- Automate storage and backup processes: Using automated storage and backup systems saves time otherwise spent managing data, while also decreasing the time it takes to restore data when necessary, translating into reduced IT staff needs, labor and operating costs. Cloud storage solutions such as Amazon S3 or Azure Blob for your backups, snapshots or replicas are reliable, simplify maintenance tasks and are cost-effective, allowing you to only pay for what you use.
- Select a storage system that scales with your business: When choosing a storage solution, you’ll likely pick one that best aligns with your needs and budget constraints. But looking ahead, you’ll also want to select one that can scale as you — and the volume of your data — continue to grow. Having this foresight will likely save you future hardware, software, replacement and implementation costs down the road.
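The retention-policy idea from the list above can be sketched in a few lines. Everything here is an assumption made for illustration: the category names, the retention periods and the excluded file extensions are hypothetical, and real values would come from your own business and regulatory requirements.

```python
from datetime import date, timedelta
from pathlib import PurePath

# Hypothetical retention periods, in days, keyed by data category.
RETENTION_DAYS = {"financial": 2555, "logs": 90, "default": 365}

# Hypothetical personal-media extensions to leave out of backups.
EXCLUDED_EXTENSIONS = {".mp3", ".mp4", ".mov"}


def should_back_up(filename: str) -> bool:
    """Skip files whose extension is on the exclusion list."""
    return PurePath(filename).suffix.lower() not in EXCLUDED_EXTENSIONS


def is_expired(category: str, created: date, today: date) -> bool:
    """True once a file has outlived the retention period for its category."""
    days = RETENTION_DAYS.get(category, RETENTION_DAYS["default"])
    return today - created > timedelta(days=days)
```

Keeping the periods in a single table, with a default for unlisted categories, is one way to keep the policy flexible: changing a retention period means editing data rather than logic.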
Account for Storage Monitoring Parameters
The key storage monitoring parameters are in line with the aforementioned storage monitoring metrics that include latency, throughput, IOPS, utilization, queue length and available capacity, to name some of the most critical factors. These parameters can and should be measured at various levels of storage infrastructure, including the storage array level, storage pool level, volume level, LUN (logical unit number) level and disk level. These five levels describe increasingly granular storage systems, ranging from the largest and broadest group of storage devices (the storage array) to the smallest and most specific (the single disk).
When monitoring the storage environment, all of these operational categorizations need to be considered separately. Metrics running at the storage array level may indicate the entire array is operating according to specifications when looked at broadly, but upon drilling down to the disk level, an analyst may find that a single disk may be overtaxed and in need of attention.
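The following sketch shows why drilling down matters, using made-up utilization figures for a four-disk array. The disk names, the readings and the 80% threshold are all hypothetical values chosen for this example.

```python
from statistics import mean


def find_hot_disks(disk_utilization_pct: dict, threshold_pct: float) -> list:
    """Return the names of disks whose utilization exceeds the threshold."""
    return sorted(
        name
        for name, utilization in disk_utilization_pct.items()
        if utilization > threshold_pct
    )


# Hypothetical per-disk utilization (percent) within a single array.
array_disks = {"disk-0": 12.0, "disk-1": 9.0, "disk-2": 95.0, "disk-3": 8.0}

array_average = mean(array_disks.values())              # looks healthy
hot_disks = find_hot_disks(array_disks, threshold_pct=80.0)
```

Here the array-level average is 31%, which suggests plenty of headroom, while the disk-level check still flags `disk-2`. The same pattern applies at the pool, volume and LUN levels: an aggregate can hide a single overtaxed member.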
Ready to get a better handle on your storage infrastructure? Here’s how to get started with a storage monitoring solution:
- Inventory storage assets: You can’t monitor something you don’t know about, so start by making a detailed inventory of your storage assets, including NAS and SAN devices, server-based storage, offsite storage and cloud storage assets. Backup devices like tape systems are often overlooked in this process, but they also need to be included.
- Select and implement a storage monitoring system: Different applications offer different levels of support for the various environments, so ensure that any product you’re considering supports all of them. Ideally you want a monitoring tool that can keep tabs on all of these storage products through a single, unified dashboard.
- Understand your organization’s data usage patterns: As you begin using a storage monitoring system, consider your organization’s data usage habits. A video editing team may require the absolute best performance and low latency, while the accounting team may instead stress the importance of data security and reliability. Use your storage monitoring system to determine user experience and how successfully these needs are being met by following key metrics and establishing minimum performance thresholds where relevant.
- Set up alerts to inform analysts when problems arise: Alerting thresholds can be helpful in raising awareness about emerging problems, whether that’s an overloaded hard drive, a performance slowdown, downtime or an imminent device failure. Triggering systems can open a ticket, send an email or create some other type of notification when thresholds are crossed.
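A threshold-based alert like the one described in the last step can be sketched as follows. The `AlertRule` structure, the metric name and the requirement that several consecutive samples breach the threshold are assumptions made for this example, not features of any particular monitoring product. Requiring consecutive breaches is one simple way to avoid alerting on a single momentary spike.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AlertRule:
    """A hypothetical alert rule: fire when a metric stays above a limit."""
    metric: str
    threshold: float
    consecutive: int   # how many samples in a row must exceed the threshold


def should_alert(rule: AlertRule, samples: Sequence[float]) -> bool:
    """True if the most recent `consecutive` samples all exceed the threshold."""
    if rule.consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    if len(samples) < rule.consecutive:
        return False
    recent = samples[-rule.consecutive:]
    return all(value > rule.threshold for value in recent)


def format_alert(rule: AlertRule, latest: float) -> str:
    """Build the notification text; delivery (ticket, email) is left out."""
    return (f"ALERT: {rule.metric} at {latest} exceeded {rule.threshold} "
            f"for {rule.consecutive} consecutive samples")


latency_rule = AlertRule(metric="latency_ms", threshold=20.0, consecutive=3)
sustained = should_alert(latency_rule, [5.0, 25.0, 30.0, 22.0])
brief_spike = should_alert(latency_rule, [25.0, 30.0, 5.0, 22.0])
```

In this example `sustained` is true because the last three readings all exceed 20 ms, while `brief_spike` is false because one of the last three readings dropped back below the threshold.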
The Bottom Line: Staying on top of your storage systems is essential to the success of your environment
Storage simultaneously represents the most essential and least reliable technology component in the enterprise. It’s also an absolute constant: Although the degree of its importance changes from one organization to the next, no modern business and its associated end users can function without access to reliable storage. Without it, any level of computing simply becomes impossible. Storage monitoring is not nearly as complex an endeavor as other aspects of IT management, but it is nonetheless one of the most essential pieces of the puzzle. When storage is involved, it’s a case of not if but when an error, outage or downtime will occur. Failing to invest in a capable storage monitoring solution will put your business at risk.
This posting does not necessarily represent Splunk's position, strategies or opinion.