Uptime, All the Time: 3 Data Sources for Better Infrastructure Monitoring

When downtime hits, every second counts. Your organization is losing money, and you’re losing your reputation. But it’s often possible to prevent downtime from happening in the first place. And if the worst does happen, it’s possible to find the root cause faster and with less effort. How? By monitoring and managing the data sources within your infrastructure. Below, we’ve picked three non-negotiables for less downtime and better troubleshooting.

1. Anything and everything you’re running on AWS

Let’s talk about the elephant in the room: AWS. If you’re migrating your infrastructure to the cloud, you still have to monitor and maintain those workloads; it’s not necessarily set-it-and-forget-it. Luckily, AWS services provide much the same kinds of system and service data as traditional IT infrastructure, whether that data is consolidated by CloudWatch (logs, metrics, events) or by another AWS service. Pull this data into a monitoring solution that lets you look at it alongside anything you may still be running on-premises.
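
To make the "alongside on-premises" idea concrete, here is a minimal Python sketch that merges cloud and on-premises log records into one time-ordered view. It assumes the cloud events are dictionaries with `timestamp` (milliseconds since the epoch), `message` and `logStreamName` keys, which is the shape CloudWatch Logs events generally take. The on-premises tuple layout and all function names are hypothetical, chosen only for illustration. It makes no AWS calls.

```python
from datetime import datetime, timezone


def from_cloudwatch(event):
    """Normalize one CloudWatch-Logs-style event dict (timestamp in ms)."""
    return {
        "time": datetime.fromtimestamp(event["timestamp"] / 1000, tz=timezone.utc),
        "source": "aws:" + event.get("logStreamName", "unknown"),
        "message": event["message"],
    }


def from_on_prem(record):
    """Normalize a hypothetical on-prem record: (epoch_seconds, host, message)."""
    epoch_seconds, host, message = record
    return {
        "time": datetime.fromtimestamp(epoch_seconds, tz=timezone.utc),
        "source": "onprem:" + host,
        "message": message,
    }


def merged_timeline(cloudwatch_events, on_prem_records):
    """Return every record in one list, ordered by time."""
    rows = [from_cloudwatch(e) for e in cloudwatch_events]
    rows += [from_on_prem(r) for r in on_prem_records]
    return sorted(rows, key=lambda row: row["time"])
```

A single sorted timeline like this is one simple way to see whether an on-premises error came just before or just after a cloud-side event. A real monitoring solution would handle the collection and correlation for you.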

2. A no-brainer, but often overlooked: your server logs

Alert fatigue is the worst, and server logs can help make it better. Server OS data are full of valuable insights that can help you find the root causes of issues more rapidly. For example, they can tell you the “why” behind the “what” when something goes wrong. They give you a detailed record of overall system health, plus forensic information about the exact time of errors and anomalous conditions. Here, be sure to look for operational, security, error and debugging data such as system libraries loaded during boot, open application processes, network connections, mounted file systems and system memory usage.
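
As a rough illustration of pulling the "exact time of errors" out of server logs, the Python sketch below scans syslog-style lines and counts likely errors per process. The line format, the keyword list and the sample lines are all assumptions made for this example. Real logs vary by OS and configuration, so treat the pattern as a starting point.

```python
import re
from collections import Counter

# Assumed format: "Mar  3 14:02:11 host process[pid]: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<proc>[^\s\[:]+)(?:\[\d+\])?: "
    r"(?P<msg>.*)$"
)

# Hypothetical keywords that mark a line as worth a look.
ERROR_WORDS = ("error", "failed", "panic")


def find_errors(lines):
    """Return (timestamp, host, process, message) for lines that look like errors."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and any(word in m["msg"].lower() for word in ERROR_WORDS):
            hits.append((m["ts"], m["host"], m["proc"], m["msg"]))
    return hits


def errors_by_process(lines):
    """Count error-like lines per process name."""
    return Counter(proc for _, _, proc, _ in find_errors(lines))


# Made-up sample data, not taken from a real system.
SAMPLE = [
    "Mar  3 14:02:11 web01 sshd[912]: Failed password for root from 10.0.0.5",
    "Mar  3 14:02:15 web01 kernel: EXT4-fs error (device sda1): bad block",
    "Mar  3 14:03:00 web01 cron[1200]: (root) CMD (run-parts /etc/cron.hourly)",
]
```

Calling `errors_by_process(SAMPLE)` groups the two error-like lines under `sshd` and `kernel` and skips the routine cron entry. That kind of filtering is what helps cut down alert noise.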

3. Everyone’s favorite: storage

Storage: the bread and butter of data sources. With shared storage logs, you can monitor and manage overall system health (both hardware and software), error conditions (like a failed controller, network interface or disk) and usage (both capacity used per volume and file or volume accesses). Pulled together, the information can alert you to problems, performance bottlenecks or the need for more capacity.
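
The capacity side of this can be sketched in a few lines of Python. Shared storage systems usually report usage through their own vendor-specific logs or APIs, so this example uses the local filesystem (via the standard library's `shutil.disk_usage`) as a stand-in. The 85 percent threshold and the function names are arbitrary assumptions.

```python
import shutil


def percent_used(total_bytes, used_bytes):
    """Return used capacity as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def volumes_over_threshold(usage_by_volume, threshold_pct=85.0):
    """usage_by_volume maps a volume name to (total_bytes, used_bytes).

    Returns the sorted names of volumes at or above the threshold.
    """
    return sorted(
        name
        for name, (total, used) in usage_by_volume.items()
        if percent_used(total, used) >= threshold_pct
    )


def local_usage(paths):
    """Collect (total, used) for local mount points as a stand-in data source."""
    usage = {}
    for path in paths:
        du = shutil.disk_usage(path)
        usage[path] = (du.total, du.used)
    return usage
```

For example, `volumes_over_threshold(local_usage(["/"]))` lists the root filesystem only if it is at least 85 percent full. The same check works on per-volume figures parsed from storage logs.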

We’ve covered three data sources here, but there’s practically no limit to the data sources in your infrastructure. And the more you’re monitoring, the deeper and more valuable your insights can be. Check out other data sources to add to your arsenal in our "Essential Guide to Machine Data: Infrastructure Machine Data" e-book.

----------------------------------------------------
Thanks!
Keegan Dubbs

Posted by Splunk
