SignalFx is integrated with Apache Mesos, a cluster manager that allows you to distribute workloads across physical servers. Mesos abstracts computing resources like CPU and memory away from physical servers, instead providing dynamic resource allocation for each application running on the cluster. Mesos helps operators increase the resource utilization and efficiency of their servers.
You can use SignalFx’s built-in dashboards for Mesos to monitor the health of your Mesos deployment using collectd and the collectd-mesos plugin (originally written by SignalFx customer Grovo!). This plugin collects data about the overall cluster, each Mesos master and slave, resource utilization and tasks. You can get started with our Mesos integration from the Integrations page in SignalFx or download the plugin here.
| # Slaves/Cluster | Total # CPUs/Cluster | Total Memory/Cluster |
| Tasks Finished | Tasks Staging | Problematic Master Tasks |
| Slaves by Host CPU % | Slaves by Host Disk % | Top Hosts by Slaves CPU % |
| Top Hosts by Slaves Memory % | Top Slaves by # Tasks Failed | Top Slaves by Tasks Lost |
| Top Slaves by Uptime (sec) | Tasks Running | Slaves Connected |
| Resources % | Messages Dropped | Top Clusters by # Tasks Running |
| Connected vs. Active Frameworks | Top Hosts by Slaves Disk % | Tasks Running 1w Growth % |
| Slaves by Host Memory % | Top Clusters by # Tasks Running | |
For complete documentation of the metrics available from Mesos, click here.
Using the SignalFx built-in dashboards for Mesos clusters as well as individual master and slave nodes, you can monitor the following important metrics:
Task Status: It’s important to keep track of the status of tasks in the cluster. An increase in failed tasks for a master or slave can indicate a problem with a framework.
Host Performance: SignalFx helps you track the performance of individual Mesos hosts in the cluster. An increase in failed tasks across many masters and slaves on a single host may indicate a hardware problem.
Week-Over-Week Change: SignalFlow Analytics makes it easy to monitor the week-over-week growth of tasks in your cluster, to keep track of changing workloads.
Cluster Connections: An unexpectedly low number of connected slaves on a Mesos master can indicate a network problem preventing them from connecting. To verify this, check whether there is also an unexpectedly high number of dropped messages.
Task Detail: On the Mesos master dashboard, you can view in detail the number of tasks that are finished, failed, lost or errored out. Monitoring connected and active frameworks can help you determine the health of your Mesos scheduler.
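To make two of the checks above concrete, here is a minimal Python sketch of the week-over-week growth calculation and the connected-slaves check. It is an illustration only, not how the SignalFx dashboards or SignalFlow compute these values. The function names, the `expected_slaves` and `max_dropped_messages` parameters, and the metric keys `master/slaves_connected` and `master/dropped_messages` are assumptions chosen for the example; substitute whatever names your own metrics source reports.

```python
from typing import List, Mapping, Optional


def week_over_week_growth(current: float, week_ago: float) -> Optional[float]:
    """Percent change between a value now and the same value one week earlier."""
    if week_ago == 0:
        return None  # growth is undefined when the baseline is zero
    return (current - week_ago) * 100.0 / week_ago


def connection_warnings(
    snapshot: Mapping[str, float],
    expected_slaves: int,
    max_dropped_messages: int = 0,
) -> List[str]:
    """Flag the 'few connected slaves plus dropped messages' pattern.

    The metric keys below are assumed names for this sketch.
    """
    connected = snapshot.get("master/slaves_connected", 0)
    dropped = snapshot.get("master/dropped_messages", 0)
    warnings: List[str] = []
    if connected < expected_slaves:
        warnings.append(f"only {connected:g} of {expected_slaves} slaves connected")
        # Dropped messages are only treated as corroborating evidence
        # once the connected-slave count is already below expectations.
        if dropped > max_dropped_messages:
            warnings.append(f"{dropped:g} dropped messages suggest a network problem")
    return warnings


if __name__ == "__main__":
    print(week_over_week_growth(current=120, week_ago=100))
    sample = {"master/slaves_connected": 3, "master/dropped_messages": 12}
    for warning in connection_warnings(sample, expected_slaves=5):
        print(warning)
```

Returning `None` for a zero baseline avoids reporting a misleading growth figure for a cluster that ran no tasks a week earlier.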