Monitor Mesos with SignalFx

SignalFx integrates with Apache Mesos, a cluster manager that lets you distribute workloads across physical servers. Mesos abstracts computing resources such as CPU and memory away from individual servers and instead allocates them dynamically to each application running on the cluster. This helps operators increase the resource utilization and efficiency of their servers.


With collectd and the collectd-mesos plugin (originally written by SignalFx customer Grovo!), you can use SignalFx’s built-in dashboards for Mesos to monitor the health of your Mesos deployment. The plugin collects data about the overall cluster, each Mesos master and slave, resource utilization, and tasks. To get started, open the Mesos integration from the Integrations page in SignalFx or download the plugin here.
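As a rough illustration of the kind of data involved, the sketch below reads a Mesos master's `/metrics/snapshot` HTTP endpoint and keeps a handful of values. This is not the collectd-mesos plugin's code; the default host and port and the function names `fetch_snapshot` and `select_metrics` are assumptions made for the example.

```python
import json
from urllib.request import urlopen

# A few metric names as reported by a Mesos master's /metrics/snapshot endpoint.
MASTER_METRICS = (
    "master/slaves_connected",
    "master/tasks_running",
    "master/tasks_failed",
    "master/cpus_percent",
    "master/mem_percent",
)


def fetch_snapshot(host="localhost", port=5050, timeout=5.0):
    """Fetch the metrics snapshot (a flat JSON object) from a master.

    The host and port defaults are assumptions; adjust them for your cluster.
    """
    url = f"http://{host}:{port}/metrics/snapshot"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def select_metrics(snapshot, names=MASTER_METRICS):
    """Keep only the named metrics that are present in the snapshot."""
    return {name: snapshot[name] for name in names if name in snapshot}
```

Calling `select_metrics(fetch_snapshot("mesos-master.example.com"))` against a reachable master (the hostname here is a placeholder) would return a small dictionary of the selected values.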

Once collectd and the collectd-mesos plugin are reporting, the built-in dashboards display useful metrics about your Mesos cluster, including:

# Slaves/Cluster
Total # CPUs/Cluster
Total Memory/Cluster
Tasks Finished
Tasks Staging
Problematic Master Tasks
Slaves by Host CPU %
Slaves by Host Disk %
Slaves by Host Memory %
Top Hosts by Slaves CPU %
Top Hosts by Slaves Memory %
Top Hosts by Slaves Disk %
Top Slaves by # Tasks Failed
Top Slaves by Tasks Lost
Top Slaves by Uptime (sec)
Tasks Running
Tasks Running 1w Growth %
Slaves Connected
Resources %
Messages Dropped
Connected vs. Active Frameworks
Top Clusters by # Tasks Running

For complete documentation of the metrics available from Mesos, click here.

Using the SignalFx built-in dashboards for Mesos clusters as well as individual master and slave nodes, you can monitor the following important metrics:

Task Status: It’s important to keep track of the status of tasks in the cluster. An increase in failed tasks for a master or slave can indicate a problem with a framework.
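One way to reason about "an increase in failed tasks" is to compare successive samples of a cumulative failed-task counter. The sketch below assumes the metric only grows until the process restarts; the function names and the `threshold` value are hypothetical, not part of SignalFx or Mesos.

```python
def counter_increase(previous, current):
    """Return how much a cumulative counter grew between two samples.

    If the counter went backwards, assume it was reset (for example by a
    restart) and treat the current value as the increase.
    """
    if current < previous:
        return current
    return current - previous


def failure_spikes(samples, threshold):
    """Return the indexes of samples whose increase over the previous
    sample exceeds the (hypothetical) threshold."""
    return [
        i
        for i in range(1, len(samples))
        if counter_increase(samples[i - 1], samples[i]) > threshold
    ]
```

A spike flagged this way is only a prompt to look at the framework that owns the failing tasks, not a diagnosis on its own.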

Host Performance: SignalFx helps you track the performance of individual Mesos hosts in the cluster. An increase in failed tasks across many masters and slaves on a single host may indicate a hardware problem.

Week-Over-Week Change: SignalFlow Analytics makes it easy to monitor the week-over-week growth of tasks in your cluster, so you can keep track of changing workloads.
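The arithmetic behind a week-over-week growth percentage can be sketched as follows. This is a plain illustration of the formula, not SignalFlow's implementation; the function name is hypothetical.

```python
def week_over_week_growth(current, week_ago):
    """Return the percentage change from the value one week ago.

    Returns None when the earlier value is zero, since the percentage
    is undefined in that case.
    """
    if week_ago == 0:
        return None
    return (current - week_ago) * 100.0 / week_ago
```

For instance, 120 running tasks now against 100 a week earlier is 20% growth, while 80 against 100 is a 20% decline.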

Cluster Connections: An unexpectedly low number of connected slaves on a Mesos master can indicate a network problem preventing them from connecting. To verify this, check to see if there’s an unexpectedly high number of dropped messages.
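The check described above can be expressed as a small rule: few connected slaves plus newly dropped messages points toward the network. The sketch below is a minimal illustration; the `expected` input, the `min_ratio` threshold, and the returned labels are all assumptions chosen for the example.

```python
def diagnose_connectivity(connected, expected, dropped_delta, min_ratio=0.9):
    """Classify a master's view of its slaves.

    connected     -- slaves currently connected to the master
    expected      -- slaves you expect to be connected (hypothetical input)
    dropped_delta -- messages dropped since the previous sample
    min_ratio     -- hypothetical cutoff for "unexpectedly low"
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    low = connected / expected < min_ratio
    if low and dropped_delta > 0:
        return "possible network problem"
    if low:
        return "slaves missing, no dropped messages"
    return "ok"
```

In practice you would tune the cutoff to your cluster's normal churn rather than rely on a fixed ratio.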

Task Detail: On the Mesos master dashboard, you can view in detail the number of tasks that are finished, failed, lost or errored out. Monitoring connected and active frameworks can help you determine the health of your Mesos scheduler.

Posted by Rebecca Tortell

Rebecca is a product manager with many years of experience helping startups make products that users love. Previously she worked at companies like Turn, Playdom, and Disney Interactive.
