
Get The Most Out of Splunk Infrastructure Monitoring and Splunk ITSI

SignalFx was founded in 2013 to enable customers to gather and monitor key information for both their application code and their infrastructure. Its efficient metrics storage technology supports both high-cardinality metrics and a no-sample approach that gathers every APM trace.

The SignalFlow language exposes this combination of efficiency and standardization, letting you access and manipulate vast amounts of metric data. If you haven’t already, take the available SignalFlow training sessions.

Splunk was founded in 2003 to help customers build monitoring and troubleshooting capabilities by transforming and searching textual data such as log files, API data, system data, and any type of machine-generated data. Over the years, customers have discovered different ways to leverage the Splunk platform to quickly and easily search, correlate, and reason over huge quantities of data (think petabytes per day!). 

In 2015, Splunk upped the ante by introducing Splunk IT Service Intelligence (ITSI). ITSI provides the ability to create correlation searches and key performance indicators (KPIs), and then to apply predictive health scoring, event analytics, and complex machine learning algorithms to drastically reduce mean time to resolution and minimize alert fatigue. ITSI also provides dynamic visualizations that appeal to all levels of the organization, from the C-level to the system administrator.

Combining the capabilities of Splunk Infrastructure Monitoring (formerly known as SignalFx) with the power of ITSI provides a solution that's far greater than the sum of its parts. In this two-part article, we examine how to use the integration of these offerings in your environment to find problems faster and to uncover problems you didn’t know you had.

Part 1: Splunk Infrastructure Monitoring Add-on

To integrate Splunk Infrastructure Monitoring with Splunk, start by downloading and installing the new Splunk Infrastructure Monitoring Add-on from Splunkbase.

The add-on includes a command called sim, with two subcommands, that you can use in the Search Processing Language (SPL). These commands let you retrieve data on demand from your Infrastructure Monitoring realm. The add-on also provides an out-of-the-box modular input to help you efficiently fetch data on a regular basis from your Infrastructure Monitoring environment. Let’s look at the commands and the input.

Sim Command

The sim command in the Splunk Infrastructure Monitoring Add-on has a simple syntax. Because it’s a generating command, it always begins with a pipe. The pipe is followed by one of two subcommands, flow or event, and then by a query. The next two sections cover each subcommand in detail.

Flow Subcommand

The flow subcommand runs a SignalFlow query directly against your Infrastructure Monitoring instance and streams the results into your Splunk search without writing any of them to your Splunk indexes. You can then manipulate the results using SPL, just like indexed data in Splunk.

For more information about the SignalFlow query syntax, see SignalFlow Analytics Language in the Infrastructure Monitoring documentation. Consider attending SignalFlow training to learn the most useful components of these queries.

Here’s an example of using SignalFlow within the Splunk Infrastructure Monitoring Add-on, then manipulating that data with SPL to produce a more meaningful visualization: 

| sim flow query="data('instance.cost', rollup='average').publish()" 
| stats avg(_value) by aws_instance_type

This query retrieves the per-minute cost of your AWS instances directly from your Infrastructure Monitoring environment, then uses SPL to average that cost by instance type, producing the following visualization:


This chart is a simple example, but imagine the possibilities! You could retrieve the instances by AWS account, by tag, or by region: anything from Infrastructure Monitoring that helps you produce a useful visualization.
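To make the SPL step concrete, here is a minimal Python sketch of what `stats avg(_value) by <field>` computes over the result rows. It assumes each row returned by `sim flow` carries a `_value` field plus dimension fields such as `aws_instance_type`; the `aws_region` field and all sample values are hypothetical, invented only for illustration. Changing the group-by field is the equivalent of slicing the data by account, tag, or region.

```python
from collections import defaultdict


def avg_by(rows, field):
    """Mimic SPL `stats avg(_value) by <field>`: average _value per distinct field value."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = row.get(field)
        if key is None:
            # Like stats, skip rows that lack the group-by field.
            continue
        totals[key] += row["_value"]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical rows shaped like sim flow results (values are made up).
rows = [
    {"_value": 0.10, "aws_instance_type": "m5.large", "aws_region": "us-east-1"},
    {"_value": 0.20, "aws_instance_type": "m5.large", "aws_region": "us-west-2"},
    {"_value": 0.40, "aws_instance_type": "c5.xlarge", "aws_region": "us-east-1"},
]

print(avg_by(rows, "aws_instance_type"))
print(avg_by(rows, "aws_region"))
```

Swapping `aws_instance_type` for another dimension is the Python analogue of changing the `by` clause in the SPL query.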

Because this data is now available in your core Splunk instance, you can display it alongside a variety of other (possibly older) data, such as your syslog-based monitoring or the network devices monitored by Splunk Enterprise.

It’s important to note that executing this command does NOT bring any data into Splunk indexes, so it doesn’t increase your data ingestion costs. You can use the data from Infrastructure Monitoring without paying twice for that data ingestion.

For more information about the flow subcommand, see flow query syntax in the Splunk Infrastructure Monitoring Add-on manual.

Event Subcommand

The event subcommand passes a key-value query to Infrastructure Monitoring. You might need to experiment to retrieve the detected events you want. For example, the following query retrieves any detected event related to the AWS/EC2 namespace:

| sim event query="namespace:AWS/EC2"

You can also use wildcards to match a family of related values. The following query uses an asterisk wildcard to retrieve any detected events from Azure virtual machines:

| sim event query="resource_type:Microsoft.Compute*"
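Conceptually, the trailing asterisk acts as a prefix match on the field value. As a rough model only (this is not how Infrastructure Monitoring evaluates queries internally, and case-sensitive matching is an assumption), the Python sketch below splits a single `key:pattern` query and applies shell-style matching with `fnmatch.fnmatchcase` to hypothetical detected-event dimensions.

```python
from fnmatch import fnmatchcase


def matches(query, event_dims):
    """Return True if event_dims satisfies a single key:pattern query (model only)."""
    key, pattern = query.split(":", 1)
    value = event_dims.get(key)
    # fnmatchcase treats '*' as "any run of characters" and matches case-sensitively.
    return value is not None and fnmatchcase(value, pattern)


# Hypothetical detected-event dimensions, invented for illustration.
events = [
    {"resource_type": "Microsoft.Compute/virtualMachines", "detector": "High CPU"},
    {"resource_type": "Microsoft.Storage/storageAccounts", "detector": "Low capacity"},
    {"namespace": "AWS/EC2", "detector": "Unreserved instance limit"},
]

azure_vm = [e for e in events if matches("resource_type:Microsoft.Compute*", e)]
ec2 = [e for e in events if matches("namespace:AWS/EC2", e)]
print([e["detector"] for e in azure_vm])
print([e["detector"] for e in ec2])
```

The wildcard query picks up the compute event but not the storage event, while the exact namespace query matches only the EC2 event.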

Sure, you could build a more specific query that uses detector names to point directly at a specific event created in Infrastructure Monitoring. That would help you build a targeted workflow around one detector. However, building your query with wildcards or other broad values lets you find events from any detector, regardless of how it was created. See Part 2 of this article to find out how Splunk ITSI correlates and manages these events.

For more information about the event subcommand, see event query syntax in the Splunk Infrastructure Monitoring Add-on manual.

Modular Input

In addition to the sim command, the Splunk Infrastructure Monitoring Add-on also contains a modular input. Within the modular input, you can specify a SignalFlow query to run on a regular basis and place that data into a core Splunk metrics index. This tactic has the following benefits:

  • The data returned by your SignalFlow query populates a local metrics index, letting you keep that data beyond the standard retention period of Infrastructure Monitoring.
  • You still aren’t charged Splunk ingestion costs for this data, so you don’t pay again for the data you already ingested into Infrastructure Monitoring.
  • The SignalFlow query you specify opens a single streaming channel, improving the efficiency of data retrieval.
     

Configuring this modular input is simple. Navigate to the input within Splunk Web by going to Settings > Data Inputs > Splunk Infrastructure Monitoring Data Streams. Then click New and provide the SignalFlow program you want to use. The following image shows an example SignalFlow expression:


(Note: After clicking Save, you have to click Enable on the input to begin retrieving data.)

This SignalFlow program is exactly the same as the example in the previous section. However, the add-on now runs this query on a regular basis and writes the results to your local Splunk metrics index. You can then use the following query to create the same visualization as in the previous example:

| mstats avg("instance.cost") where index=sim_metrics by aws_instance_type

Here are some important things to note when creating a new modular input:

  • You can specify the resolution of the metrics retrieved from Infrastructure Monitoring. One of the fantastic features of Infrastructure Monitoring is its high-resolution data, but in this input you can choose a coarser resolution, such as 60000 milliseconds (1 minute), to keep a short query like the one in the previous example efficient.
  • You can select the “Additional Metadata Flag” to bring back any metadata that you applied to that metric in Infrastructure Monitoring.
  • The data is brought into a stash index in Splunk, meaning it doesn’t count towards your Splunk data ingestion costs.
  • You can apply a Splunk sourcetype to the data brought in by this modular input, which lets you create fields at ingestion time.
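The resolution is expressed in milliseconds, so its effect on data volume is simple arithmetic: each time series returns one data point per resolution interval. The Python sketch below estimates the data points one day of streaming produces for a given resolution; the series count of 50 is a hypothetical input, and the 1000 ms value is used only for contrast, not as a documented default.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in a day


def points_per_day(resolution_ms, series_count=1):
    """Data points returned over one day at a given resolution (in milliseconds)."""
    if resolution_ms <= 0:
        raise ValueError("resolution_ms must be positive")
    return (MS_PER_DAY // resolution_ms) * series_count


# 60000 ms (1 minute) vs. a finer 1000 ms (1 second), for 50 hypothetical series.
for resolution in (60000, 1000):
    print(resolution, points_per_day(resolution, series_count=50))
```

At 1-minute resolution, 50 series yield 72,000 data points per day, compared with 4,320,000 at 1-second resolution, which is why a coarser setting keeps a scheduled input lightweight.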
     

For more sample modular input SignalFlow commands, see Configure inputs in the Splunk Infrastructure Monitoring Add-on manual.

Part 2: Splunk IT Service Intelligence

Splunk IT Service Intelligence (ITSI) provides a broad view of everything you monitor within the context of a service. You likely monitor many different components in your environment using a variety of tools. You can collect all that data into a log monitoring platform like Splunk, and sometimes into an advanced signal monitoring tool like Infrastructure Monitoring, formerly known as SignalFx. This data often comes from a closed system or a SaaS provider directly into the Splunk HTTP Event Collector (HEC). ITSI brings all of those different pieces of data together into one view that lets you monitor your entire system.

ITSI version 4.7.0 includes a new integration with Infrastructure Monitoring. The Content Pack for Splunk Infrastructure Monitoring is designed to monitor multiple cloud providers within ITSI. The content pack includes key performance indicators (KPIs) that monitor critical functions, as well as correlation searches that fetch events from Infrastructure Monitoring for use in ITSI Event Analytics. Finally, the content pack provides deep linking back into your Infrastructure Monitoring interface to reduce friction while leveraging both applications within a single organization.


Although this content pack is designed to monitor a subset of cloud providers, you can model your own services on its functionality to cover what matters to you and your organization.

Here’s a list of objects included in the content pack along with details to help you understand how to use them in your environment:

 

KPI Base Searches

Each cloud provider has different measurements for the objects in its environment. The preconfigured KPI base searches show you what those metrics are and where they come from, as well as how to connect each KPI to a specific entity.

 

Entity Discovery Searches

Each cloud provider identifies a specific entity differently. For example, AWS might use a unique instance ID, whereas Google Cloud Platform needs both the project ID and the instance name to uniquely identify an entity. The discovery searches in the content pack retrieve the identifiers needed to create an entity that ITSI can use to monitor performance characteristics.
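The difference in identifiers comes down to which fields form a unique key. The Python sketch below is a hypothetical illustration, not the content pack's discovery logic: AWS entities are keyed on the instance ID alone, while GCP entities combine the project ID and the instance name, because the name by itself is not globally unique. The field names `aws_instance_id`, `gcp_project_id`, and `instance_name` are assumptions.

```python
def entity_key(provider, dims):
    """Build a hypothetical unique entity key from discovery-search dimensions."""
    if provider == "aws":
        # The instance ID alone identifies an EC2 instance.
        return ("aws", dims["aws_instance_id"])
    if provider == "gcp":
        # Instance names can repeat across projects, so pair the name with the project ID.
        return ("gcp", dims["gcp_project_id"], dims["instance_name"])
    raise ValueError(f"no key rule defined for provider {provider!r}")


a = entity_key("gcp", {"gcp_project_id": "proj-a", "instance_name": "web-1"})
b = entity_key("gcp", {"gcp_project_id": "proj-b", "instance_name": "web-1"})
print(a != b)  # two instances named web-1 in different projects stay distinct
```

Keying GCP on the name alone would merge those two instances into one entity, which is exactly what the discovery searches avoid.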

 

Entity Types

New in ITSI is the ability to assign a specific type to an entity, which enables both Navigation Suggestions and a Key Metrics view. This content pack defines the new SignalFx entity type, which builds a link directly to the resource in Splunk Infrastructure Monitoring (formerly known as SignalFx). The content pack also defines key metrics for each of these new types, giving you an infrastructure-wide view of a given metric: rather than a single entity’s metrics, you see that metric aggregated across all entities of the type.

 

Services

This content pack includes services for three cloud providers: AWS, GCP, and Azure. Services are defined for both virtual machines and serverless functions, two popular uses of cloud platforms today. You can extend these services to cover other offerings, such as Amazon RDS, or other data points that a cloud provider exposes.

Optionally, you can customize the KPI thresholds and apply machine learning capabilities on top of these services to get an intelligent and predictive view of your Splunk Infrastructure Monitoring environment.  

 

Correlation Searches

The correlation searches in the content pack use the event subcommand from the Splunk Infrastructure Monitoring Add-on to identify any detectors that have fired for the given cloud provider. For example, if a detector creates an event when you exceed a limit on unreserved AWS instances (in other words, when the overage costs you additional money), the AWS correlation search creates a corresponding notable event in ITSI.

Each notable event in Episode Review contains a drilldown link directly back to the detector that created the event in Infrastructure Monitoring, letting you further investigate in a frictionless way. 

 

Notable Event Aggregation Policies

The notable event aggregation policies in the content pack demonstrate why it’s important to understand groups of signals or events in your environment rather than one-off events. A given signal might occur only once, but an ITSI aggregation policy groups related signals from all over your system, including from Infrastructure Monitoring and Splunk Enterprise, into a single episode.
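The grouping idea can be sketched in a few lines of Python. This is a simplified model, not ITSI's aggregation engine: it groups notable events that share a hypothetical `service` field and starts a new episode whenever the gap since the group's previous event exceeds a chosen break window. The field names, timestamps (in seconds), and the 300-second window are illustrative assumptions.

```python
from collections import defaultdict


def group_into_episodes(events, break_after_s=300):
    """Group events by service; split a group when the time gap exceeds break_after_s."""
    by_service = defaultdict(list)
    for event in sorted(events, key=lambda e: e["time"]):
        by_service[event["service"]].append(event)

    episodes = []
    for service, items in by_service.items():
        current = [items[0]]
        for prev, event in zip(items, items[1:]):
            if event["time"] - prev["time"] > break_after_s:
                # Gap too large: close the current episode and start a new one.
                episodes.append((service, current))
                current = []
            current.append(event)
        episodes.append((service, current))
    return episodes


# Hypothetical notable events from two sources feeding the same service.
events = [
    {"time": 0, "service": "checkout", "source": "Infrastructure Monitoring"},
    {"time": 60, "service": "checkout", "source": "Splunk Enterprise"},
    {"time": 120, "service": "checkout", "source": "Infrastructure Monitoring"},
    {"time": 5000, "service": "checkout", "source": "Splunk Enterprise"},
    {"time": 30, "service": "search", "source": "Splunk Enterprise"},
]

for service, episode in group_into_episodes(events):
    print(service, [e["time"] for e in episode])
```

Here three checkout signals from two different sources collapse into one episode, while the much later checkout event opens a new one, so related events are reviewed together rather than one at a time.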

For a full list of objects included in the content pack, see What's new in the Content Pack for Splunk Infrastructure Monitoring.

Summary

The granularity and performance of the monitoring capabilities in Splunk Infrastructure Monitoring are without a doubt the best in the market today. Splunk Enterprise also has a unique capability to bring any data source into an analytics platform. Now that both products belong to the same company, the new Splunk Infrastructure Monitoring Add-on and Splunk IT Service Intelligence integrate to help you get the most out of each application.

Download and use the Splunk Infrastructure Monitoring Add-on today.

To learn more, view this .conf20 session on-demand: SFU1536A - Logs and Metrics for Infrastructure Monitoring: Getting the Best of Both Worlds.

Joel Schoenberg is an Advisory Engineer with Splunk, which means he helps guide our product innovations into real and useful outcomes with the customers and partners that use those innovations. Joel has broad experience, working for AppDynamics, Puppet and Microsoft over the years. He brings that customer obsession to the engineering world. In his spare time, Joel enjoys living on a zombie-proof island in the Puget Sound, growing food and making wine.
