Getting Microsoft Azure Data into Splunk

Tips & Tricks May 08, 2020 Jason Conger

If you're reading this, you're probably wondering how to get data from various Microsoft Azure services into Splunk. With the growing list of Azure services and various data access methods, it can be a little cloudy (pun intended) on what data is available and how to get all that data into Splunk.

In this blog post, I'm going to go over how Microsoft makes Azure data available, how to access the data, and the out-of-the-box Splunk Add-Ons that can consume this data. So let's dive right in.

How Microsoft Makes Azure Data Available

There are 3 main ways Microsoft makes Azure data available.

Storage Accounts

This was the standard back in the day when Azure was introduced. Basically, Microsoft will dump data from a service into a separate storage location (called a storage account). For example, if you want Virtual Machine event logs, Azure will dump those into a storage account you specify. Since a storage account is a separate service from the VM, the data about the VM will live on even after you delete the VM. Storage accounts have their own security and retention mechanisms, but we won't get too far into the weeds here. Just know that a source service can be configured to dump data into a separate storage account for retrieval.

Event Hubs

Talking about standards, Event Hubs are the new standard for most Azure services. I like to think of Event Hubs as a scalable, relatively short-term, message bus. What I mean by this is Azure can dump data onto an Event Hub (via a service called Azure Monitor). This is similar to the storage account methodology mentioned above. However, data that goes onto an Event Hub is meant to be retrieved by something else. In fact, Event Hubs have a pretty short retention time for events (typically 24 hours to 7 days). Event Hubs can also scale up or down depending on the load necessary for receiving or delivering data. Hint: if the terms Pub/Sub, Kafka, producer and consumer mean anything to you, think in those terms. If not, forget that last sentence or just Google (or Bing) those terms if you want to dive a little deeper.
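To make the "short-term message bus" idea concrete, here is a toy, in-memory sketch of the producer/consumer pattern with a retention window. This is not the Azure SDK or how Event Hubs is implemented; the ToyEventHub class and its method names are made up purely for illustration, and time is passed in as plain seconds to keep the example deterministic.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyEventHub:
    """Hypothetical in-memory stand-in for a short-retention message bus."""

    retention_seconds: float
    _events: deque = field(default_factory=deque)  # (enqueued_at, payload) pairs

    def publish(self, payload, now):
        """Producer side: append an event stamped with the current time."""
        self._events.append((now, payload))

    def _expire(self, now):
        """Drop events that have aged out of the retention window."""
        while self._events and now - self._events[0][0] > self.retention_seconds:
            self._events.popleft()

    def consume(self, now):
        """Consumer side: return every event still inside the retention window.

        Reading does not delete anything; events only disappear when they expire.
        """
        self._expire(now)
        return [payload for _, payload in self._events]


# A hub with 24-hour retention: an event published at hour 0 is gone by hour 25.
hub = ToyEventHub(retention_seconds=24 * 3600)
hub.publish({"operation": "create VM"}, now=0)
hub.publish({"operation": "delete VM"}, now=23 * 3600)
still_available = hub.consume(now=25 * 3600)
```

The takeaway is the same as in the paragraph above: if nothing reads from the hub before the retention window closes, the data is gone, which is why something like a Splunk add-on needs to be consuming regularly.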

REST APIs

The third major way Microsoft makes Azure data available is REST APIs, and there are a lot of them. In the context of Splunk, you're typically looking for the "List" operations. For example, here are all the operations for Azure VMs. The Microsoft Azure Add-on for Splunk (more about that add-on in a bit) uses the "List All" operation to, well, get a list of all the VMs you have in Azure. You can use this information as entities in Splunk IT Service Intelligence (ITSI), Splunk Enterprise Security, or correlate it with other data sources in Splunk.
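As a rough sketch of what a "List" call looks like, the snippet below builds the URL for the Virtual Machines "List All" operation and walks a paged response. It is not the add-on's actual code. The api-version value and the subscription ID are placeholders you would need to supply, and fetch_json is a hypothetical caller-provided function that performs an authenticated GET and returns the decoded JSON body. The sketch assumes the usual Azure Resource Manager list shape, where results arrive in a "value" array and an optional "nextLink" points to the next page.

```python
from urllib.parse import urlencode

ARM_ENDPOINT = "https://management.azure.com"


def list_all_vms_url(subscription_id, api_version):
    """Build the URL for the Virtual Machines "List All" operation."""
    path = (
        f"/subscriptions/{subscription_id}"
        "/providers/Microsoft.Compute/virtualMachines"
    )
    return f"{ARM_ENDPOINT}{path}?{urlencode({'api-version': api_version})}"


def iter_resources(first_url, fetch_json):
    """Yield every item from a paged list response.

    fetch_json is a hypothetical callable: it takes a URL, performs an
    authenticated GET, and returns the JSON body as a dict.
    """
    url = first_url
    while url:
        page = fetch_json(url)
        yield from page.get("value", [])
        url = page.get("nextLink")  # absent on the last page, which ends the loop


# Placeholder values; substitute your own subscription ID and a supported api-version.
url = list_all_vms_url("00000000-0000-0000-0000-000000000000", "2019-12-01")
```

Keeping the HTTP call behind fetch_json means authentication and retries stay out of the paging logic, and the loop can be exercised with canned pages.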

What Data is Available?

Now that you know the 3 main ways Microsoft makes Azure data available, let’s talk a bit about what data is available. There is no way I could create a comprehensive static list of all the data sources, so I'll stick to some popular Splunk-centric sources.

Activity data [REST] or [Event Hub]: This is basically who did what and when. For example, if I log on to the Azure portal and create a new VM, the VM creation action is captured in an activity log.
Resource data [REST]: This data source covers what services you use. If you think of the activity data as "something happened", think of the resource data as "something exists". For example, Virtual Machines, storage accounts, public IP addresses, etc. are all resources.
Authentication data [REST] or [Event Hub]: This is pretty self-explanatory, but I will point out that you can get things like multi-factor authentication data, self-service password reset data, conditional access policy data, and a whole set of Azure Active Directory data.
NSG flow logs [Storage account]: This source is like a network trace including source and destination IP addresses, ports, protocols, etc. For more information on this topic, check out this blog post.
Web Application and App Insights [Storage account]: Web Application data includes web server data (hosted or shared) as well as your web application data. App Insights is APM data.
Cost and consumption [REST]: This data source contains details on what services you are using and how much that usage costs. This data can also include VM reservation recommendations to save you money on your VM spend.
Alerts [REST] or [Event Hub]: Both service and security alerts are available as part of the activity log. An example of a service alert may be a degradation of a service in a region. For example, if storage services were impacted in a region you use, that alert and relevant messages would be available. To give you an example of a security alert, Microsoft may send an alert that you only have one global admin.
Metrics [REST]: Azure makes a plethora of metrics available. The entire list of available metrics is available from Microsoft here.

How Can Splunk Access Azure Data?

So now that you know how Microsoft makes Azure data available and some different types of data available, how do you go about getting that data in Splunk? The simple answer is add-ons. The two main add-ons used are the Splunk Add-on for Microsoft Cloud Services, and the Microsoft Azure Add-on for Splunk.

Did you notice the [Storage account], [Event Hub], and [REST] tags above? Those tags are going to help us decide which add-on to use. Here we go.

Splunk Add-on for Microsoft Cloud Services

Activity data and Alerts [REST] [Event Hub]
Authentication data [Event Hub]
NSG flow logs [Storage account]
Web Application and App Insights [Storage account]

Microsoft Azure Add-on for Splunk

Resource data [REST]
Authentication data [REST]
Cost and consumption [REST]
Metrics [REST]

Did you notice a pattern there? The Splunk Add-on for Microsoft Cloud Services integrates with Event Hubs, storage accounts, and the activity log. The Microsoft Azure Add-on for Splunk integrates with various REST APIs. Notice that the Splunk Add-on for Microsoft Cloud Services can get the activity log via the REST API or Event Hub. It's the same data either way.
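If you prefer to see that pattern as a lookup, here is a small sketch that encodes the two lists above as a dictionary and answers "which add-on do I need for this data source?". The ADDON_COVERAGE name and the addons_for helper are my own for illustration; the contents simply mirror the lists in this post and are not pulled from either add-on.

```python
# Access methods each add-on covers, copied from the lists in this post.
ADDON_COVERAGE = {
    "Splunk Add-on for Microsoft Cloud Services": {
        "Activity data": {"REST", "Event Hub"},
        "Alerts": {"REST", "Event Hub"},
        "Authentication data": {"Event Hub"},
        "NSG flow logs": {"Storage account"},
        "Web Application and App Insights": {"Storage account"},
    },
    "Microsoft Azure Add-on for Splunk": {
        "Resource data": {"REST"},
        "Authentication data": {"REST"},
        "Cost and consumption": {"REST"},
        "Metrics": {"REST"},
    },
}


def addons_for(data_source, access_method=None):
    """Return the add-ons that cover a data source, optionally by access method."""
    matches = []
    for addon, sources in ADDON_COVERAGE.items():
        methods = sources.get(data_source, set())
        if methods and (access_method is None or access_method in methods):
            matches.append(addon)
    return matches


# Authentication data is the one source that shows up under both add-ons.
both = addons_for("Authentication data")
rest_only = addons_for("Authentication data", "REST")
```

Here both contains the two add-ons, while rest_only narrows it down to the Microsoft Azure Add-on for Splunk.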

Visual Getting Data In (GDI)

They say a picture is worth a thousand words, so this Sankey diagram will help visualize all those words…

Hint: click the image for an interactive diagram.
