With our ability to ingest GCP logs and metrics into Splunk and Splunk Infrastructure Monitoring, there’s never been a better time to start driving value out of your GCP data.
We’ve already started to explore this with the great blog from Matt here: Getting to Know Google Cloud Audit Logs. Expanding on this, there’s now a pre-built set of dashboards available in a Splunkbase App: GCP Application Template for Splunk!
Introducing the App
The purpose of this application template is to provide a starting point for various use cases involving GCP data. Once you start to use and learn about the data and the value available, you can add to, delete from, and modify this template to fit your requirements, and correlate GCP data with other data sources to provide greater Operational or Security Intelligence.
The application supports data gathered by any of the methods for collecting GCP data into Splunk, including metrics from Splunk Infrastructure Monitoring (SignalFX), giving you a good range of flexibility in how you use the app.
As this is a template application, it provides a range of monitoring, operational and security dashboards, so it can offer useful capabilities across your cloud teams. However, note that because it is focused entirely on GCP data, it is not designed to replace the wider applications that Splunk offers for IT Ops or Security.
A Quick Tour
Once you have configured the application and data feeds into Splunk, the app will be able to give views and dashboards based on the GCP activities from your projects. Each dashboard generally has a filter for projects, allowing you to view things by all projects or by individual projects. The app is split into 2 main menus:
- Operational views
- Security and Audit
The operational view menu presents 14 dashboards covering various services within your GCP projects. As an overall view, the Infrastructure Activity and Asset Inventory dashboards both provide a high-level view of what has been happening (creates, updates, deletes) and what service assets are being used.
The other views then explore each of the specific services, for example, Compute Engine, Cloud Functions, BigQuery, and so on. Some dashboards give an overall service-level view, from which you can drill down to detailed dashboard views of an individual instance or component. For example, with the Compute Engine overview, you can see a summary of what instances you have running, suspended, or stopped in all projects, in all regions, including any errors/warnings. Selecting one of these instances will then drill into the details for that individual instance, giving a full breakdown of its configuration, metrics and events/logs overlaid on the metrics.
Typical use cases for these operational views are mostly to view and understand what cloud resources are being consumed and to troubleshoot or find potential issues or concerns with your resources. With some dashboards presenting logs and metrics overlaid together, this can help to troubleshoot or spot correlated events and metric issues early.
From the Infrastructure Activity view, we can see what inventory activities are occurring across your project(s):
Some dashboards have detailed views, such as the Single Instance view of your Compute Engine resources. Here we see an instance with its configuration, with metrics and an overlaid annotation noting an error that has occurred just before a spike of activity:
Other dashboards provide service level views, such as the Cloud Function Overview. Where there is more detail available on these views, generally clicking on the instance name will drill down to further details:
Here we see an example of this “drill down” where we can now see an individual cloud function with overlaid annotations, again here showing we had an error occurring:
Security and Audit
The Security and Audit menu presents 6 dashboards that are focused on security. These are mainly overview dashboards to give a “big picture” view of activity within projects versus looking at specific security threats. Where a specific investigation of logs is needed, the Audit Investigation dashboard allows you to search through all the audit logs, select times, services, users, or keywords to visualise when these events occurred, and drill out to view the logs for that specific time period.
In the example below, we can filter on source type, audit logs, service, resources or even by method. Where you know specific users were involved (or service accounts) you can also filter by the principal email. Here we are looking for ERROR in all the logs:
With the VPC Overview, you can get a summary of the network activity in your VPCs, and identify any unusual patterns of activity or locations:
With every Splunk Application, the key starting point is to understand what data you have available. The app is extremely flexible and will allow you to ingest logs and metrics using the following routes:
- Splunk Add-on for Google Cloud Platform
- GCP DataFlow Splunk Template (via HEC)
- Cloud Functions
With many of the operations dashboard views requiring asset inventory information and metrics, it is recommended that you bring this data into Splunk to enable these dashboards.
Note that metrics can be collected via the add-on or Cloud Functions, or by installing the Splunk Infrastructure Monitoring Add-On; metrics collected via SIM can then be visualised in the app.
Macros are used by the app to configure these settings, and your current configuration settings can be seen in the Setup dashboard in the app.
Most of the app’s dashboards will use audit or event logs from GCP’s operations logs. On your GCP projects, you will need to create log routers to send these operations logs into GCP’s Pub/Sub topics. From here, you can get data in by using the Splunk Add-on for GCP, GCP’s DataFlow template, or, at lower volumes, Cloud Functions. Note however that the logs collected by the GCP Add-On can differ in structure from the data collected via Google DataFlow. The Add-on takes log content from Pub/Sub and wraps it in an additional layer of JSON, in a “data.” object. The DataFlow template sends logs that are not wrapped in this additional layer of JSON; they are passed as-is to Splunk without the “data.” structure. This doesn’t pose an issue for the app, but you do need to set a macro that tells the dashboards whether your logs are from DataFlow (no “data.”) or the Add-On (with “data.”). Remember to set the sourcetype of your DataFlow logs to google:gcp:pubsub:message.
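To make the structural difference concrete, here is a minimal Python sketch of the two event shapes and a helper that returns the same log entry from either one. This is only an illustration of the wrapping described above, not how the app or its macro works. The function name `unwrap_gcp_event` and the trimmed sample events are hypothetical, and the sketch assumes the Add-on places the log entry under a top-level "data" key.

```python
from typing import Any, Dict


def unwrap_gcp_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return the GCP log entry regardless of the collection route.

    Assumption: Add-on events carry the log entry under a top-level
    "data" key, while DataFlow events are the log entry itself.
    """
    inner = event.get("data")
    if isinstance(inner, dict):
        # Add-on shape: the log entry is one level down.
        return inner
    # DataFlow shape: the event already is the log entry.
    return event


# Hypothetical, heavily trimmed log entry used for illustration only.
log_entry = {
    "logName": "projects/example-project/logs/cloudaudit.googleapis.com%2Factivity",
    "severity": "ERROR",
}

addon_event = {"data": dict(log_entry)}  # wrapped, as sent by the Add-on
dataflow_event = dict(log_entry)         # as-is, as sent by DataFlow

# Both routes yield the same log entry once unwrapped.
assert unwrap_gcp_event(addon_event) == unwrap_gcp_event(dataflow_event)
```

In the app itself this choice is made once through the macro setting rather than per event, so the dashboards know whether to look for fields with or without the “data.” prefix.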
There are 3 main ways of collecting GCP metrics data into Splunk solutions:
- using the Add-on
- using Cloud Functions
- using our multi-cloud observability solution Splunk Infrastructure Monitoring, SIM (previously known as SignalFX)
The application can use all of these methods by simply setting a macro to read from the correct index, source and source type. If you are using metrics from SIM, you will need to install and configure an additional add-on to connect to the SIM metrics data (https://splunkbase.splunk.com/app/5247). The metrics namespaces that you need to collect depend on the dashboards you want to use, and for those services you mostly need only the key metrics. For example, the Compute Engine dashboards require compute metrics such as CPU, disk and network utilization (see here for a list of GCP metrics).
If you want to use the dashboards that report the service inventory that you are using in your projects, then you will need to collect Cloud Asset Inventory data from your projects. This can be easily done either by setting up a cron job that regularly exports an asset snapshot to Cloud Storage using gcloud and then using the GCP Add-On to read it, or by using the Cloud Functions Library to send the data to Splunk HEC.
As the logs being collected from GCP are in JSON format, it is easy to configure Splunk to do indexed extractions of these logs. Once indexed, we can then run searches using tstats on the logs, making searches significantly faster. This is especially useful when looking at multiple projects from your organisation, given the large number of logs that can come in from GCP’s audit logs. (Note that there will be overhead in the index size when doing this.)
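As a rough sketch of the cron-based route, the Python snippet below builds the `gcloud asset export` command that a scheduled script could run. It assumes the gcloud CLI is installed and authenticated on the host. The project ID, bucket and object names are hypothetical placeholders, and the helper name `build_asset_export_command` is ours rather than part of any Splunk or Google tooling.

```python
import shlex
from typing import List


def build_asset_export_command(project_id: str, bucket: str, object_name: str) -> List[str]:
    """Build a gcloud command that exports a resource snapshot to Cloud Storage."""
    return [
        "gcloud", "asset", "export",
        f"--project={project_id}",
        "--content-type=resource",
        f"--output-path=gs://{bucket}/{object_name}",
    ]


# Hypothetical names; replace them with your own project and bucket.
cmd = build_asset_export_command(
    "example-project", "example-asset-bucket", "asset-inventory.json"
)

# Print the command so it can be reviewed before it is scheduled.
print(shlex.join(cmd))

# To actually run the export from a scheduled script, something like:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

A cron entry would then call this script at whatever interval suits your inventory, with the GCP Add-On configured to read the exported file from the bucket.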
To enable faster searches, the application requires a macro to be set and also some props and transforms updates to the GCP source types to support the indexed extractions.
Details on how to configure all of these settings are available with the app details on Splunkbase, and on the setup dashboard in the app. Tip - if your dashboard is not showing any data, check that you have the logs, metrics or asset data in Splunk (with the correct source types and indexes in the app settings), and if you’ve not set indexed extractions for the JSON, make sure the macro is set accordingly.
And finally, keep an eye out for updates and new dashboards to the app!