Pipelines Full of Context: A GitLab CI/CD Journey

By Jeremy Hicks

Do you know what version of your software is running in production? How often is that software deployed, and was it deployed right before last week’s P0 incident? What sort of dependencies are being deployed along with that software, and are any of them potential security risks?

These are all common observability questions that may be difficult to answer. Obtaining those answers requires deeper insight into not only the deployment pipeline but also the code being deployed and its Software Bill of Materials (SBoM). In this post you’ll see how data from GitLab CI/CD pipelines can be sent to Splunk so that your organization can answer questions about these parts of the software lifecycle in one place:

Continuous Integration Testing
Code / Dependency / Secret Scanning
Deployments
SBoM generation

By combining your CI/CD, SBoM, and code scanning data with what you already ingest into Splunk, you can supercharge your observability and DevOps practices! This post uses GitLab CI data as its example to illustrate what is possible with software deployment pipeline data in Splunk.

Purpose Built Pipelines

Like robust software monitoring, repeatable deployment pipelines and automation are a requirement of successful software organizations. They both attempt to bring order to the often chaotic environments, dependencies, and processes of software development. Luckily, these practices can also help each other out! Deployment pipeline data can help enrich monitoring and observability practices by providing better context of what is going on as software is built, tested, and deployed.

When setting up monitoring of software deployment pipelines, some basic questions should be easy to answer:

Can we tell if a deployment happened?
Can we tell if that deployment event was successful?

These sorts of questions are the most basic concerns, and in the case of GitLab, can be easily answered with webhook data (Splunk Lantern guide) or the Splunk Add-On for GitLab.

But going a step further, some additional questions may be worth considering to improve the observability of your software, such as:

Can we determine how often deployments happen and how often deployments fail?
Are we able to quickly establish if a software deployment was involved in an ongoing or previous incident?
What sort of tests are we running prior to deployment and has testing coverage improved or declined?
Are we potentially leaking secrets?
What code dependencies are being deployed automatically along with our software and how safe is the code?

These questions require contextual data about pipeline runs that may be difficult to acquire. Luckily, artifacts from deployment pipelines are usually written to disk for sharing between pipeline steps and for historical auditing. Because deployment pipelines are highly configurable and can run arbitrary commands, tools like curl can be used to send that data out to other systems like Splunk (Splunk Lantern guide).
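As a rough illustration of the idea above, the sketch below wraps a parsed pipeline artifact in a Splunk HTTP Event Collector (HEC) event, which is the Python equivalent of what a curl call in a pipeline step would post. It is a minimal sketch under stated assumptions, not the method from the Lantern guide: the sourcetype name, the event field names, and the helper functions are hypothetical, and the HEC URL and token would come from your own Splunk configuration. The CI_* variables are GitLab predefined CI/CD variables.

```python
import json
import os
import urllib.request


def build_hec_request(artifact, hec_url, hec_token):
    """Wrap a parsed pipeline artifact in a Splunk HEC event envelope."""
    payload = {
        "sourcetype": "gitlab:ci:artifact",  # hypothetical sourcetype name
        "event": {
            # GitLab predefined variables, kept as linking fields for later searches.
            "project_path": os.environ.get("CI_PROJECT_PATH", "unknown"),
            "commit_sha": os.environ.get("CI_COMMIT_SHA", "unknown"),
            "pipeline_id": os.environ.get("CI_PIPELINE_ID", "unknown"),
            "artifact": artifact,
        },
    }
    return urllib.request.Request(
        url=hec_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {hec_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Post the event and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

In a pipeline job, a later step could load a report file written by an earlier step with json.load, pass it to build_hec_request along with the HEC URL and token stored as masked CI/CD variables, and then call send.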

Figure 1-1. GitLab SBoM (CycloneDX), Dependency Scanning, and Static Code Analysis Scanning results from GitLab CI deployment pipelines viewed in Splunk.

Pipeline Data to Splunk Data

Once your CI/CD data from GitLab (or any other source) is in Splunk, the world is your oyster. Leveraging Splunk Dashboards and SPL, you can easily observe deployments, code scanning results, and other important data at a glance! But before building anything, think about the questions listed earlier and pursue some of the easy ones. The easiest is determining when deployments started and finished. This can be accomplished with some very basic SPL to create a timechart. Knowing when deployment events happened makes it easier to establish whether a given deployment was involved in or precipitated an incident or outage.

Figure 1-2. Visualizing the start, pending state, and completion of a repository’s deployment pipeline.
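To make the timechart idea concrete outside of SPL, here is a small Python sketch of the same bucketing logic: group pipeline events into hourly bins and count them by status. It is an analogue for illustration only, not the search behind the figure, and the "timestamp" and "status" field names are assumptions about how the events were indexed.

```python
from collections import Counter
from datetime import datetime


def hourly_status_counts(events):
    """Bucket pipeline events into hourly bins and count each status."""
    buckets = Counter()
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        # Truncate to the start of the hour, like a one-hour time span.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(hour.isoformat(), event["status"])] += 1
    return dict(buckets)


# Hypothetical sample events for one repository.
sample_events = [
    {"timestamp": "2024-05-01T14:05:00", "status": "running"},
    {"timestamp": "2024-05-01T14:20:00", "status": "success"},
    {"timestamp": "2024-05-01T15:02:00", "status": "failed"},
]
```

Calling hourly_status_counts(sample_events) yields one count per (hour, status) pair, which is the shape of data a timechart plots.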

This sort of timechart also puts you well on your way to establishing productivity and developer metrics such as how often services are deploying and how often deployments succeed or fail. Instead of creating a timechart, a table of successful and failed deployments can be quickly created from the same data using the stats and table commands.
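The table described above boils down to counting finished deployments by repository and outcome. The Python sketch below shows that aggregation under some assumptions: the "repository" and "status" field names are hypothetical, and only the "success" and "failed" statuses are treated as finished deployments.

```python
from collections import Counter, defaultdict


def deployment_summary(events):
    """Count successful and failed deployments per repository."""
    counts = defaultdict(Counter)
    for event in events:
        # Ignore in-flight states such as "pending" or "running".
        if event["status"] in ("success", "failed"):
            counts[event["repository"]][event["status"]] += 1

    rows = []
    for repo in sorted(counts):
        ok = counts[repo]["success"]
        bad = counts[repo]["failed"]
        rows.append({
            "repository": repo,
            "success": ok,
            "failed": bad,
            "failure_rate": round(bad / (ok + bad), 2),
        })
    return rows


# Hypothetical sample events.
sample_events = [
    {"repository": "shop/api", "status": "success"},
    {"repository": "shop/api", "status": "failed"},
    {"repository": "shop/api", "status": "success"},
    {"repository": "shop/web", "status": "success"},
    {"repository": "shop/web", "status": "running"},
]
```

Each returned row corresponds to one line of the table: a repository, its success and failure counts, and the share of finished deployments that failed.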

But there are even deeper levels of development and deployment data that can be harnessed to quickly answer questions about concerns such as dependencies and the entire software bill of materials. By leveraging linking fields like repository name, repository project/organization, and commit hash, it is possible to build out data models that work across the various pillars of DevOps data, from ticket submission through code review, deployment, and even security scanning of code. Armed with these linking fields, CVEs in currently deployed software or its dependencies can be easily identified and cross-checked against the Software Bill of Materials and active CVEs.
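The role of linking fields can be sketched as a simple join. The example below matches scan findings to deployments on the pair (repository, commit hash), so that only vulnerabilities in code that was actually deployed are returned. The record layouts and field names are hypothetical and stand in for whatever your indexed events contain.

```python
def vulnerabilities_in_deployed_code(deployments, findings):
    """Join scan findings to deployments on (repository, commit_sha)."""
    deployed = {
        (d["repository"], d["commit_sha"]): d["environment"]
        for d in deployments
    }
    matches = []
    for finding in findings:
        key = (finding["repository"], finding["commit_sha"])
        if key in deployed:
            # Carry the environment over so the finding has deployment context.
            matches.append({**finding, "environment": deployed[key]})
    return matches


# Hypothetical sample records.
deployments = [
    {"repository": "shop/api", "commit_sha": "a1b2c3", "environment": "production"},
]
findings = [
    {"repository": "shop/api", "commit_sha": "a1b2c3", "cve": "CVE-2021-44228"},
    {"repository": "shop/api", "commit_sha": "ffffff", "cve": "CVE-2021-45046"},
]
```

Here only the first finding is returned, because the second belongs to a commit that was never deployed.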

Figure 1-3. Visualize dependencies, vulnerabilities, and other CI related data in highly detailed Splunk Dashboards like the above for analyzing Code Scanning and Dependency Scanning vulnerabilities.

Finding all repositories using a specific package (Log4j, anyone?) is a snap! Simply search your GitLab CI data in Splunk for the vulnerable package you’re interested in and use stats by repository name. You’ll be able to see the current usages of that package and, if needed, locate the vulnerabilities identified in that dependency.
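The same lookup can be sketched in Python against SBoM documents directly. The example assumes each repository has a CycloneDX JSON document whose "components" entries carry "name" and "version" fields; the repository names and the function itself are hypothetical.

```python
def repositories_using(package, sboms):
    """Map each repository to the versions of a package found in its SBoM."""
    usage = {}
    for repo, sbom in sboms.items():
        versions = {
            component.get("version", "unknown")
            for component in sbom.get("components", [])
            if component.get("name") == package
        }
        if versions:
            usage[repo] = sorted(versions)
    return usage


# Hypothetical, heavily trimmed CycloneDX documents keyed by repository.
sboms = {
    "shop/api": {"components": [
        {"name": "log4j-core", "version": "2.14.1"},
        {"name": "jackson-databind", "version": "2.13.0"},
    ]},
    "shop/web": {"components": [
        {"name": "react", "version": "18.2.0"},
    ]},
}
```

Calling repositories_using("log4j-core", sboms) returns only the repositories whose SBoM lists that component, grouped by repository in the same way stats by repository name groups the search results.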

Making Sense of It All

Having all of your DevOps data in one place allows you to ask and answer questions beyond just real-time monitoring of deployed infrastructure. The ability to ask historical questions about where, when, and how code goes from idea to running in production is invaluable for refining your DevOps and development processes. With your data now in Splunk you’ll have all the power of SPL at your fingertips. DevOps data related to development, deployment, and even security scanning has never been easier to slice and dice. But how can this be made even better? By leveraging data models!

Because products like GitLab, GitHub, and others provide similar data related to code, deployment, and code scanning, it is possible to bring in data from multiple sources and assign unified field names using Splunk data models to meld that data together. Imagine, one set of dashboards for all of your code, development, deployment, and code scanning needs! Intriguing, right?
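The idea of unified field names can be illustrated with a small mapping from source-specific fields to common ones. This Python sketch is not a Splunk data model; it only shows the kind of normalization a data model expresses. The unified names and the dotted source paths are illustrative assumptions based on simplified GitLab pipeline and GitHub workflow run webhook payloads.

```python
# Unified field name -> dotted path in each source's payload (assumed layouts).
FIELD_MAP = {
    "gitlab": {
        "repository": "project.path_with_namespace",
        "commit_sha": "object_attributes.sha",
        "status": "object_attributes.status",
    },
    "github": {
        "repository": "repository.full_name",
        "commit_sha": "workflow_run.head_sha",
        "status": "workflow_run.conclusion",
    },
}


def dig(record, dotted_path):
    """Follow a dotted path through nested dicts, returning None if absent."""
    value = record
    for key in dotted_path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value


def normalize(source, record):
    """Project a source-specific record onto the unified field names."""
    unified = {name: dig(record, path) for name, path in FIELD_MAP[source].items()}
    unified["source"] = source
    return unified
```

Once every source is projected onto the same field names, one search or dashboard can cover records from all of them.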

Tune in next time and we’ll dig deeper into how data models can make harnessing your DevOps data even simpler!

Next Steps

Don’t have Splunk yet? Want to try all of this out? Sign up to start a free trial of the Splunk Cloud Platform today!

Jeremy Hicks

Jeremy Hicks is an observability evangelist and SRE veteran from multiple Fortune 500 E-commerce companies. His enthusiasm for monitoring, resiliency engineering, cloud, and DevOps practices provides a unique perspective on the observability landscape.
