
Getting GitHub Data with Webhooks

Recently, I ran into a Splunk use case where someone wanted to onboard GitHub data directly into Splunk Cloud. They were looking for audit data about the repository itself. To be clear, this wasn’t a matter of ingesting a .csv file hosted in a GitHub repository; they needed to answer questions like “How can I create a line chart of the number of pushes throughout the week?” or “How can I see how many open issues I have in my repository from inside Splunk?”

The main challenge with this specific use case (and what makes it so interesting, in my opinion) was that the customer was running Splunk Cloud and had no interest in running any kind of on-prem architecture. They were all-in on cloud. So what does one do in a situation like this? And is it sensible to want to be 100% in the cloud? I think so.

Many people would tackle this problem with something like the GitHub Add-on for Splunk and a Splunk Heavy Forwarder. While that’s certainly a valid method for collecting this data, it requires infrastructure to run Splunk on. Of course, EC2 is an option when it comes to cloud technology. In my opinion, though, the same thing can be achieved with a more lightweight serverless function, and I’m here to show you a quick way to do it.

For starters: enter the AWS Webhook to Splunk HTTP Event Collector Serverless Function. This is a fairly basic blueprint of a function I created that you can spin up today with the click of a button. The goal here is to deploy a lightweight AWS Lambda function that acts as a sort of translator between Webhooks and the Splunk HTTP Event Collector. 
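To make the “translator” idea concrete, here is a minimal Python sketch of what such a function might look like. It is not the code of the published function. The names build_hec_request and lambda_handler, the injectable send argument, and the assumption that the webhook arrives as an API Gateway proxy event (with queryStringParameters and body) are all illustrative. The Splunk side uses the HTTP Event Collector's /services/collector/event endpoint with an "Authorization: Splunk <token>" header.

```python
import json
import urllib.request


def build_hec_request(params, body):
    """Turn webhook query parameters and a raw JSON body into a HEC request.

    `params` is assumed to hold the four values described in Step 3:
    url, port, http_method and token.
    """
    endpoint = "{http_method}://{url}:{port}/services/collector/event".format(**params)
    # HEC expects the actual data under an "event" key.
    # Assumes the webhook was configured to send application/json.
    payload = json.dumps({"event": json.loads(body)}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={
            "Authorization": "Splunk " + params["token"],
            "Content-Type": "application/json",
        },
        method="POST",
    )


def lambda_handler(event, context, send=urllib.request.urlopen):
    """Hypothetical handler: forward one webhook delivery to Splunk HEC.

    `send` is injectable only so the sketch can be exercised without a network.
    """
    request = build_hec_request(event["queryStringParameters"], event["body"])
    with send(request) as response:
        return {
            "statusCode": response.status,
            "body": response.read().decode("utf-8"),
        }
```

Keeping the request-building step separate from the handler makes it possible to check the translation logic locally before deploying anything.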

Step 1: Deploy the Serverless Function

As I mentioned, deploying this serverless function is extremely simple. You can start out by locating the serverless function in the AWS Serverless Application Repository. Feel free to name the function anything you’d like. In my case, I wanted to collect GitHub data from my corona_virus repository, so I titled it accordingly, as you’ll see below.

Step 2: Click Deploy!

Setting up your own private endpoint is as easy as clicking a button.

Step 3: Set Up a Splunk HTTP Event Collector Token

In order to use GitHub Webhooks to send your data to Splunk, you’ll need to create a Splunk HTTP Event Collector token. All of the information on setting one up is on our Splunk Docs page. Setup should generally take less than 5 minutes. Once you are done with this step, you should have all of the following information:

  • url - the FQDN of your Splunk server.
  • http_method - http or https, depending on whether HTTP Event Collector is running with SSL enabled.
  • port - the port HTTP Event Collector is running on.
  • token - your HTTP Event Collector token.
     

Example:

  • url - your.server.com
  • http_method - https
  • port - 8088
  • token - 223342-23242-232324
     

Step 4: Construct the Webhook URL for Your Repository

In the example GIF below, you’ll notice that once my serverless function is completely deployed, I can click the “Test app” button to get a URL for my endpoint, which will be used as part of my Webhook URL. This URL is combined with the information from Step 3 as query parameters (the format is documented in the serverless repository README).

https://<your_api_gateway_url>/Prod/webhook-to-hec?url=your.server.com&port=8088&http_method=https&token=223342-23242-232324
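If you'd rather not assemble that URL by hand, a small Python helper can do it and take care of URL-encoding the values. This is just a sketch: build_webhook_url is a hypothetical name, and the API Gateway host in the usage example is a made-up placeholder, not a real endpoint. The path and parameter names are taken from the URL format above.

```python
from urllib.parse import urlencode


def build_webhook_url(api_gateway_url, url, port, http_method, token):
    """Combine the API Gateway base URL with the HEC settings from Step 3."""
    query = urlencode(
        {
            "url": url,                  # FQDN of the Splunk server
            "port": port,                # HEC port
            "http_method": http_method,  # "http" or "https"
            "token": token,              # HEC token
        }
    )
    # Strip a trailing slash so the path is not doubled up.
    return f"{api_gateway_url.rstrip('/')}/Prod/webhook-to-hec?{query}"


# Placeholder host; substitute the URL shown by the "Test app" button.
print(
    build_webhook_url(
        "https://abc123.execute-api.us-east-1.amazonaws.com",
        "your.server.com",
        8088,
        "https",
        "223342-23242-232324",
    )
)
```

Because the HEC token ends up in the query string, treat the resulting URL as a secret.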

Step 5: Connect Your GitHub Repository to the Serverless Function

Last but not least, let’s connect GitHub to the newly created serverless function. Simply visit your GitHub repository of choice and go to Settings > Webhooks. From there, select “Add webhook” and enter the settings below:

  • Payload URL: the URL from Step 4
  • Content type: application/json
  • Which events would you like to trigger this webhook? This is up to you; select whichever events you’d like to see in Splunk.
  • Active: Checked
     

Finally, click “Add webhook” to confirm. The example GIF below shows what this process looks like. A green check mark next to the URL when you’re done means the process was successful.
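If you have many repositories to connect, the same settings can also be applied through GitHub's REST API ("Create a repository webhook", POST /repos/{owner}/{repo}/hooks) instead of the UI. The Python sketch below only builds the request. build_create_hook_request is a hypothetical helper name, and the owner, repository, and token in the usage comment are placeholders. It assumes a GitHub token with permission to manage webhooks on the repository.

```python
import json
import urllib.request


def build_create_hook_request(owner, repo, payload_url, github_token, events=("push",)):
    """Build a 'create repository webhook' request mirroring the UI settings."""
    body = {
        "name": "web",            # GitHub requires "web" for webhooks
        "active": True,           # same as ticking "Active"
        "events": list(events),   # which events trigger the webhook
        "config": {
            "url": payload_url,       # the URL from Step 4
            "content_type": "json",   # same as choosing application/json
        },
    }
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/hooks",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {github_token}",
        },
        method="POST",
    )


# Usage (placeholders, not real values):
# request = build_create_hook_request("my-user", "corona_virus", webhook_url, "<github token>")
# urllib.request.urlopen(request)
```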

And that’s how easy it can be to get data in from one Enterprise Cloud solution to another. I guess I better go solve some of these open issues now. 

If you have any questions or comments, please don’t hesitate to reach out! Please also feel free to deploy this code, and then modify it to your own liking. This is meant to get users up and running for a specific use case, but it will hopefully also be adapted to many more different use cases in the future. 

Posted by Ryan O'Connor

Ryan O’Connor is part of the Splunk technology incubation team. Outside of working at Splunk, he's an adjunct professor at the University of Connecticut teaching courses in Machine Learning, Network Security, Security Audit and Compliance, and Industrial IoT. Ryan is a PADI Certified Rescue Diver, and has his Master's in Data Analytics and Project Management and a Graduate Certificate in Healthcare IT from the University of Connecticut.
