
Getting Github Data with Webhooks (Part 2)

After my last blog on sending Github data to Splunk via Webhooks, I received a healthy amount of feedback that I want to address here. I learned that (unsurprisingly) a lot of customers are curious about, or dependent on, other cloud platforms. In fact, I heard directly from some customers who can only use one particular cloud platform, and it was not the one highlighted in my last blog. This is the type of feedback that helps me build tailored content that I know will have a positive impact on our customers’ usage of Splunk, so thank you.

I also learned that folks were confused as to why I would use serverless functions for something that could potentially be done without one, like in this blog from Luke Netto on firing Webhooks directly to the Splunk HTTP Event Collector. The short answer is security, and if you stay tuned, you’ll find out what I mean. For starters, let’s address alternative cloud platforms.

"This is Great, But We Use (Insert other Cloud Platform Here)"

This is a fair point of consideration. Cloud computing today has more major players than just Amazon Web Services, so it can be shortsighted to only give examples using one cloud platform. AWS just happens to be my default from years of using it. But I’m not one to shy away from giving more examples so more people can enjoy the benefits of serverless functions. So my response was: why not build this example as a serverless function on Microsoft Azure and Google Cloud Platform as well, and share the code in git repositories so people can easily access them? And so I did.

Microsoft Azure Function

Step 1: Create a new serverless function app inside of Azure

Step 2: Add the example Python Code from Github

For this step, go to your new Function App and click on the “Functions” section. From there you’ll be able to click on “Develop Locally” which will open up a new sidebar that will give you all of the steps you need to set up your local development environment. 

With your new local development environment, you can replace all of the code with the code from the Azure Github Repository. If you want to do this selectively, the two main files you’ll need to modify are __init__.py and requirements.txt. Most everything else should be out of the box code. 


Step 3: Setup Splunk HTTP Event Collector Token

We’ll skip this step, as we covered it in Step 3 of the previous blog.

Step 4: Construct the Webhook URL for your Repository

In the example below, you’ll notice that once my function is completely deployed, I can click the “Get Function URL” button to get a URL for my endpoint to be used as part of my Webhook URL. This URL is combined with the Splunk server and HEC token information from Step 3.

https://<your_azure_function_url>?url=your.server.com&port=8088&http_method=https&token=223342-23242-232324
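If you would rather not assemble that URL by hand, here is a minimal sketch of one way to build it. The helper name `build_webhook_url` is hypothetical, and the function URL, hostname and token are placeholders; only the query parameter names (`url`, `port`, `http_method`, `token`) come from the example above.

```python
from urllib.parse import urlencode, urlsplit


def build_webhook_url(function_url, splunk_host, hec_token, port=8088, http_method="https"):
    """Append the Splunk connection details to the function URL as a query string.

    Hypothetical helper for illustration; it is not part of the example repository.
    """
    params = {
        "url": splunk_host,
        "port": port,
        "http_method": http_method,
        "token": hec_token,
    }
    # A copied function URL may already carry a query string (for example a
    # function key), so choose the separator accordingly.
    separator = "&" if urlsplit(function_url).query else "?"
    return f"{function_url}{separator}{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder values only; substitute your own function URL, host and token.
    print(build_webhook_url(
        "https://example.invalid/api/github-webhook",
        "your.server.com",
        "223342-23242-232324",
    ))
```

With the placeholder values above, the result has the same shape as the URL shown in this step.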

Step 5: Connect your Github repository to the serverless function

We’ll skip this step, as we covered it in Step 5 of the previous blog.

Google Cloud Platform Function

Step 1: Deploy the Serverless Function

Last but not least, I’ve created the same function on Google Cloud Platform. Creating a function here is extremely straightforward. Start out by clicking the “Create Function” button from within the “Cloud Functions” portion of the console. Below you’ll see all of the settings that you’ll want to fill out. The zip file mentioned here can be found in this GCP-specific Github repository.

Step 2: Click Create!

Once you click Create, your new function will be deployed. 

Step 3: Setup Splunk HTTP Event Collector Token

We’ll skip this step, as we covered it in Step 3 of the previous blog.

Step 4: Construct the Webhook URL for your Repository

In the example below, you’ll notice that once my function is completely deployed, I can click the “Trigger” button to get a URL for my endpoint to be used as part of my Webhook URL. This URL is combined with the Splunk server and HEC token information from Step 3.

https://<your_gcp_function_url>?url=your.server.com&port=8088&http_method=https&token=223342-23242-232324
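On the receiving side, the function has to turn those query parameters back into a Splunk endpoint. The sketch below shows one plausible way to do that; it is an assumption about how the parameters could be interpreted, not a copy of the code in the repository, and the name `hec_endpoint_from_webhook_url` is hypothetical. The path `/services/collector/event` is the HEC endpoint for JSON events.

```python
from urllib.parse import parse_qs, urlsplit


def hec_endpoint_from_webhook_url(webhook_url):
    """Return (hec_endpoint, token) derived from the webhook URL's query string.

    Hypothetical helper; parameter names follow the example URL above.
    """
    query = parse_qs(urlsplit(webhook_url).query)
    # parse_qs returns a list of values per key, so take the first of each.
    host = query["url"][0]
    port = query.get("port", ["8088"])[0]
    scheme = query.get("http_method", ["https"])[0]
    token = query["token"][0]
    return f"{scheme}://{host}:{port}/services/collector/event", token


if __name__ == "__main__":
    endpoint, token = hec_endpoint_from_webhook_url(
        "https://example.invalid/fn?url=your.server.com&port=8088"
        "&http_method=https&token=223342-23242-232324"
    )
    print(endpoint)
```

Note that the HEC token travels in the query string here, so anyone who can read the webhook configuration can read the token as well.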

Step 5: Connect your Github repository to the serverless function

We’ll skip this step, as we covered it in Step 5 of the previous blog.

"This is Great, But Can’t You Send this Straight to HTTP Event Collector"

Another fair point, but allow me to show you one key differentiator I was saving for a follow-on blog: security. 

A blessing and a curse of the current HTTP Event Collector (HEC) is that provided you have a HEC token, and the ability to connect to the HTTP Event Collector port, you can send data to a Splunk index. This is one thing that makes HEC so elegant and easy to use. It is also the reason you truly want to keep your HEC token secured and private. Adversarial Machine Learning, for example, is a massive risk to data-driven organizations. With serverless functions in between something like Github and your Splunk environment, you have a method of mitigating some of this risk. Let’s take a look at Github’s API documentation on securing your webhooks to see how this is possible.
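To see why the token alone is so sensitive, here is roughly everything a client needs in order to prepare an event for HEC. This is a minimal sketch using only the Python standard library; the function name `build_hec_request` is hypothetical and the endpoint and token are placeholders. The `Authorization: Splunk <token>` header is how HEC expects the token to be presented.

```python
import json
import urllib.request


def build_hec_request(endpoint, token, event):
    """Build (but do not send) a POST request carrying one event to HEC."""
    body = json.dumps({"event": event}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_hec_request(
        "https://your.server.com:8088/services/collector/event",  # placeholder
        "223342-23242-232324",                                    # placeholder
        {"action": "push", "repository": "example"},
    )
    # Sending it would be a single call: urllib.request.urlopen(request)
    print(request.get_method(), request.full_url)
```

Nothing in that request identifies the sender beyond the token, which is the gap the webhook secret described next is meant to close.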

Github allows you to pass a secret token in the header of every single Webhook that it sends. This means that on the receiving end of the Webhook, you can check for this exact “Secret” string in the header. If that string does not match, the serverless function can drop the message before it ever lands in a Splunk index.

By utilizing this method, we can ensure that every single webhook being sent to our serverless function from Github is trusted. Again, this token needs to be secured as well and you should take good care of it. To see this methodology in action, let’s look at what happens when we modify the example Github code slightly. 

Secured Webhook Event

To begin, take a look at the screenshot below where you’ll see the “secured” code. Here on line 23, we check that our secret string in our code matches what we placed in Github. This is our "gate" that determines whether the message will make it to our Splunk server or not. 
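For readers who cannot see the screenshot, the gate can be sketched along these lines. This is an illustration of the idea, not the code from the screenshot: the function name `gate`, the constant `EXPECTED_SECRET` and the returned status codes are assumptions, and the header name `X-Hub-Signature` is the one used in the Postman test in this post.

```python
import hmac

# Hypothetical placeholder; in practice, read this from an environment
# variable or a secret store rather than hard-coding it.
EXPECTED_SECRET = "test"


def gate(headers, expected_secret=EXPECTED_SECRET):
    """Return 200 if the delivery may be forwarded to Splunk, 401 otherwise."""
    # Header names are case-insensitive, so normalize them before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get("x-hub-signature", "")
    # compare_digest avoids leaking how many leading characters matched.
    if hmac.compare_digest(supplied.encode("utf-8"), expected_secret.encode("utf-8")):
        return 200
    return 401


if __name__ == "__main__":
    print(gate({"X-Hub-Signature": "test"}))
    print(gate({"X-Hub-Signature": "test2"}))
```

Only when the gate returns 200 would the function go on to forward the payload to HEC; anything else is rejected before Splunk is contacted.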

Using Postman, we can simulate a webhook event going to our serverless function. As you can see, we pass a key in the header called “X-Hub-Signature”, as described above in the Github documentation. The value “test” is the secret I have stored in Github. This should allow the message to succeed, and you’ll see that it does.

A simple switch to “test2” causes an unauthorized message to be returned, and no data is sent to our Splunk server.
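One caveat when moving from Postman to real deliveries: Github’s webhook documentation describes the signature header as an HMAC hex digest of the request body, keyed with your secret, rather than the literal secret string. The newer `X-Hub-Signature-256` header carries it in the form `sha256=<hexdigest>`. A receiver would therefore typically recompute that digest and compare. The sketch below assumes you have the raw request body as bytes; the function name `signature_is_valid` is hypothetical.

```python
import hashlib
import hmac


def signature_is_valid(secret, payload, signature_header):
    """Check a 'sha256=<hexdigest>' header value against the raw request body."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    digest = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # Compare as bytes so unexpected non-ASCII input cannot raise an error.
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))


if __name__ == "__main__":
    body = b'{"zen": "Keep it logically awesome."}'
    # Simulate what the sender would attach for the secret "test".
    header = "sha256=" + hmac.new(b"test", body, hashlib.sha256).hexdigest()
    print(signature_is_valid("test", body, header))
    print(signature_is_valid("test2", body, header))
```

The advantage of this form is that the secret itself never travels over the wire, and a signature captured from one delivery cannot be reused to sign a different payload.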

So To Summarize:

  • We now have three different serverless functions that can help move data between Github and Splunk
  • We learned about some of the potential security benefits of using serverless functions
  • We have seen just how easy and lightweight serverless functions are

Let’s also not forget about logging and metrics, especially in a cloud environment. If you want to ensure your service is always up and running and scaled appropriately, consider using Add-ons like the Splunk Add-on for Amazon Web Services to collect information about AWS Lambda serverless functions.

If you have any questions or comments, please don’t hesitate to reach out! Please also feel free to deploy this code on any of your Cloud Infrastructure, and then modify it to your own liking. This is meant to get users up and running for a specific use case, and it will hopefully be adapted to many more different use cases in the future. 

Posted by Ryan O'Connor

Ryan O’Connor is a manager on the Splunk for Good team, overseeing our Impact Engineering program. He has a Master's Degree in Data Analytics and Project Management from the University of Connecticut. He is a PADI Certified Dive Master who is passionate about Ocean Conservation.
