Splunking Webhooks with the HTTP Event Collector

A few weeks ago, one of our customers, which had become more reliant on web-based collaboration technologies, was deploying the Splunk Remote Work Insights (RWI) Executive Dashboards. They needed to onboard operational data from Zoom into Splunk, and because of their size, both the rate and the volume of data we needed to collect were very large.


The customer already had Internet-facing Heavy Forwarders with open HTTP Event Collector (HEC) ports to ingest various data sources. We installed and configured the webhook-based Splunk Connect for Zoom, but we soon realized we were missing more than 50% of the Zoom events. With around 500,000 employees, one can easily imagine the number of Zoom events per day, especially since so many of their employees were in a mandatory work-from-home situation.

After doing some research, I learned that the majority of webhooks perform an HTTP POST with a JSON, XML, or form data content-type. Zoom is no different: when you create a webhook-only app in Zoom, Zoom sends an HTTP POST request payload to the specified endpoint URL. Unfortunately, Zoom only lets you set the endpoint URL and does not let you specify an authentication method such as a HEC token. The output JSON format is also not customizable, so it cannot be made to match the event format that the “collector” endpoint for HEC expects.

I knew HEC allows for the option to expose a “raw” endpoint that allows us to POST unformatted events, but I still needed to find an authentication solution. I started reviewing Splunk’s HEC documentation and realized there is a parameter that allows one to embed the token for authentication as part of the URL: allowQueryStringAuth. This solution works for both Splunk Enterprise (on-prem) and Splunk Cloud.

I created the following new HEC input (inputs.conf) on one of the Heavy Forwarders:

[http://zoom]
token = <a random guid>
indexes = scratch
index = scratch
sourcetype = zoom:webhook
allowQueryStringAuth = true
disabled = false


(Note: If you are a Splunk Cloud Customer, you must open a Splunk Support ticket to set allowQueryStringAuth to true on your HEC endpoint.)
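
For illustration, here is a minimal Python sketch of one way to produce the random GUID and render a stanza like the one above. The helper names (new_hec_token, render_hec_stanza) are hypothetical and not part of any Splunk tooling; the stanza keys are simply copied from the input shown earlier.

```python
import uuid


def new_hec_token() -> str:
    """Return a random GUID string (8-4-4-4-12 hex format)."""
    return str(uuid.uuid4())


def render_hec_stanza(name: str, token: str, index: str, sourcetype: str) -> str:
    """Render an inputs.conf stanza mirroring the example above.

    Hypothetical helper: it only formats text and does not talk to Splunk.
    """
    lines = [
        f"[http://{name}]",
        f"token = {token}",
        f"indexes = {index}",  # indexes this token may write to
        f"index = {index}",    # default index for this token
        f"sourcetype = {sourcetype}",
        "allowQueryStringAuth = true",
        "disabled = false",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_hec_stanza("zoom", new_hec_token(), "scratch", "zoom:webhook"))
```

Treat the generated value like a password: anyone who knows it can post events to this input.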

Next, I created a Zoom Webhook Only App following the same instructions listed on docs.splunk.com, with one key change: I used the following endpoint URL:

https://externalsplunkinstance.yourdomain.com:8088/services/collector/raw?token=<a random guid>
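
Before pointing a real webhook at this URL, it can help to send a test event yourself. The following is a rough Python sketch of what a webhook sender does: it builds the raw-endpoint URL with the token in the query string and POSTs a JSON body. The function names, the all-zero token, and the test payload are placeholders for illustration, not anything Zoom actually sends.

```python
import json
import urllib.parse
import urllib.request


def build_raw_url(host: str, token: str, port: int = 8088) -> str:
    """Build the HEC raw endpoint URL with the token as a query parameter."""
    query = urllib.parse.urlencode({"token": token})
    return f"https://{host}:{port}/services/collector/raw?{query}"


def build_webhook_request(url: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a webhook-style HTTP POST with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder host and token: substitute your own values.
    url = build_raw_url(
        "externalsplunkinstance.yourdomain.com",
        "00000000-0000-0000-0000-000000000000",
    )
    request = build_webhook_request(url, {"event": "test"})
    with urllib.request.urlopen(request, timeout=10) as response:
        print(response.status, response.read().decode("utf-8"))
```

Note that no Authorization header is set anywhere; the token in the query string is the only credential, which is exactly why the security caveats later in this post matter.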

I then ran a Splunk search and was pleased to see that our events were being received! After a few days we also confirmed that we were no longer missing Zoom events. Since the format and the sourcetype are the same as with Splunk Connect for Zoom, we were still able to use the Splunk App for Zoom for our visualization needs as well as the RWI Executive Dashboards.

Zoom isn’t the only service whose webhook posts HEC can receive this way. Taking this discovery one step further, I was setting up Plex at home and noticed an area to configure webhooks. I used the same method as above and configured Plex to send webhooks to my Splunk setup. I picked a random movie, pressed play, and noticed that the data looked a little different, because Plex sends its JSON payload as form data.

----------------------------225309989493785122838026 Content-Disposition: form-data; name="payload" Content-Type: application/json {"event":"media.pause","user":true,"owner":false,"Account":{"id":123456,"title":"username"},"Player":{"local":false,"publicAddress":"123.123.123.123","title":"Plex Web (Chrome)","uuid":" /P7vNM5bIEa4LOSJ1qdhEw=="},"Server":{"title":"Vod","uuid":"tv.plex.provider.vod"},"Metadata":{"art":"https://image.tmdb.org/t/p/original/k8sRDJV5CFx91N4gPXh57dthPvx.jpg","attributionLogo":"https://provider-static.plex.tv/vod/partners/logos/crackle.png","guid":"plex://movie/5d9f351fca3253001ef27f1d","key":"/library/metadata/5e911fe10cf8cd003e286fc4","rating":6,"ratingCount":1984,"ratingKey":"5e911fe10cf8cd003e286fc4","title":"Thick as Thieves","titleSort":"thick as thieves","type":"movie","thumb":"https://image.tmdb.org/t/p/original/sgRY2ie8koJxfOScMuvzHQ9TuZX.jpg","duration":6222230,"viewCount":0,"viewOffset":0,"indirect":true,"contentRating":"R","ratingImage":"imdb://image.rating"}} ----------------------------225309989493785122838026--

I was able to strip away the form data using very simple props/transforms, leaving me with a clean JSON object.

props.conf
[plex:webhook]
TRANSFORMS-plex_webhook_clean = plex_webhook_clean

transforms.conf
[plex_webhook_clean]
REGEX = (\{.+\})
FORMAT = $1
DEST_KEY = _raw
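
To see what this transform does to an event, here is a small Python sketch that applies the same regular expression to a shortened version of the Plex payload above. It only approximates the index-time behavior (Splunk uses its own regex engine), and the function name strip_form_data is made up for this example.

```python
import json
import re

# Same pattern as REGEX in transforms.conf: greedy match from the
# first "{" to the last "}" on the line.
JSON_OBJECT = re.compile(r"(\{.+\})")


def strip_form_data(raw: str) -> str:
    """Return only the JSON object, or the input unchanged if none is found."""
    match = JSON_OBJECT.search(raw)
    return match.group(1) if match else raw


# Shortened sample based on the Plex event shown above.
sample = (
    "----------------------------225309989493785122838026 "
    'Content-Disposition: form-data; name="payload" '
    "Content-Type: application/json "
    '{"event":"media.pause","Account":{"id":123456,"title":"username"}} '
    "----------------------------225309989493785122838026--"
)

cleaned = strip_form_data(sample)
event = json.loads(cleaned)  # parses cleanly once the form-data wrapper is gone

if __name__ == "__main__":
    print(event["event"], event["Account"]["title"])
```

Because the match is greedy, nested objects such as Account stay intact. The pattern assumes the JSON sits on a single line, since "." does not match newlines by default.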

Using the webhook events from Plex, I was able to easily tell who in my household (or which of my friends) likes to binge-watch.

Another example of collecting webhook data from cloud-based services is IFTTT. A recent blog post uses Arlo’s recipes for IFTTT to post security camera activity over a webhook.

Now, we do need to mention a few security-related caveats…

  • Query strings may be observed in transit and/or logged in cleartext. There is no confidentiality protection for the transmitted tokens.
  • Before using this in production, consult security personnel in your organization to understand and plan to mitigate the risks.
  • At a minimum, always use HTTPS when you enable this feature. Check your client application, proxy, and logging configurations to confirm that the token is not logged in clear text.
  • Give minimal access permissions to the token in HEC and restrict the use of the token only to trusted client applications.
  • If you know the source IP address range of the webhook posts, it is best to create an allow-list or ACL to only permit these specific IP addresses to post to your webhook.
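
As an illustration of the allow-list idea in the last bullet, here is a minimal Python sketch of the check a proxy or firewall rule would perform in front of HEC. The CIDR ranges are documentation placeholders, not real Zoom or Plex addresses, and is_allowed is a hypothetical helper; in practice this logic usually lives in your firewall, load balancer, or reverse proxy rather than in application code.

```python
import ipaddress

# Placeholder ranges (reserved for documentation). Replace them with the
# source ranges published by your webhook provider.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_allowed(source_ip: str, networks=ALLOWED_NETWORKS) -> bool:
    """Return True only if source_ip parses and falls inside an allowed network."""
    try:
        address = ipaddress.ip_address(source_ip)
    except ValueError:
        # Unparseable addresses are rejected rather than trusted.
        return False
    return any(address in network for network in networks)


if __name__ == "__main__":
    for candidate in ("203.0.113.7", "192.0.2.1", "not-an-ip"):
        print(candidate, is_allowed(candidate))
```

The check fails closed: anything that is not clearly inside an allowed range is refused.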

In summary, the majority of webhooks perform an HTTP POST with a JSON, XML, or form data content-type. Splunk can receive webhooks on the “raw” HEC endpoint, with allowQueryStringAuth = true providing authentication. If the data needs some cleaning, you can use props/transforms to remove unnecessary characters.

Posted by

Luke Netto

Luke is a Staff Professional Services Consultant at Splunk with experience in data analytics, security, networking, systems, wireless integration, and software development. Luke has been an adjunct professor at the University of Denver teaching courses in SQL, Python, and data analytics. He holds a Master of Science in Telecommunications and graduate certificate in Energy Communication Networks from the University of Colorado Boulder, a Master of Science in Telecommunications Engineering Technology from Rochester Institute of Technology, and a Master of Business Administration from Clarkson University. Luke enjoys the challenge of onboarding new data sources and enabling organizations to become data-driven using Splunk.
