I <3 GitHub. Splunk <3’s GitHub (check out our repos here). I am told it is just a coincidence that our HQ is opposite theirs.
One of the neat things about GitHub that I am just starting to explore is its API. You can use it to do loads of things, from interrogating user activity to searching for keywords within code. I recently saw this analysis of the most popular programming languages hosted on GitHub and was inspired to recreate it within Splunk.
Indexing GitHub data into Splunk makes it super-simple to start exploring it. In this post I want to show you some of my first experiments connecting Splunk to the GitHub API.
The Prep Work
First, download and install the GitHub Modular Input. This will enable us to make the API calls to GitHub.
Now you’ll need to grab a GitHub token. This avoids some of the rate limiting imposed on unauthenticated requests to the API. To do this: log onto github.com > settings > applications > generate new token
Store it somewhere safe.
And that’s it.
Add an Input
In the Splunk GUI head to: settings > data inputs > github commits > add new
And that’s it.
If you use the example above, a basic search should return 100 results, matching the per_page value set in the call.
Here are some other simple searches we can run on this dataset right away:
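If you’re curious what a call like this looks like outside Splunk, here is a minimal Python sketch of a request to GitHub’s commits endpoint that carries the token and a per_page value. It is not the modular input’s actual code: the function name build_commits_request and the owner and repository names are made up for illustration, and the request is only built, not sent.

```python
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def build_commits_request(owner, repo, token, per_page=100):
    """Build (but do not send) a request for a repository's commits.

    Hypothetical helper for illustration; owner, repo and token are
    supplied by the caller.
    """
    query = urllib.parse.urlencode({"per_page": per_page})
    url = f"{API_ROOT}/repos/{owner}/{repo}/commits?{query}"
    headers = {
        # A personal access token lifts the low unauthenticated rate limit.
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github+json",
    }
    return urllib.request.Request(url, headers=headers)


# Placeholder values; substitute a real owner, repository and token.
request = build_commits_request("example-org", "example-repo", "YOUR_TOKEN")
print(request.full_url)
```

Passing the returned object to urllib.request.urlopen would send the request. GitHub caps per_page at 100, which is why a single call returns at most 100 commits.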
The most active users in the organisation:
source="github_commits://github-commits" | stats count(type) as count by author | sort - count
… or the least active:
source="github_commits://github-commits" | stats count(type) as count by author | sort count
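For comparison, the same count-by-author idea can be sketched in Python against the JSON that GitHub’s commits endpoint returns. This assumes the author field in Splunk corresponds to commit.author.name in the raw API response, which may not match how the input extracts fields; the sample records and the commits_per_author helper are hypothetical.

```python
from collections import Counter


def commits_per_author(commits):
    """Count commits by author name, most active first.

    Expects a list of objects shaped like the GitHub commits API
    response, where the name lives at commit -> author -> name.
    """
    counts = Counter(c["commit"]["author"]["name"] for c in commits)
    return counts.most_common()


# Made-up sample data, trimmed to the one field this sketch needs.
sample_commits = [
    {"commit": {"author": {"name": "alice"}}},
    {"commit": {"author": {"name": "bob"}}},
    {"commit": {"author": {"name": "alice"}}},
]

for author, count in commits_per_author(sample_commits):
    print(author, count)
```

Reversing the returned list gives the least active authors first, which mirrors switching from sort - count to sort count in the searches above.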
Repository activity over time:
source="github_commits://github-commits" | timechart count(_raw) as activity
Repository activity over time by user:
source="github_commits://github-commits" | timechart count(_raw) as activity by author
You get the idea.
Now that you’ve got the basics nailed, go away and show me some cool stuff.