Splunkgit – Github just got Splunked! (Part 3/4)

This is the third part in a four-part series in which Emre and Petter cover their Splunk app, Splunkgit. The app is available for download on splunkbase here, and it is also on github here.

In the first and second parts we explained the technical details of how to fetch the data. In this post and the following one we are going to show you the actual data and how we chose to visualize it.


With the data collected, we created six different pages. Two of them are created with the github data and the rest with the git log data. A list of the pages and what they contain is as follows:

  • Github

    • Issues
      Contains information about the issues on github.
    • Repo statistics
      Contains information about the repository that is github specific (watchers and forks).
  • Git

    • Authors
      Contains information about a specified author.
    • Files
      Contains information about specified files or directories.
    • File types
      Contains information about specified file types.
    • Repository
      Contains various information about the repository.

Github – Issues

This page summarises some useful data about the issues retrieved from the repository's github page. The following information can be monitored:

  • Number of open issues
  • Number of closed issues
  • Total number of issues
  • Latest opened issues (Still open)
  • Latest closed issues (Still closed)
  • Latest updated issues
  • Oldest open issues (Still open)
  • Top 10 issue reporters

The information is presented using tables, single values and a chart. A typical use case for this page can be for a manager to monitor the state of the project.
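As a rough illustration of what these panels compute, here is a minimal Python sketch. It is not how the app does it (the app uses Splunk searches), and the record layout is an assumption: each issue is a dict with hypothetical `number`, `state`, `reporter` and `created_at` fields.

```python
from collections import Counter

def summarize_issues(issues):
    """Summarize issue dicts (assumed fields: number, state, reporter, created_at)."""
    open_issues = [i for i in issues if i["state"] == "open"]
    closed_issues = [i for i in issues if i["state"] == "closed"]
    reporters = Counter(i["reporter"] for i in issues)
    return {
        "open": len(open_issues),
        "closed": len(closed_issues),
        "total": len(issues),
        # ISO 8601 UTC timestamps sort chronologically as plain strings.
        "oldest_open": sorted(open_issues, key=lambda i: i["created_at"])[:10],
        "top_reporters": reporters.most_common(10),
    }

# Made-up sample data, for illustration only.
sample_issues = [
    {"number": 1, "state": "closed", "reporter": "alice", "created_at": "2012-01-03T10:00:00Z"},
    {"number": 2, "state": "open", "reporter": "bob", "created_at": "2012-01-05T09:30:00Z"},
    {"number": 3, "state": "open", "reporter": "alice", "created_at": "2012-01-04T14:15:00Z"},
]

summary = summarize_issues(sample_issues)
print(summary["open"], summary["closed"], summary["total"])
print(summary["top_reporters"])
```

The same pattern (filter by state, sort by a timestamp, count by reporter) covers the remaining panels, such as the latest closed or latest updated issues.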

Everything is displayed according to the latest information fetched. However, since fetching only happens at intervals, there may be updates that are not yet shown on this page. In the future, github’s push API should be used together with live search, so that the view can always show the up-to-date state of the issues.

Here are screenshots of the data and the page:

Data is from ideashower/sharekit

Github – Repo statistics

This page displays the count of watchers and forks of the tracked repository.

  • Number of watchers
  • Number of forks
  • Watchers count over time
  • Forks count over time

Both Watchers count over time and Forks count over time show how the values have changed over the period in which Splunk has been fetching data from github. Unfortunately github doesn’t have any API calls for fetching historical data.
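Because there is no historical data to fetch, the history has to be built up from the periodic fetches themselves. The following Python sketch shows the idea under stated assumptions: every fetch is stored as a hypothetical `(fetched_at, watchers, forks)` tuple, and the chart uses the last snapshot of each day. The app itself does this with Splunk searches, not with this code.

```python
from datetime import datetime

# Hypothetical snapshots, one per fetch: (fetch time, watchers, forks).
snapshots = [
    (datetime(2012, 3, 1, 8, 0), 100, 20),
    (datetime(2012, 3, 1, 20, 0), 103, 20),
    (datetime(2012, 3, 2, 8, 0), 105, 21),
]

def latest_per_day(snapshots):
    """Return {date: (watchers, forks)}, keeping the last snapshot of each day."""
    by_day = {}
    # Sorting by fetch time means a later snapshot overwrites an earlier one.
    for fetched_at, watchers, forks in sorted(snapshots):
        by_day[fetched_at.date()] = (watchers, forks)
    return by_day

history = latest_per_day(snapshots)
for day, (watchers, forks) in sorted(history.items()):
    print(day, watchers, forks)
```

The charts can therefore only go back as far as the first fetch, which is the limitation described above.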

Screenshots:

Data is from ideashower/sharekit

Git – Repository

This is my favorite page. It visualizes some pretty neat git log data.

  • Number of coders

  • Number of active authors over time

    This chart shows the number of unique authors who made a commit each month. The count is also grouped by whether the author has previous commits or not.
    You can use the chart to see the engagement of old as well as new authors.

  • Number of authors over commit count

    This chart shows how many authors there are for different commit counts.
    You can use it to find out how many authors have made exactly X commits and how many have made at least X commits.

  • Impact Over Time

    This one is similar to one of the graphs that you can find on github.
    Impact = number of deleted lines + number of added lines.
    Beware that the y axis of the chart is logarithmic, so spikes that look small are actually big. If the impact graph looks linear, the impact is growing exponentially!
    Changes are good, and you can use this chart to see the magnitude of recent changes.

  • Total number of commits

    This chart shows the growth of the commit count. It also groups the counts by author, letting you see who has made the most commits to the repository.
    The information visualized here is similar to the output of git shortlog:

    eergenekon:cassandra.git$ git shortlog --all --no-merges -sn
      3582  Jonathan Ellis
       885  Eric Evans
       470  Gary Dusbabek
       418  Sylvain Lebresne
       334  Brandon Williams
        65  Pavel Yaskevich
        51  Chris Goffinet
        42  T Jake Luciani
        29  Johan Oskarsson
        26  Joe Schaefer
        22  Avinash Lakshman
        15  Jun Rao
         6  Laine Jaakko Olavi
         5  Prashant Malik
         1  Tim Almdal
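To make the chart definitions above concrete, here is a small Python sketch of three of them: active authors per month split into new and returning, authors per commit count, and impact per month. It is only an illustration. The commit records are made up, their `(author, month, lines_added, lines_deleted)` layout is an assumption, and the app itself computes these values with Splunk searches.

```python
from collections import Counter, defaultdict

# Hypothetical commit records: (author, month, lines_added, lines_deleted).
commits = [
    ("alice", "2012-01", 120, 30),
    ("bob", "2012-01", 10, 0),
    ("alice", "2012-02", 5, 5),
    ("carol", "2012-02", 300, 40),
    ("alice", "2012-02", 1, 0),
]

def active_authors(commits):
    """Per month, count unique authors with and without previous commits."""
    by_month = defaultdict(set)
    for author, month, _, _ in commits:
        by_month[month].add(author)
    seen, result = set(), {}
    for month in sorted(by_month):
        authors = by_month[month]
        result[month] = {
            "new": len(authors - seen),        # first commit this month
            "returning": len(authors & seen),  # committed in an earlier month
        }
        seen |= authors
    return result

def authors_per_commit_count(commits):
    """Map X -> number of authors with exactly X commits."""
    per_author = Counter(author for author, _, _, _ in commits)
    return Counter(per_author.values())

def authors_with_at_least(commits, x):
    """Number of authors with at least x commits."""
    exact = authors_per_commit_count(commits)
    return sum(n for count, n in exact.items() if count >= x)

def impact_by_month(commits):
    """Impact = number of deleted lines + number of added lines, per month."""
    impact = Counter()
    for _, month, added, deleted in commits:
        impact[month] += added + deleted
    return dict(impact)

print(active_authors(commits))
print(authors_per_commit_count(commits))
print(impact_by_month(commits))
```

With the sample records, February has one new author (carol) and one returning author (alice), and two authors have exactly one commit while one author has three.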

The underlying data in Splunk looks like this:

And here are some screenshots from different repositories:

The last three pages will be covered in the next and final part.

Emre Berge Ergenekon
