In the first and second parts we explained the technical details of how to fetch the data. In this post and the following one we will show you the actual data and how we chose to visualize it.
With the data collected, we created six different pages. Two of them are built from the GitHub data and the rest from the git log data. The pages and what they contain are as follows:
- Issues
Contains information about the issues on GitHub.
- Repo statistics
Contains information about the repository that is GitHub specific (watchers and forks).
- Author
Contains information about a specified author.
- Files
Contains information about specified files or directories.
- File types
Contains information about specified file types.
- Repository
Contains various information about the repository.
Github – Issues
This page summarises some useful data about the issues retrieved from the repo's GitHub page. The following information can be monitored:
- Number of open issues
- Number of closed issues
- Total number of issues
- Latest opened issues (Still open)
- Latest closed issues (Still closed)
- Latest updated issues
- Oldest open issues (Still open)
- Top 10 issue reporters
The information is presented using tables, single values and a chart. A typical use case for this page is a manager monitoring the state of the project.
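As a rough illustration of how the counts and the top reporters list can be derived, here is a minimal Python sketch. It is not the search we run in Splunk. The record layout (a dict with `state` and `reporter` keys) and the function name `summarize_issues` are assumptions made for this example.

```python
from collections import Counter


def summarize_issues(issues, top_n=10):
    """Summarize issue records.

    Each record is assumed to be a dict with a 'state' key ('open' or
    'closed') and a 'reporter' key (hypothetical field names).
    """
    open_count = sum(1 for issue in issues if issue["state"] == "open")
    closed_count = sum(1 for issue in issues if issue["state"] == "closed")
    reporters = Counter(issue["reporter"] for issue in issues)
    return {
        "open": open_count,
        "closed": closed_count,
        "total": len(issues),
        # most_common returns (reporter, issue count) pairs, largest first
        "top_reporters": reporters.most_common(top_n),
    }


# Made-up sample data, only to show the shape of the input
sample = [
    {"state": "open", "reporter": "alice"},
    {"state": "closed", "reporter": "bob"},
    {"state": "open", "reporter": "alice"},
]
summary = summarize_issues(sample)
```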
Everything is displayed according to the latest information fetched. However, since fetching only occurs at intervals, there may be updates that are not yet present on this page. In the future, GitHub's push API should be used together with live search, so that the view can be configured to always show the up-to-date state of the issues.
Here are screenshots of the data and the page (click on the images to enlarge):
Data is from ideashower/sharekit
Github – Repo statistics
This page displays the count of watchers and forks of the tracked repository.
- Number of watchers
- Number of forks
- Watchers count over time
- Forks count over time
Both Watchers count over time and Forks count over time show how the values change over the period for which Splunk has fetched data from GitHub. Unfortunately, GitHub doesn't have any API calls for fetching historical data, so the charts can only start at the first fetch.
Screenshots (click on the images to enlarge):
Data is from ideashower/sharekit
Git – Repository
This is my favorite page. It visualizes some pretty neat git log data.
Number of coders
Number of active authors over time
This chart shows the number of unique authors who made a commit each month. The count is also grouped by whether the author has previous commits or not.
You can use the chart to see the engagement of old as well as new authors.
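To make the grouping concrete, here is a small Python sketch of the same idea. It assumes the commits are available as `(author, date)` tuples with dates in `YYYY-MM-DD` form, and it treats an author as "new" for the whole month in which their first commit appears. Both the input shape and that monthly granularity are assumptions for this example, not a description of the Splunk search.

```python
from collections import defaultdict


def active_authors_by_month(commits):
    """Count active authors per month, split into new and returning.

    commits: iterable of (author, 'YYYY-MM-DD') tuples (assumed shape).
    Returns {month: {"new": n, "returning": m}}.
    """
    authors_per_month = defaultdict(set)
    for author, date in commits:
        authors_per_month[date[:7]].add(author)  # 'YYYY-MM' bucket

    seen = set()  # authors with a commit in any earlier month
    result = {}
    for month in sorted(authors_per_month):
        active = authors_per_month[month]
        result[month] = {
            "new": len(active - seen),
            "returning": len(active & seen),
        }
        seen |= active
    return result


# Made-up sample data
commits = [
    ("a", "2011-01-05"),
    ("b", "2011-01-20"),
    ("a", "2011-02-01"),
    ("c", "2011-02-10"),
]
by_month = active_authors_by_month(commits)
```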
Number of authors over commit count
This chart shows how many authors there are for different commit counts.
You can use it to find out how many authors have made exactly X commits and how many have made at least X commits.
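The two readings of the chart ("exactly X" and "at least X") can be sketched in a few lines of Python. The input here is assumed to be a plain list with one author name per commit, and `authors_by_commit_count` is a name made up for this example.

```python
from collections import Counter


def authors_by_commit_count(commit_authors):
    """Return two dicts keyed by commit count X.

    exactly[X]  = number of authors with exactly X commits
    at_least[X] = number of authors with X or more commits
    commit_authors: one author name per commit (assumed input shape).
    """
    per_author = Counter(commit_authors)
    exactly = Counter(per_author.values())

    at_least = {}
    running = 0
    # Walk from the largest commit count down, accumulating authors
    for count in sorted(exactly, reverse=True):
        running += exactly[count]
        at_least[count] = running
    return dict(exactly), at_least


# Made-up sample: "a" has 3 commits, "b" and "c" have 1 each
exactly, at_least = authors_by_commit_count(["a", "a", "a", "b", "c"])
```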
Impact Over Time
This one is similar to one of the graphs that you can find on GitHub.
Impact = number of deleted lines + number of added lines.
Beware that the y axis of the chart is logarithmic, meaning that spikes which look small are actually big. If the impact graph looks linear, the impact is in fact growing exponentially!
Changes are good, and you can use this chart to see the magnitude of the recent changes.
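Here is a hedged Python sketch of the impact calculation and of what the logarithmic axis does to it. It assumes the log text comes from something like `git log --numstat --date=short --pretty=format:"commit %ad"`, so every commit starts with a `commit YYYY-MM-DD` line followed by tab-separated `added`, `deleted`, `path` lines. That input format is chosen for this example and is not necessarily how we feed the data into Splunk.

```python
import math
from collections import defaultdict


def impact_by_month(log_text):
    """Sum added + deleted lines per month from numstat-style log text."""
    impact = defaultdict(int)
    month = None
    for line in log_text.splitlines():
        if line.startswith("commit "):
            month = line.split()[1][:7]  # keep 'YYYY-MM'
            continue
        parts = line.split("\t")
        if month is None or len(parts) != 3:
            continue  # blank lines or anything unexpected
        added, deleted, _path = parts
        if added == "-" or deleted == "-":
            continue  # numstat prints '-' for binary files
        impact[month] += int(added) + int(deleted)
    return dict(impact)


# Made-up sample log
sample_log = (
    "commit 2011-01-05\n"
    "10\t2\tREADME\n"
    "-\t-\tlogo.png\n"
    "\n"
    "commit 2011-02-01\n"
    "1000\t200\tsrc/main.c\n"
)
monthly = impact_by_month(sample_log)
# What a logarithmic y axis effectively plots
log_scaled = {month: math.log10(value) for month, value in monthly.items()}
```

In the sample, the second month has 100 times the impact of the first, yet on the log scale it sits only two units higher, which is why modest-looking spikes deserve a second look.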
Total number of commits
This chart shows the growth of the commit count. It also groups the counts by author, letting you see who has made the most commits to the repository.
The information visualized here is similar to the output of git shortlog:
eergenekon:cassandra.git$ git shortlog --all --no-merges -sn
    3582  Jonathan Ellis
     885  Eric Evans
     470  Gary Dusbabek
     418  Sylvain Lebresne
     334  Brandon Williams
      65  Pavel Yaskevich
      51  Chris Goffinet
      42  T Jake Luciani
      29  Johan Oskarsson
      26  Joe Schaefer
      22  Avinash Lakshman
      15  Jun Rao
       6  Laine Jaakko Olavi
       5  Prashant Malik
       1  Tim Almdal
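Where git shortlog only gives the final totals, the chart shows how those totals grow. A minimal Python sketch of that running count, assuming the commits are available as `(author, date)` tuples with `YYYY-MM-DD` dates (an input shape picked for this example), could look like this:

```python
from collections import Counter


def cumulative_commits(commits):
    """Running commit totals per author, one snapshot per month.

    commits: iterable of (author, 'YYYY-MM-DD') tuples (assumed shape).
    Returns a list of (month, {author: commits so far}) in time order.
    """
    totals = Counter()
    timeline = []
    for author, date in sorted(commits, key=lambda commit: commit[1]):
        month = date[:7]
        totals[author] += 1
        snapshot = (month, dict(totals))
        if timeline and timeline[-1][0] == month:
            timeline[-1] = snapshot  # same month: overwrite with newer totals
        else:
            timeline.append(snapshot)
    return timeline


# Made-up sample data
timeline = cumulative_commits([
    ("a", "2011-01-05"),
    ("b", "2011-01-20"),
    ("a", "2011-02-01"),
])
```

The last snapshot in the list carries the same per-author totals that a shortlog summary would report.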
The data in Splunk looks like this (click on the image to enlarge):
And here are some screenshots from different repositories (click on the images to enlarge):
The last three pages will be covered in the next and final part.
Emre Berge Ergenekon