Splunkgit – Github just got Splunked! (Part 4/4)

This is the fourth and last part of a four-part series in which Petter and Emre cover their Splunk app, Splunkgit. The app is available for download on Splunkbase, and it is also on GitHub.

In the first and second parts, we explained the technical details of how to fetch the data. In this post and the previous one, we show you the actual data and how we chose to visualize it.

Using the git tab

Let’s check out what our Splunkgit app can do with our git repository data.
We’ve made four dashboards that show some examples of how you can visualize this data:

  • Files
  • Authors
  • File types
  • Repository

In this part of the blog series, we’ll cover the Authors, Files and File types dashboards. The Repository dashboard was covered in part 3.
Now let’s look at these dashboards in more depth, starting with the Files dashboard, since that’s the one I find most interesting. I get to choose.

Files dashboard – Making your @author tags useless (weren’t they already?)

Let’s start by looking at what the Files dashboard consists of:

  • Free search field, where you search for part of a path. It matches filenames, directory names, file types, etc.
  • Row with impact by author and commits by author
  • Commits on file(s) by author over time
  • Table with all the files that match the search field

The Files dashboard starts a search over all files when it first loads, showing who has had the most impact across all files and who has committed most often. That alone is pretty fancy. From there, you can type any part of a file path into the search field: try a filename, a directory, a file extension or the prefix of a filename. This is why it’s called the Files dashboard rather than the File dashboard: you can view stats over multiple files at the same time.
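The post doesn’t show how the free search is implemented inside Splunk. As a rough, hypothetical illustration in Python, the sketch below treats the search as a plain, case-sensitive substring test on each path (an assumption) and counts distinct commits per author over the matching files. The `CHANGES` records and the function names are invented for the example. Note that an empty search term matches every path, which mirrors the dashboard’s initial all-files search.

```python
from collections import Counter

# Hypothetical per-file change records, one per (commit, file) pair:
# (commit hash, author, path)
CHANGES = [
    ("a1", "alice", "Documentation/kbuild/makefiles.txt"),
    ("a1", "alice", "Makefile"),
    ("b2", "bob", "Documentation/networking/ip-sysctl.txt"),
    ("c3", "alice", "Documentation/kbuild/kconfig.txt"),
    ("d4", "carol", "drivers/net/Kconfig"),
]

def matching_paths(changes, term):
    """Return the sorted, de-duplicated paths that contain `term` anywhere."""
    return sorted({path for _, _, path in changes if term in path})

def commits_by_author(changes, term):
    """Count distinct commits per author that touched at least one matching path."""
    # A commit touching several matching files is only counted once.
    commits = {(commit, author) for commit, author, path in changes if term in path}
    return Counter(author for _, author in commits)
```

For example, `commits_by_author(CHANGES, "Documentation/")` gives alice two commits and bob one, while an empty term counts every commit in the sample.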

A search for “Documentation/” in the linux repository.

As a developer, you can use this dashboard when you can’t understand the intent of a package, module or class and want to know whom to ask. You could look at the @author tags in the files that concern you, but they might not be sufficient. You could also use git-blame on lines, and in many cases even on methods and classes, but you might have to spend a lot of time going through previous commits and running git-blame on files throughout the repository history. With the Files dashboard, you can search for a file, a directory or any keyword that matches the path you want, and see who has had impact (insertions + deletions) on it. Hopefully this gives you a clear picture of who has actually been making an impact on the package, module or class.
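To make “impact” concrete, here is a minimal sketch, assuming the data comes from `git log --numstat` run with a custom format whose header line is `--<hash>|<author>`. The `%H` (commit hash) and `%an` (author name) placeholders are standard git, but that header layout and the `impact_by_author` helper are our own assumptions, not Splunkgit’s actual scripts. It sums insertions + deletions per author over the paths containing a search term, and skips binary files, which numstat reports as `-`.

```python
from collections import Counter

# Sample text in the shape we assume comes from:
#   git log --numstat --format='--%H|%an'
# Each header is followed by "insertions<TAB>deletions<TAB>path" lines.
SAMPLE = """--1111|alice

10\t2\tDocumentation/kbuild/makefiles.txt
3\t0\tMakefile

--2222|bob

0\t7\tDocumentation/networking/ip-sysctl.txt
-\t-\tDocumentation/logo.gif
"""

def impact_by_author(log_text, term=""):
    """Sum insertions + deletions per author over paths containing `term`."""
    impact = Counter()
    author = None
    for line in log_text.splitlines():
        if not line.strip():
            continue  # blank separator lines
        if line.startswith("--"):
            # Header line: "--<hash>|<author name>"
            _, author = line[2:].split("|", 1)
            continue
        added, deleted, path = line.split("\t", 2)
        if term not in path:
            continue
        if added == "-" or deleted == "-":
            continue  # binary file: git reports no line counts
        impact[author] += int(added) + int(deleted)
    return impact
```

With this sample, alice’s impact is 15 across the whole repository but 12 when the term is `"Documentation/"`.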

File types – Who’s your Java guru?

This dashboard consists of the following:

  • Drop down with all the file types in the repository
  • Row with impact by author and commits by author
  • Repository statistics and most common file types

Here you can see who has had the most impact on, and made the most commits to, a specific file type. For example, choose java as the file type to find the Java guru of the repository.
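A hypothetical sketch of the two ideas behind this dashboard: ranking file types for the drop down and the “most common file types” panel, and picking the top author for one type. The rule that a file type is whatever follows the last dot in the file name is our assumption (it is consistent with `.classpath` appearing as a type in the screenshot). The record layout, the sample data, the function names and the tie-breaking by commit count are also invented for illustration.

```python
from collections import Counter, defaultdict
import posixpath

def file_type(path):
    """Assumed rule: the text after the last '.' in the file name, else None.

    Unlike os.path.splitext, this gives dotfiles such as '.classpath'
    the type 'classpath'. Dots in directory names are ignored."""
    name = posixpath.basename(path)
    if "." not in name:
        return None
    return name.rsplit(".", 1)[1] or None

def most_common_types(paths, n=None):
    """Rank file types by how many distinct files carry them."""
    counts = Counter(t for t in map(file_type, set(paths)) if t is not None)
    return counts.most_common(n)

def guru(changes, wanted_type):
    """Author with the most impact (insertions + deletions) on one file type.

    `changes` holds (commit, author, path, insertions, deletions) tuples;
    ties on impact are broken by the number of distinct commits."""
    impact = Counter()
    commits = defaultdict(set)
    for commit, author, path, added, deleted in changes:
        if file_type(path) == wanted_type:
            impact[author] += added + deleted
            commits[author].add(commit)
    if not impact:
        return None
    return max(impact, key=lambda a: (impact[a], len(commits[a])))

# Hypothetical sample data
CHANGES = [
    ("c1", "alice", "src/Main.java", 120, 4),
    ("c2", "bob", "src/Util.java", 30, 10),
    ("c3", "bob", "src/Main.java", 5, 5),
    ("c4", "carol", ".classpath", 3, 1),
]
```

Here `guru(CHANGES, "java")` returns `"alice"` (impact 124 against bob’s 50), and `most_common_types` ranks java ahead of classpath.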

The file types dashboard with a search for .classpath in apache’s hadoop-common repository.

The best thing about this dashboard is that its file type drop down can be reused, and it is: the Authors dashboard uses it too.

Authors – Look mom, I’m on SplunkgiTV!

The Authors dashboard consists of the following:

  • Row with Author drop down, File type drop down and Free file search
  • Row of commit statistics
  • Row with top file activity and top committed files
  • Repository statistics of the author

If you’ve been reading along, you already know what the File type drop down and the Free file search are. This dashboard contains both of them, plus an Author drop down that lets you filter the results by anyone who has made a commit to the repository.
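One way to picture how the three filters interact is as predicates combined with AND before counting. The sketch below is our own illustration, not Splunkgit’s implementation. It approximates the file type filter with a path suffix check, returns an author’s most-committed files, and uses invented names and data throughout.

```python
from collections import Counter

# Hypothetical per-file change records: (commit hash, author, path)
CHANGES = [
    ("c1", "alice", "src/core/Job.java"),
    ("c2", "alice", "src/core/Job.java"),
    ("c2", "alice", "conf/site.xml"),
    ("c3", "bob", "src/core/Job.java"),
    ("c4", "alice", "src/util/Strings.java"),
]

def top_committed_files(changes, author, wanted_type=None, term="", n=10):
    """Files `author` committed to most often, after the optional file type
    and free-text path filters are applied (all filters must match)."""
    counts = Counter()
    for _commit, who, path in changes:
        if who != author:
            continue
        # File type filter, approximated as "path ends with .<type>".
        if wanted_type is not None and not path.endswith("." + wanted_type):
            continue
        # Free file search: plain substring match on the path.
        if term not in path:
            continue
        counts[path] += 1  # one record per (commit, file) pair
    return counts.most_common(n)
```

For instance, `top_committed_files(CHANGES, "alice", wanted_type="java")` keeps only alice’s two Java files, with `src/core/Job.java` first at two commits.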

The authors dashboard with author Oskar Johansson selected. Repository used is apache’s hadoop-common.

This dashboard can be used to look up your favorite authors (we all have some, right?) and see what they’ve been working on. Filter by file type to see whether an author has been working on that awesome Objective-C framework, for example.
It might also be interesting for employers who want a summary of what a potential employee has done in the open source project s/he mentions on her/his resume.

Performance of Splunk>git

Splunkgit indexes huge repositories in a matter of minutes and average-sized repositories in seconds. Running our scripts and indexing Apache’s hadoop-common git repository, which has over 12 000 commits, in Splunk for the first time takes 1-2 minutes. Indexing Linus Torvalds’ linux repository, with around 280 000 commits, takes around 16 minutes. And since Splunkgit doesn’t re-index existing commits when fetching new ones, subsequent indexing times will be negligible.
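The post only says that Splunkgit doesn’t re-index old commits. One plausible way to get that behaviour, sketched here as an assumption rather than a description of Splunkgit’s scripts, is to store the hash of the last indexed commit and ask git only for the range `<sha>..HEAD`, which selects commits reachable from HEAD but not from the checkpoint. The checkpoint file name and the helper functions are hypothetical.

```python
from pathlib import Path

# Hypothetical file holding the hash of the last commit we indexed.
CHECKPOINT = Path("last_indexed_commit")

def read_checkpoint(path=CHECKPOINT):
    """Return the stored commit hash, or None on the very first run."""
    return path.read_text().strip() if path.exists() else None

def write_checkpoint(sha, path=CHECKPOINT):
    """Remember the newest commit that has been indexed."""
    path.write_text(sha + "\n")

def git_log_args(last_indexed_sha=None):
    """Build a `git log` command listing only commits not yet indexed."""
    args = ["git", "log", "--numstat", "--format=--%H|%an"]
    if last_indexed_sha:
        # "<sha>..HEAD": reachable from HEAD but not from the checkpoint.
        args.append(f"{last_indexed_sha}..HEAD")
    return args  # with no checkpoint, the whole history is requested
```

In use, you would run `subprocess.run(git_log_args(read_checkpoint()), capture_output=True, text=True, check=True)` inside the repository, index the output, and then pass the output of `git rev-parse HEAD` to `write_checkpoint`.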
The computer used was a Mac mini with a Core i7 @ 2.7 GHz, 8 GB of RAM and a 750 GB 7200 rpm HDD.

What mama didn’t do for you

We believe there’s more cool and useful stuff you can do with both GitHub and the git repository. You could, for example, index all the commit messages and build a dashboard for them, or play around with GitHub’s API hooks to get real-time updates on commits, issues and other things. You’re more than welcome to fork us on GitHub, implement a new feature and send us a pull request!

And with those words, we’re done with this series! I hope you enjoyed it and think Splunkgit is as cool as we do. Feel free to contact us at @petterik_ and @emreberge if you have any questions about our project or just want to say hello!

Thank you for reading!

Petter Eriksson
