Use CI to Automatically Catch Dead Links in Your GitHub Project

Worried about dead links in your GitHub project?

I was, after having a few pointed out to me in the Analytic Stories and detections published by the Splunk Security Research Team. So, like any sane engineer, I decided to automate the check 🙃!

My first step was to look for an easy-to-use URL tester that processes Markdown, since all Splunk Security Research content gets automatically converted to documentation as .md files.

I wanted something simple to use and deploy, ideally something I could run in Continuous Integration (CI) to automatically catch broken links. My preferred solution was liche, since it hit both marks. It is an extremely easy-to-use CLI that automatically checks Markdown and HTML documents. It is also very easy to deploy, requiring only one command to install:

go get -u

And one command to run:

liche -r directory

* Tiny gotcha: Go Modules must be enabled (for example, by setting GO111MODULE to "on").

An added bonus was that it was extremely fast and exited with status 1 when it found errors, which made it perfect out of the box for CI: any non-zero exit status fails the build step.
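As a rough illustration of why a non-zero exit status suits CI, here is a minimal shell sketch. The `check_links` function and the `LINK_CHECKER` variable are hypothetical names introduced for this example, not part of liche. The sketch assumes the checker is on the PATH and exits non-zero when it finds a broken link.

```shell
# Hypothetical wrapper: run a link checker and pass its exit status along.
# LINK_CHECKER defaults to "liche -r"; override it to try the wrapper
# with a different command (illustrative variable, not a liche feature).
check_links() {
  dir="$1"
  status=0
  # Run the checker and capture its exit status without aborting the script.
  ${LINK_CHECKER:-liche -r} "$dir" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "link check failed (exit status $status)" >&2
  fi
  # A CI step that ends with a non-zero status is marked as failed.
  return "$status"
}
```

Calling `check_links docs/` as the last command of a CI step would then fail the step whenever the checker reports an error.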

Finally, I needed to bake this into CircleCI, so it would run as a validation step. Essentially, for every change or pull request, this step validates that the content is up to spec and that its links work.

Here is the end result in CircleCI:

     - run:
          name: check for broken links using liche 
          command: |
            echo 'export GOROOT=~/.go' >> $BASH_ENV
            echo 'export PATH=$GOROOT/bin:$PATH' >> $BASH_ENV
            echo 'export GOPATH=~/go' >> $BASH_ENV
            echo 'export PATH=$GOPATH/bin:$PATH' >> $BASH_ENV
            echo 'export GO111MODULE="on"' >> $BASH_ENV 
            source $BASH_ENV
            go get -u
            cd security-content
            liche -r docs/

Note that the first step was to add all the Go-related paths and install Go, mainly because we are not using the CircleCI Go image, which comes with Go out of the box, but the latest Python image instead.

If you need a full example of how this can be used as part of your CI workflow, please refer to the security-content repo's current CircleCI configuration.

José is a Principal Security Researcher at Splunk. He started his professional career at Prolexic Technologies (now Akamai), fighting DDoS attacks from “Anonymous” and “LulzSec” against Fortune 100 companies. As an engineering co-founder of Zenedge Inc. (acquired by Oracle Inc.), José helped build technologies to fight bots and web-application attacks. While working at Splunk as a Security Architect, he built and released an auto-mitigation framework that has been used to automatically fight attacks in large organizations. He has also built security operation centers and run a public threat-intelligence service. Although information security has been the focus of his career, José has found that his true passion is in solving problems and creating solutions. As an example, he built an underwater remote-control vehicle called the SensorSub, which was used to test and measure toxicity in Miami's waterways.