Machine Learning for Social Good

In my last blog we focussed on some of the problems with Artificial Intelligence (AI) and public trust that can be compounded by organisational issues such as dark data. This time round we’re going to look at a couple of examples that demonstrate how AI can be used as a force for good.

The Ethics of Analytics

Over the past few months we have been working with the World Economic Forum (WEF) to test out some of the guidance on AI that we have been drafting with them. There have been a lot of lively debates as the use of AI is clearly divisive, especially when it comes to image processing.

If we look at the UK, there has been controversy recently over police using facial recognition techniques on CCTV footage to support the fight against crime. This has prompted the UK’s Information Commissioner, Elizabeth Denham, to open an investigation into whether image recognition is being used appropriately in this case.

There are even more sinister examples of how easily image processing can be misled, such as tricking an algorithm into thinking a row of rifles was actually a helicopter. With examples like this it’s not difficult to imagine a dystopian world such as the one described in this article about deepfakes.

These are cautionary tales for applying deep learning, but thankfully at Splunk we don’t focus on image processing – we’re all about machine data! 

When it comes to machine data there is a wealth of use cases where applying machine learning has definitely had a positive impact on people’s lives.

Preventing Substance Abuse

If you speak to a well-travelled security or fraud analyst about how they effectively detect threats, a term that will often come up in conversation is ‘outlier’. There are numerous ways to detect anomalies in data sets, but I’d suggest you start with this technical walkthrough on statistical outliers before moving on to more advanced techniques (described as flavours of ice cream, of course!).
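
To make the idea of a statistical outlier concrete, here is a minimal Python sketch of one common approach, the interquartile range (IQR) rule. It is an illustration only, not the method from the walkthrough mentioned above: the function name `iqr_outliers`, the multiplier `k=1.5` and the login counts are all made up for this example.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the sample
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up daily login counts for one account; 95 is the planted anomaly.
logins = [12, 14, 11, 13, 15, 12, 14, 13, 95, 12, 13, 14]
print(iqr_outliers(logins))  # prints [95]
```

The IQR rule is a reasonable first pass because a single extreme value barely moves the quartiles, whereas it can drag a mean and standard deviation a long way.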

Prescription Anomaly Analytics

*Clustering technique used to identify fraudulent prescriptions on data from […]. Red dots indicate providers that are already in prison or have been investigated in some way for suspicious activity by the DEA, Dept of Justice or Law Enforcement.

The good folks at New York Presbyterian hospital realised that many of the techniques commonly used for outlier detection in IT security could also be used to detect outliers in the handling of controlled substances.

The approach to detecting outliers at New York Presbyterian closely followed the chocolate ice cream technique in this anomaly detection blog. We can’t show you their results (that wouldn’t be ethical, would it?), but to demonstrate how it works there are a few graphics here based on open source data that we have run through Splunk, where the same technique has been used to:

  • Show how anomalies in prescription records have a direct correlation with fraudulent or criminal activity; and
  • Identify Dr Joel Smithers, who was recently jailed for prescription fraud.
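
For a feel of what comparing prescribers against their peers can look like, here is a small Python sketch. To be clear, this is not the technique used at New York Presbyterian or in the graphics here; it swaps in a simple robust score (the modified z-score, based on the median absolute deviation) and assumes each provider can be assigned to a peer group. The provider IDs, groups and counts are entirely synthetic, and `peer_group_outliers` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import median


def peer_group_outliers(records, threshold=3.5):
    """Flag providers whose value sits far above their peer group's median.

    records: iterable of (provider_id, peer_group, value) tuples.
    Score used: modified z-score = 0.6745 * (value - median) / MAD.
    """
    by_group = defaultdict(list)
    for provider, group, value in records:
        by_group[group].append((provider, value))

    flagged = []
    for group, rows in by_group.items():
        values = [v for _, v in rows]
        med = median(values)
        mad = median(abs(v - med) for v in values)
        if mad == 0:
            continue  # no spread in this group, so nothing to compare against
        for provider, value in rows:
            score = 0.6745 * (value - med) / mad
            if score > threshold:
                flagged.append((provider, group, round(score, 1)))
    return flagged


# Synthetic monthly counts of controlled-substance prescriptions per provider.
records = [
    ("P01", "family practice", 110),
    ("P02", "family practice", 95),
    ("P03", "family practice", 120),
    ("P04", "family practice", 105),
    ("P05", "family practice", 900),
    ("P06", "oncology", 400),
    ("P07", "oncology", 450),
    ("P08", "oncology", 420),
    ("P09", "oncology", 430),
]
print(peer_group_outliers(records))
```

Grouping matters here: 450 prescriptions a month is unremarkable among the made-up oncology providers, but 900 stands out against the made-up family practice group. A flag like this is only a prompt for a person to take a closer look, not evidence of wrongdoing.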

Many thanks to Gleb Esman for helping provide the details for these examples.

At Splunk we are always ready to support customers who are interested in using anomaly detection techniques or who want to use Splunk to detect fraud. A great place to start is the Splunk Security Essentials for Fraud Detection, where some of the techniques in this blog are presented in more detail.

Prescription Anomaly Analytics

*Anomaly detection analysis of data published by […] that contains aggregate details of prescriptions billed to Medicare by providers.

Predicting Student Outcomes

Machine learning isn’t just good at helping fight criminal activity; it can also be used to deliver other positive outcomes.

We have hundreds of universities across the globe who are using Splunk to monitor their IT from a security and operational perspective. At one of these universities – the University of Nevada, Las Vegas (UNLV) – a psychology professor called Matt Bernacki realised that all of the data they were collecting from systems across campus also gave them a good insight into student engagement. He spearheaded a project that built a model to predict whether students were likely to pass or fail a course, helping lecturers and other academic staff make timely interventions to support students. In the first semester of using this system UNLV identified over 100 failing students who, with interventions, turned things around and went on to get top grades.
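
As a rough illustration of what predicting pass or fail from engagement data can involve, here is a minimal Python sketch of a logistic regression trained by gradient descent. It is not UNLV’s model: the two features (share of weeks with a login to a learning system, share of assignments submitted), the student rows and the helper names are all assumptions made up for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and a bias with plain batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Error between the predicted pass probability and the label.
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def pass_probability(x, weights, bias):
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Synthetic students: (share of weeks with a login, share of work submitted).
students = [(0.9, 1.0), (0.8, 0.9), (0.7, 0.8), (0.6, 0.9),
            (0.2, 0.3), (0.1, 0.2), (0.3, 0.1), (0.2, 0.4)]
passed = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_logistic(students, passed)
engaged = pass_probability((0.85, 0.95), weights, bias)
at_risk = pass_probability((0.15, 0.2), weights, bias)
print(f"engaged student: {engaged:.2f}, disengaged student: {at_risk:.2f}")
```

In practice the useful output is the list of students whose predicted probability of passing falls below a threshold chosen by academic staff, so that someone can reach out early.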

Student Analytics

*Analysis of Open University data to demonstrate how predictions can be made on student outcomes based on their digital footprint from university IT systems.

As well as working with a number of universities in the US and UK to help build out similar use cases, we’ve also just launched a Student Success Toolkit that takes a cookie-cutter approach to building out these kinds of predictive capabilities.

Human in the Loop

Note that in both of these use cases a person still has to analyse the results and decide whether to take action. If you view AI as augmented intelligence, it is there to augment a person’s decision-making process rather than replace it – people are much better at putting information into context than machines are (well, at the moment anyway!).

As well as having someone in the loop, a secondary consideration when you’re looking to apply these types of techniques is to make sure that you’d be comfortable with the details being published. If you’re not, it might be that the way you are using them doesn’t meet everyone’s standards for ethics: ultimately you need to maintain public trust while trying to deliver positive outcomes.

Deep Learning Toolkit

Finally, to follow up on the clue about product features I mentioned last time, we have now launched the Deep Learning Toolkit for Splunk. Amazing work by Philipp Drieger getting this together - it’s going to be an awesome way of delivering even more advanced use cases in Splunk.

Until next time,


Greg is a recovering mathematician and part of the technical advisory team at Splunk, specialising in how to get value from machine learning and advanced analytics. Previously the product manager for Splunk’s Machine Learning Toolkit (MLTK), he helped set the strategy for machine learning in the core Splunk platform. A particular career highlight was partnering with the World Economic Forum to provide subject matter expertise on the AI Procurement in a Box project.

Before working at Splunk he spent a number of years with Deloitte and, prior to that, BAE Systems Detica, working as a data scientist. Ahead of getting a proper job he spent way too long at university collecting degrees in maths, including a PhD on “Mathematical Analysis of PWM Processes”.

When he is not at work he is usually herding his three young lads around while thinking that work is significantly more relaxing than being at home…