#EUvsVirus: Predicting Online Student Outcomes with Automated ML Pipelines

This weekend 10 Splunkers took part in the #EUvsVirus hackathon to help come up with solutions to coronavirus-related challenges. Here we describe what team Educating_Splunkers got up to: the rationale behind picking the project and some of the outcomes.

Team Educating_Splunkers decided to look at the challenges of effective e-learning. If you have seen any of my previous blogs, you will be aware that predicting student outcomes is something we are passionate about at Splunk.

While e-learning is familiar to many higher education organisations, it is not as widely used in schools. This has been changing due to coronavirus, however: recent estimates from UNESCO suggest that 1.5 billion (87%) of the world’s students have been affected by school and university closures. The bulk of those impacted are in primary and secondary education, with millions also in pre-primary and higher education. Compared to school closures during other historical global crises, the level of education disruption is much greater today.

With the immediate need to move to online learning, student engagement and retention are now very big challenges.

Education Insights

Building on previous concepts – such as the capability built with the University of Nevada, Las Vegas and the Student Success Toolkit – we decided to dig into how we could more effectively predict student outcomes for e-learning, in particular using the Open University Learning Analytics Dataset.

Our results are detailed here. We provided some high-level reports on e-learning data and also developed a workflow for training predictive models on that data.

A real focus for this work was to help our customers in education get machine learning (ML) ready with their data. As described in our State of Dark Data report, we recognise that many of our customers do not have ML skills in house: over 80% of the customers we spoke to told us that they do not have skills in AI.

To achieve this, we built a pipeline on top of our Machine Learning Toolkit that lets users follow some simple click-and-select steps to load their e-learning data and select the important features in it. The app then decides how the data should be processed by an ML algorithm, including automatically deciding how it should be pre-processed in preparation for that algorithm.
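To give a feel for what "automatically deciding how the data should be pre-processed" can mean, here is a minimal Python sketch. It is not the code behind our app: the function names, the two pre-processing steps (standardising numeric fields and one-hot encoding categorical ones) and the example column names are all assumptions chosen purely for illustration.

```python
from statistics import mean, pstdev

MISSING = (None, "")

def infer_column_kind(values):
    """Return 'numeric' if every non-missing value parses as a float, else 'categorical'."""
    present = [v for v in values if v not in MISSING]
    if not present:
        return "categorical"
    try:
        for v in present:
            float(v)
    except (TypeError, ValueError):
        return "categorical"
    return "numeric"

def plan_preprocessing(rows, feature_names):
    """Choose a pre-processing step for each user-selected feature (hypothetical rule)."""
    plan = {}
    for name in feature_names:
        kind = infer_column_kind([row.get(name) for row in rows])
        plan[name] = "standardise" if kind == "numeric" else "one_hot"
    return plan

def apply_plan(rows, plan):
    """Turn rows of raw values into numeric feature vectors according to the plan."""
    columns = {}
    for name, step in plan.items():
        raw = [row.get(name) for row in rows]
        if step == "standardise":
            nums = [float(v) for v in raw if v not in MISSING]
            mu = mean(nums)
            sigma = pstdev(nums) or 1.0  # avoid dividing by zero for constant columns
            # Missing values are imputed with the mean, which is 0.0 after scaling.
            columns[name] = [
                [(float(v) - mu) / sigma] if v not in MISSING else [0.0] for v in raw
            ]
        else:
            categories = sorted({v for v in raw if v not in MISSING})
            columns[name] = [[1.0 if v == c else 0.0 for c in categories] for v in raw]
    # Concatenate the per-feature pieces into one vector per row.
    return [sum((columns[name][i] for name in plan), []) for i in range(len(rows))]

# Illustrative rows only; the column names are assumptions, not a required schema.
rows = [
    {"studied_credits": "60", "gender": "F"},
    {"studied_credits": "120", "gender": "M"},
]
plan = plan_preprocessing(rows, ["studied_credits", "gender"])
vectors = apply_plan(rows, plan)
```

The point of the sketch is the division of labour: the user only picks the fields, and the type of each field drives the choice of pre-processing step.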

Student Outcome Predictions

Ultimately the app provides non-expert users with a recommended model to apply to their particular data to predict how likely a student is to pass or fail, and how likely a student is to drop out of the course.
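One common way to arrive at a recommended model is to score a handful of candidates with cross-validation and suggest the best performer. The Python sketch below illustrates that idea under stated assumptions: the two toy candidates (a majority-class baseline and a one-nearest-neighbour classifier), the accuracy metric and all function names are hypothetical and are not taken from our app.

```python
from collections import Counter

def fit_majority(X, y):
    """Baseline candidate: always predict the most common training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label

def fit_nearest_neighbour(X, y):
    """Toy candidate: predict the label of the closest training example."""
    def predict(x):
        distances = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in X]
        return y[distances.index(min(distances))]
    return predict

def cross_val_accuracy(fit, X, y, folds=3):
    """Accuracy over k folds, where fold k holds out every folds-th row starting at k."""
    correct = 0
    for k in range(folds):
        test_idx = set(range(k, len(X), folds))
        X_train = [x for i, x in enumerate(X) if i not in test_idx]
        y_train = [t for i, t in enumerate(y) if i not in test_idx]
        predict = fit(X_train, y_train)
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(X)

def recommend_model(candidates, X, y, folds=3):
    """Return the name of the best-scoring candidate plus every candidate's score."""
    scores = {name: cross_val_accuracy(fit, X, y, folds) for name, fit in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Made-up, already pre-processed data: one feature per student and a pass/fail label.
X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
y = ["Fail", "Fail", "Fail", "Pass", "Pass", "Pass"]
candidates = {"majority": fit_majority, "nearest_neighbour": fit_nearest_neighbour}
best, scores = recommend_model(candidates, X, y)
```

Including a simple baseline among the candidates is a deliberate choice: a recommendation is only worth making if it clearly beats always guessing the most common outcome.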

Many of our customers are of course much more familiar with data science and are quite happy developing capabilities using the Machine Learning Toolkit or Deep Learning Toolkit. We are hoping that this type of capability gives our less ML-ready customers a way of gaining familiarity with data science concepts.

The Splunk hacking teams were ably led by the man of a million words, Mark Woods, who has published some of the outcomes of our weekend efforts here, if you’d like to see a bit more. You can also read about our efforts to develop a Personal Protective Equipment (PPE) supply-chain solution in this blog post.

Massive thanks to Helen, Rupert and Henning for their help working on this project! 

Happy Splunking!


Greg is a recovering mathematician and part of the technical advisory team at Splunk, specialising in how to get value from machine learning and advanced analytics. Previously the product manager for Splunk’s Machine Learning Toolkit (MLTK), he helped set the strategy for machine learning in the core Splunk platform. A particular career highlight was partnering with the World Economic Forum to provide subject matter expertise on the AI Procurement in a Box project.

Before working at Splunk, he spent a number of years with Deloitte and, prior to that, BAE Systems Detica, working as a data scientist. Ahead of getting a proper job, he spent way too long at university collecting degrees in maths, including a PhD on “Mathematical Analysis of PWM Processes”.

When he is not at work he is usually herding his three young lads around while thinking that work is significantly more relaxing than being at home…
