Data: The Ultimate Shield-maiden For All the Children

Courage is an interesting concept, especially in the 21st century. Having the ability to do something that frightens you goes a long way, whether that means standing up for women’s rights, speaking out against racial injustice, or confronting cyberbullying. Courage paired with the technological advancements of the 21st century allows us to do great things, including supporting children in third-world countries. Sadly, these same advancements also allow predators to extend their reach toward these same vulnerable children. Compassion International serves and protects over 2 million children around the world, and with that power comes even more responsibility. So how are they using Splunk to protect these children from sexual predators and human traffickers? It sounds like a use case that couldn’t possibly exist, at least until you find the courage to peek under the covers.

Splunk is a Swiss Army knife: give it your data, and it will do whatever you need with that data. Data can, and does, come from a million places. In Compassion International’s case, that includes handwritten letters, emails, translated letters, Compassion International’s support website, public sex offender record feeds, address validation feeds via Homeland Security, and more. Let’s take a look at how this lean, mean, data-processing machine protects children, and how Splunk for Good gave me the opportunity to help Jonathan on this project.

Compassion International’s on-prem Splunk instance is run by a small team headed by Jonathan Wagner. Jonathan is an old-school Splunk guru who runs Splunk as a platform for this screening project, for the IT and Security teams’ analytics use cases, and for the analytics use cases of multiple business units, including the Office of Risk Management (ORM). The letter and sponsor screening project is called “Protect All The Children” (PATCh). Every employee at Compassion International is a child advocate, and protecting the children must come first. That is why they are open to partnering with organizations that have similar sponsorship models and need similar solutions to ensure the people they sponsor are protected as well. Through Splunk for Good, I was able to help Jonathan work through different supporter letter data sources in MSSQL/Oracle databases, JSON data, and flat file dumps. Having managed tons of Splunk deployments, I was ecstatic to get hands-on with Splunk again, correlating a new set of data sources for new use cases (and how can you say “no” to using Splunk to stop child predators from reaching children? No-brainer!).

When data comes from many different sources, the chances that it is all structured the same way are slim to none. Each source may need its own field extractions, line-breaking rules, or timestamp formats. If you’ve ever used the Common Information Model (CIM) add-on, you know that Splunk makes working with common fields across data sources easy, so long as you know what those fields should be. Jonathan and I worked through selecting the correct fields for letters (to, from, body, etc.), along with fields for supporter entries and public sex offender watchlists. You’d be amazed at how many different combinations of street addresses and names there really can be!
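To give a feel for why those combinations matter, here is a minimal Python sketch of address normalization, so that two spellings of the same street address compare as equal. It is an illustration only, not the TA we built (that logic lives in props.conf and transforms.conf); the `ABBREVIATIONS` table and the `normalize_address` function are hypothetical names chosen for this example.

```python
import re

# Hypothetical abbreviation table; a real deployment would need a far larger one
# (for example, a full street-suffix list).
ABBREVIATIONS = {
    "STREET": "ST",
    "AVENUE": "AVE",
    "ROAD": "RD",
    "BOULEVARD": "BLVD",
    "APARTMENT": "APT",
    "SUITE": "STE",
    "NORTH": "N",
    "SOUTH": "S",
    "EAST": "E",
    "WEST": "W",
}


def normalize_address(raw: str) -> str:
    """Reduce a free-form street address to a canonical, comparable form."""
    # Uppercase, then replace punctuation (periods, commas, '#') with spaces.
    cleaned = re.sub(r"[^\w\s]", " ", raw.upper())
    # Split on any run of whitespace and abbreviate each known token.
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in cleaned.split()]
    return " ".join(tokens)


if __name__ == "__main__":
    a = normalize_address("123 North Main Street, Apartment #4")
    b = normalize_address("123 N. Main St Apt 4")
    print(a)       # 123 N MAIN ST APT 4
    print(a == b)  # True
```

Normalizing both sides before comparing keeps the matching logic simple; the trade-off is that the abbreviation table has to be maintained as new variants show up.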

Once we built a TA to handle all of these extractions and transformations in props.conf and transforms.conf, it was off to the races for correlation and bad-person catching. To catch bad people, we first had to decide what “bad” looked like. There were two places to look: the language in a letter, or the supporter themselves.

Natural Language Processing

To identify potential sexual grooming language, attempts to communicate out-of-band (Facebook, WhatsApp, USPS, etc.), or any inappropriate language (racism, rape, threats, etc.), we needed to break each letter down line by line and apply fuzzy matching (an approximate string-matching technique from the Natural Language Processing toolbox that can be run within Splunk) to pick up on hints. If words or phrases matched what the trained professionals in the ORM consider inappropriate, the letter was given an elevated risk score (much like the Risk Analysis framework in Splunk’s SIEM, Splunk Enterprise Security). Letters with elevated risk are brought to the attention of the ORM, which then takes the appropriate action to remove the threat based on Compassion’s Child Protection Policy.
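The line-by-line scoring idea can be sketched in a few lines of Python. This is a hedged illustration, not Compassion International’s implementation: it swaps in the standard library’s `difflib.SequenceMatcher` for Splunk-side fuzzy matching, and the phrase list, weights, and 0.85 threshold are all hypothetical placeholders for the ORM’s real criteria.

```python
from difflib import SequenceMatcher

# Hypothetical phrases (kept lowercase) and risk weights, stand-ins for
# criteria that trained ORM professionals would actually define.
WATCHLIST = {
    "whatsapp": 40,
    "facebook": 30,
    "send me your address": 50,
}
MATCH_THRESHOLD = 0.85  # hypothetical similarity cutoff between 0.0 and 1.0


def best_similarity(phrase: str, line: str) -> float:
    """Slide a window as many words long as `phrase` across `line`; return the best match ratio."""
    words = line.lower().split()
    n = len(phrase.split())
    best = 0.0
    for i in range(max(1, len(words) - n + 1)):
        window = " ".join(words[i:i + n])
        best = max(best, SequenceMatcher(None, phrase, window).ratio())
    return best


def score_letter(letter: str) -> int:
    """Sum the weights of watchlist phrases found, counting each phrase at most once per line."""
    total = 0
    for line in letter.splitlines():
        for phrase, weight in WATCHLIST.items():
            if best_similarity(phrase, line) >= MATCH_THRESHOLD:
                total += weight
    return total


if __name__ == "__main__":
    letter = "Hello, I hope you are well.\nDo you have whatsap? Send me your adress."
    print(score_letter(letter))  # 90: fuzzy hits on "whatsap?" and "send me your adress."
```

A misspelled “whatsap?” still clears the threshold, which is the point of fuzzy rather than exact matching. The cost is false positives, which is why elevated scores are routed to people in the ORM rather than treated as verdicts.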

The way Compassion identifies potentially harmful letters is nothing short of astonishing, but when you’re on a mission to protect ALL the children, you don’t stop there. We are currently working on implementing Splunk’s Machine Learning Toolkit to identify true positives faster and more accurately. While we can’t share numbers on flagged letters or anything of that nature, rest assured that when real data meets a good admin and even better questions in Splunk, the answers can be jaw-dropping.

Threat Detection

The second piece of stopping bad actors from reaching children is to flag known bad actors, whether or not they’re trying to hide. To do this, Compassion International uses Splunk to correlate its supporter records with publicly available sex offender databases, and even with public address lists for prisons, hotels, motels, and halfway houses. These data sources are ingested into Splunk through modular inputs and stored in the KV Store, where they are correlated against every supporter registration. This has helped cut screening time in Compassion International’s Support Screening program from weeks to seconds!
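Conceptually, the correlation step is a lookup: normalize each watchlist entry once, store it keyed for fast retrieval (the role the KV Store plays here), and check every new registration against it. This Python sketch illustrates that idea under loudly stated assumptions: the `Registration` record, the field names, the sample data, and the exact-match rule are all hypothetical, and real screening would need fuzzier name and address matching than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    """A hypothetical, simplified supporter registration record."""
    supporter_id: str
    name: str
    address: str


def normalize(text: str) -> str:
    """Uppercase, turn punctuation into spaces, and collapse runs of whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.upper())
    return " ".join(kept.split())


def build_index(offenders: list[dict], flagged_addresses: list[str]) -> tuple[set[str], set[str]]:
    """Normalize watchlist entries once into sets for constant-time lookups.

    `offenders` stands in for public sex offender records; `flagged_addresses`
    stands in for known prison, hotel, motel, and halfway-house addresses.
    """
    names = {normalize(o["name"]) for o in offenders}
    addresses = {normalize(o["address"]) for o in offenders}
    addresses |= {normalize(a) for a in flagged_addresses}
    return names, addresses


def screen(reg: Registration, index: tuple[set[str], set[str]]) -> list[str]:
    """Return the reasons, if any, that a registration should go to human review."""
    names, addresses = index
    reasons = []
    if normalize(reg.name) in names:
        reasons.append("name matches offender registry")
    if normalize(reg.address) in addresses:
        reasons.append("address matches watchlist")
    return reasons


if __name__ == "__main__":
    # Entirely made-up sample data.
    index = build_index(
        offenders=[{"name": "John Q. Example", "address": "1 Placeholder Ln"}],
        flagged_addresses=["500 Sample Motel Way"],
    )
    # ['name matches offender registry', 'address matches watchlist']
    print(screen(Registration("S-001", "john q example", "500 sample motel way"), index))
    # []
    print(screen(Registration("S-002", "Alex Sample", "2 Other St"), index))
```

Because the watchlist is normalized once up front, each new registration costs only a couple of set lookups, which is the general shape of why precomputed lookups scale so much better than manual review.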

There is a lot of work that goes into protecting children from sexual predators. Being able to help stop that vicious cycle from reaching more precious, innocent children has been one of the most rewarding experiences of my life. As a father of two with a third on the way, I believe this is the most important work any human can do. While we’ve had some success, we are only scratching the surface of the global impact possible with Splunk technology. I’ve also been impressed to see that Compassion International knows this issue is much bigger than their organization and has the courage to take it on. I can’t wait to see what Compassion International uses Splunk for next!

Brian Cusick
