Recursion initially struggled to manage the large volumes of time-series data collected from its computer-controlled instruments and the video footage generated by laboratory cameras. That early data management strategy could not keep pace with the firm's aggressive high-volume ambitions: its laboratory microscopes now produce on the order of 700,000 TIFF files each week, an 800 percent increase in productivity over 10 months.
While the company considered open-source alternatives, Ben Miller, director of high-throughput science (HTS) operations, saw the pivotal role that Splunk Enterprise could fill as Recursion ramped up its capabilities. “I was getting value out of Splunk Enterprise within about three days,” Miller says.
Recursion has built a world-class proprietary machine learning system that analyzes terabytes of experimental image data daily to discover new treatments for critical diseases. The system integrates with Splunk Enterprise through the Splunk SDK for Python, which feeds operational data back into the experiment-design process, and through Splunk DB Connect, which enriches log data with quality metrics. From there, the Splunk Machine Learning Toolkit makes it easy to comb these higher-level operational metrics for new insights into laboratory processes, or in Miller's words, to "wrangle really large quantities of data and understand what correlations are happening as they are happening, not months later."
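As a rough illustration of the kind of round trip described above, the sketch below shows how per-plate quality metrics might be serialized as Splunk events and how an SPL query could pull aggregated operational metrics back out for experiment design. Recursion's actual indexes, field names, and schemas are not public; the index name `lab_ops` and fields such as `plate_id` and `focus_score` are hypothetical stand-ins.

```python
import json


def build_quality_event(plate_id: str, focus_score: float, cell_count: int) -> str:
    """Serialize one per-plate quality record as a JSON event payload.

    Field names here (plate_id, focus_score, cell_count) are illustrative,
    not Recursion's real schema.
    """
    event = {
        "sourcetype": "_json",
        "event": {
            "plate_id": plate_id,
            "focus_score": focus_score,
            "cell_count": cell_count,
        },
    }
    return json.dumps(event)


def build_metrics_search(earliest: str = "-24h") -> str:
    """Build an SPL search that aggregates quality metrics per plate.

    A script using the Splunk SDK for Python (splunklib) could run this
    query and feed the results back into experiment design, e.g.:
        service = splunklib.client.connect(host=..., username=..., password=...)
        results = service.jobs.oneshot(query)
    """
    return (
        f"search index=lab_ops sourcetype=_json earliest={earliest} "
        "| stats avg(focus_score) AS avg_focus, sum(cell_count) AS cells BY plate_id"
    )


payload = build_quality_event("PLATE-0001", 0.93, 41250)
query = build_metrics_search()
```

This keeps the ingest format and the retrieval query in one place, so the same field names are used when writing events and when aggregating them later.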