In the next installment of our 'Meet the Doers' series, we highlight Dustin Marling.
Dustin Marling has worked at human resources, payroll and benefits provider Paychex for almost a decade, starting in the call center and working his way up to his current position as a Splunk Developer. As part of his day-to-day, he builds tools and dashboards for any department that needs assistance. “If people run into a roadblock, I'm usually the first person they ask for help,” he says. Paychex is also a cutting-edge technology company, he adds, giving millions of employees the ability to check and update their payroll information online from any device anywhere in the world. For this, Splunk has been critical in preventing downtime, locating errors and ensuring people get paid.
Dustin also spoke at .conf19 about protecting Splunk knowledge objects. I caught up with him there to discuss how Splunk has helped him grow his career and the many ways he has saved the company time, money and the potential for costly errors by using the platform.
What kind of doors were opened for you with Splunk?
The Splunk developer position now exists at Paychex, but before that, it was not a thing. I used to work in the problem management space, and we started using data from Splunk for some of our problem trending. My colleague was the one that started that process, but then I took that on, and we integrated the process of error heuristics.
I don't think that we would have the resiliency and visibility that we have today without Splunk. We're able to immediately find issues. Our average has been significantly reduced because Splunk tells us where the problem is. We know how to fix it.
And with Splunk you can get much more out of your people in significantly less time because they can get to the data quicker. I truly think it’s a force multiplier.
Was there ever a time you caught that big error that would have cost the company a lot of money?
We were working with a third-party provider that we use for basic content delivery. One error only happens if the static content is not available on the person's machine. It’s very rare. Usually, for a short window you see that error and then the client gets the static content, they refresh and they're fine. However, we kept seeing that error consistently, but only from clients who were from California, Washington, Arizona and Oregon. So, we contacted the provider and we told them that our clients were getting errors because they couldn’t get the static content that they were hosting for us. And the provider was like, "Everything is fine. We don't see any problems." But we were definitely sure it was an error.
It didn’t make sense to us, so we asked them why only these four states were the ones having an issue. And they're like, "What do you mean four states?" We told them we were using Splunk to geo-locate the IP addresses that were getting the errors, and that we had a map. I could see which states were affected; they were bright red. And we sent them the information and they looked and found that an edge server in California was impaired and they didn't know it.
That’s one way to keyhole your way into where the issue was. But without Splunk, there's no way we would've ever figured that out.
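For readers who want to see the idea behind Dustin's story in code, here is a minimal Python sketch of grouping errors by the state each client IP address maps to. It is an illustration under stated assumptions, not Paychex's actual Splunk search: the network-to-state table, the sample events and the function names (`state_for_ip`, `error_rate_by_state`, `hot_states`) are all hypothetical, and a real setup would rely on a proper GeoIP database rather than a hard-coded table.

```python
import ipaddress
from collections import Counter

# Hypothetical network-to-state table using documentation-only IP ranges.
# A real deployment would query a GeoIP database instead.
NETWORK_TO_STATE = {
    ipaddress.ip_network("198.51.100.0/25"): "CA",
    ipaddress.ip_network("198.51.100.128/25"): "WA",
    ipaddress.ip_network("203.0.113.0/24"): "NY",
}


def state_for_ip(ip):
    """Return the state for a client IP, or 'unknown' if no network matches."""
    addr = ipaddress.ip_address(ip)
    for network, state in NETWORK_TO_STATE.items():
        if addr in network:
            return state
    return "unknown"


def error_rate_by_state(events):
    """Compute the error rate per state from (client_ip, is_error) pairs."""
    totals = Counter()
    errors = Counter()
    for ip, is_error in events:
        state = state_for_ip(ip)
        totals[state] += 1
        if is_error:
            errors[state] += 1
    return {state: errors[state] / totals[state] for state in totals}


def hot_states(events, threshold=0.5):
    """List states whose error rate meets the (assumed) threshold."""
    rates = error_rate_by_state(events)
    return sorted(state for state, rate in rates.items() if rate >= threshold)


# Made-up sample events for illustration only.
SAMPLE_EVENTS = [
    ("198.51.100.10", True),
    ("198.51.100.11", True),
    ("198.51.100.200", True),
    ("198.51.100.201", False),
    ("203.0.113.5", False),
    ("203.0.113.6", False),
    ("203.0.113.7", True),
]

if __name__ == "__main__":
    print(hot_states(SAMPLE_EVENTS))
```

Sorting errors by location rather than by raw count is what makes a regional fault stand out: a single impaired edge server shows up as a cluster of neighboring states instead of as scattered noise.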
What was one of your favorite or most memorable projects using Splunk?
The main thing that I do at Paychex, besides help people use Splunk, is heuristics. I have an intimate knowledge of where stuff is in our infrastructure, so I can basically find anything anywhere, and I've written a lot of related dashboards. My colleague created an application at Paychex called Event-Know, basically a lightweight error heuristics tool. It would get your errors, normalize them, classify them and give them a label that everyone could reference. And then you could attach knowledge to that label; it's a big lookup file, essentially. But I expanded that use heavily. Now I can tell when an error hits, which environment it hit and which code base caused it, and catch it before it hits production.
It’s been really revolutionary for us to be able to proactively identify issues in our non-production environment before they impact our clients, without having to rely on testers to tell us. We can just go look at the data and find it ourselves, so we don't have to wait. And we’ve had a dramatic reduction in the number of users who are impacted each month.
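The normalize, classify and label steps Dustin describes can be sketched in a few lines of Python. The article does not describe how Event-Know works internally, so everything here is an assumption made for illustration: the masking rules, the `ERR-` label format, the in-memory dictionary standing in for the lookup file, and the function names.

```python
import hashlib
import re

# Hypothetical stand-in for the "big lookup file": label -> knowledge note.
KNOWLEDGE = {}


def normalize(message):
    """Mask the variable parts of an error so similar errors look identical."""
    text = message.lower()
    text = re.sub(r"0x[0-9a-f]+", "<hex>", text)  # memory addresses, hex ids
    text = re.sub(r"\d+", "<n>", text)            # counts, ids, durations
    text = re.sub(r"\s+", " ", text).strip()      # collapse whitespace
    return text


def label_for(message):
    """Derive a short, stable label from the normalized error text."""
    digest = hashlib.sha1(normalize(message).encode("utf-8")).hexdigest()
    return "ERR-" + digest[:8]


def attach_knowledge(message, note):
    """Record what is known about this class of error under its label."""
    KNOWLEDGE[label_for(message)] = note


def lookup(message):
    """Return the note attached to this error's label, or None if unknown."""
    return KNOWLEDGE.get(label_for(message))


if __name__ == "__main__":
    attach_knowledge(
        "Timeout after 30 seconds for user 1234",
        "Known issue: retry usually succeeds.",
    )
    # A different user and duration still resolve to the same label.
    print(lookup("Timeout after 45 seconds for user 9876"))
```

Because the label depends only on the normalized text, two people looking at errors from different users or environments end up referencing the same label, and whatever was learned the first time is there for the next person.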
In addition to using Splunk to catch and prevent major errors, Dustin has also helped IT become more efficient — for example, giving them the ability to access debug logs in five minutes instead of several hours by putting them in Splunk. “So there are hundreds of thousands of hours that have probably been saved amongst all of our developers because they don't have to keep re-researching something,” he says. “Everything is just there.”