In this next installment of our “Meet the Splunktern” series, we’re featuring Om Rajyaguru, our Machine Learning Research intern. Om is currently an Applied Mathematics and Statistics student at North Carolina State University. This month, Om was named Splunktern of the Month! Read on to learn more about Om and his Splunktern experience.
Can you tell us about the project(s) you’re working on? What have been some wins and challenges you’ve experienced so far?
Right now I’m working on log clustering. Log clustering is basically dividing your log data into groups automatically rather than manually assigning each log line to a group. To better understand the motivation, imagine the following scenario: A new user will use Splunk Enterprise to ingest their log data and figure out what kind of logs demonstrate a system failure. There are several questions right off the bat. What kind of query would a new user write? Do you look at all the error messages? What is the message supposed to contain? Will it say “system failure” or “shutdown” or “error” or something else?
Answering these questions takes too much time and requires knowledge about the data itself, but not all customers will want to write complex queries or be aware of exactly what format their logs appear in. So clustering is a way of presenting groups of logs with similar patterns to the customer and letting them decide which pattern best suits what they are looking for. However, this is a challenging project.
To separate anything into groups, I have to be able to compare the objects to decide what goes into which group. The focus of my research is to find the best way to compare individual logs, which in turn would lead to the best clustering. It’s difficult because logs are semi-structured text data, and comparing two pieces of text is harder than comparing something like two numbers.
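To make the idea of comparing logs concrete, here is a minimal sketch in Python. It is not the method from this research or anything in a Splunk product; it only illustrates one simple, well-known way to compare two log lines (Jaccard similarity over word tokens) and to group lines with it. The function names, the `<num>` masking step, the 0.5 threshold, and the sample log lines are all assumptions made up for this illustration.

```python
import re


def tokenize(line):
    """Lowercase a log line and mask digit runs so variable fields compare equal."""
    masked = re.sub(r"\d+", "<num>", line.lower())
    return set(masked.split())


def jaccard(a, b):
    """Jaccard similarity of two token sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_logs(lines, threshold=0.5):
    """Greedy clustering (illustrative only): each line joins the first cluster
    whose representative (the cluster's first line) is similar enough,
    otherwise it starts a new cluster."""
    clusters = []  # list of (representative_tokens, member_lines)
    for line in lines:
        tokens = tokenize(line)
        for rep, members in clusters:
            if jaccard(tokens, rep) >= threshold:
                members.append(line)
                break
        else:
            clusters.append((tokens, [line]))
    return [members for _, members in clusters]


# Hypothetical sample data, invented for this sketch.
logs = [
    "ERROR disk 3 failure on host 12",
    "INFO user 42 logged in",
    "ERROR disk 7 failure on host 5",
    "INFO user 7 logged in",
]
groups = cluster_logs(logs)
for group in groups:
    print(group)
```

With this sample, the two disk-failure lines end up in one group and the two login lines in another, because masking the numbers makes their token sets identical. Real logs are messier, which is exactly why choosing the comparison function is the hard part.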
When I was given the log clustering project, I was super excited. After some initial failed attempts to develop something completely new, I took a more realistic approach. I studied research papers and open-source implementations of log clustering. Some of these ideas worked well while others led me to a dead end, but that’s just the nature of research.
Fortunately, I was able to conduct a thorough analysis of many algorithms on several open-source datasets. Splunk has a wide customer base, where every customer has different formats for their logs. To simulate this, I chose a variety of open-source datasets for testing where each dataset has a unique log format. This led to a successful demo to a PM and their team. I am now working on further optimizing this research and testing more algorithms. The following mockup shows what I hope log clustering will look like in the future. To be clear, this may not be what the product looks like; it is just an example of how users might be able to see the patterns in their data captured from log clustering.
How did you hear about the Machine Learning internship, and what motivated you to apply?
I looked for internship opportunities at companies that valued and leveraged mathematics knowledge to solve complex problems in machine learning. From conducting internet research and talking to Splunk employees and students, I understood that Splunk was one of the companies that matched my career goals. I started keeping an eye on internship opportunities at Splunk.
Splunk’s use of data analytics and machine learning to solve complex problems contributed the most to my motivation to apply. Besides, Splunk is known to hire the best and brightest in Silicon Valley. I wanted to build a strong network of like-minded colleagues, so when I saw the machine learning intern position open up, I applied right away!
What have you enjoyed the most about being a Machine Learning Research Intern?
As a machine learning research intern, I have had many learning opportunities and many chances to share my thoughts with my team. I’ve had a lot of freedom to take my project in the direction I want. At the same time, I’m responsible for creating, implementing, and documenting new ideas to take the research forward. When I was doing exploratory research, I noticed some interesting relationships between the metrics used to evaluate clusterings that will provide insights for other clustering/unsupervised research projects in the future. This combination of freedom and responsibility has really helped my growth as an intern.
Something that I really appreciate about Splunk’s culture is that every team member’s voice is heard and taken seriously. Shoutout to the applied research team in particular, because the team encouraged me to be innovative and to come up with new ideas even when I was unsure of myself at the beginning of my internship.
I have also enjoyed the help and guidance of experienced members on my team. I gained more skills in research, software engineering, team collaboration, idea evaluation and communication in a short period of time, all of which helped contribute to my recent project demo. Thanks to these experiences, I received positive feedback from the product manager and the rest of the team afterwards.
What is your favorite part about being on your team?
It’s so hard to pick just one! If I had to pick, it would be our reading group sessions. Every other week, we hold a Zoom session in which one person presents a research paper or an idea they find interesting and the whole group discusses it. Sometimes people agree and sometimes they find flaws in the idea. But the most valuable part is the discussion that ensues. Everyone gets so enthusiastic and passionate, yet provides constructive feedback. Through these discussions, I learn new perspectives and new ways of approaching problems.
How do you maintain a healthy work-life balance?
Maintaining a good work-life balance is far more important when working from home because it is harder to do the things you can do in an office. I certainly don’t have access to Splunk’s snack bar and game rooms at home! In addition, it’s not as easy to leave your work behind when you’re done for the day. It’s easy to forget about work until the next day when you are leaving an office, but at home I’m always tempted to look back at my computer when I hear a Slack notification or feel I forgot to update some code. What I found to be the key is to stay productive enough throughout normal working hours that I feel satisfied and comfortable leaving work behind once I’m done with my 8 hours for the day.
To that end, I start my day with a healthy breakfast and then I work from 9:30am to 6:00pm, with a lunch break at 12:30. I was given the flexibility to choose my hours before I began my internship since I am located on the East Coast, which helps the work-life balance greatly. I often get hungry while I am exercising my brain, so I keep a supply of fresh fruit in the fridge. I try not to consume overly processed food and would rather choose nutrient-rich foods that provide me with enough energy to get through the day without making me feel lethargic. This helps me be as productive as possible while working so I don’t have to make up the time later in the day. Doing all of these things allows me to unwind after a long day so I never feel burnt out. After work, I spend the remainder of my day with my family and friends playing sports or board games and playing with my puppy.
What is one thing you are looking forward to for the rest of your internship here at Splunk?
I still have some time left, and there are so many things I want to do! I want to write a paper on my research so far, optimize log clustering and see it included in the product, and continue forming relationships with more interns and colleagues. But if I could choose only one thing, I would love to visit the San Francisco office and meet my team in person. I’ve never been to San Francisco and I’ve only seen pictures of the office, which is all the more reason to make my first trip!