Clara-fication: Data Onboarding Best Practices

Data is key to making good decisions, so it is equally important to follow a set of standards when ingesting that data into Splunk. These standards ensure that logs are formatted properly and that documentation exists to explain the source data. Every company and use case is different, but I'd like to share some of the standards and best practices the Splunk Security Center of Excellence follows. You don't need to adopt all (or any) of them; I hope they serve as a guide for improving your current data ingestion processes.

Consultations and Metadata

Before you can even begin to bring in data, it's important to understand what the data contains and how it will be used. Internally at Splunk, the Security Center of Excellence requires a form to be filled out for every new data onboarding request, and we will not triage a ticket until we have all of the necessary information.

Here are some of the questions we ask and why we ask them:

Process

Once a request has been approved, we always start by sending the data to our development environment. This ensures our production indexer cluster contains only validated data: every aspect of the ingestion is tested and approved in development before being promoted to production.

Things to Test While Configuring Data Ingestion

One thing all data ingestion teams should know is the Great 8. The Great 8 refers to the configurations every data source should have defined. All of these settings play a crucial role in parsing data, and defining them with rules specific to the source, instead of relying on generic defaults, improves parsing performance.

These configurations are set in props.conf and are known as the following:

- SHOULD_LINEMERGE – whether Splunk should try to merge multiple lines into a single event (usually false when LINE_BREAKER is defined)
- LINE_BREAKER – the regular expression that marks where one event ends and the next begins
- EVENT_BREAKER_ENABLE – enables event breaking on universal forwarders so events are distributed evenly across indexers
- EVENT_BREAKER – the regular expression the forwarder uses to break the stream into events
- TRUNCATE – the maximum number of bytes in an event line before Splunk truncates it
- TIME_PREFIX – a regular expression matching the text immediately before the timestamp
- MAX_TIMESTAMP_LOOKAHEAD – how many characters past TIME_PREFIX Splunk scans for the timestamp
- TIME_FORMAT – the strptime-style format of the timestamp

Other things to check and configure:

Post-Data Ingestion Checks
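
The checks themselves will vary by environment, but Splunk's own internal logs are a good place to start. As one hedged example (using the standard splunkd logging components, not a check specific to our process), a search like this surfaces timestamp-parsing, line-merging, and truncation warnings for newly onboarded data:

```
index=_internal sourcetype=splunkd (log_level=WARN OR log_level=ERROR)
    (component=DateParserVerbose OR component=AggregatorMiningProcessor OR component=LineBreakingProcessor)
| stats count BY host, component
```

A clean result after a day or two of ingestion is a good sign that the Great 8 settings are doing their job.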

How to Scale All of This

One of the most important capabilities for a data onboarding team is keeping up with the volume of requests. If requests arrive in your queue frequently, an efficient process is a must; otherwise the backlog will keep growing.

There are a few things we do internally to improve our operational efficiency around data ingestion. Some of these may or may not be a good fit for your organization, but they work well for us.
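
One form that efficiency can take is automating the review itself. As a hedged sketch (this is not a tool we ship, and the stanza name in the example is made up), a short script can verify that every stanza in a props.conf file defines the Great 8 before it is promoted out of development:

```python
import configparser

# The eight props.conf settings every data source should define ("the Great 8").
GREAT_8 = {
    "SHOULD_LINEMERGE", "LINE_BREAKER", "EVENT_BREAKER_ENABLE",
    "EVENT_BREAKER", "TRUNCATE", "TIME_PREFIX",
    "MAX_TIMESTAMP_LOOKAHEAD", "TIME_FORMAT",
}

def missing_great_8(props_text: str) -> dict[str, set[str]]:
    """Return, per stanza, the Great 8 settings it does not define."""
    # interpolation=None so % signs in TIME_FORMAT values are left alone
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # props.conf keys are case-sensitive; keep them as-is
    parser.read_string(props_text)
    return {stanza: GREAT_8 - set(parser[stanza]) for stanza in parser.sections()}

# Hypothetical sourcetype stanza missing five of the eight settings:
sample = r"""
[acme:firewall]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TRUNCATE = 10000
"""
print(sorted(missing_great_8(sample)["acme:firewall"]))
# ['EVENT_BREAKER', 'EVENT_BREAKER_ENABLE', 'MAX_TIMESTAMP_LOOKAHEAD', 'TIME_FORMAT', 'TIME_PREFIX']
```

Wiring a check like this into a ticket workflow or CI job means a human only reviews stanzas that already pass the baseline.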

SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\r\n]+)
TRUNCATE = 10000
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 35
TIME_FORMAT = %F %T.%3N %:z

We do customize many of these for a lot of our inputs, especially the time options, but this does give us a good starting place.
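
TIME_FORMAT uses strptime-style conversion specifiers, with Splunk extensions such as %3N (millisecond subseconds) and %:z (timezone offset with a colon). As a rough illustration only, the equivalent parse in Python, where %f covers the subsecond digits and %z accepts the colon form, looks like this (the sample timestamp is made up):

```python
from datetime import datetime

# TIME_FORMAT = %F %T.%3N %:z in props.conf matches timestamps such as:
raw = "2024-05-01 12:34:56.789 +00:00"

# Python equivalents: %F -> %Y-%m-%d, %T -> %H:%M:%S, %3N -> %f, %:z -> %z
dt = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%f %z")
print(dt.isoformat())  # 2024-05-01T12:34:56.789000+00:00
```

Sanity-checking a few raw events against the format this way is much faster than re-ingesting after a bad timestamp parse.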

Final Thoughts

Getting data into your Splunk environment is one of the most important things to get right. As the Splunk Security Center of Excellence, we are committed to building efficient, scalable, and secure solutions for our services and to sharing our findings with our customers and users. Our solutions may not be the best fit for every use case, but we hope our experience sparks new ideas and designs for creating or improving your own processes.
