What’s Your ulimit?

If you don’t know the answer to that question, you should go into the corner for a 5-minute time-out. 😉 No need to beat yourself up for not knowing. It’s not something most people would think to check when deploying Splunk. Since it usually rears its slightly-monstrous-yet-interesting head when system load creeps higher, let’s just set it and forget it. Or, for a little added drama, address it when Splunk crashes or hangs.

If *nix is your operating system, then you need to worry about this. For Windows, you probably have other things to worry about, so I don’t think this is a concern. From the handy man pages, ulimit is short for user limits for system-wide resources. To Splunk, these user limits equate to file descriptors, that is, how many files can be open simultaneously.

Splunk will allocate file descriptors for:

  • files being actively monitored
  • forwarder connections
  • deployment clients
  • users running searches

The default ulimit value is often 1024. Even if Splunk allocated just a single file descriptor for each of the activities above, it’s easy to see how a few hundred monitored files, a few hundred forwarders sending data, and a handful of very active users, on top of reading from and writing to the datastore, can exhaust this measly default setting.

Well, Splunk doesn’t just allocate a single file descriptor for everything it touches. Here are additional details on how to size ulimit with consideration for what might go wrong with forwarders and deployment clients. This will help you adjust the ulimit to something sensible for your current system and projected growth.

Setting ulimit on Indexers

When all is humming along, each forwarder will require 2 file descriptors: 1 for data and 1 for a health check connection. So the minimum ulimit setting is 2 x # forwarders.

When things hit a snag and forwarders are unable to connect to an indexer (e.g. an indexer is offline for updates or fails), a forwarder can trigger allocation of up to 5 file descriptors on retries: 4 for data and 1 for the health check. This means the open file descriptor count for forwarder connections can potentially reach 5 x # forwarders. This is the theoretical max.

Therefore, a super safe ulimit will be 8 x # forwarders to account for the additional file descriptors Splunk will need for reading/writing during indexing/searching. This setting is very important for indexers as we are expecting constant concurrent connections from forwarders.
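The arithmetic above is simple enough to script. Here is a minimal sketch in Python that applies the 2x, 5x and 8x multipliers to a forwarder count. The function name, the dictionary keys and the sample count of 300 forwarders are made up for illustration; only the multipliers come from this post.

```python
def indexer_fd_estimates(num_forwarders):
    """Apply the indexer multipliers from this post to a forwarder count."""
    if num_forwarders < 0:
        raise ValueError("num_forwarders must not be negative")
    return {
        # 1 data + 1 health check descriptor per forwarder
        "minimum": 2 * num_forwarders,
        # up to 4 data + 1 health check descriptors per forwarder on retries
        "retry_peak": 5 * num_forwarders,
        # headroom for the reading/writing done while indexing and searching
        "safe": 8 * num_forwarders,
    }


# Hypothetical example: an indexer receiving data from 300 forwarders.
for name, value in indexer_fd_estimates(300).items():
    print(f"{name}: {value}")
```

Plug in the number of forwarders you expect after your projected growth, not just what you have today.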

Setting ulimit on the Deployment Server

Increasing the ulimit matters less for deployment servers than for indexers because deployment clients are much more bursty and quick in their communication with the deployment server. This means the likelihood that all deployment clients will check in at once and exhaust file descriptors is much lower.

The low water mark is 2 x # deployment clients and a safe ulimit is 4 x # deployment clients.
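The same kind of back-of-the-envelope calculation works here. A minimal sketch in Python, where the function name, the keys and the sample count of 500 deployment clients are hypothetical and only the 2x and 4x multipliers come from this post:

```python
def deployment_server_fd_estimates(num_clients):
    """Apply the deployment server multipliers from this post."""
    if num_clients < 0:
        raise ValueError("num_clients must not be negative")
    return {
        "low_water_mark": 2 * num_clients,
        "safe": 4 * num_clients,
    }


# Hypothetical example: a deployment server with 500 deployment clients.
estimates = deployment_server_fd_estimates(500)
print("low water mark:", estimates["low_water_mark"])
print("safe ulimit:", estimates["safe"])
```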

Why Not Set ulimit to Unlimited?

It doesn’t hurt to remove the hard limit and set ulimit to unlimited… unless there is some kind of file descriptor leak in Splunk. Such a leak can go undetected for a long time and consume more and more resources. We don’t expect this to happen since we do monitor specifically for these types of problems in our longevity tests conducted with 1000 forwarders across 10 indexers over many days with ulimit set at 2048.

Additional Hints

As studly Splunk Admins you probably already know how to do this, but in case tryptophan has an early/lasting grip on you:

  • The ulimit is set in increments of 1024, so please round your calculations up to the next 1024 increment.
  • Use “ulimit -n” to find the current max number of FDs for your system.
  • To change the max open files, here is a good guide:
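If you would rather check from a script than from the shell, here is a minimal sketch in Python. It reads the open-file limits of the current process with the standard library `resource` module (Unix only) and rounds a target value up to the next 1024 increment, following the hint above. The helper name and the target of 2400 (8 x 300 forwarders) are made up for illustration, and the limits it prints belong to the Python process itself, which may differ from those of the user that runs Splunk.

```python
import resource  # standard library, available on Unix-like systems only

INCREMENT = 1024  # rounding step suggested in the hints above


def round_up_to_increment(value, increment=INCREMENT):
    """Round value up to the next multiple of increment (at least one increment)."""
    if value <= 0:
        return increment
    return ((value + increment - 1) // increment) * increment


def describe(limit):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


# Soft and hard limits on open file descriptors for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", describe(soft))
print("hard limit:", describe(hard))

# Hypothetical target: 8 x 300 forwarders, rounded up to the next increment.
target = round_up_to_increment(8 * 300)
print("suggested ulimit:", target)
```

The soft limit is the number that “ulimit -n” reports by default in most shells.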

Many thanks to Jag Kerai, Splunk Superstar Developer, for this incredibly useful insight and guidance! And Happy Thanksgiving!

Vi Ly
