Set up distributed search


What is distributed search?

This documentation applies to the following versions of Splunk: 4.0, 4.0.1, 4.0.2, 4.0.3, 4.0.4, 4.0.5, 4.0.6, 4.0.7, 4.0.8, 4.0.9, 4.0.10

In distributed search, Splunk servers send search requests to other Splunk servers and merge the results before returning them to the user. In a typical scenario, one Splunk server searches indexes on several other servers.

These are some of the key use cases for distributed search:

  • Horizontal scaling for enhanced performance. Distributed search spreads the indexing and searching loads across multiple indexers, making it possible to search and index large quantities of data.
  • Access control. You can use distributed search to control access to indexed data. In a typical situation, some users, such as security personnel, might need access to data across the enterprise, while others need access only to data in their functional area.
  • Managing geo-dispersed data. Distributed search allows local offices to access their own data, while maintaining centralized access at the corporate level. Chicago and San Francisco can look just at their local data; headquarters in New York can search its local data, as well as the data in Chicago and San Francisco.
  • Maximizing data availability. Distributed search, combined with load balancing and cloning from forwarders, is a key component of high availability solutions.

The Splunk instance that does the searching is referred to as the search head. The Splunk instances that do the indexing are called search peers or indexer nodes. Together, the search head and search peers constitute the nodes in a distributed search cluster.
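The division of labor between a search head and its search peers can be pictured with a toy model. The following Python sketch is only an illustration of the idea of merging per-peer results into one stream; it is not how Splunk implements distributed search, and the function name, peer names, and event tuples are all hypothetical.

```python
import heapq


def merge_peer_results(results_by_peer):
    """Merge per-peer event lists into a single newest-first list.

    Hypothetical model: each event is a (timestamp, peer_name, raw_text)
    tuple, and each peer's list is already sorted newest-first.
    """
    # heapq.merge lazily merges already-sorted inputs; reverse=True
    # tells it the inputs are in descending order.
    return list(heapq.merge(*results_by_peer.values(), reverse=True))


# Made-up results returned by two search peers.
peer_results = {
    "peer1": [(30, "peer1", "login failed"), (10, "peer1", "login ok")],
    "peer2": [(20, "peer2", "disk full")],
}

for event in merge_peer_results(peer_results):
    print(event)
```

In this model the search head does no indexing of its own; it only combines what the peers return, which mirrors the recommended setup for performance-based use cases.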

A search head can also index and serve as a search peer. However, in performance-based use cases, such as horizontal scaling, it is recommended that the search head only search and not index.

A search head by default runs its searches across all search peers in its cluster. You can limit a search to one or more search peers by specifying the splunk_server field in your query. See Search across one or more distributed servers in the User manual.
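As a rough illustration of limiting a search to specific peers, the following Python sketch builds a search string that filters on the splunk_server field. Only the splunk_server field comes from the text above; the helper name build_peer_filter and the peer names are hypothetical.

```python
def build_peer_filter(base_search, peers):
    """Append a splunk_server filter to a search string.

    build_peer_filter and the peer names used below are hypothetical;
    only the splunk_server field itself comes from the documentation.
    """
    if not peers:
        # No restriction: the search head queries all of its peers.
        return base_search
    clauses = ['splunk_server="%s"' % peer for peer in peers]
    return "%s (%s)" % (base_search, " OR ".join(clauses))


query = build_peer_filter("error", ["peer-chicago", "peer-sf"])
print(query)
```

Passing an empty list leaves the search unchanged, which corresponds to the default behavior of searching across all peers.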

Some search scenarios

This diagram shows a simple distributed search scenario for horizontal scaling, with one search head searching across three peers:

[Image: Horizontal_scaling.png]


In this diagram showing a distributed search scenario for access control, a "security" department search head has visibility into all the indexing search peers. Each search peer also has the ability to search its own data. In addition, the department A search peer has access to both its data and the data of department B:

[Image: Access_control.png]


Finally, this diagram shows the use of load balancing and distributed search to provide high availability access to data:

For more information on load balancing, see Configure automatic load balancing in this manual.

For information on Splunk distributed searches and capacity planning, see Dividing up indexing and searching in the Installation manual.

What search heads send to search peers

When initiating a distributed search, the search head distributes its knowledge objects to its search peers. Knowledge objects include saved searches, event types, and other entities used in searching across indexes. The search head needs to distribute this material to its search peers so that they can properly execute queries on its behalf. See What is Splunk knowledge? for detailed information on knowledge objects.

When executing a distributed search, the indexers do not use any of their own local knowledge objects. They have access only to the search head's knowledge.

The process of distributing search head knowledge means that the indexers by default receive nearly the entire contents of all the search head's apps. This set of data is referred to as the distributed bundle. If an app contains large binaries that do not need to be shared with the indexers, you can reduce the size of the bundle by means of the [replicationWhitelist] stanza in distsearch.conf. See Limit distributed bundle size in this manual.

The distributed bundle gets distributed to $SPLUNK_HOME/var/run/searchpeers/ on each search peer.

Because the search head distributes its knowledge, search scripts should not hardcode paths to resources. The distributed bundle will reside at a different location on the search peer's file system, so hardcoded paths will not work properly.
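One way to avoid hardcoded paths is to resolve resources relative to the script's own location at run time. The following Python sketch assumes the resource is shipped alongside the script in the same directory; the helper name resource_path and the file name lookup.csv are hypothetical.

```python
import os


def resource_path(script_file, *parts):
    """Return a path to a resource relative to the script's own directory.

    resource_path is a hypothetical helper, not part of Splunk.
    """
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, *parts)


# In a real search script you would pass __file__ so the path follows the
# script wherever the bundle is unpacked. "lookup.csv" is a made-up name.
# lookup_file = resource_path(__file__, "lookup.csv")
```

Because the directory is computed from the script's actual location, the same code works on the search head and under the bundle directory on a search peer.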

User authorization

All authorization for a distributed search originates from the search head. At the time it sends the search request to its search peers, the search head also distributes the authorization information. It tells the search peers the name of the user running the search, the user's role, and the location of the distributed authorize.conf file containing the authorization information.

Licenses for distributed deployments

Each indexer node in a distributed deployment requires a unique license key.

Search heads performing no indexing or only summary indexing can use the forwarder license. If the search head performs any other type of indexing, it requires a unique key.

See Search head license in the Installation manual for a detailed discussion of licensing issues.

Cross-version compatibility

All search nodes must be running Splunk 4.x to participate in the distributed search. Distributed search is not backwards or forwards compatible with Splunk 3.x.
