Documentation: 3.0.1
This page last updated: 11/05/07 11:11am

How data distribution works

Splunk servers can forward data to one another (as well as to other systems) in real time, so that data inputs gathered on one Splunk server can be sent to another Splunk server for indexing and search. Splunk servers can also forward data to groups of other Splunk servers, which enables horizontal scaling via clustered indexing, and can clone data to multiple groups of other Splunk servers to provide data redundancy in high availability environments.

Data distribution covers all configurations in which one Splunk server (the forwarder) sends data to one or more other Splunk servers (the receivers) before that data is indexed. The forwarder can also index data locally.

Please note: each receiving Splunk server must have a unique, valid Splunk Enterprise license.

Data forwarding

Data forwarding is the simplest setup for forwarding and receiving: one server simply sends its data to another server for indexing.

http://www.splunk.com/assets/doc-images/30_admin13_forwardreceive/dataforward.jpg

Learn how to enable forwarding and receiving.

Data routing

With data routing, the forwarder matches patterns in the events themselves and selectively sends some events to one server and other events to a different server.

http://www.splunk.com/assets/doc-images/30_admin13_forwardreceive/datarouting.jpg

Learn how to enable data routing.
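
The routing decision described above can be pictured with a small Python sketch. This is only an illustration of pattern-based routing, not Splunk's implementation or configuration syntax. The patterns, the group names (ops_group, security_group, everything_else), and the first-match-wins rule are all assumptions made for the example.

```python
import re

# Hypothetical routing rules: (pattern, destination name).
# Neither the patterns nor the names come from Splunk; they are illustrative.
ROUTES = [
    (re.compile(r"\bERROR\b"), "ops_group"),
    (re.compile(r"sshd"), "security_group"),
]
# Assumed fallback destination for events that match no pattern.
DEFAULT_DESTINATION = "everything_else"


def route_event(event: str) -> str:
    """Return the destination of the first rule whose pattern matches."""
    for pattern, destination in ROUTES:
        if pattern.search(event):
            return destination
    return DEFAULT_DESTINATION


if __name__ == "__main__":
    for line in (
        "Nov 5 11:11:02 web01 app: ERROR disk full",
        "Nov 5 11:11:03 web01 sshd[312]: Accepted password",
        "Nov 5 11:11:04 web01 cron: job finished",
    ):
        print(route_event(line), "<-", line)
```

In Splunk itself, the conditions are defined in transforms.conf and tied to sources, source types or hosts in props.conf, as listed at the end of this page.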

Data cloning

Data cloning refers specifically to a forwarder sending every event to two or more other Splunk servers to provide for data redundancy.

http://www.splunk.com/assets/doc-images/30_admin13_forwardreceive/datacloning.jpg

Learn how to enable cloning.

Data balancing

Data balancing distributes data evenly across a group of servers. This is generally done when you have a large data volume. All of the forwarders send data to some number of receivers, and the data is spread across those receivers in round-robin fashion, so that each receiver indexes a share of the total.

http://www.splunk.com/assets/doc-images/30_admin13_forwardreceive/balance.jpg

Data-balanced target groups must contain more than one server. Learn how to set up data balancing.
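
As a rough picture of round-robin distribution, the following Python sketch assigns each event to the next receiver in rotation. It is a conceptual model only. The function name balance and the server names are made up for the example, and the unit of rotation (here, a single event) is an assumption.

```python
from itertools import cycle


def balance(events, receivers):
    """Pair each event with the next receiver in rotation (round robin)."""
    if len(receivers) < 2:
        # Mirrors the rule that a balanced group needs multiple servers.
        raise ValueError("a balanced group needs more than one receiver")
    rotation = cycle(receivers)
    return [(next(rotation), event) for event in events]


if __name__ == "__main__":
    events = ["event1", "event2", "event3", "event4", "event5"]
    for receiver, event in balance(events, ["server1", "server2"]):
        print(receiver, "indexes", event)
```

With two receivers, odd-numbered events land on the first server and even-numbered events on the second, so each server indexes roughly half of the volume.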

Buffering during data balancing

During data balancing, if a server becomes inaccessible, Splunk will continue to send events to all accessible servers.

Eventually, Splunk will stop trying to send to the inaccessible server and will note that it has gone offline. If all servers are inaccessible, Splunk will write events to a buffer on the forwarder's side.

Data buffering values can be set in outputs.conf.
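
The behavior described in this section can be sketched in Python as follows. This is a simplified model under stated assumptions, not Splunk's code: the class name Forwarder, the reachable set, and the unbounded in-memory buffer are all hypothetical, and the sketch does not model buffer size limits or what happens when a server comes back.

```python
from collections import deque


class Forwarder:
    """Toy forwarder: rotate over receivers, skip unreachable ones, else buffer."""

    def __init__(self, receivers):
        self.receivers = list(receivers)
        self.reachable = set(self.receivers)  # assumed health state, set by caller
        self.delivered = {name: [] for name in self.receivers}
        self.buffer = deque()  # forwarder-side buffer (unbounded in this sketch)
        self._next = 0

    def send(self, event):
        """Deliver to the next reachable receiver, or buffer if none is reachable."""
        for _ in range(len(self.receivers)):
            receiver = self.receivers[self._next]
            self._next = (self._next + 1) % len(self.receivers)
            if receiver in self.reachable:
                self.delivered[receiver].append(event)
                return receiver
        self.buffer.append(event)
        return None


if __name__ == "__main__":
    fwd = Forwarder(["server1", "server2"])
    fwd.reachable.discard("server1")   # server1 becomes inaccessible
    print(fwd.send("event1"))          # goes to the remaining accessible server
    fwd.reachable.clear()              # now every server is inaccessible
    print(fwd.send("event2"))          # nothing accepts it, so it is buffered
    print(list(fwd.buffer))
```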

Target groups

Rather than sending data to a single receiver, forwarders can send to target groups. A target group is made up of one or more receiving servers:

[target group 1]
server 1, server 2

[target group 2]
server 3

[target group 3]
server 4, server 5, server 6

Cloning sends every event to all target groups.

Routing sends events that match one set of conditions to one target group and events that match other conditions to other target groups.

You can also set up default groups, which receive all data that is not routed to another target group. If more than one default group is specified, events are cloned to all listed default groups:
defaultGroup=<groupname1>,<groupname2>...

Learn more about target group configuration.
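
To make the relationship between cloning and target groups concrete, here is a small Python sketch. It reflects one reading of this page: every event is cloned to each default group, and within a group the event goes to one server chosen in rotation. The group and server names mirror the schematic example above. The function name make_cloner is hypothetical, and none of this is Splunk configuration syntax.

```python
from itertools import cycle

# Same shape as the schematic target groups shown earlier in this section.
TARGET_GROUPS = {
    "group1": ["server1", "server2"],
    "group2": ["server3"],
    "group3": ["server4", "server5", "server6"],
}


def make_cloner(groups, default_groups):
    """Build a send() that clones each event to every listed default group."""
    rotations = {name: cycle(groups[name]) for name in default_groups}

    def send(event):
        # One copy per group; inside a group, pick the next server in rotation.
        return {name: next(rotation) for name, rotation in rotations.items()}

    return send


if __name__ == "__main__":
    send = make_cloner(TARGET_GROUPS, ["group1", "group3"])
    for event in ("event A", "event B", "event C"):
        print(event, "->", send(event))
```

Each event produces one copy per default group, which is the redundancy that cloning provides, while the rotation inside each group spreads the indexing load.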

Security

Any Splunk server can route some or all of its incoming data in real time to other Splunk servers and to other systems via TCP, either in clear text or over SSL. Learn how to set up SSL.

Send to third-party systems

By default, data is routed between Splunk servers as cooked data, meaning that events have already been parsed and tagged. However, Splunk can be configured to either receive or send raw data in order to interact with third-party systems.

http://www.splunk.com/assets/doc-images/30_admin13_forwardreceive/thirdparty.jpg

Learn how to configure Splunk to send to or receive from third party software.

Distributed search

Splunk servers can be configured to distribute search requests to other Splunk servers and merge the results before returning them to the user. Distributed search combines with balanced indexing to provide horizontal scaling, so that a deployment can index and search hundreds of gigabytes or even terabytes per day. Additionally, distributed search allows select users to correlate data across different data silos.

http://www.splunk.com/assets/doc-images/30_admin13_forwardreceive/dsearch.jpg

Learn more about distributed search.
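
The idea of sending one search to several servers and merging the results can be sketched in Python. This is a conceptual illustration only. The data layout (timestamp, text pairs), the substring match, and the newest-first ordering are assumptions made for the example, not a description of how Splunk executes searches.

```python
import heapq


def search_one(events, term):
    """Search a single server's events; return matches newest first."""
    return sorted((e for e in events if term in e[1]), reverse=True)


def distributed_search(servers, term):
    """Run the search on every server and merge the sorted partial results."""
    partial_results = [search_one(events, term) for events in servers.values()]
    # heapq.merge combines already-sorted inputs without re-sorting everything.
    return list(heapq.merge(*partial_results, reverse=True))


if __name__ == "__main__":
    # Hypothetical per-server indexes: lists of (timestamp, event text).
    servers = {
        "server1": [(1, "login failed"), (3, "login failed"), (5, "disk ok")],
        "server2": [(2, "login failed"), (4, "backup done")],
    }
    for timestamp, text in distributed_search(servers, "login failed"):
        print(timestamp, text)
```

Because each server only searches its own share of the data, adding servers spreads both the indexing and the search work.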

Configuration files for data distribution

  • The forwarder uses the TCP output processor, configured by outputs.conf.
  • Conditions for routing are established via transforms.conf and linked to specific sources, source types or hosts via props.conf.