Splunk Deployment Components
The typical components that make up a Splunk deployment are forwarders, indexers and search heads. Splunk Enterprise is a single package that can perform one or many of these roles, in addition to others. Users can install and deploy Splunk® Enterprise on AWS within minutes on their choice of hardware (physical, cloud or virtual) and operating system. The package is available as a public AMI (Amazon Machine Image) in addition to downloadable packages for most operating systems. While all major Splunk components can run from a single installation on a single cloud instance, they can also run independently on separate cloud instances. Depending on the deployment infrastructure, care must also be taken to allocate the proper amount of resources to each component type.
Forwarders perform data collection, data forwarding and data load balancing. Few resources are required to run a forwarder, as forwarders typically read and send data with minimal overhead. The Universal Forwarder is a lightweight package of the Splunk software that provides most, if not all, of the forwarder functionality.
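The forwarding and load-balancing behavior described above is driven by the forwarder's outputs.conf. The following is a minimal sketch; the group name and indexer hostnames are placeholders, not values from this document:

```ini
# outputs.conf on a forwarder -- hostnames below are placeholders
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
# The forwarder automatically load-balances across the listed indexers
server = idx1.example.com:9997, idx2.example.com:9997
# How often (in seconds) the forwarder switches to another indexer in the group
autoLBFrequency = 30
```

With more than one server listed, the forwarder rotates among the indexers, which provides the load balancing and reliability benefits noted later in this section.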
Indexers write data to a storage device and perform searches on that data. They can be resource-intensive and require adequate I/O and CPU allotments.
Search heads search for information across indexers and require CPU and memory allotment.
Budgeting system resources and bandwidth for search and index performance depends on the total volume of data being indexed and the number of active concurrent searches (scheduled or otherwise) at any time. In addition to rapidly writing data to disk, indexers perform much of the work involved in running searches: reading data off disk, decompressing it, extracting knowledge and reporting. Since indexers incur most of the workload, increases in indexing volume should be matched by an increase in indexer instances. Deploying additional indexers distributes the load of increased data volumes, reducing contention for resources during searches and accelerating search performance.
Most EC2 deployments leverage a combination of forwarders and network streams to send data to the Splunk indexer(s). While forwarders are not required to gather data from a source, they provide benefits such as flexibility, load balancing and reliability. Splunk easily acquires machine data in a variety of ways: posting data directly to Splunk’s HTTP Event Collector (HEC), querying an API, monitoring a file, and listening to network data are a few of the most common patterns.
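As an illustration of the HEC pattern, the sketch below builds the JSON body that the HEC event endpoint accepts and shows how it would be posted. The endpoint URL, token, and field values are placeholder assumptions for illustration only:

```python
import json
import urllib.request

# Placeholders -- substitute your own HEC endpoint and token.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

def build_hec_event(message, source="aws:app", sourcetype="_json", index="main"):
    """Construct the JSON body expected by the HEC event endpoint."""
    return {
        "event": message,
        "source": source,
        "sourcetype": sourcetype,
        "index": index,
    }

def post_event(payload):
    """POST the payload with the Splunk token in the Authorization header."""
    req = urllib.request.Request(
        HEC_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": "Splunk " + HEC_TOKEN},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# Build a payload for a sample application event (not sent here).
payload = build_hec_event({"action": "login", "user": "alice"})
```

In practice, agents or applications batch multiple such events per request to reduce HTTP overhead.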
Other Splunk components include the Deployment Server (configuration publishing), License Master (license management) and Master Node (manages index replication).
Performance Considerations Within AWS
There are several performance factors to consider when deploying Splunk software on Amazon Web Services: EC2 instance size, storage type and Amazon Machine Image (AMI) selection.
AWS Instance: While Spot and On-Demand Instances can save money for intermittent workloads, Splunk is persistent software that is intended to gather and index data at all times; thus, Reserved Instances are preferred. The tables below list recommended minimum EC2 instance requirements.
Splunk software is well suited for AWS, as it scales horizontally. Adding Splunk instances offers more performance and capacity depending on data volume requirements. See tables 2-4 for more detail on recommended sizes.
SmartStore Using AWS S3: Splunk recommends using SmartStore as a way to enable the following benefits in a Splunk deployment:
- Reduced storage cost. Your deployment can take advantage of the economy of S3-compatible remote object stores, instead of relying on costly local storage.
- Access to high availability and data resiliency features available through remote object stores.
- The ability to scale compute and storage resources separately, thus ensuring that you use resources efficiently.
- Simple and flexible configuration with per-index settings.
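The per-index configuration mentioned above lives in indexes.conf. A minimal sketch of pointing an index at an S3 remote store follows; the volume name, bucket and paths are placeholders:

```ini
# indexes.conf -- SmartStore with an S3 remote store (bucket name is a placeholder)
[volume:remote_store]
storageType = remote
path = s3://example-smartstore-bucket/indexes

[main]
# Enables SmartStore for this index by pointing it at the remote volume
remotePath = volume:remote_store/$_index_name
homePath = $SPLUNK_DB/main/db
coldPath = $SPLUNK_DB/main/colddb
thawedPath = $SPLUNK_DB/main/thaweddb
```

Because remotePath can be set per index, SmartStore can be enabled selectively rather than deployment-wide.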
When planning storage requirements for the indexes, take into account that Splunk software compresses the data. Typical installations experience an effective 2:1 compression ratio when storing raw data and the associated index and metadata. This means that if indexing 10GB/day, users should expect to utilize approximately 5GB of storage per day. The size of your SmartStore cache should reflect the number of days of data that are searched most frequently. Additional configurations are necessary to optimize Splunk for a SmartStore deployment, as noted in Table 1, below.
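The sizing arithmetic above can be sketched as a back-of-the-envelope calculation. The daily volume and cache window below are illustrative inputs, not recommendations from this document:

```python
def daily_storage_gb(raw_gb_per_day, compression_ratio=2.0):
    """Effective disk consumed per day (raw data plus index/metadata),
    given the ~2:1 compression ratio typical installations see."""
    return raw_gb_per_day / compression_ratio

def smartstore_cache_gb(raw_gb_per_day, hot_days, compression_ratio=2.0):
    """Cache sized to hold the window of data searched most frequently."""
    return daily_storage_gb(raw_gb_per_day, compression_ratio) * hot_days

per_day = daily_storage_gb(10)        # 10 GB/day raw -> ~5 GB/day on disk
cache = smartstore_cache_gb(10, 30)   # assume ~30 days searched frequently
```

So a site indexing 10 GB/day that mostly searches the last 30 days would plan for roughly 150 GB of SmartStore cache per the 2:1 ratio above.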
AWS AMI: Splunk Enterprise runs on most widely available operating systems, including Windows and *NIX platforms. When choosing the OS for the search heads and indexers, a 64-bit architecture is highly recommended. Splunk offers a public AMI containing Splunk Enterprise on a 64-bit Amazon Linux OS, via the AWS Marketplace.
Deployment Guidelines and Examples
The tables below describe general guidelines for mapping instances to Splunk workloads. Best practices for architecting and sizing should still be considered when referencing these guidelines. It is important to remember that overall Splunk load is composed of both indexing and searching. Note that i3 instance types use ephemeral storage; thus, indexer clustering is required to use these instance types.
Table 2: Indexers
Table 3: Search Heads
Table 4: Deployment Server, License or Cluster Master