Massive Streams, Ever Growing Sources, Highly Valuable
At Splunk we talk a lot about machine data. By that we mean the data generated by all the systems running in data centers, the "internet of things", and the new world of connected devices. It's all of the data generated by the applications, servers, network devices, security devices and remote infrastructure that power your organization.
Machine data contains a definitive record of all activity and behavior of your customers, users, transactions, applications, servers, networks, factory machinery, and so on. And it's more than just logs. It's configuration data, data from APIs and message queues, change events, the output of diagnostic commands and call detail records, sensor data from remote equipment, and more.
Splunk software users know that there are thousands of distinct machine data formats. Analyzing these in a meaningful way is critical to diagnosing service problems, detecting sophisticated security threats, understanding the health of remote equipment and demonstrating compliance.
Here are some of the most important machine data sources and what they can tell you. Remember, this list is just the starting point. Every environment has its unique footprint of machine data. Where's your untapped machine data opportunity?
Application Logs
Most homegrown and packaged applications write local logfiles, often via logging frameworks like log4j or log4net, via logging services built into application servers like WebLogic, WebSphere and JBoss, or via the logging facilities of platforms such as .NET and PHP. These files are critical for day-to-day debugging of production applications by developers and application support. They're also often the best way to report on business and user activity and detect fraud scenarios, since they have all the details of transactions. When developers put timing information into their log events, the logs can also be used to monitor and report on application performance.
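As a rough illustration of how timing information in log events can feed performance reporting, here is a minimal Python sketch. The log layout, the `elapsed_ms` field and the operation names are assumptions made up for this example, not a format produced by any particular framework.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical log format, assumed for illustration only:
#   <date> <time> <LEVEL> <operation> key=value ...
SAMPLE_LOG = """\
2024-05-01 12:00:03,120 INFO checkout order_id=1001 elapsed_ms=182
2024-05-01 12:00:04,007 INFO search query=boots elapsed_ms=45
2024-05-01 12:00:05,530 ERROR checkout order_id=1002 elapsed_ms=910
2024-05-01 12:00:06,201 INFO search query=hats elapsed_ms=55
"""

LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) (?P<level>[A-Z]+) (?P<op>\S+) (?P<fields>.*)$"
)


def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    event = {"ts": m["ts"], "level": m["level"], "op": m["op"]}
    # Split the trailing key=value pairs into individual fields.
    for pair in m["fields"].split():
        key, sep, value = pair.partition("=")
        if sep:
            event[key] = value
    return event


def mean_elapsed_by_operation(text):
    """Average elapsed_ms per operation across all parseable lines."""
    timings = defaultdict(list)
    for line in text.splitlines():
        event = parse_line(line)
        if event and "elapsed_ms" in event:
            timings[event["op"]].append(int(event["elapsed_ms"]))
    return {op: mean(values) for op, values in timings.items()}


if __name__ == "__main__":
    print(mean_elapsed_by_operation(SAMPLE_LOG))
```

The same parsed events could just as easily be grouped by log level or by time window; the point is only that a timing field turns a debugging log into a performance data source.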
Business Process Logs
Complex event processing and business process management system logs are treasure troves of business- and IT-relevant data. These logs will generally include definitive records of customer activity across multiple channels such as the web, IVR / contact center or retail. They likely include records of customer purchases, account changes, and trouble reports. Combined with application, CDR and web logs, this machine data can be used to implement full business activity monitoring.
Call Detail Records
Call detail records (CDRs), charging data records and event data records are some of the names given to events logged by telecoms and network switches. CDRs contain useful details of the call or service that passed through the switch, such as the number making the call, the number receiving the call, call time, call duration, type of call, etc. As communications services move to Internet protocol-based services, this data is also referred to as IP detail records (IPDRs), containing details such as IP address, port number, etc. The specs, formats and structure of these files vary enormously and keeping pace with all the permutations has traditionally been a challenge. Yet the data they contain is critical for billing, revenue assurance, customer assurance, partner settlements, marketing intelligence and more. Splunk software can quickly index the data and combine it with other business data to enable users to derive new insights from this rich usage information.
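To make the idea of usage information concrete, the following Python sketch totals call duration per calling number. Real CDR formats vary enormously, as noted above, so the CSV layout and column names here are purely hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical CDR export; the column names are assumptions for this sketch.
SAMPLE_CDRS = """\
caller,callee,start_time,duration_s,call_type
15550100,15550199,2024-05-01T09:00:00,120,voice
15550100,15550142,2024-05-01T09:10:00,30,voice
15550123,15550100,2024-05-01T09:12:00,300,voice
15550123,15550199,2024-05-01T09:20:00,0,sms
"""


def total_duration_by_caller(cdr_text, call_type="voice"):
    """Sum call duration in seconds per calling number for one call type."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(cdr_text)):
        if row["call_type"] == call_type:
            totals[row["caller"]] += int(row["duration_s"])
    return dict(totals)


if __name__ == "__main__":
    for caller, seconds in total_duration_by_caller(SAMPLE_CDRS).items():
        print(caller, seconds)
```

A billing or revenue assurance check would start from an aggregate like this and compare it with what was actually invoiced.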
Clickstream Data
User activity on the Internet is captured in clickstream data. This provides insight into a user's website and web page activity. This information is valuable for usability analysis, marketing and general research. Formats for this data are non-standard and actions can be logged in multiple places, such as the web server, routers, proxy servers, ad servers, etc. Existing monitoring tools look at a partial view of the data, from a specific source. Existing web analytics and data warehouse products often sample the data, so they miss the complete view of behavior, and they provide no real-time analysis.
Configuration Files
There's no substitute for actual, active system configuration to understand how the infrastructure has been set up. Past configs are needed when debugging failures that occurred in the past and which may recur in the future. When configs change, it's important to know what changed and when, whether the change was authorized, and whether a successful attacker compromised the system to plant backdoors, time bombs or other latent threats.
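Knowing what changed between two configuration snapshots can be sketched with Python's standard `difflib` module. The two snapshots below are invented sample text in the style of an SSH daemon config. A real deployment would also record when each snapshot was taken and who made the change.

```python
import difflib

# Two invented snapshots of the same config file, taken at different times.
OLD_CONFIG = """\
PermitRootLogin no
PasswordAuthentication no
Port 22
"""

NEW_CONFIG = """\
PermitRootLogin yes
PasswordAuthentication no
Port 22
"""


def config_changes(old_text, new_text):
    """Return (removed, added) lines between two config snapshots."""
    removed, added = [], []
    # ndiff prefixes each line with "- ", "+ ", "  " or "? " (a hint line).
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
    return removed, added


if __name__ == "__main__":
    removed, added = config_changes(OLD_CONFIG, NEW_CONFIG)
    print("removed:", removed)
    print("added:", added)
```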
Database Audit Logs and Tables
Databases contain some of the most sensitive corporate data—customer records, financial data, patient records and more. Audit records of all database queries are vital to have in order to understand who accessed or changed what data when. Database audit logs are also useful to understand how applications are using databases to optimize queries. Some databases log audit records to files, while others maintain audit tables accessible via SQL.
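For databases that keep audit records in tables, the "who changed what data when" question becomes an ordinary SQL query. The sketch below uses an in-memory SQLite database with a hypothetical `audit_log` table. Actual audit schemas differ from one database product to another, so treat the table and column names as assumptions.

```python
import sqlite3


def build_sample_audit_db():
    """Create an in-memory database with a hypothetical audit table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE audit_log "
        "(ts TEXT, db_user TEXT, action TEXT, object_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO audit_log VALUES (?, ?, ?, ?)",
        [
            ("2024-05-01T08:00:00", "app_svc", "SELECT", "patients"),
            ("2024-05-01T08:05:00", "jdoe", "UPDATE", "patients"),
            ("2024-05-01T08:07:00", "app_svc", "INSERT", "orders"),
            ("2024-05-01T08:09:00", "asmith", "DELETE", "patients"),
        ],
    )
    return conn


def writes_to(conn, object_name):
    """List (ts, db_user, action) for every change made to one table."""
    cursor = conn.execute(
        "SELECT ts, db_user, action FROM audit_log "
        "WHERE object_name = ? AND action IN ('INSERT', 'UPDATE', 'DELETE') "
        "ORDER BY ts",
        (object_name,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    for row in writes_to(build_sample_audit_db(), "patients"):
        print(row)
```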
Filesystem Audit Logs
The sensitive data that's not in databases is on filesystems. In some industries such as healthcare, the biggest data leakage risk is consumer records on shared filesystems. Different operating systems, third-party tools and storage technologies provide different options for auditing read access to sensitive data at the filesystem level. This audit data is a vital data source for monitoring and investigating access to sensitive data.
Management and Logging APIs
Increasingly, vendors are exposing critical management data and log events through both standardized and proprietary APIs rather than by logging to files. Check Point firewalls log via the OPSEC Log Export API (OPSEC LEA). Virtualization vendors including VMware and Citrix expose configurations, logs and system status via their own APIs.
Message Queues
Message queuing technologies like JMS, RabbitMQ, and AquaLogic are used to pass data and tasks between service and application components on a publish/subscribe basis. Subscribing to these message queues is a good way to debug problems in complex applications: you can see exactly what the next component down the chain received from the prior component. Separately, message queues are increasingly being used as the backbone of logging architectures for applications.
Operating System Metrics, Status and Diagnostic Commands
Operating systems expose critical metrics like CPU and memory utilization and status information using command-line utilities like ps and iostat on Unix and Linux and Performance Monitor on Windows. This data is usually harnessed by server monitoring tools but rarely persisted, yet it is potentially invaluable for troubleshooting, analyzing trends to discover latent issues and investigating security incidents.
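One way to persist this kind of output is to stamp each sample with a collection time and turn every row into a searchable event. The Python sketch below works on captured text in the shape printed by `ps -eo pid,pcpu,pmem,comm` on Linux. The sample values, the timestamp and the key=value event layout are assumptions for illustration.

```python
# Captured sample in the shape of `ps -eo pid,pcpu,pmem,comm` output on Linux.
# The values are invented for this sketch.
SAMPLE_PS_OUTPUT = """\
  PID %CPU %MEM COMMAND
    1  0.0  0.1 systemd
  812 12.5  3.2 java
  944  1.0  0.4 sshd
"""


def parse_ps(text, sampled_at):
    """Turn ps output into one record per process, stamped with a sample time."""
    lines = [line for line in text.splitlines() if line.strip()]
    records = []
    for line in lines[1:]:  # skip the header row
        pid, pcpu, pmem, comm = line.split(None, 3)
        records.append(
            {
                "sampled_at": sampled_at,
                "pid": int(pid),
                "pcpu": float(pcpu),
                "pmem": float(pmem),
                "comm": comm,
            }
        )
    return records


def to_event_line(record):
    """Render a record as a key=value line suitable for appending to a log."""
    return " ".join(f"{key}={value}" for key, value in record.items())


if __name__ == "__main__":
    for record in parse_ps(SAMPLE_PS_OUTPUT, "2024-05-01T12:00:00Z"):
        print(to_event_line(record))
```

Appending these lines to a file on a schedule is enough to keep a history that can later be searched for trends or for the state of a host at the time of an incident.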
Packet / Flow Data
Data generated by networks is processed using tools such as tcpdump and tcpflow, which generate pcap or flow data and other useful packet-level and session-level information. This information is necessary to handle performance degradation, timeouts, bottlenecks or suspicious activity that indicates that the network may be compromised or the object of a remote attack.
SCADA Data
Supervisory Control and Data Acquisition (SCADA) refers to a type of industrial control system (ICS) that gathers and analyzes real-time data from equipment in industries such as energy, transport, oil and gas, water and waste control. These systems produce significant quantities of data about the status, operation, utilization, and communication of components. This data can be used to identify trends, patterns and anomalies in the SCADA infrastructure and to drive customer value. For example, smart grid meter data can be captured to help customers become better informed about their electricity use through tools, programs, and services designed to help them save energy and money and reduce their environmental footprint.
Sensor Data
The growing network of sensor devices generates data based on monitoring environmental conditions, such as temperature, sound, pressure, power, water levels, etc. This data can have a wide range of practical applications if collected, aggregated, analyzed and acted upon. Examples include water level monitoring, machine health monitoring and smart home monitoring.
Syslog
Syslog from your routers, switches and network devices records the state of your network connections, failures of critical network components, performance and security threats. Syslog is a standard for computer data logging. Tapping into this data means tapping into a wide variety of devices for troubleshooting, analysis and security auditing.
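Part of what makes syslog a usable standard is the priority value at the start of each message, which encodes the facility and severity as facility * 8 + severity. The Python sketch below decodes that prefix. It is a minimal illustration of the priority field only, not a full syslog parser, and the sample message follows the style of the examples in the BSD syslog RFC.

```python
import re

# Severity names in numeric order, 0 (emergency) through 7 (debug).
SEVERITIES = [
    "emerg", "alert", "crit", "err", "warning", "notice", "info", "debug",
]

PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def decode_pri(message):
    """Split a syslog message into (facility number, severity name, rest)."""
    m = PRI_RE.match(message)
    if m is None:
        raise ValueError("message has no <PRI> prefix")
    # The priority value is facility * 8 + severity.
    facility, severity = divmod(int(m.group(1)), 8)
    return facility, SEVERITIES[severity], m.group(2)


if __name__ == "__main__":
    sample = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick"
    print(decode_pri(sample))
```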
Web Access Logs
Web access logs report every request processed by a web server: what client IP it came from, what URL was requested, what the referring URL was, and data regarding the success or failure of the request. They're most commonly processed to produce web analytics reports for marketing, such as daily counts of visitors, most requested pages, and the like. They can also be customized to include gems like a Session ID or custom HTTP headers.
They're also invaluable as a starting point to investigate a user-reported problem, since the log of a failed request can establish the exact time of an error. Web logs are fairly standard and well structured. The main challenge is sheer volume: for busy websites, billions of hits a day are the norm.
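Because web logs are well structured, pulling out failed requests and their exact times takes only a little code. The Python sketch below assumes the widely used Combined Log Format; the sample lines, hostnames and user name are invented. A server with a customized log format would need a different pattern.

```python
import re
from datetime import datetime

# Pattern for the Combined Log Format (an assumption about the server config).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Invented sample lines for illustration.
SAMPLE_ACCESS_LOG = """\
203.0.113.7 - - [01/May/2024:12:00:01 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
203.0.113.9 - alice [01/May/2024:12:00:02 +0000] "POST /checkout HTTP/1.1" 500 312 "https://shop.example.com/cart" "Mozilla/5.0"
198.51.100.4 - - [01/May/2024:12:00:05 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/8.0"
"""


def failed_requests(log_text):
    """Return time, client IP, URL and status for every 4xx/5xx request."""
    failures = []
    for line in log_text.splitlines():
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed format
        status = int(m["status"])
        if status >= 400:
            failures.append(
                {
                    "time": datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
                    "ip": m["ip"],
                    "url": m["url"],
                    "status": status,
                }
            )
    return failures


if __name__ == "__main__":
    for failure in failed_requests(SAMPLE_ACCESS_LOG):
        print(failure["time"].isoformat(), failure["ip"], failure["url"], failure["status"])
```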
Web Proxy Logs
Nearly all enterprises, service providers, institutions and government organizations that provide employees, customers or guests with web access use some type of web proxy to control and monitor that access. Web proxies log every web request made by users through the proxy. They may include corporate usernames and URLs hit. These logs are critical for monitoring and investigating "terms of service" abuses or violations of corporate web usage policy, and are also a vital component of effective monitoring and investigation of data leakage.
Windows Events
Windows stores rich information about an IT environment, usage patterns and security information. All of this information is stored in Windows event logs: application, security and system. These logs are critical to understanding the health of an organization and can help detect problems with business-critical applications, security information and usage patterns.
Wire Data
Wire data is an authoritative record of all communication between systems and applications that occurs in the network. It contains critical information such as payload data, session information, status codes, transaction values, processing times, errors, transaction traces, database queries, DNS lookups and records, protocol-level information including headers, content and flow records, and much more.
By correlating wire data with other application and infrastructure data in Splunk software, such as logs, metrics and events, IT admins can gain a complete view of the availability, performance and usage of their services. IT administrators can pinpoint root causes, proactively monitor the performance and availability of individual technology silos, map dependencies of infrastructure to applications and trend performance to establish baselines. Wire data extends powerful insights to security teams for rapid incident investigations, more complete threat detection, and expanded monitoring and compliance. Wire data also captures user interactions and process insights, giving a deeper understanding of service levels and user experience to support informed decisions.