Machine data, also known as machine-generated data, is information created automatically by software without any manual work required by a person. The data is often, but not always, about humans. Of course, in most instances a human made the strategic decision to have machine data created and in what format – but there is no human-in-the-loop at the point when the data is created and recorded.
This blog post will explore what machine data is, how it’s used, common examples of machine data, why it’s important and more.
What is Machine Data?
Machine data is data generated by all the systems running in data centers, the "internet of things", and the new world of connected devices. It's all of the data generated by everything that powers your organization:
- Network devices
- Security devices
- Remote infrastructure
Machine data contains a definitive record of all activity and behavior of your customers, users, transactions, applications, servers, networks, factory machinery, and so on. And it's more than just logs. It's configuration data, data from APIs and message queues, change events, the output of diagnostic commands and call detail records, sensor data from remote equipment, and more.
There are thousands of distinct machine data formats. Analyzing these in a meaningful way is critical for...
- Diagnosing service problems
- Detecting sophisticated security threats
- Understanding the health of remote equipment
- Demonstrating compliance
Some types of Machine Data
There is an endless, ever-growing list of machine-generated data that impacts our lives in countless ways. We’ve gathered here some of the more common examples of machine data. Remember, this list is just the starting point. Every environment has its unique footprint of machine data.
Exchangeable image file format (Exif) data is a type of metadata (data about data – in this case data about a digital image) recorded by digital cameras, smartphones, scanners and similar devices. Exif data records information about:
- The type of camera used to create an image
- The date and time
- The author
- Other details about the creation of the file
Exif is created without the knowledge of (most) device users and can be important for law enforcement and other investigators.
Website log files
Website log files are a type of machine-generated data that records information about how websites are used. Depending on the source, non-human bots are estimated to make up 42%, 52%, or even as much as 64% of total website traffic.
While tools like Google Analytics are intended to help website managers understand the behaviors of humans on their website, log files help a website manager understand total behavior of humans, bots, and any type of activity on their server. Log files are a type of machine data that can let website managers understand security risks and behavior patterns, among other use cases.
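To make the bot-versus-human point concrete, here is a minimal sketch of reading access-log lines and splitting them by user agent. It assumes the server writes the widely used Combined Log Format. The sample lines, the `BOT_HINTS` substrings and the function names are illustrative assumptions, not part of any particular product, and real bot detection needs far more than a substring match.

```python
import re
from collections import Counter

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative substrings only (an assumption for this sketch).
BOT_HINTS = ("bot", "crawler", "spider")


def classify(line):
    """Return 'bot', 'human', or 'unparsed' for one access-log line."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return "unparsed"
    agent = match.group("agent").lower()
    return "bot" if any(hint in agent for hint in BOT_HINTS) else "human"


def summarize(lines):
    """Count how many log lines fall into each category."""
    return Counter(classify(line) for line in lines)


# Made-up sample lines using documentation-only IP addresses.
SAMPLE_LINES = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" '
    '200 2326 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '198.51.100.7 - - [10/Oct/2023:13:55:40 +0000] "GET /robots.txt HTTP/1.1" '
    '200 68 "-" "ExampleBot/1.0 (+http://example.com/bot)"',
    'not a log line',
]

if __name__ == "__main__":
    print(summarize(SAMPLE_LINES))
```

Because the log records every request the server handled, a summary like this covers traffic that a JavaScript-based analytics tool may never see.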
Events processing & business process management system logs
Complex events processing and business process management system logs are treasure troves of business and IT relevant data. These logs will generally include definitive records of customer activity across multiple channels such as the web, IVR / contact center or retail. They likely include records of:
- Customer purchases
- Account changes
- Trouble reports
Combined with application, call detail record (CDR) and web logs, this machine data can be used to implement full business activity monitoring.
Telecom & network data
Call detail records (CDRs), charging data records and event data records are some of the names given to events logged by telecoms and network switches.
CDRs contain useful details of the call or service that passed through the switch, such as the number making the call, the number receiving the call, call time, call duration, type of call, etc. As communications services move to Internet protocol-based services, this data is also referred to as IPDRs, containing details such as IP address, port number, etc.
The specs, formats and structure of these files vary enormously and keeping pace with all the permutations has traditionally been a challenge. Yet the data they contain is critical for billing, revenue assurance, customer assurance, partner settlements, marketing intelligence and more.
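As a rough illustration of how CDRs feed billing-style questions, the sketch below totals call duration per calling number. The CSV layout, column names and phone numbers are hypothetical; as noted above, real switch formats vary enormously, so treat this as one possible shape rather than a standard.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CDR layout for illustration; real formats differ per vendor.
SAMPLE_CDRS = """caller,callee,start_time,duration_seconds,call_type
15550100,15550199,2023-10-10T09:00:00,120,voice
15550100,15550142,2023-10-10T09:05:00,45,voice
15550123,15550100,2023-10-10T09:07:30,300,voice
"""


def total_duration_by_caller(cdr_text):
    """Sum call duration in seconds for each calling number."""
    totals = defaultdict(int)
    for record in csv.DictReader(io.StringIO(cdr_text)):
        totals[record["caller"]] += int(record["duration_seconds"])
    return dict(totals)


if __name__ == "__main__":
    for caller, seconds in total_duration_by_caller(SAMPLE_CDRS).items():
        print(caller, seconds)
```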
Databases contain some of the most sensitive corporate data — customer records, financial data, patient records and more.
Audit records of all database queries are vital for understanding who accessed or changed which data, and when. Database audit logs also show how applications use the database, which helps when optimizing queries. Some databases log audit records to files, while others maintain audit tables accessible via SQL.
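For databases that keep audit records in a table, the "who touched what, and when" question becomes an ordinary query. The sketch below uses an in-memory SQLite database purely as a stand-in. The `audit_log` table, its columns and the sample rows are assumptions made for this example and do not reflect the audit schema of any specific database product.

```python
import sqlite3


def build_demo_audit_db():
    """Create an in-memory database with a hypothetical audit table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE audit_log "
        "(ts TEXT, db_user TEXT, action TEXT, object_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO audit_log VALUES (?, ?, ?, ?)",
        [
            ("2023-10-10T09:00:00", "app_service", "SELECT", "customers"),
            ("2023-10-10T09:02:10", "alice", "UPDATE", "customers"),
            ("2023-10-10T09:03:45", "bob", "SELECT", "invoices"),
        ],
    )
    return conn


def who_touched(conn, object_name):
    """Return (timestamp, user, action) rows for one table, oldest first."""
    cursor = conn.execute(
        "SELECT ts, db_user, action FROM audit_log "
        "WHERE object_name = ? ORDER BY ts",
        (object_name,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    for row in who_touched(build_demo_audit_db(), "customers"):
        print(row)
```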
Operating systems expose critical metrics like CPU and memory utilization and status information using command-line utilities like ps and iostat on Unix and Linux and performance monitor on Windows.
This data is usually harnessed by server monitoring tools but rarely persisted, yet it is potentially invaluable for troubleshooting, analyzing trends to discover latent issues and investigating security incidents.
Windows stores rich information about an IT environment, usage patterns and security information. All of this information is stored in Windows event logs: application, security and system.
These logs are critical to understanding the health of an organization and can help detect problems with business critical applications, security information and usage patterns.
Sensor & IoT devices
The growing network of sensor devices generates data based on monitoring environmental conditions, such as temperature, sound, pressure, power, water levels, etc. This data can have a wide range of practical applications if collected, aggregated, analyzed and acted upon. Examples include:
- Water level monitoring
- Machine health monitoring
- Smart home monitoring
Devices that don't produce Machine Data
Generally, all “smart” devices produce machine data. Machine data is the key component that enables a device to become “smart”. Take the common household thermostat for example.
Data vs. machine data
A traditional thermostat produces data – but not machine data. A traditional thermostat can determine the current temperature of a room, display that data to a human viewing the device, and turn the AC/Heat on or off in accordance with the current temperature data.
However, the details are not being stored on the device in memory or transmitted to the cloud or other storage device. There is no record of what the temperature was at a particular time, or how frequently the AC/Heat turned on or off, or how many times a human adjusted the desired temperature on the device.
Data including machine data
A smart thermostat produces the same data as the traditional thermostat, while also generating machine data. The machine data could include:
- A date & timestamp log of the current temperature reading once every minute.
- A date & timestamp log of every time the AC or Heat function turns on or off.
- A measurement of the time it takes for the unit to move from current to desired temperature.
- A log of how and when humans request the unit to run.
This type of data alone can provide enough information for an algorithm to begin making suggestions for optimal running conditions. Combined with data (machine or otherwise) derived from other sources – such as outside temperature at the unit’s geographic location – the device can truly become smart enough to predict what the desired setting will be.
A smart device can begin to recognize patterns of behavior that its human owners might not have fully been aware of and adjust its routine accordingly to better meet the desired result of the human.
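The kind of log a smart thermostat might keep can be sketched in a few lines. Everything here is hypothetical: the `Event` structure, the event names and the sample readings are assumptions for illustration, not the format of any real device. The function derives one of the measurements mentioned above, the time taken to move from the current to the desired temperature, and assumes events arrive in timestamp order.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Event:
    """One timestamped entry in a hypothetical thermostat log."""
    timestamp: datetime
    kind: str  # "reading", "heat_on", "heat_off" or "setpoint_change"
    value: Optional[float] = None  # degrees Celsius where applicable


def time_to_reach_target(events, target):
    """Time from the first 'heat_on' to the first later reading >= target.

    Returns None if heating never started or the target was never reached.
    """
    started = None
    for event in events:
        if event.kind == "heat_on" and started is None:
            started = event.timestamp
        elif (
            event.kind == "reading"
            and started is not None
            and event.value is not None
            and event.value >= target
        ):
            return event.timestamp - started
    return None


# Made-up sample log for one morning.
SAMPLE_EVENTS = [
    Event(datetime(2023, 10, 10, 6, 0), "reading", 18.0),
    Event(datetime(2023, 10, 10, 6, 1), "setpoint_change", 21.0),
    Event(datetime(2023, 10, 10, 6, 1), "heat_on"),
    Event(datetime(2023, 10, 10, 6, 10), "reading", 19.5),
    Event(datetime(2023, 10, 10, 6, 25), "reading", 21.0),
    Event(datetime(2023, 10, 10, 6, 25), "heat_off"),
]

if __name__ == "__main__":
    print(time_to_reach_target(SAMPLE_EVENTS, target=21.0))
```

A traditional thermostat sees the same temperatures, but without the stored, timestamped events there is nothing to compute this from.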
Another good example is the common alarm clock. A traditional analog (or even digital) alarm clock does not normally produce machine data because it has no internet connectivity or memory for storing data. Its function is simply to keep time and have the alarm go off at the point in time most recently specified by a person.
Adding in machine data can enable a “smart” alarm clock that could track the patterns of usage – for example, noting that every Monday-Friday, the alarm clock is set to go off at 6AM, while on weekends it is set to go off at 7AM. Logging of the settings and activity creates a dataset that can be used to detect patterns and predict future desired behaviors.
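A simple version of that pattern detection could look like the sketch below. The log format (a date plus the alarm time that was set) and the sample entries are hypothetical, and picking the most frequent setting is just one naive way to find the "usual" alarm; a real device would likely use something more robust.

```python
from collections import Counter
from datetime import date

# Hypothetical log of alarm settings: (day, alarm time as HH:MM).
ALARM_LOG = [
    (date(2023, 10, 2), "06:00"),  # Monday
    (date(2023, 10, 3), "06:00"),  # Tuesday
    (date(2023, 10, 4), "06:00"),  # Wednesday
    (date(2023, 10, 5), "06:30"),  # Thursday (a one-off change)
    (date(2023, 10, 6), "06:00"),  # Friday
    (date(2023, 10, 7), "07:00"),  # Saturday
    (date(2023, 10, 8), "07:00"),  # Sunday
]


def usual_alarm_times(log):
    """Return the most frequent alarm time for weekdays and for weekends."""
    counts = {"weekday": Counter(), "weekend": Counter()}
    for day, alarm in log:
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        bucket = "weekend" if day.weekday() >= 5 else "weekday"
        counts[bucket][alarm] += 1
    return {
        bucket: counter.most_common(1)[0][0]
        for bucket, counter in counts.items()
        if counter
    }


if __name__ == "__main__":
    print(usual_alarm_times(ALARM_LOG))
```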
Splunk Video: What Is Machine Data?
Transcript of video:
Machine data is everywhere, intersecting with our lives and businesses in ways that change our lives for the better. Take an average day of the average Joe. Joe is cool. He orders a taxi from an app on his smartphone, and it arrives in minutes. The taxi’s location is visible on his phone the whole time. He uses that same phone to adjust his online budget, subtracting the money he just spent on the taxi. He takes taxis a lot. It may not be surprising your phone is generating data.
But here's where machine generated data gets interesting. In the car, data is generated that insurance companies can use to get discounts for safe driving. Or help you improve driving habits and fuel efficiency by measuring braking and acceleration.
That may help Joe's driver but he has other machine data fish to fry. His health monitoring device is aware of his lousy commute, spending an hour in the taxi on his butt on the way to the office and is logging that data into his fitness profile in the cloud.
Stoplights and toll booths the taxi goes through can send data to transportation planners to better route traffic and keep congestion and fuel consumption down. He finally gets to the office and his security badge gets him in the building, letting his company know that even with his lousy commute he got to work an hour early. Joe decides to take the elevator to his office on the 34th floor. The elevator is collecting data on how many times it stops on his floor to give the owner of the building insights into how likely the company will be to renew its lease. Once he's in the office, his smart thermostat detects when he's arrived and knows his temperature preferences and sets the temperature to 72.
Joe is a spontaneous kind of guy so he books a last minute trip to Hawaii for the weekend. His favorite travel site is down but worker bees are in the background crunching all the machine data that will get the server error keeping his booking at bay. Fixed. Just a minute later, Joe is off on his trip to the beach, the plane is a flying data center basically. Every bit of data is being collected, from engines and turbines to temperature and speed.
And that makes for a safe journey for Joe who by now is on the beach and not so average.
This posting does not necessarily represent Splunk's position, strategies or opinion.