
Imagine you have a network, whether it's a LAN or a vast enterprise-level network spread across different locations. Now, picture yourself wanting to monitor and analyze the data flow within that network. That's where network telemetry comes into play.
Network telemetry is a group of techniques that help you better understand what's happening within your networks. It's like watching the network's pulse to keep track of its health and performance.
Read on to learn more about the network telemetry landscape.
What is network telemetry?
Network telemetry is the collection, measurement and analysis of data related to the behavior and performance of a network. It involves gathering information about routers, switches, servers and applications to gain insights into how they function and how data moves through them.
To achieve this, network telemetry employs different methods. One common approach is to use network monitoring tools that capture and analyze traffic data. These tools provide information about network bandwidth, latency, packet loss, and other performance metrics.
Telemetry also includes protocols like SNMP (Simple Network Management Protocol) and NetFlow that enable data collection from routers and other network devices. This data can then be processed and visualized to:
- Identify patterns
- Troubleshoot issues
- Optimize network performance
With network telemetry, you can detect and address network bottlenecks, security threats or anomalies that might impact the network's efficiency. It’ll help you make informed decisions, optimize network resources, and ensure a smooth and reliable network experience for users.
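To make the performance metrics mentioned above concrete, here is a minimal sketch of how throughput, packet loss and link utilization could be derived from two readings of a device's cumulative counters. The `InterfaceSample` shape, the field names and the sample values are assumptions made for illustration; they don't come from any particular tool or protocol.

```python
from dataclasses import dataclass


@dataclass
class InterfaceSample:
    """One reading of cumulative counters (hypothetical shape, not a real API)."""
    timestamp_s: float      # when the counters were read, in seconds
    bytes_in: int           # cumulative bytes received on the interface
    packets_sent: int       # cumulative packets sent toward a peer
    packets_received: int   # cumulative packets the peer reports receiving


def throughput_bps(prev: InterfaceSample, curr: InterfaceSample) -> float:
    """Average inbound throughput in bits per second between two samples."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return 8 * (curr.bytes_in - prev.bytes_in) / elapsed


def packet_loss_pct(prev: InterfaceSample, curr: InterfaceSample) -> float:
    """Share of packets sent in the interval that never arrived, in percent."""
    sent = curr.packets_sent - prev.packets_sent
    received = curr.packets_received - prev.packets_received
    if sent == 0:
        return 0.0
    return 100 * (sent - received) / sent


# Made-up readings taken 10 seconds apart.
earlier = InterfaceSample(timestamp_s=0.0, bytes_in=0,
                          packets_sent=0, packets_received=0)
later = InterfaceSample(timestamp_s=10.0, bytes_in=12_500_000,
                        packets_sent=1_000, packets_received=990)

rate = throughput_bps(earlier, later)       # 10,000,000 bits per second
loss = packet_loss_pct(earlier, later)      # 1.0 percent
utilization = 100 * rate / 100_000_000      # 10.0 percent of a 100 Mbps link
```

Working from the difference between two cumulative readings, rather than from a single reading, is what turns raw counters into rates you can compare over time.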
(It’s important to note the differences between monitoring, observability & telemetry.)
Defining the telemetry framework
Machine learning can help analyze network data and automate network operations. But no single data source or technique covers every telemetry need, so you should combine several.
A telemetry framework organizes these different sources and integrates the different approaches, which makes it easier to combine data for different applications. It also simplifies interfaces and makes the overall system more flexible.
The network telemetry framework has four top-level modules. Each module follows the same underlying structure, covering data configuration, encoding, and instrumentation, which is broken down further into second-level components. The framework uses uniform data mechanisms and types, making it easy to manage and locate data in the system.
Top-level modules
Network telemetry's top-level modules fall into four categories:
1) Management plane telemetry covers protocols like SNMP and syslog, through which network elements interact with a network management system (NMS). This telemetry must address data subscription, structured data, high-speed transport and congestion avoidance to ensure efficient automatic network operation.
2) Control plane telemetry monitors the health of different network control protocols. It helps to detect, localize, and predict network issues. This method also allows for real-time and detailed network optimization.
3) Forwarding plane telemetry depends on the data that network devices can provide. The data originates in the network's data plane, and ensuring that it meets quality, quantity, and timing requirements can be challenging for the devices there.
4) External data telemetry treats events outside the network as an essential data source. These events can be detected by hardware or software. This kind of telemetry comes with a few challenges:
- The data must meet strict timing requirements.
- The schema that external detectors use must be easy for current and future devices and applications to adopt.
- Counter-measures are needed to avoid congestion.
Second-level components
Each plane's telemetry module has five components:
- The data query, analysis and storage component issues data requirements, receives and processes the returned data and initiates further data queries. It can be centralized, or distributed across network devices and remote controllers.
- The data configuration and subscription component manages data queries and subscriptions on devices, including configuring the desired data and determining the protocols and channels for data acquisition. Subscription data can be described through models, templates, or programs.
- The data encoding and export component controls how telemetry data is sent to the storage component. The encoding and transport may vary based on the export location.
- The data generation and processing component captures, filters, and processes data from raw sources in network devices. Sometimes this is done through in-network computing and processing on fast or slow paths.
- The data object and source component identifies the objects being monitored and their original data sources. Data sources provide raw data, which may require further processing. Some sources are dynamic, while others are static.
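One way to picture how the five components fit together is as a small pipeline. The sketch below is only an illustration under stated assumptions: the class and function names, the JSON encoding and the interface readings are all made up for this example, and a real device would use its own data models and export protocols.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Subscription:
    """Data configuration and subscription: which fields are wanted."""
    wanted_fields: tuple


def data_source():
    """Data object and source: raw readings (made-up values)."""
    return [
        {"interface": "eth0", "rx_bytes": 1200, "tx_bytes": 800, "errors": 0},
        {"interface": "eth1", "rx_bytes": 300, "tx_bytes": 4500, "errors": 3},
    ]


def generate(raw_records, subscription):
    """Data generation and processing: keep only the subscribed fields."""
    return [{name: record[name] for name in subscription.wanted_fields}
            for record in raw_records]


def encode(records):
    """Data encoding and export: serialize for transport (JSON for readability)."""
    return json.dumps(records, sort_keys=True).encode("utf-8")


@dataclass
class Collector:
    """Data query, analysis and storage: receive, store and analyze data."""
    stored: list = field(default_factory=list)

    def receive(self, payload: bytes) -> None:
        self.stored.extend(json.loads(payload.decode("utf-8")))

    def interfaces_with_errors(self):
        return [r["interface"] for r in self.stored if r["errors"] > 0]


subscription = Subscription(wanted_fields=("interface", "errors"))
collector = Collector()
collector.receive(encode(generate(data_source(), subscription)))
flagged = collector.interfaces_with_errors()   # ["eth1"]
```

Keeping each component behind its own small function means the encoding or the subscribed fields can change without touching the other stages.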
Data acquisition mechanism and type abstraction
You can acquire network data through subscription (push) and query (pull):
- Subscriptions are contracts between publisher and subscriber (pub/sub), with subscribed data automatically delivered to registered subscribers until the subscription expires.
- Queries are used when a client expects immediate and one-off feedback from network devices.
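Data can be pulled whenever it's needed. For data that changes continuously, though, pushing is usually more efficient and can reduce latency, because the client doesn't have to keep polling for updates.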
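The difference between the two acquisition mechanisms can be sketched with a toy device object. Everything here is hypothetical: the `Device` class, its method names and the counter are invented for illustration, and cancelling a subscription by hand stands in for a subscription that expires.

```python
from typing import Callable, Dict, List


class Device:
    """Toy network device that supports both query (pull) and subscription (push)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._counters: Dict[str, int] = {"rx_packets": 0}
        self._subscribers: List[Callable[[str, Dict[str, int]], None]] = []

    def query(self) -> Dict[str, int]:
        """Pull: return a one-off snapshot to the caller."""
        return dict(self._counters)

    def subscribe(self, callback) -> Callable[[], None]:
        """Push: register a callback and return a function that cancels it."""
        self._subscribers.append(callback)

        def unsubscribe() -> None:
            self._subscribers.remove(callback)

        return unsubscribe

    def record_packets(self, count: int) -> None:
        """Update the counter and push the new snapshot to every subscriber."""
        self._counters["rx_packets"] += count
        snapshot = dict(self._counters)
        for callback in list(self._subscribers):
            callback(self.name, snapshot)


received = []
device = Device("edge-router-1")
cancel = device.subscribe(
    lambda name, data: received.append((name, data["rx_packets"]))
)

device.record_packets(5)    # pushed to the subscriber
device.record_packets(7)    # pushed to the subscriber
cancel()                    # subscription ends
device.record_packets(1)    # no longer pushed

latest = device.query()     # one-off pull: {"rx_packets": 13}
```

The subscriber saw every change while the subscription was active without asking for it, whereas the query only returns the state at the moment it is made.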
Mapping existing mechanisms into the framework
The framework is versatile enough to work across a variety of systems. However, gathering and examining data from multiple domains can raise particular challenges, so you should plan carefully how existing mechanisms map into the framework to get accurate and reliable results.
(See how network and application monitoring differ.)
Network telemetry applications
As the network becomes more automated, new requirements are added to the existing techniques used in network telemetry. Each stage builds upon previous techniques and adds new requirements.
Here are the four stages of network telemetry applications:
Stage 0. Static Telemetry
The telemetry data source and its type are determined at design time. The network operator's flexibility is limited to configuring how that data is used.
Stage 1. Dynamic Telemetry
In this stage, telemetry data can be programmed or configured on the fly without disrupting the network's operation. This makes it possible to strike a balance between resource conservation, performance, flexibility, and coverage.
Stage 2. Interactive Telemetry
To meet the visibility needs of network operations, you can tailor and adjust the telemetry data in real time. Modifications are made frequently at this stage, based on real-time feedback. Some tasks are automated, but human operators are still required to make decisions.
Stage 3. Closed-loop Telemetry
No human operators interfere with the telemetry except when generating reports. The intelligent network operation engine is responsible for automatically requesting telemetry data, analyzing it, and updating network operations through closed control loops.
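A closed control loop of this kind can be sketched in a few lines. This is a simplified illustration, not a real operation engine: the `SimulatedLink` class, the 80% threshold and the assumption that each traffic shift lowers utilization by 10 points are all invented for the example.

```python
class SimulatedLink:
    """Stand-in for a congested link (hypothetical, for illustration only)."""

    def __init__(self, utilization_pct: float) -> None:
        self.utilization_pct = utilization_pct
        self.reroutes = 0

    def read_utilization(self) -> float:
        """Telemetry request: report the current utilization."""
        return self.utilization_pct

    def shift_traffic(self) -> None:
        """Network update: assume each shift lowers utilization by 10 points."""
        self.utilization_pct -= 10.0
        self.reroutes += 1


def closed_loop(read_utilization, apply_action,
                threshold_pct: float = 80.0, max_iterations: int = 10):
    """Request telemetry, analyze it and act, with no operator in the loop."""
    history = []
    for _ in range(max_iterations):
        utilization = read_utilization()     # request telemetry data
        history.append(utilization)
        if utilization <= threshold_pct:     # analyze: is the link healthy?
            break
        apply_action()                       # update network operations
    return history                           # kept for the operator's report


link = SimulatedLink(utilization_pct=95.0)
history = closed_loop(link.read_utilization, link.shift_traffic)
# history is [95.0, 85.0, 75.0] after two automatic traffic shifts
```

The `max_iterations` cap is a deliberate safety choice: an automated loop should stop and report rather than keep changing the network indefinitely.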
(No matter the stage, all that data needs to be protected. That’s where network security comes in.)
Network telemetry protocols and standards
Telemetry protocols and standards ensure data is sent and received correctly between devices and systems. They help keep data accurate in monitoring, research, and automation.
Here are three protocols and standards:
- NetFlow is a way for devices to gather information about the traffic on a network and send it to a collector. Many vendors use it, and vendor-specific versions like JFlow, RFlow, and NetStream also exist. NetFlow v5 is common among companies, but it only supports IPv4 traffic. NetFlow v9, by contrast, supports IPv6 and MPLS traffic and uses templates to define its records.
- sFlow collects and exports sampled packet metadata and statistics from fast interfaces. It samples one packet out of every N packets and streams it to the collector. This makes it suitable for real-time traffic visibility, but because most packets are never sampled, it is less accurate than NetFlow for digital forensics and network troubleshooting.
- IPFIX is a standard that builds on NetFlow v9 features and offers more data types and flexibility. It's compatible with older versions and supports congestion avoidance and bandwidth optimization.
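To show what a flow export looks like on the collector side, here is a minimal sketch that decodes a NetFlow v5 datagram using only Python's standard library. It assumes the commonly documented v5 layout, a 24-byte header followed by 48-byte flow records, and pulls out just a handful of fields. The function name and the sample flow are made up for illustration, and a production collector would also need to handle v9 and IPFIX templates.

```python
import socket
import struct

# NetFlow v5 layout in network byte order: 24-byte header, 48-byte records.
HEADER = struct.Struct("!HHIIIIBBH")
RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")


def parse_netflow_v5(datagram: bytes):
    """Return the flow records in one NetFlow v5 datagram as dictionaries."""
    if len(datagram) < HEADER.size:
        raise ValueError("datagram is shorter than a NetFlow v5 header")
    version, count, *_ = HEADER.unpack_from(datagram, 0)
    if version != 5:
        raise ValueError(f"expected NetFlow version 5, got {version}")
    if len(datagram) < HEADER.size + count * RECORD.size:
        raise ValueError("datagram is truncated")

    flows = []
    for index in range(count):
        fields = RECORD.unpack_from(datagram, HEADER.size + index * RECORD.size)
        flows.append({
            "src": socket.inet_ntoa(fields[0]),   # source IPv4 address
            "dst": socket.inet_ntoa(fields[1]),   # destination IPv4 address
            "packets": fields[5],                 # packets in the flow
            "bytes": fields[6],                   # bytes in the flow
            "src_port": fields[9],
            "dst_port": fields[10],
            "protocol": fields[13],               # IP protocol number (6 = TCP)
        })
    return flows


# Build a made-up datagram with a single flow to exercise the parser.
sample_header = HEADER.pack(5, 1, 0, 0, 0, 0, 0, 0, 0)
sample_record = RECORD.pack(
    socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"),
    socket.inet_aton("0.0.0.0"), 1, 2, 10, 1500, 0, 0,
    443, 51000, 0, 0, 6, 0, 0, 0, 0, 0, 0,
)
flows = parse_netflow_v5(sample_header + sample_record)
```

The fixed record layout, with its 4-byte address fields, is the reason v5 is limited to IPv4 traffic, while template-based formats describe their fields at run time.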
OpenTelemetry and network telemetry: What's the difference?
OpenTelemetry is an open-source observability framework for managing and exporting telemetry data like traces, metrics, and logs. It helps analyze software performance. It's not the same as network telemetry, but it can collect data from network devices.
Network telemetry supports your IT environments
Network telemetry is like having a network detective gathering data and clues about the network's behavior and performance. It empowers network administrators to stay on top of their game, maintaining a robust and efficient network infrastructure.
So, whether you're a network guru or just dipping your toes into the networking world, network telemetry is invaluable for managing and optimizing your network.
This posting does not necessarily represent Splunk's position, strategies or opinion.