What Is eBPF? A Guide To Improved Observability & Telemetry

Extended Berkeley Packet Filter (eBPF) is an exciting technology that provides secure, high-performance kernel programmability directly within the operating system. It can expose a wide range of application and kernel telemetry that is otherwise unavailable.

But with operating systems frequently processing very large volumes of network data, even with an efficient framework and cheap eBPF program runs, costs can add up quickly.

eBPF helps to maintain low overhead while enabling a real-time, high-granularity, no-sampling architecture that delivers network insights in seconds, which reduces mean time to detect (MTTD). This article explains eBPF, including:

  • How it works
  • Benefits it offers
  • How to use it with the Flowmill Collector and OpenTelemetry
  • Solutions for common challenges

What is eBPF?

The Extended Berkeley Packet Filter (eBPF) is a kernel technology that allows programs to run inside the kernel without requiring changes to the kernel source code or the addition of new modules. It's a sandboxed virtual machine (VM) inside the Linux kernel where programmers can run BPF bytecode that uses specified kernel resources.

eBPF reduces the need to alter kernel source code, simplifying software's ability to exploit existing layers. As a result, it's a strong technology that has the potential to change the way you deliver services.

Initially, eBPF’s main use was a way of increasing observability and security while filtering network packets. Today, its functionality has been extended to various use cases such as providing high-performance networking and load balancing in modern data centers and cloud-native environments. Its core capabilities include:

  • Extracting granular security observability data with low overhead
  • Assisting application developers in tracing applications
  • Providing insights for performance troubleshooting and preventive application and container runtime security enforcement, among others

How eBPF works

eBPF lets programmers execute custom sandboxed bytecode within the kernel without having to change the kernel or load kernel modules — all by unlocking access to kernel-level events. It does this by:

  • Verifying programs when they are loaded, then attaching them to hook points within the kernel that are triggered by specific events.
  • Calling helper functions to manipulate program data efficiently.
  • Using key-value maps to share data between user space and kernel space.
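The three mechanisms above can be modeled in ordinary Python to make the data flow concrete. This is only a user-space analogy, not real eBPF: the class `ToyKernel`, its methods `create_map`, `attach` and `fire`, and the event name `tcp_connect` are all hypothetical, and a plain dictionary stands in for an eBPF map.

```python
from collections import defaultdict


class ToyKernel:
    """User-space stand-in for the kernel side; every name here is hypothetical."""

    def __init__(self):
        self.hooks = defaultdict(list)  # event name -> attached programs
        self.maps = {}                  # map name -> key-value store

    def create_map(self, name):
        # A dict plays the role of a key-value map shared with user space.
        self.maps[name] = defaultdict(int)
        return self.maps[name]

    def attach(self, event, program):
        # Real eBPF verifies the program at load time; this sketch only
        # checks that the "program" is callable.
        if not callable(program):
            raise TypeError("program must be callable")
        self.hooks[event].append(program)

    def fire(self, event, ctx):
        # Run every program attached to the hook point for this event.
        for program in self.hooks[event]:
            program(ctx)


kernel = ToyKernel()
connections = kernel.create_map("connections_by_pid")


def count_connections(ctx):
    # "Kernel side": update the shared map instead of exporting each event.
    connections[ctx["pid"]] += 1


kernel.attach("tcp_connect", count_connections)

for pid in (101, 101, 202):
    kernel.fire("tcp_connect", {"pid": pid})

# "User space": read the aggregated result back from the map.
print(dict(connections))
```

The point of the model is the division of labor: the attached program runs once per event and only touches the map, while the reader picks up the summarized state whenever it wants.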

Benefits of eBPF

eBPF is typically used to trace user-space processes from within the Linux kernel and to improve security and observability in networking. It offers a safe way to enhance each of the following areas.

Security

eBPF combines visibility into system activity with control over it, which makes it possible to build security systems that are more context-aware and have a higher level of control. Programs are effectively sandboxed, which means kernel source code is safe and unaltered. The verification phase rejects programs that could run indefinitely, so they cannot tie up resources.
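To illustrate the idea of rejecting programs that might never terminate, here is a deliberately tiny check written in Python. It is a sketch under strong assumptions: the instruction format of `(opcode, argument)` tuples is made up, the cap of 4096 instructions is an arbitrary choice for this example, and the single rule used (no backward jumps) is far simpler than what the real verifier does. The real verifier analyzes every execution path, and newer kernels also accept some bounded loops.

```python
def verify(program, max_insns=4096):
    """Toy termination check for a made-up instruction format.

    `program` is a list of (opcode, arg) tuples. For "jmp", `arg` is an
    offset relative to the next instruction.
    """
    if not program or len(program) > max_insns:
        return False, "empty or too long"
    for pc, (op, arg) in enumerate(program):
        if op == "jmp":
            if arg < 0:
                # A backward jump could form a loop, so reject it outright.
                return False, f"back-edge at instruction {pc}"
            if pc + 1 + arg >= len(program):
                return False, f"jump out of bounds at instruction {pc}"
    if program[-1][0] != "exit":
        return False, "program does not end with exit"
    return True, "ok"


good = [("load", 0), ("jmp", 1), ("add", 1), ("exit", 0)]
bad = [("load", 0), ("add", 1), ("jmp", -2), ("exit", 0)]

print(verify(good))
print(verify(bad))
```

Because every jump in an accepted program moves forward and the last instruction is `exit`, execution must reach the end in a bounded number of steps, which is the property the paragraph above describes.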


Networking

eBPF makes the network path programmable and increases network efficiency. Because the code runs directly in the kernel, packets are processed without additional parsers and logic layers.
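As a rough picture of what such a packet-processing decision looks like, the following Python sketch inspects a raw IPv4 header and returns a verdict. It is an analogy only: real eBPF networking programs are compiled to BPF bytecode and run in the kernel, and the names `filter_packet`, `make_header`, `BLOCKED_SOURCES` and the `PASS`/`DROP` strings are hypothetical.

```python
import socket
import struct

PASS, DROP = "PASS", "DROP"
BLOCKED_SOURCES = {"192.0.2.66"}  # hypothetical blocklist (documentation range)


def filter_packet(ipv4_header: bytes) -> str:
    """Return PASS or DROP based on the source address of a raw IPv4 header."""
    if len(ipv4_header) < 20:
        return DROP  # too short to be a valid IPv4 header
    version = ipv4_header[0] >> 4
    if version != 4:
        return PASS  # not IPv4: leave it to the rest of the stack
    src = socket.inet_ntoa(ipv4_header[12:16])  # source address bytes 12..15
    return DROP if src in BLOCKED_SOURCES else PASS


def make_header(src: str, dst: str) -> bytes:
    # Minimal 20-byte IPv4 header for the demo; checksum is left as zero.
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20, 0, 0, 64, 6, 0,
        socket.inet_aton(src), socket.inet_aton(dst),
    )


print(filter_packet(make_header("192.0.2.66", "198.51.100.1")))
print(filter_packet(make_header("198.51.100.7", "198.51.100.1")))
```

The decision reads a few fixed header offsets and returns immediately, which is the kind of short, direct logic the paragraph above has in mind.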

Observability & monitoring

eBPF provides a single accessible framework interface for collection and in-kernel aggregation of custom metrics, which:

  • Provides in-depth visibility and a central monitoring dashboard of events’ metrics from a wide range of sources.
  • Significantly reduces the overall system overhead.
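The overhead reduction from in-kernel aggregation comes from summarizing events before they leave the kernel. The Python sketch below mimics that idea in user space with made-up data: the event tuples and the `aggregate` function are hypothetical, and a `Counter` stands in for an eBPF map keyed by flow.

```python
from collections import Counter

# Hypothetical raw events: (source, destination, bytes), one per packet.
events = [
    ("10.0.0.1", "10.0.0.2", 500),
    ("10.0.0.1", "10.0.0.2", 1500),
    ("10.0.0.3", "10.0.0.2", 200),
    ("10.0.0.1", "10.0.0.2", 1000),
]


def aggregate(events):
    """Sum bytes per (source, destination) flow, as an in-kernel map might."""
    totals = Counter()
    for src, dst, nbytes in events:
        totals[(src, dst)] += nbytes
    return totals


totals = aggregate(events)
print(f"{len(events)} events reduced to {len(totals)} records")
for flow, nbytes in sorted(totals.items()):
    print(flow, nbytes)
```

With real traffic the ratio is much larger than in this toy data set, since many packets typically belong to the same flow.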

Tracing & profiling

eBPF provides a single, powerful and easy-to-use framework for unified profiling and program tracing. When eBPF programs are attached to tracepoints in both user space and kernel space, they allow unprecedented visibility into application runtime behavior, which can generate insights for troubleshooting.

The introspection provides enough sample data for internal visibility and performance improvement.

Using eBPF, Flowmill Collector & OpenTelemetry for observability

By guaranteeing that the kernel layer is monitored, eBPF improves observability, allowing for greater visibility, context and accuracy in your data and infrastructure.

One way to do this is with the Flowmill Collector, an agent that uses eBPF to collect low-level data directly from the Linux kernel. It does so at very little cost in CPU and network resources by building on open-source eBPF infrastructure, which helps create robust, low-overhead observability.

Network observability is vital when solving system complexity challenges. Modern deployments are complex: some have hundreds to thousands of loosely coupled microservices, written in multiple languages and application frameworks, running across an ephemeral compute infrastructure. Deployments also change from day to day as services evolve. This complexity makes problems difficult to diagnose.

What you can get from observability is a real-time map of the network and its dependencies, including where each service is running. It also provides metrics on how services and their dependencies are performing, regardless of the programming languages and application frameworks that the services are built with, by analyzing data from a set of important events. Because network telemetry is so granular, you can drill down to the level of an individual pod or host.

In a video presentation, Jonathan Perry and the Splunk architect team explain some challenges they faced when building the Flowmill Collector and how OpenTelemetry solves them:

Challenge 1: More data for value-add context

One challenge of building a collector was that collecting information only from sockets was insufficient. One of the major advantages of network monitoring with eBPF is that you can see not only IP addresses, but also the context of the communication: the process, the container, and the host associated with the traffic.

To make telemetry valuable, more metadata is required, such as: 

  • Information from the cloud provider
  • Information about containers from Docker and the orchestrator
  • Information about network address translation and the mapping of external addresses to names that the users understand

The solve: The Flowmill Collector contains all this instrumentation, and the key advantage is that much of this instrumentation is reusable for other types of eBPF observability. For example, the metadata continues to be useful if you want to…

  • Monitor context switches
  • Collect profiling information
  • Monitor files instead of sockets
  • Monitor system calls
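The enrichment step described in this challenge can be pictured as a series of lookups that attach context to a raw socket event. The Python sketch below is illustrative only and is not how the Flowmill Collector is implemented: `SocketEvent`, `enrich`, the three lookup tables and all of the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SocketEvent:
    pid: int
    remote_ip: str
    bytes_sent: int


# Hypothetical lookup tables a collector might build from other sources
# (orchestrator, container runtime, address-to-name mapping).
PROCESS_TO_CONTAINER = {4242: "checkout-7f9c"}
CONTAINER_TO_HOST = {"checkout-7f9c": "node-a"}
IP_TO_NAME = {"203.0.113.10": "payments.example.com"}


def enrich(event: SocketEvent) -> dict:
    """Attach container, host and a readable remote name to a socket event."""
    container = PROCESS_TO_CONTAINER.get(event.pid, "unknown")
    return {
        "pid": event.pid,
        "container": container,
        "host": CONTAINER_TO_HOST.get(container, "unknown"),
        # Fall back to the raw address when no friendly name is known.
        "remote": IP_TO_NAME.get(event.remote_ip, event.remote_ip),
        "bytes_sent": event.bytes_sent,
    }


print(enrich(SocketEvent(pid=4242, remote_ip="203.0.113.10", bytes_sent=1200)))
```

Nothing in the lookup tables is specific to sockets, which is why the same metadata stays useful for the other kinds of monitoring listed above.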

Challenge 2: Controlling overhead

Another challenge is how to reduce overhead. In live systems, every container update is accompanied by thousands of process and socket updates and hundreds of thousands of socket activity reports. If you encode container information on every socket report, you could be spending a lot of CPU time fetching, encoding, and decoding container metadata.

The solve: The Flowmill Collector solves this by sending container, process and socket metadata only as updates, rather than with every socket report. Those updates are cached, eliminating much of the redundant work. This is one of the design decisions that enables the collector to achieve low CPU and network overhead, so users can get always-on, granular reporting in production.
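A small Python model shows why sending metadata separately from socket reports saves work. It is a sketch of the general idea and not the collector's actual protocol: the `MetadataCache` class, the message shapes, the JSON encoding and the sample container values are all assumptions made for illustration.

```python
import json


class MetadataCache:
    """Send container metadata once, then refer to it by a short ID.

    Hypothetical design for illustration only.
    """

    def __init__(self):
        self._ids = {}   # container name -> short numeric ID
        self.sent = []   # messages that would go over the wire

    def report(self, container: str, metadata: dict, socket_bytes: int):
        if container not in self._ids:
            # First sighting: emit one metadata update and remember the ID.
            self._ids[container] = len(self._ids)
            self.sent.append(
                {"type": "meta", "id": self._ids[container], **metadata}
            )
        # Every socket report carries only the ID, not the full metadata.
        self.sent.append(
            {"type": "sock", "id": self._ids[container], "bytes": socket_bytes}
        )


meta = {"name": "checkout-7f9c", "image": "shop/checkout:1.4", "host": "node-a"}

cache = MetadataCache()
for _ in range(1000):
    cache.report("checkout-7f9c", meta, 512)

# Baseline: the full metadata is encoded into every socket report.
naive_size = sum(
    len(json.dumps({"type": "sock", "bytes": 512, **meta})) for _ in range(1000)
)
cached_size = sum(len(json.dumps(m)) for m in cache.sent)
print(f"naive: {naive_size} bytes, cached: {cached_size} bytes")
```

The cached variant sends one extra message per container but keeps every subsequent socket report small, so the saving grows with the number of reports per container.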

Other challenges encountered include:

  • Header fetching and caching
  • Visibility into observability
  • Causality when reading from multiple perf-rings
  • Fast dev cycles when loading eBPF code from CLI

Splunk supports OpenTelemetry

Splunk is committed to improving observability efficiency through open-source projects like OpenTelemetry, including by donating the Flowmill Collector. The telemetry combined with eBPF technology will not only provide a platform for high-performance kernel programmability, but will also augment your observability data pipeline to give users in-depth, vital information about their distributed applications.

This article was written by Faith Kilonzi, a full-stack software engineer, technical writer, and a DevOps enthusiast, with a passion for problem-solving through implementation of high-quality software products. She holds a bachelor’s degree in Computer Science from Ashesi University. She has experience working in fin-tech, research, technology, and consultancy industries.

This posting does not necessarily represent Splunk's position, strategies or opinion.

Posted by

Stephen Watts

Stephen Watts works in growth marketing at Splunk. Stephen holds a degree in Philosophy from Auburn University and is an MSIS candidate at UC Denver. He contributes to a variety of publications including CIO.com, Search Engine Journal, ITSM.Tools, IT Chronicles, DZone, and CompTIA.