eBPF Background

What is eBPF?

eBPF (extended Berkeley Packet Filter) is a revolutionary technology that allows running sandboxed programs in the Linux kernel without changing kernel source code or loading kernel modules. Originally designed for network packet filtering, eBPF has evolved into a general-purpose execution engine that can be used for a wide variety of use cases.

How eBPF Works

eBPF programs are written in a restricted C subset, compiled to eBPF bytecode, and loaded into the kernel. The kernel verifies the program for safety before executing it, ensuring:

  • Memory Safety: Programs cannot access arbitrary memory locations

  • Termination: Programs must terminate (no infinite loops)

  • Bounded Execution: Programs have limited instruction count and stack size

Key Components

  1. eBPF Programs: Small programs that execute in kernel space

  2. eBPF Maps: Key-value data structures for sharing data between kernel and user space

  3. eBPF Verifier: Ensures program safety before execution

  4. BPF Type Format (BTF): Provides type information for eBPF programs

  5. Helper Functions: Kernel-provided functions that eBPF programs can call

eBPF in DataCrumbs

DataCrumbs leverages eBPF technology to provide low-overhead I/O tracing capabilities. The tool uses several eBPF features:

Kprobes and Uprobes

  • Kprobes: Attach eBPF programs to kernel functions for tracing system calls and kernel-level I/O operations

  • Uprobes: Attach eBPF programs to user-space functions in libraries (libc, libhdf5, etc.)

These probes allow DataCrumbs to intercept function calls without modifying the application or kernel code.

Ring Buffers

DataCrumbs uses eBPF ring buffers (introduced in Linux 5.8) for efficient data transfer from kernel to user space. Ring buffers provide:

  • High-throughput event delivery

  • Low latency

  • Memory efficiency

  • Multi-producer, single-consumer semantics

Maps for State Management

eBPF maps store:

  • Process tracking information

  • File descriptor mappings

  • Thread-local data

  • Aggregated statistics

Advantages of eBPF for I/O Tracing

Minimal Overhead

eBPF programs execute directly in the kernel with JIT compilation, resulting in near-native performance. DataCrumbs typically adds less than 5% overhead to application execution.

Safety and Stability

The eBPF verifier ensures that programs cannot crash the kernel or compromise system security. This makes DataCrumbs safe to use in production environments.

Dynamic Instrumentation

eBPF programs can be loaded and unloaded dynamically without rebooting the system or restarting applications. This allows DataCrumbs to:

  • Start and stop tracing on demand

  • Update probe configurations at runtime

  • Trace running applications without interruption

No Code Modification

Applications do not need to be recompiled or modified to be traced by DataCrumbs. The tool can trace:

  • Binary-only applications

  • Third-party libraries

  • System calls

  • Custom functions

eBPF Limitations

Kernel Version Requirements

eBPF features have evolved over time, with different capabilities available in different kernel versions:

  • Linux 4.18: Basic eBPF support with compatibility layers

  • Linux 5.1+: Modern eBPF features

  • Linux 5.8+: Full modern eBPF features with BPF ring buffers (recommended)

Stack Size Limits

eBPF programs have a limited stack size (512 bytes). DataCrumbs works around this by:

  • Using per-CPU maps for temporary storage

  • Minimizing stack variable usage

  • Splitting complex operations across multiple helper functions

Verifier Restrictions

The eBPF verifier imposes restrictions on:

  • Loop complexity (bounded loops only in newer kernels)

  • Function calls (limited call depth)

  • Memory access patterns (must be verified safe)

DataCrumbs handles these restrictions through careful program design and code generation.

eBPF Tools and Ecosystem

libbpf

DataCrumbs uses libbpf (version 1.5.0+) as the primary library for:

  • Loading eBPF programs into the kernel

  • Managing eBPF maps

  • Attaching probes to functions

  • Handling BTF information

bpftool

bpftool (version 7.5.0+) is used during the build process for:

  • Generating vmlinux.h (kernel type definitions)

  • Creating BPF object files

  • Generating skeleton headers for C programs

  • Inspecting loaded eBPF programs (debugging)

BCC vs libbpf

DataCrumbs uses the libbpf approach rather than BCC (BPF Compiler Collection) because:

  • Portability: libbpf-based programs are compiled once and run anywhere

  • Performance: No runtime compilation overhead

  • Dependencies: Smaller dependency footprint

  • Distribution: Easier to package and deploy

Further Reading

For more information about eBPF: