===================
eBPF Background
===================

What is eBPF?
=============

eBPF (extended Berkeley Packet Filter) is a technology for running sandboxed programs in the Linux kernel without changing kernel source code or loading kernel modules. Originally designed for network packet filtering, eBPF has evolved into a general-purpose execution engine used for networking, tracing, observability, and security.

How eBPF Works
==============

eBPF programs are typically written in a restricted C subset, compiled to eBPF bytecode, and loaded into the kernel. The kernel verifies the program for safety before executing it, ensuring:

* **Memory Safety**: Programs cannot access arbitrary memory locations
* **Termination**: Programs must terminate (no infinite loops)
* **Bounded Execution**: Programs have limited instruction count and stack size

Key Components
--------------

1. **eBPF Programs**: Small programs that execute in kernel space
2. **eBPF Maps**: Key-value data structures for sharing data between kernel and user space
3. **eBPF Verifier**: Ensures program safety before execution
4. **BPF Type Format (BTF)**: Provides type information for eBPF programs
5. **Helper Functions**: Kernel-provided functions that eBPF programs can call

eBPF in DataCrumbs
==================

DataCrumbs leverages eBPF technology to provide low-overhead I/O tracing capabilities. The tool uses several eBPF features:

Kprobes and Uprobes
-------------------

* **Kprobes**: Attach eBPF programs to kernel functions for tracing system calls and kernel-level I/O operations
* **Uprobes**: Attach eBPF programs to user-space functions in libraries (libc, libhdf5, etc.)

These probes allow DataCrumbs to intercept function calls without modifying the application or kernel code.

Ring Buffers
------------

DataCrumbs uses eBPF ring buffers (introduced in Linux 5.8) for efficient data transfer from kernel to user space.
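
The producer/consumer flow behind that transfer can be pictured with a small user-space model. The sketch below is not the kernel's ring buffer and not DataCrumbs' code: it is a single-threaded, fixed-slot ring in plain C, and the ``io_event`` layout, ``RB_SLOTS``, ``ring_push`` and ``ring_pop`` are hypothetical names chosen for illustration. It shows two properties that the kernel ring buffer shares: positions only ever increase and are mapped to slots with a power-of-two mask, and a full buffer rejects the new event instead of overwriting unread data.

```c
#include <assert.h>
#include <stdint.h>

#define RB_SLOTS 8u /* must be a power of two so a mask can replace modulo */

/* Hypothetical event layout; the real DataCrumbs record format is not shown here. */
struct io_event {
    uint32_t pid;
    uint64_t bytes;
};

struct ring {
    struct io_event slots[RB_SLOTS];
    uint64_t head; /* next position the producer writes; only increases */
    uint64_t tail; /* next position the consumer reads; only increases */
};

/* Producer side: returns 0 on success, -1 if the ring is full (event dropped). */
static int ring_push(struct ring *rb, const struct io_event *ev)
{
    assert(rb != 0 && ev != 0);
    if (rb->head - rb->tail == RB_SLOTS)
        return -1;
    rb->slots[rb->head & (RB_SLOTS - 1)] = *ev;
    rb->head++;
    return 0;
}

/* Consumer side: returns 1 and fills *out if an event was pending, else 0. */
static int ring_pop(struct ring *rb, struct io_event *out)
{
    assert(rb != 0 && out != 0);
    if (rb->head == rb->tail)
        return 0;
    *out = rb->slots[rb->tail & (RB_SLOTS - 1)];
    rb->tail++;
    return 1;
}
```

In this model a producer that outpaces the consumer sees ``ring_push`` return ``-1``. A tracer built on the same idea would count such drops rather than block the traced application. Concurrency between multiple producers is deliberately left out of the sketch.
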
Ring buffers provide:

* High-throughput event delivery
* Low latency
* Memory efficiency
* Multi-producer, single-consumer semantics

Maps for State Management
--------------------------

eBPF maps store:

* Process tracking information
* File descriptor mappings
* Thread-local data
* Aggregated statistics

Advantages of eBPF for I/O Tracing
===================================

Minimal Overhead
----------------

eBPF programs execute directly in the kernel with JIT compilation, resulting in near-native performance. DataCrumbs typically adds less than 5% overhead to application execution.

Safety and Stability
--------------------

The eBPF verifier ensures that programs cannot crash the kernel or compromise system security. This makes DataCrumbs safe to use in production environments.

Dynamic Instrumentation
------------------------

eBPF programs can be loaded and unloaded dynamically without rebooting the system or restarting applications. This allows DataCrumbs to:

* Start and stop tracing on demand
* Update probe configurations at runtime
* Trace running applications without interruption

No Code Modification
--------------------

Applications do not need to be recompiled or modified to be traced by DataCrumbs. The tool can trace:

* Binary-only applications
* Third-party libraries
* System calls
* Custom functions

eBPF Limitations
================

Kernel Version Requirements
---------------------------

eBPF features have evolved over time, with different capabilities available in different kernel versions:

* **Linux 4.18**: Basic eBPF support with compatibility layers
* **Linux 5.1+**: Modern eBPF features
* **Linux 5.8+**: Full modern eBPF features with BPF ring buffers (recommended)

Stack Size Limits
-----------------

eBPF programs have a limited stack size (512 bytes).
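
To see why the limit matters in practice, consider a trace record that carries a full file path. The sketch below is ordinary user-space C, not eBPF code: the ``open_event`` struct, its field names, and the 4096-byte path buffer are assumptions made for illustration, not DataCrumbs' actual record layout. It simply compares object sizes against the 512-byte limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BPF_STACK_LIMIT 512u  /* stack limit stated in the text */
#define PATH_BUF_LEN    4096u /* assumption: room for a long file path */

/* Hypothetical trace record, for illustration only. */
struct open_event {
    uint64_t ts_ns;
    uint32_t pid;
    char     path[PATH_BUF_LEN];
};

/* Returns 1 if an object of `size` bytes would fit within the stack limit. */
static int fits_on_bpf_stack(size_t size)
{
    return size <= BPF_STACK_LIMIT;
}

/* The full record is far too large for the stack; a pointer to it is not. */
static void stack_limit_example(void)
{
    assert(!fits_on_bpf_stack(sizeof(struct open_event)));
    assert(fits_on_bpf_stack(sizeof(struct open_event *)));
}
```

A record of this size therefore has to live somewhere other than the stack, with the program holding only a pointer to it.
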
DataCrumbs works around the stack limit by:

* Using per-CPU maps for temporary storage
* Minimizing stack variable usage
* Splitting complex operations across multiple helper functions

Verifier Restrictions
---------------------

The eBPF verifier imposes restrictions on:

* Loop complexity (bounded loops are accepted only on Linux 5.3 and newer)
* Function calls (limited call depth)
* Memory access patterns (every access must be proven safe)

DataCrumbs handles these restrictions through careful program design and code generation.

eBPF Tools and Ecosystem
=========================

libbpf
------

DataCrumbs uses **libbpf** (version 1.5.0+) as the primary library for:

* Loading eBPF programs into the kernel
* Managing eBPF maps
* Attaching probes to functions
* Handling BTF information

bpftool
-------

**bpftool** (version 7.5.0+) is used during the build process for:

* Generating vmlinux.h (kernel type definitions)
* Creating BPF object files
* Generating skeleton headers for C programs
* Inspecting loaded eBPF programs (debugging)

BCC vs libbpf
-------------

DataCrumbs uses the **libbpf** approach rather than BCC (BPF Compiler Collection) because:

* **Portability**: libbpf-based programs are compiled once and run across kernel versions (CO-RE, "Compile Once - Run Everywhere")
* **Performance**: No runtime compilation overhead
* **Dependencies**: Smaller dependency footprint
* **Distribution**: Easier to package and deploy

Further Reading
===============

For more information about eBPF:

- eBPF.io - Official eBPF documentation
- libbpf Documentation - libbpf API reference
- Kernel Documentation - Linux kernel eBPF docs
- BPF Performance Tools - Book by Brendan Gregg