Overview of EBPF in the era of unikernels

Introduction

This article gives an overview of EBPF (Extended Berkeley Packet Filter), a recent technology in the Linux kernel that lets users add user-defined functionality to the kernel. It aims to explain what EBPF is and what its use-cases are, and it compares EBPF with unikernels. Please visit [3] if you want a deeper understanding of EBPF.

Overview

EBPF is a recent technology in the Linux kernel that allows users to add user-defined functionality to the kernel. This functionality takes the form of EBPF programs, which are attached to a specific kernel path and execute when that path is traversed, e.g., when a packet is sent or received, a socket is opened, or a syscall is invoked or returns. In simple words, an EBPF program is a piece of bytecode that executes in the context of the kernel when a certain kernel-level event occurs. By adding these programs, users can tweak an existing kernel behaviour or add a new one.

The original goal of BPF was to enable the specification of network filters in a more efficient way [1]. The aim of a network filter is to drop or keep packets by relying on a set of rules that apply to the packet's data, e.g., source IP, protocol type, MAC address, etc. A filter simply evaluates a condition on a packet and returns either true, i.e., keep the packet, or false, i.e., drop the packet. In [1], the authors propose a dedicated language to specify network filters. For example, the following code specifies a filter that accepts all IP packets:

      ldh  [12]
      jeq #ETHERTYPE_IP, L1, L2
L1:   ret  #TRUE
L2:   ret  #FALSE
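
To make the control flow of this filter concrete, here is a minimal sketch of an interpreter for just the three instructions used above. It is an illustration under stated assumptions, not the kernel's implementation: the tuple encoding, the names `run_filter` and `ethernet_frame`, and the use of absolute jump targets are all made up for this example.

```python
import struct

ETHERTYPE_IP = 0x0800  # EtherType value for IPv4

# Each instruction is (mnemonic, operand, jump_if_true, jump_if_false).
# Jump targets are absolute instruction indices, which is a simplification:
# real BPF encodes them as offsets relative to the next instruction.
ACCEPT_IP = [
    ("ldh", 12, None, None),       # load the half-word at byte offset 12
    ("jeq", ETHERTYPE_IP, 2, 3),   # compare the accumulator with a constant
    ("ret", True, None, None),     # L1: keep the packet
    ("ret", False, None, None),    # L2: drop the packet
]


def run_filter(program, packet):
    """Interpret a tiny subset of classic BPF over the raw bytes of a frame."""
    acc = 0  # the accumulator register
    pc = 0   # index of the next instruction
    while pc < len(program):
        op, k, jt, jf = program[pc]
        if op == "ldh":
            if k + 2 > len(packet):
                return False  # load past the end of the packet: drop it
            (acc,) = struct.unpack_from("!H", packet, k)
            pc += 1
        elif op == "jeq":
            pc = jt if acc == k else jf
        elif op == "ret":
            return k
        else:
            raise ValueError(f"unsupported instruction: {op}")
    raise ValueError("program ended without a ret instruction")


def ethernet_frame(ethertype, payload=b""):
    """Build a frame with zeroed MAC addresses (hypothetical test helper)."""
    return bytes(12) + struct.pack("!H", ethertype) + payload


print(run_filter(ACCEPT_IP, ethernet_frame(0x0800)))  # IPv4 frame
print(run_filter(ACCEPT_IP, ethernet_frame(0x0806)))  # ARP frame
```

The offset 12 is where the EtherType field sits in an Ethernet header, after the two 6-byte MAC addresses, which is why a single half-word load is enough to decide whether the frame carries IP.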

From this specification, bytecode is generated and then loaded into the corresponding kernel path. For example, this filter can be attached to a raw socket so that the recv() syscall only returns IP packets. To get a taste of what a filter looks like, you can invoke tcpdump in the following way:

tcpdump port 22 -dd

You are going to get something like the following (only the first instructions are shown):

{ 0x28, 0, 0, 0x0000000c },
{ 0x15, 0, 8, 0x000086dd },
{ 0x30, 0, 0, 0x00000014 },
{ 0x15, 2, 0, 0x00000084 },
{ 0x15, 1, 0, 0x00000006 },
{ 0x15, 0, 17, 0x00000011 },
{ 0x28, 0, 0, 0x00000036 },
{ 0x15, 14, 0, 0x00000016 },
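
Each entry in this listing has the form { opcode, jt, jf, k }, where jt and jf are jump offsets relative to the next instruction and k is a constant operand. As a reading aid, the following sketch decodes the two kinds of instruction that appear above (absolute loads and compare-with-constant jumps). The function name `decode` and the output format are assumptions made for this example; only the opcode bit layout comes from classic BPF.

```python
BPF_CLASS_LD = 0x00   # load into the accumulator
BPF_CLASS_JMP = 0x05  # jump instructions
BPF_MODE_ABS = 0x20   # absolute offset into the packet
BPF_JEQ = 0x10        # jump if equal
SIZE_SUFFIX = {0x00: "", 0x08: "h", 0x10: "b"}  # word, half-word, byte


def decode(index, insn):
    """Render one {opcode, jt, jf, k} entry as a readable mnemonic."""
    code, jt, jf, k = insn
    cls = code & 0x07
    if cls == BPF_CLASS_LD and (code & 0xE0) == BPF_MODE_ABS:
        suffix = SIZE_SUFFIX.get(code & 0x18, "?")
        return f"ld{suffix} [{k}]"
    if cls == BPF_CLASS_JMP and (code & 0xF0) == BPF_JEQ and (code & 0x08) == 0:
        # Jump offsets are relative to the instruction after the jump.
        return f"jeq #{k:#x}, {index + 1 + jt}, {index + 1 + jf}"
    return f"unknown ({code:#04x})"


LISTING = [
    (0x28, 0, 0, 0x0000000C),
    (0x15, 0, 8, 0x000086DD),
    (0x30, 0, 0, 0x00000014),
    (0x15, 2, 0, 0x00000084),
    (0x15, 1, 0, 0x00000006),
    (0x15, 0, 17, 0x00000011),
    (0x28, 0, 0, 0x00000036),
    (0x15, 14, 0, 0x00000016),
]

for i, insn in enumerate(LISTING):
    print(f"{i:2}: {decode(i, insn)}")
```

Read this way, the listing starts by loading the EtherType at offset 12 and comparing it with 0x86dd (IPv6), then loads the byte at offset 20 (the IPv6 next-header field) and compares it with the SCTP, TCP and UDP protocol numbers, and finally loads the half-word at offset 54 and compares it with 0x16, i.e., port 22.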

The tcpdump output above is the bytecode for a filter that accepts only packets to or from port 22. BPF filters perform better than user-space filters because the filtering happens close to where the packet first arrives, i.e., in the kernel, so unwanted packets never have to be copied to user space. In this case, the filter executes in the kernel path that corresponds to the reception of network packets.

EBPF extends BPF to the whole kernel, in other words, to any possible kernel path. EBPF programs are typically written in a restricted, C-like language that is translated to instructions for an EBPF virtual machine. This is a simple register-based machine with a reduced number of registers. The following picture shows the EBPF components which are implemented within the kernel:

The user loads an EBPF program by using the bpf() syscall, which is the interface to the kernel. Before the program is loaded into the kernel, the verifier checks it. This is important since the EBPF program executes in kernel mode. The verifier requires that the program has:

  • No unbounded loops
  • No use of a non-initialized register
  • A bounded number of instructions
  • No invalid access to the stack
  • No unreachable instructions

The loading of a program fails when any of these requirements is not fulfilled. To get a better understanding of the verifier, we recommend watching [2].
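
Some of these checks can be sketched on a toy instruction set. The following is a minimal illustration, not the kernel's verifier: the tuple-based instructions, the `MAX_INSNS` value and the function names are assumptions made for this example. It rejects every backward jump, which is stricter than newer kernels, since those accept some loops that can be proven to be bounded.

```python
MAX_INSNS = 4096  # illustrative limit, not necessarily the kernel's value


def successors(program, pc):
    """Return the indices that control flow can reach from instruction pc."""
    insn = program[pc]
    op = insn[0]
    if op == "exit":
        return []
    if op == "jmp":   # unconditional: ("jmp", offset)
        return [pc + 1 + insn[1]]
    if op == "jeq":   # conditional: ("jeq", reg, imm, offset)
        return [pc + 1, pc + 1 + insn[3]]
    return [pc + 1]   # any other instruction falls through


def check_program(program):
    """Return a list of error strings; an empty list means 'accepted'."""
    if not program:
        return ["empty program"]
    if len(program) > MAX_INSNS:
        return ["too many instructions"]
    errors = []
    reachable = set()
    worklist = [0]
    # Walk the control-flow graph starting from the first instruction.
    while worklist:
        pc = worklist.pop()
        if pc in reachable:
            continue
        reachable.add(pc)
        for nxt in successors(program, pc):
            if not 0 <= nxt < len(program):
                errors.append(f"insn {pc}: control flow leaves the program")
            elif nxt <= pc:
                errors.append(f"insn {pc}: back-edge (possible unbounded loop)")
            else:
                worklist.append(nxt)
    for pc in range(len(program)):
        if pc not in reachable:
            errors.append(f"insn {pc}: unreachable")
    return errors


ok = [("mov", 0, 1), ("jeq", 0, 1, 1), ("mov", 0, 2), ("exit",)]
loop = [("mov", 0, 0), ("jmp", -2)]
dead = [("jmp", 1), ("mov", 0, 1), ("exit",)]
for name, prog in (("ok", ok), ("loop", loop), ("dead", dead)):
    print(name, check_program(prog))
```

Rejecting back-edges is a blunt but simple way to guarantee termination: if control can only move forward through a bounded number of instructions, the program must eventually reach an exit.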

After the program is statically analysed, the verifier simulates the execution of the program one instruction at a time. During this simulation, the verifier checks the state of the virtual machine before and after every instruction to ensure that the registers and the stack are in a valid state.
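
The idea of simulating the program while tracking machine state can be sketched as follows. This is a toy model under stated assumptions: the instruction tuples and the function name `simulate` are invented for this example, the only state tracked is the set of initialized registers, and the program is assumed to have already passed structural checks (all jumps in range). The starting state reflects the EBPF convention that r1 holds the context pointer and r10 the frame pointer on entry, and that r0 carries the return value.

```python
def simulate(program):
    """Explore every path; return an error string, or None if all paths pass."""
    # On entry only r1 (context) and r10 (frame pointer) hold valid values.
    pending = [(0, frozenset({1, 10}))]
    seen = set()
    while pending:
        pc, init = pending.pop()
        if (pc, init) in seen:
            continue  # this state was already explored
        seen.add((pc, init))
        insn = program[pc]
        op = insn[0]
        if op == "mov_imm":      # ("mov_imm", dst, imm): writes dst
            pending.append((pc + 1, init | {insn[1]}))
        elif op == "mov_reg":    # ("mov_reg", dst, src): reads src, writes dst
            if insn[2] not in init:
                return f"insn {pc}: r{insn[2]} read before write"
            pending.append((pc + 1, init | {insn[1]}))
        elif op == "add_reg":    # ("add_reg", dst, src): reads both registers
            for reg in (insn[1], insn[2]):
                if reg not in init:
                    return f"insn {pc}: r{reg} read before write"
            pending.append((pc + 1, init))
        elif op == "jeq_imm":    # ("jeq_imm", reg, imm, offset): reads reg
            if insn[1] not in init:
                return f"insn {pc}: r{insn[1]} read before write"
            pending.append((pc + 1, init))            # branch not taken
            pending.append((pc + 1 + insn[3], init))  # branch taken
        elif op == "exit":
            if 0 not in init:
                return f"insn {pc}: r0 not set at exit"
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return None


good = [("mov_imm", 0, 0), ("exit",)]
bad_read = [("mov_reg", 0, 2), ("exit",)]
bad_branch = [("jeq_imm", 1, 0, 1), ("mov_imm", 0, 1), ("exit",)]
for prog in (good, bad_read, bad_branch):
    print(simulate(prog))
```

The third program shows why every path has to be explored: r0 is written on the fall-through path, but the taken branch skips that write and reaches the exit with r0 still uninitialized.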

The Just-in-Time compiler (JIT) takes the verified code and compiles it to native instructions for the target CPU, e.g., x86 or ARM. At this point, the EBPF program is attached to the corresponding kernel path and is executed when the right event occurs.

An EBPF program only contains code instructions. It has no data section of its own and no way to allocate dynamic memory, such as malloc(). For this purpose, EBPF allows the definition of maps. Maps are key-value stores that are used to keep state and to exchange data with other EBPF programs and with user-space applications. Note that the stack can be used to read values only if the program has already written to it.
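
The role of a map can be illustrated with a plain Python model. This is only a sketch of the idea, not the bpf() interface: the class `ToyHashMap` and the functions `count_packet` and `report` are hypothetical names, and the fixed capacity stands in for the fact that a map's size is decided when it is created rather than grown on demand.

```python
class ToyHashMap:
    """A fixed-capacity key-value store standing in for an EBPF hash map."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = {}

    def lookup(self, key):
        """Return the stored value, or None if the key is absent."""
        return self._data.get(key)

    def update(self, key, value):
        """Insert or overwrite; fail instead of growing past max_entries."""
        if key not in self._data and len(self._data) >= self.max_entries:
            return False
        self._data[key] = value
        return True

    def delete(self, key):
        """Remove a key; report whether it was present."""
        if key in self._data:
            del self._data[key]
            return True
        return False


def count_packet(counters, protocol):
    """Kernel-side role: runs once per packet and keeps state only in the map."""
    current = counters.lookup(protocol)
    return counters.update(protocol, 1 if current is None else current + 1)


def report(counters, protocols):
    """User-side role: reads the counters that the program accumulated."""
    return {p: counters.lookup(p) or 0 for p in protocols}


counters = ToyHashMap(max_entries=2)
for proto in (6, 17, 6):          # TCP, UDP, TCP
    count_packet(counters, proto)
print(report(counters, [6, 17, 1]))
```

The point of the model is the division of roles: the per-packet function holds no state between invocations, so everything it wants to remember, and everything user space wants to read, has to go through the map.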

EBPF programs are only allowed to interact with the OS by using EBPF helpers. For example, this API allows a program to get the current time or a random number. This API is stable over time and backward compatible.

We highlight the following benefits when using EBPF:

  • It enables users to add, remove or patch certain kernel functionality without recompiling the kernel.
  • It enables users to verify the program's code before deployment.
  • It enables writing programs that interact with the kernel by relying on a stable API. Unlike kernel modules, this allows the developer to port an EBPF program between different kernel releases.

Unikernels

Before going into the comparison, let's review what a unikernel is. A unikernel is a minimalistic kernel that is compiled together with the user application. The unikernel is meant to execute a single user application and to minimise the interference from the kernel. Unikernels make good use of cloud infrastructure when deployed as VMs. These VMs require fewer resources than VMs hosting a general-purpose OS like Linux or Windows.

Discussion

This section presents open questions regarding the use of EBPF and how this technology may compare with unikernels:

  1. When deploying applications as VMs, unikernels have a smaller footprint than general-purpose OSs. A unikernel not only requires less memory, less CPU and a smaller on-disk image, but is also faster to boot. In this sense, unikernels make better use of cloud infrastructure.
  2. In [4], the authors propose to improve the communication between microservices that execute on the same host by using EBPF. In this case, a BPF program is used to bypass the TCP/IP stack, thus reducing the latency of the communication between microservices. In the context of a cloud provider, this would require not only installing an EBPF program in the host but also relying on containerization mechanisms like namespaces or cgroups to limit the host's resources for these microservices. If these microservices are deployed as different unikernels, the BPF program would be located in the virtual bridge.
  3. The simplified syntax of EBPF enables developers to write simpler programs. These simple programs can be verified, and this early verification enables developers to catch bugs before the program is executed. However, the simplified syntax reduces the expressiveness of the language and thus the range of applications a user can write, e.g., those that need unbounded loops, greedy algorithms or regular expressions. By comparison, languages like C or Rust are more expressive and allow users to develop a wider range of applications. This may reduce the scope for EBPF programs.
  4. A BPF program executes with restricted resources: a limited stack, a limited number of instructions, no dynamic memory allocation, etc. This may limit the range of applications that can be developed with this technology. These limits might be relaxed in future releases, though.
  5. EBPF uses a VM to simulate the execution of the program before deployment. This allows the developer to debug the program early. Once the program is deployed, however, debugging the program requires debugging the kernel itself. This may be tricky in production.
  6. In [5], the authors propose to rely on EBPF to solve service mesh requirements at the kernel level. This reduces the latency compared with the sidecar model. Since the EBPF program executes at kernel level, a vulnerable EBPF program may compromise all kernel-owned constructs like processes and containers.

Opinion

This section presents the author's opinion about EBPF:

  1. EBPF is a runtime for programs that execute at kernel level. The EBPF authors claim that, unlike a kernel module, an EBPF program can't crash the kernel. However, this is still not very clear since the generated code is executed natively on the CPU. A bug in the verifier or in the code generation may lead to buggy native code that ends up crashing the kernel. To rule out such a scenario, the verifier itself would have to be verified so that no buggy code could be generated.
  2. An EBPF program executes in kernel mode, so a compromised program may have full access to the kernel's data structures.
  3. Writing an EBPF program requires knowledge of the kernel that a typical developer may not have. This may limit the scope of EBPF programs to kernel features.
  4. When the JIT is used, the EBPF VM only matters during the verification of the EBPF program. The generated native code runs directly on the CPU, so the execution of an EBPF program does not require a VM. Without the JIT, the bytecode would be interpreted.
  5. From the current implementation, the semantics of EBPF programs are not clear. For example, can a program block? Does the caller wait for the termination of the program? How are programs scheduled? What is the context in which the program executes? This may require further investigation.

Conclusion

This article has presented an overview of EBPF, a recent technology in the Linux kernel that allows developers to define user-defined programs and attach them to different kernel paths. These programs are first written in a restricted, C-like language, then verified, compiled to native CPU code, and finally deployed in the kernel. We have shown that these programs have been used extensively to build efficient network filters, but also for monitoring and for improving the network communication between microservices. We have also shown that EBPF and unikernels seem to target different use-cases. While EBPF is meant to tweak the kernel, unikernels allow the deployment of user applications in a safer way while limiting the use of the host's resources. We look forward to the new use-cases that EBPF will address and to how users will deal with the limited language expressiveness.

Matias E. Vara Larsen

  1. https://www.tcpdump.org/papers/bpf-usenix93.pdf
  2. https://www.youtube.com/watch?v=AZTtTgni7LQ
  3. https://www.kernel.org/doc/html/latest/networking/filter.html#networking-filter
  4. https://cyral.com/blog/lessons-using-ebpf-accelerating-cloud-native/
  5. https://isovalent.com/blog/post/2021-12-08-ebpf-servicemesh
