Jumping on the Bandwagon
ARM is ubiquitous. ARM-based hardware platforms command a 90% or greater market share of embedded controller and mobile application processors. Adoption of ARM in the edge and server markets is growing, too, with companies like Apple and Amazon rolling out their own ARM-based silicon in the past year. Devices are increasingly configured with the 64-bit ARM architecture and hardware support for virtualization, making them ideal candidates for running Nanos and a logical next step on our support roadmap.
A Little Background...
Early development of the Nanos unikernel targeted applications running on major cloud providers, which were then universally based on the x86-64 architecture. It seemed that the target audience for Nanos would be firmly seated in this Intel-centric ecosystem for some time to come, and our development was focused more on getting a proof-of-concept up and running than on considering what platforms might be supported in the future.
With time came increased interest in the application of unikernels at the edge and in embedded devices. This demand led us to take this first step towards portability and add support for the 64-bit ARM architecture (aarch64).
In this post, we'll give a little tour to show how one can cross-compile an aarch64 image, run it under emulation (no ARM device needed) as well as under KVM running on a Raspberry Pi 4. Then we'll discuss the development of this port and some of the peculiarities that arose along the way.
Preparing the environment
For this guide we'll assume you're cross-compiling on a Debian-based machine. First install gcc for aarch64 (we've tested on versions going back to 7.5.0):
sudo apt-get install gcc-aarch64-linux-gnu
Building a modern QEMU:
We recommend installing QEMU version 5.2.0 or newer to get the latest ARM support. Debian's QEMU packages are much older, and the QEMU project doesn't appear to provide Debian packages, so we'll just do a build straight from git:
Install dependencies for the QEMU build:
sudo apt install ninja-build libglib2.0-dev libpixman-1-dev
Now let's pull a QEMU tree, with only one level of commit history, and check out the v5.2.0 tag:
git clone --depth 1 --branch v5.2.0 https://gitlab.com/qemu-project/qemu.git qemu-v5.2.0
cd qemu-v5.2.0/
mkdir build
cd build
Configure and build. Note that you may wish to add other targets to "--target-list", separated by commas.
../configure --target-list=aarch64-softmmu --prefix=/usr/local
make
sudo make install
Your QEMU build is now installed. Be sure that /usr/local/bin is in your $PATH before proceeding.
Building Nanos for the "virt" QEMU machine type
In a native build of the Nanos kernel, or when staging a program executable using ops, common dependencies like shared libraries and configuration files are drawn from the host system. This isn't going to work when building on a host that's a different architecture or OS than the target.
When cross-building for another architecture, we'll need a path to source such dependencies. The NANOS_TARGET_ROOT environment variable supplies this path to the Nanos makefiles. You can use a root image of your own Linux/arm64 installation or download and use the minimal Debian arm64 root image that we provide for our CI tests:
wget https://storage.googleapis.com/testmisc/arm64-target-root.tar.gz
mkdir arm64-target-root
cd arm64-target-root
sudo tar --exclude=dev/* -xzf ../arm64-target-root.tar.gz
export NANOS_TARGET_ROOT=`pwd`
Now we're ready to clone a Nanos tree and build it for the virt QEMU machine type. This machine type is ideal for use with Nanos as it is specifically designed for virtual machines and is not restricted by the need to model a real hardware platform. The PLATFORM variable indicates the target platform, which also implies the target architecture (ARCH). The build will check the host architecture and, if it differs from that of the target, automatically set CROSS_COMPILE to "$ARCH-linux-gnu-". CROSS_COMPILE can be overridden to a different prefix if necessary. TARGET specifies the test program to build; we'll start with "hw", which is a simple hello world program written in C.
git clone http://github.com/nanovms/nanos nanos-virt
cd nanos-virt
make PLATFORM=virt TARGET=hw
We can run the instance under QEMU with emulation using the 'run-noaccel' make target.
make PLATFORM=virt TARGET=hw run-noaccel
[...]
qemu-system-aarch64 -machine virt -machine gic-version=2 -machine highmem=off -m 1G -kernel /tmp/nanos-virt/output/platform/virt/bin/kernel.img -display none -serial stdio -drive if=none,id=hd0,format=raw,file=/tmp/nanos-virt/output/image/disk.raw -device virtio-blk-pci,drive=hd0 -no-reboot -semihosting -device virtio-net,netdev=n0 -netdev user,id=n0,hostfwd=tcp::8080-:8080,hostfwd=tcp::9090-:9090,hostfwd=udp::5309-:5309 -object filter-dump,id=filter0,netdev=n0,file=/tmp/nanos.pcap -cpu max
en1: assigned 10.0.2.15
hello world!
args: hw poppy
And we can demonstrate some connectivity with a little Go-based webserver:
make PLATFORM=virt TARGET=webg run-noaccel
[...]
qemu-system-aarch64 -machine virt -machine gic-version=2 -machine highmem=off -m 1G -kernel /home/wjhun/src/nanos-virt/output/platform/virt/bin/kernel.img -display none -serial stdio -drive if=none,id=hd0,format=raw,file=/home/wjhun/src/nanos-virt/output/image/disk.raw -device virtio-blk-pci,drive=hd0 -no-reboot -semihosting -device virtio-net,netdev=n0 -netdev user,id=n0,hostfwd=tcp::8080-:8080,hostfwd=tcp::9090-:9090,hostfwd=udp::5309-:5309 -object filter-dump,id=filter0,netdev=n0,file=/tmp/nanos.pcap -cpu max
en1: assigned 10.0.2.15
Server started on port 8080
en1: assigned FE80::5054:FF:FE12:3456
...and then hit it with some requests using ApacheBench:
$ ab -dSqln 100 http://127.0.0.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient).....done
Server Software:
Server Hostname:        127.0.0.1
Server Port:            8080
Document Path:          /
Document Length:        Variable
Concurrency Level:      1
Time taken for tests:   0.090 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      12900 bytes
HTML transferred:       1200 bytes
Requests per second:    1114.39 [#/sec] (mean)
Time per request:       0.897 [ms] (mean)
Time per request:       0.897 [ms] (mean, across all concurrent requests)
Transfer rate:          140.39 [Kbytes/sec] received
Connection Times (ms)
              min   avg   max
Connect:        0     0     0
Processing:     1     1     3
Waiting:        1     1     2
Total:          1     1     3
Keep in mind that these benchmark numbers represent the performance of a fully-emulated VM, so please take them with a grain of salt!
Running on a Raspberry Pi 4
The build process on an ARM-based host, such as a Raspberry Pi 4, is nearly identical to the cross-build. If the build process detects that the host architecture is aarch64, gcc, ld and other tools will be used as-is, without a prefix. NANOS_TARGET_ROOT may be omitted in favor of the host's root filesystem unless a different root environment is desired.
For reference, the Nanos build on Raspberry Pi 4 was tested under Debian 10 (buster) and Ubuntu 20.04 LTS (Focal Fossa).
If you have support for KVM on your host, you can run Nanos natively, without TCG emulation. Simply use the 'run' make target instead of 'run-noaccel'.
Using Ops to Orchestrate an Image
Ops, the orchestration tool for Nanos, does not yet have complete support for ARM, nor is it able to cross-build ARM images from a non-ARM host. It does run on a Linux/arm64 host, though, and can create Nanos images of an application executable you provide (pre-built ARM packages are not yet available). First get Ops by visiting http://ops.city and running the install script as directed:
$ curl https://ops.city/get.sh -sSfL | sh
Installing ops!
> Hardware acceleration not supported by your system.
> Ops will attempt to enable acceleration by default and will show a warning if the system doesn't support it.
> To avoid such warnings you may disable acceleration in configuration or via command line parameters.
> Downloading latest release... ✓
> Adding to bash profile... ✓
Note: We've added the following to your /home/wjhun/.bashrc
If this isn't the profile of your current shell then please add the following to your correct profile:
# OPS config
export OPS_DIR="$HOME/.ops"
export PATH="$HOME/.ops/bin:$PATH"
> Successfully installed ops
Ops version: 0.1.22
Nanos version: 0.0!
Please open another terminal where the `ops` command will now be available.
Note: You may disregard the warning "Hardware acceleration not supported by your system," as this check is only valid on x86-64 hosts at the moment. You should, however, add yourself to the 'kvm' group if you wish to run with full acceleration to avoid needing to run qemu with root privileges.
As suggested by the script, begin a new login shell in order to add ops to the path. As ARM Nanos binaries and packages are not yet distributed with Ops (more on this later), you'll need to provide the Nanos kernel in the current working directory for Ops to pick up. This is a temporary workaround for running on ARM-based systems and is not typical or documented behavior of Ops.
$ cp output/platform/virt/bin/kernel.img .
Now try running an executable of your choice using 'ops run'. Here we'll just do a simple test by running the 'find' program:
$ ops run /usr/bin/find -a .
Downloading.. https://storage.googleapis.com/nanos/release/0.1.34/nanos-release-linux-0.1.34.tar.gz
100% |████████████████████████████████████████| [0s:0s]
booting /home/wjhun/.ops/images/find.img ...
You specified hardware acceleration, but it is not supported
Are you running inside a vm? If so disable accel with --accel=false
en1: assigned 10.0.2.15
.
./proc
./proc/self
./proc/self/exe
./proc/self/maps
./proc/sys
./proc/sys/kernel
./proc/sys/kernel/hostname
./etc
./etc/resolv.conf
./etc/passwd
./etc/ssl
./etc/ssl/certs
./etc/ssl/certs/ca-certificates.crt
./dev
./dev/urandom
./dev/null
./lib
./lib/x86_64-linux-gnu
./lib/x86_64-linux-gnu/libnss_dns.so.2
./lib/ld-linux-aarch64.so.1
./lib/aarch64-linux-gnu
./lib/aarch64-linux-gnu/libselinux.so.1
./lib/aarch64-linux-gnu/libpcre.so.3
./lib/aarch64-linux-gnu/libc.so.6
./lib/aarch64-linux-gnu/libm.so.6
./lib/aarch64-linux-gnu/libpthread.so.0
./lib/aarch64-linux-gnu/libdl.so.2
./sys
./sys/devices
./sys/devices/system
./sys/devices/system/cpu
./sys/devices/system/cpu/cpu0
./sys/devices/system/cpu/online
./usr
./usr/bin
./usr/bin/find
A note about KVM and ARM interrupt controllers
The bulk of the ARM port was written using QEMU with full emulation (TCG) and the virt machine model. QEMU and the virt model support several different types of interrupt controllers (variants of GIC - Generic Interrupt Controller in ARM nomenclature). Without any special consideration, the GICv3 interrupt controller was chosen for initial support (without MSI or ITS support).
Upon starting bringup of Nanos on the Raspberry Pi 4 with KVM, it became apparent that reliance on a single, emulated GIC by QEMU would no longer be sufficient. This is because, under KVM, the guest's accesses to the GIC are actually handled by the host's interrupt controller via its VCPU interface. The Pi 4's SoC, bcm2711, contains a variant of GICv2 - with an implementation of the scantly-documented v2m extension for MSI (message-signaled interrupts). We added support for version 2 controllers and v2m and were then able to run on the Pi 4 with KVM, fully-accelerated.
This nuance is noted here because the ARM universe contains a wide array of interrupt controllers and, as such, running Nanos under KVM on your 64-bit ARM silicon might require support for yet another controller. Should you hit such a roadblock, reach out to us and let us know what kind of silicon you're running on. Chances are that we can add support for it.
Some Notes About the Porting Experience
While Nanos wasn't originally written to target multiple architectures, its small size and exclusive support for virtualized environments set the stage for a relatively straightforward first port to another architecture. Because Nanos eschews generality, its internal interfaces do not need to be written with any pretense that there will eventually be support for every architecture on the market. We have the freedom to assume that only 64-bit hardware will be targeted and can dispose of support for legacy systems. For instance, we don't need to implement the temporary "highmem" mappings that are needed to support 32-bit targets with large amounts of physical memory.
Address Tagging and Introspection
The wealth of virtual address space available in a fundamentally 64-bit kernel also gives us freedom to use this space in creative ways. One such use is the encoding of type information in address "tags." Type information of any tagged object can be found simply by looking at certain bits of the virtual address. This is used extensively in Nanos's value space which, in turn, is used for configuration, management and filesystem metadata. In this space, a mix of symbols, strings, numbers and other types are represented in a hierarchy of attribute-value tuples. Introspection of select kernel data structures is possible within this space, because the type information provided by the tag can be used to infer access methods to retrieve and mutate attributes of structures in a direct manner. Address tagging allows a wide array of datatypes to exist within the store without the need for invariant fields across all possible types or the somewhat dicey approach of rewinding an object's address to find a type field. The potential applications for such introspection are numerous, particularly for real-time, distributed management of unikernel instances.
On x86_64, a tag is stored in the highest eight bits of mappable virtual address space (bits 46 through 39; bits 47 and higher are set for kernel mappings as part of the canonical address form). Without explicit hardware support for address tagging, such tagged memory is mapped in the page tables as-is, with the tag present in the virtual address of the mapping. aarch64 provides architectural support for address tagging by allowing the highest eight bits of a virtual address to be ignored in address translation. This means that tags may be applied to objects in memory without the need for tag-specific allocators and page table mappings. We can support this trivially by making allocations from the general-purpose (or other) heap and subsequently stamping the tag in the highest-order bits.
This tagging feature is available for use in userspace applications, too, and is utilized by the Nanos runtime environment in userspace tools like mkfs, dump, and tfs-fuse. Following the convention under Linux, use of tagged addresses in userspace must be explicitly enabled with the prctl(2) syscall:
prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0);
Porting Nanos to aarch64 didn't involve a great number of surprises. Substantial refactoring of machine-dependent code throughout the kernel and refinement of some internal interfaces were expected. The page tables and their entries, despite being more richly expressive of memory attributes and supporting a wide array of optional features, fit more-or-less within the existing model used for managing address mappings. For more flexibility, page protections are now composed using a series of helpers that apply transformations (always beginning with a minimum set of permissions, i.e. read-only and no-execute) rather than formed as a union of architecturally-defined flags. Only a basic 4KB page size ("granule") is supported, whereas the architecture also supports 16KB and 64KB sizes. There is room for Nanos to grow in its support of more modern page table architectures, particularly for configurable page sizes and optional security features.
The Syscall Interface
One surprise that came up repeatedly while testing various applications was the large number of seemingly arbitrary differences in the Linux syscall interface and structures between architectures that were not related to the ABI or calling conventions. A number of old syscalls were deprecated, which is understandable, but other, more subtle changes could be easily missed. For instance, we were caught a bit off-guard by the sneaky exchange of two arguments in the clone syscall signature:
#ifdef __x86_64__
sysreturn clone(unsigned long flags, void *child_stack, int *ptid, int *ctid, unsigned long newtls)
#elif defined(__aarch64__)
sysreturn clone(unsigned long flags, void *child_stack, int *ptid, unsigned long newtls, int *ctid)
#endif
Equally nefarious is the slight reordering of struct fields, such as in the stat struct. Granted, a thorough review of Linux syscall variations between architectures was called for, and we spent a fair amount of time surveying, implementing and desk-checking these variations.
There are further dimensions to explore in the area of multi-architecture support for Nanos. While we have enjoyed a fairly monolithic set of processor features in the Intel-centric world, the landscapes in ARM and RISC-V territory paint a vastly different picture. There are a large number of options that a chip vendor may select from when using these architectures, and a one-image-fits-all approach is no longer sufficient in this space. The current 'virt' platform build targets the "armv8-a" cpu type to support running with KVM on a Raspberry Pi 4 (bcm2711), but this is less than ideal for newer cores. We may soon need a way to select from one of a number of kernel builds during orchestration depending on the target hardware. Builds targeting a particular core or pipeline are hardly a new thing, but configurability of modern cores - especially in the RISC-V space where such configurability can have a major impact on the instruction set available - is now at a degree where some significant effort will be needed to ensure that Nanos-based images are always orchestrated with the right kernel build.
With these practical considerations in mind, the process of orchestrating ARM unikernel images using Ops is not yet equal to that of x86-64, and we don't yet provide ready-built application packages for ARM. We expect this situation to improve as we explore further ARM-based deployments and add better support for multiple architectures and platforms within Ops. In the meantime, should you run into any roadblocks deploying your application with Ops and Nanos, please contact us on the newly-opened user support forums at https://forums.nanovms.com.