The M8a, LLQ, PCIe and Interrupts

Some of our users like to benchmark. It helps prove the value to other stakeholders in the company and sometimes people just like to get their rocks off. Sometimes that means testing brand new instances.

However, one of the many perks our customers get is that they can ask for their software to go faster and we go in and crank it up to 11.

The m8a.medium instance was released by AWS just a few months ago, on October 8th, 2025. It is based on AMD's 5th gen EPYC "Turin" processor and runs on the Nitro system, which means it uses ENA. One of the newer features you'll find on later-gen ENA devices is LLQ - the low latency queue.

This is a mouthful, I know, and we'll get into it. You might not think about it, but each new instance type that AWS or GCP launches typically requires code to be added to Linux. New hardware, even virtualized hardware, requires new drivers. Since Nanos is an operating system kernel, we have to add the same support. If it's a brand new driver, like when we added initial gVNIC support, that can be a lot of up-front work. Adding a new architecture such as RISC-V can be an insane amount of work. Luckily, sometimes a newer instance simply toggles on newer features, so while there is still work involved it's not an insane amount.

Let's go back to the beginning though, as a few of the terms I just threw out depend on prior work. Keep in mind - we, at one point, had to implement all of this in the Nanos unikernel. It's best to start at the start, so let's start with MSI and MSI-X.

What is MSI(-X)

Just kidding - before we get to MSI we need to start with interrupts in general. In its most basic form an interrupt "interrupts" the CPU so it can perform different work via an interrupt handler. So a keyboard can generate an interrupt when you press a key, or - in our case - a network card can generate one to let the CPU know a packet has arrived.

This style of interrupt is considered asynchronous, as opposed to synchronous ones like exceptions. The way to think about it is that asynchronous interrupts come from outside the CPU (a keyboard or a NIC), whereas synchronous ones are generated by the CPU itself. Exceptions (synchronous) are actually a type of interrupt as well and are categorized into faults, traps, and aborts.

Faults

Faults, such as the page fault behind a segfault, are reported before the instruction completes. So if you touch a page that isn't in memory, the MMU raises a page fault. The page fault handler can then load the page into memory and carry on. The key here is that the fault is reported before the instruction finishes, so it's recoverable: fix the problem and the faulting instruction is simply re-executed.
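To make the "fix it and retry the same instruction" behavior concrete, here's a small userspace analogue in C (Linux-specific, not Nanos code; the `page` variable and handler are purely illustrative): the store faults, the SIGSEGV handler makes the page writable, and the very same store is re-executed and succeeds.

```c
// Userspace analogue of fault-and-retry, for illustration only (Linux):
// the write faults, the handler "fixes" the page, and the same instruction
// is restarted and succeeds -- the defining property of a fault.
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static char *page;                     /* page we deliberately fault on */
static size_t page_size = 4096;

static void on_segv(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    char *addr = (char *)info->si_addr;
    if (addr >= page && addr < page + page_size)
        /* "handle" the fault: make the page writable, then return so the
           kernel restarts the faulting store */
        mprotect(page, page_size, PROT_READ | PROT_WRITE);
    else
        _exit(1);                      /* not our page: bail out */
}

int main(void) {
    struct sigaction sa = { 0 };
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);

    page = mmap(NULL, page_size, PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return 1;

    page[0] = 'x';                     /* faults, handler runs, store retried */
    printf("recovered from the fault: %c\n", page[0]);
    return 0;
}
```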

Traps

Traps are reported after the instruction executes, so control resumes at the next instruction. A syscall or a breakpoint is the classic example here.
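And for contrast, a tiny sketch of trap semantics (x86-64 Linux, again just illustrative, not Nanos code): after the handler for the int3 breakpoint returns, execution resumes at the *next* instruction rather than retrying the int3.

```c
// Trap illustration: int3 raises a breakpoint trap; because traps are
// reported *after* the instruction, execution resumes right after the int3
// once the handler returns.
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void on_trap(int sig) {
    (void)sig;
    /* write() is async-signal-safe, unlike printf */
    write(STDOUT_FILENO, "breakpoint trap handled\n", 24);
}

int main(void) {
    signal(SIGTRAP, on_trap);
    __asm__ volatile("int3");          /* software breakpoint: a trap */
    printf("continued after the int3\n");
    return 0;
}
```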

Aborts

Aborts indicate that something is too far gone to handle gracefully. The classic example is the double fault - faulting while handling a fault. Fault again inside the double fault handler and you get a triple fault, which typically resets the machine.

Interrupts can be maskable (ignorable) or non-maskable. Device interrupts get signaled to the CPU through PICs.

What is an (A)PIC?

Wait - what is a PIC? The PIC (programmable interrupt controller, classically the Intel 8259) is the interrupt controller. It lets many devices send interrupts to the CPU without the CPU sitting there polling each device to see if it wants attention. It's hard to believe now, but before this the Intel 8080 had a single interrupt line.

APICs replaced PICs, and the job got split between I/O APICs for devices and local APICs (LAPICs), one per CPU core. There are even IPIs (inter-processor interrupts) that let CPUs interrupt each other.
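For a feel of what programming the legacy controller looked like, here's a rough freestanding sketch of the classic 8259 init sequence (ring 0 only, so treat it as illustration rather than something you'd run from userspace): it remaps IRQs 0-15 away from the CPU exception vectors and then masks them, which is roughly what a kernel does before handing things over to the APIC.

```c
// Minimal, freestanding sketch of programming the legacy 8259A PIC pair.
#include <stdint.h>

static inline void outb(uint16_t port, uint8_t val) {
    __asm__ volatile("outb %0, %1" : : "a"(val), "Nd"(port));
}

#define PIC1_CMD  0x20   /* master PIC */
#define PIC1_DATA 0x21
#define PIC2_CMD  0xa0   /* slave PIC */
#define PIC2_DATA 0xa1

void pic_remap(uint8_t master_base, uint8_t slave_base) {
    outb(PIC1_CMD, 0x11);         /* ICW1: init, expect ICW4 */
    outb(PIC2_CMD, 0x11);
    outb(PIC1_DATA, master_base); /* ICW2: vector offsets, e.g. 0x20 / 0x28 */
    outb(PIC2_DATA, slave_base);
    outb(PIC1_DATA, 0x04);        /* ICW3: slave hangs off IRQ2 */
    outb(PIC2_DATA, 0x02);
    outb(PIC1_DATA, 0x01);        /* ICW4: 8086 mode */
    outb(PIC2_DATA, 0x01);
    outb(PIC1_DATA, 0xff);        /* mask everything; the APIC takes over */
    outb(PIC2_DATA, 0xff);
}
```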

If you were messing with computers in the 90s you might remember the transition from ISA and EISA to PCI. This matters because that transition is what eventually led to MSI. A Trident 9000 video card, for instance, would have been sitting on the ISA bus, while the Voodoo Banshee you upgraded to later might have been on AGP (now defunct).

This is the pre-history of MSI.

MSI stands for message signaled interrupts. Instead of using dedicated wires/pins, devices send messages over a network. That's the first thing to understand about MSI, and really about PCIe: PCIe is not a bus, it's a network. PCIe runs its own packet network using TLPs (transaction layer packets). MSI was an optional feature on conventional PCI and became the native interrupt mechanism once PCIe was introduced. (See how we're slowly coming back to LLQ?)
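And the "message" really is just a memory write. On x86 a device signals an MSI by writing a data word to an address in the 0xfee00000 range, which the system decodes into a vector delivered to a particular LAPIC. A rough sketch of how that address/data pair is composed (following the usual x86 encoding) looks like this:

```c
// Sketch of what an x86 MSI "message" actually is: a 32-bit data value
// written to an address in the 0xfee00000 range.
#include <stdint.h>
#include <stdio.h>

/* MSI address: 0xfee00000 | (destination LAPIC ID << 12) */
static uint64_t msi_address(uint8_t apic_id) {
    return 0xfee00000ull | ((uint64_t)apic_id << 12);
}

/* MSI data: vector in bits 7:0, "fixed" delivery mode (000) in bits 10:8,
   edge-triggered (bit 15 clear) */
static uint32_t msi_data(uint8_t vector) {
    return vector;
}

int main(void) {
    /* e.g. deliver vector 0x41 to the CPU whose LAPIC ID is 2 */
    printf("addr=0x%llx data=0x%x\n",
           (unsigned long long)msi_address(2), msi_data(0x41));
    return 0;
}
```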

MSI-X was introduced around 2003 with PCI 3.0 and allows up to 2048 interrupt vectors per *device function*, vs 32 with the older MSI. It's important to distinguish between a device and a device function, since technology such as SR-IOV can expose multiple functions, each with its own 2048-vector limit. We'll look at this again shortly.
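Roughly, a function that advertises MSI-X points at a table of 16-byte entries living inside one of its BARs, and the table size comes out of the capability's Message Control register. A sketch of that layout (per the PCI spec, shown here just to make the "up to 2048 vectors" concrete):

```c
// One MSI-X table entry (16 bytes each). A function advertising MSI-X
// exposes an array of these inside one of its BARs.
#include <stdint.h>

struct msix_entry {
    uint32_t msg_addr_lo;   /* low 32 bits of the MSI address */
    uint32_t msg_addr_hi;   /* high 32 bits (0 below 4 GiB) */
    uint32_t msg_data;      /* the MSI data word (vector etc.) */
    uint32_t vector_ctrl;   /* bit 0: mask this vector */
};

/* Table size is encoded as N-1 in bits 10:0 of the capability's
   Message Control register, hence the 2048 ceiling. */
static inline unsigned msix_table_size(uint16_t msg_ctrl) {
    return (msg_ctrl & 0x7ff) + 1;
}
```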

So here in 2026 MSI has largely been superseded by MSI-X, which is what devices like the ENA use.

What is ENA

ENA, the Elastic Network Adapter, is the virtualized network device (and matching driver) used by AWS instances on the Nitro system. It relies on SR-IOV (single-root input/output virtualization), which allows multiple virtual devices to share a single physical device (like a network card or a GPU).

A simple network card might have a single TX/RX queue pair for the CPU to handle packets, but you can have multiple queues if you need greater throughput. We added multi-queue support to Nanos a while back. Essentially, as you add more vCPUs you can add more TX/RX queue pairs to process packets in parallel. The ENA device exposes a TX/RX queue pair per CPU core, and each pair gets its own MSI-X interrupt vector. Remember how we discussed MSI?
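In code, the shape of that design looks roughly like the sketch below (the names are made up for illustration - this is not the Nanos or ENA driver API): one TX/RX queue pair per vCPU, each wired to its own MSI-X vector so completions land on "their" CPU.

```c
// Hypothetical sketch of the multi-queue layout described above.
#include <stdint.h>
#include <stdlib.h>

struct queue_pair {
    int      cpu;            /* vCPU this pair is associated with */
    uint16_t msix_vector;    /* MSI-X table index for this pair's interrupt */
    void    *tx_sq, *tx_cq;  /* TX submission/completion rings */
    void    *rx_sq, *rx_cq;  /* RX submission/completion rings */
};

struct queue_pair *setup_queues(int ncpus) {
    struct queue_pair *qs = calloc(ncpus, sizeof(*qs));
    if (!qs)
        return NULL;
    for (int i = 0; i < ncpus; i++) {
        qs[i].cpu = i;
        qs[i].msix_vector = (uint16_t)i;  /* one vector per queue pair */
        /* ring allocation and device setup would happen here */
    }
    return qs;
}
```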

I/O is submitted through each TX/RX submission queue (SQ), and each SQ is paired with a completion queue (CQ). These are implemented as descriptor rings.
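A descriptor ring in its simplest hypothetical form looks something like this - the driver produces entries at the tail of the SQ, and the device consumes them and posts results to the paired CQ:

```c
// A bare-bones descriptor ring, for illustration only.
#include <stdint.h>

#define RING_SIZE 256                  /* power of two makes wrapping cheap */

struct desc {
    uint64_t buf_addr;                 /* physical address of the buffer */
    uint16_t length;
    uint16_t flags;
};

struct ring {
    struct desc entries[RING_SIZE];
    uint16_t head;                     /* consumer index */
    uint16_t tail;                     /* producer index */
};

/* driver side: post one descriptor to a submission queue */
int ring_push(struct ring *r, struct desc d) {
    uint16_t next = (r->tail + 1) & (RING_SIZE - 1);
    if (next == r->head)
        return -1;                     /* ring full */
    r->entries[r->tail] = d;
    r->tail = next;                    /* then ring the doorbell with 'next' */
    return 0;
}
```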

ENA supports things like checksum offload and TCP segmentation offload (TSO), which most modern NICs do and which can contribute a lot to throughput.

What is LLQ

Finally.

LLQ stands for low latency queue. It is the (optional) ENA feature used on the m8a instance types and the core of the work we needed to do to enable this instance type (noting again that all the other pieces had to come first).

In regular mode the TX SQs live in host memory, and the ENA device fetches descriptors and packet data from there.

In LLQ mode the driver pushes the TX descriptors and the first 128 bytes of the packet data directly into the ENA device's memory; the rest of the packet data is then fetched by the device as usual. To do this, the driver maps a dedicated PCI BAR that it only ever writes to.

Note that this only works for the TX side - RX SQs operate in regular mode.
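To make the difference concrete, here's a hypothetical contrast of the two TX paths (names invented for illustration, not the actual ENA driver code): regular mode leaves everything in host RAM for the device to DMA later, while LLQ copies the descriptor plus the packet's first bytes straight into the BAR-mapped device memory.

```c
// Hypothetical contrast of the two ENA TX paths.
#include <stdint.h>
#include <string.h>

#define LLQ_HDR_BYTES 128   /* header bytes pushed inline, per the post above */

struct tx_desc { uint64_t buf_addr; uint16_t length; uint16_t flags; };

/* Regular mode: the descriptor lives in a host-memory ring; the device
   fetches it (and then the packet data) over DMA after the doorbell write. */
void tx_regular(struct tx_desc *host_ring, uint16_t slot,
                struct tx_desc d, volatile uint32_t *doorbell) {
    host_ring[slot] = d;
    *doorbell = slot + 1;             /* tell the device how far we wrote */
}

/* LLQ mode: the descriptor and the first LLQ_HDR_BYTES of the packet are
   written directly into the device's BAR-mapped memory; only the remainder
   is fetched by the device over DMA. */
void tx_llq(volatile uint8_t *llq_window, struct tx_desc d,
            const uint8_t *pkt, size_t pkt_len) {
    uint8_t entry[sizeof(struct tx_desc) + LLQ_HDR_BYTES] = { 0 };
    size_t hdr = pkt_len < LLQ_HDR_BYTES ? pkt_len : LLQ_HDR_BYTES;
    memcpy(entry, &d, sizeof(d));
    memcpy(entry + sizeof(d), pkt, hdr);
    /* one burst of writes into device memory -- never read back */
    memcpy((void *)llq_window, entry, sizeof(entry));
}
```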

Here we go again.

What is a BAR

A base address register (BAR) acts kinda like a pointer: it's a register in a device's PCI config space that says where a region of device memory (or I/O ports) shows up, so the CPU can read/write the device through it.

As we mentioned, LLQ pushes TX descriptor writes into ENA memory through a dedicated PCI BAR. Remember when we briefly talked about TLPs on the PCIe network?

When a TLP is routed by ID (not the only routing method), the destination is identified as <bus>:<device>.<function>. Each function has 6 BARs available, and a device (with up to 8 functions) can have up to 48. Remember how we mentioned SR-IOV can expose multiple functions? LLQ uses one of those BARs on one of those functions, and that, my friends, is how you make an LLQ omelette.
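If you want to see how a bus:device.function triple actually turns into a config-space access, here's a freestanding sketch using the legacy 0xcf8/0xcfc port mechanism (ring 0 only; modern systems also use memory-mapped ECAM, but the idea is the same). BAR0 lives at config offset 0x10, with the six BARs at 0x10 through 0x24.

```c
// Freestanding sketch of a PCI config-space read by bus:device.function.
#include <stdint.h>

static inline void outl(uint16_t port, uint32_t val) {
    __asm__ volatile("outl %0, %1" : : "a"(val), "Nd"(port));
}
static inline uint32_t inl(uint16_t port) {
    uint32_t val;
    __asm__ volatile("inl %1, %0" : "=a"(val) : "Nd"(port));
    return val;
}

uint32_t pci_cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off) {
    uint32_t addr = (1u << 31)            /* enable bit */
                  | ((uint32_t)bus << 16)
                  | ((uint32_t)dev << 11)
                  | ((uint32_t)fn  << 8)
                  | (off & 0xfc);         /* dword-aligned register */
    outl(0xcf8, addr);
    return inl(0xcfc);
}

/* e.g. read BAR0 of 00:05.0: pci_cfg_read32(0, 5, 0, 0x10) */
```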

What does this all mean in English? New hardware (in the form of new instance types) routinely crops up, and that generally means we need to go in and add some code. So even though this particular chunk of code was not large, it was built on many, many other layers that had to come first.

Computer architecture has a very long, winding history, and it can be difficult to understand how newer features such as this "simple" LLQ feature get enabled if you don't understand the underlying architectures at play. While you might never implement any of this yourself, I hope it gives you a better understanding of how all these technologies work together and how we can make brand new instance types like the m8a, released just a few months ago, work.
