Zero Copy with copy_file_range

One of our users asked about running WildFly, and while putting together the package for it I noticed a call to copy_file_range was failing. In many cases these kinds of functions are used opportunistically: when they are unavailable, the caller falls back on other, less performant ways of doing the same thing. You can see this happening in this trace:

[DeploymentScann] futex(0x000000017c6db6f0, 393, 0, 0x0000000000000000,
0x0000000000000000, 4294967295 
[Controller Boot] copy_file_range() = -ENOSYS
[Controller Boot] sleep uninterruptible
[Controller Boot] sendfile(462, 436, NULL, 2147479552 

What happens in various versions of the JDK is that it tries to discover whether copy_file_range is supported by looking up its address dynamically:

my_copy_file_range_func =
        (copy_file_range_func*) dlsym(RTLD_DEFAULT, "copy_file_range");

If it finds the symbol, it calls it; otherwise it falls back to sendfile.

Java_sun_nio_fs_LinuxNativeDispatcher_directCopy0
    (JNIEnv* env, jclass this, jint dst, jint src, jlong cancelAddress)
{
 ...*snip*...

    if (my_copy_file_range_func != NULL) {
        do {
            RESTARTABLE(my_copy_file_range_func(src, NULL, dst, NULL, count, 0),
                                                bytes_sent);
....*snip*...
  }

    do {
        RESTARTABLE(sendfile(dst, src, NULL, count), bytes_sent);
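The same lookup-then-fall-back pattern can be sketched outside the JDK. This is a minimal illustration, not the JDK's code: the helper name copy_with_fallback is hypothetical, and treating ENOSYS, EXDEV and EINVAL as "switch to sendfile" is an assumption made for the example.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Pointer type matching the copy_file_range(2) prototype. */
typedef ssize_t copy_file_range_fn(int, off_t *, int, off_t *, size_t,
                                   unsigned int);

/* Hypothetical helper: copy src_path to dst_path, preferring
 * copy_file_range when libc exports it and sendfile otherwise.
 * Returns 0 on success, -1 on error. */
int copy_with_fallback(const char *src_path, const char *dst_path)
{
    /* Resolve the symbol at run time instead of link time. */
    copy_file_range_fn *cfr =
        (copy_file_range_fn *) dlsym(RTLD_DEFAULT, "copy_file_range");
    struct stat st;
    off_t left;
    int rc = -1;

    int src = open(src_path, O_RDONLY);
    if (src == -1)
        return -1;
    int dst = open(dst_path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (dst == -1 || fstat(src, &st) == -1)
        goto out;

    left = st.st_size;
    while (left > 0) {
        ssize_t n;
        if (cfr != NULL) {
            n = cfr(src, NULL, dst, NULL, (size_t) left, 0);
            /* Assumption: these errors mean "not usable here", so
             * switch to sendfile for the rest of the copy. */
            if (n == -1 &&
                (errno == ENOSYS || errno == EXDEV || errno == EINVAL)) {
                cfr = NULL;
                continue;
            }
        } else {
            /* Note the argument order: out_fd first, then in_fd. */
            n = sendfile(dst, src, NULL, (size_t) left);
        }
        if (n == -1 && errno == EINTR)
            continue;   /* interrupted: retry the call */
        if (n <= 0)
            goto out;   /* real error, or source ended early */
        left -= n;
    }
    rc = 0;
out:
    close(src);
    if (dst != -1)
        close(dst);
    return rc;
}
```

A call such as copy_with_fallback("in.bin", "out.bin") would take the sendfile branch on a system like the one in the trace above, where copy_file_range returns -ENOSYS. On glibc older than 2.34 the program needs to be linked with -ldl for dlsym.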

Typically, if you want to copy one file to another, you open the source, read its contents from the kernel into a userspace buffer, and then write that buffer back into the kernel for the destination file. You usually have to do this in many chunks, too. This can be slow. copy_file_range instead performs the copy from one fd to another entirely within the kernel, skipping the kernel<>user round trips. Functions like this are considered zero-copy mechanisms. io_uring also exists in this realm.
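For contrast, the traditional userspace copy might look roughly like the sketch below. The helper name copy_via_userspace and the 64 KiB buffer size are arbitrary choices for illustration; the point is that every chunk crosses the kernel/user boundary twice.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper showing the classic read/write copy loop.
 * Returns 0 on success, -1 on error. */
int copy_via_userspace(const char *src_path, const char *dst_path)
{
    char buf[64 * 1024];    /* userspace bounce buffer, size is arbitrary */
    int rc = -1;

    int src = open(src_path, O_RDONLY);
    if (src == -1)
        return -1;
    int dst = open(dst_path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (dst == -1) {
        close(src);
        return -1;
    }

    for (;;) {
        /* kernel -> user: copy the next chunk into our buffer */
        ssize_t n = read(src, buf, sizeof buf);
        if (n == 0) {       /* end of file: everything was copied */
            rc = 0;
            break;
        }
        if (n == -1) {
            if (errno == EINTR)
                continue;
            break;
        }

        /* user -> kernel: write may be partial, so loop until done */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(dst, buf + off, (size_t) (n - off));
            if (w == -1) {
                if (errno == EINTR)
                    continue;
                goto out;
            }
            off += w;
        }
    }
out:
    close(src);
    close(dst);
    return rc;
}
```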

An example I ripped from the man page might look like the following:

#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
   int          fd_in, fd_out;
   off_t        size, ret;
   struct stat  stat;

   if (argc != 3) {
       fprintf(stderr, "Usage: %s <source> <destination>\n", argv[0]);
       exit(EXIT_FAILURE);
   }

   fd_in = open(argv[1], O_RDONLY);
   if (fd_in == -1) {
       perror("open (argv[1])");
       exit(EXIT_FAILURE);
   }

   if (fstat(fd_in, &stat) == -1) {
       perror("fstat");
       exit(EXIT_FAILURE);
   }

   size = stat.st_size;

   fd_out = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, 0644);
   if (fd_out == -1) {
       perror("open (argv[2])");
       exit(EXIT_FAILURE);
   }

   do {
       ret = copy_file_range(fd_in, NULL, fd_out, NULL, size, 0);
       if (ret == -1) {
           perror("copy_file_range");
           exit(EXIT_FAILURE);
       }

       size -= ret;
   } while (size > 0 && ret > 0);

   close(fd_in);
   close(fd_out);
   exit(EXIT_SUCCESS);
}

As the fallback path earlier showed, the related syscall sendfile is very similar. However, sendfile only takes an input offset, whereas copy_file_range allows both the input and the output offset to be set. On filesystems that support COW (copy on write), it can also allow near-instant copies: reflinks are created that point at the same data blocks, and the actual duplication only happens when one of the files is changed.
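To illustrate the offset parameters, here is a small sketch that copies a byte range from the middle of one file to a chosen position in another. The helper name copy_range_at is hypothetical, and the sketch assumes a kernel and libc where copy_file_range is available.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes starting at src_off in src_path
 * to dst_off in dst_path. Returns the number of bytes copied (which may
 * be short if the source ends early), or -1 on error. */
ssize_t copy_range_at(const char *src_path, off_t src_off,
                      const char *dst_path, off_t dst_off, size_t len)
{
    ssize_t total = 0;

    int src = open(src_path, O_RDONLY);
    if (src == -1)
        return -1;
    int dst = open(dst_path, O_CREAT | O_WRONLY, 0644);
    if (dst == -1) {
        close(src);
        return -1;
    }

    while (len > 0) {
        /* With non-NULL offsets the kernel reads and writes at those
         * positions and advances them by the bytes copied; the fds'
         * own file positions are left untouched. */
        ssize_t n = copy_file_range(src, &src_off, dst, &dst_off, len, 0);
        if (n == -1) {
            total = -1;
            break;
        }
        if (n == 0)         /* reached the end of the source */
            break;
        len -= (size_t) n;
        total += n;
    }

    close(src);
    close(dst);
    return total;
}
```

For example, copy_range_at("a.bin", 4096, "b.bin", 0, 8192) would ask the kernel to place 8 KiB taken from offset 4096 of a.bin at the start of b.bin.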

At first blush copy_file_range looks almost identical to another syscall, splice:

ssize_t copy_file_range(int fd_in, off_t *_Nullable off_in,
                    int fd_out, off_t *_Nullable off_out,
                    size_t size, unsigned int flags);
ssize_t splice(int fd_in, off_t *_Nullable off_in,
                    int fd_out, off_t *_Nullable off_out,
                    size_t size, unsigned int flags);

The difference is that with splice one of the fds has to be a pipe, so a file-to-file copy needs two calls with a pipe in between. copy_file_range will generally be the faster choice for file-to-file copying. So now you know the difference between these syscalls and when you might want to use one over the other. 'Til next time.
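To make the pipe requirement concrete, a file-to-file copy with splice could be sketched as below. The helper name copy_via_splice and the 64 KiB chunk size are assumptions for illustration; each chunk is spliced from the source into a pipe and then from the pipe into the destination.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: copy src_path to dst_path with splice, using a
 * pipe as the required intermediary. Returns 0 on success, -1 on error. */
int copy_via_splice(const char *src_path, const char *dst_path)
{
    int pipefd[2];
    int rc = -1;

    int src = open(src_path, O_RDONLY);
    if (src == -1)
        return -1;
    int dst = open(dst_path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (dst == -1) {
        close(src);
        return -1;
    }
    if (pipe(pipefd) == -1) {
        close(src);
        close(dst);
        return -1;
    }

    for (;;) {
        /* file -> pipe: queue up to 64 KiB in the pipe */
        ssize_t in = splice(src, NULL, pipefd[1], NULL, 65536, 0);
        if (in == 0) {      /* end of source file */
            rc = 0;
            break;
        }
        if (in == -1)
            break;

        /* pipe -> file: drain everything that was just queued */
        while (in > 0) {
            ssize_t out = splice(pipefd[0], NULL, dst, NULL, (size_t) in, 0);
            if (out <= 0)
                goto out;
            in -= out;
        }
    }
out:
    close(pipefd[0]);
    close(pipefd[1]);
    close(src);
    close(dst);
    return rc;
}
```

Compared with a single copy_file_range call per chunk, this needs an extra pipe and two syscalls per chunk, which is one reason to prefer copy_file_range when both ends are regular files.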
