Creation and Usage of Machine Learning Classification Models Using Unikernels

This is a guest blogpost by Danylo Kuropiatnyk.

Introduction

What do you think are the biggest problems in the IT industry? Let’s imagine that a high-level, well-bearded IT specialist from some famous company is sitting in front of you, and you are just a newbie in this field (even if this isn’t true, don’t ruin my story). You could start with something about large servers and storing enormous data sets that are expensive and slow. But the specialist would easily destroy your arguments by mentioning things such as cloud services, containers for data packing, etc.

Okay, second try. How about the time it takes just to start an app? Wouldn’t it be slow? Unfortunately, your opponent could say that containers (Docker, often orchestrated with Kubernetes, etc.) or virtual machines (like VirtualBox) are quite swift, taking only a few minutes for an application to start.

But isn’t it difficult to rewrite a piece of source code just because the platform changes? And how about system dependencies? Well, this is the main goal of containers - making the process of compiling and building an app easier and more portable. Also, the specialist could refer to AWS and say that all cloud computing uses Docker.

Forget it, you have one more chance. Security? What could happen if an easily portable system gets hacked? Here the specialist would try to make your head spin with terms like "namespace isolation", "control groups" and even "attack surface".

Yeah, it’s better to just finish your conversation, but you will leave without knowing that you were right!

Absolutely right, from beginning to end. However, so was your opponent. How is that possible?

Well, containers are trendy, with many large companies using them. Currently, this is a good decision in the market. That is, only if we forget about unikernels.

What are unikernels

First things first. There isn’t a better definition of some technology than the one from its official site: "Unikernels are specialized, single-address-space machine images constructed by using library operating systems."

Let’s figure out what this means. The triumph of brevity in programming - that's what it is. The logic is very simple. There is no necessity to store anything that the application doesn’t need - no utilities, never-used drivers or libraries, etc. It has the ideal structure - application code plus kernel for it. And when I say "kernel", I mean a lightweight operating system with only one task - to service an application.

Let’s compare the abilities of a unikernel to those of the already mentioned containers.

  • A giant reduction in memory and resource usage. A minimized OS image leaves a minimal footprint and takes very little storage.
  • Unreal booting time. Instead of waiting a few minutes to boot a normal VM or container, unikernels can boot in a few milliseconds! And there is still room for improvement.
  • Absolute independence from the host OS and rubbish data. There is no need for the complicated userspace install commonly found on Ubuntu VMs or even in Alpine containers.
  • Dramatic security improvement. There is no SSH connection or shell, and utility programs don’t exist at all, so random software bugs and overflows are much harder to turn into attacks. And even if a criminal could break into a unikernel, all that he would get is a binary file: no passwords or anything else. You can’t even run other programs. Have fun, bad hacker.

As you may see, your debate partner would be easily destroyed by this information.

Of course, unikernels aren’t ideal for everything. There will always be difficult and multi-layered applications with a large number of tasks. But with the help of unikernels, it is possible to split some tasks into single units and make machine images for them.

However, for some use cases, unikernels look like the best option. The first thing that comes to mind is the Internet of Things (IoT). It is an ideal scenario for unikernels, especially after the legendary Mirai attack.

Environment Preparation

Now that we know what a unikernel is, let’s start programming! The main goal of this part is to show some real opportunities that everybody could use and test right away. First of all, we should choose a tool for making the unikernels. I prefer ops because it follows the same concept as the unikernel philosophy: easier = better. And it supports Python (even Python 2, by the way).

Actually, ops is quite a powerful tool, for some good reasons:

  • It’s free and open-source (so you could easily adapt the original code for your own purposes).
  • It supports a large list of programming languages (you can check examples for each language here).
  • It’s pretty fast because it uses virtualization with KVM. It works on various Unix-based systems (Fedora, Ubuntu, Debian, CentOS, and macOS). In the cloud you can even deploy your applications as VMs without Linux.

Okay, now let’s do some cool unikernels! Let’s start with installing ops. I have Ubuntu 18.04, but the process is pretty much the same no matter what OS you are using.

The first step is installing ops and QEMU on your OS. You can download ops from https://ops.city. In the end, you should get something like this:

$ ops version
0.1.6

Now we need to create our application. As I mentioned before, we will do this using Python (I assume you know how to install it, don’t you?). Let’s create the most common and most popular application in software history. Of course, I am talking about "Hello world!".

Create a file called hello.py with the following code:

print('Hello world!')

Save it, and open the terminal in the folder where that file exists. Then, all we need to do to create a unikernel is to use one simple line:

$ ops load python_3.6.7 -a hello.py

This command loads the Python package (the first time, it will also download the package from the official servers) and tries to make an image of the code and all required Python modules. Important note: don't forget the '-a' parameter, because it tells Python which file you want to run.

So the result should be the following:

[python3 hello.py]
booting /home/danylo/.ops/images/python3.img ...
assigned: 10.0.2.15
Hello world!
exit status 1

Ta-dah! Congratulations on your first unikernel! Now, you can find the .img file at the path shown on the second line of the output. In my case it’s “/home/danylo/.ops/images/python3.img”. And yeah, this beautiful file has everything for independent existence.

But what if we want to do more complicated apps? What if, for example, we want to use a few files or even folders? Well, ops has a nice feature (one that is almost required in real programming). I'm talking about the config.json file. It helps ops to configure an application correctly. Let’s replace the contents of hello.py with the following line:

print(open("hello.txt").readline())

After that, in the same folder, create a file hello.txt with some text (in my case - "Hello world from txt file!"). If you run this immediately with the previous command, it will fail because hello.txt is not part of the image, so we need to create a config.json file with some instructions for ops:

{
  "Files": [ "hello.py", "hello.txt" ],
  "Args": ["hello.py" ]
}

Save it and use the following ops command in the terminal:

$ ops load python_3.6.7 -c config.json

As you may see, the -a parameter has disappeared. We replaced it with the "Args" key in the configuration file. The result will look like the following:

[python3 hello.py]
booting /home/danylo/.ops/images/python3.img ...
assigned: 10.0.2.15
Hello world from txt file!
exit status 1

It works! Configuration allows the user to make customized unikernels, so I highly recommend this page. Believe me, with serious usage of config.json, opportunities are unlimited.
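Because a build fails when a file the script needs is not in the image, a small pre-flight check can be handy. Here is a minimal sketch, assuming only the "Files" and "Args" keys shown above. The helpers write_config and missing_files are hypothetical names for this example and are not part of ops. The check simply reports which "Files" entries do not exist next to config.json:

```python
import json
import tempfile
from pathlib import Path


def write_config(folder, files, args):
    """Write a config.json with the "Files" and "Args" keys used above."""
    config = {"Files": list(files), "Args": list(args)}
    path = Path(folder) / "config.json"
    path.write_text(json.dumps(config, indent=2))
    return path


def missing_files(config_path):
    """Return the "Files" entries that do not exist next to the config."""
    config_path = Path(config_path)
    config = json.loads(config_path.read_text())
    return [name for name in config.get("Files", [])
            if not (config_path.parent / name).is_file()]


# Demo in a throwaway folder: hello.py exists, hello.txt does not yet.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "hello.py").write_text('print(open("hello.txt").readline())\n')
    config_path = write_config(folder, ["hello.py", "hello.txt"], ["hello.py"])
    missing_before = missing_files(config_path)
    (Path(folder) / "hello.txt").write_text("Hello world from txt file!\n")
    missing_after = missing_files(config_path)

print(missing_before)  # ['hello.txt']
print(missing_after)   # []
```

Running a check like this before ops load points at the forgotten file directly instead of leaving you to read a failed build.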

ML Model Creation

The biggest thing that I hate in different tutorials is their simplicity. Almost everyone has an easy demo example of something. However, this tutorial isn't typical, so we are moving on. Let’s raise the stakes and try to make a more difficult but at the same time useful application.

What do you know about machine learning? It’s a popular, widespread, and very powerful tool for computer science and for automating routine processes, and models play a key role in this ecosystem. A model is an object that is trained on training data and is then able to make predictions for different purposes. So, let’s make a simple ML model and create a cool unikernel.

The first thing we need is the sklearn module (published on PyPI as scikit-learn). It’s one of the most popular and best-documented modules in the ML community. Install it using pip.

After that, we should decide what kind of model we want to create. It will be a well-known example - the iris classification.

The data consists of 150 samples of 3 iris species (50 per species), each described by four features (sepal length, sepal width, petal length, and petal width) that may help us classify them.

Our task is to create a model that will give accurate predictions for new samples.
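Before training anything, it can help to look at what the dataset actually contains. The sketch below uses only load_iris from sklearn; the counts dictionary is just an illustration built for this example:

```python
from sklearn.datasets import load_iris

# load_iris() without arguments returns an object with data, target and names.
iris = load_iris()

print(iris.data.shape)          # (150, 4): 150 samples, 4 features each
print(list(iris.target_names))  # the three species names
print(iris.feature_names)       # the four measured features

# Count the samples per species using the integer labels in iris.target.
counts = {str(name): int((iris.target == label).sum())
          for label, name in enumerate(iris.target_names)}
print(counts)  # {'setosa': 50, 'versicolor': 50, 'virginica': 50}
```

The classes are perfectly balanced, which is one reason this dataset is such a friendly first example.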

Create a model.py file and put in these imports:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn import metrics, svm

Here we import all the necessary methods and submodules. Below are some explanations:

load_iris - for using the above-mentioned iris dataset

train_test_split - for splitting all data into train (for learning a model) and test (for checking) sub-datasets

metrics - for making a detailed report about the test results

svm - for using support vector machine (one of the most popular ML methods).

Okay, all prerequisites are satisfied, let’s create our model. Paste the code below into model.py:

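# Load the features (X) and the class labels (y)
X_iris, y_iris = load_iris(return_X_y=True)
# Hold out 33% of the samples for testing; the fixed seed keeps the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X_iris, y_iris, test_size=0.33, random_state=42)
# Train a linear support vector classifier on the training part
clf = svm.LinearSVC()
clf.fit(X_train, y_train)
# Predict the test part and print per-class metrics
y_pred = clf.predict(X_test)
print(metrics.classification_report(y_test, y_pred))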

Intuitively easy code, right? It loads the iris dataset that ships with sklearn (already split into features X and answers y), then makes train and test data, creates a model and fits it on the training data, and finally prints a report with the prediction results. That’s all!

ML Model Inference

So we know how to make a unikernel and how to create a machine learning model, but what about mixing these techniques? There is no easier task.

First of all, you need to take care of site-packages (in our case, sklearn and its dependencies). For this purpose, config.json contains the "MapDirs" key. This allows adding folders and files directly into the file system of the image:

{
  "MapDirs": {"/home/danylo/.local/*": "/.local" },
  "Dirs": ["usr", "lib"],
  "Files": ["model.py"]
}

There's probably a better way to do this but I didn't find it immediately. What I do is first rm -rf ~/.local (careful: this deletes everything already installed there). Then I pip install sklearn and then copy over the folder so I know sklearn is the only thing installed.
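The /home/danylo path in the config above only fits one machine. As a minimal sketch, assuming the same three keys as that config, the file could be generated from Python so that the home directory is filled in for whoever runs it. build_config is a hypothetical helper for this example, not something ops provides:

```python
import json
from pathlib import Path


def build_config(home, script="model.py"):
    """Return a config dict that maps <home>/.local into the image as /.local."""
    local_glob = f"{Path(home).as_posix()}/.local/*"
    return {
        "MapDirs": {local_glob: "/.local"},
        "Dirs": ["usr", "lib"],
        "Files": [script],
    }


# Use the home directory of the current user instead of a hard-coded one.
config = build_config(Path.home())
print(json.dumps(config, indent=2))
```

Writing the printed JSON to config.json gives the same structure as the hand-written file, with only the home directory changed.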

I also found that the following libraries needed to be added to the filesystem, as they were not in the official python3 package yet. There's an issue on GitHub about whether they should go into the official package or a new one should be made. Nonetheless, ops makes this super easy to do if you find yourself in a similar situation.

danylo@s1:~/pt3$ tree lib
lib
└── x86_64-linux-gnu
    ├── libgcc_s.so.1
    └── librt.so.1

1 directory, 2 files
danylo@s1:~/pt3$ tree usr
usr
└── lib
    └── x86_64-linux-gnu
        ├── libffi.so.6
        └── libstdc++.so.6

2 directories, 2 files

After that, we can run our model with the same ops load command as before, which now picks up the new config.json. It will rebuild the disk image with the new files (and model.py too, of course). You should get the following result:

$ ops load python_3.6.7 -c config.json
[python3 model.py]
booting /home/danylo/.ops/images/python3.img ...
assigned: 10.0.2.15
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       0.94      1.00      0.97        15
           2       1.00      0.94      0.97        16

    accuracy                           0.98        50
   macro avg       0.98      0.98      0.98        50
weighted avg       0.98      0.98      0.98        50
exit status 1

It looks like everything works pretty well! Don’t be scared by the result: it’s a typical classification report for a model. Let’s walk through each part of this report.

Rows 0-2 contain detailed data about each iris class (as you might remember, the dataset has 3 different classes). Here we can evaluate not only accuracy but also some more advanced metrics of our predictor. Precision shows how many of the objects predicted as class n really belong to class n. For example, let’s say your model predicted 40 irises as class zero, but 20 of them actually belong to other classes. That means that for this class, your model works with 0.5 precision (50%). Try to avoid such results; the model should definitely do better.

Recall helps to find out how many of the real objects of class n were predicted as class n. Let’s say that the test dataset has 30 first-class irises, and your model correctly predicted 25 of them. In this case, the recall will be 0.83 (83%) - not so bad, by the way.

F1-score is a metric that combines precision and recall. Actually, it’s simply the harmonic mean of precision and recall, which gives us a single number for how good our model is on a class.

Support is the number of test objects in the corresponding class. The last few rows summarize the results over all classes: macro avg is the plain average of the per-class values, and weighted avg weights each class by its support. Accuracy is the share of all test objects that were classified correctly. In our case, it was 0.98, which is really cool!
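To make these definitions concrete, here is a small pure-Python sketch that recomputes the examples from the text. The function names are my own, and the counts for row "1" (15 hits, 1 false alarm, 0 misses) are an assumption inferred from the report, since the report itself only shows the rounded ratios:

```python
def precision(tp, fp):
    """Share of objects predicted as the class that really belong to it."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of real objects of the class that the model found."""
    return tp / (tp + fn)


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Precision example from the text: 40 predicted, 20 of them wrong.
print(precision(tp=20, fp=20))        # 0.5

# Recall example from the text: 30 real objects, 25 of them found.
print(round(recall(tp=25, fn=5), 2))  # 0.83

# Row "1" of the report, assuming 15 hits, 1 false alarm and 0 misses.
p1, r1 = precision(tp=15, fp=1), recall(tp=15, fn=0)
print(round(p1, 2), round(r1, 2), round(f1_score(p1, r1), 2))  # 0.94 1.0 0.97
```

With those assumed counts the three numbers match row "1" of the report.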

Also, as you may see, the final model has almost perfect results for each class in our dataset. It means that we could proudly say: “Yeah, our model is ready”.
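If you want to see where the few mistakes happen, a confusion matrix is a compact companion to the report. This is a sketch rather than part of the original model.py: it repeats the training steps from above so that it runs on its own, and the max_iter and random_state arguments of LinearSVC are my additions to keep the run reproducible, so the numbers may differ slightly from the report shown earlier.

```python
from sklearn import metrics, svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X_iris, y_iris = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X_iris, y_iris, test_size=0.33, random_state=42)

# max_iter and random_state are assumptions added for this sketch.
clf = svm.LinearSVC(max_iter=10000, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are the true classes, columns are the predicted classes.
matrix = metrics.confusion_matrix(y_test, y_pred)
print(matrix)
print("test samples:", matrix.sum())
print("correct predictions:", matrix.trace())
```

Everything on the diagonal is a correct prediction; any off-diagonal cell shows which species was mistaken for which.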

Let’s check this classification monster on some specific test objects. Add these lines in model.py:

new_iris = X_test[42].reshape(1, -1)
print("Real iris species:", y_test[42])
print("Predicted iris species:", clf.predict(new_iris))

Here we take a single iris test object (with index 42 - don’t forget about the answer to life, the universe, and everything) and bring it into the required shape. Our model takes 2D arrays, so reshape(1, -1) turns the flat row into an array with exactly one row, and the -1 lets NumPy work out the number of columns (four, one per feature). Let’s try our model in the real test!

Rebuild the unikernel with the ops load command and wait for the following output:
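The effect of reshape(1, -1) is easy to check in isolation. In this sketch the four measurements are made-up values for illustration, not a row taken from the dataset:

```python
import numpy as np

# Hypothetical measurements for one flower (four features, as in iris).
row = np.array([6.1, 2.8, 4.7, 1.2])
print(row.shape)     # (4,)

# reshape(1, -1): exactly one row; -1 lets NumPy infer the column count.
sample = row.reshape(1, -1)
print(sample.shape)  # (1, 4)

# The same call turns any flat array into a single-row 2D array.
print(np.arange(6).reshape(1, -1).shape)  # (1, 6)
```

A single-row 2D array like sample is the shape that predict expects when you classify one object at a time.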

Real iris species: 1
Predicted iris species: [1]

Our model works and the unikernel is independent, so here we are! Just imagine that we could refresh the training data for the model with requests to some storage that holds new objects. Then this iris OS (I wanted to name it iOS, but this name is taken) could become an automated and very powerful system with lightweight prerequisites and a quick response.

Conclusion

In this article, we have found out how to work with unikernels and why they are useful. Let’s summarize the key ideas:

  • Unikernels are powerful, easy-to-use, lightweight, and very experimental.
  • If you use ops for making the unikernel, you will be able to work with a large number of languages on different platforms.
  • Implementing an ML model with ops is pretty simple.
  • The iris classification inference could be extended for real projects, but even in this test example, the results are good. Don’t be afraid to experiment with the techniques mentioned here, as all of them have a responsive community that can help with most issues.

Hope to see you soon as the pioneer of unikernels. Have fun!
