Boosting application boot time with container snapshots

Recently I came across the CRIU technology. It lets you checkpoint any running application and serialize its state to disk, so you can resume it later from that state. What's more interesting is that it comes with Docker integration, potentially allowing you to run a container, take a serializable snapshot of it and recreate it later - possibly even on another host.

This technology might be beneficial for live migrations (in fact, Google uses it to live migrate batch jobs in Borg) - but what excited me is that it could help with the long boot time problem. As a Rails app grows, it ends up with more Ruby code to parse and load on boot, which makes the startup time quite long. Autoloading and bootsnap help in local and CI environments, but production (where you want to eager-load everything) is still quite slow. It's not uncommon for some of the largest monoliths to take over a minute to start up before they're able to serve requests.

Note that I'm using Rails as an example, but technically this applies to any app written in a scripting language with an ever-growing codebase and number of dependencies.

If we could prepare a snapshot of a live application server beforehand and use that to start containers in production, maybe we could save some of the boot time? That's what I wanted to explore.

In brief, this post covers: 1) setting up a lab with Docker + CRIU to snapshot and restore containers, 2) automating that with a script and leveraging compute instances in Google Cloud, and 3) measuring the savings.

Setting up a lab

All the CRIU magic is based on Linux kernel features, so Docker for Mac is not an option. I would have to set up a Linux VM with all the dependencies.

One option would be to spin up an instance on AWS or GCP, but I already had VMware on my Mac, and I wanted to save some terminal latency (my ISP in France was not great!). I went with an Alpine Linux VM in VMware, since I'd heard that Alpine is a good lightweight distribution. It wasn't too hard to install CRIU and Docker on it with apk. However, as I tried to verify the setup with criu check, I found that for some reason the Linux kernel that ships with Alpine doesn't have all the features needed for CRIU.

I wasn't looking forward to building my own kernel, so I went ahead with Ubuntu Server 18.04 LTS, which would hopefully come with a full-featured kernel.

I followed the CRIU docs for Docker. I noticed that they were disabling seccomp (with a note that a newer kernel is required) and that the container network was disabled, as CRIU won't checkpoint any open TCP connections. I decided to try it anyway and see if that became an issue later.
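For reference, docker checkpoint is gated behind the Docker daemon's experimental mode, so before anything else the setup looks roughly like this (the daemon.json path assumes the standard Ubuntu packaging of Docker):

```shell
# docker checkpoint is an experimental feature; turn on experimental mode
echo '{ "experimental": true }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker

# sanity-check that the kernel exposes everything CRIU relies on
sudo criu check
```

If criu check reports missing kernel features, checkpointing will fail later in far less obvious ways, so it's worth running first.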

It worked amazingly for an elementary Rails app (I was able to snapshot it live and restore it!), but as soon as I made the app talk to a database, I noticed that docker checkpoint was failing with a CRIU-level error:

$ cat /run/containerd/io.containerd.runtime.v1.linux/moby/be56af6556e28725f3d69b4d91c8905268521af9d32e8aa4525fe16a07138a5e/criu-dump.log

...
(00.152987) sockets: Searching for socket 0x27a13 family 2
(00.152991) Error (criu/sk-inet.c:199): inet: Connected TCP socket, consider using --tcp-established option.

I started looking for a way to enable tcp-established. The CRIU configuration guide suggested echo 'tcp-established' > /etc/criu/runc.conf for containerized deployments. However, doing that had no effect. That's when I found that support for config files only arrived in runc 1.0-rc7 and CRIU 3.11 - while the Ubuntu packages came with the older runc 1.0-rc4 and CRIU 3.6.

It took me some time to build the latest runc and CRIU from source, but finally I was able to snapshot a process with open TCP sockets and seccomp enabled! That was a success. I could even jump into the snapshot and see its contents:
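For the record, building both projects went roughly like this; treat it as a sketch rather than a recipe, since the exact build dependencies are spelled out in each project's own docs (the apt package list below follows the CRIU installation guide for Ubuntu, and runc assumes a Go toolchain on the PATH):

```shell
# Build dependencies for CRIU on Ubuntu (list per the CRIU installation docs)
sudo apt-get install -y build-essential pkg-config \
  libprotobuf-dev libprotobuf-c-dev protobuf-c-compiler \
  protobuf-compiler python-protobuf libnl-3-dev libcap-dev

# CRIU 3.11+, which understands /etc/criu/runc.conf
git clone https://github.com/checkpoint-restore/criu && cd criu
make && sudo make install
cd ..

# runc 1.0-rc7+, which passes the config file through to CRIU
git clone https://github.com/opencontainers/runc && cd runc
make && sudo make install
```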

./checkpoint-omg/
./checkpoint-omg/mountpoints-12.img
./checkpoint-omg/inventory.img
./checkpoint-omg/tmpfs-dev-73.tar.gz.img
./checkpoint-omg/tmpfs-dev-72.tar.gz.img
./checkpoint-omg/core-9.img
./checkpoint-omg/tmpfs-dev-71.tar.gz.img
./checkpoint-omg/core-1.img
./checkpoint-omg/core-10.img
./checkpoint-omg/cgroup.img
./checkpoint-omg/core-15.img
./checkpoint-omg/fdinfo-2.img
./checkpoint-omg/core-11.img
./checkpoint-omg/core-14.img
./checkpoint-omg/ids-1.img
./checkpoint-omg/core-20.img
./checkpoint-omg/pipes-data.img
./checkpoint-omg/core-17.img
./checkpoint-omg/fs-1.img
./checkpoint-omg/mm-1.img
./checkpoint-omg/tmpfs-dev-68.tar.gz.img
./checkpoint-omg/utsns-11.img
./checkpoint-omg/pagemap-1.img
./checkpoint-omg/core-12.img
./checkpoint-omg/seccomp.img
./checkpoint-omg/core-13.img
./checkpoint-omg/tmpfs-dev-74.tar.gz.img
./checkpoint-omg/pstree.img
./checkpoint-omg/core-8.img
./checkpoint-omg/core-19.img
./checkpoint-omg/core-16.img
./checkpoint-omg/ipcns-var-10.img
./checkpoint-omg/tcp-stream-cd45.img
./checkpoint-omg/files.img
./checkpoint-omg/pages-1.img
./checkpoint-omg/core-18.img
./checkpoint-omg/descriptors.json
./checkpoint-omg/core-7.img
./checkpoint-omg/tcp-stream-6b38.img

Each of those dumps is inspectable with crit show.
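For example, pstree.img holds the process tree of the checkpointed container, and crit decodes it from the binary image format into readable JSON:

```shell
cd checkpoint-omg

# decode the process tree image into pretty-printed JSON
crit show pstree.img

# the same thing, spelled out via the decode subcommand
crit decode -i pstree.img --pretty
```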

Now it was time to prepare some kind of a sample Rails app that's slow to boot.

From my Shopify experience, it comes down to a huge amount of code to load and parse. That code includes the classes in your app, a bunch of YAML configuration (for a large app, it's natural to have lots of configs around), and all your gem dependencies.

To simulate all of that, I stuffed the Gemfile with 250 gems and wrote a small code generator for Ruby and YAML.
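My generator isn't included here, but a minimal sketch of the idea looks like this (the file layout under app/bloat and config/bloat, and the class names, are made up for illustration):

```shell
#!/bin/sh
# Hypothetical bloat generator: emits N trivial Ruby classes and N YAML
# configs, so the app has plenty of code to parse and load on boot.
N=${1:-300}
mkdir -p app/bloat config/bloat
i=1
while [ "$i" -le "$N" ]; do
  cat > "app/bloat/generated_class_$i.rb" <<EOF
class GeneratedClass$i
  def call
    $i
  end
end
EOF
  cat > "config/bloat/generated_config_$i.yml" <<EOF
generated_config_$i:
  enabled: true
  value: $i
EOF
  i=$((i + 1))
done
echo "generated $N Ruby classes and $N YAML configs"
```

With eager loading on, every one of those generated classes gets parsed and loaded on boot, which is exactly the kind of work we want the snapshot to skip.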

Now it was time to checkpoint the fat app and try to restore it. That was easy enough!

$ docker run --name fat-app-donor -p 3000:3000 -d fat-app:latest
# curl localhost:3000 to verify that the app is booted and running

$ docker checkpoint create fat-app-donor my-checkpoint
# the checkpoint is located in /var/lib/docker/containers/<donor-container-id>/checkpoints/my-checkpoint

$ docker create --name fat-app-clone -p 3000:3000 fat-app:latest

$ cp -r /var/lib/docker/containers/<donor-container-id>/checkpoints/my-checkpoint /var/lib/docker/containers/<clone-container-id>/checkpoints

$ docker start --checkpoint my-checkpoint fat-app-clone

Yay! You can now curl localhost:3000 and hit the container that has been restored from a serialized state!

Automating and running in an environment closer to production

In the step above, I was able to take the snapshot of a live container on one local Ubuntu VM and re-create it on another, but I also wanted to run the experiment in a production-like environment. I planned to create a GCE instance, upload the container snapshot to GCS (S3-like store from Google), download it from GCS, and recover from it.

Why upload and download the snapshot to and from a remote store? I wanted to make it as close as possible to production and measure the penalty of downloading that blob.

I was able to automate all these steps based on the commands I had been running manually. Rather than describing the steps, I'll let the script speak for itself.

{% raw %}

set -e -x

IMAGE=fat-app:latest
CHECKPOINT_NAME=checkpoint-omg

echo "+++ SNAPSHOT PART"

docker run --name fat-app-donor -p 3000:3000 -d $IMAGE

echo "+++ Waiting for container to boot"
time (while ! curl localhost:3000 > /dev/null 2>&1; do sleep 0.5; done)

echo "+++ Boot stats"
curl http://localhost:3000/stats

echo "+++ Creating a checkpoint"
sudo time docker checkpoint create fat-app-donor $CHECKPOINT_NAME

DONOR_CONTAINER_ID=$(docker inspect --format="{{.Id}}" fat-app-donor)

echo "+++ Packing the checkpoint"
sudo time tar cvzf checkpoint.tar.gz -C /var/lib/docker/containers/$DONOR_CONTAINER_ID/checkpoints .

echo "+++ Checkpoint size:"
ls -l --block-size=M checkpoint.tar.gz

echo "+++ Uploading the checkpoint:"
time gsutil cp checkpoint.tar.gz gs://kirs-criu/checkpoints-experiment/$CHECKPOINT_NAME.tar.gz

{% endraw %}

Written in July 2019.
Kir Shatrov