Desacralizing the Linux overlay filesystem in Docker

Overlay filesystems (also called union filesystems) are a fundamental technology in Docker, used to create images and containers. They build a single filesystem from the union of several directories: multiple filesystems, which are just directories, are stacked one on top of another. These directories are called layers, and the unification process is referred to as a union mount. If a file with the same path exists in two layers, only the one from the uppermost layer appears in the overlay filesystem.
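The "uppermost layer wins" rule can be emulated without any mount, which may help build intuition before touching the real thing. The sketch below is only an approximation made with plain copies, not an overlay filesystem; the union_copy helper and the directory names are hypothetical.

```shell
# union_copy DEST LAYER...
# Copies each LAYER into DEST in the order given, so files from later
# (higher) layers overwrite files with the same path from earlier ones.
union_copy() {
  dest=$1
  shift
  mkdir -p "$dest"
  for layer in "$@"; do
    cp -a "$layer/." "$dest/"
  done
}

# Two throwaway layers that share one path.
demo=$(mktemp -d)
mkdir "$demo/bottom" "$demo/top"
echo "from bottom" > "$demo/bottom/shared.txt"
echo "only in bottom" > "$demo/bottom/bottom-only.txt"
echo "from top" > "$demo/top/shared.txt"

union_copy "$demo/merged" "$demo/bottom" "$demo/top"
cat "$demo/merged/shared.txt"   # prints "from top"
ls "$demo/merged"               # both paths are visible
```

Unlike this copy, a real overlay mount duplicates nothing up front: the layers stay where they are and are only combined at lookup time.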

We will learn how to create an overlay filesystem ourselves and how Docker uses it to build images and run containers.

Creating an overlay filesystem is easy

An overlay filesystem is made from two kinds of layers. One or more lower filesystems are immutable: their content is only read and never modified. A single upper filesystem receives all the changes made through the overlay filesystem, including file creations, modifications, and deletions.

Creating an overlay filesystem is easy once you get your hands on it. All you need is a Linux machine with root or sudo access. A virtual machine will do.

We initialize the layout by creating several folders, one per layer. We also need a mount folder, where the overlay filesystem will be mounted, and a workdir folder, which the overlay filesystem uses for internal purposes.

mkdir overlay; cd overlay
mkdir \
  lower-layer-1 lower-layer-2 lower-layer-3 upper-layer \
  mount \
  workdir

Let’s also create a file in each of the 3 lower folders. We leave the upper folder empty.

echo "Content layer 1" > ./lower-layer-1/file-in-layer-1
echo "Content layer 2" > ./lower-layer-2/file-in-layer-2
echo "Content layer 3" > ./lower-layer-3/file-in-layer-3

At least two directories are required: a list of one or more lower directories and an upper directory. The lower directories are read-only, whereas the upper directory is used for both reads and writes. The mount command creates the overlay filesystem when its type option -t is set to overlay. It must be executed as root.

sudo mount -t overlay my-overlay \
  -o lowerdir=$HOME/overlay/lower-layer-1:$HOME/overlay/lower-layer-2:$HOME/overlay/lower-layer-3,upperdir=$HOME/overlay/upper-layer,workdir=$HOME/overlay/workdir \
  $HOME/overlay/mount
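The option string is easy to get wrong once there are several layers. As a minimal sketch, the hypothetical overlay_opts helper below assembles it from a list of directories. It relies on the documented overlay rule that the leftmost lowerdir entry is the topmost lower layer, and it prints the mount command instead of running it.

```shell
# overlay_opts UPPER WORK LOWER...
# Prints the -o option string for "mount -t overlay".
# The first LOWER argument ends up leftmost, which makes it the topmost
# lower layer; the last one is the bottom of the stack.
overlay_opts() {
  upper=$1
  work=$2
  shift 2
  lower=""
  for dir in "$@"; do
    lower="${lower:+$lower:}$dir"
  done
  printf 'lowerdir=%s,upperdir=%s,workdir=%s\n' "$lower" "$upper" "$work"
}

base="$HOME/overlay"
opts=$(overlay_opts "$base/upper-layer" "$base/workdir" \
  "$base/lower-layer-1" "$base/lower-layer-2" "$base/lower-layer-3")

# Print the command only; remove the leading echo to mount for real.
echo sudo mount -t overlay my-overlay -o "$opts" "$base/mount"
```

Note that the work directory must be empty and located on the same filesystem as the upper directory.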

The df command lists all the filesystems along with some useful information such as the amount of free space. With the -T flag, it also prints the type of each filesystem. The -h flag only prints sizes in a human-readable format.

df -Th | grep overlay
my-overlay overlay    20G  5.5G   14G  29% /home/ubuntu/overlay/mount
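df does not show which layers a mount is made of. On Linux, that information is available in /proc/mounts, where the fourth field holds the mount options. The sketch below uses a hypothetical list_overlay_mounts helper to print the mount point and lowerdir value of each overlay entry. It assumes that no path contains a comma.

```shell
# list_overlay_mounts
# Reads lines in /proc/mounts format on stdin and prints, for every entry
# of type overlay, the mount point and the lowerdir option (tab-separated).
list_overlay_mounts() {
  awk '$3 == "overlay" {
    lower = "(none)"
    n = split($4, opts, ",")
    for (i = 1; i <= n; i++)
      if (opts[i] ~ /^lowerdir=/) lower = substr(opts[i], 10)
    print $2 "\t" lower
  }'
}

# Inspect the current machine when /proc/mounts is available.
if [ -r /proc/mounts ]; then
  list_overlay_mounts < /proc/mounts
fi
```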

The overlay filesystem is created and mounted inside the mount folder. It contains the files from all the original filesystems.

ls -l mount
total 12
-rw-rw-r-- 1 ubuntu ubuntu 13 Jun  2 22:38 file-in-layer-1
-rw-rw-r-- 1 ubuntu ubuntu 13 Jun  2 22:38 file-in-layer-2
-rw-rw-r-- 1 ubuntu ubuntu 13 Jun  2 22:38 file-in-layer-3

cat mount/file-in-layer-3 
Content layer 3

Let’s try to create a file in the mount folder:

echo "new content" > mount/new-file

It is written to the upper directory, upper-layer:

tree
.
├── lower-layer-1
│   └── file-in-layer-1
├── lower-layer-2
│   └── file-in-layer-2
├── lower-layer-3
│   └── file-in-layer-3
├── mount
│   ├── new-file
│   ├── file-in-layer-1
│   ├── file-in-layer-2
│   └── file-in-layer-3
├── upper-layer
│   └── new-file
└── workdir
    └── work [error opening dir]

7 directories, 8 files

Now, we modify a file, for example, file-in-layer-1.

echo 'Add a new line' >> mount/file-in-layer-1

The original file inside lower-layer-1 is not modified. Instead, the file is copied to upper-layer and the change is applied to that copy:

cat lower-layer-1/file-in-layer-1 
Content layer 1

cat upper-layer/file-in-layer-1 
Content layer 1
Add a new line

cat mount/file-in-layer-1 
Content layer 1
Add a new line

Let’s see the behavior when removing a file, for example, file-in-layer-2.

rm mount/file-in-layer-2

The original file inside lower-layer-2 is still there. A new file with a special type is created in upper-layer: a character device with device number 0, 0, known as a whiteout. This is how the overlay filesystem represents a deleted file.

ls -l lower-layer-2/file-in-layer-2 
-rw-rw-r-- 1 ubuntu ubuntu 13 Jun  2 22:38 lower-layer-2/file-in-layer-2

ls -l upper-layer/file-in-layer-2 
c--------- 1 root root 0, 0 Jun  2 23:33 upper-layer/file-in-layer-2
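Since a deletion is stored as a character device with device number 0, 0, the deletions recorded in an upper directory can be listed by searching for such files. The list_whiteouts helper below is a hypothetical sketch that assumes GNU find and GNU stat, where %t and %T print the major and minor device numbers in hexadecimal.

```shell
# list_whiteouts DIR
# Prints the path of every character device with device number 0,0
# found under DIR.
list_whiteouts() {
  find "$1" -type c -exec stat -c '%t,%T %n' {} + | sed -n 's/^0,0 //p'
}

# Run it against the lab's upper layer if it exists.
if [ -d "$HOME/overlay/upper-layer" ]; then
  list_whiteouts "$HOME/overlay/upper-layer"
fi
```

Run against the upper-layer folder after the deletion above, it should report file-in-layer-2.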

Now that the lab is finished, we can unmount the filesystem and purge our files.

sudo umount $HOME/overlay/mount

ls -l mount/
total 0

cd ..
rm -rf overlay

Overlay in Docker

Docker supports multiple storage drivers to write data to a container’s writable layer. OverlayFS is the recommended one. If you print the information from your local Docker installation, chances are that it reports the overlay2 storage driver.

docker info | grep "Storage Driver"
 Storage Driver: overlay2

Docker uses the overlay filesystem to create images as well as to position the container layer on top of the image layers.

When an image is downloaded, its layers are stored inside the /var/lib/docker/overlay2 folder. For example, downloading a 3-layer image with docker pull ubuntu creates 3+1 directories: one per layer, plus the l directory, which contains shortened layer identifiers as symbolic links.

docker pull ubuntu
Using default tag: latest
latest: Pulling from library/ubuntu
345e3491a907: Pull complete 
57671312ef6f: Pull complete 
5e9250ddb7d0: Pull complete 
Digest: sha256:adf73ca014822ad8237623d388cedf4d5346aa72c270c5acc01431cc93e18e2d
Status: Downloaded newer image for ubuntu:latest
docker.io/library/ubuntu:latest

In my case, 3 layers are downloaded:

ls -l /var/lib/docker/overlay2/
total 16
drwx------ 4 root root 4096 Jun  3 11:21 289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/
drwx------ 4 root root 4096 Jun  3 11:21 40766b9f546e9826ff353976c167f60cb615f57c01926a607ab48a2df64806ab/
drwx------ 3 root root 4096 Jun  3 11:21 88826e8f5f21df691dbd998df70d94e1b6b480e489c4dbb5999dcc8a7367159e/
drwx------ 2 root root 4096 Jun  3 11:21 l/

ls -l /var/lib/docker/overlay2/l/
total 12
lrwxrwxrwx 1 root root 72 Jun  3 11:21 NSEHV6LZKQIRKICXA2T7T5252D -> ../88826e8f5f21df691dbd998df70d94e1b6b480e489c4dbb5999dcc8a7367159e/diff/
lrwxrwxrwx 1 root root 72 Jun  3 11:21 QPAIOX2SCZPFZIIXB27PFVHUPH -> ../40766b9f546e9826ff353976c167f60cb615f57c01926a607ab48a2df64806ab/diff/
lrwxrwxrwx 1 root root 72 Jun  3 11:21 USIDUBYHQEGWIRN4JOSF74ZWIL -> ../289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/diff/
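Docker uses these short names to keep the mount option string small, since long identifiers could otherwise hit the size limit on mount arguments. To map each short name back to its layer, the links can be resolved one by one. The resolve_layer_links helper below is a hypothetical sketch; it assumes a readlink that supports -f, as GNU coreutils does.

```shell
# resolve_layer_links DIR
# Prints "short-name -> resolved target" for each symbolic link in DIR.
resolve_layer_links() {
  for link in "$1"/*; do
    [ -L "$link" ] || continue
    printf '%s -> %s\n' "$(basename "$link")" "$(readlink -f "$link")"
  done
}

# Typical use on a Docker host (root is needed to read this directory):
#   sudo sh -c '. ./this-script.sh; resolve_layer_links /var/lib/docker/overlay2/l'
```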

Those layers are also exposed by inspecting the Docker image. The output of the Docker command is in JSON. We use jq to keep only the part of most interest to us.

docker image inspect ubuntu | jq -r '.[0] | {Data: .GraphDriver.Data}'
{
  "Data": {
    "LowerDir":  "/var/lib/docker/overlay2/40766b9f546e9826ff353976c167f60cb615f57c01926a607ab48a2df64806ab/diff:/var/lib/docker/overlay2/88826e8f5f21df691dbd998df70d94e1b6b480e489c4dbb5999dcc8a7367159e/diff",
    "MergedDir": "/var/lib/docker/overlay2/289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/merged",
    "UpperDir":  "/var/lib/docker/overlay2/289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/diff",
    "WorkDir":   "/var/lib/docker/overlay2/289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/work"
  }
}

Lookups start with the upper directory and then go through the lower directories from left to right. The layers are therefore stacked in this order, from the bottom (the base) to the top:

1: 88826e8f5f21df691dbd998df70d94e1b6b480e489c4dbb5999dcc8a7367159e
2: 40766b9f546e9826ff353976c167f60cb615f57c01926a607ab48a2df64806ab
3: 289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b
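The lookup order can be derived mechanically from the UpperDir and LowerDir values: the upper directory comes first, followed by the colon-separated lower directories from left to right. The layer_stack helper below is a hypothetical sketch of that rule, and the short paths in the example are placeholders, not real Docker paths.

```shell
# layer_stack UPPERDIR LOWERDIR
# Prints one layer directory per line in lookup order: the upper directory
# first, then the lower directories from left to right (top to bottom).
layer_stack() {
  printf '%s\n' "$1"
  printf '%s\n' "$2" | tr ':' '\n'
}

# Placeholder paths standing in for the long overlay2 directories.
layer_stack /layers/c/diff /layers/b/diff:/layers/a/diff

# With Docker available and the overlay2 driver in use, the real values
# can be fed in like this:
#   upper=$(docker image inspect ubuntu --format '{{.GraphDriver.Data.UpperDir}}')
#   lower=$(docker image inspect ubuntu --format '{{.GraphDriver.Data.LowerDir}}')
#   layer_stack "$upper" "$lower"
```

Reading the output from the last line to the first gives the order in which the layers were built.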

Starting with the bottom layer, its content is the Ubuntu root filesystem:

ls /var/lib/docker/overlay2/88826e8f5f21df691dbd998df70d94e1b6b480e489c4dbb5999dcc8a7367159e/diff
bin  boot  dev  etc  home  lib  lib32  lib64  libx32  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var

On top of it, the second layer adds a few files and directories:

tree /var/lib/docker/overlay2/40766b9f546e9826ff353976c167f60cb615f57c01926a607ab48a2df64806ab/diff/
├── etc
│   ├── apt
│   │   └── apt.conf.d
│   │       ├── docker-autoremove-suggests
│   │       ├── docker-clean
│   │       ├── docker-gzip-indexes
│   │       └── docker-no-languages
│   └── dpkg
│       └── dpkg.cfg.d
│           └── docker-apt-speedup
├── usr
│   └── sbin
│       ├── initctl
│       └── policy-rc.d
└── var
    └── lib
        └── dpkg
            ├── diversions
            └── diversions-old

10 directories, 9 files

And the third layer as well:

tree /var/lib/docker/overlay2/289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/diff/
/var/lib/docker/overlay2/289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/diff/
└── run
    └── systemd
        └── container

2 directories, 1 file

The instructions creating the layers are defined inside the Dockerfile. The curl command downloads the Dockerfile.

curl https://raw.githubusercontent.com/tianon/docker-brew-ubuntu-core/c5bc8f61f0e0a8aa3780a8dc3a09ae6558693117/focal/Dockerfile

By default, it prints the content to the console.

FROM scratch
ADD ubuntu-focal-core-cloudimg-amd64-root.tar.gz /


RUN set -xe \
  \
  && echo '#!/bin/sh' > /usr/sbin/policy-rc.d \
  && echo 'exit 101' >> /usr/sbin/policy-rc.d \
  && chmod +x /usr/sbin/policy-rc.d \
  \
  && dpkg-divert --local --rename --add /sbin/initctl \
  && cp -a /usr/sbin/policy-rc.d /sbin/initctl \
  && sed -i 's/^exit.*/exit 0/' /sbin/initctl \
  \
  && echo 'force-unsafe-io' > /etc/dpkg/dpkg.cfg.d/docker-apt-speedup \
  \
  && echo 'DPkg::Post-Invoke { "rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true"; };' > /etc/apt/apt.conf.d/docker-clean \
  && echo 'APT::Update::Post-Invoke { "rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true"; };' >> /etc/apt/apt.conf.d/docker-clean \
  && echo 'Dir::Cache::pkgcache ""; Dir::Cache::srcpkgcache "";' >> /etc/apt/apt.conf.d/docker-clean \
  \
  && echo 'Acquire::Languages "none";' > /etc/apt/apt.conf.d/docker-no-languages \
  \
  && echo 'Acquire::GzipIndexes "true"; Acquire::CompressionTypes::Order:: "gz";' > /etc/apt/apt.conf.d/docker-gzip-indexes \
  \
  && echo 'Apt::AutoRemove::SuggestsImportant "false";' > /etc/apt/apt.conf.d/docker-autoremove-suggests

RUN [ -z "$(apt-get indextargets)" ]



RUN mkdir -p /run/systemd && echo 'docker' > /run/systemd/container
CMD ["/bin/bash"]

The ADD instruction created the first layer. The first RUN instruction created the second layer. The second RUN didn’t create any layer because it didn’t change any file. The third RUN instruction created the third layer. The CMD instruction didn’t create any layer because it only records the default command, which is evaluated at runtime when a container is created from the image.
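As a rough aid for reading a Dockerfile, the instructions that can touch the filesystem (ADD, COPY and RUN) can be listed with their line numbers. The layer_candidates helper below is a hypothetical sketch and only a heuristic: an instruction that changes no file, like the second RUN above, still shows up. The sample input is an abbreviated version of the Dockerfile above.

```shell
# layer_candidates
# Reads a Dockerfile on stdin and prints, with line numbers, the
# instructions that may change files: ADD, COPY and RUN.
layer_candidates() {
  grep -inE '^[[:space:]]*(ADD|COPY|RUN)[[:space:]]'
}

# Abbreviated Dockerfile; the quoted delimiter prevents shell expansion.
layer_candidates <<'EOF'
FROM scratch
ADD ubuntu-focal-core-cloudimg-amd64-root.tar.gz /
RUN [ -z "$(apt-get indextargets)" ]
RUN mkdir -p /run/systemd && echo 'docker' > /run/systemd/container
CMD ["/bin/bash"]
EOF
```

The helper reports lines 2, 3 and 4 here; FROM and CMD are skipped because they only set metadata.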

Conclusion

Once we understand how overlay filesystems work, it is quite easy to see how Docker uses them: each Dockerfile instruction that changes files produces a layer, with additional caching between layers. The overlay filesystem is easily combined with a chroot jail to provide an isolated filesystem to the container, on top of the immutable filesystems from the image layers. Distributing an image is just about shipping its layers as tar archives.
