Desacralizing the Linux overlay filesystem in Docker
Overlay filesystems (also called union filesystems) are a fundamental technology that Docker uses to create images and containers. They build a single filesystem from the union of several directories: multiple filesystems, which are just directories, are superposed one on top of another to create a new filesystem. These directories are called layers and the unification process is referred to as a union mount. If two files with the same path exist in two layers, only the file from the uppermost layer appears in the overlay filesystem.
We will learn how to create an overlay filesystem ourselves and how Docker is using it to build images and run containers.
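The precedence rule can be imitated with ordinary directories before any real mount is involved. The following is only a minimal sketch: `resolve_in_layers` is a hypothetical helper written for this illustration, and the layer names `top` and `bottom` are assumptions, not something the kernel or Docker provides.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the first layer (top-most first)
# that contains the given relative path, imitating union-mount precedence.
resolve_in_layers() {
    relpath=$1
    shift
    for layer in "$@"; do
        if [ -e "$layer/$relpath" ]; then
            printf '%s\n' "$layer/$relpath"
            return 0
        fi
    done
    return 1
}

# Demo with two throwaway layers that both contain a file named "motd".
demo=$(mktemp -d)
mkdir "$demo/top" "$demo/bottom"
echo "from bottom" > "$demo/bottom/motd"
echo "from top" > "$demo/top/motd"
echo "only in bottom" > "$demo/bottom/readme"

# The copy in the top layer hides the one in the bottom layer.
cat "$(resolve_in_layers motd "$demo/top" "$demo/bottom")"
# A file that exists only in the bottom layer is still visible.
cat "$(resolve_in_layers readme "$demo/top" "$demo/bottom")"
```

The real overlay filesystem also merges directories and handles deletions, which this sketch deliberately ignores.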
Creating an overlay filesystem is easy
The overlay filesystem is made from two types of filesystems: one or more lower filesystems, which are immutable (their content is only read and never modified), and one upper filesystem, which receives all the changes made through the overlay filesystem, including file creations, modifications, and deletions.
Creating an overlay filesystem is easy once you get your hands on it. All you need is a Linux machine with root or sudo access. A virtual machine will do.
We initialize the layout by creating multiple folders, each of them corresponding to a layer. We also need a mount folder at the place where we want the overlay filesystem to be created and a workdir folder for internal purposes.
mkdir overlay; cd overlay
mkdir \
  lower-layer-1 lower-layer-2 lower-layer-3 upper-layer \
  mount \
  workdir
Let’s also create a file in each of the three lower folders. We leave the upper folder empty.
echo "Content layer 1" > ./lower-layer-1/file-in-layer-1
echo "Content layer 2" > ./lower-layer-2/file-in-layer-2
echo "Content layer 3" > ./lower-layer-3/file-in-layer-3
Two or more directories are required: a list of lower directories and an upper directory. The lower directories of the filesystem are read-only, whereas the upper directory can be used for both reads and writes. The mount command creates the overlay filesystem with the type option -t set to overlay. It must be executed as root, hence the use of sudo:
sudo mount -t overlay my-overlay \
  -o lowerdir=$HOME/overlay/lower-layer-1:$HOME/overlay/lower-layer-2:$HOME/overlay/lower-layer-3,upperdir=$HOME/overlay/upper-layer,workdir=$HOME/overlay/workdir \
  $HOME/overlay/mount
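The option string passed to -o is long and easy to mistype. As a sketch, it can be assembled from a list of directories. `overlay_opts` is a hypothetical helper, not part of mount, and it assumes that no path contains a colon or a comma.

```shell
#!/bin/sh
# Hypothetical helper: build the -o option string for an overlay mount.
# Usage: overlay_opts UPPER WORK LOWER...   (lower directories, top-most first)
# Assumption: no path contains ':' or ',', which would need escaping.
overlay_opts() {
    upper=$1
    work=$2
    shift 2
    lower=""
    for dir in "$@"; do
        # Join the lower directories with ':' without a leading separator.
        lower="${lower:+$lower:}$dir"
    done
    printf 'lowerdir=%s,upperdir=%s,workdir=%s\n' "$lower" "$upper" "$work"
}

base=$HOME/overlay
opts=$(overlay_opts "$base/upper-layer" "$base/workdir" \
    "$base/lower-layer-1" "$base/lower-layer-2" "$base/lower-layer-3")
echo "$opts"

# The string can then be handed to mount (as root), for instance:
#   sudo mount -t overlay my-overlay -o "$opts" "$base/mount"
```

Building the string only prints it; nothing is mounted until the commented mount line is run with root privileges.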
The df command lists all the filesystems along with some useful information such as the amount of free space and, when executed with the -T flag, the type of filesystem. The -h flag is only here to print the filesystem size in a human-readable format.
df -Th | grep overlay
my-overlay overlay 20G 5.5G 14G 29% /home/ubuntu/overlay/mount
The overlay filesystem is created and mounted inside the mount folder. It contains the files from all the original filesystems.
ls -l mount
total 12
-rw-rw-r-- 1 ubuntu ubuntu 13 Jun 2 22:38 file-in-layer-1
-rw-rw-r-- 1 ubuntu ubuntu 13 Jun 2 22:38 file-in-layer-2
-rw-rw-r-- 1 ubuntu ubuntu 13 Jun 2 22:38 file-in-layer-3
cat mount/file-in-layer-3
Content layer 3
Let’s try to create a file in the mount folder:
echo "new content" > mount/new-file
It is written to the upper directory:
tree
.
├── lower-layer-1
│   └── file-in-layer-1
├── lower-layer-2
│   └── file-in-layer-2
├── lower-layer-3
│   └── file-in-layer-3
├── mount
│   ├── new-file
│   ├── file-in-layer-1
│   ├── file-in-layer-2
│   └── file-in-layer-3
├── upper-layer
│   └── new-file
└── workdir
    └── work [error opening dir]

7 directories, 8 files
Now, we modify a file, for example file-in-layer-1:
echo 'Add a new line' >> mount/file-in-layer-1
The original file present inside lower-layer-1 is not modified. Instead, a new file in upper-layer is created:
cat lower-layer-1/file-in-layer-1
Content layer 1
cat upper-layer/file-in-layer-1
Content layer 1
Add a new line
cat mount/file-in-layer-1
Content layer 1
Add a new line
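This behavior, where the file is first copied into the upper layer and only then modified, can be imitated with plain directories. The sketch below is a simulation under stated assumptions, not how the kernel implements it: `append_via_upper` is a hypothetical helper and no overlay mount is involved.

```shell
#!/bin/sh
# Hypothetical simulation of a copy-on-write append using plain directories.
# Usage: append_via_upper LOWER UPPER RELPATH TEXT
append_via_upper() {
    lower=$1
    upper=$2
    relpath=$3
    text=$4
    mkdir -p "$(dirname "$upper/$relpath")"
    # Copy the file into the upper layer the first time it is modified.
    if [ ! -e "$upper/$relpath" ] && [ -e "$lower/$relpath" ]; then
        cp -p "$lower/$relpath" "$upper/$relpath"
    fi
    # The change itself only ever touches the upper layer.
    printf '%s\n' "$text" >> "$upper/$relpath"
}

cow=$(mktemp -d)
mkdir "$cow/lower" "$cow/upper"
echo "Content layer 1" > "$cow/lower/file-in-layer-1"

append_via_upper "$cow/lower" "$cow/upper" file-in-layer-1 "Add a new line"

cat "$cow/lower/file-in-layer-1"   # the lower copy keeps its single line
cat "$cow/upper/file-in-layer-1"   # the upper copy holds both lines
```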
Let’s see the behavior when removing a file, for example file-in-layer-2:
rm mount/file-in-layer-2
The original file inside lower-layer-2 is still there. A new file with a special type is created in upper-layer: a character device with device number 0,0, called a whiteout. This is how the overlay filesystem represents a deleted file.
ls -l lower-layer-2/file-in-layer-2
-rw-rw-r-- 1 ubuntu ubuntu 13 Jun 2 22:38 lower-layer-2/file-in-layer-2
ls -l upper-layer/file-in-layer-2
c--------- 1 root root 0, 0 Jun 2 23:33 upper-layer/file-in-layer-2
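The deletion marker shown above is a character device whose device numbers are 0, 0. As a sketch, such markers can be located in an upper directory with find and stat. `list_whiteouts` is a hypothetical helper, and it assumes the GNU versions of find and stat (the `-c` format option is GNU-specific).

```shell
#!/bin/sh
# Hypothetical helper: list the deletion markers (character devices whose
# major and minor numbers are both 0) found under a directory.
# Assumption: GNU find and GNU stat are available.
list_whiteouts() {
    # %t and %T print the major and minor device numbers in hexadecimal.
    find "$1" -type c -exec stat -c '%t %T %n' {} + |
        while read -r major minor name; do
            if [ "$major" = 0 ] && [ "$minor" = 0 ]; then
                printf '%s\n' "$name"
            fi
        done
}

# A directory with only regular files contains no deletion marker.
scratch=$(mktemp -d)
touch "$scratch/regular-file"
list_whiteouts "$scratch"

# In the lab above, the call would be: list_whiteouts upper-layer
```

Ordinary character devices such as /dev/null are not reported, because their device numbers are not 0, 0.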
Now that the lab is finished, we can unmount the filesystem and purge our files.
sudo umount $HOME/overlay/mount
ls -l mount/
total 0
cd ..
rm -rf overlay
Overlay in Docker
Docker supports multiple storage drivers to write data to a container’s writable layer. OverlayFS is the recommended one. If you print the information from your local Docker installation, chances are that it reports the overlay2 storage driver.
docker info | grep "Storage Driver"
Storage Driver: overlay2
Docker uses the overlay filesystem to create images as well as to position the container layer on top of the image layers.
When an image is downloaded, its layers are stored inside the /var/lib/docker/overlay2 folder. For example, downloading a 3-layer image using docker pull ubuntu creates 3+1 directories: one per layer, plus the l directory, which contains shortened layer identifiers as symbolic links.
docker pull ubuntu
Using default tag: latest
latest: Pulling from library/ubuntu
345e3491a907: Pull complete
57671312ef6f: Pull complete
5e9250ddb7d0: Pull complete
Digest: sha256:adf73ca014822ad8237623d388cedf4d5346aa72c270c5acc01431cc93e18e2d
Status: Downloaded newer image for ubuntu:latest
docker.io/library/ubuntu:latest
In my case, 3 layers are downloaded:
ls -l /var/lib/docker/overlay2/
total 16
drwx------ 4 root root 4096 Jun 3 11:21 289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/
drwx------ 4 root root 4096 Jun 3 11:21 40766b9f546e9826ff353976c167f60cb615f57c01926a607ab48a2df64806ab/
drwx------ 3 root root 4096 Jun 3 11:21 88826e8f5f21df691dbd998df70d94e1b6b480e489c4dbb5999dcc8a7367159e/
drwx------ 2 root root 4096 Jun 3 11:21 l/
ls -l /var/lib/docker/overlay2/l/
total 12
lrwxrwxrwx 1 root root 72 Jun 3 11:21 NSEHV6LZKQIRKICXA2T7T5252D -> ../88826e8f5f21df691dbd998df70d94e1b6b480e489c4dbb5999dcc8a7367159e/diff/
lrwxrwxrwx 1 root root 72 Jun 3 11:21 QPAIOX2SCZPFZIIXB27PFVHUPH -> ../40766b9f546e9826ff353976c167f60cb615f57c01926a607ab48a2df64806ab/diff/
lrwxrwxrwx 1 root root 72 Jun 3 11:21 USIDUBYHQEGWIRN4JOSF74ZWIL -> ../289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/diff/
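The mapping between shortened identifiers and layer directories can also be printed with a few lines of shell. This is a sketch only: `list_short_ids` is a hypothetical helper, and the demo runs against a throwaway directory that mimics the layout instead of the real /var/lib/docker/overlay2, which is readable by root only.

```shell
#!/bin/sh
# Hypothetical helper: print "SHORT_ID -> TARGET" for every symbolic link
# found in ROOT/l, where ROOT has the same layout as the overlay2 folder.
list_short_ids() {
    root=$1
    for link in "$root"/l/*; do
        # Skip anything that is not a symbolic link (or an unmatched glob).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink "$link")"
    done
}

# Throwaway directory mimicking the layout, with one fake layer.
fake=$(mktemp -d)
mkdir -p "$fake/aaaa/diff" "$fake/l"
ln -s ../aaaa/diff "$fake/l/SHORTID"

list_short_ids "$fake"
```

Against a real installation the helper would have to run as root, for example from a root shell, with /var/lib/docker/overlay2 as its argument.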
Those layers are also exposed by inspecting the Docker image. The output of the Docker command is in JSON. We use jq to keep only the part of most interest to us.
docker image inspect ubuntu | jq -r '.[0] | {Data: .GraphDriver.Data}'
{
  "Data": {
    "LowerDir": "/var/lib/docker/overlay2/40766b9f546e9826ff353976c167f60cb615f57c01926a607ab48a2df64806ab/diff:/var/lib/docker/overlay2/88826e8f5f21df691dbd998df70d94e1b6b480e489c4dbb5999dcc8a7367159e/diff",
    "MergedDir": "/var/lib/docker/overlay2/289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/merged",
    "UpperDir": "/var/lib/docker/overlay2/289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/diff",
    "WorkDir": "/var/lib/docker/overlay2/289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/work"
  }
}
Lookups give precedence to the upper directory, then to the lower directories from left to right. The rightmost lower directory is therefore the bottom of the stack, and the layers are stacked, from bottom to top, in this order:
1: 88826e8f5f21df691dbd998df70d94e1b6b480e489c4dbb5999dcc8a7367159e
2: 40766b9f546e9826ff353976c167f60cb615f57c01926a607ab48a2df64806ab
3: 289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b
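The same ordering can be derived mechanically from the LowerDir and UpperDir values. The sketch below assumes that LowerDir lists directories top-most first, separated by colons. `layer_stack` is a hypothetical helper and the /layers/... paths are made-up placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print layer directories from bottom to top, given a
# LowerDir string (top-most first, ':'-separated) and an UpperDir path.
layer_stack() {
    lowerdir=$1
    upperdir=$2
    # Split on ':' and reverse the lines so the bottom layer comes first.
    printf '%s\n' "$lowerdir" | tr ':' '\n' |
        awk '{ lines[NR] = $0 } END { for (i = NR; i >= 1; i--) print lines[i] }'
    # The upper directory always sits on top of the stack.
    printf '%s\n' "$upperdir"
}

layer_stack "/layers/b/diff:/layers/a/diff" "/layers/c/diff"
# /layers/a/diff
# /layers/b/diff
# /layers/c/diff
```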
Starting with the first layer in this list, its content is the Ubuntu root filesystem:
ls /var/lib/docker/overlay2/88826e8f5f21df691dbd998df70d94e1b6b480e489c4dbb5999dcc8a7367159e/diff
bin boot dev etc home lib lib32 lib64 libx32 media mnt opt proc root run sbin srv sys tmp usr var
On top of it, the second layer adds a few directories and files:
tree /var/lib/docker/overlay2/40766b9f546e9826ff353976c167f60cb615f57c01926a607ab48a2df64806ab/diff/
├── etc
│   ├── apt
│   │   └── apt.conf.d
│   │       ├── docker-autoremove-suggests
│   │       ├── docker-clean
│   │       ├── docker-gzip-indexes
│   │       └── docker-no-languages
│   └── dpkg
│       └── dpkg.cfg.d
│           └── docker-apt-speedup
├── usr
│   └── sbin
│       ├── initctl
│       └── policy-rc.d
└── var
    └── lib
        └── dpkg
            ├── diversions
            └── diversions-old

10 directories, 9 files
The third layer adds a single file:
tree /var/lib/docker/overlay2/289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/diff/
/var/lib/docker/overlay2/289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/diff/
└── run
    └── systemd
        └── container

2 directories, 1 file
The instructions creating the layers are defined inside the Dockerfile. The curl command downloads the Dockerfile and, by default, prints its content to the console.
FROM scratch
ADD ubuntu-focal-core-cloudimg-amd64-root.tar.gz /
RUN set -xe \
    \
    && echo '#!/bin/sh' > /usr/sbin/policy-rc.d \
    && echo 'exit 101' >> /usr/sbin/policy-rc.d \
    && chmod +x /usr/sbin/policy-rc.d \
    \
    && dpkg-divert --local --rename --add /sbin/initctl \
    && cp -a /usr/sbin/policy-rc.d /sbin/initctl \
    && sed -i 's/^exit.*/exit 0/' /sbin/initctl \
    \
    && echo 'force-unsafe-io' > /etc/dpkg/dpkg.cfg.d/docker-apt-speedup \
    \
    && echo 'DPkg::Post-Invoke { "rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true"; };' > /etc/apt/apt.conf.d/docker-clean \
    && echo 'APT::Update::Post-Invoke { "rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true"; };' >> /etc/apt/apt.conf.d/docker-clean \
    && echo 'Dir::Cache::pkgcache ""; Dir::Cache::srcpkgcache "";' >> /etc/apt/apt.conf.d/docker-clean \
    \
    && echo 'Acquire::Languages "none";' > /etc/apt/apt.conf.d/docker-no-languages \
    \
    && echo 'Acquire::GzipIndexes "true"; Acquire::CompressionTypes::Order:: "gz";' > /etc/apt/apt.conf.d/docker-gzip-indexes \
    \
    && echo 'Apt::AutoRemove::SuggestsImportant "false";' > /etc/apt/apt.conf.d/docker-autoremove-suggests
RUN [ -z "$(apt-get indextargets)" ]
RUN mkdir -p /run/systemd && echo 'docker' > /run/systemd/container
CMD ["/bin/bash"]
ADD created the first layer. The first RUN instruction created the second layer. The second RUN didn’t create any layer because no file was created. The third RUN instruction created the third layer. The CMD instruction didn’t create any layer because it only sets the default command, which is evaluated at runtime when the container is created from the image.
Once we understand how overlay filesystems work, it is quite easy to see how Docker uses the overlay filesystem when building from a Dockerfile, with additional caching between each layer. It is easily combined with a chroot jail to provide an isolated filesystem to the container on top of the immutable filesystems from the image layers. Distributing images is just about combining multiple images together as