This article is part of a series of posts in which I will walk through every line of the default Rails Dockerfile and explain best practices and optimizations.
Docker images can be optimized in different ways, including, but not limited to, image size reduction, build performance optimization, security and maintainability best practices, and application-specific optimizations. In this first article, I will cover only image size reduction and explain why it is important.
Why optimize the image size?
As with every other aspect of software development, each developer has their own reasons for wanting faster Docker builds. I will list the reasons that matter most to me.
Faster builds & deployments
Smaller images are faster to build because fewer files and layers need to be processed. This improves developer productivity, especially during iterative development cycles. Smaller images take less time to push to a registry and pull from it during deployments. This is especially critical in CI/CD pipelines where containers are built and deployed frequently.
Reduced storage costs & network bandwidth usage
Smaller images consume less storage on container registries, local development machines, and production servers. This reduces infrastructure costs, especially for large-scale deployments. Smaller images also use less bandwidth when transferred between servers, which is especially important when you’re building images locally or in CI/CD pipelines and pushing them to a registry.
“We spent $3.2m on cloud in 2022… We stand to save about $7m in server expenses over five years from our cloud exit.” David Heinemeier Hansson — HEY World
Improved performance & security
Smaller images require fewer resources (e.g., CPU, RAM) to load and run, improving the overall performance of containerized applications. Faster startup times mean your services are ready more quickly, which is crucial for scaling and high-availability systems. Minimal base images like alpine or debian-slim contain fewer pre-installed packages, decreasing the risk of unpatched or unnecessary software being exploited.
Besides everything mentioned above, removing unnecessary files and tools minimizes distractions when diagnosing issues and leads to better maintainability and reduced technical debt.
Inspecting Docker images
To get different parameters of the image, including its size, you can either look at Docker Desktop or run the docker images command in the terminal.
➜ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
kamal-dashboard latest 673737b771cd 2 days ago 619MB
kamal-proxy latest 5f6cd8983746 6 weeks ago 115MB
docs-server latest a810244e3d88 6 weeks ago 1.18GB
busybox latest 63cd0d5fb10d 3 months ago 4.04MB
postgres latest 6c9aa6ecd71d 3 months ago 456MB
postgres 16.4 ced3ad69d60c 3 months ago 453MB
Knowing the size of the image does not give you the full picture. You don’t know what is inside the image, how many layers it has, or how big each layer is. A Docker image layer is a read-only, immutable file system layer that is a component of a Docker image. Each layer represents a set of changes made to the image’s file system, such as adding files, modifying configurations, or installing software.
Docker images are built incrementally, layer by layer, and each layer corresponds to an instruction in the Dockerfile. To get the layers of an image, you can run the docker history command.
➜ docker history kamal-dashboard:latest
IMAGE CREATED CREATED BY SIZE COMMENT
673737b771cd 4 days ago CMD ["./bin/thrust" "./bin/rails" "server"] 0B buildkit.dockerfile.v0
<missing> 4 days ago EXPOSE map[80/tcp:{}] 0B buildkit.dockerfile.v0
<missing> 4 days ago ENTRYPOINT ["/rails/bin/docker-entrypoint"] 0B buildkit.dockerfile.v0
<missing> 4 days ago USER 1000:1000 0B buildkit.dockerfile.v0
<missing> 4 days ago RUN /bin/sh -c groupadd --system --gid 1000 … 54MB buildkit.dockerfile.v0
<missing> 4 days ago COPY /rails /rails # buildkit 56.2MB buildkit.dockerfile.v0
<missing> 4 days ago COPY /usr/local/bundle /usr/local/bundle # b… 153MB buildkit.dockerfile.v0
<missing> 4 days ago ENV RAILS_ENV=production BUNDLE_DEPLOYMENT=1… 0B buildkit.dockerfile.v0
<missing> 4 days ago RUN /bin/sh -c apt-get update -qq && apt… 137MB buildkit.dockerfile.v0
<missing> 4 days ago WORKDIR /rails 0B buildkit.dockerfile.v0
<missing> 3 weeks ago CMD ["irb"] 0B buildkit.dockerfile.v0
<missing> 3 weeks ago RUN /bin/sh -c set -eux; mkdir "$GEM_HOME";… 0B buildkit.dockerfile.v0
<missing> 3 weeks ago ENV PATH=/usr/local/bundle/bin:/usr/local/sb… 0B buildkit.dockerfile.v0
<missing> 3 weeks ago ENV BUNDLE_SILENCE_ROOT_WARNING=1 BUNDLE_APP… 0B buildkit.dockerfile.v0
<missing> 3 weeks ago ENV GEM_HOME=/usr/local/bundle 0B buildkit.dockerfile.v0
<missing> 3 weeks ago RUN /bin/sh -c set -eux; savedAptMark="$(a… 78.1MB buildkit.dockerfile.v0
<missing> 3 weeks ago ENV RUBY_DOWNLOAD_SHA256=018d59ffb52be3c0a6d… 0B buildkit.dockerfile.v0
<missing> 3 weeks ago ENV RUBY_DOWNLOAD_URL=https://cache.ruby-lan… 0B buildkit.dockerfile.v0
<missing> 3 weeks ago ENV RUBY_VERSION=3.4.1 0B buildkit.dockerfile.v0
<missing> 3 weeks ago ENV LANG=C.UTF-8 0B buildkit.dockerfile.v0
<missing> 3 weeks ago RUN /bin/sh -c set -eux; mkdir -p /usr/loca… 19B buildkit.dockerfile.v0
<missing> 3 weeks ago RUN /bin/sh -c set -eux; apt-get update; a… 43.9MB buildkit.dockerfile.v0
<missing> 3 weeks ago # debian.sh --arch 'arm64' out/ 'bookworm' '… 97.2MB debuerreotype 0.15
Now that I have covered the theory about images and layers, it is time to explore the Dockerfile. Starting from Rails 7.1, a Dockerfile is generated with every new Rails application. Below is an example of what it may look like.
# syntax=docker/dockerfile:1
# check=error=true
# Make sure RUBY_VERSION matches the Ruby version in .ruby-version
ARG RUBY_VERSION=3.4.1
FROM docker.io/library/ruby:$RUBY_VERSION-slim AS base
# Rails app lives here
WORKDIR /rails
# Install base packages
# Replace libpq-dev with sqlite3 if using SQLite, or libmysqlclient-dev if using MySQL
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y curl libjemalloc2 libvips libpq-dev && \
    rm -rf /var/lib/apt/lists /var/cache/apt/archives
# Set production environment
ENV RAILS_ENV="production" \
    BUNDLE_DEPLOYMENT="1" \
    BUNDLE_PATH="/usr/local/bundle" \
    BUNDLE_WITHOUT="development"
# Throw-away build stage to reduce size of final image
FROM base AS build
# Install packages needed to build gems
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y build-essential curl git pkg-config libyaml-dev && \
    rm -rf /var/lib/apt/lists /var/cache/apt/archives
# Install application gems
COPY Gemfile Gemfile.lock ./
RUN bundle install && \
    rm -rf ~/.bundle/ "${BUNDLE_PATH}"/ruby/*/cache "${BUNDLE_PATH}"/ruby/*/bundler/gems/*/.git && \
    bundle exec bootsnap precompile --gemfile
# Copy application code
COPY . .
# Precompile bootsnap code for faster boot times
RUN bundle exec bootsnap precompile app/ lib/
# Precompiling assets for production without requiring secret RAILS_MASTER_KEY
RUN SECRET_KEY_BASE_DUMMY=1 ./bin/rails assets:precompile
# Final stage for app image
FROM base
# Copy built artifacts: gems, application
COPY --from=build "${BUNDLE_PATH}" "${BUNDLE_PATH}"
COPY --from=build /rails /rails
# Run and own only the runtime files as a non-root user for security
RUN groupadd --system --gid 1000 rails && \
    useradd rails --uid 1000 --gid 1000 --create-home --shell /bin/bash && \
    chown -R rails:rails db log storage tmp
USER 1000:1000
# Entrypoint prepares the database.
ENTRYPOINT ["/rails/bin/docker-entrypoint"]
# Start server via Thruster by default, this can be overwritten at runtime
EXPOSE 80
CMD ["./bin/thrust", "./bin/rails", "server"]
Below is a list of approaches and rules that were applied to the Dockerfile above to keep the final image size efficient.
Optimize package installation
I am sure you keep only the software you need on your local development machine. The same should apply to Docker images. In the examples below, I will take the Dockerfile extracted from the Rails Dockerfile above and make it worse in various ways. I will refer to it as the original Dockerfile version.
Rule #1: Use minimal base images
FROM docker.io/library/ruby:$RUBY_VERSION-slim AS base
The base image is the starting point for the Dockerfile. It is the image that is used to create the container. The base image is the first layer in the Dockerfile, and it is the only layer that is not created by the Dockerfile itself.
The base image is specified with the FROM instruction, followed by the image name and tag. The tag is optional, and if not specified, the latest tag is used. The base image can be any image available on Docker Hub or any other registry.
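To make the syntax concrete, here are a few hypothetical FROM instructions (the image in the last line is made up for illustration):
# With no tag, Docker falls back to ruby:latest
FROM ruby
# A pinned tag selects the slim, Debian-based variant
FROM ruby:3.4.1-slim
# Base images can come from any registry (hypothetical image name)
FROM registry.example.com/acme/base:1.2.0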
In the Dockerfile above, we are using the ruby image with the 3.4.1-slim tag. The ruby image is the official Ruby image available on Docker Hub. The 3.4.1-slim tag is a slim version of the Ruby image that is based on the debian-slim image, a minimal version of the Debian image optimized for size. Look at the table below to get an idea of how much smaller the slim image is.
➜ docker images --filter "reference=ruby"
REPOSITORY TAG IMAGE ID CREATED SIZE
ruby 3.4.1-slim 0bf957e453fd 5 days ago 219MB
ruby 3.4.1-alpine cf9b1b8d4a0c 5 days ago 99.1MB
ruby 3.4.1-bookworm 1e77081540c0 5 days ago 1.01GB
As of January 2024, the current Debian release is called bookworm and the previous one is bullseye.
219 MB instead of 1 GB is a huge difference. But what about the alpine image, which is even smaller? The alpine image is based on the Alpine Linux distribution, a super lightweight Linux distribution optimized for size and security. Alpine uses the musl library (instead of glibc) and busybox (a compact set of Unix utilities) instead of the GNU counterparts. While it is technically possible to use the alpine image to run Rails, I will not cover it in this article.
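For the curious, below is a rough sketch of what an Alpine-based base stage could look like. Treat it as an assumption rather than a recipe: the apk package names are my best guess at the equivalents of the Debian ones, and gems with native extensions may need additional build dependencies because of musl.
# Hypothetical Alpine-based base stage (not used in the rest of this article)
FROM docker.io/library/ruby:3.4.1-alpine AS base
WORKDIR /rails
# apk is Alpine's package manager; --no-cache keeps the package index out of the layer
RUN apk add --no-cache curl jemalloc libpq vips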
Rule #2: Minimize layers
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y curl libjemalloc2 libvips libpq-dev && \
    rm -rf /var/lib/apt/lists /var/cache/apt/archives
Each RUN, COPY, and FROM instruction in a Dockerfile creates a new layer. The more layers you have, the bigger the image tends to be, because files created in one layer still take up space even if a later layer deletes them. This is why the best practice is to combine multiple commands into a single RUN instruction. To illustrate this point, let's look at the example below.
# syntax=docker/dockerfile:1
# check=error=true
# Make sure RUBY_VERSION matches the Ruby version in .ruby-version
ARG RUBY_VERSION=3.4.1
FROM docker.io/library/ruby:$RUBY_VERSION-slim AS base
RUN apt-get update -qq
RUN apt-get install --no-install-recommends -y curl
RUN apt-get install --no-install-recommends -y libjemalloc2
RUN apt-get install --no-install-recommends -y libvips
RUN apt-get install --no-install-recommends -y libpq-dev
RUN rm -rf /var/lib/apt/lists /var/cache/apt/archives
CMD ["echo", "Whalecome!"]
I have split the single RUN instruction into multiple ones, which arguably makes the Dockerfile more human-readable. But how will it affect the size of the image? Let's build the image and check it out.
➜ time docker build -t no-minimize-layers --no-cache -f no-minimize-layers.dockerfile .
0.31s user 0.28s system 2% cpu 28.577 total
It took 28 seconds to build the image, while building the original version with minimized layers takes only 19 seconds (about 30% faster).
➜ time docker build -t original --no-cache -f original.dockerfile .
0.25s user 0.28s system 2% cpu 19.909 total
Let’s check the size of the images.
➜ docker images --filter "reference=*original*" --filter "reference=*no-minimize*"
REPOSITORY TAG IMAGE ID CREATED SIZE
original latest f1363df79c8a 8 seconds ago 356MB
no-minimize-layers latest ad3945c8a8ee 43 seconds ago 379MB
The image with minimized layers is 23 MB smaller than the one without. This is a 6% reduction in size. While it seems like a small difference in this example, the difference will be much bigger if you split every RUN instruction this way.
Rule #3: Install only what is needed
By default, apt-get install installs recommended packages in addition to the packages you asked it to install. The --no-install-recommends option tells apt-get to install only the packages that are explicitly specified, not the recommended ones.
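For reference, the change in without-no-install-recommends.dockerfile presumably boils down to dropping the flag from the install command, roughly like this:
# Same as the original, but without --no-install-recommends (sketch)
RUN apt-get update -qq && \
    apt-get install -y curl libjemalloc2 libvips libpq-dev && \
    rm -rf /var/lib/apt/lists /var/cache/apt/archives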
➜ time docker build -t without-no-install-recommends --no-cache -f without-no-install-recommends.dockerfile .
0.33s user 0.30s system 2% cpu 29.786 total
➜ docker images --filter "reference=*original*" --filter "reference=*recommends*"
REPOSITORY TAG IMAGE ID CREATED SIZE
without-no-install-recommends latest 41e6e37f1e2b 3 minutes ago 426MB
minimize-layers latest dff22c85d84c 17 minutes ago 356MB
As you can see, the image without --no-install-recommends is 70 MB bigger than the original one, which is almost a 20% increase in size.
Use the dive utility to see which files were added to the image; read more about it at the end of the article.
Rule #4: Clean up after installations
The original Dockerfile includes the rm -rf /var/lib/apt/lists /var/cache/apt/archives command after the apt-get install command. This command removes the package lists and archives that are no longer needed after the installation. Let's see how it affects the image size. To do that, I will create a new Dockerfile without the cleanup command.
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y curl libjemalloc2 libvips libpq-dev
Building this image takes almost the same time as building the original one, which makes sense.
➜ time docker build -t without-cleaning --no-cache -f without-cleaning.dockerfile .
0.28s user 0.30s system 2% cpu 21.658 total
Let’s check the size of the images.
➜ docker images --filter "reference=*original*" --filter "reference=*cleaning*"
REPOSITORY TAG IMAGE ID CREATED SIZE
without-cleaning latest 52884fe50773 2 minutes ago 375MB
original latest f1363df79c8a 16 minutes ago 356MB
The image without cleanup is 19 MB bigger than the one with it, which is a 5% increase in size.
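One detail worth spelling out: the cleanup only pays off because it runs in the same RUN instruction as the installation. A layer can never make the layers beneath it smaller, so moving the rm -rf into its own RUN (a hypothetical anti-pattern, sketched below) would leave the image just as big as having no cleanup at all.
# Anti-pattern (sketch): the apt lists are already baked into the first layer,
# so deleting them in a separate layer does not reduce the image size
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y curl libjemalloc2 libvips libpq-dev
RUN rm -rf /var/lib/apt/lists /var/cache/apt/archives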
The worst scenario
What if none of the four optimizations mentioned above is applied? Let's create a new Dockerfile without any optimizations and build the image.
# syntax=docker/dockerfile:1
# check=error=true
ARG RUBY_VERSION=3.4.1
FROM docker.io/library/ruby:$RUBY_VERSION AS base
RUN apt-get update -qq
RUN apt-get install -y curl
RUN apt-get install -y libjemalloc2
RUN apt-get install -y libvips
RUN apt-get install -y libpq-dev
CMD ["echo", "Whalecome!"]
➜ time docker build -t without-optimizations --no-cache -f without-optimizations.dockerfile .
0.46s user 0.45s system 1% cpu 1:02.21 total
Wow, it took more than a minute to build the image.
➜ docker images --filter "reference=*original*" --filter "reference=*without-optimizations*"
REPOSITORY TAG IMAGE ID CREATED SIZE
without-optimizations latest 45671929c8e4 2 minutes ago 1.07GB
original latest f1363df79c8a 27 hours ago 356MB
The image without optimizations is 714 MB bigger than the original one, which is a 200% increase in size. This clearly shows how important it is to optimize the Dockerfile: larger images take more time to build and consume more disk space.
Always use .dockerignore
The .dockerignore file is similar to the .gitignore file used by Git. It is used to exclude files and directories from the build context. The context is the set of files and directories that are sent to the Docker daemon when building an image. The context is sent to the Docker daemon as a tarball, so it is important to keep it as small as possible.
If, for any reason, you don't have a .dockerignore file in your project, you can create it manually. I suggest you use the official Rails .dockerignore file template as a starting point. Below is an example of what it may look like.
# See https://docs.docker.com/engine/reference/builder/#dockerignore-file for more about ignoring files.
# Ignore git directory.
/.git/
/.gitignore
# Ignore bundler config.
/.bundle
# Ignore all environment files.
/.env*
# Ignore all default key files.
/config/master.key
/config/credentials/*.key
# Ignore all logfiles and tempfiles.
/log/*
/tmp/*
!/log/.keep
!/tmp/.keep
# Ignore pidfiles, but keep the directory.
/tmp/pids/*
!/tmp/pids/.keep
# Ignore storage (uploaded files in development and any SQLite databases).
/storage/*
!/storage/.keep
/tmp/storage/*
!/tmp/storage/.keep
# Ignore assets.
/node_modules/
/app/assets/builds/*
!/app/assets/builds/.keep
/public/assets
# Ignore CI service files.
/.github
# Ignore development files
/.devcontainer
# Ignore Docker-related files
/.dockerignore
/Dockerfile*
Having a .dockerignore file in the project not only allows you to exclude unnecessary files and directories (e.g., GitHub workflows from the .github folder or JavaScript dependencies from node_modules) from the context. It also helps to avoid accidentally adding sensitive information to the image, for example, the .env file that contains environment variables or the master.key file that is used to decrypt the credentials.
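If you want to verify that the .dockerignore file actually works, one trick is a throwaway debug stage (a sketch of my own, not part of the Rails template): it copies the whole context and lists it, so you can see exactly what the Docker daemon received.
# Temporary debug stage: add it to the Dockerfile, build it with
#   docker build --no-cache --progress=plain --target context-check .
# inspect the listed files, then delete the stage again
FROM busybox AS context-check
WORKDIR /context
COPY . .
RUN find . -maxdepth 2 | sort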
Use Dive
All the optimizations mentioned above may seem obvious when explained. But what should you do if you already have a massive image and don't know where to start?
My favorite and most useful tool is Dive. Dive is a TUI tool for exploring a Docker image, layer contents, and discovering ways to shrink the image size. Dive can be installed with your system package manager, or you can use its official Docker image to run it. Let’s use the image from our worst scenario.
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock wagoodman/dive:latest without-optimizations
In the screenshot above, you can see the inspection of our most unoptimized image. Dive shows the size of each layer, the total size of the image, and the files that were changed (added, modified, or deleted) in each layer. For me, this is the most useful feature of Dive. By looking through the files in the right panel, you can easily identify the files that are not needed and remove the commands that add them to the image.
One thing that I truly love about Dive is that, besides having a terminal UI, it can also produce CI-friendly output, which is useful in local development too. To use it, run Dive with the CI environment variable set to true; the output of the command is shown in the screenshot below.
docker run -e CI=true --rm -it -v /var/run/docker.sock:/var/run/docker.sock wagoodman/dive:latest without-optimizations
My personal preference is to use Dive on a scheduled basis, for example, once a week, to ensure my images are still in good shape. In the upcoming articles, I will cover the automated workflows I use to check my Dockerfiles, including Dive and Hadolint.
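For scheduled or CI runs, Dive can also fail the check when an image regresses. According to its documentation, thresholds are read from a .dive-ci file; below is a minimal sketch, and the threshold values are just examples, not recommendations.
rules:
  # fail if less than 95% of the image bytes are efficiently used
  lowestEfficiency: 0.95
  # fail if more than 20 MB or more than 10% of the image is wasted space
  highestWastedBytes: 20MB
  highestUserWastedPercent: 0.10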
Don’t squash layers
One approach to minimizing image size that I have seen is squashing layers: combining several layers into a single one to reduce the image size. Docker used to have an experimental --squash option, and there were also third-party tools like docker-squash.
While this approach worked in the past, it is now deprecated and not recommended. Squashing layers defeats Docker's fundamental layer-caching feature. Apart from that, with --squash you could unintentionally include sensitive or temporary files from earlier layers in the final image. It is an all-or-nothing approach that lacks fine-grained control.
Instead of squashing layers, it is recommended to use multi-stage builds. The Rails Dockerfile already uses multi-stage builds; I will explain how they work in the next article.
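Without spoiling the next article, the core idea is simple: heavy build-time dependencies live in a throwaway stage, and only the produced artifacts are copied into the final image. A minimal sketch of the pattern, mirroring the Rails Dockerfile above:
# Build stage: contains compilers and build tools that never reach production
FROM docker.io/library/ruby:3.4.1-slim AS build
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y build-essential && \
    rm -rf /var/lib/apt/lists /var/cache/apt/archives
# ... bundle install, bootsnap, assets:precompile ...

# Final stage: starts from the slim base again and copies only the results
FROM docker.io/library/ruby:3.4.1-slim
COPY --from=build /usr/local/bundle /usr/local/bundle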
Conclusions
Optimizing Docker images, just like any other optimization, cannot be done once and forgotten. It is an ongoing process that requires regular checks and improvements. I tried to cover the basics, but they are critical to know and understand. In the next articles, I will cover more advanced techniques and tools that can help to make your Docker builds faster and more efficient.