Introduction
Have you ever stared at your terminal, waiting for a Docker build, and wondered why a tiny code change triggered a 10-minute recompilation of your entire project? Or why your final image is hundreds of megabytes larger than you think it should be? These aren’t quirks of a mysterious system; they are the predictable outcomes of understandable mechanics. The difference between a frustratingly slow workflow and an efficient, fast one often comes down to understanding the engine of docker build.
This article is a guide to that engine room. We will demystify the build process by mastering three pillars of efficiency: the layer caching system, the art of the RUN command, and the role of the .dockerignore file as the gatekeeper to your build. By the end, you will not just know what commands to run, but why they work, empowering you to craft truly professional and optimized containers. As a case study, we’ll use the Dockerfile from our layered_image project, a simple AI application that uses a BERT model for text classification, to illustrate these core principles.
The Foundation: Docker Layers and the Build Cache – The Immutable Ledger
Imagine your Docker image not as a single, monolithic file, but as a stack of precisely defined changes, like an immutable ledger where each transaction is recorded on a new page. This is the essence of Docker’s layered filesystem. Each instruction in your Dockerfile (FROM, COPY, RUN, CMD, etc.) typically creates a new layer. This layer doesn’t contain a full copy of the filesystem; instead, it records only the differences introduced by that specific instruction compared to the layer beneath it. If a RUN apt-get install curl command adds curl, that layer essentially says “+ curl and its dependencies.” If a subsequent COPY my_script.py /app/ adds a script, that new layer says “+ /app/my_script.py.”
This layered approach is ingenious for efficiency. When you pull an image, Docker only downloads layers it doesn’t already have. When you build images that share common base layers (like python:3.10-slim), those base layers are stored once and shared.
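You can see these shared layers for yourself. A quick check (assuming python:3.10-slim has already been pulled locally) lists the content digests of each layer in the base image; any image built FROM it reports these same digests at the bottom of its stack:

# Print the content digest of every layer in the base image
docker image inspect python:3.10-slim --format '{{json .RootFS.Layers}}'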
Building upon this layered filesystem is the Docker build cache. It’s Docker’s memory of past operations. When you issue a docker build command, Docker steps through your Dockerfile instruction by instruction. For each instruction, it checks three things:
- The exact instruction itself (e.g., COPY my_file.txt /dest/).
- The content of any files involved in that instruction (e.g., the checksum of my_file.txt).
- The parent image layer upon which this instruction is based.
If Docker finds an existing layer in its cache that was created from the exact same parent layer using the exact same instruction with the exact same input files, it reuses that cached layer instantly. This is a cache hit.
However, if any of these conditions changes (for example, the instruction is different, a copied file’s content has changed, or the parent layer is different because a previous instruction was a cache miss), then Docker experiences a cache bust. When a cache bust occurs, Docker must execute that instruction from scratch, creating a new layer. Critically, all subsequent instructions in the Dockerfile will also be executed from scratch, regardless of whether they might have matched the cache on their own. The cache is invalidated from that point downwards.
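Occasionally you want that full re-execution on purpose, for instance to pick up updated package versions. The --no-cache flag forces every instruction to be a cache miss:

# Deliberately bypass the build cache for every instruction
docker build --no-cache -t bert-classifier:layered -f layered_image/Dockerfile layered_image/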
This leads to the golden rule of caching: Order instructions from least frequently changed to most frequently changed. Think of it like organizing your desk: things you rarely touch go in the back drawers; things you use constantly stay on top.
Interactive Experiment to Feel the Cache:
- First, build the layered_image (which has a cache-friendly order) using a command like time docker build -t bert-classifier:layered -f layered_image/Dockerfile layered_image/. For us, this initial build took about 23 seconds.
- Now, open layered_image/app/predictor.py and make a trivial change, like adding a comment. Rebuild the image: time docker build -t bert-classifier:layered -f layered_image/Dockerfile layered_image/. The build should complete in less than a second. Why? Docker sees that FROM, WORKDIR, and COPY runtime_requirements.txt are unchanged and reuses their layers. It sees the RUN pip install instruction is the same and its input (runtime_requirements.txt) hasn’t changed its content, so it reuses the massive layer created by pip install. Only when it reaches COPY layered_image/app/ ./app/ does it detect a change (your modified predictor.py), so it rebuilds that layer and the subsequent ones. If you want proof, add the --progress=plain flag to the end of the build command; the Docker CLI will show you the cached layers.
- Next, the crucial test for understanding cache invalidation: edit your layered_image/Dockerfile. Move the line COPY layered_image/app/ ./app/ to before the RUN pip install ... line (a sketch of the result follows this list). Make one more trivial change to layered_image/app/predictor.py and rebuild. What happens? The build takes the full 23 seconds again! The change to app/predictor.py busted the cache at the (now earlier) COPY ./app/ step. Because the pip install step comes after this cache bust, it too is forced to re-run from scratch, even though runtime_requirements.txt didn’t change.
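For reference, here is roughly what that cache-hostile reordering looks like; a sketch of the experiment’s last step, not the file we actually ship:

# Cache-Hostile Order (the reordering from the experiment's last step)
FROM python:3.10-slim AS runtime
WORKDIR /app
# App code now comes first, so every code edit busts the cache here...
COPY layered_image/app/ ./app/
COPY layered_image/runtime_requirements.txt ./runtime_requirements.txt
# ...forcing this slow step to re-run on every build
RUN pip install --no-cache-dir -r runtime_requirements.txt
COPY layered_image/sample_data/ ./sample_data/
CMD ["python", "app/predictor.py", "sample_data/sample_text.txt"]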
This experiment powerfully demonstrates how a cache bust cascades and why the order of your Dockerfile instructions is paramount for a fast development loop. Here’s the cache-friendly structure we advocate from our layered_image project:
# Cache-Friendly Order (from layered_image/Dockerfile runtime stage)
FROM python:3.10-slim AS runtime
WORKDIR /app
# 1. Copy requirements first (changes less often than app code)
COPY layered_image/runtime_requirements.txt ./runtime_requirements.txt
# 2. Install dependencies (slow step, now cached if requirements.txt doesn't change)
RUN pip install --no-cache-dir -r runtime_requirements.txt # (Full command shown later)
# 3. Copy app code last (changes most often)
COPY layered_image/app/ ./app/
COPY layered_image/sample_data/ ./sample_data/
CMD ["python", "app/predictor.py", "sample_data/sample_text.txt"]
The Art of the RUN Command: Chaining for Microscopic Layers
The pursuit of an efficient Dockerfile has a parallel in the physical world: trying to minimize the volume of a collection of items. Each RUN command in your Dockerfile creates a new layer. If you download a tool, use it, and then delete it in separate RUN commands, you’re like someone putting an item in a box, then putting an empty wrapper for that item in another box on top. The original item is still there, in the lower box, taking up space, even if the top box says “it’s gone.”
Specifically, files created in one layer cannot be truly removed from the overall image size by a command in a subsequent layer. The subsequent layer simply records that those files are “deleted” or “hidden,” but the bits comprising those files still exist in the image’s historical layers. This is what tools like dive often report as “wasted space.”
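To see this hidden weight for yourself, the open-source dive tool (assuming you have it installed) lets you browse an image layer by layer and shows its estimated wasted space:

# Interactively explore each layer's files and the image efficiency estimate
dive bert-classifier-layers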
Consider this anti-pattern:
# Anti-Pattern: Separate RUN commands leading to bloat
FROM python:3.10-slim
WORKDIR /app
COPY runtime_requirements.txt .
RUN pip install --no-cache-dir -r runtime_requirements.txt # Step 1: Install
RUN pip cache purge # Step 2: Cleanup attempt 1
RUN rm -rf /tmp/* /var/tmp/* # Step 3: Cleanup attempt 2
# ... (further cleanup attempts)
If you were to build this image and then run docker history bert-classifier-layers, you’d observe the output for each RUN step. The first RUN pip install... step would show a significant amount of data being written (approximately 679MB). The subsequent RUN pip cache purge and RUN rm -rf /tmp/* steps would show very little data written for their layers, perhaps only a few kilobytes. This is because they aren’t removing data from the previous 679MB layer; they are just adding new, small layers on top that mark those files as deleted. The 679MB layer remains part of the image history.
docker history bert-classifier-layers
IMAGE CREATED CREATED BY SIZE COMMENT
f09d44f97ab4 34 minutes ago CMD ["python" "app/predictor.py" "sample_dat… 0B buildkit.dockerfile.v0
<missing> 34 minutes ago COPY layered_image/sample_data/ ./sample_dat… 376B buildkit.dockerfile.v0
<missing> 34 minutes ago COPY layered_image/app/ ./app/ # buildkit 5.51kB buildkit.dockerfile.v0
<missing> 34 minutes ago RUN /bin/sh -c rm -rf /tmp/* /var/tmp/* && … 0B buildkit.dockerfile.v0
<missing> 34 minutes ago RUN /bin/sh -c pip cache purge # buildkit 6.21kB buildkit.dockerfile.v0
<missing> 34 minutes ago RUN /bin/sh -c pip install --no-cache-dir -r… 679MB buildkit.dockerfile.v0
<missing> 34 minutes ago COPY layered_image/runtime_requirements.txt … 141B buildkit.dockerfile.v0
<missing> 3 hours ago WORKDIR /app 0B buildkit.dockerfile.v0
<missing> 11 days ago CMD ["python3"] 0B buildkit.dockerfile.v0
<missing> 11 days ago RUN /bin/sh -c set -eux; for src in idle3 p… 36B buildkit.dockerfile.v0
<missing> 11 days ago RUN /bin/sh -c set -eux; savedAptMark="$(a… 46.4MB buildkit.dockerfile.v0
<missing> 11 days ago ENV PYTHON_SHA256=ae665bc678abd9ab6a6e1573d2… 0B buildkit.dockerfile.v0
<missing> 11 days ago ENV PYTHON_VERSION=3.10.18 0B buildkit.dockerfile.v0
<missing> 11 days ago ENV GPG_KEY=A035C8C19219BA821ECEA86B64E628F8… 0B buildkit.dockerfile.v0
<missing> 11 days ago RUN /bin/sh -c set -eux; apt-get update; a… 9.17MB buildkit.dockerfile.v0
<missing> 11 days ago ENV LANG=C.UTF-8 0B buildkit.dockerfile.v0
<missing> 11 days ago ENV PATH=/usr/local/bin:/usr/local/sbin:/usr… 0B buildkit.dockerfile.v0
<missing> 11 days ago # debian.sh --arch 'arm64' out/ 'bookworm' '… 97.2MB debuerreotype 0.15
The solution is to perform all related operations, especially creation and cleanup of temporary files or tools, within a single RUN command, chaining them with &&. This ensures that any temporary artifacts exist only ephemerally during the execution of that single RUN command and are gone before the layer is finalized and committed.
Let’s look at the aggressive cleanup RUN command from our layered_image/Dockerfile:
RUN pip install --no-cache-dir -r runtime_requirements.txt && \
    pip cache purge && \
    rm -rf /tmp/* /var/tmp/* && \
    find /usr/local/lib/python*/site-packages/ -name "*.pyc" -delete && \
    find /usr/local/lib/python*/site-packages/ -name "__pycache__" -type d -exec rm -rf {} + || true
This command is a carefully choreographed dance:
- pip install --no-cache-dir -r runtime_requirements.txt: Installs Python packages without leaving downloaded wheel files in pip’s HTTP cache.
- pip cache purge: Explicitly clears out any other cache pip might maintain.
- rm -rf /tmp/* /var/tmp/*: Removes files from standard temporary directories.
- find ... -name "*.pyc" -delete: Deletes compiled Python bytecode files.
- find ... -name "__pycache__" -type d -exec rm -rf {} +: Removes the __pycache__ directories.
- || true: Ensures the RUN command succeeds even if find doesn’t locate any files (which can return a non-zero exit code).
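As an aside, if long && chains feel hard to read, newer BuildKit releases support heredoc syntax, which expresses the same single-layer script across multiple lines. This is a sketch assuming the docker/dockerfile:1 syntax frontend, not the file from our project:

# syntax=docker/dockerfile:1
FROM python:3.10-slim
WORKDIR /app
COPY runtime_requirements.txt .
# One RUN, one layer: the heredoc body executes as a single shell script
RUN <<'CLEANUP'
set -eux
pip install --no-cache-dir -r runtime_requirements.txt
pip cache purge
rm -rf /tmp/* /var/tmp/*
find /usr/local/lib/python*/site-packages/ -name '*.pyc' -delete
find /usr/local/lib/python*/site-packages/ -name '__pycache__' -type d -exec rm -rf {} + || true
CLEANUP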
The Impact (Showcased with docker history):
With this single, chained RUN command, the resulting layer for our layered_image project is 572MB. If these steps were unchained, the initial pip install would create a layer of approximately 679MB. The docker history command would reflect this:
docker history bert-classifier-layers
IMAGE CREATED CREATED BY SIZE COMMENT
17d0319094f4 2 minutes ago CMD ["python" "app/predictor.py" "sample_dat… 0B buildkit.dockerfile.v0
<missing> 2 minutes ago COPY layered_image/sample_data/ ./sample_dat… 376B buildkit.dockerfile.v0
<missing> 2 minutes ago COPY layered_image/app/ ./app/ # buildkit 5.51kB buildkit.dockerfile.v0
<missing> 2 minutes ago RUN /bin/sh -c pip install --no-cache-dir -r… 572MB buildkit.dockerfile.v0
<missing> 2 minutes ago COPY layered_image/runtime_requirements.txt … 141B buildkit.dockerfile.v0
<missing> 3 hours ago WORKDIR /app 0B buildkit.dockerfile.v0
<missing> 11 days ago CMD ["python3"] 0B buildkit.dockerfile.v0
<missing> 11 days ago RUN /bin/sh -c set -eux; for src in idle3 p… 36B buildkit.dockerfile.v0
<missing> 11 days ago RUN /bin/sh -c set -eux; savedAptMark="$(a… 46.4MB buildkit.dockerfile.v0
<missing> 11 days ago ENV PYTHON_SHA256=ae665bc678abd9ab6a6e1573d2… 0B buildkit.dockerfile.v0
<missing> 11 days ago ENV PYTHON_VERSION=3.10.18 0B buildkit.dockerfile.v0
<missing> 11 days ago ENV GPG_KEY=A035C8C19219BA821ECEA86B64E628F8… 0B buildkit.dockerfile.v0
<missing> 11 days ago RUN /bin/sh -c set -eux; apt-get update; a… 9.17MB buildkit.dockerfile.v0
<missing> 11 days ago ENV LANG=C.UTF-8 0B buildkit.dockerfile.v0
<missing> 11 days ago ENV PATH=/usr/local/bin:/usr/local/sbin:/usr… 0B buildkit.dockerfile.v0
<missing> 11 days ago # debian.sh --arch 'arm64' out/ 'bookworm' '… 97.2MB debuerreotype 0.15
This direct comparison in layer size demonstrates a saving of 107MB simply by structuring the cleanup correctly within the same RUN instruction.
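If you keep both variants around under separate tags (the :chained and :unchained tags implied below are hypothetical), docker images makes the size comparison easy to eyeball:

# Compare final image sizes for every tag of the repository
docker images bert-classifier-layers --format 'table {{.Tag}}\t{{.Size}}'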
The Gatekeeper: Mastering .dockerignore
Our final principle concerns the very beginning of the build process. When you execute docker build ., the . (or any path you specify) defines the “build context.” Docker meticulously packages everything within this path (respecting the .dockerignore file, of course) into an archive and transmits it to the Docker daemon. The daemon then unpacks this context and uses it as the sole source of local files for any COPY or ADD instructions in your Dockerfile. It has no access to anything on your filesystem outside this context.
The problem, particularly for AI projects, is that our project directories are often treasure troves of files utterly irrelevant to the final runtime image: local datasets, model checkpoints, Jupyter notebooks, Python virtual environments, and the entire .git history. Sending a multi-gigabyte context isn’t just slow (especially if your daemon is remote, like in many CI systems); it’s also a security and cleanliness concern. You risk accidentally COPYing sensitive information or development artifacts into your image.
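Before reaching for a fix, it’s worth measuring the damage. With BuildKit, the plain progress output reports the context transfer size directly; a quick sketch (the context-probe tag is just a throwaway name):

# Surface the context-transfer line from BuildKit's build log
docker build --progress=plain -t context-probe . 2>&1 | grep 'transferring context'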
The .dockerignore file is your vigilant gatekeeper. It’s a simple text file, placed in the root of your build context, that uses patterns (much like .gitignore) to specify which files and directories should be excluded from the context before it’s ever packaged and sent to the daemon.
A comprehensive .dockerignore for an AI project might look like this:
# .dockerignore
# Python virtual environments
.venv/
env/
venv/
# Python caches and compiled files
__pycache__/
*.py[cod] # .pyc, .pyo, .pyd
*.egg-info/
dist/
build/
*.so # Compiled shared objects, unless explicitly needed and copied
# IDE and OS specific
.vscode/
.idea/
*.swp
*.swo
.DS_Store
Thumbs.db
# Notebooks and exploratory artifacts
notebooks/
*.ipynb_checkpoints
# Test-related files (if not run inside the container build)
tests/
.pytest_cache/
htmlcov/
.coverage
# Large data or model files not intended for baking into the image
data/
models/
model_checkpoints/
*.pt
*.onnx
*.h5
# Log files
*.log
# Dockerfile itself (usually not needed to be COPIED into the image)
# Dockerfile
# Version control
# .git
# .gitignore
By meticulously defining what to ignore, you ensure the build context is lean. This speeds up the initial “Sending build context to Docker daemon…” step, reduces the chance of accidental data inclusion, and makes your COPY . . commands safer and more predictable.
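To verify the gatekeeper is doing its job, one simple trick is to build a throwaway image whose only content is your build context, then list what made it in (the context-check names are hypothetical; .dockerignore rules still apply to the COPY):

# Write a minimal Dockerfile that copies the entire context into the image
cat > context-check.Dockerfile <<'EOF'
FROM busybox
COPY . /context
CMD ["find", "/context"]
EOF
# Build quietly, then print every file that survived .dockerignore
docker build -q -f context-check.Dockerfile -t context-check .
docker run --rm context-check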
Conclusion
In some sense, a Dockerfile is just another tool. Yet, by delving into its mechanics, by understanding how it transforms your instructions into an image, you gain a craftsman’s control. We’ve seen that the deliberate ordering of instructions to honor the build cache can turn minutes of waiting into seconds of action. We’ve learned that the artful chaining within RUN commands isn’t just about syntax; it’s about sculpting lean, efficient layers. And we’ve recognized the .dockerignore file not as a minor detail, but as a crucial guardian of our build process’s integrity and speed.
These principles (layers, caching, chaining, and context management) are fundamental. Mastering them is key to moving beyond simply creating Docker images to truly engineering them for efficiency, speed, and cleanliness, especially in the demanding world of AI.
Your Turn
Now that you understand these mechanics, revisit your own Dockerfiles. Can you reorder layers for better caching? Can you chain RUN commands for more aggressive cleanup? Implement a robust .dockerignore. Share your findings or questions in the comments below!