Why Is My Docker Image So Big? A Deep Dive with ‘dive’ to Find the Bloat

News Room | Published 30 June 2025 (last updated 2:36 PM)

Key Takeaways

  • A Docker image isn’t a monolithic file; it is a stack of immutable layers, each representing the changes made by a single Dockerfile instruction.
  • The bloat in large AI Docker images comes primarily from massive AI library installations and hefty base OS components.
  • Master Docker diagnostics by combining docker history, which shows layer sizes, with dive, which lets you interactively explore layer contents and pinpoint the exact sources of bloat.
  • Pinpointing specific bloat sources with these diagnostic tools enables informed, targeted decisions about image size reduction and efficiency gains.
  • Effective image diagnosis scrutinizes not only Python dependencies but also base OS package installations and files copied from the build context.

Introduction

There are two qualities to ask of a Docker image for an AI project: that it works, faithfully running your model, and that it is well-crafted, meaning it’s lean, builds quickly, and deploys efficiently. It might seem as if these two qualities are unrelated, like a powerful engine and a sleek chassis. And yet, I don’t think they are. I think an image that is well-crafted is more likely to work reliably and scale gracefully in the demanding world of software engineering and AI.

The goal of this article is to transform our Docker images from opaque, surprisingly large black boxes into something more refined. Why bother? Because in the world of AI, where iteration speed is king and cloud bills can be princely, a 5GB image that takes an age to build and deploy is more than an inconvenience: it’s a drag on progress and a driver of deployment costs.

Before we can optimize, we must diagnose. We need to become Docker image detectives, peering into every nook and cranny of our images to understand how these digital containers are constructed, looking layer by layer, and pinpointing exactly where the bloat, the inefficiency, the digital detritus, truly lies.

The “Why Optimize?” for AI Docker Images

People are tool-builders; Docker is a phenomenal tool for packaging and deploying our AI creations. But like any tool, its effectiveness depends on how we wield it. An unoptimized Docker image in an AI workflow can lead to numerous problems.

Slower Development Cycles

Our relatively simple Bidirectional Encoder Representations from Transformers (BERT) classifier naive demo image, which we’ll dissect shortly, clocked in at 2.54GB and took around fifty-six seconds to build on a modern machine. Now, picture a real-world production service with many more dependencies, perhaps larger custom libraries, that bundles more extensive auxiliary data. That fifty-six second build for a toy example can easily stretch into many minutes, or even tens of minutes, for a production image. Imagine that multiplied across a team, each developer rebuilding multiple times a day. These aren’t just lost seconds; they’re a tangible drag on iteration speed and developer flow.

Inefficient CI/CD Pipelines

Each push and pull of that 2.54GB image through your continuous integration and deployment system consumes time and bandwidth. While 2.54GB might be acceptable for an infrequent deployment, production systems often involve more frequent updates for retraining models, patching libraries, or rolling out new features. If your production image swells to 5GB, 10GB, or more (which is not uncommon), these continuous integration and continuous delivery (CI/CD) operations become significant bottlenecks, delaying releases and consuming more resources.

Higher Cloud Costs

Storing multi-gigabyte images in container registries isn’t free, especially when managing multiple versions across numerous projects. Shrinking our 2.54GB image will yield immediate storage cost savings. More critically, this drive for efficiency aligns with modern sustainability goals. By reducing the data transferred during pushes, pulls, and scaling events, we decrease the energy consumption and associated carbon footprint of our cloud infrastructure. Crafting a lightweight Docker image isn’t just a technical or financial optimization; it’s a tangible step towards building more responsible and “green” AI systems.

A Less “Clean” State

A leaner image is inherently more secure. A bloated Docker image, by its very nature, contains more than just your application. Often it carries a full operating system’s worth of utilities: shells, package managers (e.g., apt and pip), and libraries that are not strictly required. Each component represents a potential vector for attack. If a vulnerability is discovered in curl, bash, or any of the hundreds of other OS utilities present in your image, your deployment is now vulnerable. By aggressively minimizing our container contents, we are practicing the principle of least privilege at the filesystem level, which drastically reduces the attack surface and leaves fewer tools for a potential intruder to exploit. This pursuit of a “clean” state transforms optimization from a mere performance tweak into a fundamental security best practice.

The goal is not just to make things smaller, but to make our entire AI development and deployment lifecycle faster, more efficient, and ultimately, more robust. The make-it-small principle is so fundamental to modern cloud operations that it’s precisely why hyper-scalers like AWS, Microsoft Azure, and Google Cloud invest in creating and promoting their own lean Linux distributions, such as Bottlerocket OS and CBL-Mariner. They understand that, at scale, every megabyte saved and every millisecond gained during image transfer and startup translates into significant improvements in cost, performance, and security. By optimizing our own AI images, we are applying the same battle-tested logic that powers the world’s largest cloud infrastructures.

Our Specimen: The Naive BERT Classifier

Let’s introduce our “patient” for today’s diagnostic session. It’s a simple text classification application using the popular bert-base-uncased model from Hugging Face Transformers.

This walk-through is accompanied by a repository on GitHub that showcases our naive_image.

The ingredients are straightforward:

The requirements.txt file (located in our project’s naive_image/ directory)


# Core Dependencies
transformers==4.52.3
torch==2.7.0
torchvision==0.22.0
torchaudio==2.7.0

# Web Framework for Server
flask==2.3.3

# Development/Runtime Deps
pandas
numpy==1.26.4
requests==2.32.3
pillow
scikit-learn

# Development/Analysis Deps

pytest
jupyter
ipython
matplotlib
seaborn
black
flake8
mypy
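
One issue is visible before we even build: runtime and development dependencies live in a single file, so pytest, jupyter, and friends will ship in the production image. A common remedy, sketched below with a few entries from our list (the file split is a suggestion, not part of the repository), is to separate them and have the Dockerfile install only the runtime file:

```
# requirements.txt: runtime only
transformers==4.52.3
torch==2.7.0
flask==2.3.3

# requirements-dev.txt: local tooling, never installed in the image
pytest
jupyter
black
```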


A “Problematic” Dockerfile

This file builds our bert-classifier-naive image. It’s functional, but we’ve intentionally left in a few common missteps to make our diagnostic journey more enlightening.


# naive_image/Dockerfile
# This is the initial, naive Dockerfile.
# It aims to be simple and functional, but NOT optimized for size or speed.

# Use a standard, general-purpose Python image.
FROM python:3.10


RUN apt-get update && apt-get install -y curl

# Set the working directory inside the container
# All subsequent commands will run from this directory
WORKDIR /app

# Copy requirements first for better layer caching
COPY naive_image/requirements.txt ./requirements.txt

# Install all dependencies listed in requirements.txt.
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code and data
COPY naive_image/app/ ./app/

COPY naive_image/sample_data/ ./sample_data/

RUN echo "Build complete" > /app/build_status.txt

# Command to run the application when the container starts.
# This runs the predictor script with the sample text file.
CMD ["python", "app/predictor.py", "sample_data/sample_text.txt"]

When we build this image, we create our 2.54GB image. Because the Dockerfile’s COPY paths reference the naive_image/ directory, the build runs from the repository root:

docker build -f naive_image/Dockerfile -t bert-classifier-naive .

Now, let’s open it up.

The Diagnostic Toolkit: Peeling Back the Layers

Think of a Docker image not as a monolith, but as a stack of transparent sheets, each representing a change or an addition. Our tools will help us examine these sheets.

The First Glance

docker image ls

This is your quick weigh-in.

docker image ls bert-classifier-naive

The output immediately flags our bert-classifier-naive image at a hefty 2.54GB. A clear signal that there’s room for improvement.


> docker images bert-classifier-naive

REPOSITORY              TAG      IMAGE ID       CREATED              SIZE
bert-classifier-naive   latest   b0693be54230   About a minute ago   2.54GB

The Command Log

docker history bert-classifier-naive

If docker image ls shows you the final, total size of the image, docker history breaks that total down. It lists every command from your Dockerfile and shows exactly how much each step contributed.

docker history bert-classifier-naive

The output will resemble this:


IMAGE          CREATED         CREATED BY                                      SIZE      COMMENT
b0693be54230   2 minutes ago   CMD ["python" "app/predictor.py" "sample_dat…   0B        buildkit.dockerfile.v0
<missing>      2 minutes ago   RUN /bin/sh -c echo "Build complete" > /app/…   15B       buildkit.dockerfile.v0
<missing>      2 minutes ago   COPY naive_image/sample_data/ ./sample_data/…   376B      buildkit.dockerfile.v0
<missing>      2 minutes ago   COPY naive_image/app/ ./app/ # buildkit         12.2kB    buildkit.dockerfile.v0
<missing>      2 minutes ago   RUN /bin/sh -c pip install --no-cache-dir -r…   1.51GB    buildkit.dockerfile.v0
<missing>      3 minutes ago   COPY naive_image/requirements.txt ./requirem…   362B      buildkit.dockerfile.v0
<missing>      3 minutes ago   WORKDIR /app                                    0B        buildkit.dockerfile.v0
<missing>      3 minutes ago   RUN /bin/sh -c apt-get update && apt-get ins…   19.4MB    buildkit.dockerfile.v0
<missing>      3 weeks ago     CMD ["python3"]                                 0B        buildkit.dockerfile.v0
<missing>      3 weeks ago     RUN /bin/sh -c set -eux;  for src in idle3 p…   36B       buildkit.dockerfile.v0
<missing>      3 weeks ago     RUN /bin/sh -c set -eux;   wget -O python.ta…   58.2MB    buildkit.dockerfile.v0
<missing>      3 weeks ago     ENV PYTHON_SHA256=4c68050f049d1b4ac5aadd0df5…   0B        buildkit.dockerfile.v0
<missing>      3 weeks ago     ENV PYTHON_VERSION=3.10.17                      0B        buildkit.dockerfile.v0
<missing>      3 weeks ago     ENV GPG_KEY=A035C8C19219BA821ECEA86B64E628F8…   0B        buildkit.dockerfile.v0
<missing>      3 weeks ago     RUN /bin/sh -c set -eux;  apt-get update;  a…   18.2MB    buildkit.dockerfile.v0
<missing>      3 weeks ago     ENV LANG=C.UTF-8                                0B        buildkit.dockerfile.v0
<missing>      3 weeks ago     ENV PATH=/usr/local/bin:/usr/local/sbin:/usr…   0B        buildkit.dockerfile.v0
<missing>      16 months ago   RUN /bin/sh -c set -ex;  apt-get update;  ap…   560MB     buildkit.dockerfile.v0
<missing>      16 months ago   RUN /bin/sh -c set -eux;  apt-get update;  a…   183MB     buildkit.dockerfile.v0
<missing>      2 years ago     RUN /bin/sh -c set -eux;  apt-get update;  a…   48.5MB    buildkit.dockerfile.v0

From this history, two things scream out. First, the 1.51GB layer from our pip install command is the main contributor from our direct actions. Following that, the base image itself contributes significantly, with one layer alone being 560MB and our apt-get install curl adding another 19.4MB. This historical view tells us which commands are the heavy hitters.
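These history sizes can also be crunched programmatically. As a minimal sketch (parse_size is our own helper, and the unit table assumes the decimal suffixes Docker prints), you could rank and total the layers like this:

```python
# Minimal sketch: turn the human-readable sizes that `docker history`
# prints (e.g. "1.51GB", "19.4MB", "362B") into byte counts, so layers
# can be sorted or summed. Docker uses decimal units: kB, MB, GB.
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9}

def parse_size(size: str) -> int:
    """Parse a Docker-style size string into an integer byte count."""
    size = size.strip()
    # Try the longest suffixes first so "GB" is not mistaken for "B".
    for suffix in sorted(UNITS, key=len, reverse=True):
        if size.endswith(suffix):
            return round(float(size[: -len(suffix)]) * UNITS[suffix])
    raise ValueError(f"unrecognized size: {size!r}")

# The non-trivial layer sizes reported for bert-classifier-naive above:
layers = ["1.51GB", "560MB", "183MB", "58.2MB", "48.5MB", "19.4MB", "18.2MB"]
print(f"largest layer: {max(layers, key=parse_size)}")
print(f"combined: {sum(parse_size(s) for s in layers) / 10**9:.2f}GB")
```

Feeding it the sizes from the table confirms what the eye already caught: one pip layer dwarfs everything else.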

The Deep Inspection

dive bert-classifier-naive

Now for the star of our diagnostic show: dive. Dive is an open-source CLI tool for exploring a Docker image, layer contents, and discovering ways to shrink the image size.

Homebrew is the easiest way to install dive.

brew install dive

Launch it with:

dive bert-classifier-naive

Let’s walk through our bert-classifier-naive image using dive:

The Foundation – Base Image Layers

Select one of the largest layers at the bottom of the layer list on the left; for instance, the one that docker history told us was 560MB. In the right pane, you’ll see the filesystem structure. This is the bulk of the python:3.10 base image: a full Debian operating system, Python’s standard library, and more. It’s like buying a furnished house when all you needed was a specific room.

Figure 1: Dive view of bert-classifier-naive.

The apt-get install curl Layer (19.4MB)

Navigate to this layer. On the right, dive will show curl and its dependencies being added. Importantly, if you explore /var/lib/apt/lists/, you’ll find it populated with package metadata. Because we didn’t clean this up in the same layer, this data, though useless at runtime, remains part of this layer’s contribution to the image size. Notice that dive even has a “Potential wasted space” metric (bottom left; ours showed 9.5MB) which often flags such omissions.

Figure 2: Dive view of apt-get install layer of the bert-classifier-naive image.
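
For contrast, the conventional fix, shown here as a sketch rather than as part of our naive Dockerfile, is to purge that metadata inside the same RUN instruction, so it never lands in the layer at all:

```dockerfile
# Install curl and remove the apt package lists in the SAME layer;
# a cleanup in a later RUN would not shrink this layer's contribution.
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*
```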

The pip install Layer (The Main Event)

Select this layer. This is where our AI-specific dependencies make their grand entrance. Expand /usr/local/lib/python3.10/site-packages/ on the right. You’ll see the culprits: hefty directories for torch, transformers, numpy, and their friends. This isn’t “bloat” in the sense of being unnecessary (we need these libraries), but their sheer size is a major factor we’ll need to manage.

Figure 3: Dive view of pip install layer, showing the bulk of the bert-classifier-naive image.

COPY Layers

The layers for COPY naive_image/requirements.txt ./requirements.txt, COPY naive_image/app/ ./app/, and COPY naive_image/sample_data/ ./sample_data/ are small in our case (362B, 12.2kB, and 376B, respectively). However, dive would starkly reveal if we’d forgotten a .dockerignore file and accidentally copied in our entire .git history, local virtual environments, or large datasets from these source directories. A COPY . . command, without a vigilant .dockerignore, can be a Trojan horse for bloat.

Figure 4: Dive view of COPY and RUN commands in the Dockerfile, showing files added/modified by them.
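
A starting-point .dockerignore for a project like this might look like the following (the entries are illustrative assumptions about what typically accumulates in an ML repository, not files confirmed to exist here):

```
# .dockerignore: keep build-context junk out of COPY layers
.git
__pycache__/
*.pyc
.venv/
venv/
.ipynb_checkpoints/
datasets/
```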

Using dive transforms the abstract concept of layers into a tangible, explorable filesystem. It lets us see precisely what each Dockerfile command does, how much space it consumes, and where inefficiencies lie.
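
dive can also enforce these findings automatically: run it with the CI=true environment variable and it evaluates the image against rules in a .dive-ci file instead of opening the interactive UI. A sketch, with thresholds that are arbitrary examples rather than recommendations:

```yaml
# .dive-ci: fail the build when an image regresses
rules:
  # Fraction of bytes not duplicated or overwritten across layers.
  lowestEfficiency: 0.95
  # Absolute cap on wasted bytes (ours showed ~9.5MB).
  highestWastedBytes: 20MB
  # Cap on wasted space as a fraction of user-added bytes.
  highestUserWastedPercent: 0.10
```

Invoked as CI=true dive bert-classifier-naive, dive exits non-zero if any rule fails, which makes it straightforward to wire into a pipeline.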

Exploring the Code Repository

All the code, including the naive_image Dockerfile and application files we dissected today, is available in the accompanying GitHub repository.

The repository also contains several other directories, such as slim_image, multi_stage_build, layered_image, and distroless_image, which demonstrate different approaches to constructing a leaner container for our BERT application. This provides a perfect sandbox for you to practice your new diagnostic skills. We encourage you to build the images from these other Dockerfiles and run dive on them yourself to see precisely how their structure, size, and composition differ from our naive starting point. It’s an excellent way to solidify your understanding of how Dockerfile changes are reflected in the final image layers.

Your Turn to dive In

Our investigation of bert-classifier-naive has been revealing:

  • Our image totals 2.54GB.
  • The Python dependencies for our BERT model (torch, transformers, etc.) account for a massive 1.51GB.
  • The python:3.10 base image itself contributes hundreds of megabytes of operating system and standard library components.
  • Even smaller operations, like installing curl without cleaning up package manager caches, add unnecessary weight (our 19.4MB layer contained ~9.5MB of “wasted space”).

We now have a clear map of where the gigabytes reside. This detailed diagnosis is the bedrock upon which all effective optimization is built. With tools like dive, you’re now equipped to dissect your own images and identify these very same patterns. The logical next steps in any optimization journey would naturally involve scrutinizing the foundational choices, such as the base image, and exploring techniques to isolate build-time needs from runtime essentials.

I encourage you to grab dive and point it at one of your own Docker images. What surprises will you find?
