
Demystifying Dockerfiles: A Beginner's Guide to Building Custom Images


As a developer or DevOps engineer getting started with Docker, one of the first things you'll need to learn is how to build custom images using Dockerfiles. Mastering this skill is key to containerizing your applications and integrating Docker into your workflows.

In this beginner's guide, I'll walk you through what Dockerfiles are, why they're important, and best practices for writing effective Dockerfiles to build optimized images. I'll share plenty of real-world examples so you can quickly get up and running with Docker image building. Let's dive in!

What Exactly is a Dockerfile?

A Dockerfile is a simple text file that contains a set of instructions and commands that are executed in order to assemble a Docker image. It allows you to automate and standardize the image building process, ensuring you can reliably recreate images with consistent settings and configuration.

Docker reads the Dockerfile instructions, executes them, and commits the results into new layers which build up the image. This allows you to version and share the process for creating an image, much like sharing the source code for an application.

[Diagram: Dockerfile layers. Each instruction in a Dockerfile adds a new layer to the image. Source: Docker]
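
To make the layering concrete, here's a minimal sketch of a complete Dockerfile; the file and package names are illustrative:

# Layer 1: the base image
FROM alpine:3.19

# Layer 2: filesystem changes from installing a package
RUN apk add --no-cache curl

# Layer 3: the file copied in from the build context
COPY app.sh /app/app.sh

# Metadata only (no filesystem layer): the default command
CMD ["/app/app.sh"]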

Some key advantages of using Dockerfiles include:

  • Automation – Codifies and automates image creation.
  • Consistency – Allows reliable image recreation.
  • Maintenance – Easily modify and rebuild images as needed.
  • Documentation – Dockerfiles serve as documentation for how an image was built.
  • Component Reuse – Reuse instructions to build multiple images.
  • Control – Full control over image contents without added bloat.

Dockerfile usage has exploded as companies adopt Docker – over 5 million Dockerfiles have been scanned on Docker Hub to date.

Year    Dockerfiles Scanned    Growth
2015    18,000
2017    1,800,000              +9,900%
2019    5,000,000              +177%

Dockerfile growth over the years. Source: Docker

Clearly, Dockerfile adoption is surging, so let's look at how Dockerfiles work under the hood.

Dockerfile Basics: Instructions, Syntax, and Usage

The syntax for a Dockerfile is simple:

# Comments start with pound sign

INSTRUCTION argument1 argument2

Instructions are conventionally written in uppercase, and the instructions that change the filesystem (RUN, COPY, ADD) each add a new layer to the image. Here are some of the most common Dockerfile instructions:

FROM

Sets the base image to build upon:

FROM ubuntu:latest

This pulls the latest Ubuntu image to use as the starting point.
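
In practice, it's usually better to pin a specific tag rather than latest so rebuilds stay reproducible (the tag below is just an example):

# Pin an explicit release instead of the moving latest tag
FROM ubuntu:22.04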

RUN

Used to execute shell commands during the build process:

RUN apt-get update && apt-get install -y git

This will update apt repositories and install git.

COPY

Copy files from host into the image:

COPY myapp.js /usr/src/app/

This adds the myapp.js file into the container filesystem.

EXPOSE

Documents which port(s) the container listens on:

EXPOSE 3000

This documents that the container listens on port 3000. EXPOSE alone doesn't publish the port to the host; that happens at runtime with docker run -p.

There are many more instructions like CMD, ENV, WORKDIR, and ENTRYPOINT – we'll come back to those later.
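
As a quick preview, here's how those often fit together; the values below are illustrative:

# Set an environment variable available during the build and at runtime
ENV NODE_ENV=production

# Set the working directory for the instructions that follow
WORKDIR /usr/src/app

# ENTRYPOINT fixes the executable; CMD supplies default, overridable arguments
ENTRYPOINT ["node"]
CMD ["server.js"]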

Building Images from Dockerfiles

Once you've created a Dockerfile, you can build the corresponding image using the docker build command:

docker build -t myimage .

This runs through each instruction in the Dockerfile and creates an image named myimage. The trailing dot sets the build context to the current directory, which is where Docker looks for the Dockerfile and any files you COPY in.

You can then run the image as a container:

docker run -d -p 80:3000 myimage

This starts the container, mapping port 80 on your host to port 3000 inside the container.
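
Once it's up, you can sanity-check the container and the published port (assuming the app answers HTTP):

# Confirm the container is running
docker ps

# Hit the published port on the host
curl http://localhost:80/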

Real-World Dockerfile Examples

To make this concrete, let's look at some real-world Dockerfiles from popular open source projects:

Node.js

Node.js publishes official Docker images, with variants for different base operating systems; the Dockerfiles live on GitHub. Here's a condensed sketch of the Alpine variant:

FROM alpine:3.11

RUN addgroup -g 1000 node \
    && adduser -u 1000 -G node -s /bin/sh -D node

# install Node.js and its runtime dependencies from the Alpine repositories
# (simplified here; the real official image builds Node from source)
RUN apk add --no-cache libstdc++ nodejs npm

COPY docker-entrypoint.sh /usr/local/bin/ 

ENTRYPOINT ["docker-entrypoint.sh"]
CMD [ "node" ]  

It uses Alpine as a small base image, creates a node user, installs dependencies and Node.js itself, then copies over an entrypoint script and sets the default command.

This lets you run a script by mounting it into the container, for example: docker run -v "$PWD":/app -w /app node myapp.js

MongoDB

The official MongoDB Dockerfile is another good example:

FROM debian:jessie-slim

RUN groupadd -r mongodb && useradd -r -g mongodb mongodb

RUN set -x \
  && apt-get update \
  && apt-get install -y --no-install-recommends \
        ca-certificates \
        jq \
        numactl \
    && rm -rf /var/lib/apt/lists/*

# grab gosu for easy step-down from root
ENV GOSU_VERSION 1.10
RUN set -x \
  && apt-get update && apt-get install -y --no-install-recommends ca-certificates wget && rm -rf /var/lib/apt/lists/* \
  && wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-$(dpkg --print-architecture)" \
  && wget -O /usr/local/bin/gosu.asc "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-$(dpkg --print-architecture).asc" \
  && export GNUPGHOME="$(mktemp -d)" \
  && gpg --keyserver ha.pool.sks-keyservers.net --recv-keys B42F6819007F00F88E364FD4036A9C25BF357DD4 \
  && gpg --batch --verify /usr/local/bin/gosu.asc /usr/local/bin/gosu \
  && rm -rf "$GNUPGHOME" /usr/local/bin/gosu.asc \
  && chmod +x /usr/local/bin/gosu \
  && gosu nobody true

RUN mkdir /docker-entrypoint-initdb.d

ENV MONGO_MAJOR 3.4
ENV MONGO_VERSION 3.4.9
ENV MONGO_PACKAGE mongodb-org

RUN echo "deb http://repo.mongodb.org/apt/debian jessie/mongodb-org/$MONGO_MAJOR main" > /etc/apt/sources.list.d/mongodb-org.list

RUN set -x \
  && apt-get update \
  && apt-get install -y \
        ${MONGO_PACKAGE}=$MONGO_VERSION \
        ${MONGO_PACKAGE}-server=$MONGO_VERSION \
        ${MONGO_PACKAGE}-shell=$MONGO_VERSION \
        ${MONGO_PACKAGE}-mongos=$MONGO_VERSION \
        ${MONGO_PACKAGE}-tools=$MONGO_VERSION \
  && rm -rf /var/lib/apt/lists/* \
  && rm -rf /var/lib/mongodb \
  && mv /etc/mongod.conf /etc/mongod.conf.orig

RUN mkdir -p /data/db /data/configdb \
  && chown -R mongodb:mongodb /data/db /data/configdb

VOLUME /data/db /data/configdb

COPY docker-entrypoint.sh /usr/local/bin/
RUN ln -s usr/local/bin/docker-entrypoint.sh /entrypoint.sh # backwards compat
ENTRYPOINT ["docker-entrypoint.sh"]

EXPOSE 27017
CMD ["mongod"]  

It starts by setting up users and dependencies, installs MongoDB, exposes the port, sets volumes, and defines the entrypoint. This Dockerfile shows more advanced usage with a custom entrypoint script.

Dockerfile Best Practices

Now that you've seen some examples, let's go over some best practices for writing effective Dockerfiles.

Minimal Layers

Where possible, combine instructions like RUN to minimize image layers. For example:

# Bad
RUN apt-get update
RUN apt-get install -y git
RUN apt-get clean

# Good
RUN apt-get update && apt-get install -y git && apt-get clean

Fewer layers mean faster builds and less complexity. Combining commands into one RUN also lets cleanup steps like apt-get clean actually shrink the image, since files deleted in a later layer still take up space in the layers beneath.

Use Small Base Images

Start with small base images like Alpine Linux to keep image size down:

FROM alpine:3.7  

Alpine is only ~5 MB, and its apk package manager makes it easy to add just the tools you need (Python, git, SSL libraries, and so on), which makes it a great fit for containers.
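
For example, you might start from Alpine and add only the packages your app actually needs (the package list here is illustrative):

FROM alpine:3.7

# --no-cache skips storing the apk index, keeping the layer small
RUN apk add --no-cache python3 git openssl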

Leverage Build Cache

Place the commands least likely to change toward the top of the Dockerfile to leverage Docker's build cache:

# Install dependencies first
RUN apt-get update && apt-get install -y \
    git \
    curl \
    nginx

# Then add application code
COPY . /code

Avoid Unnecessary Packages

Only install packages needed for your application to reduce attack surface and dependencies:

# Bad
FROM ubuntu:latest
RUN apt-get update && apt-get install -y python ruby gcc mysql

# Good 
FROM python:3.6-alpine
RUN pip install mysqlclient
CMD ["python", "app.py"]

Use COPY Over ADD

COPY is preferred because its behavior is more transparent: it only copies local files into the image. Use ADD only when you need its extras, such as auto-extracting local tar archives or fetching remote URLs.
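
A quick illustration of the difference (the archive name is hypothetical):

# COPY does one thing: copy the local file into the image as-is
COPY app.tar.gz /tmp/app.tar.gz

# ADD auto-extracts local tar archives at the destination
ADD app.tar.gz /opt/app/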

Utilize Entrypoint Scripts

Use entrypoint scripts to run initialization logic, such as database migrations, when the container starts:

COPY docker-entrypoint.sh /
RUN chmod +x /docker-entrypoint.sh
ENTRYPOINT ["/docker-entrypoint.sh"]
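
The script itself is just a shell script that ends by handing control to the main process. A minimal sketch (the migration step is a placeholder for whatever your app needs):

#!/bin/sh
set -e

# Run one-time initialization before the main process
# (placeholder: substitute your app's real migration command)
echo "Running database migrations..."

# Hand off to the command passed as CMD, keeping it as PID 1
exec "$@"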

Set Non-Root Users

By default, containers run as root, which can be a security concern. Set a non-root user via USER:

RUN groupadd -r myapp && useradd -r -g myapp myapp
USER myapp 

Tag Images Correctly

Properly tag images using name:version for easier tracking:

docker build -t myapp:1.3 .

This tags the image as myapp with tag 1.3.
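
You can also apply multiple tags in a single build, which keeps a moving latest pointer alongside the pinned release:

# Tag the same image as both the release version and latest
docker build -t myapp:1.3 -t myapp:latest .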

Chain Commands for Readability

Use \ to chain commands over multiple lines for readability:

RUN apt-get update && \
    apt-get install -y \
        git \
        curl && \
    apt-get clean

Real-World Tips and Tricks

Here are some useful Dockerfile tips I've picked up over the years:

  • Install dependencies before copying application code to maximize cache usage
  • Use multi-stage builds to keep intermediate artifacts out of the final image (see the sketch just after this list)
  • Exclude files not needed in the final image with .dockerignore
  • Use ENV to set environment variables and keep your configs portable
  • Leverage build arguments (ARG) for values you want to customize at build time
  • Follow the Dockerfile reference for edge syntax like healthchecks and platform builds
  • Abstract common instructions into reusable base images for standardization
  • Use an automated linter like hadolint to catch errors and suggest best practices
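
To ground the multi-stage tip above, here's a minimal sketch for a Node.js app; the file layout and the npm run build script are assumptions about your project:

# Stage 1: build with the full toolchain
FROM node:18-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: ship only the runtime artifacts
FROM node:18-alpine
WORKDIR /app
# Install production dependencies only
COPY package*.json ./
RUN npm ci --omit=dev
# Copy the compiled output from the build stage
COPY --from=build /app/dist ./dist
CMD ["node", "dist/server.js"]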

Alternative Approaches

While Dockerfiles are the core way to define Docker images, there are alternatives that fill other niches:

Docker Compose is great for local development and CI environments where you want to spin up multi-container applications. It lets you reuse images built with Dockerfiles and networking/volumes are handled for you.

Packer can complement Docker by building machine images for platforms like VMware, AWS, GCP, and Azure; you can build a Docker image with Packer and then publish it to those downstream platforms alongside your containers.

Bazel builds Docker images using targets and dependencies similar to traditional build systems. It can scale to very complex, multi-language repositories.

So Dockerfiles are perfect for building standalone images, while tools like Compose help manage multi-container apps in dev and CI. The ecosystem offers many options to suit different needs.

Go Forth and Dockerize!

I hope this guide has demystified Dockerfiles for you and empowered you to start building custom images. They truly are the gateway into integrating Docker into your workflows.

With Dockerfiles, you can codify and automate your image generation process, avoiding the pitfalls of managing containers manually. Your images become reproducible build artifacts you can easily share and run anywhere.

Now go forth, and Dockerize all the things! Your containers await…
