While learning Docker, I went through many blogs, tutorials, and conference videos, and most of them mentioned anti-patterns. This post collects the ones I have learned so far, and I will keep adding to it as I learn more.
A. Data or Logs in Container.
Containers are ideal for stateless applications and are meant to be ephemeral (short-lived). This means no data or logs should be stored in the container; otherwise, they will be lost when the container is removed. Instead, use volume mapping to persist them outside the container. The ELK stack could be used to store and process logs. If managed volumes are used early in the testing process, remove them by passing the -v switch to the docker rm command.
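As a rough sketch of the volume advice above, the shell functions below keep application data in a named volume and clean up the container afterwards. The names app-data, app, /var/lib/app, and my-app:1.0.9 are placeholders I made up for illustration.

```shell
#!/bin/sh
# Sketch only: volume, container, path, and image names are hypothetical.

# Start the app with a named volume mounted where it writes its data,
# so the data outlives the container.
start_app() {
  docker volume create app-data
  docker run -d --name app -v app-data:/var/lib/app my-app:1.0.9
}

# Remove the container. The -v switch also removes anonymous volumes
# attached to it; the named volume app-data is kept.
remove_app() {
  docker rm -f -v app
}
```

Calling start_app and later remove_app leaves app-data in place, so a new container can mount it again.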
B. IP Addresses of Containers
Each container is assigned an IP address. Multiple containers communicate with each other to form an application. For example, an application deployed on an application server will need to talk to a database. Existing containers are terminated and new containers are started all the time.
Relying on the IP address of a container requires constantly updating the application configuration, which makes the application fragile. Instead, create services. A service provides a logical name that can be referred to independently of the growing and shrinking number of containers behind it, and it also provides basic load balancing.
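One way to picture this is with Swarm services. The sketch below assumes a Swarm has already been initialised with docker swarm init; the network, service, and image names are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes Swarm mode is active (docker swarm init).
# Network, service, and image names are hypothetical.

# Create an overlay network and two services attached to it.
# The web service can reach the database at the hostname "db",
# whatever IP addresses the underlying containers receive.
create_services() {
  docker network create --driver overlay app-net
  docker service create --name db --network app-net my-db:1.0.9
  docker service create --name web --network app-net --replicas 3 my-app:1.0.9
}

# Scale the web tier up or down; clients keep using the name "web".
scale_web() {
  docker service scale web="$1"
}
```

With this layout the application configuration only ever contains the names db and web, never an IP address.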
C. Run a Single Process in a Container
Only one CMD and one ENTRYPOINT take effect in a Dockerfile. Often, CMD will use a script that performs some configuration and then starts the application. Don’t try to start multiple processes using that script. Running several processes in one container makes managing your containers, collecting logs, and updating each individual process that much harder. It’s important to follow the separation of concerns pattern when creating Docker images: consider breaking up applications into multiple containers and managing them independently.
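A minimal sketch of such a script, assuming a hypothetical file name docker-entrypoint.sh and a hypothetical APP_PORT setting: it does its configuration and then hands over to exactly one process.

```shell
#!/bin/sh
# Sketch only: writes a hypothetical entrypoint script that performs
# configuration and then hands over to exactly one process.
cat > docker-entrypoint.sh <<'EOF'
#!/bin/sh
set -e

# Configuration step: fall back to a default when APP_PORT is unset.
export APP_PORT="${APP_PORT:-8080}"

# Replace this shell with the command passed as arguments (the CMD),
# so the application is the container's main process and receives
# signals directly.
exec "$@"
EOF
chmod +x docker-entrypoint.sh
```

In a Dockerfile this script would be set as the ENTRYPOINT, with CMD naming the single application process to start.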
D. Don’t Use docker exec
The docker exec command starts a new command in a running container. This is useful for attaching a shell with docker exec -it {cid} bash when debugging. Beyond that, the container is already running the process that it’s supposed to be running, so there is no need to use docker exec to start extra processes in it.
E. Keep Your Image Lean
Create a new directory and include only the Dockerfile and other relevant files in it. Also consider using .dockerignore to exclude logs, source code, etc. from the build context before creating the image. Make sure to remove any downloaded artifacts after they are unzipped. Docker multi-stage builds are another good way to keep the final image small.
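To make this concrete, here is a sketch for a hypothetical Maven project. The base image tags, the paths, and the jar name app.jar are assumptions for illustration; adjust them to your own build.

```shell
#!/bin/sh
# Sketch only: a hypothetical Maven project; image tags, paths, and the
# jar name are assumptions.

# Keep logs, VCS data, and local build output out of the build context.
cat > .dockerignore <<'EOF'
.git
*.log
target/
EOF

# Stage 1 builds the jar. Stage 2 copies only the jar into a JRE image,
# so the build tools and the source code never reach the final image.
cat > Dockerfile <<'EOF'
FROM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /src
COPY pom.xml .
COPY src ./src
RUN mvn -q package -DskipTests

FROM eclipse-temurin:17-jre
COPY --from=build /src/target/app.jar /app/app.jar
ENTRYPOINT ["java", "-jar", "/app/app.jar"]
EOF
```

Running docker build in that directory then produces an image that contains the runtime and the jar, and nothing from the build stage.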
F. Create Images From a Running Container
A new image can be created using the docker commit command. This is useful when any changes in the container have been made. But images created using this are non-reproducible. Instead, make changes in the Dockerfile, terminate existing containers, and start a new container with the updated image.
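The reproducible alternative to docker commit can be sketched as a small redeploy function. The container name app and the image name my-app are hypothetical.

```shell
#!/bin/sh
# Sketch only: container and image names are hypothetical.

# Rebuild the image from the edited Dockerfile under a new version tag,
# then replace the running container with one started from that image.
redeploy() {
  version="$1"
  docker build -t "my-app:${version}" .
  docker rm -f app
  docker run -d --name app "my-app:${version}"
}
```

For example, redeploy 1.0.10 rebuilds from the Dockerfile in the current directory, so the same image can be produced again later from the same sources.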
G. Security Credentials in a Docker Image
Do not store security credentials in the Dockerfile. They are stored in clear text and end up checked into a repository, which makes them completely vulnerable. Use -e to specify passwords as runtime environment variables. Alternatively, --env-file can be used to read environment variables from a file. Another approach is to use CMD or ENTRYPOINT to specify a script that pulls the credentials from a third party and then configures your application.
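The two runtime options can be sketched like this. DB_PASSWORD, app.env, and the image name are hypothetical.

```shell
#!/bin/sh
# Sketch only: variable, file, and image names are hypothetical.

# Pass a secret that is already set in the calling shell's environment.
# With "-e NAME" and no value, docker run copies the value from that
# environment, so the secret does not appear on the command line.
run_with_env() {
  docker run -d --name app -e DB_PASSWORD my-app:1.0.9
}

# Read several variables from a file that is kept out of version control.
run_with_env_file() {
  docker run -d --name app --env-file ./app.env my-app:1.0.9
}
```

Either way, the credentials stay out of the Dockerfile and out of the image layers.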
H. The latest Tag
Tagging the Docker image is important. If we don’t specify a tag, the image is tagged latest by default. But latest is just a name: there is no guarantee that it points to the newest image rather than an old one. In a production environment it’s better to use a tag that includes the version.
e.g. image-name:enterprise-1.0.9
G. Impedance Mismatch
Don’t use different images, or even different tags, in the dev, test, staging, and production environments. The image that is the “source of truth” should be created once and pushed to a repo. That image should be used for the different environments going forward. In some cases, you may consider running your unit tests on the WAR file as part of the Maven build and then creating the image. But any system integration testing should be done on the image that will be pushed to production.
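A sketch of the build-once idea, assuming a hypothetical registry at registry.example.com and a hypothetical image name:

```shell
#!/bin/sh
# Sketch only: registry host, image name, and version are hypothetical.
IMAGE="registry.example.com/team/my-app"

# Build once, tag with an explicit version, and push to the registry.
build_and_push() {
  docker build -t "${IMAGE}:$1" .
  docker push "${IMAGE}:$1"
}

# Every environment (dev, test, staging, production) pulls and runs
# that same tag instead of rebuilding the image.
run_release() {
  docker pull "${IMAGE}:$1"
  docker run -d --name app "${IMAGE}:$1"
}
```

Here build_and_push 1.0.9 would run once in CI, while run_release 1.0.9 would run in each environment.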
I. Publishing Ports
Don’t use -P to publish all the exposed ports. It is convenient, because it lets you run multiple containers and publish their exposed ports without choosing host ports yourself, but it also means that every port the image exposes is published, each on a random host port. Instead, use -p to publish specific ports.
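For illustration, the functions below publish a single port explicitly. The container name, image name, and port numbers are hypothetical.

```shell
#!/bin/sh
# Sketch only: container name, image name, and ports are hypothetical.

# Publish container port 80 on host port 8080, and nothing else.
publish_one_port() {
  docker run -d --name web -p 8080:80 my-app:1.0.9
}

# Same mapping, but reachable only from the host itself (loopback).
publish_local_only() {
  docker run -d --name web -p 127.0.0.1:8080:80 my-app:1.0.9
}
```

The second form is handy during development, when the service should not be reachable from other machines.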
J. Treating a Docker container as a virtual machine
We should think of a Docker container as a simple, stateless, immutable, short-lived box that runs a single process and can be recreated again and again. But developers keep asking questions like: how do I ssh into a container? How do I get logs out of a container? How do I run multiple programs in a container?
If you regularly find yourself wanting to open ssh sessions to running containers in order to “upgrade” them, or to manually get logs or files out of them, you are definitely using Docker in the wrong way and need to do some extra reading on how containers work.
K. Creating docker images with magic folders
FROM alpine:3.4
RUN apk add --no-cache \
    ca-certificates \
    pciutils \
    ruby \
    ruby-irb \
    ruby-rdoc \
    && \
    echo http://dl-4.alpinelinux.org/alpine/edge/community/ >> /etc/apk/repositories && \
    apk add --no-cache shadow && \
    gem install puppet:"5.5.1" facter:"2.5.1" && \
    /usr/bin/puppet module install puppetlabs-apk
# Install Java application
RUN /usr/bin/puppet agent --onetime --no-daemonize
ENTRYPOINT ["java","-jar","/app/spring-boot-application.jar"]
Here the Dockerfile depends on the Puppet tool and on whatever Puppet finds on your machine at build time. Your machine may also have root access rights that other machines lack. If this image is built while Puppet is running on your machine, it is difficult to reproduce the same image again: Puppet may be down, or it may have been upgraded, so the image built today may not be the same as the one built tomorrow. We should create images that do not depend on these magic folders.
- Python: pip --no-cache-dir
L. Storing data inside containers
The ephemeral nature of container filesystems means you shouldn’t be writing data within them. Persistent data created by your application’s users, such as uploads and databases, should be stored in Docker volumes or it will be lost when your containers restart.
Other kinds of useful data should also stay off the container filesystem wherever possible. Stream logs to your container’s output stream, where they can be consumed via the docker logs command, instead of writing them to a directory that would be lost after a container failure.
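When the application writes to its output stream, reading the logs needs nothing more than docker logs. A small sketch, where the container name is passed in by the caller and the tail length and time window are arbitrary choices of mine:

```shell
#!/bin/sh
# Sketch only: the tail length and time window are arbitrary examples.

# Follow the output stream of a running container, starting from the
# last 100 lines.
follow_logs() {
  docker logs --follow --tail 100 "$1"
}

# Show only what the container has logged in the last ten minutes.
recent_logs() {
  docker logs --since 10m "$1"
}
```

For example, follow_logs app streams the output of a container named app until interrupted.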
Container filesystem writes can also incur a significant performance penalty when modifying existing files. Docker’s use of the “copy-on-write” layering strategy means files that exist in lower filesystem layers are read from that layer, rather than your image’s final layer. If a change is made to the file, Docker must first copy it into the uppermost layer, then apply the change. This process could take several seconds for larger files.
M. Storing zip, tar and other archives
It is generally a bad idea to add an archive (zip, tar.gz or otherwise) to a container image. It is certainly a bad idea if the container unpacks that archive when it starts, because it will waste time and disk space, without providing any gain whatsoever!
It turns out that Docker images are already compressed when they are stored on a registry and when they are pushed to, or pulled from, a registry. This means two things:
- storing compressed files in a container image doesn’t take less space,
- storing uncompressed files in a container image doesn’t use more space.
If we include an archive (e.g. a tarball) and decompress it when the container starts:
- we waste time and CPU cycles, compared to a container image where the data would already be uncompressed and ready to use;
- we waste disk space, because we end up storing both the compressed and uncompressed data in the container filesystem;
- if the container runs multiple times, we waste more time, CPU cycles, and disk space each time we run an additional copy of the container.
If you notice that a Dockerfile is copying an archive, it is almost always better to uncompress the archive (e.g. using a multi-stage build) and copy the uncompressed files.
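The multi-stage approach could look roughly like the sketch below. The archive name data.tar.gz, the /data directory, and the alpine:3.19 base image are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: archive name, target directory, and base image are
# hypothetical.

# The first stage unpacks the archive; the final stage copies only the
# unpacked files, so the archive itself never reaches the final image.
cat > Dockerfile <<'EOF'
FROM alpine:3.19 AS unpack
COPY data.tar.gz /tmp/data.tar.gz
RUN mkdir /data && tar -xzf /tmp/data.tar.gz -C /data

FROM alpine:3.19
COPY --from=unpack /data /data
EOF
```

The unpacking cost is paid once at build time instead of every time a container starts.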
N. Using the root user
Another common Dockerfile anti-pattern is using the root user to run your application or your commands. This exposes your container to security risks, because malicious code or a malicious user that compromises the process gains full control of the container and has an easier path to the host system. To avoid this, run your application and your commands as a non-root user, specified with the USER directive in your Dockerfile. You should also follow the principle of least privilege and grant your user only the permissions it needs.
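A minimal sketch of the USER directive on an Alpine base image. The user and group name app, the script app.sh, and the alpine:3.19 tag are hypothetical.

```shell
#!/bin/sh
# Sketch only: user name, script name, and base image are hypothetical.

cat > Dockerfile <<'EOF'
FROM alpine:3.19
# Create an unprivileged system user and group named "app".
RUN addgroup -S app && adduser -S -G app app
# Copy the application so that the unprivileged user owns it.
COPY --chown=app:app app.sh /home/app/app.sh
# Everything from here on, including the running container, uses "app".
USER app
ENTRYPOINT ["/home/app/app.sh"]
EOF
```

Instructions placed before the USER line still run as root, which is why the user is created first and the switch happens afterwards.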
O. Not using BuildKit
BuildKit is a new backend for docker build. It’s a complete overhaul with a ton of new features, including parallel builds, cross-arch builds (e.g. building ARM images on Intel and vice versa), building images in Kubernetes Pods, and much more, while remaining fully compatible with the existing Dockerfile syntax. It’s like switching to a fully electric car: we still drive it with a wheel and two pedals, but internally it is completely different from the old thing.
If you are using a recent version of Docker Desktop, you are probably already using BuildKit, so that’s great. Otherwise (in particular, if you’re on Linux), set the environment variable DOCKER_BUILDKIT=1 and run your docker build or docker-compose command; for instance:
DOCKER_BUILDKIT=1 docker build . --tag test
When I compared build times, builds with BuildKit took less time.
P. Conflicting names for scripts and images
Avoid naming your scripts in a way that could conflict with other popular programs. Some folks will notice the clash and be careful; others might not, and will accidentally run the wrong thing.
This is particularly true with 2-letter commands, because UNIX has so many of them! For instance:
- bc and dc (“build container” and “deploy container” for some folks, but also some relatively common text-mode calculators on UNIX)
- cc (“create container” but also the standard C compiler on UNIX)
- go (conflicts with the Go toolchain)
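Before settling on a script name, a quick check like the following sketch can reveal a clash. The function names name_is_taken and check_names are mine, not part of any tool.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Succeed if the name already resolves to a command, builtin, alias,
# or function visible in the current shell.
name_is_taken() {
  command -v "$1" >/dev/null 2>&1
}

# Print a verdict for each candidate name passed as an argument.
check_names() {
  for name in "$@"; do
    if name_is_taken "$name"; then
      echo "$name: already exists"
    else
      echo "$name: free"
    fi
  done
}
```

For example, check_names bc dc cc go reports which of those names are already taken on the machine where it runs; the result depends on what is installed there.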
