dev/breeze/doc/adr/0009-exclude-all-files-from-dockerignore-by-default.md
Table of Contents generated with DocToc
<!-- END doctoc generated TOC please keep comment here to allow auto update -->Date: 2022-02-13
Accepted
Building Docker container always starts with sending the build context first - depending on a number of files in the context. The context might have even many hundreds of MB, which - depending on where your docker builder is and how fast your system is - might lead to even tens of seconds of delays before the docker build command is run and the actual build starts.
The context has to be compressed, sent, decompressed, so it takes CPU, networking and I/O.
Airflow - unfortunately has Dockerfiles, and sources directly in the top-level of the project. There is no "src" folder and by default the docker commands use the folder where the "Dockerfile" is placed and the context files cannot be taken from outside the context. Thus - Dockerfiles have to be put at the "top-level" of the airflow project.
By default, all files in the current context should be sent
as context unless you ignore them via .dockerfile - in a
concept that is similar to .gitignore. Airflow has many
files that are huge and generated during running and building
(for example node_modules) but also .egginfo and plenty of
other files in many folders.
It sounds like a reasonable approach to do to ignore specific folders, however it has one drawback. You might simply not realise that some newly generated files have been added and increase the context - thus increase the overhead needed to build the docker images. There is no way to prevent or check such accidental additions - for example when refactoring files, or adding new functionalities.
There are a number of strategies that can address the problem, ranging by convention and automated checks but in our case, we have multiple independent contributors and committers reviewing the code, such a change might easily slip-through. So the solution should be "self-managing". Luckily, there is a way that has been discussed in a number of places but notably here
The strategy involves ignoring all files by default and only selectively excluding certain folders and patterns that should be allowed to be part of the context.
The .dockerignore file has
appropriate functionality
This is what we decided to use for our Dockerfile and Dockerfile.ci.
There are two consequences of this decision:
.dockerignore what to do
in this case, and .dockerignore is the place where you would
look for a problem anyway. The users should be guided to add
new pattern to the .dockerignore in this case.