# TF SIG Build Dockerfiles
Standard Dockerfiles for TensorFlow builds, used internally at Google.
Maintainer: @angerson (TensorFlow OSS DevInfra; SIG Build)
These docker containers are for building and testing TensorFlow in CI
environments (and for users replicating those CI builds). They are openly
developed in TF SIG Build, verified by Google developers, and published to
tensorflow/build on Docker Hub.
The TensorFlow OSS DevInfra team uses these containers for most of our
Linux-based CI, including tf-nightly testing, Pip package builds, and TF
release packages for TensorFlow 2.9 onwards.
These Dockerfiles are built and deployed to Docker Hub via Github Actions.
The tags are defined as follows:

- `latest` tags are kept up to date to build TensorFlow's `master` branch.
- Version-number tags target the corresponding TensorFlow version. We
  continuously build the current-tensorflow-version + 1 tag, so when a new
  TensorFlow branch is cut, that Dockerfile is frozen to support that branch.

For simple changes, you can adjust the source files and then make a PR. Send it to @angerson for review. We have presubmits that will make sure your change still builds a container. After approval and submission, our GitHub Actions workflow deploys the containers to Docker Hub.
These are the files you are most likely to want to change:

- `devel.requirements.txt`
- `devel.packages.txt`
- To change how `bazel build` works, look at `devel.usertools/*.bazelrc`.

To rebuild the containers locally after making changes, use this command from this directory:
```
DOCKER_BUILDKIT=1 docker build \
  --build-arg PYTHON_VERSION=python3.10 --target=devel -t my-tf-devel .
```
It will take a long time to build devtoolset and install CUDA packages. After
it's done, you can use the commands below to test your changes. Just replace
`tensorflow/build:latest-python3.10` with `my-tf-devel` to use your image
instead.
TensorFlow team members (i.e. Google employees) can apply a `Build and deploy
to gcr.io for staging` tag to their PRs to the Dockerfiles, as long as the PR
is being developed on a branch of this repository, not a fork. Unfortunately
this is not available for non-Googler contributors, for security reasons.
The TensorFlow DevInfra team runs a daily test suite that builds tf-nightly
and runs a bazel test suite on both the Pip package (the "pip" tests) and
on the source code itself (the "nonpip" tests). These test scripts are often
referred to as "The Nightly Tests" and can be a common reason for a TF PR to be
reverted. The build scripts aren't visible to external users, but they use
the configuration files which are included in these containers. Our test suites,
which include the build of tf-nightly, are easy to replicate with these
containers, and here is how you can do it.
Presubmits are not using these containers... yet.
Here are some important notes to keep in mind:

- The Ubuntu CI jobs that build the tf-nightly package build at the GitHub
  `nightly` tag. You can see the specific commit of a tf-nightly package on
  pypi.org in `tf.version.GIT_VERSION`, which will look something like
  `v1.12.1-67282-g251085598b7`. The final section, `g251085598b7`, is a short
  git hash.
- If you interrupt a `docker exec` command with `Ctrl+C`, you will get your
  shell back, but the command will continue to run. You cannot reattach to it,
  but you can kill it with `docker kill tf` (or `docker kill the-container-name`).
  This will destroy your container but will not harm your work, since it's
  mounted. If you have any suggestions for handling this better, let us know.
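As an aside, the short hash at the end of `GIT_VERSION` can be pulled out with plain shell parameter expansion; the version string below is the example from the note above:

```shell
# Example GIT_VERSION string; the trailing g-prefixed field is a short git hash.
git_version="v1.12.1-67282-g251085598b7"

# Strip everything up to and including the final "-g" to get the bare hash.
short_hash="${git_version##*-g}"
echo "$short_hash"    # 251085598b7
```

You can then check out that commit in your TensorFlow clone with `git checkout <hash>`.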
Now let's build tf-nightly.
Set up your directories:

- `/tmp/tensorflow`
- `/tmp/packages`
- `/tmp/bazelcache`

Choose the Docker container to use from Docker Hub. The options for the
`master` branch are:

- `tensorflow/build:latest-python3.12`
- `tensorflow/build:latest-python3.11`
- `tensorflow/build:latest-python3.10`

For this example we'll use `tensorflow/build:latest-python3.10`.
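The directory setup above amounts to two empty directories plus your TensorFlow checkout; a minimal sketch:

```shell
# Create the host directories that will be mounted into the container.
mkdir -p /tmp/packages /tmp/bazelcache

# /tmp/tensorflow should hold your TensorFlow checkout, e.g.:
#   git clone https://github.com/tensorflow/tensorflow.git /tmp/tensorflow
```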
Pull the container you decided to use.
```
docker pull tensorflow/build:latest-python3.10
```
Start a backgrounded Docker container with the three folders mounted:

- `/tf/tensorflow`
- `/tf/pkg`
- `/tf/cache` (you don't need `/tf/cache` if you're going to use the remote cache)

Here are the arguments we're using:

- `--name tf`: Names the container `tf` so we can refer to it later.
- `-w /tf/tensorflow`: All commands run in the `/tf/tensorflow` directory,
  where the TF source code is.
- `-it`: Makes the container interactive for running commands.
- `-d`: Makes the container start in the background, so we can send
  commands to it instead of running commands from inside.
- `-v`: Mounts the directories into the container.
```
docker run --name tf -w /tf/tensorflow -it -d \
  --env TF_PYTHON_VERSION=3.10 \
  -v "/tmp/packages:/tf/pkg" \
  -v "/tmp/tensorflow:/tf/tensorflow" \
  -v "/tmp/bazelcache:/tf/cache" \
  tensorflow/build:latest-python3.10 \
  bash
```
Note: if you wish to use your own Google Cloud Platform credentials for
e.g. RBE, you may also wish to add `-v $HOME/.config/gcloud:/root/.config/gcloud`
to make your credentials available to bazel. You don't need to do this unless
you know what you're doing.
Now you can continue on to building tf-nightly and then (optionally) running
a test suite on the pip package (the "pip" tests).

First, apply the `update_version.py` script that changes the TensorFlow version
to `X.Y.Z.devYYYYMMDD`. This is used for tf-nightly on PyPI and is technically
optional.
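For illustration only (the real logic lives in `update_version.py`), the nightly version scheme combines the base version with the current date; the `2.9.0` base below is a hypothetical placeholder:

```shell
# Hypothetical base version; update_version.py reads the real one from the
# TensorFlow source tree.
base_version="2.9.0"

# tf-nightly versions look like X.Y.Z.devYYYYMMDD.
nightly_version="${base_version}.dev$(date +%Y%m%d)"
echo "$nightly_version"
```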
```
docker exec tf python3 tensorflow/tools/ci_build/update_version.py --nightly
```
Build TensorFlow by following the instructions under one of the collapsed sections below. You can build both CPU and GPU packages without a GPU. TF DevInfra's remote cache is better for building TF only once, but if you build over and over, it will probably be better in the long run to use a local cache. We're not sure about which is best for most users, so let us know on Gitter.
This step will take a long time, since you're building TensorFlow. GPU takes much longer to build. Choose one and click on the arrow to expand the commands:
<details><summary>TF Nightly CPU - Remote Cache</summary>

Build the sources with Bazel:

```
docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
  build --config=sigbuild_remote_cache \
  tensorflow/tools/pip_package:build_pip_package
```

And then construct the pip package:

```
docker exec tf \
  ./bazel-bin/tensorflow/tools/pip_package/build_pip_package \
  /tf/pkg \
  --cpu \
  --nightly_flag
```

</details>
<details><summary>TF Nightly GPU - Remote Cache</summary>

Build the sources with Bazel:

```
docker exec tf bazel --bazelrc=/usertools/gpu.bazelrc \
  build --config=sigbuild_remote_cache \
  tensorflow/tools/pip_package:build_pip_package
```

And then construct the pip package:

```
docker exec tf \
  ./bazel-bin/tensorflow/tools/pip_package/build_pip_package \
  /tf/pkg \
  --nightly_flag
```

</details>
<details><summary>TF Nightly CPU - Local Cache</summary>

Make sure you have a directory mounted to the container in `/tf/cache`!

Build the sources with Bazel:

```
docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
  build --config=sigbuild_local_cache \
  tensorflow/tools/pip_package:build_pip_package
```

And then construct the pip package:

```
docker exec tf \
  ./bazel-bin/tensorflow/tools/pip_package/build_pip_package \
  /tf/pkg \
  --cpu \
  --nightly_flag
```

</details>
<details><summary>TF Nightly GPU - Local Cache</summary>

Make sure you have a directory mounted to the container in `/tf/cache`!

Build the sources with Bazel:

```
docker exec tf \
  bazel --bazelrc=/usertools/gpu.bazelrc \
  build --config=sigbuild_local_cache \
  tensorflow/tools/pip_package:build_pip_package
```

And then construct the pip package:

```
docker exec tf \
  ./bazel-bin/tensorflow/tools/pip_package/build_pip_package \
  /tf/pkg \
  --nightly_flag
```

</details>
Run the helper script that checks for manylinux compliance, renames the wheels, and then checks the size of the packages.
```
docker exec tf /usertools/rename_and_verify_wheels.sh
```
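To give a sense of what the script inspects, a wheel's platform tag (where the manylinux compliance level appears) is the last dash-separated field of its filename; the filename below is a made-up example:

```shell
# Made-up wheel filename in the standard name-version-python-abi-platform form.
wheel="tf_nightly-2.9.0.dev20220101-cp310-cp310-manylinux_2_17_x86_64.whl"

# Drop the .whl extension, then keep only the final dash-separated field.
platform_tag="${wheel%.whl}"
platform_tag="${platform_tag##*-}"
echo "$platform_tag"    # manylinux_2_17_x86_64
```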
Take a look at the new wheel packages you built! They may be owned by `root`
because of how Docker volume permissions work.

```
ls -al /tmp/packages
```
To continue on to running the Pip tests, create a venv and install the testing packages:
```
docker exec tf /usertools/setup_venv_test.sh bazel_pip "/tf/pkg/tf_nightly*.whl"
```
And now run the tests depending on your target platform: `--config=pip`
includes the same test suite that is run by the DevInfra team every night.
If you want to run a specific test instead of the whole suite, pass
`--config=pip_venv` instead, and then set the target on the command line like
normal.
<details><summary>TF Nightly CPU - Remote Cache</summary>

Run the tests with Bazel:

```
docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
  test --config=sigbuild_remote_cache \
  --config=pip
```

</details>
<details><summary>TF Nightly GPU - Remote Cache</summary>

Run the tests with Bazel:

```
docker exec tf bazel --bazelrc=/usertools/gpu.bazelrc \
  test --config=sigbuild_remote_cache \
  --config=pip
```

</details>
<details><summary>TF Nightly CPU - Local Cache</summary>

Make sure you have a directory mounted to the container in `/tf/cache`!

Run the tests with Bazel:

```
docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
  test --config=sigbuild_local_cache \
  --config=pip
```

</details>
<details><summary>TF Nightly GPU - Local Cache</summary>

Make sure you have a directory mounted to the container in `/tf/cache`!

Run the tests with Bazel:

```
docker exec tf \
  bazel --bazelrc=/usertools/gpu.bazelrc \
  test --config=sigbuild_local_cache \
  --config=pip
```

</details>
Run the tests depending on your target platform. `--config=nonpip` includes
the same test suite that is run by the DevInfra team every night. If you
want to run a specific test instead of the whole suite, you do not need
`--config=nonpip` at all; just set the target on the command line like usual.
<details><summary>TF Nightly CPU - Remote Cache</summary>

Run the tests with Bazel:

```
docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
  test --config=sigbuild_remote_cache \
  --config=nonpip
```

</details>
<details><summary>TF Nightly GPU - Remote Cache</summary>

Run the tests with Bazel:

```
docker exec tf bazel --bazelrc=/usertools/gpu.bazelrc \
  test --config=sigbuild_remote_cache \
  --config=nonpip
```

</details>
<details><summary>TF Nightly CPU - Local Cache</summary>

Make sure you have a directory mounted to the container in `/tf/cache`!

Run the tests with Bazel:

```
docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
  test --config=sigbuild_local_cache \
  --config=nonpip
```

</details>
<details><summary>TF Nightly GPU - Local Cache</summary>

Make sure you have a directory mounted to the container in `/tf/cache`!

Run the tests with Bazel:

```
docker exec tf \
  bazel --bazelrc=/usertools/gpu.bazelrc \
  test --config=sigbuild_local_cache \
  --config=nonpip
```

</details>
Run the tests depending on your target platform.
`--config=libtensorflow_test` includes the same test suite that is run by
the DevInfra team every night. If you want to run a specific test instead of
the whole suite, just set the target on the command line like usual.
<details><summary>TF Nightly CPU - Remote Cache</summary>

```
docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
  test --config=sigbuild_remote_cache \
  --config=libtensorflow_test
```

</details>
<details><summary>TF Nightly GPU - Remote Cache</summary>

```
docker exec tf bazel --bazelrc=/usertools/gpu.bazelrc \
  test --config=sigbuild_remote_cache \
  --config=libtensorflow_test
```

</details>
<details><summary>TF Nightly CPU - Local Cache</summary>

Make sure you have a directory mounted to the container in `/tf/cache`!

```
docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
  test --config=sigbuild_local_cache \
  --config=libtensorflow_test
```

</details>
<details><summary>TF Nightly GPU - Local Cache</summary>

Make sure you have a directory mounted to the container in `/tf/cache`!

```
docker exec tf \
  bazel --bazelrc=/usertools/gpu.bazelrc \
  test --config=sigbuild_local_cache \
  --config=libtensorflow_test
```

</details>
Build the libtensorflow packages.
<details><summary>TF Nightly CPU - Remote Cache</summary>

```
docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
  build --config=sigbuild_remote_cache \
  --config=libtensorflow_build
```

</details>
<details><summary>TF Nightly GPU - Remote Cache</summary>

```
docker exec tf bazel --bazelrc=/usertools/gpu.bazelrc \
  build --config=sigbuild_remote_cache \
  --config=libtensorflow_build
```

</details>
<details><summary>TF Nightly CPU - Local Cache</summary>

Make sure you have a directory mounted to the container in `/tf/cache`!

```
docker exec tf bazel --bazelrc=/usertools/cpu.bazelrc \
  build --config=sigbuild_local_cache \
  --config=libtensorflow_build
```

</details>
<details><summary>TF Nightly GPU - Local Cache</summary>

Make sure you have a directory mounted to the container in `/tf/cache`!

```
docker exec tf \
  bazel --bazelrc=/usertools/gpu.bazelrc \
  build --config=sigbuild_local_cache \
  --config=libtensorflow_build
```

</details>
Run the `repack_libtensorflow.sh` utility to repack and rename the archives.

```
docker exec tf /usertools/repack_libtensorflow.sh /tf/pkg "-cpu-linux-x86_64"
```

```
docker exec tf /usertools/repack_libtensorflow.sh /tf/pkg "-gpu-linux-x86_64"
```
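The quoted argument becomes a suffix in the archive name. As a rough sketch (the actual naming is up to `repack_libtensorflow.sh`, so treat this as an assumption):

```shell
# The suffix argument passed to repack_libtensorflow.sh, e.g. for the CPU build.
suffix="-cpu-linux-x86_64"

# Sketch of the resulting archive name; assumes the script names archives this way.
archive="libtensorflow${suffix}.tar.gz"
echo "$archive"    # libtensorflow-cpu-linux-x86_64.tar.gz
```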
Every night the TensorFlow team runs `code_check_full`, which contains a
suite of checks that were gradually introduced over TensorFlow's lifetime
to prevent certain unstable code states. This check has supplanted the old
"sanity" or "ci_sanity" checks.

```
docker exec tf bats /usertools/code_check_full.bats --timing --formatter junit
```
Shut down and remove the container when you are finished.

```
docker stop tf
docker rm tf
```