Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/rework dockerfile feedback #292

Conversation

ruffsl
Copy link
Contributor

@ruffsl ruffsl commented May 29, 2024

Some feedback on the Dockerfile changes.
Would be nice to decouple dependency declaration from compilation scripts.
Also, it seem that Cmake is going wild and touching the network interface for external resources at build time.

ruffsl added 13 commits May 28, 2024 19:09
internet should not be required when compiling
as apt does not have a stable CLI
and thus unsuitable for scripting
- https://unix.stackexchange.com/a/590703/213124
to avoid needless busting build cache of unrelated stages
to flush out all non-determinism
given offline caching is impractical otherwise
For both docker and github actions.
This should:
- auto open a PR when new nvidia tags are available
- as well as keep Github actions versions up to date
to robustly ignore randomly named IDE files and temp folders

# Copy rest of source tree
COPY . .
RUN --mount=type=bind,from=optix,target=${OptiX_INSTALL_DIR} \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we not require an internet connection at build time?

#14 [builder 3/4] RUN --mount=type=bind,from=optix,target=/optix     ./setup.py
#14 0.443 -- The C compiler identification is GNU 11.4.0
#14 0.474 -- The CXX compiler identification is GNU 11.4.0
#14 0.845 -- The CUDA compiler identification is NVIDIA 11.7.99
#14 0.851 -- Detecting C compiler ABI info
#14 0.885 -- Detecting C compiler ABI info - done
#14 0.888 -- Check for working C compiler: /usr/bin/cc - skipped
#14 0.888 -- Detecting C compile features
#14 0.888 -- Detecting C compile features - done
#14 0.890 -- Detecting CXX compiler ABI info
#14 0.939 -- Detecting CXX compiler ABI info - done
#14 0.942 -- Check for working CXX compiler: /usr/bin/c++ - skipped
#14 0.942 -- Detecting CXX compile features
#14 0.943 -- Detecting CXX compile features - done
#14 0.945 -- Detecting CUDA compiler ABI info
#14 1.322 -- Detecting CUDA compiler ABI info - done
#14 1.335 -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
#14 1.336 -- Detecting CUDA compile features
#14 1.336 -- Detecting CUDA compile features - done
#14 1.505 [ 11%] Creating directories for 'spdlog-populate'
#14 1.505 [ 22%] Performing download step (git clone) for 'spdlog-populate'
#14 1.505 Cloning into 'spdlog-src'...
#14 1.505 fatal: unable to access 'https://github.com/gabime/spdlog.git/': Could not resolve host: github.com
#14 1.505 Cloning into 'spdlog-src'...
#14 1.505 fatal: unable to access 'https://github.com/gabime/spdlog.git/': Could not resolve host: github.com
#14 1.505 Cloning into 'spdlog-src'...
#14 1.505 fatal: unable to access 'https://github.com/gabime/spdlog.git/': Could not resolve host: github.com
#14 1.505 -- Had to git clone more than once:
#14 1.505           3 times.
#14 1.505 CMake Error at spdlog-subbuild/spdlog-populate-prefix/tmp/spdlog-populate-gitclone.cmake:31 (message):
#14 1.505   Failed to clone repository: 'https://github.com/gabime/spdlog.git'
#14 1.505 
#14 1.505 
#14 1.505 gmake[2]: *** [CMakeFiles/spdlog-populate.dir/build.make:102: spdlog-populate-prefix/src/spdlog-populate-stamp/spdlog-populate-download] Error 1
#14 1.505 gmake[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/spdlog-populate.dir/all] Error 2
#14 1.505 gmake: *** [Makefile:91: all] Error 2
#14 1.505 
#14 1.505 CMake Error at /usr/share/cmake-3.22/Modules/FetchContent.cmake:1087 (message):
#14 1.505   Build step for spdlog failed: 2
#14 1.505 Call Stack (most recent call first):
#14 1.505   /usr/share/cmake-3.22/Modules/FetchContent.cmake:1216:EVAL:2 (__FetchContent_directPopulate)
#14 1.505   /usr/share/cmake-3.22/Modules/FetchContent.cmake:1216 (cmake_language)
#14 1.505   /usr/share/cmake-3.22/Modules/FetchContent.cmake:1259 (FetchContent_Populate)
#14 1.505   external/CMakeLists.txt:9 (FetchContent_MakeAvailable)
#14 1.505 
#14 1.505 
#14 1.506 -- Configuring incomplete, errors occurred!
#14 1.506 See also "/opt/rgl/build/CMakeFiles/CMakeOutput.log".
#14 1.511 Found CUDA 11.7.99
#14 1.511 Executing command: 'cmake -B build -G 'Unix Makefiles' -DCMAKE_TOOLCHAIN_FILE= -DVCPKG_TARGET_TRIPLET= -DRGL_BUILD_PCL_EXTENSION=OFF -DRGL_BUILD_ROS2_EXTENSION=OFF -DRGL_BUILD_UDP_EXTENSION=OFF -DRGL_BUILD_SNOW_EXTENSION=OFF -DCMAKE_SHARED_LINKER_FLAGS="-Wl,-rpath=\$ORIGIN" -DRGL_BUILD_TAPED_TESTS=OFF '
#14 1.511 Traceback (most recent call last):
#14 1.511   File "/opt/rgl/./setup.py", line 361, in <module>
#14 1.511     sys.exit(main())
#14 1.511   File "/opt/rgl/./setup.py", line 175, in main
#14 1.512     run_subprocess_command(f"cmake -B {args.build_dir} -G {cfg.CMAKE_GENERATOR} {cmake_args}")
#14 1.512   File "/opt/rgl/./setup.py", line 211, in run_subprocess_command
#14 1.512     raise RuntimeError(f"Failed to execute command: '{command}'")
#14 1.512 RuntimeError: Failed to execute command: 'cmake -B build -G 'Unix Makefiles' -DCMAKE_TOOLCHAIN_FILE= -DVCPKG_TARGET_TRIPLET= -DRGL_BUILD_PCL_EXTENSION=OFF -DRGL_BUILD_ROS2_EXTENSION=OFF -DRGL_BUILD_UDP_EXTENSION=OFF -DRGL_BUILD_SNOW_EXTENSION=OFF -DCMAKE_SHARED_LINKER_FLAGS="-Wl,-rpath=\$ORIGIN" -DRGL_BUILD_TAPED_TESTS=OFF '
#14 ERROR: process "/bin/sh -c ./setup.py" did not complete successfully: exit code: 1
------
 > [builder 3/4] RUN --mount=type=bind,from=optix,target=/optix     ./setup.py:
1.511 Found CUDA 11.7.99
1.511 Executing command: 'cmake -B build -G 'Unix Makefiles' -DCMAKE_TOOLCHAIN_FILE= -DVCPKG_TARGET_TRIPLET= -DRGL_BUILD_PCL_EXTENSION=OFF -DRGL_BUILD_ROS2_EXTENSION=OFF -DRGL_BUILD_UDP_EXTENSION=OFF -DRGL_BUILD_SNOW_EXTENSION=OFF -DCMAKE_SHARED_LINKER_FLAGS="-Wl,-rpath=\$ORIGIN" -DRGL_BUILD_TAPED_TESTS=OFF '
1.511 Traceback (most recent call last):
1.511   File "/opt/rgl/./setup.py", line 361, in <module>
1.511     sys.exit(main())
1.511   File "/opt/rgl/./setup.py", line 175, in main
1.512     run_subprocess_command(f"cmake -B {args.build_dir} -G {cfg.CMAKE_GENERATOR} {cmake_args}")
1.512   File "/opt/rgl/./setup.py", line 211, in run_subprocess_command
1.512     raise RuntimeError(f"Failed to execute command: '{command}'")
1.512 RuntimeError: Failed to execute command: 'cmake -B build -G 'Unix Makefiles' -DCMAKE_TOOLCHAIN_FILE= -DVCPKG_TARGET_TRIPLET= -DRGL_BUILD_PCL_EXTENSION=OFF -DRGL_BUILD_ROS2_EXTENSION=OFF -DRGL_BUILD_UDP_EXTENSION=OFF -DRGL_BUILD_SNOW_EXTENSION=OFF -DCMAKE_SHARED_LINKER_FLAGS="-Wl,-rpath=\$ORIGIN" -DRGL_BUILD_TAPED_TESTS=OFF '
------
Dockerfile:48
--------------------
  47 |     COPY . .
  48 | >>> RUN --mount=type=bind,from=optix,target=${OptiX_INSTALL_DIR} \
  49 | >>>     ./setup.py
  50 |     
--------------------

This is a best practice for bolstering deterministic builds, that many other projects try to follow:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is possible to configure a build environment that does not require an internet connection at build time. We can stop using FetchContent and clone dependent projects with python script at --install-deps option.
We see the benefits of this practice and would like to follow it as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preferably, this arg could call a separate script file, that Docker could then COPY and invoke directly instead of this massive setup.py file, without the need to COPY any other miscellaneous sources that could otherwise change from tasks unrelated to dependency updates or dependency management.

FROM rgl-core AS build
# install dependencies while caching apt downloads
# RUN --mount=type=cache,sharing=locked,target=/var/cache/apt \
# ./setup.py --install-deps-only
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This --install-deps-only arg obviously doesn't exist, but it would be nice if it did.

Even better would be splitting up the setup and compilation scripts, such that changes in the compilation script wouldn't ivalidate the build cache of consecutive layers that could install GB of dependencies.

Or- a package manifest such as a package.xml that one could point rosdep at to decouple the dependency lock files from turing complete setup scripts would be even cooler!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, thank you for the suggestion.
I will try to do it.

@msz-rai
Copy link
Collaborator

msz-rai commented May 29, 2024

Thank you for your feedback. Your changes and suggestions are valuable and significantly enhance RGL’s Docker-based pipeline.

If you do not mind, we would like to merge your PR first and work on your suggestions on top of your changes.

@ruffsl ruffsl marked this pull request as ready for review May 29, 2024 13:08
@ruffsl
Copy link
Contributor Author

ruffsl commented May 29, 2024

If you do not mind, we would like to merge your PR first and work on your suggestions on top of your changes.

Sure, feel free to merge.

@msz-rai msz-rai merged commit 830130f into RobotecAI:feature/rework-dockerfile May 29, 2024
@ruffsl ruffsl deleted the feature/rework-dockerfile-feedback branch May 29, 2024 13:18
!/.clang-format
!/.githooks
!/CMakeLists.txt
!/Dockerfile
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msz-rai , I forgot to remove this line. The Dockerfile can of course be ignored, so that changes it itself don't break the build cache after the COPY . . directive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants