Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

created a new build_docker_remote.sh script for building the docker image remotely with google cloud build #7951

Merged
merged 3 commits into from
Jul 25, 2022

Conversation

jamesemery
Copy link
Collaborator

Fixes #7949

@codecov
Copy link

codecov bot commented Jul 19, 2022

Codecov Report

Merging #7951 (3bcf33a) into master (ace7821) will decrease coverage by 0.025%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##              master     #7951       +/-   ##
===============================================
- Coverage     87.089%   87.064%   -0.025%     
- Complexity     36993     37014       +21     
===============================================
  Files           2217      2218        +1     
  Lines         173847    173760       -87     
  Branches       18791     18769       -22     
===============================================
- Hits          151402    151282      -120     
- Misses         15816     15892       +76     
+ Partials        6629      6586       -43     
Impacted Files Coverage Δ
...er/tools/spark/sv/evidence/BreakpointEvidence.java 66.822% <0.000%> (-12.383%) ⬇️
...ute/hellbender/utils/reference/ReferenceUtils.java 73.529% <0.000%> (-5.418%) ⬇️
...on/FindBreakpointEvidenceSparkIntegrationTest.java 96.296% <0.000%> (-3.704%) ⬇️
...rg/broadinstitute/hellbender/utils/SVInterval.java 90.000% <0.000%> (-2.857%) ⬇️
...ute/hellbender/tools/spark/utils/IntHistogram.java 86.335% <0.000%> (-2.484%) ⬇️
.../sv/integration/SVIntegrationTestDataProvider.java 93.333% <0.000%> (-0.784%) ⬇️
...nder/utils/runtime/StreamingProcessController.java 66.818% <0.000%> (-0.455%) ⬇️
...tructuralVariationDiscoveryArgumentCollection.java 94.595% <0.000%> (-0.142%) ⬇️
...nstitute/hellbender/utils/haplotype/Haplotype.java 90.244% <0.000%> (ø)
...itute/hellbender/engine/spark/GATKRegistrator.java 100.000% <0.000%> (ø)
... and 18 more

build_docker_remote.sh Outdated Show resolved Hide resolved

## generate the tag if it wasn't explicitly specified
if [ -z "$DOCKER_IMAGE_TAG" ]; then
DOCKER_IMAGE_TAG=${GCR_REPO}:$(whoami)-${GITHUB_TAG}-${GIT_HASH_FOR_TAG}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a Github tag is used this ends up with a name ending in tag-tag. For example: us.gcr.io/broad-dsde-methods/broad-gatk-snapshots/gatk-remote-builds:mshand-4.2.6.1-4.2.6.1. I don't think that's a problem, but just fyi

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I decided against going down the rabbit hole of trying too hard to make a reasonable image name.... If you pull a specific commit then there isn't necessarily a branch associated and I don't think this is super high priority to fix given that most ofthe time people should give their own -T tags

``build_docker.sh`` is a script to create the full gatk4 docker image.
``build_docker_remote.sh`` is a script to create the full gatk4 docker image using google cloud build remotely. NOTE: this requires the user first specify their project with the command `gcloud config set project <PROJECT>` to a project that has access to google cloud build.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth mentioning somewhere that one potential motivation for using the remote version is if you have M1 hardware?

Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesemery Back to you with my comments


REPO=broadinstitute
PROJECT=gatk
REPO_PRJ=${REPO}/${PROJECT}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete REPO_PRJ -- it is a reference to the dockerhub repo, which we can't push to in this script.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

False, its used farther down git clone https://github.com/${REPO}/${PROJECT}.git when we do the git clone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it's not -- that's $REPO and $PROJECT. $REPO_PRJ is unused, and in the original script it's used for the dockerhub repo.

build_docker_remote.sh Outdated Show resolved Hide resolved
build_docker_remote.sh Outdated Show resolved Hide resolved
build_docker_remote.sh Outdated Show resolved Hide resolved
build_docker_remote.sh Outdated Show resolved Hide resolved

SUBMIT_COMMAND="gcloud builds submit --tag ${DOCKER_IMAGE_TAG} --timeout=24h --machine-type n1_highcpu_8"
echo "running the following gcloud command: ${SUBMIT_COMMAND}"
echo -n "" >> .gcloudignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this line do? Can you add a comment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its from the Carrot version of this code. Without an explicit gcloudignore set it blacklists the .git directory which causes our script to crash since we need to download the lfs files from the intermediate docker image. I've added a comment


cd ${ORIGINAL_WORKING_DIRECTORY}
if [ -n "$STAGING_DIR" ] ; then
rm -Rf ${STAGING_DIR}/${STAGING_CLONE_DIR}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the clone actually may get put into ${STAGING_DIR}/${STAGING_DIR}/${STAGING_CLONE_DIR} rather than ${STAGING_DIR}/${STAGING_CLONE_DIR}. Can you double-check this and make sure the script is cleaning up properly?

build_docker_remote.sh Show resolved Hide resolved
scripts/docker/README.md Outdated Show resolved Hide resolved
export GITHUB_TAG=4.2.6.1

# REPLACE with the directory where you'd like to clone the repo
export STAGING_DIR=~/tmp/tmp_build_docker_image/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These don't actually have to be exported, since you're passing them into the script directly below.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm simply copying the syntax of the other dozen of these things in this folder

@jamesemery
Copy link
Collaborator Author

@droazen i responded to your comments can this be merged?

Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesemery Still a few issues here...

mkdir -p ${STAGING_DIR}
cd ${STAGING_DIR}
set +e
rm -Rf ${STAGING_DIR}/${STAGING_CLONE_DIR}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cd into ${STAGING_DIR}, and then rm -Rf ${STAGING_DIR}/${STAGING_CLONE_DIR}. This still seems wrong...

build_docker.sh Outdated
@@ -93,7 +93,7 @@ if [ -n "$STAGING_DIR" ]; then
set +e
rm -Rf ${STAGING_DIR}/${STAGING_CLONE_DIR}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue here -- we've just cd'd into ${STAGING_DIR}, so why are we deleting ${STAGING_DIR}/${STAGING_CLONE_DIR}?

rm -Rf ${STAGING_DIR}/${STAGING_CLONE_DIR}
set -e
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/${REPO}/${PROJECT}.git ${STAGING_CLONE_DIR}
cd ${STAGING_DIR}/${STAGING_CLONE_DIR}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this cd command even succeed? We already cd'd into ${STAGING_DIR}, and now we are cd'ing into ${STAGING_DIR}/${STAGING_CLONE_DIR}?

build_docker_remote.sh Outdated Show resolved Hide resolved
build_docker_remote.sh Outdated Show resolved Hide resolved
build_docker_remote.sh Outdated Show resolved Hide resolved

echo "Building image with the tag ${DOCKER_IMAGE_TAG}..."

SUBMIT_COMMAND="gcloud builds submit --tag ${DOCKER_IMAGE_TAG} --timeout=24h --machine-type n1_highcpu_8"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will an existing image with the same tag get overwritten by this command?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. but thats effectively impossible unless the user specifies the image themselves (in which case its their fault and problem) since we include the github tag/hash in the image name

scripts/docker/README.md Outdated Show resolved Hide resolved
@jamesemery
Copy link
Collaborator Author

@droazen one more round

Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Merge after doing a test run of the latest version with both relative and absolute paths for the staging dir.

@jamesemery
Copy link
Collaborator Author

@droazen just checked and it works with both relative and absolute paths. I'm going to hit merge.

@jamesemery jamesemery merged commit 8c348aa into master Jul 25, 2022
@jamesemery jamesemery deleted the je_addRemoteDockerBuildScript branch July 25, 2022 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create a Gcloud build script to build docker images.
3 participants