
Fully reproducible repo2docker builds #11

Open
1 of 3 tasks
weiji14 opened this issue Nov 29, 2022 · 4 comments


@weiji14
Member

weiji14 commented Nov 29, 2022

Ensure that people can reproduce the cryocloud software environment locally and in the future.

Some actionable steps:

Relevant discussions previously:

@weiji14 - absolutely!! My take on this issue is that we should take a "two-level" approach:

  • Once we settle on a complete list of packages for the base environment.yml file, then we run a script @yuvipanda wrote that will write in the current conda-forge package versions in it. But that's not a conda lock file, b/c it remains platform independent as it won't have build hashes in it. This makes it easy for users to replicate the environment locally across OSes, which I think is a very important feature (not everything can be done locally, but it's very valuable for users to at least have a reproducible offline workflow that matches the cloud service as much as possible).
  • Then, we generate a conda-lock.yml file to pin the docker images with exact builds. We tag the image with a version number at that point.
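A minimal sketch of the first level described above: pinning each dependency to a version with no build hash, so the result stays platform independent. This is not the script @yuvipanda wrote; the function name `pin_dependencies`, the `installed` mapping, and the package versions shown are hypothetical and only illustrate the idea.

```python
import re


def pin_dependencies(dependencies, installed):
    """Pin each spec to `name=version`, deliberately omitting build strings."""
    pinned = []
    for spec in dependencies:
        # Non-string entries (e.g. a nested `pip:` section) are passed through.
        if not isinstance(spec, str):
            pinned.append(spec)
            continue
        # Keep only the package name, dropping any existing version constraint.
        name = re.split(r"[\s=<>!~]", spec.strip(), maxsplit=1)[0]
        version = installed.get(name)
        # Leave the spec unchanged when no installed version is known.
        pinned.append(f"{name}={version}" if version else spec)
    return pinned


# Hypothetical versions, for illustration only.
installed = {"xarray": "2022.11.0", "numpy": "1.23.5"}
print(pin_dependencies(["xarray", "numpy>=1.20", "icepyx"], installed))
```

Because only `name=version` is written, the same file can be solved on Linux, macOS, or Windows; the exact builds are left to the lock file in the second level.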

Regarding versioning, my suggestion would be to use a 2022.X.Y scheme, where the "major" number X changes when new packages are added, whereas the "minor" Y changes when any package version or even build is updated. This will let people say "This project needs the cryohub environment 2022.3.4" with that being a pretty unambiguous reference.
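The scheme above can be sketched as a small helper. The function name `bump_version` is hypothetical, and resetting Y to 0 when X increases is an assumption that the comment does not state.

```python
def bump_version(version, packages_added):
    """Return the next YYYY.X.Y version under the scheme described above."""
    year, major, minor = (int(part) for part in version.split("."))
    if packages_added:
        # New packages were added: bump X.
        # Assumption: Y restarts at 0 (not specified in the discussion).
        return f"{year}.{major + 1}.0"
    # Only a package version or build changed: bump Y.
    return f"{year}.{major}.{minor + 1}"


print(bump_version("2022.3.4", packages_added=True))   # 2022.4.0
print(bump_version("2022.3.4", packages_added=False))  # 2022.3.5
```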

As for the apt packages, I think that's a bit less of a concern, since they change more slowly and we can ensure that a given YYYY series never changes the underlying Ubuntu base image, so only minor version fixes will go in. I don't know of the equivalent of a conda-lock for apt packages, but that's probably just my ignorance.

Originally posted by @fperez in CryoInTheCloud/CryoCloudWebsite#1 (comment)

Ideally, we would use this conda-linux-64.lock to build the docker image for full reproducibility, but there are some PyPI-only packages right now that are not on conda-forge.

@yuvipanda, have you thought about creating conda-forge packages for https://github.com/yuvipanda/jupyter-desktop-server and https://github.com/yuvipanda/jupyter-syncthing-proxy?

Originally posted by @weiji14 in #9 (comment)

References:

@fperez
Contributor

fperez commented Nov 29, 2022

Awesome - many thanks for opening this here so we track progress!

@tsnow03 we should mention this on Friday so the community users are aware of the longer-term view regarding these points, even if not all the pieces will be in place by then.

@yuvipanda
Contributor

Also, I think right now repo2docker doesn't actually respect the lockfile at all, so I don't think it's used. The 'right' way to fix this is to add lockfile support to repo2docker. Until then, I suggest:

  1. renaming the environment.yml to something else, so it doesn't trigger the repo2docker behavior
  2. Adding a postBuild bash file that installs the environment from the lockfile
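A rough sketch of what the second step could run, written in Python since repo2docker's `postBuild` can be any executable script. It only builds and prints the command; a real `postBuild` would pass it to `subprocess.run`. Assumptions: `conda-lock` is already installed in the image, the lockfile is named `conda-lock.yml`, and the target environment is called `notebook`. None of this has been verified against the hub image.

```python
import shlex


def lockfile_install_command(lockfile="conda-lock.yml", env_name="notebook"):
    """Build the conda-lock command that installs an environment from a lockfile."""
    # `env_name` and `lockfile` defaults are assumptions, not values from this repo.
    return ["conda-lock", "install", "--name", env_name, lockfile]


# Print the command that a postBuild script would execute.
print(shlex.join(lockfile_install_command()))
```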

@weiji14
Member Author

weiji14 commented Nov 29, 2022

> Also, I think right now repo2docker doesn't actually respect the lockfile at all, so I don't think it's used. The 'right' way to fix this is to add lockfile support to repo2docker.

Yep, that is being tracked by jupyterhub/repo2docker#1157.

> Until then, I suggest:
>
> 1. renaming the environment.yml to something else, so it doesn't trigger the repo2docker behavior
> 2. Adding a postBuild bash file that installs the environment from the lockfile

Ok, then we will need to use the unified conda-lock.yml file rather than the current explicit lockfile conda-linux-64.lock, which doesn't handle pip-only dependencies. Let me see if I can get it working using postBuild.
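To tell the two lockfile kinds apart, one option is to look for the `@EXPLICIT` marker that conda's explicit spec files contain. The sketch below assumes that any file without the marker is not an explicit lockfile; the function name and the sample contents (including the package URL) are hypothetical.

```python
def is_explicit_lockfile(text):
    """Return True if the text looks like a conda explicit lockfile."""
    # Explicit spec files carry a line consisting solely of `@EXPLICIT`.
    return any(line.strip() == "@EXPLICIT" for line in text.splitlines())


# Hypothetical file contents, for illustration only.
explicit_example = (
    "# platform: linux-64\n"
    "@EXPLICIT\n"
    "https://conda.anaconda.org/conda-forge/linux-64/example-1.0-0.tar.bz2\n"
)
unified_example = "version: 1\nmetadata: {}\npackage: []\n"

print(is_explicit_lockfile(explicit_example))  # True
print(is_explicit_lockfile(unified_example))   # False
```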

@weiji14
Member Author

weiji14 commented Nov 30, 2022

> @yuvipanda, have you thought about creating conda-forge packages for https://github.com/yuvipanda/jupyter-desktop-server and https://github.com/yuvipanda/jupyter-syncthing-proxy?

Turns out we really do need these two packages to be on conda-forge, otherwise conda-lock won't work, see #14 (comment). So I've started Pull Requests at conda-forge/staged-recipes#21368 and conda-forge/staged-recipes#21369 to add those to conda-forge 🙂

weiji14 added a commit that referenced this issue Dec 5, 2022
More human readable Calendar Version ([CalVer](https://calver.org/))
tags for docker images published on the docker registry!

Steps to trigger this workflow:
1. Make a git tag manually, or on https://github.com/CryoInTheCloud/hub-image/tags. Note that this tag should ideally be in CalVer format (e.g. `2022.12.02`)
2. The tag will trigger this Continuous Integration workflow `retag.yml` which will:
   1. Pull down the built docker image from https://quay.io/repository/cryointhecloud/cryo-hub-image corresponding to the git commit that was tagged
   2. The docker image will be retagged to a name like `2022.12.02`
   3. This retagged docker image is then pushed back up to the docker registry

GitHub Actions workflow adapted from
https://github.com/pangeo-data/pangeo-docker-images/blob/2022.12.01/.github/workflows/Publish.yml

Motivated by
#11 (comment)
and
#13 (comment).
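The retagging steps in the commit message above amount to a pull, tag, and push sequence. The sketch below only builds the `docker` commands and does not run them; it is not the contents of `retag.yml`. The image reference `quay.io/cryointhecloud/cryo-hub-image` is inferred from the repository URL in the commit message, and the source tag `abc1234` is a hypothetical placeholder for whatever tag the build assigns to the commit.

```python
import re

# Assumption: pull reference derived from the quay.io repository URL above.
IMAGE = "quay.io/cryointhecloud/cryo-hub-image"
# CalVer-style tag such as 2022.12.02.
CALVER = re.compile(r"\d{4}\.\d{1,2}\.\d{1,2}")


def retag_commands(source_tag, calver_tag, image=IMAGE):
    """Return the docker commands that retag an existing image with a CalVer tag."""
    if not CALVER.fullmatch(calver_tag):
        raise ValueError(f"not a CalVer tag: {calver_tag!r}")
    src = f"{image}:{source_tag}"
    dst = f"{image}:{calver_tag}"
    return [
        ["docker", "pull", src],      # 1. pull the image built for the commit
        ["docker", "tag", src, dst],  # 2. give it the human-readable tag
        ["docker", "push", dst],      # 3. push the retagged image back
    ]


for command in retag_commands("abc1234", "2022.12.02"):
    print(" ".join(command))
```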
@tsnow03 tsnow03 moved this to 📋 Backlog in March 2023 Event Feb 2, 2023