From fb194c84ce15c54122d3b0dcd2e1ca6b572fdd15 Mon Sep 17 00:00:00 2001 From: Ryan Lovett Date: Thu, 16 Jan 2025 14:04:59 -0800 Subject: [PATCH 1/2] Move user image docs into new section. Move and somewhat consolidate user image documentation. Update some of the wording around active processes since we've fully migrated to the new method. --- docs/_quarto.yml | 15 +- docs/admins/structure.qmd | 3 +- .../managing-multiple-user-image-repos.qmd | 370 ++++++++++++++++++ docs/tasks/user-images/new-image.qmd | 148 +++++++ docs/tasks/user-images/new-packages.qmd | 82 ++++ .../user-images/rebuild-postgres-image.qmd | 25 ++ docs/tasks/user-images/repo2docker-local.qmd | 70 ++++ docs/tasks/user-images/transition-image.qmd | 88 +++++ docs/tasks/user-images/update-image.qmd | 15 + 9 files changed, 808 insertions(+), 8 deletions(-) create mode 100644 docs/tasks/user-images/managing-multiple-user-image-repos.qmd create mode 100644 docs/tasks/user-images/new-image.qmd create mode 100644 docs/tasks/user-images/new-packages.qmd create mode 100644 docs/tasks/user-images/rebuild-postgres-image.qmd create mode 100644 docs/tasks/user-images/repo2docker-local.qmd create mode 100644 docs/tasks/user-images/transition-image.qmd create mode 100644 docs/tasks/user-images/update-image.qmd diff --git a/docs/_quarto.yml b/docs/_quarto.yml index 829ba32b6..e635fe591 100644 --- a/docs/_quarto.yml +++ b/docs/_quarto.yml @@ -43,12 +43,15 @@ website: - tasks/core-pool.qmd - tasks/new-hub.qmd - tasks/rebuild-hub-image.qmd - - tasks/rebuild-postgres-image.qmd - - tasks/managing-multiple-user-image-repos.qmd - - tasks/new-image.qmd - - tasks/repo2docker-local.qmd - - tasks/transition-image.qmd - - tasks/new-packages.qmd + - section: "User Images" + contents: + - tasks/user-images/new-image.qmd + - tasks/user-images/update-image.qmd + - tasks/user-images/new-packages.qmd + - tasks/user-images/repo2docker-local.qmd + - tasks/user-images/managing-multiple-user-image-repos.qmd + - 
tasks/user-images/transition-image.qmd + - tasks/user-images/rebuild-postgres-image.qmd - tasks/course-config.qmd - tasks/calendar-scaler.qmd - tasks/prometheus-grafana.qmd diff --git a/docs/admins/structure.qmd b/docs/admins/structure.qmd index ecfe74177..af01ca7ea 100644 --- a/docs/admins/structure.qmd +++ b/docs/admins/structure.qmd @@ -57,8 +57,7 @@ Documentation is published to via a ## User Images -Each user image is stored in it's own repository in the `berkeley-dsep-infra` -organization. You can find them [here](https://github.com/orgs/berkeley-dsep-infra/repositories?language=&q=image&sort=&type=all). +Each user image is managed in [separate repositories](https://github.com/orgs/berkeley-dsep-infra/repositories?language=&q=image&sort=&type=all) in the `berkeley-dsep-infra` organization. These repositories determine the environment provided to the user. For example, it controls: diff --git a/docs/tasks/user-images/managing-multiple-user-image-repos.qmd b/docs/tasks/user-images/managing-multiple-user-image-repos.qmd new file mode 100644 index 000000000..432eb6d70 --- /dev/null +++ b/docs/tasks/user-images/managing-multiple-user-image-repos.qmd @@ -0,0 +1,370 @@ +--- +title: Managing multiple user image repos +aliases: + - ../admins/howto/managing-multiple-user-image-repos.html + - ../managing-multiple-user-image-repos.html +--- + +## Managing user image repos + +Since each user image lives in its own repository, managing them all can become burdensome, particularly if you need to make the same change to many or all of the images. + +For this, we have a tool named [manage-repos](https://github.com/berkeley-dsep-infra/manage-repos). + +`manage-repos` uses a config file containing the git remotes of all of the image repos ([repos.txt](https://github.com/berkeley-dsep-infra/datahub/blob/staging/scripts/user-image-management/repos.txt)) and allows you to perform basic git operations across them (sync/rebase, clone, branch management and pushing).
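For reference, the config file is just a plain-text list of git clone remotes, one per line. A hypothetical sketch (these entries are illustrative, not the actual contents of `repos.txt`):

```
git@github.com:berkeley-dsep-infra/data100-user-image.git
git@github.com:berkeley-dsep-infra/shiny-user-image.git
git@github.com:berkeley-dsep-infra/ugr01-user-image.git
```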
+ +The script assumes that you have all of your user images checked out in their own sub-folder under a common parent directory (in my case, `$HOME/src/images/...`). + +## Installation instructions + +### Via cloning and manual installation + +Clone [the repo](https://github.com/berkeley-dsep-infra/manage-repos), and from within that directory run: + +``` +pip install --editable . +``` + +The `--editable` flag is optional, and allows you to hack on the tool and have those changes usable without reinstalling or needing to hack your `PATH`. + +### Via `pip` + +``` +python3 -m pip install --no-cache git+https://github.com/berkeley-dsep-infra/manage-repos +``` + +### Installing the `gh` tool + +To use the `pr` and `merge` sub-commands, you will also need to install the GitHub CLI tool: https://github.com/cli/cli#installation + +## Usage + +### Overview of git operations included in `manage-repos` + +`manage-repos` allows you to perform basic `git` operations on a large number of similar repositories: + +* `branch`: Create a feature branch. +* `clone`: Clone all repositories in the config file to a location on the filesystem specified by the `--destination` argument. +* `merge`: Merge the most recent pull request in the managed repositories. +* `patch`: Apply a [git patch](https://git-scm.com/docs/git-apply) to all repositories in the config file. +* `pr`: Create pull requests in the managed repositories. +* `push`: Push a branch from all repos to a remote. The remote defaults to `origin`. +* `stage`: Performs a `git add` and `git commit` to stage changes before pushing. +* `sync`: Sync all of the repositories, and optionally push to your fork. + +### Usage overview +The following sections describe in more detail the options and commands available with the script. + +#### Primary arguments for the script +``` +$ manage-repos --help +usage: manage-repos [-h] [-c CONFIG] [-d DESTINATION] {branch,clone,patch,push,stage,sync} ...
+ +positional arguments: + {branch,clone,patch,push,stage,sync} + Command to execute. Additional help is available for each command. + +options: + -h, --help show this help message and exit + -c CONFIG, --config CONFIG + Path to the file containing list of repositories to operate on. Defaults to repos.txt located in the current working + directory. + -d DESTINATION, --destination DESTINATION + Location on the filesystem of the directory containing the managed repositories. Defaults to the current working directory. + --version show program's version number and exit +``` + +A config file is required (if `--config` is not set, it defaults to `repos.txt` in the current working directory), and setting `--destination` is recommended. + +### Sub-commands + +#### `branch` + +``` +$ manage-repos branch --help +usage: manage-repos branch [-h] [-b BRANCH] + +options: + -h, --help show this help message and exit + -b BRANCH, --branch BRANCH + Name of the new feature branch to create. +``` + +The name of the feature branch to create is required, and the tool will switch to `main` before creating and switching to the new branch. + +#### `clone` + +``` +$ manage-repos clone --help +usage: manage-repos clone [-h] [-s [SET_REMOTE]] [-g GITHUB_USER] + +Clone repositories in the config file and optionally set a remote for a fork. +If a repository sub-directory does not exist, it will be created. + +options: + -h, --help show this help message and exit + -s [SET_REMOTE], --set-remote [SET_REMOTE] + Set the user's GitHub fork as a remote. Defaults to 'origin'. + -g GITHUB_USER, --github-user GITHUB_USER + The GitHub username of the fork to set in the remote. + Required if --set-remote is used. +``` + +This command will clone all repositories found in the config, and if you've created a fork, use the `--set-remote` and `--github-user` arguments to update the remotes in the cloned repositories. This will set the primary repository's remote to `upstream` and your fork to `origin` (unless you override the latter by passing a different remote name with the `--set-remote` argument).
+ +After cloning, `git remote -v` will be executed for each repository to allow you to confirm that the remotes are properly set. + +#### `merge` + +``` +$ manage-repos merge --help +usage: manage-repos merge [-h] [-b BODY] [-d] [-s {merge,rebase,squash}] + +Using the gh tool, merge the most recent pull request in the managed +repositories. Before using this command, you must authenticate with gh to +ensure that you have the correct permission for the required scopes. + +options: + -h, --help show this help message and exit + -b BODY, --body BODY The commit message to apply to the merge (optional). + -d, --delete Delete your local feature branch after the pull request + is merged (optional). + -s {merge,rebase,squash}, --strategy {merge,rebase,squash} + The pull request merge strategy to use, defaults to + 'merge'. +``` + +Be aware that the default behavior is to merge only the newest pull request in each of the managed repositories. The reasoning behind this is that if you have created pull requests across many repositories, the pull request numbers will almost certainly differ, and adding interactive steps to merge specific pull requests would be cumbersome. + +#### `patch` + +``` +$ manage-repos patch --help +usage: manage-repos patch [-h] [-p PATCH] + +Apply a git patch to managed repositories. + +options: + -h, --help show this help message and exit + -p PATCH, --patch PATCH + Path to the patch file to apply. +``` + +This command applies a git patch file to all of the repositories. The patch is created by making changes to one file and redirecting the output of `git diff` to a new file, e.g.: + +``` +git diff > patchfile.txt +``` + +You then provide the location of the patch file with the `--patch` argument, and the script will attempt to apply the patch to all of the repositories. + +If the script is unable to apply the patch to a repository, it will continue to run and notify you on completion of any repositories that failed to accept the patch.
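As a concrete sketch of the underlying patch mechanics using plain `git` (paths and package names here are illustrative), the following creates a patch from an edit in one repository and applies it to a second repository that has the same baseline file, which is what `manage-repos patch` does for every repo in the config:

```shell
# Minimal, self-contained sketch of the patch workflow using plain git.
set -eu
workdir=$(mktemp -d)

# Repo A: commit a baseline file, modify it, and capture the change as a patch.
git init -q "$workdir/repo-a"
printf 'dependencies:\n  - numpy\n' > "$workdir/repo-a/environment.yml"
git -C "$workdir/repo-a" add environment.yml
git -C "$workdir/repo-a" -c user.email=you@example.com -c user.name=you commit -qm baseline
printf 'dependencies:\n  - numpy\n  - pandas\n' > "$workdir/repo-a/environment.yml"
git -C "$workdir/repo-a" diff > "$workdir/patchfile.txt"

# Repo B: has the same baseline file, so the patch applies cleanly.
git init -q "$workdir/repo-b"
printf 'dependencies:\n  - numpy\n' > "$workdir/repo-b/environment.yml"
git -C "$workdir/repo-b" apply "$workdir/patchfile.txt"
```

With `manage-repos`, the final `git apply` step is instead run for you across all managed repositories via `manage-repos patch -p patchfile.txt`.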
+ +#### `pr` +``` +$ manage-repos pr --help +usage: manage-repos pr [-h] [-t TITLE] [-b BODY] [-B BRANCH_DEFAULT] + [-g GITHUB_USER] + +Using the gh tool, create a pull request after pushing. + +options: + -h, --help show this help message and exit + -t TITLE, --title TITLE + Title of the pull request. + -b BODY, --body BODY Body of the pull request (optional). + -B BRANCH_DEFAULT, --branch-default BRANCH_DEFAULT + Default remote branch that the pull requests will be + merged to. This is optional and defaults to 'main'. + -g GITHUB_USER, --github-user GITHUB_USER + The GitHub username used to create the pull request. +``` + +After you've `stage`d and `push`ed your changes, this command will then create +a pull request using the `gh` tool. + +#### `push` + +``` +$ manage-repos push --help +usage: manage-repos push [-h] [-b BRANCH] [-r REMOTE] + +Push managed repositories to a remote. + +options: + -h, --help show this help message and exit + -b BRANCH, --branch BRANCH + Name of the branch to push. + -r REMOTE, --remote REMOTE + Name of the remote to push to. This is optional and + defaults to 'origin'. +``` + +This command will attempt to push all staged commits to a remote. The +`--branch` argument is required, and needs to be the name of the feature +branch that will be pushed. + +The remote that is pushed to defaults to `origin`, but you can override this +with the `--remote` argument. + +#### `stage` + +``` +$ manage-repos stage --help +usage: manage-repos stage [-h] [-f FILES [FILES ...]] [-m MESSAGE] + +Stage changes in managed repositories. This performs a git add and commit. + +options: + -h, --help show this help message and exit + -f FILES [FILES ...], --files FILES [FILES ...] + Space-delimited list of files to stage in the + repositories. Optional, and if left blank will default + to all modified files in the directory. + -m MESSAGE, --message MESSAGE + Commit message to use for the changes. 
+``` + +`stage` combines both `git add ...` and `git commit -m`, adding and committing one or more files to the staging area before you push to a remote. + +The commit message must be a text string enclosed in quotes. + +By default, `--files` is set to `.`, which will add all modified files to the staging area. You can also specify any number of files, separated by a space. + +#### `sync` + +``` +$ manage-repos sync --help +usage: manage-repos sync [-h] [-b BRANCH_DEFAULT] [-u UPSTREAM] [-p] + [-r REMOTE] + +Sync managed repositories to the latest version using 'git rebase'. Optionally +push to a remote fork. + +options: + -h, --help show this help message and exit + -b BRANCH_DEFAULT, --branch-default BRANCH_DEFAULT + Default remote branch to sync to. This is optional and + defaults to 'main'. + -u UPSTREAM, --upstream UPSTREAM + Name of the parent remote to sync from. This is + optional and defaults to 'upstream'. + -p, --push Push the locally synced repo to a remote fork. + -r REMOTE, --remote REMOTE + The name of the remote fork to push to. This is + optional and defaults to 'origin'. +``` + +This command will sync all repositories listed in the config from a remote, and with the `--push` argument it will also push each local repository to another remote. + +By default, the script will switch to the `main` branch before syncing; this can be overridden with the `--branch-default` argument. + +The primary remote that is used to sync is `upstream`, but that can also be overridden with the `--upstream` argument. The remote for a fork defaults to `origin`, and can be overridden via the `--remote` argument. + + +### Tips, tricks and usage examples + +#### Tips and tricks + +`manage-repos` is best run from the parent folder that will contain all of the repositories that you will be managing, as the default value of `--destination`
+ +You can also create a symlink in the parent folder that points to the config +file elsewhere on your filesystem: + +``` +ln -s /scripts/user-image-management/repos.txt repos.txt +``` + +With this in mind, you can safely drop the `--config` and `--destination` +arguments when running `manage-repos`. Eg: + +``` +manage-repos sync -p +``` + +Another tip is to comment out or delete entries in your config when performing +git operations on a limited set of repositories. Be sure to `git restore` the +file when you're done! + +#### Usage examples + +Clone all of the image repos to a common directory: + +``` +manage-repos --destination ~/src/images/ --config /path/to/repos.txt clone +``` + +Clone all repos, and set `upstream` and `origin` for your fork: + +``` +manage-repos -d ~/src/images/ -c /path/to/repos.txt clone --set-remote --github-user +``` + +Sync all repos from `upstream` and push to your `origin`: + +``` +manage-repos -d ~/src/images/ -c /path/to/repos.txt sync --push +``` + +Create a feature branch in all of the repos: + +``` +manage-repos -d ~/src/images -c /path/to/repos.txt branch -b test-branch +``` + +Create a git patch and apply it to all image repos: + +``` +git diff envorinment.yml > /tmp/git-patch.txt +manage-repos -d ~/src/images -c /path/to/repos.txt patch -p /tmp/git-patch.txt +``` + +Once you've tested everything and are ready to push and create a PR, add and +commit all modified files in the repositories: + +``` +manage-repos -d ~/src/images -c /path/to/repos.txt stage -m "this is a commit" +``` + +After staging, push everything to a remote: + +``` +manage-repos -d ~/src/images -c /path/to/repos.txt push -b test-branch +``` diff --git a/docs/tasks/user-images/new-image.qmd b/docs/tasks/user-images/new-image.qmd new file mode 100644 index 000000000..f05648ceb --- /dev/null +++ b/docs/tasks/user-images/new-image.qmd @@ -0,0 +1,148 @@ +--- +title: Create a New Single User Image +aliases: + - ../admins/howto/new-image.html + - ../new-image.html 
+--- + +You might need to create a new user image when deploying a new hub, or changing +from a shared single user server image. We use +[repo2docker](https://github.com/jupyterhub/repo2docker) to generate our images. + +There are two approaches to creating a repo2docker image: + +1. Use a repo2docker-style image [template](https://github.com/berkeley-dsep-infra/datahub/tree/staging/deployments/data100/image) (environment.yaml, etc) +2. Use a [Dockerfile](https://github.com/berkeley-dsep-infra/datahub/tree/staging/deployments/datahub/images/default) (useful for larger/more complex images) + +Generally, we prefer to use the former approach. + +If we need to install software as `root`, you can add a [Dockerfile.appendix](https://repo2docker.readthedocs.io/en/latest/usage.html#cmdoption-jupyter-repo2docker-appendix) to the repo. (this involves setting `APPENDEX_FILE: Dockerfile.appendix` within the build-* GitHub Actions workflows as is done in the ugr01 image) + +There are two approaches to pre-populate the image's assets: + +1. Use an existing image as a template. Browse through our [image repos](https://github.com/orgs/berkeley-dsep-infra/repositories?language=&q=image&sort=&type=all) to find a hub that is similar to the one you are trying to create. This will give you a good starting point. + +1. Fork [hub-user-image-template](https://github.com/berkeley-dsep-infra/hub-user-image-template). Click "Use this template" > "Create a new repository". Be sure to follow convention and name the repo `-user-image`, and the owner needs to be `berkeley-dsep-infra`. When that is done, create your own fork of the new repo. + +### Image Repository Settings + +There are now a few steps to set up the CI/CD for the new image repo. In the +`berkeley-dsep-infra` image repo, click on `Settings`, and under `General`, +scroll down to `Pull Requests` and check the box labeled `Automatically delete +head branches`. 
+ +Scroll back up to the top of the settings, and in the left menu bar, click on +`Secrets and variables`, and then `Actions`. + +From there, click on the `Variables` tab and then `New repository variable`. We +will be adding two new variables: + +1. `HUB`: the name of the hub (eg: datahub) + +1. `IMAGE`: the Google Artifact Registry path and image name. The path will +always be `ucb-datahub-2018/user-images/` and the +image name will always be the same as the repo: `-user-image`. + +### Your Fork's Repository Settings + +Now you will want to disable Github Actions for your fork of the image repo. +If you don't, whenever you push PRs to the root repo the workflows *in your +fork* will attempt to run, but don't have the proper permissions to +successfully complete. This will then send you a nag email about a workflow +failure. + +To disable this for your fork, click on `Settings`, `Actions` and `General`. +Check the `Disable actions` box and click save. + +### Enable Artifact Registry Pushing + +The image repository needs to be added to the list of allowed repositories in +the `berkeley-dsep-infra` secrets. Go to the `berkeley-dsep-infra` [Secrets and +Variables](https://github.com/organizations/berkeley-dsep-infra/settings/secrets/actions). +Give your repository permissions to push to the Artifact Registry, +as well as to push a branch to the [datahub repo](https://github.com/berkeley-dsep-infra/datahub). + +Edit both `DATAHUB_CREATE_PR` and `GAR_SECRET_KEY`, and click on the gear icon, +search for your repo name, check the box and save. + +### Configure `hubploy` + +You need to let `hubploy` know the specifics of the image by updating your +deployment's `hubploy.yaml`. Change the `name` of the image in +`deployments//hubploy.yaml` to point to your new image name, and after +the name add `:PLACEHOLDER` in place of the image sha. This will be +automatically updated after your new image is built and pushed to the Artifact +Registry. 
+ +Example: + +```yaml +images: + images: + - name: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/fancynewhub-user-image:PLACEHOLDER + +cluster: + provider: gcloud + gcloud: + project: ucb-datahub-2018 + service_key: gke-key.json + cluster: spring-2024 + zone: us-central1 +``` + +Next, add the ssh clone path of the root image repo to [repos.txt](https://github.com/berkeley-dsep-infra/datahub/blob/staging/scripts/user-image-management/repos.txt). + +Create a PR and merge to staging. You can cancel the +[`Deploy staging and prod hubs` job in Actions](https://github.com/berkeley-dsep-infra/datahub/actions/workflows/deploy-hubs.yaml), +or just let it fail. + +## Subscribe to GitHub Repo in Slack + +Go to the #ucb-datahubs-bots channel, and run the following command: + +``` +/github subscribe berkeley-dsep-infra/ +``` + +## Modify the Image + +This step is straightforward: create a feature branch, and edit, delete, or add +any files to configure the image as needed. + +We also strongly recommend copying `README-template.md` over the default +`README.md`, and modifying it to replace all occurrences of `` with +the name of your image. + +## Submit Pull Requests + +Familiarize yourself with [pull +requests](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests) +and [repo2docker](https://github.com/jupyter/repo2docker), and create a fork of +the [datahub staging branch](https://github.com/berkeley-dsep-infra/datahub). + +1. Set up your git/dev environment by following the [image templat's +contributing + guide](https://github.com/berkeley-dsep-infra/hub-user-image-template/blob/main/CONTRIBUTING.md). + +1. [Test the image locally](repo2docker-local.qmd) using `repo2docker`. +1. Submit a PR to `staging`. +1. Commit and push your changes to your fork of the image repo, and + create a new pull request at + https://github.com/berkeley-dsep-infra/. + +1. 
After the build passes, merge your PR into `main` and the image will be built again and pushed to the Artifact Registry. If that succeeds, a commit updating the `PLACEHOLDER` field in `hubploy.yaml` with the image's SHA will be crafted and pushed to the datahub repo. You can check on the progress of this workflow in your root image repo's `Actions` tab. + +1. After the previous step is completed successfully, go to the Datahub repo and click on the [New pull request](https://github.com/berkeley-dsep-infra/datahub/compare) button. Next, click on the `compare: staging` drop down, and you should see a branch named something like `update--image-tag-`. Select that, and create a new pull request. + +1. Once the checks have passed, merge to `staging` and your new image will be deployed! You can watch the progress in the [deploy-hubs workflow](https://github.com/berkeley-dsep-infra/datahub/actions/workflows/deploy-hubs.yaml). diff --git a/docs/tasks/user-images/new-packages.qmd b/docs/tasks/user-images/new-packages.qmd new file mode 100644 index 000000000..bbf0f842c --- /dev/null +++ b/docs/tasks/user-images/new-packages.qmd @@ -0,0 +1,82 @@ +--- +title: Testing and Upgrading New Packages +aliases: + - ../admins/howto/new-packages.html + - ../new-packages.html +--- + +It is helpful to test package additions and upgrades for yourself before they are installed for all users. You can make sure the change behaves as you think it should, and does not break anything else. Once tested, request that the change be installed for all users by [creating a new issue in GitHub](https://github.com/berkeley-dsep-infra/datahub/issues), contacting curriculum support staff, or creating a new pull request. Ultimately, thoroughly testing changes locally and submitting a pull request will result in the software being rolled out to everyone much faster.
+ +Install a Python package in your notebook +================================== + +When testing a notebook with a new version of a package, add the following line to a cell at the beginning of your notebook: + +``` bash +!pip install --upgrade packagename==version +``` + +You can then execute this cell every time you run the notebook. This will ensure you have the version you think you have when running your code. + +To avoid complicated errors, make sure you always specify a version. You can find the latest version by searching on [pypi.org](https://pypi.org). + +Find the current version of a Python package +=============================== + +To find the current version of a particular installed package, you can run the following in a notebook: + +``` bash +!pip list | grep +``` + +This should show you the particular package you are interested in and its current version. + +Install/update an R package in RStudio +================================== + +When the required version of a package is missing in RStudio, try the following command to check whether the default installation repo contains the package (and the version) required: + +``` R +install.packages("packagename") +``` + +This should install the particular package you are interested in and its latest version. You can find the latest version of an R package by searching on [CRAN](https://cran.r-project.org/). + +Find the current version of an R package +=============================== + +To find the current version of a particular installed package, you can run the following in RStudio: + +``` R +packageVersion("") +``` + +This should show you the particular package you are interested in and its current version. + +## Tips for Upgrading Packages + +- Conda can take an extremely long time to resolve version dependency conflicts, if they are resolvable at all.
When upgrading Python versions or a core package that is used by many other packages, such as `requests`, clean out or upgrade old packages to minimize the number of dependency conflicts. diff --git a/docs/tasks/user-images/rebuild-postgres-image.qmd b/docs/tasks/user-images/rebuild-postgres-image.qmd new file mode 100644 index 000000000..d0d3be399 --- /dev/null +++ b/docs/tasks/user-images/rebuild-postgres-image.qmd @@ -0,0 +1,25 @@ +--- +title: "Customize the Per-User Postgres Docker Image" +aliases: + - ../admins/howto/rebuild-postgres-image.html + - ../rebuild-postgres-image.html +--- + +We provide each student on `data100` with a PostgreSQL server. We want the [python extension](https://www.postgresql.org/docs/current/plpython.html) installed, so we inherit from the [upstream postgres Docker image](https://hub.docker.com/_/postgres) and add the appropriate package. + +This image is in `images/postgres`. If you update it, you need to rebuild and push it. + +1. Modify the image in `images/postgres` and make a git commit. +2. Run `chartpress --push`. This will build and push the image, *but not put anything in YAML*. There is no place we can put this in `values.yaml`, since this is only used for data100. +3. Note the image name + tag from the `chartpress --push` command, and put it in the appropriate place (under `extraContainers`) in `data100/config/common.yaml`. +4. Make a commit with the new tag in `data100/config/common.yaml`. +5. Proceed to deploy as normal. diff --git a/docs/tasks/user-images/repo2docker-local.qmd b/docs/tasks/user-images/repo2docker-local.qmd new file mode 100644 index 000000000..f74c8cc24 --- /dev/null +++ b/docs/tasks/user-images/repo2docker-local.qmd @@ -0,0 +1,70 @@ +--- +title: Test User Images Locally +aliases: + - ../repo2docker-local.html +--- + +You should use `repo2docker` to build and test the image on your own device before you push and create a PR.
It is often faster to do this first, before using CI/CD, since you can take advantage of local caching and rapid iteration. There's no need to waste GitHub Actions minutes test-building images when you can do this on your own device. + +## Common Usage + +One can simply run `repo2docker /path/to/image/assets`. For example, if you have changed into the directory containing the `repo2docker` files (such as `environment.yml` and/or `Dockerfile`), the command would be: + +```shell +repo2docker . +``` + +This works on Linux and Windows Subsystem for Linux (WSL). It will build the image, then launch a Jupyter server and display a localhost URL. Copy the URL and paste it into a local web browser. + +If you just want to build the image without also running the server, add the `--no-run` argument: + +```shell +repo2docker --no-run . +``` + +## On Apple Silicon + +Apple's ARM-based CPUs (the "M" chips) are different from those running on the virtual machines in our clusters. macOS is capable of emulating x86_64/amd64, but it is necessary to optimize Docker for this emulation, and to explicitly tell your local Docker runtime that the images should be built for the `linux/amd64` platform. + +In Docker's settings: + + - Under **General** > **Virtual Machine Options**, either enable both **Apple Virtualization framework** and **Use Rosetta for x86_64/amd64 emulation on Apple Silicon**, or enable **Docker VMM**. + - Under **Resources** it is also recommended to raise the memory limit to at least 4GB. + +There are two methods for building `linux/amd64` images. The default uses `repo2docker`'s support for `docker-py`, while the second uses a `repo2docker` plugin that can invoke your local docker command-line interface. + +### docker-py (default) + +Run `jupyter-repo2docker` with the following arguments: + +``` +repo2docker \ + --Repo2Docker.platform=linux/amd64 \ + -e PLAYWRIGHT_BROWSERS_PATH=/srv/conda \ + --user-id=1000 --user-name=jovyan \ + --target-repo-dir=/home/jovyan/.cache \ + .
+``` + +where the final parameter is the path to the assets, or `.` if they are in the current directory. + +The `--user-id` and `--user-name` options are for non-Dockerfile based builds. Images with Dockerfiles do not need those options. + +Note that you may see (possibly harmless) architecture mismatch warnings with this method. + +### `docker` CLI + +You can instruct `repo2docker` to use your machine's local `docker` executable directly rather than the default of `docker-py`. You will first need to install [repo2podman](https://github.com/manics/repo2podman), a plugin that lets you use any container runtime with a command-line user interface similar to that of `docker`. This is useful if you want to leverage [docker buildx](https://github.com/docker/buildx/) (for things like multi-stage builds) or if you want to use an alternative executable like `podman`. This also eliminates architecture mismatch warnings. + +::: {.callout-warning} +repo2podman reportedly does not work yet on WSL. +::: + +``` +repo2docker \ + --Repo2Docker.platform=linux/amd64 \ + -e PLAYWRIGHT_BROWSERS_PATH=/srv/conda \ + --engine podman --PodmanEngine.podman_executable=docker \ + . +``` diff --git a/docs/tasks/user-images/transition-image.qmd b/docs/tasks/user-images/transition-image.qmd new file mode 100644 index 000000000..91954cf23 --- /dev/null +++ b/docs/tasks/user-images/transition-image.qmd @@ -0,0 +1,88 @@ +--- +title: Transition Single User Image to GitHub Actions +aliases: + - ../admins/howto/transition-image.html + - ../transition-image.html +--- + +Single user images were originally maintained within the main datahub repo; however, we have since moved them into their own repositories. This makes testing notebooks easier, and lets us delegate write access to course staff if necessary. + +This was the process for transitioning images to their own repositories. + +## Prerequisites + +You will need to install `git-filter-repo`.
+ +```bash +wget -O ~/bin/git-filter-repo https://raw.githubusercontent.com/newren/git-filter-repo/main/git-filter-repo +chmod +x ~/bin/git-filter-repo +``` + +## Create the repository + +1. Go to https://github.com/berkeley-dsep-infra/hub-user-image-template. Click "Use this template" > "Create a new repository". +1. Set the owner to `berkeley-dsep-infra`. Name the image `{hub}-user-image`, or some approximation if there are multiple images per hub. +1. Click create repository. +1. In the new repository, visit Settings > Secrets and variables > Actions > Variables tab. Create new variables: + a. Set HUB to the hub deployment, e.g. `shiny`. + b. Set IMAGE to `ucb-datahub-2018/user-images/{hub}-user-image`, e.g. `ucb-datahub-2018/user-images/shiny-user-image`. +1. Fork the new image repo into your own GitHub account. + +## Preparing working directories + +As part of this process, we will pull the previous image's git history into the new image repo. + +1. Clone the *datahub* repo into a new directory named after the image repo. + ```bash + git clone git@github.com:berkeley-dsep-infra/datahub.git {hub}-user-image --origin source + ``` +1. Change into the directory. +1. Run `git-filter-repo`: + ```bash + git filter-repo --subdirectory-filter deployments/{hub}/image --force + ``` +1. Add new git remotes: + ```bash + git remote add origin git@github.com:{your_git_account}/{hub}-user-image.git + git remote add upstream git@github.com:berkeley-dsep-infra/{hub}-user-image.git + ``` +1. Pull in the contents of the new user image that was created from the template. + ```bash + git fetch upstream + git checkout main # pulls in .github + ``` + +1. Merge the contents of the previous datahub image with the new user image. + ```bash + git rm environment.yml + git commit -m "Remove default environment.yml file."
+ git merge staging --allow-unrelated-histories -m 'Bringing in image directory from deployment repo' + git push upstream main + git push origin main + ``` + +## Preparing continuous integration + +1. In the [berkeley-dsep-infra org settings](https://github.com/organizations/berkeley-dsep-infra/settings/profile), visit [Secrets and variables > Actions](https://github.com/organizations/berkeley-dsep-infra/settings/secrets/actions). Edit the secrets for `DATAHUB_CREATE_PR` and `GAR_SECRET_KEY`, and enable the new repo to access each. + +1. In the datahub repo, in one PR: + a. remove the hub deployment steps for the hub: + - *Deploy {hub}* + - *hubploy/build-image {hub} image build* (x2) + + a. under `deployments/{hub}/hubploy.yaml`, remove the registry entry, and set the `image_name` to have `PLACEHOLDER` for the tag. + + a. In the datahub repo, under the deployment image directory, update the README to point to the new repo. Delete everything else in the image directory. + +1. Merge these changes to datahub staging. + +1. Make a commit to trigger a build of the image in its repo. + +1. In a PR in the datahub repo, under .github/workflows/deploy-hubs.yaml, add the hub with the new image under `determine-hub-deployments.py --only-deploy`. + +1. Make another commit to the image repo to trigger a build. When these jobs finish, a commit will be pushed to the datahub repo. Make a PR, and merge to staging after canceling the CircleCI builds. (these builds are an artifact of the CircleCI-to-GitHub migration -- we won't need to do that long term) + +1. Subscribe the *#ucb-datahubs-bots* channel in UC Tech slack to the repo. 
+   ```bash
+   /github subscribe berkeley-dsep-infra/
+   ```
diff --git a/docs/tasks/user-images/update-image.qmd b/docs/tasks/user-images/update-image.qmd
new file mode 100644
index 000000000..d4ed17c43
--- /dev/null
+++ b/docs/tasks/user-images/update-image.qmd
@@ -0,0 +1,15 @@
+---
+title: Update the Image
+---
+
+Updating a user image involves forking the image's git repository, making changes in your fork, and then making a pull request to the original repository. Here is the general outline:
+
+1. Set up your git and development environment by following [the instructions](https://github.com/berkeley-dsep-infra/hub-user-image-template/blob/main/CONTRIBUTING.md).
+1. Fork the image repository.
+1. Create a new branch for this PR.
+1. Make a change in your fork of the image repo and commit it. This may involve modifying [repo2docker's configuration files](https://repo2docker.readthedocs.io/en/latest/configuration/index.html). We typically prefer using `conda` packages, and `pip` only if necessary. Please pin to a specific version (no wildcards, etc.). Note that package versions for `conda` are specified using `=`, while in `pip` they are specified using `==`.
+1. [Test the changes locally](repo2docker-local.qmd) using `repo2docker`.
+1. Create a [pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests) to the repo in `berkeley-dsep-infra`. This will trigger a GitHub Actions workflow that tests whether the image builds successfully.
+1. If the build succeeds, someone with sufficient access (DataHub staff, or course staff with elevated privileges) can merge the PR. This will trigger another workflow that will build and push the image to the image registry. You can check on the progress of this workflow in your image repo's `Actions` tab.
+1. When that process succeeds, another PR will be created for you in the berkeley-dsep-infra/datahub repo.
This will configure the infrastructure to deploy the image's new tag. +1. Once the PR is reviewed and merged, a GitHub Action workflow will deploy the new image to the staging instance of the hub. You can watch the progress [here](https://github.com/berkeley-dsep-infra/datahub/actions/workflows/deploy-hubs.yaml). From c3deb700cd67e13847b671f67961d6d514077f5f Mon Sep 17 00:00:00 2001 From: Ryan Lovett Date: Thu, 16 Jan 2025 14:08:57 -0800 Subject: [PATCH 2/2] Don't keep duplicates. --- .../managing-multiple-user-image-repos.qmd | 369 ------------------ docs/tasks/new-image.qmd | 156 -------- docs/tasks/new-packages.qmd | 133 ------- docs/tasks/rebuild-postgres-image.qmd | 24 -- docs/tasks/repo2docker-local.qmd | 68 ---- docs/tasks/transition-image.qmd | 98 ----- 6 files changed, 848 deletions(-) delete mode 100644 docs/tasks/managing-multiple-user-image-repos.qmd delete mode 100644 docs/tasks/new-image.qmd delete mode 100644 docs/tasks/new-packages.qmd delete mode 100644 docs/tasks/rebuild-postgres-image.qmd delete mode 100644 docs/tasks/repo2docker-local.qmd delete mode 100644 docs/tasks/transition-image.qmd diff --git a/docs/tasks/managing-multiple-user-image-repos.qmd b/docs/tasks/managing-multiple-user-image-repos.qmd deleted file mode 100644 index b801155bd..000000000 --- a/docs/tasks/managing-multiple-user-image-repos.qmd +++ /dev/null @@ -1,369 +0,0 @@ ---- -title: Managing multiple user image repos -aliases: - - ../admins/howto/managing-multiple-user-image-repos.html ---- - -## Managing user image repos - -Since we have many multiples of user images in their own repos, managing these -can become burdensome... Particularly if you need to make changes to many or -all of the images. - -For this, we have a tool named [manage-repos](https://github.com/berkeley-dsep-infra/manage-repos). 
- -`manage-repos` uses a config file with a list of all of the git remotes for the -image repos ([repos.txt](https://github.com/berkeley-dsep-infra/datahub/blob/staging/scripts/user-image-management/repos.txt)) -and will allow you to perform basic git operations (sync/rebase, clone, branch -management and pushing). - -The script "assumes" that you have all of your user images in their own -sub-folder (in my case, `$HOME/src/images/...`). - -## Installation of instructions - -### Via cloning and manual installation - -Clone [the repo](https://github.com/berkeley-dsep-infra/manage-repos), and from -within that directory run: - -``` -pip install --editable . -``` - -The `--editable` flag is optional, and allows you to hack on the tool and have -those changes usable without reinstalling or needing to hack your `PATH`. - -### Via `pip` - -``` -python3 -m pip install --no-cache git+https://github.com/berkeley-dsep-infra/manage-repos -``` - -### Installing the `gh` tool - -To use the `pr` and `merge` sub-commands, you will also need to install the -Github CLI tool: https://github.com/cli/cli#installation - -## Usage - -### Overview of git operations included in `manage-repos`: - -`manage-repos` allows you to perform basic `git` operations on a large number -of similar repositories: - -* `branch`: Create a feature branch -* `clone`: Clone all repositories in the config file to a location on the - filesystem specified by the `--destination` argument. -* `merge`: Merge the most recent pull request in the managed repositories. -* `patch`: Apply a [git patch](https://git-scm.com/docs/git-apply) to all - repositories in the config file. -* `pr`: Create pull requests in the managed repositories. -* `push`: Push a branch from all repos to a remote. The remote defaults to - `origin`. -* `stage`: Performs a `git add` and `git commit` to stage changes before - pushing. -* `sync`: Sync all of the repositories, and optionally push to your fork. 
- -### Usage overview -The following sections will describe in more detail the options and commands -available with the script. - -#### Primary arguments for the script -``` -$ manage-repos.py --help -usage: manage-repos [-h] [-c CONFIG] [-d DESTINATION] {branch,clone,patch,push,stage,sync} ... - -positional arguments: - {branch,clone,patch,push,stage,sync} - Command to execute. Additional help is available for each command. - -options: - -h, --help show this help message and exit - -c CONFIG, --config CONFIG - Path to the file containing list of repositories to operate on. Defaults to repos.txt located in the current working - directory. - -d DESTINATION, --destination DESTINATION - Location on the filesystem of the directory containing the managed repositories. Defaults to the current working directory. - --version show program's version number and exit -``` - -`--config` is required, and setting `--destination` is recommended. - -### Sub-commands - -#### `branch` - -``` -$ manage-repos branch --help -usage: manage-repos branch [-h] [-b BRANCH] - -options: - -h, --help show this help message and exit - -b BRANCH, --branch BRANCH - Name of the new feature branch to create. -``` - -The feature branch to create is required, and the tool will switch to `main` -before creating and switching to the new branch. - -#### `clone` - -``` -$ manage-repos.py clone --help -usage: manage-repos clone [-h] [-s [SET_REMOTE]] [-g GITHUB_USER] - -Clone repositories in the config file and optionally set a remote for a fork. -If a repository sub-directory does not exist, it will be created. - -options: - -h, --help show this help message and exit - -s [SET_REMOTE], --set-remote [SET_REMOTE] - Set the user's GitHub fork as a remote. Defaults to 'origin'. - -g GITHUB_USER, --github-user GITHUB_USER - The GitHub username of the fork to set in the remote. - Required if --set-remote is used. 
-``` - -This command will clone all repositories found in the config, and if you've -created a fork, use the `--set-remote` and `--github-user` arguments to update -the remotes in the cloned repositories. This will set the primary repository's -remote to `upstream` and your fork to `origin` (unless you override this by -passing a different remote name with the `--set-remote` argument). - -After cloning, `git remote -v` will be executed for each repository to allow -you to confirm that the remotes are properly set. - -#### `merge` - -``` -$ usage: manage-repos merge [-h] [-b BODY] [-d] [-s {merge,rebase,squash}] - -Using the gh tool, merge the most recent pull request in the managed -repositories. Before using this command, you must authenticate with gh to -ensure that you have the correct permission for the required scopes. - -options: - -h, --help show this help message and exit - -b BODY, --body BODY The commit message to apply to the merge (optional). - -d, --delete Delete your local feature branch after the pull request - is merged (optional). - -s {merge,rebase,squash}, --strategy {merge,rebase,squash} - The pull request merge strategy to use, defaults to - 'merge'. -``` - -Be aware that the default behavior is to merge only the newest pull request in -the managed repositories. The reasoning behind this is that if you have created -pull requests across many repositories, the pull request numbers will almost -certainly be different, and adding interactive steps to merge specific pull -requests will be cumbersome. - -#### `patch` - -``` -$ manage-repos patch --help -usage: manage-repos patch [-h] [-p PATCH] - -Apply a git patch to managed repositories. - -options: - -h, --help show this help message and exit - -p PATCH, --patch PATCH - Path to the patch file to apply. -``` - -This command applies a git patch file to all of the repositories. 
The patch is -created by making changes to one file, and redirecting the output of `git diff` -to a new file, eg: - -``` -git diff > patchfile.txt -``` - -You then provide the location of the patch file with the `--patch` argument, -and the script will attempt to apply the patch to all of the repositories. - -If it is unable to apply the patch, the script will continue to run and notify -you when complete which repositories failed to accept the patch. - -#### `pr` -``` -$ manage-repos pr --help -usage: manage-repos pr [-h] [-t TITLE] [-b BODY] [-B BRANCH_DEFAULT] - [-g GITHUB_USER] - -Using the gh tool, create a pull request after pushing. - -options: - -h, --help show this help message and exit - -t TITLE, --title TITLE - Title of the pull request. - -b BODY, --body BODY Body of the pull request (optional). - -B BRANCH_DEFAULT, --branch-default BRANCH_DEFAULT - Default remote branch that the pull requests will be - merged to. This is optional and defaults to 'main'. - -g GITHUB_USER, --github-user GITHUB_USER - The GitHub username used to create the pull request. -``` - -After you've `stage`d and `push`ed your changes, this command will then create -a pull request using the `gh` tool. - -#### `push` - -``` -$ manage-repos push --help -usage: manage-repos push [-h] [-b BRANCH] [-r REMOTE] - -Push managed repositories to a remote. - -options: - -h, --help show this help message and exit - -b BRANCH, --branch BRANCH - Name of the branch to push. - -r REMOTE, --remote REMOTE - Name of the remote to push to. This is optional and - defaults to 'origin'. -``` - -This command will attempt to push all staged commits to a remote. The -`--branch` argument is required, and needs to be the name of the feature -branch that will be pushed. - -The remote that is pushed to defaults to `origin`, but you can override this -with the `--remote` argument. 
- -#### `stage` - -``` -$ manage-repos stage --help -usage: manage-repos stage [-h] [-f FILES [FILES ...]] [-m MESSAGE] - -Stage changes in managed repositories. This performs a git add and commit. - -options: - -h, --help show this help message and exit - -f FILES [FILES ...], --files FILES [FILES ...] - Space-delimited list of files to stage in the - repositories. Optional, and if left blank will default - to all modified files in the directory. - -m MESSAGE, --message MESSAGE - Commit message to use for the changes. -``` - -`stage` combines both `git add ...` and `git commit -m`, adding and committing -one or more files to the staging area before you push to a remote. - -The commit message must be a text string enclosed in quotes. - -By default, `--files` is set to `.`, which will add all modified files to the -staging area. You can also specify any number of files, separated by a space. - -#### `sync` - -``` -$ manage-image-repos.py sync --help -usage: manage-repos sync [-h] [-b BRANCH_DEFAULT] [-u UPSTREAM] [-p] - [-r REMOTE] - -Sync managed repositories to the latest version using 'git rebase'. Optionally -push to a remote fork. - -options: - -h, --help show this help message and exit - -b BRANCH_DEFAULT, --branch-default BRANCH_DEFAULT - Default remote branch to sync to. This is optional and - defaults to 'main'. - -u UPSTREAM, --upstream UPSTREAM - Name of the parent remote to sync from. This is - optional and defaults to 'upstream'. - -p, --push Push the locally synced repo to a remote fork. - -r REMOTE, --remote REMOTE - The name of the remote fork to push to. This is - optional and defaults to 'origin'. -``` - -This command will switch your local repositories to the `main` branch, and sync -all repositories from the config to your device from a remote. With the -`--push` argument, it will push the local repository to another remote. 
- -By default, the script will switch to the `main` branch before syncing, and can -be overridden with the `--branch-default` argument. - -The primary remote that is used to sync is `upstream`, but that can also be -overridden with the `--upstream` argument. The remote for a fork defaults to -`origin`, and can be overridden via the `--remote` argument. - - -### Tips, tricks and usage examples - -#### Tips and tricks - -`manage-repos` is best run from the parent folder that will contain all of the -repositories that you will be managing as the default value of `--destination` -is the current working directory (`.`). - -You can also create a symlink in the parent folder that points to the config -file elsewhere on your filesystem: - -``` -ln -s /scripts/user-image-management/repos.txt repos.txt -``` - -With this in mind, you can safely drop the `--config` and `--destination` -arguments when running `manage-repos`. Eg: - -``` -manage-repos sync -p -``` - -Another tip is to comment out or delete entries in your config when performing -git operations on a limited set of repositories. Be sure to `git restore` the -file when you're done! 
- -#### Usage examples - -Clone all of the image repos to a common directory: - -``` -manage-repos --destination ~/src/images/ --config /path/to/repos.txt clone -``` - -Clone all repos, and set `upstream` and `origin` for your fork: - -``` -manage-repos -d ~/src/images/ -c /path/to/repos.txt clone --set-remote --github-user -``` - -Sync all repos from `upstream` and push to your `origin`: - -``` -manage-repos -d ~/src/images/ -c /path/to/repos.txt sync --push -``` - -Create a feature branch in all of the repos: - -``` -manage-repos -d ~/src/images -c /path/to/repos.txt branch -b test-branch -``` - -Create a git patch and apply it to all image repos: - -``` -git diff envorinment.yml > /tmp/git-patch.txt -manage-repos -d ~/src/images -c /path/to/repos.txt patch -p /tmp/git-patch.txt -``` - -Once you've tested everything and are ready to push and create a PR, add and -commit all modified files in the repositories: - -``` -manage-repos -d ~/src/images -c /path/to/repos.txt stage -m "this is a commit" -``` - -After staging, push everything to a remote: - -``` -manage-repos -d ~/src/images -c /path/to/repos.txt push -b test-branch -``` diff --git a/docs/tasks/new-image.qmd b/docs/tasks/new-image.qmd deleted file mode 100644 index b0302cbcf..000000000 --- a/docs/tasks/new-image.qmd +++ /dev/null @@ -1,156 +0,0 @@ ---- -title: Create a New Single User Image -aliases: - - ../admins/howto/new-image.html ---- - -You might need to create a new user image when deploying a new hub, or changing -from a shared single user server image. We use -[repo2docker](https://github.com/jupyterhub/repo2docker) to generate our images. - -There are two approaches to creating a repo2docker image: - -1. Use a repo2docker-style image [template](https://github.com/berkeley-dsep-infra/datahub/tree/staging/deployments/data100/image) (environment.yaml, etc) -2. 
Use a [Dockerfile](https://github.com/berkeley-dsep-infra/datahub/tree/staging/deployments/datahub/images/default) (useful for larger/more complex images) - -Generally, we prefer to use the former approach, unless we need to -install specific packages or utilities outside of python/apt as `root`. -If that is the case, only a `Dockerfile` format will work. - -As always, create a feature branch for your changes, and submit a PR when done. - -There are two approaches to pre-populate the image's assets: - - - Use an existing image as a template. - Browse through our [image -repos](https://github.com/orgs/berkeley-dsep-infra/repositories?language=&q=image&sort=&type=all) - to find a hub that is similar to the one you are trying to create. This will - give you a good starting point. - - - Fork [hub-user-image-template](https://github.com/berkeley-dsep-infra/hub-user-image-template). Click "Use this template" > "Create a new repository". - Be sure to follow convention and name the repo `-user-image`, and - the owner needs to be `berkeley-dsep-infra`. When that is done, create your - own fork of the new repo. - -### Image Repository Settings - -There are now a few steps to set up the CI/CD for the new image repo. In the -`berkeley-dsep-infra` image repo, click on `Settings`, and under `General`, -scroll down to `Pull Requests` and check the box labeled `Automatically delete -head branches`. - -Scroll back up to the top of the settings, and in the left menu bar, click on -`Secrets and variables`, and then `Actions`. - -From there, click on the `Variables` tab and then `New repository variable`. We -will be adding two new variables: - -1. `HUB`: the name of the hub (eg: datahub) - -1. `IMAGE`: the Google Artifact Registry path and image name. The path will -always be `ucb-datahub-2018/user-images/` and the -image name will always be the same as the repo: `-user-image`. 
- -### Your Fork's Repository Settings - -Now you will want to disable Github Actions for your fork of the image repo. -If you don't, whenever you push PRs to the root repo the workflows *in your -fork* will attempt to run, but don't have the proper permissions to -successfully complete. This will then send you a nag email about a workflow -failure. - -To disable this for your fork, click on `Settings`, `Actions` and `General`. -Check the `Disable actions` box and click save. - -### Enable Artifact Registry Pushing - -The image repository needs to be added to the list of allowed repositories in -the `berkeley-dsep-infra` secrets. Go to the `berkeley-dsep-infra` [Secrets and -Variables](https://github.com/organizations/berkeley-dsep-infra/settings/secrets/actions). -Give your repository permissions to push to the Artifact Registry, -as well as to push a branch to the [datahub repo](https://github.com/berkeley-dsep-infra/datahub). - -Edit both `DATAHUB_CREATE_PR` and `GAR_SECRET_KEY`, and click on the gear icon, -search for your repo name, check the box and save. - -### Configure `hubploy` - -You need to let `hubploy` know the specifics of the image by updating your -deployment's `hubploy.yaml`. Change the `name` of the image in -`deployments//hubploy.yaml` to point to your new image name, and after -the name add `:PLACEHOLDER` in place of the image sha. This will be -automatically updated after your new image is built and pushed to the Artifact -Registry. - -Example: - -```yaml -images: - images: - - name: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/fancynewhub-user-image:PLACEHOLDER - -cluster: - provider: gcloud - gcloud: - project: ucb-datahub-2018 - service_key: gke-key.json - cluster: spring-2024 - zone: us-central1 -``` - -Next, add the ssh clone path of the root image repo to [repos.txt](https://github.com/berkeley-dsep-infra/datahub/blob/staging/scripts/user-image-management/repos.txt). - -Create a PR and merge to staging. 
You can cancel the -[`Deploy staging and prod hubs` job in Actions](https://github.com/berkeley-dsep-infra/datahub/actions/workflows/deploy-hubs.yaml), -or just let it fail. - -## Subscribe to GitHub Repo in Slack - -Go to the #ucb-datahubs-bots channel, and run the following command: - -``` -/github subscribe berkeley-dsep-infra/ -``` - -## Modify the Image - -This step is straightforward: create a feature branch, and edit, delete, or add -any files to configure the image as needed. - -We also strongly recommend copying `README-template.md` over the default -`README.md`, and modifying it to replace all occurrences of `` with -the name of your image. - -## Submit Pull Requests - -Familiarize yourself with [pull -requests](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests) -and [repo2docker](https://github.com/jupyter/repo2docker), and create a fork of -the [datahub staging branch](https://github.com/berkeley-dsep-infra/datahub). - -1. Set up your git/dev environment by following the [image templat's -contributing - guide](https://github.com/berkeley-dsep-infra/hub-user-image-template/blob/main/CONTRIBUTING.md). - -1. [Test the image locally](repo2docker-local.qmd) using `repo2docker`. -1. Submit a PR to `staging`. -1. Commit and push your changes to your fork of the image repo, and - create a new pull request at - https://github.com/berkeley-dsep-infra/. - -1. After the build passes, merge your PR in to `main` and the image will - be built again and pushed to the Artifact Registry. If that succeeds, - then a commit will be crafted that will update the `PLACEHOLDER` field in - `hubploy.yaml` with the image's SHA and pushed to the datahub repo. - You can check on the progress of this workflow in your root image repo's - `Actions` tab. - -1. After the previous step is completed successfully, go to the Datahub repo - and click on the [New pull - request](https://github.com/berkeley-dsep-infra/datahub/compare) - button. 
Next, click on the `compare: staging` drop down, and you should see - a branch named something like `update--image-tag-`. Select - that, and create a new pull request. - -1. Once the checks has passed, merge to `staging` and your new image will be - deployed! You can watch the progress in the [deploy-hubs workflow](https://github.com/berkeley-dsep-infra/datahub/actions/workflows/deploy-hubs.yaml). diff --git a/docs/tasks/new-packages.qmd b/docs/tasks/new-packages.qmd deleted file mode 100644 index cec2f4b68..000000000 --- a/docs/tasks/new-packages.qmd +++ /dev/null @@ -1,133 +0,0 @@ ---- -title: Testing and Upgrading New Packages -aliases: - - ../admins/howto/new-packages.html ---- - -It is helpful to test package additions and upgrades for yourself before -they are installed for all users. You can make sure the change behaves -as you think it should, and does not break anything else. Once tested, -request that the change by installed for all users by by [creating a new -issue in -github](https://github.com/berkeley-dsep-infra/datahub/issues),contacting -cirriculum support staff, or creating a new pull request. Ultimately, -thoroughly testing changes locally and submitting a pull request will -result in the software being rolled out to everyone much faster. - -Install a python package in your notebook -================================== - -When testing a notebook with new version of the package, add the -following line to a cell at the beginning of your notebook. - -``` bash -!pip install --upgrade packagename==version -``` - -You can then execute this cell every time you run the notebook. This -will ensure you have the version you think you have when running your -code. - -To avoid complicated errors, make sure you always specify a version. You -can find the latest version by searching on -[pypi.org](https://pypi.org). 
- -Find current version of a python package -=============================== - -To find the current version of a particular installed package, you can -run the following in a notebook. - -``` bash -!pip list | grep -``` - -This should show you the particular package you are interested in and -its current version. - -Install/Update a R package in your RStudio -================================== - -When the required version of package is missing in the R Studio, Try the -following command to check whether the default installation repo -contains the package (and the version) required. - -``` R -install.packages("packagename") -``` - -This should install the particular package you are interested in and its -latest version. You can find the latest version of a R package by -searching on [CRAN](https://cran.r-project.org/). - -Find current version of a R package =============================== - -To find the current version of a particular installed package, you can -run the following in RStudio. - -``` R -packageVersion("") -``` - -This should show you the particular package you are interested in and -its current version. - -## Submitting a pull request - -Familiarize yourself with [pull -requests](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests) -and [repo2docker](https://github.com/jupyter/repo2docker) , and create a -fork of the the image repo. - -1. Set up your git/dev environment by [following the instructions - here](https://github.com/berkeley-dsep-infra/hub-user-image-template/blob/main/CONTRIBUTING.md). - -2. Create a new branch for this PR. - -3. Find the correct `environment.yml`{.interpreted-text role="file"} - file for your class. This should be in the root of the image repo. - -4. In `environment.yml`{.interpreted-text role="file"}, packages listed - under `dependencies` are installed using `conda`, while packages - under `pip` are installed using `pip`. 
Any packages that need to be - installed via `apt` must be added to either - `apt.txt` or - `Dockerfile`. - -5. Add any packages necessary. We typically prefer using `conda` packages, and `pip` only if necessary. Please pin to a specific version (no wildards, etc). - - - Note that package versions for `conda` are specified using - `=`, while in `pip` they are specified using `==` - -6. Test the changes locally using `repo2docker`, then submit a PR to `main`. - - - To use `repo2docker`, be sure that you are inside the image - repo directory on your device, and then run `repo2docker .`. - -7. Commit and push your changes to your fork of the image repo, and - create a new pull request at - https://github.com/berkeley-dsep-infra/``. - -8. After the build passes, merge your PR in to `main` and the image will - be built again and pushed to the Artifact Registry. If that succeeds, - then a commit will be crafted that will update the `PLACEHOLDER` field in - `hubploy.yaml` with the image's SHA and pushed to the datahub repo. - You can check on the progress of this workflow in your root image repo's - `Actions` tab. - -9. After 4 is completed successfully, go to the Datahub repo and click on - the [New pull request](https://github.com/berkeley-dsep-infra/datahub/compare) - button. Next, click on the `compare: staging` drop down, and you should see - a branch named something like `update--image-tag-`. Select that, - and create a new pull request. - -10. Once the checks has passed, merge to `staging` and your new image will be - deployed! You can watch the progress [here](https://github.com/berkeley-dsep-infra/datahub/actions/workflows/deploy-hubs.yaml). - -## Tips for Upgrading Package - -- Conda can take an extremely long time to resolve version dependency - conflicts, if they are resolvable at all. 
When upgrading Python - versions or a core package that is used by many other packages, such - as [requests]{.title-ref}, clean out or upgrade old packages to - minimize the number of dependency conflicts. diff --git a/docs/tasks/rebuild-postgres-image.qmd b/docs/tasks/rebuild-postgres-image.qmd deleted file mode 100644 index 07f8484a9..000000000 --- a/docs/tasks/rebuild-postgres-image.qmd +++ /dev/null @@ -1,24 +0,0 @@ ---- -title: "Customize the Per-User Postgres Docker Image" -aliases: - - ../admins/howto/rebuild-postgres-image.html ---- - -We provide each student on `data100` with a postgresql server. We want the -[python extension](https://www.postgresql.org/docs/current/plpython.html) -installed. So we inherit from the [upstream postgresql docker -image](https://hub.docker.com/_/postgres), and add the appropriate -package. - -This image is in `images/postgres`. If you update it, you need to -rebuild and push it. - -1. Modify the image in `images/postgres` and make a git commit. -2. Run `chartpress --push`. This will build and push the image, *but - not put anything in YAML*. There is no place we can put this in - `values.yaml`, since this is only used for data100. -3. Notice the image name + tag from the `chartpress --push` command, - and put it in the appropriate place (under `extraContainers`) in - `data100/config/common.yaml`. -4. Make a commit with the new tag in `data100/config/common.yaml`. -5. Proceed to deploy as normal. diff --git a/docs/tasks/repo2docker-local.qmd b/docs/tasks/repo2docker-local.qmd deleted file mode 100644 index 27b59a0bc..000000000 --- a/docs/tasks/repo2docker-local.qmd +++ /dev/null @@ -1,68 +0,0 @@ ---- -title: Test User Images Locally ---- - -You should use `repo2docker` to build and test the image on your own device before you push and create a PR. It is often faster to do this first before using CI/CD since you can take advantage of local caching and rapid iteration. 
There's no need to waste Github Action minutes to test build images when you can do this on your own device. - -## Common Usage - -One can simply run `repo2docker /path/to/image/assets`. For example if one has changed into the directory containing the `repo2docker` files (such as `environment.yml` and/or `Dockerfile`), the command would be: - -```shell -repo2docker . -``` - -This works on Linux and Windows Subsystem for Linux (WSL). It will build the image, then launch jupyter server and display a localhost URL. Copy the URL and paste it into a local web browser. - -If you just want to build the image without also running the server, -add the `--no-run` argument: - -```shell -repo2docker --no-run . -``` - -## On Apple Silicon - -Apple's ARM-based CPUs (the "M" chips) are different from those run on the virtual machines in our clusters. macOS is capable of emulating x86_64/amd64, but it is necessary to optimize docker for this emulation, and to explicitly tell your local docker runtime that the images should be built on the `linux/amd64` platform. - -In Docker's settings: - - - Under **General** > **Virtual Machine Options**, either enable both **Apple Virtualization framework** and **Use Rosetta for x86_64/amd64 emulation on Apple Silicon**, or enable **Docker VMM**. - - Under **Resources** it is also recommended to raise the memory limit to at least 4GB. - -There are two methods for building `linux/amd64` images. The default uses `repo2docker`'s support for `docker-py`, while the second uses a `repo2docker` plugin that can invoke your local docker command-line interface. - -### docker-py (default) - -Run `jupyter-repo2docker` with the following arguments: - -``` -repo2docker \ - --Repo2Docker.platform=linux/amd64 \ - -e PLAYWRIGHT_BROWSERS_PATH=/srv/conda \ - --user-id=1000 --user-name=jovyan \ - --target-repo-dir=/home/jovyan/.cache \ - . -``` - -where the final parameter is the path to the assets or `.` if they are in the current directory. 
-
-The `--user-id` and `--user-name` options are for non-Dockerfile based builds. Images with Dockerfiles do not need those options.
-
-Note that you may see (possibly harmless) architecture mismatch warnings with this method.
-
-### `docker` CLI
-
-You can instruct `repo2docker` to use your machine's local `docker` executable directly rather than the default of `docker-py`. You will first need to install [repo2podman](https://github.com/manics/repo2podman), a plugin that lets you use any container runtime with a command-line user interface similar to that of `docker`. This is useful if you want to leverage [docker buildx](https://github.com/docker/buildx/) (for things like multi-stage builds) or if you want to use an alternative executable like `podman`. This also eliminates architecture mismatch warnings.
-
-::: {.callout-warning}
-repo2podman reportedly does not work yet on WSL.
-:::
-
-```
-repo2docker \
-  --Repo2Docker.platform=linux/amd64 \
-  -e PLAYWRIGHT_BROWSERS_PATH=/srv/conda \
-  --engine podman --PodmanEngine.podman_executable=docker \
-  .
-```
diff --git a/docs/tasks/transition-image.qmd b/docs/tasks/transition-image.qmd
deleted file mode 100644
index b2ba0bb5e..000000000
--- a/docs/tasks/transition-image.qmd
+++ /dev/null
@@ -1,98 +0,0 @@
----
-title: Transition Single User Image to GitHub Actions
-aliases:
-  - ../admins/howto/transition-image.html
----
-
-Single user images have been maintained within the main datahub repo since its inception, however we decided to move them into their own repositories. It will make testing notebooks easier, and we will be able to delegate write access to course staff if necessary.
-
-This is the process for transitioning images to their own repositories. Eventually, once all repositories have been migrated, we can update our documentation on creating new single user image repositories, and maintaining them.
-
-
-## Prerequisites
-
-You will need to install `git-filter-repo`.
-
-```bash
-wget -O ~/bin/git-filter-repo https://raw.githubusercontent.com/newren/git-filter-repo/main/git-filter-repo
-chmod +x ~/bin/git-filter-repo
-```
-
-## Create the repository
-
-1. Go to https://github.com/berkeley-dsep-infra/hub-user-image-template. Click "Use this template" > "Create a new repository".
-1. Set the owner to `berkeley-dsep-infra`. Name the image `{hub}-user-image`, or some approximation if there are multiple images per hub.
-1. Click create repository.
-1. In the new repository, visit Settings > Secrets and variables > Actions > Variables tab. Create new variables:
-   a. Set HUB to the hub deployment, e.g. `shiny`.
-   b. Set IMAGE to `ucb-datahub-2018/user-images/{hub}-user-image`, e.g. `ucb-datahub-2018/user-images/shiny-user-image`.
-1. Fork the new image repo into your own github account.
-
-## Preparing working directories
-
-As part of this process, we will pull the previous image's git history into the new image repo.
-
-1. Clone the *datahub* repo into a new directory named after the image repo.
-   ```bash
-   git clone git@github.com:berkeley-dsep-infra/datahub.git {hub}-user-image --origin source
-   ```
-1. Change into the directory.
-1. Run `git-filter-repo`:
-   ```bash
-   git filter-repo --subdirectory-filter deployments/{hub}/image --force
-   ```
-1. Add new git remotes:
-   ```bash
-   git remote add origin git@github.com:{your_git_account}/{hub}-user-image.git
-   git remote add upstream git@github.com:berkeley-dsep-infra/{hub}-user-image.git
-   ```
-1. Pull in the contents of the new user image that was created from the template.
-   ```bash
-   git fetch upstream
-   git checkout main # pulls in .github
-   ```
-
-1. Merge the contents of the previous datahub image with the new user image.
-   ```bash
-   git rm environment.yml
-   git commit -m "Remove default environment.yml file."
-   git merge staging --allow-unrelated-histories -m 'Bringing in image directory from deployment repo'
-   git push upstream main
-   git push origin main
-   ```
-
-## Preparing continuous integration
-
-1. In the [berkeley-dsep-infra org settings](https://github.com/organizations/berkeley-dsep-infra/settings/profile), visit [Secrets and variables > Actions](https://github.com/organizations/berkeley-dsep-infra/settings/secrets/actions). Edit the secrets for `DATAHUB_CREATE_PR` and `GAR_SECRET_KEY`, and enable the new repo to access each.
-
-1. In the datahub repo, in one PR:
-   a. remove the hub deployment steps for the hub:
-      - *Deploy {hub}*
-      - *hubploy/build-image {hub} image build* (x2)
-
-   a. under `deployments/{hub}/hubploy.yaml`, remove the registry entry, and set the `image_name` to have `PLACEHOLDER` for the tag.
-
-   a. In the datahub repo, under the deployment image directory, update the README to point to the new repo. Delete everything else in the image directory.
-
-1. Merge these changes to datahub staging.
-
-1. Make a commit to trigger a build of the image in its repo.
-
-1. In a PR in the datahub repo, under .github/workflows/deploy-hubs.yaml, add the hub with the new image under `determine-hub-deployments.py --only-deploy`.
-
-1. Make another commit to the image repo to trigger a build. When these jobs finish, a commit will be pushed to the datahub repo. Make a PR, and merge to staging after canceling the CircleCI builds. (these builds are an artifact of the CircleCI-to-GitHub migration -- we won't need to do that long term)
-
-1. Subscribe the *#ucb-datahubs-bots* channel in UC Tech slack to the repo.
-   ```bash
-   /github subscribe berkeley-dsep-infra/
-   ```
-
-## Making changes
-
-Once the image repo is set up, you will need to follow this procedure to update it and make it available to the hub.
-
-1. Make a change in your fork of the image repo.
-1. Make a pull request to the repo in `berkeley-dsep-infra`.
This will trigger a github action that will test to see if the image builds successfully.
-1. If the build succeeds, someone with sufficient access (DataHub staff, or course staff with elevated privileges) can merge the PR. This will trigger another build, and will then push the image to the image registry.
-1. In order for the newly built and pushed image to be referenced by datahub, you will need to make a PR at datahub. Visit the previous merge action's *update-deployment-image-tag* entry and expand the *Create feature branch, add, commit and push changes* step. Find the URL beneath, *Create a pull request for 'update-{hub}-image-tag-{slug}*, and visit it. This will draft a new PR at datahub for you to create.
-1. Once the PR is submitted, an action will run. It is okay if CircleCI-related tasks fail here. Merge the PR into staging once the action is complete.
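
Reviewer note: the "Making changes" flow in the file above is ordinary fork-and-branch git usage. A minimal local sketch of steps 1–2, using a hypothetical throwaway repo in place of a real `{hub}-user-image` fork (the repo name, file contents, and branch name here are illustrative, not from the docs):

```shell
# Illustrative dry run: a scratch repo stands in for your fork of {hub}-user-image.
set -eu
cd "$(mktemp -d)"
git init -q demo-user-image
cd demo-user-image
git config user.email "demo@example.com"
git config user.name "Demo User"

# Baseline image definition, as if freshly cloned.
printf 'dependencies:\n  - python=3.11\n' > environment.yml
git add environment.yml
git commit -qm "Initial image definition"

# 1. Make a change on a feature branch (in a real repo: in your fork)...
git checkout -qb update-packages
printf '  - numpy\n' >> environment.yml
git add environment.yml
git commit -qm "Add numpy to the image"

# 2. ...then push the branch and open a PR against berkeley-dsep-infra,
#    which kicks off the test-build action described above. With a real
#    fork this would be: git push origin update-packages
git log --oneline
```

The build, merge, and image-tag PR steps that follow happen in GitHub Actions and have no local equivalent.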