Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supplemental development process for Docker #1319

Merged
merged 9 commits into from
Aug 24, 2022

Conversation

benjaoming
Copy link
Contributor

Updated the approach from #1312

  • Docker environment no longer used in CircleCI
  • Added support for using npm run dev by proxying commands to make docker-run command=<npm-command>
  • Documentation updated
  • Found an approach to install Python package in editable mode without writing to the external environment (that's the intention!)

@benjaoming benjaoming requested a review from a team as a code owner August 19, 2022 12:57
@benjaoming benjaoming marked this pull request as draft August 19, 2022 13:15
@benjaoming benjaoming marked this pull request as ready for review August 19, 2022 13:29
@benjaoming benjaoming added this to the 1.1 milestone Aug 19, 2022
@agjohnson
Copy link
Collaborator

So, similar to the site-community change to the workflow, isn't this what you'd use docker compose for? Seems this is reproducing much of what that gives us already -- defined volumes, ports, image building.

docs/contributing.rst Outdated Show resolved Hide resolved
@benjaoming
Copy link
Contributor Author

So, similar to the site-community change to the workflow, isn't this what you'd use docker compose for? Seems this is reproducing much of what that gives us already -- defined volumes, ports, image building.

Yes, I have noted the choice several places: Either you have docker-compose as a dependency or you write a rather long docker run command. I was happy about this one in particular because I didn't have to add docker-compose in CI (when I was doing that). But it also saves at least a line of text in the instructions for setting this up :)

@agjohnson
Copy link
Collaborator

Aye. We're already using docker compose in other repos, so seems that should be safe to include as a dependency here too. A homebrew approach is almost certainly bound to cause more problems than introducing docker compose would cause, and docker compose is way more approachable.

I still don't think we want to go as far as to worry about docker in CI, as we aren't using that pattern in any of our repos.

@benjaoming
Copy link
Contributor Author

Aye. We're already using docker compose in other repos

I disagree. This repo is much more public-facing.

@benjaoming
Copy link
Contributor Author

Anyways, this works, and since you're not using Docker in your development environment and this is just a supplemental change, I would like to conclude it without adding docker-compose. I'm totally open to adding it later 👍 But perhaps we can wait and see if there's a CI integration coming up as well 🔮

@agjohnson
Copy link
Collaborator

agjohnson commented Aug 19, 2022

I disagree. This repo is much more public-facing.

Regardless of who is using it, I don't think asking for docker-compose is that high of a barrier. I would say it's a much lower barrier, as I can work with docker compose, but would not feel comfortable mucking around with a homebrew replacement for docker compose. I'm sure other contributors will feel the same.

Dockerfile Outdated Show resolved Hide resolved

docker-copy-assets:
docker cp "$(shell cat .container_id):/project/sphinx_rtd_theme" .
docker cp "$(shell cat .container_id):/project/package-lock.json" .
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And discussed elsewhere already, but just noting here, this homebrew configuration seems to overlap with what docker-compose provides. I'd put some preference on using standard tooling, as it's more approachable.

@humitos would have the most input/opinions here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know anything about homebrew, it's a MacOS thing right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of docker cp is to avoid a write operation on a bind mount because file system permissions are usually troublesome, so it can be better to be explicit about what you expect to get from the docker runtime.

I'm already fixing this on your request @agjohnson

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now removed, the docker-compose command mounts the current git repo and builds inside that one.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know anything about homebrew, it's a MacOS thing right?

I meant this in the sense that this is a homebrewed process -- it is a manual, one-off creation, as opposed to a standardized, repeatable process using docker-compose

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure to understand what this step is for. Why are we copying these files from inside the container into the host?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to backtrack again from the approach where a host directory was mounted and written to by the container.

The reason was that I had issues with a mix of files owned by the user and files owned by root. I also needed to run npm cache clear and I never understood why.

But the fundamental reason was that I was mounting host system stuff into the container and writing irrelevant files there.

The best approach IMO is to keep everything containerized and copy out the artifacts directly from their known location. If you can separate out a bind mount and ensure that ONLY intended artifacts are written there, then that's also a fine approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha! Yeah, creating files inside the container that then are required outside the container usually generates permission problems. The only way to do that correctly is mapping the UID/GID between the host and the container and that is where things get more complicated. We've been there and I didn't enjoyed it 😄

I like the pattern you used here with docker cp for this requirement, where files generated inside the container are required to be distributed/used outside the container.

We currently have this problem when creating migrations on Read the Docs, for example, where the resulting .py files is owned by root and left in a directory owned by "your user" in the host. Then you have to run sudo chown to fix the permissions on that file.

@agjohnson agjohnson requested a review from humitos August 19, 2022 22:33
@benjaoming
Copy link
Contributor Author

This additions is high priority for working through the remaining PRs for 1.1 to be fast-tracked. Since it's an optional and supplemental part of the development/build process, I'll merge it.

I'm quite sure that I'll be back with some adjustments, since this isn't really perfect.

@humitos as you can see from the current setup, there's a bit of simplification work left in the separation logic between docker image, docker runtime and docker host. I'll happily introduce this on a call :)

docker-npm-build:
rm -f .container_id
docker-compose run -d sphinx_rtd_theme build > .container_id
docker container wait "$(shell cat .container_id)"
docker cp "$(shell cat .container_id):/project/sphinx_rtd_theme" .
docker cp "$(shell cat .container_id):/project/package-lock.json" .
@echo "Done building"

@benjaoming benjaoming merged commit eb3fb18 into readthedocs:master Aug 24, 2022
@benjaoming benjaoming deleted the dockerize branch August 24, 2022 15:08
Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is great. I just asked a bunch of questions to understand the workflow a little more.

Comment on lines +1 to +2
# This implicitely includes Python 3.10
FROM node:14-alpine
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to use Ubuntu here if possible. Mainly to keep consistency with the rest of the repositories we have. Also, because I've found different weird compilation issues when working with Alpine and the saving in space is not really huge.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're on the same page here and Ubuntu images would be the best way to go.

Dockerfile Show resolved Hide resolved
Comment on lines +49 to +52
# Copy in files that we need to run the project. These files
# will not be mounted into the runtime from external environment
# so we copy them during build.
COPY webpack.common.js webpack.dev.js webpack.prod.js /project/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we don't mount them via docker compose instead of copying them here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can mount stuff at runtime. This is build time, so we need to copy into the image. The goal is to have node packages built together with the image.. however, if I copy the ENTIRE repo, then Docker's cache will be invalid everytime I build the image, so I have to build it over and over again.. so I copy the minimal set of files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure I follow the need for avoiding mounts. Could you clarify what issue are you trying to solve? There is probably prior art to reuse in our application Docker configuration, whether that is permissions issues or avoiding host filesystem artifacts.

I would look at what we do in the application docker processes, as we don't seem to have the issues that this docker config is trying to solve. I would put strong preference on reusing patterns that are working for us already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify what issue are you trying to solve?

Issue is to create a reusable and containerized environment for development. As the development currently also has responsibility for building the production assets, then that's also solved, but I understand that we want to remove these artifacts from Git as soon as possible.

I'd like to propose that the development container specification can be reused for CI. I can open an issue for that.

I would put strong preference on reusing patterns that are working for us already.

Or perhaps developing and improving those patterns? :)

The reason I put docker-compose with a bind mount was to echo what's in readthedocs.org, although I'd always recommend to keep the mounted set of files to an absolute minimum, especially so that files from the container aren't dumped on the host.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue is to create a reusable and containerized environment for development.

This seems like the overall goal of your change, but I'm more curious what issue you are hitting that requires this.

The main concern here is that we're making a complex solution for a problem we don't really have currently. The initial goal is to give contributors easy environment creation, which this setup does. But absolute isolation isn't currently a strong roadmap concern.

A longer term discussion point should be describing why isolation of build files is necessary for us. There are general arguments for this, but I'm not sure they are enough to bump up the priority of this work to an immediate team priority.

I'd like to propose that the development container specification can be reused for CI

What issue are you hitting that requires this change? The reasons to stick with the current pattern are that it's already providing isolation and deterministic builds, and there is more prior art and information on using CircleCI without custom Docker images.

So switching this feels a bit like sideways progress, but might also lead to us spending more time scratching our heads.

Or perhaps developing and improving those patterns?

It depends on the problems you are trying to solve, what design decision we come to, and where we place the work on our roadmap. The place to start changes like this is with discussion first, and I would suggest opening an issue on meta. From there, we'd find a place for the work on our roadmap if we all feel it's a priority.

Whatever we decide, sticking to a singular pattern is the main concern. One-off processes will add confusion and friction to maintenance.

Copy link
Member

@humitos humitos Aug 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjaoming

You can mount stuff at runtime. This is build time, so we need to copy into the image

I don't think there is need to copy files into the container that are not used at build time. If these files are used at run time, they should be mounted at run time. Otherwise, modifying any of these files would require rebuilding the Docker image when it's not if they are mounted.

Actually, that's why I commented on this particular line and not the previous one (https://github.com/readthedocs/sphinx_rtd_theme/pull/1319/files#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557R26-R28) where, to me, it's a justified used of COPY and makes total sense there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The files are used at build time, they are required for package.json and npm install to work. We could perhaps mock them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm... in that case, I don't follow it. These files are copied after running npm install in the Dockerfile: https://github.com/readthedocs/sphinx_rtd_theme/pull/1319/files#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557R36-R38

Also, the comment right above copying these files says they are required to run the project.

Following these two things, I'd think they are not required at build time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm giving some pretty bad off-hand answers right now. I'll have to go over this again and give a proper answer. As I said, the idea is to avoid having anything mounted into the runtime environment to avoid making a mess in the host system's files.

I think this is easier to go through over a meeting than async.


docker-copy-assets:
docker cp "$(shell cat .container_id):/project/sphinx_rtd_theme" .
docker cp "$(shell cat .container_id):/project/package-lock.json" .
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure to understand what this step is for. Why are we copying these files from inside the container into the host?

Comment on lines +8 to +9
rm -f .container_id
docker-compose run -d sphinx_rtd_theme build > .container_id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: there is no need to do rm -f since you are redirecting the output with > in the next command. If it exists, it will overwrite it with the new output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha yes, will remove it.... 'twas because docker run could write to a file - now I need to detach in order to get the ID because docker-compose run doesn't have the same option that docker run has.

Comment on lines +87 to +88
# Runs the development webserver
$ docker-compose run sphinx_rtd_theme dev
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't be better to use docker-compose up here which is the standard and call whatever is required behind the scenes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docker-compose run is doing what docker run does but reading the docker-compose configuration - I start the service that I want and give an argument dev to the entrypoint. docker-compose up is a different type of command.

Comment on lines +90 to +94
# If you want to copy stuff out of the Docker environment, run this make
# target or read the actual Makefile to see what is going on.
# We suggest running this command every time that you want to quickly build
# new CSS/JS assets
$ make docker-build-all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using invoke to handle this type of extra commands in other repositories. I'm not opposite to use Makefile, tho, just saying 😄

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also prefer to have a singular pattern to use, as much as I like make.

Copy link
Contributor Author

@benjaoming benjaoming Aug 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Containers provide a quick way to get started. make exists on most systems and doesn't sit on top of an additional stack that's customized by the project.

With invoke, we'd have to firstly bootstrap a Python environment with the correct version of invoke - or as a new contributor understand what invoke even is. I think there's a better chance that people can copy-paste make commands without worrying about setting up an environment for make.

If the feeling is "let's not have Make", then I would rather have a shell script for running the docker-build-all target, since that's the only command that chains a few things together and should be scripted.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still agree that we should stick with a single pattern and use Invoke here. Both Python and Invoke are a normal part of our development workflow and already in use in other repositories, so are reasonable requirements for contributors. The goal is not to make this repository universally accessible as possible, it's just to make a repeatable environment that is easy for the core team to maintain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants