Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC/BLD: host doc build artefacts of PRs #17921

Closed
jorisvandenbossche opened this issue Oct 19, 2017 · 10 comments
Closed

DOC/BLD: host doc build artefacts of PRs #17921

jorisvandenbossche opened this issue Oct 19, 2017 · 10 comments
Assignees
Labels
Build Library building on various platforms CI Continuous Integration Docs
Milestone

Comments

@jorisvandenbossche
Copy link
Member

In scikit-learn it is possible to see the docs for a PR (see contributing guide: http://scikit-learn.org/stable/developers/contributing.html#documentation) because they build and upload it with circle-ci, and make it available from a canonical url, eg http://scikit-learn.org/circle?13937/documentation.html

Would be cool to do something similar for pandas. Certainly for PRs that rework part of the documentation, it is useful to check how it is looking.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jan 13, 2021

cc @afeld for awareness.

IMO the lowest-effort path forward is to

  1. Always rsync the built docs to the webserver, not just for pushes to master. We'd put it in something like /pr/<PR>/. Probably don't need to do specific commits. I think rsync will just overwrite if there are subsequent pushes.
  2. Set up a cron job on the webserver to clear everything in pr/ nightly.

@jorisvandenbossche
Copy link
Member Author

Set up a cron job on the webserver to clear everything in pr/ nightly.

Would it become too complex to only clear the pr/ sub-directories that haven't been touched in eg a week? Otherwise for a PR of only a day old, the docs are not viewable anymore (if there was no new push), making reviewing the built docs more difficult.

@datapythonista
Copy link
Member

I think we should be able to use GitHub actions, to do the sync the docs into the server, and also to delete the directory once the PR is merged or closed.

I think there could be a cron job that with certain periodicity deletes the directory of the PRs that are not open anymore, just in case.

I see two challenges:

  • We can't access GitHub secrets on jobs of PRs, only on master (not sure if webhooks can be used to notify the server to pull the docs, instead of the job pushing them)
  • We will have as many copies of the docs as open PRs (not sure how much are the rendered docs right now, but we need that multiplied by 150x or 200x)

Maybe an option would be instead of publishing the docs of every PR, we could implement an action that when someone writes /docs in a comment, we publish them. The implementation can be a bit simpler (it would be separated from the PR jobs), and would save space in the server. I think for many PRs we don't care that much about the rendered docs.

@jorisvandenbossche
Copy link
Member Author

We can't access GitHub secrets on jobs of PRs, only on master (not sure if webhooks can be used to notify the server to pull the docs, instead of the job pushing them)

This was my main question when we were discussing this earlier today: is this possible security wise to push from a PR to our server?

I agree that a comment bot would already be very nice (we recently added a @github-actions run pre-commit), and that could limit the amount of unneeded stored doc builds.

If pushing to our server from a PR can't work, I was thinking of a potential alternative: we could also have a separate github repo to just host PR build reviews on github pages (assuming it might be easier to push to a github repo from github actions in a branch compared to pushing to our server, but that's just a guess)

@datapythonista
Copy link
Member

To access both our server or a separate repo we should have a key that should be stored as a secret, so I don't think that's an option (unless there is something I missing, but I don't think so).

The best approach that came to my mind is having a call like curl https://our-server.com/simple-service/?pr=1234 in our build for a PR, and build that service that pulls the artifact of the specified pr, and serves it via http.

There may be other ways, but this is the simplest that came to my mind.

@jorisvandenbossche
Copy link
Member Author

So actually saving the built docs as an artifact is something we could already start doing (https://docs.github.com/en/actions/guides/storing-workflow-data-as-artifacts).

And then some service to get this and host it somewhere else.
It doesn't seem that simple to get the URL to the artifact, though, eg see the discussions in actions/upload-artifact#50

@afeld afeld self-assigned this Jan 25, 2021
@afeld
Copy link
Member

afeld commented Jan 25, 2021

Made the built website+docs available for download, as a starting point: #39390

@jreback jreback added this to the 1.3 milestone Feb 11, 2021
@jreback
Copy link
Contributor

jreback commented Feb 11, 2021

this can be closed as merged #39390 unless further enhancements needed

@jorisvandenbossche
Copy link
Member Author

Let's keep this open, as ideally we still serve the docs online as a preview

@jorisvandenbossche jorisvandenbossche changed the title DOC/BLD: upload doc build artefacts of PRs DOC/BLD: host doc build artefacts of PRs Feb 11, 2021
@afeld
Copy link
Member

afeld commented Feb 11, 2021

@jorisvandenbossche We have #38996 for that, so I'm going to go ahead and close this one.

@afeld afeld closed this as completed Feb 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build Library building on various platforms CI Continuous Integration Docs
Projects
None yet
Development

No branches or pull requests

5 participants