Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add cache control and invalidation for docs site (prep for Fastly) #870

Merged
merged 11 commits into from
Mar 9, 2019

Conversation

tobiasmcnulty
Copy link
Member

@tobiasmcnulty tobiasmcnulty commented Feb 23, 2019

Currently, djangoproject.com uses a per-site cache with a 5-minute timeout, which while appropriate for www & dashboard seems like overkill for docs:

https://github.com/django/djangoproject.com/blob/master/djangoproject/settings/common.py#L27

The vast majority of the documentation (esp. for releases) will rarely ever change. As discussed some below, the biggest issue we'll likely have to contend with is serving out-of-sync pages, e.g., a new release could in theory show up for the "Prev" link on the detail page for a point release (e.g., https://docs.djangoproject.com/en/2.1/releases/2.1.6/) but be missing from the TOC (https://docs.djangoproject.com/en/2.1/releases/) until the latter expires (I've seen a similar issue with Read The Docs).

I've updated this PR to set a Surrogate-Control header that Fastly will use to determine how often it refetches from the origin. Browsers will still see the 5-minute Cache-Control header and re-fetch from Fastly at least that often.

Then, in update_docs, we purge Django's per-view cache (just for docs pages, with a new 'docs-pages' cache in Redis) and the Fastly cache any time there's a change in git. If we're sure the change was only to the dev docs, we purge only that section of the site in Fastly; otherwise, we purge everything for simplicity.

Per @MarkusH's suggestion I've added a separate cache just for docs.views.document() to keep this change from clearing all other cached pages on the site at the same time. I opted to go with Redis because it's simple to install and includes support for multiple databases, in the event we need other more granular caches in the future. For this cache I chose DB 2 not because anything else is using DB 1, but to future-proof it against other apps that may not think to set a DB when connecting to Redis.

@timgraham
Copy link
Member

The main issue I see with a delay is when doing security releases. It would be confusing if we didn't delay the blog post until the release notes are updated.

@tobiasmcnulty
Copy link
Member Author

The new page would still be accessible as soon as the docs rebuilt. But, any older cached pages not have a link to the new one until the cache expires.

We could theoretically pick a set of key pages to invalidate when a release is pushed out, or even selectively set a shorter cache time for the release notes pages (in particular the TOC).

@timgraham
Copy link
Member

Often there will be existing "stub release notes" for security releases that are pushed out before a release so it would be nice to have a solution.

@tobiasmcnulty
Copy link
Member Author

tobiasmcnulty commented Feb 23, 2019

Ah, yes that could cause some confusion. What do you think of keeping the timeout at 5 minutes for anything under /releases/ and setting it to an hour (or more) for other URLs? Are there others that would be good to update more frequently? We could theoretically do the same thing for the whole dev tree if that would be useful, for example.

@timgraham
Copy link
Member

Might be fine. There are also rare cases where a security patch modifies other documentation.

@timgraham
Copy link
Member

Thinking about it more, I feel like were inevitably going to get occasional reports when someone is looking at some release notes for a new feature in the development version and the link to the feature won't work since the rest of the docs haven't updated. Maybe the development version of the docs shouldn't be cached longer either. I don't know -- probably worth writing to the mailing list to get a consensus.

@tobiasmcnulty tobiasmcnulty changed the title extend cache for docs document() view to 1 hour add cache control and invalidation for docs site (prep for Fastly or other downstream cache) Feb 24, 2019
@tobiasmcnulty tobiasmcnulty changed the title add cache control and invalidation for docs site (prep for Fastly or other downstream cache) add cache control and invalidation for docs site (prep for Fastly) Feb 24, 2019
# make some allowance for temporary network failures for our .post() request below
retry = Retry(
total=5,
method_whitelist=frozenset(['POST']),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small thing, but does this have to be a frozenset? Would {'POST'} not be OK here?

# If only the dev docs have changed, we can purge only the surrogate key we've set
# up for the dev docs release in Fastly. This will usually happen with every new commit
# to django master (on the next hour, when the cron job runs).
url = '%s/purge/dev-docs-key' % fastly_service_url
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do a urljoin here in case fastly_service_url is modified to include a trailing slash at any point?

@tobiasmcnulty
Copy link
Member Author

@timgraham @MarkusH @apollo13 Any objections to merging and deploying this? I've added a separate cache just for the docs pages, so I think this should be good to go. But, please take a look and let me know what you think.

@tobiasmcnulty tobiasmcnulty merged commit 0a6cf5c into master Mar 9, 2019
@tobiasmcnulty tobiasmcnulty deleted the docs-cache branch March 9, 2019 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants