-
Notifications
You must be signed in to change notification settings - Fork 510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(cloud-function): redirect non-canonical URLs #11151
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use a way that doesn't add another fetch please?
This reverts commit 1c76989.
1. Build `sitemap.txt`, with all pages from all sitemaps. 2. Build `sitemap.json` for the Cloud Function, similar to the `redirects.json`. 3. Use middleware to redirect all URLs that match after normalization.
Here's my second approach, utilizing a global sitemap. |
cloud-function/src/build-sitemap.ts
Outdated
@@ -0,0 +1,39 @@ | |||
import { readFile, stat, writeFile } from "node:fs/promises"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, why not create the json instead of the .txt? From what I see, this does not need to be in cloud function and it's just on line in the build/sitemaps.ts
.
I mainly want be aware of what's the contract between building and the cloud function and I think the json makes a better api.
If there's a strong reason to keep this I'd just have ask to use a Object instead of the Map since we're putting it into and Object any way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, why not create the json instead of the .txt?
I went for text format, because text sitemaps are a thing and it's a simpler format with fewer assumptions than a JSON, although I'm not technically adhering to the sitemap convention of using fully qualified URLs.
I mainly want be aware of what's the contract between building and the cloud function and I think the json makes a better api.
Are you suggesting to publish a sitemap.json
with a mapping of normalized path to canonical path, to be used by other consumers potentially? That would actually be my concern that other parties start using this JSON if they come across it. On the other hand, would we change it?
If there's a strong reason to keep this I'd just have ask to use a Object instead of the Map since we're putting it into and Object any way.
Done in d2e58ca.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I was suggesting to build the same sitemap.json
as in this file. I wouldn't call it sitemap though.
I would not publish that file. Is there a benefit for having the sitemap as text and as XML? If we go for only txt sitemap I'm all in for this. Otherwise this ads complexity and we potentially have to support two sitemaps formats from now on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind renaming (done in c9a9474), and I could surely build that map as part of the build, but I don't think it reduces complexity:
- I need to get all URLs from all sitemaps, so either I gather them as a side-effect of building the sitemaps, or I go over all (gzip-compressed) XML sitemaps later, parsing out the URLs.
- We build and deploy the Cloud Function after we sync the build to Cloud Storage, and
gsutil
doesn't support exclusion very well (cf. theplain.html
files that we sync), so we might need to prevent access to that file in the Cloud Function. - We definitely want to keep using the XML sitemap for search engines (as advised in
robots.txt
), because the text format doesn't support modification dates. - We would need to extract the
normalizePath()
function into a lib shared by build and cloud function, and I assume this would no longer be possible with rari. We could then duplicate the logic, but it's still risky.
Overall, I would recommend to keep the current approach.
2913e3a
to
d2e58ca
Compare
This pull request has merge conflicts that must be resolved before it can be merged. |
cloud-function/src/app.ts
Outdated
@@ -39,6 +40,7 @@ router.all( | |||
router.all("/submit/mdn-yari/*", requireOrigin(Origin.main), proxyTelemetry); | |||
router.all("/pong/*", requireOrigin(Origin.main), express.json(), proxyPong); | |||
router.all("/pimg/*", requireOrigin(Origin.main), proxyPong); | |||
router.use(redirectNonCanonicals); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reorder the routes here? Move everything that doesn't need the redirectNonCanonicals
up (I guess only get / and the assests).
Otherwise ready to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fiji-flo I moved redirectNonCanonicals
down instead (and the search-index.json
route up next to the assets route), to make the change less invasive: https://github.com/mdn/yari/pull/11151/files/55d08d37374c549d154529d8228dabcca4fc9a2d..45501138daec268e2fdb0dfa7a6ab9229f99f4b3
0a78822
to
353c553
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Previously, we served the same content independent of capitalization: - https://developer.mozilla.org/en-US/docs/web (incorrect capitalization) - https://developer.mozilla.org/en-US/docs/Web (correct capitalization) Now, we build a map to match the requested against the canonical URL, redirecting if necessary.
Summary
(MP-1111)
Problem
We currently serve the same content independent of capitalization:
Solution
Fetch themetadata.json
and match the URL (excluding search parameters) against themdn_url
in it.Build a canonical map to determine the canonical URL for a requested URL, and redirect if the requested URL differs.
How did you test this change?
npm ci && npm start
in/cloud-function
.http://localhost:5100/en-US/docs/web?foo=bar
redirects tohttp://localhost:5100/en-US/docs/Web?foo=bar
.