Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support home and search pages in S3 #2922

Merged
merged 6 commits into from
Feb 26, 2021
Merged

support home and search pages in S3 #2922

merged 6 commits into from
Feb 26, 2021

Conversation

escattone
Copy link
Contributor

This provides the necessary support for both the new Yari-based home and search pages. In the main, it does two things:

  1. Does what Django used to do, that is redirect locale-based home-page requests without a trailing slash (e.g., /{locale} --> /{locale}/) as well as search page requests with a trailing slash (e.g., /{locale}/search/ --> /{locale}/search). Inconsistent? Yes and no. No, because that's the way it's been for years on MDN and we're remaining consistent with that, but yes in the sense that we don't handle trailing slashes the same across all URLs.
  2. Always strips the trailing slash from requests on their way to S3. Why? We store the objects in S3 as their path name itself, for example en-us for the English home page, not en-us/index.html, which is what S3 would look for if we kept the trailing slash.

Note. This is already "live" on dev, stage, and prod, so the review is more about looking for any adjustments not whether it works or not.

@ghost
Copy link

ghost commented Feb 14, 2021

This is not a solution is a comment.
As I interpret it without training slash it is called a resource without an extension, a file, in general a computed and non-static resource given that it has no extension, with training slash you search inside a folder leaving the web server the possibility to serve the file by default for that folder which can be a static one placed in that position or computed by a script.

Since the result is computed in this case the correct url is without training slash and without extension.

This allows you to have a hypothetical search folder in the file system with for example the search page inside?

🤔🧂

Base automatically changed from master to main February 16, 2021 16:38
@peterbe
Copy link
Contributor

peterbe commented Feb 16, 2021

if url.match('/en-US'):
  return redirect('/en-US/')
elif url.match('/en-US/'):
   url = url[:-1]  # for the S3 lookup
elif url.endswith('/'):
  return redirect url[:-1]


...
# No need for the...
    if (request.uri.endsWith("/")) {
      // For all requests to S3, strip the trailing slash, since we
      // store the objects as their path name itself, for example
      // "en-us" for the English home page, not "en-us/index.html",
      // which is what S3 would look for if we left the trailing slash.
      request.uri = request.uri.slice(0, -1);
    }

Copy link
Contributor

@peterbe peterbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We chatted about this. We/I typed some pseudo code that we agreed is better because it doesn't mention any other things at all.

@peterbe peterbe mentioned this pull request Feb 16, 2021
@escattone
Copy link
Contributor Author

I made changes and deployed this updated mdn-content-origin-request Lambda@Edge function (version 449) to dev and stage. I then spent too much time debugging dev, and realized that I needed to update the default dev behavior's caching and forwarding policies to match stage and prod. So, in summary, the code as it currently stands is "live" on dev and stage for review, but not yet prod.

@escattone
Copy link
Contributor Author

I forgot to mention the most important thing! I realized that I needed to add the request's query string (if present) to the redirect that removes the trailing slash.

Ready for review again.

@escattone escattone requested a review from peterbe February 17, 2021 00:56
@escattone
Copy link
Contributor Author

Thanks @peterbe and @nschonni for your reviews!

@peterbe
Copy link
Contributor

peterbe commented Feb 17, 2021

I tried to do a thorough review. I started typing some comments and the more I wrote the more my confidence waned. To back up my comments I typed some real JS code in a Node prompt to assert my statements. Eventually, it was easier to type it into a real temporary .js file. Eventually, it got messy to the point where I wrote down some very basic unit tests. One thing led to another and my messy testing became a big simulation. So here it is:

function handler(uri) {
  const url = new URL(uri, "http://example.com");
  const pathname = url.pathname;

  const split = pathname.split("/");
  const first = split[1];

  if (pathname === "/") {
    split.splice(1, 1, "en-US");
    if (split.length === 2) {
      split.push("");
    }
    return split.join("/") + url.search;
  }

  // Is it '/en-Us' or '/ZH-Cn/' or '/Ja/docs/foo'?
  if (
    VALID_LOCALES.has(first.toLowerCase()) &&
    VALID_LOCALES.get(first.toLowerCase()) !== first
  ) {
    if (split.length === 2) {
      split.push("");
    }
    split.splice(1, 1, VALID_LOCALES.get(first.toLowerCase()));
    return split.join("/") + url.search;
  }

  // Is it '/en-US' or '/zh-CN' (just the locale, but lacking trailing /)
  if (VALID_LOCALES.has(first.toLowerCase()) && split.length == 2) {
    split.push("");
    return split.join("/") + url.search;
  }

  if (pathname.endsWith("/")) {
    split.pop();
  }

  return `S3 LOOKUP: ${split.join("/") + url.search}`;
}

const tests = [
  ["/", "/en-US/"], // just slash
  ["/?a=b", "/en-US/?a=b"], // just slash and query string
  ["/", "/en-US/"], // just slash
  ["/?a=b", "/en-US/?a=b"], // just slash and query string
  ["/EN-US", "/en-US/"], // wrong case and no trailing slash
  ["/EN-US?next=foo", "/en-US/?next=foo"], // wrong case and no trailing slash and query string
  ["/EN-us/", "/en-US/"], // wrong case
  ["/EN-us/?a=b", "/en-US/?a=b"], // wrong case and query string
  ["/EN-us/docs/Foo", "/en-US/docs/Foo"], // wrong case and more
  ["/en-US", "/en-US/"], // no trailing slash
  ["/en-US?a=b", "/en-US/?a=b"], // no trailing slash and query string
  ["/en-US/", "S3 LOOKUP: /en-US"], // perfect
  ["/en-US/?next=foo", "S3 LOOKUP: /en-US?next=foo"], // perfect and query string
  ["/en-US/favicon.ico", "S3 LOOKUP: /en-US/favicon.ico"], // perfect
  ["/en-US/search?q=w", "S3 LOOKUP: /en-US/search?q=w"], // perfect and query string
  ["/en-US/docs/foo/", "S3 LOOKUP: /en-US/docs/foo"], // trailing slash
  ["/en-US/docs/foo/?a=b", "S3 LOOKUP: /en-US/docs/foo?a=b"], // trailing slash and query string
];

tests.forEach(([test, expect], i) => {
  const got = handler(test);
  console.log(
    i + 1,
    "\t",
    test.padEnd(30),
    got.startsWith("S3") ? "👍🏼" : "👉",
    got
  );
  if (got !== expect) {
    throw new Error("unexpected");
  }
  console.log();
});

I know that if we just implement that, properly, as proper Lambda'esque code, we'll cover all bases.
Is this helpful? Perhaps I can write a "counter-PR" against this PR?

@escattone
Copy link
Contributor Author

escattone commented Feb 17, 2021

@peterbe That is helpful. Here are some of my thoughts:

  • You're adding redirects for the properly-cased locale, which is what Kuma does currently, so I guess we should keep that in Yari too.
  • We don't need the query string (your url.search) for home and document requests (since they're never used), but I guess it doesn't hurt.
  • and probably the main thing, you've inspired me to add to our integration tests to cover these cases.

Let me adjust this PR. It may not be until Friday with everything else going on.

Copy link
Contributor

@peterbe peterbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* We don't need the query string (your `url.search`) for home and document requests (since they're never used), but I guess it doesn't hurt.

Be careful with that. :) We might. Who knows.

It would be nice if loading:

  • /en-US/docs/Web?q=a
  • /en-US/docs/Web?q=b
  • /en-US/docs/Web?q=c

means...:

  • Cache MISS
  • Cache HIT
  • Cache HIT

...in CloudFront but that we still can get these into the rendering.
It's not inconceivable that we uses these for something. For example Google Analytics referral query string. Or some other functionality that we haven't thought of yet.

And we know the /EN-us/search?q=foo definitely needs it so it'd be something we have to code in support for anyway.

@@ -6,6 +6,16 @@ const {
encodePath,
slugToFolder,
} = require("@yari-internal/slug-utils");
const { VALID_LOCALES } = require("@yari-internal/constants");

const THIRTY_DAYS = 3600 * 24 * 30;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling it THIRTY_DAYS means it's no point making it a variable. It's easy enough to read...

...
cacheControlSeconds: 3600 * 24 * 30,

I think a more appropriate name, if it's even needed is; LONG_CACHE_TTL or something.

// page, not "en-us/index.html", which is what S3 would look for if
// we left the trailing slash.
request.uri = request.uri.slice(0, -1);
} else if (request.uri.endsWith("/")) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't do this to requests for /{locale}/account/ since that endpoint (in Kuma) requires a trailing slash. If we do, we get a redirect loop, since Kuma redirects /{locale}/account to /{locale}/account/. 🤦

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm out of time to solve this today. There are several options available, including routing account requests directly to Kuma, bypassing lambda@edge (but that requires at least 3 new CDN behaviors to handle all of the cases).

escattone and others added 2 commits February 25, 2021 15:28
@escattone
Copy link
Contributor Author

For the record, @peterbe verbally approved me moving forward with this. I've deployed this code to dev, stage, and prod, as well as run the integration tests against stage and prod. The integration tests pass 100% on stage, and will also soon on prod once we release the Yari-built search page.

@escattone escattone merged commit 62ec395 into mdn:main Feb 26, 2021
@escattone escattone deleted the support-home-and-search-pages-in-s3 branch February 26, 2021 00:01
peterbe pushed a commit to peterbe/yari that referenced this pull request Jun 1, 2021
* support home and search pages in S3

* feedbacked

* ensure redirect includes query string

* more redirect cases and tests

* feedback and fixes

* Update deployer/aws-lambda/content-origin-request/index.js

Co-authored-by: Peter Bengtsson <peterbe@mozilla.com>

Co-authored-by: Peter Bengtsson <peterbe@mozilla.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants