Made legacy bitstream URLs redirect with 301 status code #3062
…e redirect status code to 301 Moved Permanently
…direct_contribute-main # Conflicts: # src/app/bitstream-page/bitstream-page-routing.module.ts # src/app/bitstream-page/legacy-bitstream-url.resolver.spec.ts # src/app/bitstream-page/legacy-bitstream-url.resolver.ts
@alexandrevryghem : This isn't working properly for me from either Chrome or curl (version 7). In both, I see the same behavior.
- First, for a bitstream without special characters, I do get a 301 redirect, but its Location: is wrongly set to the REST API (/api/core/bitstreams/[uuid]/content). So, it looks like this from curl (and I see this same result in Chrome DevTools from the "Network" tab):
    curl --head http://localhost:4000/bitstream/handle/123456789/1361/test_excel.xls
    HTTP/1.1 301 Moved Permanently
    X-Powered-By: Express
    X-RateLimit-Limit: 500
    X-RateLimit-Remaining: 499
    Date: Mon, 20 May 2024 19:06:02 GMT
    X-RateLimit-Reset: 1716232004
    Cache-Control: max-age=604800
    Link: <http://localhost:4000/signposting/linksets/9a171726-3060-4859-9d0d-8c147cb148c5> ; rel="linkset" ; type="application/linkset" , <http://localhost:4000/signposting/linksets/9a171726-3060-4859-9d0d-8c147cb148c5/json> ; rel="linkset" ; type="application/linkset+json" , <http://localhost:4000/items/9a171726-3060-4859-9d0d-8c147cb148c5> ; rel="collection" ; type="text/html"
    Location: http://localhost:8080/server/api/core/bitstreams/8443db73-d5c4-4069-a438-f04fe6d02f57/content
    Vary: Accept
    Content-Type: text/plain; charset=utf-8
    Content-Length: 127
    Connection: keep-alive
    Keep-Alive: timeout=5
- The proper behavior here (for Google Scholar & similar) would be to perform a 301 Redirect to the new User Interface path, e.g. http://localhost:4000/bitstream/handle/123456789/1361/test_excel.xls might 301 redirect to http://localhost:4000/bitstreams/[uuid]/download (which is the URL you see when you hover over the download link in the UI). It's necessary for this to be a User Interface path for SEO purposes, as we want this link associated with indexing the UI.
- I do realize this would mean we'd have a 301 Redirect (to /bitstreams/[uuid]/download in the UI) followed by a 302 Redirect (to /api/core/bitstreams/[uuid]/content in the REST API). But, the purpose of the 301 is to provide the new UI download URL to Google Scholar (and similar indexers), while the 302 is then to actually send along the content.
- Unfortunately, if I attempt to use a bitstream with a special character, I now receive a 404 response both in Chrome and in curl. So, if I paste a URL like this into Chrome: http://localhost:4000/bitstream/handle/123456789/1361/test_pdf_ć.pdf , I see that the browser encodes the special character ć into %C4%87 and the resulting URL returns a 404 response. I can also reproduce this from curl --head when using curl version 7.81.0.
My suspicion is that the URL encoding that occurs is tripping things up. This might be related to the issues (at least the 200 OK ones) in #2727 as well, though I'm not sure.
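To make the encoding suspicion concrete, here is a minimal TypeScript sketch of what happens to the file name on the wire. It is not the resolver's actual code: `sameFileName` is a hypothetical helper, and only `encodeURIComponent` / `decodeURIComponent` are standard functions.

```typescript
// File name taken from the URL reported in this thread.
const fileName = 'test_pdf_ć.pdf';

// What a browser sends: ć (U+0107) becomes its UTF-8 bytes, percent-encoded.
const encoded = encodeURIComponent(fileName);

// What a lookup would have to undo before comparing with the stored name.
const decoded = decodeURIComponent(encoded);

// Hypothetical helper: compare a requested path segment with a stored
// bitstream name, accepting both raw and percent-encoded input.
function sameFileName(requested: string, stored: string): boolean {
  try {
    return decodeURIComponent(requested) === stored;
  } catch {
    // Malformed escape sequence (for example a stray '%'):
    // fall back to a literal comparison.
    return requested === stored;
  }
}
```

If a lookup compares the still-encoded segment with the stored name, the two strings differ and a 404 would be one plausible outcome. Whether that is what happens here is unconfirmed.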
…id bitstream urls in combination with a HardRedirectService#redirect, this will ensure the redirect is visible for curl instead of being performed by Angular
…itstream-redirect_contribute-main # Conflicts: # src/app/bitstream-page/legacy-bitstream-url-redirect.guard.spec.ts # src/app/bitstream-page/legacy-bitstream-url-redirect.guard.ts
@tdonohue: Thanks for the feedback! The first issue you mentioned, with the legacy bitstream URL pointing directly to the backend, has been solved. It happened because there was no real HTTP redirect: Angular handled the redirect internally, so curl couldn't see it. For the second issue, I was not able to reproduce it with my local setup, even without these new changes, so could you check whether you can still reproduce it?
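For readers following along: a redirect that curl can see is an HTTP response carrying a 3xx status and a Location header, not a route change inside the Angular app. A minimal sketch of such a response follows. The `RedirectResponse` type and `buildLegacyRedirect` function are hypothetical names for illustration, not DSpace or Angular APIs, and the target path follows the /bitstreams/[uuid]/download example given earlier in this thread.

```typescript
// Hypothetical response shape; not a DSpace or Angular type.
interface RedirectResponse {
  status: 301 | 302;
  headers: { Location: string };
}

// Build the permanent redirect a legacy bitstream URL is expected to answer with.
// `uiBaseUrl` is an assumed parameter for the public URL of the user interface.
function buildLegacyRedirect(uiBaseUrl: string, uuid: string): RedirectResponse {
  const base = uiBaseUrl.replace(/\/+$/, ''); // drop any trailing slashes
  return {
    status: 301,
    headers: { Location: `${base}/bitstreams/${uuid}/download` },
  };
}

const response = buildLegacyRedirect(
  'http://localhost:4000/',
  '8443db73-d5c4-4069-a438-f04fe6d02f57',
);
```

Because the status and header are part of the HTTP response itself, tools such as curl --head and crawlers see them without executing any JavaScript.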
@alexandrevryghem : I can confirm that the 301 redirect is now working properly. So, I'm basically a +1 to this PR.
However, I'm still getting a 404 Error whenever I try to use these legacy bitstream URLs with any web browser. I'm a bit surprised you cannot reproduce this on your end, as I see the same thing in Firefox (v126.0), Chrome (v125.0.6422.77 (Official Build) (64-bit)) and even Microsoft Edge (v125.0.2535.51 (Official build) (64-bit))
In any of those browsers, if I copy a URL like this into my browser window, I always end up at DSpace's 404 page:
http://localhost:4000/bitstream/handle/123456789/1361/test_pdf_ć.pdf
However, if the URL has no special characters, then I see the 301 followed by the 302 in all browsers. For example:
http://localhost:4000/bitstream/handle/123456789/1361/test_pdf.pdf
So, overall, I think this PR fixes the 301 redirect issue. But, I'm still seeing the open bug (#2727) related to bitstream redirects with special characters.
If you could retest these redirects in other browsers, or see if you have colleagues who could retest them, I'd appreciate it. I'm confused as to why I get consistent errors for redirects with special characters (in every browser/tool that I try), but you do not.
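The expected behavior discussed above (a 301 to the UI download path, then a 302 to the REST content endpoint) can be written down as a small check. This is a sketch under assumptions: `Hop` and `isExpectedChain` are hypothetical names, the hops would have to be collected by hand (for example from curl --head output or the DevTools "Network" tab), and the UUID pattern is deliberately loose.

```typescript
// One observed redirect: the status code and the Location header it carried.
interface Hop {
  status: number;
  location: string;
}

// Path shapes taken from the URLs quoted in this thread.
const UI_DOWNLOAD = /\/bitstreams\/[0-9a-f-]{36}\/download$/;
const REST_CONTENT = /\/api\/core\/bitstreams\/[0-9a-f-]{36}\/content$/;

// True when the first hop is a permanent redirect to the UI download page
// and the second hop is a temporary redirect to the REST content endpoint.
function isExpectedChain(first: Hop, second: Hop): boolean {
  return (
    first.status === 301 &&
    UI_DOWNLOAD.test(first.location) &&
    second.status === 302 &&
    REST_CONTENT.test(second.location)
  );
}
```

A 301 that points straight at the REST API, as in the first curl output in this thread, fails the check because its Location does not match the UI download pattern.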
@tdonohue: I retested it, and asked someone else to retest it as well, and I think it might just be that the URL you used wasn't valid. Could you check whether you can reproduce it by running this branch in prod mode with
I added their links in the metadata under
@alexandrevryghem : I tried your example item on the Sandbox site, and I can verify that Item works for me. I ran this PR locally on http://localhost:4000 pointed at the Sandbox backend, and I see the 301 redirect followed by the 302 redirect, even for the files which have special characters. I tried this in several browsers and they all work!
However, strangely, this code doesn't work locally. I tried copying your example item to my own backend running in Docker on http://localhost:8080. I copied it by exporting it using the
It's baffling to me that I can run the same UI code via http://localhost:4000/ and it works when pointed at https://sandbox.dspace.org/server/ but doesn't work when pointed at http://localhost:8080/server/. I have no idea why this is happening. But, at this point, I'm assuming it must be something in my local Docker setup, since it seems to only be reproducible on my machine.
In any case, since you've proven this works with the Sandbox as a backend, I'll merge this as-is and mark #2727 as "fixed" by this. I appreciate you investigating this further and showing that it works on the Sandbox! I have no idea why the same item won't work locally, but it definitely seems like your fixes are working.
👍 Thanks again @alexandrevryghem ! Merging this as it works properly when I use https://sandbox.dspace.org/server/ as the backend.
Successfully created backport PR for |
References
Description
This fixes an issue where legacy bitstream URLs are redirected with status code 302 instead of 301 (permanent URL redirect).
Instructions for Reviewers
List of changes in this PR:
Guidance for how to test & review this PR:
- curl --head http://localhost:4000/bitstream/handle/123456789/258/Money%20and%20Emerging%20Adults.pdf and verify that the status code is now 301
- curl --head http://localhost:4000/bitstream/handle/123456789/258/Money%20and%20Emerging and verify that the status code for an invalid url is still 404
Checklist
- yarn lint
- yarn check-circ-deps
- (package.json) I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.