
add robots.txt to specify doc versions to appear in search engines #3291

Merged
tannewt merged 3 commits into adafruit:main on Jan 11, 2021

Conversation

dhalbert
Collaborator

Fixes #3263 in a simple way, adding a static robots.txt file. The stable version specified in the robots.txt needs to be updated when it changes; it's currently 5.3.x. The robots.txt could be generated automatically, though I'm not sure how off the bat.
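As a rough sketch (not necessarily the exact contents of this PR), a static robots.txt that exposes only the stable and latest versions might look like:

```
# Hypothetical example: hide everything except the stable and latest docs.
# "5.3.x" would need to be updated whenever the stable version changes.
User-agent: *
Disallow: /
Allow: /en/5.3.x/
Allow: /en/latest/
```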

@jepler
Member

jepler commented Aug 17, 2020

How will this affect URLs on circuitpython.readthedocs.io such as https://circuitpython.readthedocs.io/projects/display_text/en/latest/ -- will search engines still be permitted to index them? The way I read it, they would be forbidden.

@dhalbert
Collaborator Author

> How will this affect URLs on circuitpython.readthedocs.io such as https://circuitpython.readthedocs.io/projects/display_text/en/latest/ -- will search engines still be permitted to index them? The way I read it, they would be forbidden.

Good point: I mistakenly thought those were under another URL. I'll add a prefix to robots.txt.

@dhalbert
Collaborator Author

This needs a lot more thought. Converting to draft. https://circuitpython.readthedocs.io/projects, for instance, is not a searchable top-level point.

@dhalbert dhalbert marked this pull request as draft August 17, 2020 14:15
@sommersoft
Collaborator

sommersoft commented Aug 17, 2020

After reviewing the syntax for robots.txt and using an online tester, I think the sub-project pages will be allowed via the Allow: /*/latest/ rule.
[Screenshot: robots.txt tester results]
EDIT: re-tested with all of the entries, not just Allow: /*/latest/ by itself; see the sketch below.
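For reference, a sketch of the kind of rule set tested above (illustrative, not necessarily the exact entries): in Google's robots.txt matching, * matches any sequence of characters, so a single wildcard rule covers both the core docs and the sub-projects.

```
# Hypothetical rule set: Allow: /*/latest/ matches /en/latest/ as well as
# sub-project URLs like /projects/display_text/en/latest/, because *
# matches any sequence of characters (including slashes).
User-agent: *
Disallow: /
Allow: /*/latest/
Allow: /en/5.3.x/
```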

Probably worth getting this in and seeing the result after a couple of days. Google's documentation does say not to rely on robots.txt alone and to use meta tags instead, so it may end up being a multi-pronged approach.
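The meta-tag approach Google documents is a noindex directive on each page; a minimal example (not something this PR adds):

```html
<!-- Placed in the <head> of each outdated page. Note that the page must
     remain crawlable (i.e. not blocked by robots.txt) for the crawler to
     ever see this tag. -->
<meta name="robots" content="noindex">
```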

@hierophect
Collaborator

Following the rabbit hole from the original issue I linked, I saw a number of discussions about meta tags vs. robots.txt, so there may be some useful resources in that original discussion.

@tannewt
Member

tannewt commented Oct 2, 2020

Where are we at on this? It's marked as blocking 6.0.0.

@dhalbert
Collaborator Author

I would like to revive this, and have un-drafted it. @sommersoft's old comment may assuage my original concerns.

Info from readthedocs: https://docs.readthedocs.io/en/latest/hosting.html#custom-robots-txt-pages
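Per that Read the Docs page, the custom robots.txt has to end up at the root of the generated HTML. With Sphinx, one way to do that (a sketch, assuming an extra/ directory next to conf.py that holds robots.txt) is:

```python
# conf.py (Sphinx): copy everything in "extra/" (including robots.txt)
# into the root of the HTML output directory at build time.
html_extra_path = ["extra"]
```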

@dhalbert dhalbert marked this pull request as ready for review January 10, 2021 18:59
Member

@tannewt tannewt left a comment


Thanks for circling back on this!

@tannewt tannewt merged commit b669b62 into adafruit:main Jan 11, 2021
@dhalbert dhalbert deleted the robots.txt branch January 11, 2021 23:49
@dhalbert
Collaborator Author

I hope it works! We'll have to take a look at the search results after a while.

@hierophect
Collaborator

How fast do we expect this to propagate? Checking it today, it still shows out of date documentation as the first result.

Results from today (incognito). First link is to the 2.x version, second is to latest.
[Screenshot: Google search results, Jan 14, 2021, showing the 2.x docs first and latest second]

If it's too early, hopefully this image is a helpful baseline to test against later; ideally we should only see the second result appear in search.


@hierophect
Collaborator

As of today, I'm still seeing 2.x and 3.x pages as the first result on Google, but it's now hiding their content snippets. Typically the Latest docs are the second result. So... partial success? At least it's now easy to tell from the search results page which links go to Latest, even if the bad links still show up.

[Screenshot: Google search results, Mar 8, 2021]

Successfully merging this pull request may close: Prevent old versions of the documentation from being indexed by Google (#3263)