Google search results are showing French translations for English language links #868

Closed
rhelms opened this issue Feb 19, 2019 · 31 comments

@rhelms

rhelms commented Feb 19, 2019

Recently, in my Google searches for Django topics, French translations have been showing up for English-language links.

For example, this search (https://www.google.com/search?client=ubuntu&channel=fs&q=django+refresh_from_db&ie=utf-8&oe=utf-8) showed https://docs.djangoproject.com/en/2.1/ref/models/instances/ as the displayed URL, but the result text is in French.

Luckily, Django methods are all in English, but this could be an issue if I were searching for Django concepts that did not have a method associated.

2019-02-20-094421_1916x1053_scrot

@timgraham
Member

@tobiasmcnulty - could this be related to your recent changes?

@tobiasmcnulty
Member

tobiasmcnulty commented Feb 20, 2019

That's...odd. I don't think anything I've changed could be causing that, but I suppose anything is possible.

Could this be the inverse of #805?

I did notice a potentially related issue while working on language activation in the unmerged PR #862. Specifically, the site does not activate English if that's the language the user is requesting, so it would inherit whatever language was set previously for that uWSGI thread or by LocaleMiddleware. But I didn't think this applied to the documentation itself, just the strings on the site.
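
A minimal sketch of the inheritance described above. It does not use Django: `activate`, `get_language`, and `handle_request` are hypothetical stand-ins that mimic a per-thread active language, assuming one worker thread is reused across requests.

```python
import threading

# Hypothetical stand-in for a per-thread "active language" store.
_active = threading.local()

DEFAULT_LANGUAGE = "en"


def activate(lang):
    """Remember the language for the current thread."""
    _active.lang = lang


def get_language():
    """Return the language active on this thread, or the default."""
    return getattr(_active, "lang", DEFAULT_LANGUAGE)


def handle_request(requested_lang, activate_english=False):
    """Simulate a view that skips activation for English
    unless activate_english is True."""
    if requested_lang != "en" or activate_english:
        activate(requested_lang)
    return get_language()


# The same worker thread serves a French request, then an English one.
first = handle_request("fr")
leaked = handle_request("en")  # English is never activated, so "fr" is inherited
fixed = handle_request("en", activate_english=True)
```

In this simulation the second request renders in French even though English was requested; activating the requested language unconditionally avoids the leak.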

@claudep
Member

claudep commented Feb 20, 2019

That could be a bug in Google too!

@m-aciek
Contributor

m-aciek commented Feb 22, 2019

I think this has a chance of being solved by PR #862. Google might have been fooled by the Vary header suggesting that pages in different languages are the same, so it treated the French title as the English title and used it as the fresher one because it had recently changed and been reindexed.
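
One way to picture this hypothesis is a cache that keys responses by URL alone while the server varies the content by `Accept-Language`. The sketch below is generic; `cache_key` is a hypothetical helper, and whether Google's crawler behaves like this is not known.

```python
def cache_key(url, request_headers, vary=()):
    """Build a cache key from the URL plus the request headers listed in Vary."""
    normalized = {name.lower(): value for name, value in request_headers.items()}
    varied = tuple(
        (name.lower(), normalized.get(name.lower(), ""))
        for name in sorted(vary)
    )
    return (url, varied)


url = "https://docs.djangoproject.com/en/2.1/ref/models/instances/"
english = {"Accept-Language": "en"}
french = {"Accept-Language": "fr"}

# Ignoring Vary: both requests share one cache slot, so the last response wins.
collides = cache_key(url, english) == cache_key(url, french)

# Honouring "Vary: Accept-Language": each language gets its own slot.
separate = cache_key(url, english, vary=["Accept-Language"]) != cache_key(
    url, french, vary=["Accept-Language"]
)
```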

@tobiasmcnulty
Member

The LocaleMiddleware removal for docs.djangoproject.com has been deployed (#862), so please keep an eye out for any changes. I've also noticed a few search results coming back in French (but linking to the English URLs), so hopefully this helps (though I'm still not seeing how it will, unless Google is indeed doing something odd...).

@nitinnain

nitinnain commented Feb 26, 2019

Google search for "Django Tutorial Admin" shows the Search Snippets in Indonesian.

See attached screenshot.
The text "Menulis aplikasi Django kedua anda, bagian 2" is in Indonesian.
Notice that the URL is for "en" (English), but the Google Search snippet shown is for the "id" (Indonesian) webpage. Here are the two pages.
EN: https://docs.djangoproject.com/en/2.1/intro/tutorial02/
ID: https://docs.djangoproject.com/id/2.1/intro/tutorial02/

  • I have never set up Indonesian in my browser.
  • I checked on two different computers, one in India and another in the US. Both showed the same error, even though they are entirely different machines configured for English.

Debugging Hint:

Open the page source for https://docs.djangoproject.com/en/2.1/intro/tutorial02/ in your browser
and look for rel="alternate" or hreflang=

These attributes tell search engines which alternate-language versions of the page exist. I didn’t see anything obviously wrong on the HTML page itself.
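
The manual check above could also be scripted. This is a rough sketch using only the standard library; `AlternateLinkParser` is a hypothetical name, and `SAMPLE_HEAD` is invented for illustration rather than copied from the live page.

```python
from html.parser import HTMLParser


class AlternateLinkParser(HTMLParser):
    """Collect rel="canonical" and rel="alternate" hreflang links."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.alternates = {}  # hreflang -> href

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.canonical = attrs.get("href")
        if "alternate" in rel and attrs.get("hreflang"):
            self.alternates[attrs["hreflang"]] = attrs.get("href")


# Invented sample markup, assumed to resemble a docs page <head>.
SAMPLE_HEAD = """
<head>
<link rel="canonical" href="https://docs.djangoproject.com/en/2.1/intro/tutorial02/">
<link rel="alternate" hreflang="en" href="https://docs.djangoproject.com/en/2.1/intro/tutorial02/">
<link rel="alternate" hreflang="id" href="https://docs.djangoproject.com/id/2.1/intro/tutorial02/">
</head>
"""

parser = AlternateLinkParser()
parser.feed(SAMPLE_HEAD)
```

After feeding a page, `parser.canonical` and `parser.alternates` can be compared against the URL that was actually requested.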

The title "Menulis aplikasi Django kedua anda, bagian 2"
is in Indonesian. Notice "id" in the URL here:
https://docs.djangoproject.com/id/2.1/intro/tutorial02/

django google search snippet error

[Above image without edits, in full size](https://user-images.githubusercontent.com/2798690/53386196-44512100-39a7-11e9-997f-f43a1714803e.png)

@charettes
Member

This is really weird. I get a /fr/ URL in Indonesian for the same search with an Accept-Language header favouring fr-CA.

@claudep
Member

claudep commented Feb 26, 2019

Does anyone know someone at Google we could report this issue to?

@m-aciek
Contributor

m-aciek commented Feb 26, 2019 via email

@kezabelle

Chiming in to point out, in case it hasn't been noted already: the cached copy for "Django Tutorial Admin" in Indonesian (with the English link) has <html lang="id"> (which is set in the base_docs.html by {% block html_language_code %}{{ lang|default:"en" }}{% endblock %} and ultimately by a call to activate(lang))
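
A small sketch of how such a mismatch could be detected automatically, assuming the language is always the first path segment of a docs URL. The helper names are hypothetical and the regular expression is deliberately simple.

```python
import re
from urllib.parse import urlparse


def url_language(url):
    """Return the first path segment, e.g. 'en' for /en/2.1/intro/tutorial02/."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0] if segments else None


def html_language(html):
    """Return the lang attribute of the <html> tag, if present."""
    match = re.search(r'<html\b[^>]*\blang="([^"]+)"', html, re.IGNORECASE)
    return match.group(1) if match else None


def language_mismatch(url, html):
    """True when the URL language and the <html lang> value disagree."""
    return url_language(url) != html_language(html)


cached_page = '<!DOCTYPE html><html lang="id"><head></head><body></body></html>'
mismatch = language_mismatch(
    "https://docs.djangoproject.com/en/2.1/intro/tutorial02/", cached_page
)
```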

@m-aciek
Contributor

m-aciek commented Feb 26, 2019 via email

@tobiasmcnulty
Member

Interestingly the Google search link that originally prompted this issue now shows English again. But yes, I see Indonesian text when Googling "Django Admin Tutorial" now, too:

https://www.google.com/search?q=django+tutorial+admin

It feels very much like a caching issue, but I don't see where it could be occurring 😦

It looks like @aaugustin may have verified the domain with Google Webmaster Tools about 6 years ago (a0907ff); would you be able to grant me access to this if you still have it @aaugustin ?

@timgraham
Member

timgraham commented Feb 26, 2019

I have access to the Google Search Console but I don't see how I can give access to others. Here's the report for https://docs.djangoproject.com/en/2.1/intro/tutorial02/

screenshot from 2019-02-26 09-35-48

This is the information about "Google-selected canonical" https://support.google.com/webmasters/answer/9012289#google-selected-canonical

@m-aciek
Contributor

m-aciek commented Feb 26, 2019

In the docs' meta tags we set a canonical URL per page (for example, the canonical for .../id/2.1/intro/tutorial02 is .../id/2.1/intro/tutorial02). IMO this can interfere badly with the rel='alternate' links. If an article has alternate links, all of those alternatives should share a single canonical.

A possible solution, then, would be to point the canonicals for all languages at the en versions of the documentation pages.
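
To make the two options concrete, here is a hedged sketch. `docs_url` and `canonical_url` are hypothetical helpers, not code from the site, and the sketch only shows how the proposal differs from per-page canonicals; it does not show that either option is what Google expects.

```python
BASE = "https://docs.djangoproject.com"


def docs_url(lang, version, path):
    """Build a docs URL from its language, version, and path parts."""
    return f"{BASE}/{lang}/{version}/{path}"


def canonical_url(lang, version, path, point_to_english=False):
    """Return the canonical for a page: itself, or its English counterpart."""
    target = "en" if point_to_english else lang
    return docs_url(target, version, path)


languages = ["en", "fr", "id"]

# Current behaviour as described: every translation is its own canonical.
per_page = {canonical_url(code, "2.1", "intro/tutorial02/") for code in languages}

# Proposed behaviour: every translation points at the English page.
english_only = {
    canonical_url(code, "2.1", "intro/tutorial02/", point_to_english=True)
    for code in languages
}
```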

@nitinnain

nitinnain commented Feb 26, 2019

Just saw French on another search
(The language in the Google search snippet changes depending on search query):
Google "Django Admin Actions"

The problem doesn't show up on DuckDuckGo!

@m-aciek
Contributor

m-aciek commented Feb 26, 2019

Another article seems to support my thesis: https://developers.google.com/search/mobile-sites/mobile-seo/separate-urls. rel:canonical and rel:alternate are treated equally. AFAIC we should make canonicals point to English versions.

@m-aciek
Contributor

m-aciek commented Feb 26, 2019

I've just opened draft pull request #871.

@tobiasmcnulty
Member

tobiasmcnulty commented Feb 27, 2019

@m-aciek @timgraham It is indeed odd that Google chose the 'id' version of that page as canonical, but I'm not sure #871 is the appropriate fix:

screenshot 2019-02-26 21 12 28

From: https://support.google.com/webmasters/answer/139066?hl=en

@tobiasmcnulty
Member

tobiasmcnulty commented Feb 27, 2019

To throw another theory out there, I think we are misusing x-default:

https://webmasters.googleblog.com/2013/04/x-default-hreflang-for-international-pages.html

This page seems to suggest that use of x-default should be limited to pages that have no specific language (it's not a "default language"). We appear to render it on all docs pages with a link to the English version of the page, so perhaps that's confusing Google?

https://github.com/django/djangoproject.com/blob/master/djangoproject/templates/docs/doc.html#L26-L30

We also only render <link rel="alternate" ..> for the canonical version of the docs:

https://github.com/django/djangoproject.com/blob/master/docs/views.py#L52

It would seem appropriate to render that for all versions?

I put up a PR with these and some related changes here: #872
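
As a rough illustration of rendering the alternates on every docs version (this is not the code from #872), the sketch below builds the head links for any language and version. `head_links` is a hypothetical helper, it keeps a self-referencing canonical, and it leaves out x-default entirely.

```python
from html import escape


def head_links(lang, version, path, languages, base="https://docs.djangoproject.com"):
    """Return <link> tags for one docs page: a self-referencing canonical
    plus one rel="alternate" hreflang entry per available language."""

    def url(code):
        return f"{base}/{code}/{version}/{path}"

    links = [f'<link rel="canonical" href="{escape(url(lang))}">']
    for code in languages:
        links.append(
            f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url(code))}">'
        )
    return links


# The same function is used for an older version, not only the current one.
links = head_links("id", "2.0", "intro/tutorial02/", ["en", "id"])
```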

@m-aciek
Contributor

m-aciek commented Feb 27, 2019

FTR: issue #621 started with a similar topic and kicked off the SEO work for djangoproject.com.

timgraham pushed a commit to tobiasmcnulty/djangoproject.com that referenced this issue Feb 27, 2019
@tobiasmcnulty
Member

tobiasmcnulty commented Feb 27, 2019

Good find @m-aciek .

@apollo13 It looks like my PR #872 partly reversed what you did here: d6a966f#diff-edc52c8f3a604a128e8f302806fb9262

Any memory of what the reason was for that and/or do you have any objections to showing the hreflang tags on all docs versions (not just the canonical one)?

@tobiasmcnulty
Member

If this doesn't work, another thing we might try is refactoring the sitemap to use hreflang-style link declarations as described here: https://support.google.com/webmasters/answer/189077?hl=en

Right now it looks like each language gets its own sitemap.
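
A sketch of what a single sitemap with hreflang-style declarations might look like, generated with the standard library. `build_sitemap` and the page data are assumptions for illustration; the site's real sitemap code is not shown in this thread.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize with a default sitemap namespace and an "xhtml" prefix.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap(pages):
    """pages: list of dicts mapping language code -> URL for one document.

    Every translation gets its own <url> entry that lists all
    translations (including itself) as xhtml:link alternates.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for translations in pages:
        for loc in translations.values():
            url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
            for alt_lang, alt_href in translations.items():
                ET.SubElement(
                    url,
                    f"{{{XHTML_NS}}}link",
                    rel="alternate",
                    hreflang=alt_lang,
                    href=alt_href,
                )
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    {
        "en": "https://docs.djangoproject.com/en/2.1/intro/tutorial02/",
        "id": "https://docs.djangoproject.com/id/2.1/intro/tutorial02/",
    }
])
```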

@apollo13
Member

apollo13 commented Feb 28, 2019 via email

@apollo13
Member

@tobiasmcnulty I was mainly following #621 (comment) when coming up with which rels should point where… I'll see if I can give you access to the Google search tools.

@tobiasmcnulty
Member

Thanks @apollo13. I see your commit implemented exactly what that page recommends. I guess we can see how it behaves with the hreflang tags on all pages for a bit and then revert if there's a regression.

@timgraham @m-aciek I found this link, which seems to suggest that making English (or any one language) the canonical version of the page is not correct (see the "Most common mistakes implementing hreflang and canonical tags" heading): https://www.portent.com/blog/seo/implement-hreflang-canonical-tags-correctly.htm

But again, who knows if these 3rd parties have it correct or not. 😕

@tobiasmcnulty
Member

tobiasmcnulty commented Mar 2, 2019

Here's a list of the current (last updated by Google on 2/25/19) URLs that Google chose as canonical instead of the ones we suggested. None of them looks particularly worrisome: https://docs.google.com/spreadsheets/d/16oYtNJVhqAVH7wyIza10z1Pv4NhpSbx8g0QG9EpDGQE/edit#gid=0

Also, I've taken a snapshot of the full current index coverage report here, with some commentary and links to other, related issues: https://docs.google.com/spreadsheets/d/1l86YAEcw5CbvivuY-ZN81oy75Nh7T9g0eJX0sZ8Jww0/edit#gid=0

In particular #878 may be relevant to this issue.

@tobiasmcnulty
Member

Still not fixed 🙁

Screenshot 2019-03-18 20 55 05

@apollo13
Member

Hrmpf :( If google would document how that stuff is supposed to work :( I mean it worked for years :/

@wolph

wolph commented Mar 24, 2019

Indeed, this was scraped 2 days ago and in French: https://webcache.googleusercontent.com/search?q=cache:uxFlZ6Hw5RMJ:https://docs.djangoproject.com/en/2.1/topics/db/examples/many_to_many/+&cd=1&hl=en&ct=clnk&gl=nl

Not sure if it's somehow possible to purge this from the Google results using the webmaster tools, but the tags look mostly OK right now. The only thing that's wrong is that there is a pt-BR, for example, but no pt.
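
A quick way to spot that kind of gap is sketched below. `missing_base_languages` is a hypothetical helper and the input list is invented; whether a generic `pt` entry is actually required is not established in this thread.

```python
def missing_base_languages(hreflangs):
    """Return base languages (e.g. 'pt') that appear only with a region
    suffix (e.g. 'pt-BR') in the given hreflang values."""
    codes = {code.lower() for code in hreflangs}
    missing = set()
    for code in codes:
        if code == "x-default":
            continue  # not a language code, so it has no base language
        base, sep, _region = code.partition("-")
        if sep and base not in codes:
            missing.add(base)
    return sorted(missing)


gaps = missing_base_languages(["en", "fr", "pt-BR", "x-default"])
```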

@stale

stale bot commented Oct 6, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 6, 2022
@timgraham
Member

I'd think we'd have some recent activity since 2019 if this were still an issue, so closing, at least for now.


10 participants