Nonsense generated for haw (wrong translation) #70

Open
avisitor opened this issue Mar 6, 2021 · 23 comments

@avisitor

avisitor commented Mar 6, 2021

const translate = require('@vitalets/google-translate-api');

var text = 'Ia hala ana mai o ka waa o Pele, ia wa i hoouna mai ai o Kahinalii ka makuahine, i ke kai hoee nui a ka launa ole, a lewa ana ka waa o Honuaiakea iluna o ka halehale hanupanupa kuhoho a kawehaweha o ke kai. Ua huahuai ae la na mapuna o ke kai ma lalo ae o ka papaku o ka moana, hakikili ka ua mai ka lani mai. Olaolapa ka uwela i ka lewa uli, nakolokolo ikuwa ka leo papaaina o ka hekili, huikau ka lewa nuu, ka lewa lalo. Auwe! He ino!!';
translate(text, {from: 'haw', to: 'en'}).then(res => {
  console.log(res.text);
  //console.log(res.from.language.iso);
}).catch(err => {
  console.error(err);
});


$ node translate.js
The youngest birdmen came to whom there, the whole drivenaka had ate fish, and in the land of the sea Shell.All the board of the sea of the sea became the bottom of the sea floor, the feature of the sky.The air, Do otherwise social, Judge Mount Hords, communions to the air Nuunu.Wow!Is a bad !!


When pasted into translate.google.com:

When Pele's canoe passed, the mother sent Kahinalii to paddle a great and incompatible paddle, and the canoe from Honuaiakea flew over the mysterious building and the depths of the sea. The waves of the sea poured forth beneath the sea floor, and the rain from heaven thundered. The lightning flashed in the dark sky, the table sound of thunder roared, the sky above and the sky below were confused. Alas! It's bad !!

@songkeys

songkeys commented Mar 11, 2021

I have been encountering the same issue. I think v5 produces lower-quality translations than v4, perhaps because it uses a different API.

See also #71.

Edit: I tried downgrading the SDK to v4, but every response is now BAD_REQUEST (#64), so I can't use it any more. The end of an era 😢

@plainheart

The random nonsense may be because the server doesn't receive a required header called X-Goog-BatchExecute-Bgr, so our request is classified as coming from a robot or as otherwise illegitimate. I haven't figured out the concrete algorithm yet; judging from the code, it looks very complicated. It would be great if anyone could help work out the encryption logic behind this header.
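
To make the idea concrete, here is a minimal sketch of where such a header would go once a value is available. The helper name buildTranslateHeaders, the bgrToken argument, and the Content-Type value are assumptions for illustration; computing a valid value is exactly the unsolved part, so the sketch only attaches a value obtained elsewhere.

```javascript
// Hypothetical helper: builds the header set for a translate request.
// `bgrToken` is assumed to be a value obtained elsewhere (for example,
// copied from a browser session); this sketch does not compute it.
function buildTranslateHeaders(bgrToken) {
  const headers = {
    // Assumed content type for a form-encoded POST body.
    'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8',
  };
  // Only send the header when we actually have a value; a missing or
  // empty token is omitted rather than sent as a blank header.
  if (typeof bgrToken === 'string' && bgrToken.length > 0) {
    headers['X-Goog-BatchExecute-Bgr'] = bgrToken;
  }
  return headers;
}

console.log(buildTranslateHeaders('token-from-browser'));
```

Without a token, the function returns the same headers minus X-Goog-BatchExecute-Bgr, which corresponds to the requests that currently produce the degraded output.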

@songkeys

@plainheart Nice investigation! I can confirm this: with X-Goog-BatchExecute-Bgr, the result is the same as in the web browser.

@plainheart

I forked this repo and added two new endpoints that work better than the current website endpoint. I still hope the algorithm behind the header can be figured out.

@songkeys

I generated an X-Goog-BatchExecute-Bgr yesterday. After 24+ hours, today I found that I can still use it to get the accurate result. I'm not sure how long this ("token"?) will remain valid but I'll keep checking this.

It seems that we can at least use puppeteer + cache method for it if it's hard to extract the algorithm.

@plainheart

I dug into the source code once, and the header value appears to be derived from the query string. If the query string changes, the value should, in theory, become invalid immediately. Does it still work for you with text different from the original?

@songkeys

Yes, you were right. It's generated from the query string, and it doesn't work if I change my text. So we have to extract the algorithm after all.
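
Since a header value is only valid for the text it was generated for, any cache has to be keyed by the query text. The following is a rough sketch of that idea, not a working solution: generateHeader is a hypothetical async function supplied by the caller (for example, something that drives a real browser through Puppeteer), and createHeaderCache is an illustrative name.

```javascript
// Sketch: cache header values per query text, because a value generated
// for one text does not work for another.
// `generateHeader` is a hypothetical async function supplied by the caller.
function createHeaderCache(generateHeader) {
  const cache = new Map(); // query text -> Promise of header value

  return function getHeader(text) {
    if (!cache.has(text)) {
      // Store the promise itself so concurrent calls for the same text
      // share a single generation run.
      const pending = Promise.resolve(generateHeader(text)).catch((err) => {
        cache.delete(text); // do not cache failures
        throw err;
      });
      cache.set(text, pending);
    }
    return cache.get(text);
  };
}

// Usage with a stand-in generator that only counts its calls.
let calls = 0;
const getHeader = createHeaderCache(async (text) => `fake-${++calls}-${text.length}`);

getHeader('hello').then((first) =>
  getHeader('hello').then((second) => {
    console.log(first === second, calls); // prints: true 1
  })
);
```

This only helps when the same text is translated repeatedly; every new text still needs a fresh value.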

@xsxiong

xsxiong commented Mar 24, 2021

> I generated an X-Goog-BatchExecute-Bgr yesterday. After 24+ hours, today I found that I can still use it to get the accurate result. I'm not sure how long this ("token"?) will remain valid but I'll keep checking this.
>
> It seems that we can at least use puppeteer + cache method for it if it's hard to extract the algorithm.

How did you generate an X-Goog-BatchExecute-Bgr? Thank you!

@vitalets
Owner

Thanks for the research!
Does the problem occur only on haw?
I've tested ru -> en: the library's translation differs from the Google Translate website, but is still very close in meaning.

@plainheart

@vitalets Hi, thanks for your reply.

> Does the problem occur only on haw?

No. I'm not sure whether other languages have the same issue, but I can confirm it also exists for Chinese (zh).

> the library's translation differs from the Google Translate website, but is still very close in meaning.

Please refer to #71. Although the translation is generally correct, the sentences contain many unexpected mixed upper-case and lower-case letters, which makes them harder to read. Also, while the translation is close in meaning, it has many small grammar issues compared to the website.

Everything would be okay if a valid value for the header X-Goog-BatchExecute-Bgr could be provided. However, it seems hard to figure out how to calculate it.

@vitalets
Owner

> Everything would be okay if a valid value for the header X-Goog-BatchExecute-Bgr could be provided. However, it seems hard to figure out how to calculate it.

Yeah :( Google protects the batch API from such access.

@f0enix

f0enix commented Apr 26, 2021

@vitalets I currently use your google-translate-api Node.js library, but many users have been complaining that the translation is not accurate and does not match the results from the Google Translate website, so I have been looking into different ways of getting better translations. I found that if you use the URL translate.google.com/translate_a/t or translate.google.com/translate_a/single along with a client like dict-chrome-ex, the results match the ones from the Google Translate website. Since we have not been able to figure out how to generate the X-Goog-BatchExecute-Bgr header, I would suggest looking into the translate_a URLs.

Here is an example of the translate.googleapis.com/translate_a/single/ URL.

I also found that the google-translate-open-api library uses the translate.google.com/translate_a/t URL, so I was able to use it as follows to replace google-translate-api on my server and get more accurate translations:

const translateOpenApi = require('google-translate-open-api');
translateOpenApi.default(`some text`, {
    client: "dict-chrome-ex",
    to: 'en'
  }).then(result => {
    const translatedText = result.data.sentences.map(s => s.trans).join('');
    var formattedResult = {
      text: translatedText, 
      from :{
        language: { iso: result.data.src }
      }
    }
    console.log(formattedResult)
  })

Hope this helps and gets integrated into google-translate-api.
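
For readers who want to see what a request to the translate_a/single route might look like, here is a small sketch that only builds the URL. The host, path, and client value are taken from the comment above; the parameter names sl, tl, dt, and q are assumptions based on commonly seen requests to this endpoint, not documented API, and buildSingleUrl is a made-up helper name.

```javascript
// Sketch: build a request URL for the translate_a/single endpoint.
// sl, tl, dt and q are assumed parameter names, not a documented API.
function buildSingleUrl(text, { from = 'auto', to = 'en', client = 'dict-chrome-ex' } = {}) {
  const params = new URLSearchParams({
    client,
    sl: from, // source language; "auto" is assumed to request detection
    tl: to,   // target language
    dt: 't',  // assumed to request the translated text
    q: text,  // the text to translate, URL-encoded by URLSearchParams
  });
  return `https://translate.googleapis.com/translate_a/single?${params}`;
}

console.log(buildSingleUrl('Auwe! He ino!!', { from: 'haw', to: 'en' }));
```

Building the query with URLSearchParams keeps punctuation and non-ASCII text correctly encoded, which matters for inputs like the Hawaiian sample in this issue.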

@ArtanisTheOne

> @vitalets I currently use your google-translate-api Node.js library, but many users have been complaining that the translation is not accurate and does not match the results from the Google Translate website, so I have been looking into different ways of getting better translations. I found that if you use the URL translate.google.com/translate_a/t or translate.google.com/translate_a/single along with a client like dict-chrome-ex, the results match the ones from the Google Translate website. Since we have not been able to figure out how to generate the X-Goog-BatchExecute-Bgr header, I would suggest looking into the translate_a URLs.
>
> Here is an example of the translate.googleapis.com/translate_a/single/ URL.
>
> I also found that the google-translate-open-api library uses the translate.google.com/translate_a/t URL, so I was able to use it as follows to replace google-translate-api on my server and get more accurate translations:
>
> const translateOpenApi = require('google-translate-open-api');
> translateOpenApi.default(`some text`, {
>     client: "dict-chrome-ex",
>     to: 'en'
>   }).then(result => {
>     const translatedText = result.data.sentences.map(s => s.trans).join('');
>     var formattedResult = {
>       text: translatedText, 
>       from :{
>         language: { iso: result.data.src }
>       }
>     }
>     console.log(formattedResult)
>   })
>
> Hope this helps and gets integrated into google-translate-api.

That is true for the moment; however, I suspect Google will soon turn that off in favor of their new RPC method.

@songkeys

songkeys commented Nov 21, 2021

For anyone who wants an accurate translation: I have an alternative solution that uses Puppeteer to scrape the result directly: https://github.com/Songkeys/Translateer. It needs more resources (because of Puppeteer) and is perhaps slower (around 1~5s per response; edit: after an upgrade, it should be within 500ms), but it's accurate.

@vitalets
Owner

> For anyone who wants an accurate translation: I have an alternative solution that uses Puppeteer to scrape the result directly: https://github.com/Songkeys/Translateer. It needs more resources (because of Puppeteer) and is perhaps slower (around 1~5s per response), but it's accurate.

Good approach. Will add to readme.

@allohamora

Anyone looking for an accurate translation can use that code. It uses another Google Translate route and produces the same translation result as the website.

@poowu

poowu commented Mar 24, 2022

> @vitalets I currently use your google-translate-api Node.js library, but many users have been complaining that the translation is not accurate and does not match the results from the Google Translate website, so I have been looking into different ways of getting better translations. I found that if you use the URL translate.google.com/translate_a/t or translate.google.com/translate_a/single along with a client like dict-chrome-ex, the results match the ones from the Google Translate website. Since we have not been able to figure out how to generate the X-Goog-BatchExecute-Bgr header, I would suggest looking into the translate_a URLs.
>
> Here is an example of the translate.googleapis.com/translate_a/single/ URL.
>
> I also found that the google-translate-open-api library uses the translate.google.com/translate_a/t URL, so I was able to use it as follows to replace google-translate-api on my server and get more accurate translations:
>
> const translateOpenApi = require('google-translate-open-api');
> translateOpenApi.default(`some text`, {
>     client: "dict-chrome-ex",
>     to: 'en'
>   }).then(result => {
>     const translatedText = result.data.sentences.map(s => s.trans).join('');
>     var formattedResult = {
>       text: translatedText, 
>       from :{
>         language: { iso: result.data.src }
>       }
>     }
>     console.log(formattedResult)
>   })
>
> Hope this helps and gets integrated into google-translate-api.

It shows that "TypeError: Cannot read properties of undefined (reading 'map')".
Doesn't work anymore.
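
The TypeError suggests that the response no longer contains a sentences array in the place the snippet expects. A defensive version of the parsing step could look like the sketch below; the sentences, trans, and src fields are the ones used in the quoted snippet, while the function name extractTranslation and the error message are illustrative. It does not restore the endpoint, it only turns the failure into a clearer error.

```javascript
// Sketch: extract translated text without assuming the response shape.
function extractTranslation(data) {
  const sentences = data && data.sentences;
  if (!Array.isArray(sentences)) {
    // The endpoint answered with a different shape than expected; fail
    // with a descriptive error instead of a TypeError on `.map`.
    throw new Error('Unexpected response shape: no "sentences" array');
  }
  return {
    // Entries without a `trans` field contribute an empty string.
    text: sentences.map((s) => s.trans || '').join(''),
    from: { language: { iso: data.src } },
  };
}

// Usage with a hand-written object in the shape the original snippet assumed.
console.log(extractTranslation({ src: 'haw', sentences: [{ trans: 'Alas! ' }, { trans: "It's bad!!" }] }));

try {
  extractTranslation(undefined);
} catch (err) {
  console.error(err.message);
}
```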

@kevinvugts

@vitalets

Is there any working solution for the incorrect translations?
Or any other way of integrating translations with Google?

I am very curious whether someone has a solution to the problem.

@saviourdog

any update? thanks

vitalets changed the title from "Nonsense generated for haw" to "Nonsense generated for haw (wrong translation)" on Oct 14, 2022
@AidanWelch

> Yes, you were right. It's generated from the query string, and it doesn't work if I change my text. So we have to extract the algorithm after all.

I've found that it's related to your client IP, however. From my real, unproxied IP, autocorrect fails in some cases (without X-Goog-BatchExecute-Bgr sent), but when testing through a VPN (or in GitHub Actions) the header is not required. It could be a check that is only applied when an IP makes a high number of requests (or it could be related to the provider/classification of the IP).

@AidanWelch

Certain networks require the X-Goog-BatchExecute-Bgr header to be sent on requests, or autocorrect will not be applied to some translations (seemingly typos where a letter is dropped, such as "I spea Dutch!" instead of "I speak Dutch!").

I believe the code for generating this header is found in this static script, in xH.prototype.s().

This would likely take a while to fix.

From google-translate-api-x#18

@vitalets
Owner

vitalets commented Oct 18, 2022

Hey everyone!
Following the advice of @allohamora, I've fully rewritten the library to use another Google Translate route (see #70 (comment)). The translation now exactly matches the result from the Google Translate website.

The new version is available on npm under the next tag:

npm install @vitalets/google-translate-api@next

The original text of this issue is translated correctly:

import { translate } from '@vitalets/google-translate-api';

const { text } = await translate('Ia hala ana mai o ka waa o Pele, ia wa i hoouna mai ai o Kahinalii ka makuahine, i ke kai hoee nui a ka launa ole, a lewa ana ka waa o Honuaiakea iluna o ka halehale hanupanupa kuhoho a kawehaweha o ke kai. Ua huahuai ae la na mapuna o ke kai ma lalo ae o ka papaku o ka moana, hakikili ka ua mai ka lani mai. Olaolapa ka uwela i ka lewa uli, nakolokolo ikuwa ka leo papaaina o ka hekili, huikau ka lewa nuu, ka lewa lalo. Auwe! He ino!!');
console.log(text);

Output:

When Pele's boat passed, Kahinalii, the mother, sent a great storm of waves, and Honuaiakea's boat hovered over the hanupanupa house, which was deep and divided by the sea. The springs of the sea broke out below the sea floor, rain fell from the sky. The lightning is bright in the green sky, the sound of the thunder is echoing, the green sky is confused, the lower air is confused. Alas! It's bad!!

Output from website:
[screenshot of the translate.google.com output]

I've also removed all outdated and vulnerable dependencies, added support for React Native, and rewritten the library in TypeScript.

I would appreciate it if you installed and checked this beta version in your scenarios and shared your feedback here. If everything is OK, I'm ready to release it as the main version. Please note that the new version's API shape is incompatible with the previous one. Thanks in advance!
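
For anyone checking the beta against the website, a small comparison helper can make differences easier to spot. This is only a sketch with made-up names (normalize, firstDifference); it assumes you paste the website output in by hand and it ignores whitespace differences introduced by copying.

```javascript
// Sketch: compare a library translation with text copied from the website.
// Whitespace is collapsed first so that line breaks or double spaces picked
// up while copying do not count as differences.
function normalize(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Returns the index of the first differing character in the normalized
// strings, or -1 when they are identical.
function firstDifference(libraryText, websiteText) {
  const a = normalize(libraryText);
  const b = normalize(websiteText);
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    if (a[i] !== b[i]) return i;
  }
  return a.length === b.length ? -1 : shared;
}

console.log(firstDifference("Alas! It's bad!!", "Alas!  It's bad!!\n")); // -1
console.log(firstDifference("Alas! It's bad!!", "Alas! It's bad !!"));   // 14
```

The second call shows the kind of difference visible in this thread: the website output quoted in the first comment ends in "bad !!" while the new library output ends in "bad!!".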
