GoogleTranslateV1 does not return the same results as translate.google.com (the website) #22
my bad sorry lol
@NawtJ0sh Alright, so I tested what you were talking about on my machine, and it seems that translate.google.com actually gives the same results as GoogleTranslateV2 for some reason (which is weird, considering that GoogleTranslateV1 uses the batchexecute API, as does translate.google.com). Test results:

```python
>>> from translatepy.translators.google import GoogleTranslateV1, GoogleTranslateV2
>>> GoogleTranslateV1().translate('كانت الحياة هناك ، على ما يبدو حلما ، على وشك أن تتحول إلى كابوس. "(وقفة درامية)" مرحبًا يا رفاق! لقد وجدت للتو طريقة جديدة رائعة لحلاقة شعر خصلاتي!', "eng")
TranslationResult(service=Google, source=كانت الحياة هناك ، على ما يبدو حلما ، على وشك أن تتحول إلى كابوس. "(وقفة درامية)" مرحبًا يا رفاق! لقد وجدت للتو طريقة جديدة رائعة لحلاقة شعر خصلاتي!, source_language=ara, destination_language=eng, result=Life was there, apparently a dream, about to turn into a nightmare."(Dramatic pause)" Hello you guys!I just found a wonderful new way to shave my hair!)
# Python was restarted to clear the caches
>>> GoogleTranslateV2().translate('كانت الحياة هناك ، على ما يبدو حلما ، على وشك أن تتحول إلى كابوس. "(وقفة درامية)" مرحبًا يا رفاق! لقد وجدت للتو طريقة جديدة رائعة لحلاقة شعر خصلاتي!', "eng")
TranslationResult(service=Google, source=كانت الحياة هناك ، على ما يبدو حلما ، على وشك أن تتحول إلى كابوس. "(وقفة درامية)" مرحبًا يا رفاق! لقد وجدت للتو طريقة جديدة رائعة لحلاقة شعر خصلاتي!, source_language=ara, destination_language=eng, result=Life there, apparently a dream, was about to turn into a nightmare. (dramatic pause) Hey guys! I just found a great new way to shave my tresses!)
```

Results from the website: [screenshot]
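To pinpoint exactly where the two outputs above diverge, a word-level diff helps. A minimal sketch using only the standard library's `difflib`; `word_diff` is a hypothetical helper (not part of translatepy), and the two strings are the V1 and V2 results quoted above:

```python
import difflib


def word_diff(a: str, b: str) -> list:
    """Return the word-level changes between two translations.

    Entries starting with '- ' occur only in `a`; entries starting with '+ ' only in `b`.
    """
    diff = difflib.ndiff(a.split(), b.split())
    # ndiff also emits '  ' (unchanged) and '? ' (hint) lines; keep only real changes
    return [line for line in diff if line.startswith(("- ", "+ "))]


v1 = ('Life was there, apparently a dream, about to turn into a nightmare."(Dramatic pause)" '
      'Hello you guys!I just found a wonderful new way to shave my hair!')
v2 = ('Life there, apparently a dream, was about to turn into a nightmare. (dramatic pause) '
      'Hey guys! I just found a great new way to shave my tresses!')

for change in word_diff(v1, v2):
    print(change)
```

Splitting on whitespace keeps the diff readable for short sentences; note that it also flags pure spacing/punctuation differences such as `guys!I` vs `guys! I`.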
I did some experiments and found out that the web app uses a JSON RPC interface (batchexecute), while the mobile app uses a regular API (HTTP?) interface to communicate with the server. In translatepy, the JSON RPC interface is implemented in the GoogleTranslateV1 class and the API in the GoogleTranslateV2 class.

Now back to the problem: why doesn't GoogleTranslateV1 return the same result as the web app, given that both work through the same interface? The answer is simple: Google has implemented a special mechanism to prevent abuse of the free text-translation loophole. Previously this was done with the TKK token; now (in JSON RPC) it is done with an encrypted header, x-goog-batchexecute-bgr. I was able to find this out after several hours of experimentation. If the x-goog-batchexecute-bgr header is correct, the server returns a neater translation (like in the web app); if not, it returns an alternative, less neat translation of the text (lol). Now look at the screenshots below and compare them with the result from the GoogleTranslateV1 class: are they similar?

It would seem that the fix is simply to insert the x-goog-batchexecute-bgr header value into the GoogleTranslateV1 implementation, and everything would work fine. I thought so too: I inserted the value of x-goog-batchexecute-bgr, but the server still returned the alternative translation. I don't even know where to dig here, so I'll leave my working curl command for translating the text, which works fine.
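For reference, this is roughly where that header sits in a batchexecute call. A minimal sketch that only builds the request and never sends it; the endpoint URL, the `MkEWBc` RPC id and the `[[text, source, target, True], [None]]` argument layout are assumptions based on observed web-app traffic (not documented by Google and not confirmed in this thread), and the bgr value is treated as an opaque string because how it is generated is exactly the open question here:

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# ASSUMPTION: endpoint and RPC id as seen in web-app traffic; neither is documented by Google.
BATCHEXECUTE_URL = "https://translate.google.com/_/TranslateWebserverUi/data/batchexecute"
TRANSLATE_RPC_ID = "MkEWBc"


def build_batchexecute_body(text: str, source: str, target: str) -> str:
    """Encode one translate call as the `f.req` form field of a batchexecute POST."""
    # ASSUMPTION: argument layout [[text, source, target, True], [None]].
    # The arguments are serialized to a JSON string, which is then nested in the outer envelope.
    inner = json.dumps([[text, source, target, True], [None]], ensure_ascii=False)
    envelope = [[[TRANSLATE_RPC_ID, inner, None, "generic"]]]
    return urllib.parse.urlencode({"f.req": json.dumps(envelope, ensure_ascii=False)})


def build_request(text: str, source: str, target: str,
                  bgr: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the POST request, optionally carrying the bgr header."""
    url = BATCHEXECUTE_URL + "?" + urllib.parse.urlencode({"rpcids": TRANSLATE_RPC_ID})
    headers = {"Content-Type": "application/x-www-form-urlencoded;charset=utf-8"}
    if bgr is not None:
        # Opaque value: it can be copied from a browser session, but not generated here.
        headers["x-goog-batchexecute-bgr"] = bgr
    data = build_batchexecute_body(text, source, target).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")
```

Sending would just be `urllib.request.urlopen(build_request(...))`, but as noted above, even a copied bgr value did not bring back the web app's translation, so this only shows the shape of the request, not a fix.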
@ZhymabekRoman Wait, so is the JSON RPC result (GoogleTranslateV1) "less good" than the normal API one (GoogleTranslateV2) without the …
Yes. See also: vitalets/google-translate-api#70, vitalets/google-translate-api#79, UlionTse/translators#35, vitalets/google-translate-api#71
Hi guys. GoogleV2 doesn't work for me, …
@nnolex, I tested GoogleTranslateV2, and it works fine: [screenshot]
@nnolex Please open a new issue ~ (also, could you provide an example to reproduce your issue?)
I will try to start investigating how the tokens are generated (God, please save me). If there are any new updates, I will post them here.
Seems like this guy tried to make some drafts, but nada: https://github.com/lzy1960/google-translate/blob/main/packages/src/translate.ts
Good luck lmao. I tried before but didn't have time to finish (the scripts I copied are in the playground, I think).
Google Translate probably uses this toolkit to minify its code: https://github.com/google/closure-compiler
Yup, maybe. I've already used it before.
I'm still alive after trying to debug Google Translate lol. I spent about 120 hours trying to understand how exactly the tokens are generated, and unfortunately it's really hard. First I tried to debug it manually, and that was way too "fast" (sarcasm). Because Google Translate is built on top of some framework (can anyone guess which one?) and its code is minified, tracing the value manually is not feasible. So I tried to automate the process: set a debugger breakpoint, step through everything while recording all values, and at the end find which function generates the token by searching the recorded values. I rented a VDS server with the maximum RAM the provider offered (16 GB) and ran Chrome with a Python script that presses F9 to step in, plus mitmproxy to capture all the Chrome debugger values. And... Chrome ate all the memory and crashed. I tried adding a 100 GB pagefile/swapfile, with the same results on Linux and Windows. I don't know how we can debug this; maybe edit the V8 JS engine lol, or try Firefox, which I think can properly use such a big swap/pagefile at full size. Any other ideas?
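The "search the recorded values" step of that approach can be sketched independently of how the trace is captured. A minimal sketch, assuming the debugger output has already been turned into a sequence of records in step order; the `Frame` record and its fields are a hypothetical format, not what mitmproxy or Chrome actually emit:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Frame:
    """One recorded debugger step (hypothetical format for the captured trace)."""
    step: int       # order in which the debugger reached this point
    function: str   # (minified) function name reported at that step
    variable: str   # name of the scope variable the value was read from
    value: str      # the value, stringified


def frames_containing(trace: Iterable[Frame], fragment: str) -> Iterator[Frame]:
    """Yield every frame whose value contains `fragment`, assuming the trace is in step order.

    Works on a lazily read trace, so the whole capture never has to sit in RAM at once.
    """
    for frame in trace:
        if fragment in frame.value:
            yield frame


def first_producer(trace: Iterable[Frame], token: str) -> Optional[Frame]:
    """Return the earliest frame in which the full token shows up, or None."""
    return next(frames_containing(trace, token), None)
```

Searching for a fragment (for example the first few characters of the token) instead of the whole value can show where the token is assembled piece by piece, rather than only where it appears complete.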
Lmaoooo what, how did it crash?
Something like "render process gone". PS: Ahhh, I didn't mention in my previous message that Chrome (or, more precisely, the kernel) didn't give the browser the full swap/pagefile space, only about 2-3% of it. The kernel (on both Windows and Linux) just kills the browser process for lack of RAM while about 95 GB of swap/pagefile sits empty :/ New article in the biggest newspaper: How to make Chrome eat up all your PC's RAM. PS 1: I also tried JITless mode for the V8 engine, same result :/
I was working on Google's batchexecute and everything was going smoothly until I saw a big chunk of code which I guess is actually generating the token... |
This is what I need to reverse engineer now 🎐

```javascript
function(r) {
    switch (r.g) {
    case 1:
        c = c.trim();
        c.length > f.i && (_.RF("translateText query over character limit. Length: " + c.length + " Limit: " + f.i),
        c = c.substring(0, f.i).trim());
        var u = new _.Eq;
        u = _.Id(u, 2, a);
        u = _.Id(u, 3, b);
        g = _.Dc(u, 4, _.Ob(d), !1).Tb(c);
        e && (u = new _.DX,
        u = _.zj(u, 1, e),
        _.H(g, _.DX, 5, u));
        u = new _.Oq;
        u = _.H(u, _.Eq, 1, g);
        var v = f.g;
        var w = new DZ;
        v = v.g;
        w = _.Aj(w, 1, Zfb(v.W ? 3 : v.s));
        k = _.H(u, DZ, 2, w);
        _.Rf(r, 2, 3);
        m = Date.now();
        return _.C(r, f.j.g(_.Xja.qb(k)), 5);
    case 5:
        n = r.i;
        f.g.g.qa = _.I(n, 3);
        u = f.g;
        w = {
            kp: Date.now() - m,
            a2: _.BC(n),
            S3: _.Nq(n)
        };
        w = void 0 === w ? {} : w;
        v = w.kp;
        var x = w.a2
          , E = w.S3;
        w = _.aN(u, 338);
        if (v) {
            var D = new _.iO;
            v = _.wj(D, 1, v);
            _.H(w, _.iO, 82, v)
        }
        if (x) {
            v = new HZ;
            D = _.Lq(x);
            D = _.A(D);
            for (var K = D.next(); !K.done; K = D.next())
                K = $fb(K.value),
                _.hj(v, 1, FZ, K);
            (x = _.AC(x)) && (_.kj(w, 16) !== x.Qa() || _.kj(w, 1) !== _.I(x, 3) || _.kj(w, 52).trim() !== x.Ua()) && _.vj(v, 2, !0);
            if (E) {
                x = [];
                E = _.A(E);
                for (D = E.next(); !D.done; D = E.next())
                    if (K = _.Lq(D.value),
                    0 !== K.length) {
                        D = new GZ;
                        K = _.A(K);
                        for (var T = K.next(); !T.done; T = K.next())
                            T = $fb(T.value),
                            _.hj(D, 1, FZ, T);
                        x.push(D)
                    }
                _.gj(v, 3, x)
            }
            _.H(w, HZ, 115, v)
        }
        _.bN(u, w);
        if (u = !_.Pg(c))
            a: {
                if (_.dj(n, _.Kq, 2))
                    for (u = 0; u < _.Lq(_.BC(n)).length; u++)
                        if (w = _.Lq(_.BC(n))[u],
                        !_.Pg(_.dJ(w))) {
                            u = !1;
                            break a
                        }
                u = !0
            }
        u && (u = f.g,
        w = c,
        v = _.I(n, 3),
        E = _.fN(u, 166),
        _.WM(_.VM(_.GM(_.FM(E, a), b), v), w),
        _.YM(u.i, 166),
        _.bN(u, E));
        return r.return(n);
    case 3:
        _.Vf(r);
        u = f.g;
        w = _.ZM(u.g, {
            Cz: !0,
            uG: !0
        });
        w = _.zj(w, 31, 1);
        _.bN(u, w);
        f.g.g.g = 0;
        _.Wf(r, 0);
        break;
    case 2:
        throw q = _.Uf(r),
        _.RF("Error getting translation", q),
        q;
    }
}
```
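The numbered `case` labels in `switch (r.g)` look like the state machine a compiler produces when it transpiles an `async` function down to older JavaScript (the Closure Compiler mentioned earlier can emit this shape). Reading it that way is an assumption, not something confirmed here: `_.Rf(r, 2, 3)` would register a catch block at case 2 and a finally block at case 3, `_.C(r, ..., 5)` would await the RPC and resume at case 5, and `r.return(n)` would return the response through the finally block. Under that reading, the control flow can be sketched in Python; `translate_text`, `send_rpc` and the request dict are hypothetical stand-ins, and only the field numbers 2 and 3, the limit check and the log messages come from the snippet:

```python
import asyncio
import logging
import time
from typing import Any, Awaitable, Callable, Dict

log = logging.getLogger("translate_sketch")

# Hypothetical: send_rpc stands in for f.j.g(...), the call the snippet awaits.
SendRpc = Callable[[Dict[Any, Any]], Awaitable[Dict[str, Any]]]


async def translate_text(query: str, a: str, b: str, char_limit: int,
                         send_rpc: SendRpc) -> Dict[str, Any]:
    # case 1: trim the query and cut it down to the limit (f.i in the snippet).
    # Note: JS substring counts UTF-16 code units; Python slicing counts code points.
    query = query.strip()
    if len(query) > char_limit:
        log.warning("translateText query over character limit. Length: %d Limit: %d",
                    len(query), char_limit)
        query = query[:char_limit].strip()
    # a and b go into fields 2 and 3; what they mean (e.g. source/target language) is a guess.
    # Fields 4 (d) and 5 (e) are left out of this sketch.
    request: Dict[Any, Any] = {2: a, 3: b, "text": query}
    start = time.monotonic()
    try:                                    # _.Rf(r, 2, 3): catch -> case 2, finally -> case 3
        response = await send_rpc(request)  # _.C(r, ..., 5): await, resume at case 5
        # case 5: the snippet records metrics such as latency (kp) before returning n
        response["latency_ms"] = (time.monotonic() - start) * 1000
        return response                     # r.return(n)
    except Exception:                       # case 2: log the error and rethrow it
        log.exception("Error getting translation")
        raise
    finally:                                # case 3: runs on success and on failure
        log.debug("translate request finished")
```

Driving it with `asyncio.run(translate_text(...))` and a fake `send_rpc` is enough to check the control flow; it says nothing about how the bgr token itself is computed.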
Wow, this is... really bad. If we had a working debugger, we might be able to trace all the values, but for now... Maybe Firefox can save us...
I'm manually tracing all the values using Chromium (Arc)
@ZhymabekRoman Regarding the memory issues: this might be related to a …
@NawtJ0sh Probably... I'm just burnt out after that and waiting for the right moment to do some more reverse engineering. Maybe by modifying Chrome's code (lol).
@ZhymabekRoman @Animenosekai Did anyone succeed in reverse engineering it?
@parmodrana No result...
Not for now. I know it's possible with enough willpower, but after hours of research I always feel like I've wasted my time lol. But I'll still try to find a way.
It's one of the easiest code samples you can find when reverse-engineering Google :)
I assume that no one has found success? |
Hey guys, have you tested https://github.com/Animenosekai/translate/blob/main/translatepy/translators/google.py#L89 with the same text you want to translate into English, and then tested it on the actual site, to see if it returns the same result? Am I doing something wrong, or does it do that for you too?
text:
كانت الحياة هناك ، على ما يبدو حلما ، على وشك أن تتحول إلى كابوس. "(وقفة درامية)" مرحبًا يا رفاق! لقد وجدت للتو طريقة جديدة رائعة لحلاقة شعر خصلاتي!
actual site:
Life there, apparently a dream, was about to turn into a nightmare. (dramatic pause) Hey guys! I just found a great new way to shave my tresses!
python module:
Life was there, apparently a dream, about to turn into a nightmare."(Dramatic pause)" Hello you guys!I just found a wonderful new way to shave my hair!
Originally posted by @NawtJ0sh in #21 (comment)