get_transcript not working #117
Comments
Hi @salonygupta76, could you maybe try and run |
@salonygupta76 are you sure you pulled that html from the same host the module was failing on? The html you uploaded seems just fine and I have no problem extracting transcripts from it. |
I've exposed this service as an API in a dev environment and I'm trying to query it using Postman from my local system. The HTML response I shared was obtained by SSHing into dev and running the curl command you shared. |
Just so that I am understanding you correctly: curling YouTube from your local machine returns the same html as curling YouTube from your server, yet this module works on your machine, but not on your server, right? That seems really odd! What python version are you using btw? And could you please post the exact error message which is returned by this module. |
Hello, I don't know if this is the right place to post; if not, I am very sorry, maybe I should have created a separate issue. I have problems using your tool from my command line in EU countries. When I run it on sites like replit or pythonanywhere, it works fine even with an EU VPN enabled (I suppose the requests are sent from their IPs). When I use a VPN for a non-EU country, it works fine. When I (or my friends) run it from the EU, it doesn't work. Maybe some law has been applied, because I also have trouble with a YouTube tool for downloading live chats. |
@vanyamlb could you please explain your infrastructure a bit more, I am not sure if I am understanding you correctly. Also, what do you mean by live chats? This module only supports transcripts. And what version of this module are you using? |
The tools stopped working at the beginning of April this year. |
Just checked everything again. 1 - under Italy's VPN or any other EU |
Well, I live in Germany and I don't have a problem using this, so I'm pretty sure it's not a problem with EU law 😄 |
@jdepoix the problem is that I am not alone: my friend from Germany has this issue too, and so do friends from other EU countries (without a VPN). If it only happened with a VPN, then sure, I would suspect that... Maybe the ISP is blocking it, or YouTube...? I don't know the explanation... I checked YouTube with my VPN and it worked fine... But once more, when I use the same VPN app for a non-EU country, the tool works! Just wondering... could you please check one thing for me? My friend developed his own tool based on yours (it's called yxd and is also installed through pip). With it, you can scan an entire channel to get transcripts, using a YouTube API key for the video listing plus your tool (by the way, there was an issue about scanning an entire channel; you can tell that person she can use it). You just enter yxd, then enter your API key, then enter yxd -c linktothechannel --first=10 and it starts downloading. I'm just curious whether it works for you in Germany. Thanks in advance! (If it says transcript unavailable while there is one, then it doesn't work, but if your tool works then it should lol) |
I'm sorry @vanyamlb but I can't provide support for other modules. However, I might be able to help you if you upload the HTML you receive when accessing any given video (with subtitles) on youtube.com through curl or a browser. |
@jdepoix with curl I just got "too many requests" errors (429) with the VPN :/ I wonder why they blocked the IP and how to avoid that... |
@vanyamlb probably you're sharing an IP address with other users of that VPN and that IP has been blocked because of too many requests. Is there any way to change the IP address? |
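For reference, a minimal sketch of how one might check for this kind of block programmatically, assuming it shows up as HTTP 429 as in the curl output mentioned above. `probe_status` and `is_rate_limited` are hypothetical helper names, not part of this module; the `opener` parameter exists only so the check can be exercised without network access.

```python
import urllib.error
import urllib.request


def probe_status(url, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, including error statuses such as 429."""
    try:
        with opener(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is still useful.
        return err.code


def is_rate_limited(status):
    """Assumption: a block caused by too many requests is reported as HTTP 429."""
    return status == 429
```

Something like `is_rate_limited(probe_status("https://www.youtube.com/watch?v=-em-_gFlDfQ"))` could then be run from the affected host and from a working one to compare.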
@jdepoix ok, I realize that's possible with a VPN... but there are people who used this on their own IPs and got blocked (while I used it so much too and didn't get blocked :/)... Maybe they have static IPs while I have a dynamic one... Yes, when I changed the IP, it started working, but I don't know how to change it within the country to check if that will help (though it looks like it should) |
Unfortunately, there is no way I know of to get around the block without changing the IP or simply waiting until the block gets removed. So there's not really anything you can do here. I guess this issue has gotten a bit off track. @salonygupta76 any news on your end? |
@salonygupta76 any news? Otherwise I will close this issue. |
Hey @jdepoix, sorry about the delay in getting back. Unfortunately, at this point, I'm unsure what the root cause could be. For a video like this: url (where a transcript exists), sometimes the API simply throws this error instead of retrieving it:
Note that this happens only when accessing the code deployed in one of the dev/prod environments (through an API), not while testing in my local environment. Also, possibly important: sometimes simply rebuilding the project from Jenkins resolves the problem. |
@salonygupta76 where did you deploy your application? Maybe you are running into a problem similar to what @vanyamlb is describing? |
@salonygupta76 but what infrastructure are you hosting your application on? If it is a cloud provider like GCP, AWS etc. it is likely that you are sharing a public IP with other users and therefore are being blocked by YouTube. What happens when you curl |
@jdepoix Infra is AWS. Right now, the block has been lifted and I'm able to get results. Can share Curl response when it reverts to throwing errors. |
Hey @jdepoix, I'm facing the issue yet again, and the error is mostly the "Video is not available..." one when trying your code. When I run curl -L https://www.youtube.com/watch?v=-em-_gFlDfQ, I get the following response:
Is this the curl response you're looking for? There seems to be some request limit, which when surpassed throws this error. |
Hi @salonygupta76, thank you very much for the detailed information. That is exactly what I was looking for. Unfortunately, this confirms my assumption that you are being blocked by YouTube. The only way to work around this is to
I am aware that none of these solutions are great, but it's all we can do unfortunately (at least afaik). |
@jdepoix Yes, I've thought of these solutions, IP rotation in particular. Inducing a sleep could work if we knew which quota limit is specifically in play here, that is, requests per minute, requests per day, or something else. Do you have any idea which of these leads to our issue? I tried making requests via a proxy (using the proxies argument of the exposed methods, which is None by default), but in vain. My assumption is that the proxies are not being used as they're supposed to be, because my specific IP is still being captured and blocked. Is there any way I could see what request payload is being sent to YouTube by your package? (Like we do with requests by printing out the request headers and data attributes.) |
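One generic way to see the outgoing requests without modifying the package is to turn on the standard library's HTTP debug output. This is only a sketch, and it assumes the package sends its traffic through requests/urllib3 (not confirmed in this thread). `enable_http_debug_logging` is a hypothetical helper name.

```python
import http.client
import logging


def enable_http_debug_logging():
    """Print request lines and headers for every connection made via http.client."""
    # http.client writes the raw request/response headers to stdout when
    # debuglevel is greater than 0; HTTPSConnection inherits this attribute.
    http.client.HTTPConnection.debuglevel = 1
    # urllib3 (used by requests) logs connection details at DEBUG level.
    logging.basicConfig(level=logging.DEBUG)
    logging.getLogger("urllib3").setLevel(logging.DEBUG)
```

Calling this once before the transcript request should show which host is actually contacted, which also helps to judge whether a configured proxy is in use.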
@salonygupta76 Unfortunately, I don't know how long the "sleeping interval" would have to be to be sufficient. You'd have to play around with that. But if you do so, I would greatly appreciate it if you could share your findings! If you want to look deeper into the requests which are being sent, you'll have to check out the code and add some logs or run it in a debugger. More specifically |
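To experiment with the sleeping interval, a retry wrapper with growing delays could look roughly like this. It is a sketch only: `call_with_backoff` is a hypothetical helper, and the `retries` and `base_delay` defaults are placeholders, since nobody in this thread knows which quota applies.

```python
import time


def call_with_backoff(func, *args, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(*args); on failure wait base_delay * 2**attempt seconds and retry."""
    for attempt in range(retries + 1):
        try:
            return func(*args)
        except Exception:
            # Out of retries: re-raise the last error to the caller.
            if attempt == retries:
                raise
            # Placeholder delays (1s, 2s, 4s, ...); tune these by experiment.
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the delays can be recorded or skipped while testing. Note that if the IP is already blocked, waiting a few seconds is unlikely to help; this is mainly useful for staying under the limit in the first place.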
I am using the statement transcripts=YouTubeTranscriptApi.get_transcripts((video_id)); in a for loop; however, there are a few videos which have their id disabled. Is there a success or failure call for YouTubeTranscriptApi.get_transcripts((video_id))? |
@AaditBhatia what do you mean by success/failure call? You can simply wrap your call in a |
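As an illustration of getting a success/failure signal per video in a loop, one option is to catch exceptions around each call and collect them. This is a sketch under assumptions: `fetch_all` is a hypothetical helper, and `fetch` stands for whatever callable retrieves a transcript for a single video id.

```python
def fetch_all(video_ids, fetch):
    """Call fetch(video_id) for each id; return (results, failures) dicts."""
    results, failures = {}, {}
    for video_id in video_ids:
        try:
            results[video_id] = fetch(video_id)
        except Exception as exc:
            # Broad catch for illustration; prefer the library's specific
            # exception types if you know which ones it raises.
            failures[video_id] = exc
    return results, failures
```

Usage might look like `results, failures = fetch_all(video_ids, YouTubeTranscriptApi.get_transcript)`, assuming `get_transcript` accepts a single video id as the issue title suggests. Videos without an available transcript then end up in `failures` instead of stopping the loop.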
I will close this issue now, as there isn't really much we can do here and the discussion went off the rails a bit. |
Hi, I'm in a Linux environment and have verified the following before raising this issue:
@jdepoix Any idea what could be the reason?
Thanks!