Could not retrieve a transcript for the video #74
Hi @iercetin
Sure,
-- Windows 10 --
Pinging youtube.com [172.217.169.142] with 32 bytes of data:
Ping statistics for 172.217.169.142:
-- Ubuntu (DigitalOcean Droplet) --
PING youtube.com(ams16s32-in-x0e.1e100.net (2a00:1450:400e:80c::200e)) 56 data bytes
--- youtube.com ping statistics ---
Thanks @iercetin
Sure,
https://github.com/iercetin/testtest/blob/master/4.html
https://github.com/iercetin/testtest/blob/master/6.html
@iercetin hm, that's interesting. It does in fact seem that the IPv6 request returns a different response than the IPv4 request, which supports your point even further.
I'm having the same problem: it works fine locally (OS X), but on my server (Ubuntu 18.04.4 LTS (Bionic Beaver)) I get the same error as @iercetin.
@sdtblck could you please execute those commands on your Ubuntu machine and upload the results:
Hi @jdepoix, I'm using this on GCP Cloud Functions and I think I'm facing a similar issue. Would I have to route all egress traffic through an IPv4 VPC network with a static IP to test whether IPv4 connections would help with the rate limiting on the shared machine?
I was able to get the API working again on GCP Cloud Functions with a static IP by following a GCP guide! https://dev.to/alvardev/gcp-cloud-functions-with-a-static-ip-3fe9
Hi @adongu,
Hey @jdepoix, I did some digging and found this tidbit in the documentation for Google VPC networks. https://cloud.google.com/vpc/docs/vpc#specifications "VPC networks only support IPv4 unicast traffic. They do not support broadcast, multicast, or IPv6 traffic within the network; VMs in the VPC network can only send to IPv4 destinations and only receive traffic from IPv4 sources. However, it is possible to create an IPv6 address for a global load balancer." It looks like all VPC traffic is IPv4 unless I create an IPv6 address on a global load balancer and route all service traffic through the LB first. The guide I followed didn't create any LB as far as I know, and the VPC network routing mode is regional.
So if I am understanding this correctly, you were probably doing IPv6 requests before setting up the VPC, while now you're doing IPv4 requests. That would further support the assumption that this module can fail when sending IPv6 requests to YouTube. Thank you for sharing, @adongu! I guess my best bet would be to implement something that forces this module to use IPv4. I'll look into that when I have some time on hand.
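A minimal sketch of forcing IPv4 from the caller's side, without changing the module: `requests` (through `urllib3`) resolves host names with `socket.getaddrinfo`, so temporarily wrapping that function to always ask for `AF_INET` results should make connections go over IPv4, similar in spirit to youtube-dl's `--force-ipv4`. The helper `only_address_family` is hypothetical, and the patch is process-wide and not thread-safe, so treat it as a workaround rather than a fix.

```python
import contextlib
import socket

# Keep a reference to the real resolver so the wrapper can delegate to it.
_original_getaddrinfo = socket.getaddrinfo


@contextlib.contextmanager
def only_address_family(forced_family=socket.AF_INET):
    """Temporarily make every socket.getaddrinfo() call in this process
    return addresses of a single family (IPv4 by default)."""
    previous = socket.getaddrinfo

    def patched(host, port, family=0, type=0, proto=0, flags=0):
        # Ignore the family the caller asked for and substitute ours.
        return _original_getaddrinfo(host, port, forced_family, type, proto, flags)

    socket.getaddrinfo = patched
    try:
        yield
    finally:
        # Restore whatever resolver was installed before, even on errors.
        socket.getaddrinfo = previous
```

Assuming the module's HTTP calls go through `requests`, wrapping the call from the original report in `with only_address_family(socket.AF_INET):` should resolve youtube.com to IPv4 addresses only. Passing `socket.AF_INET6` instead forces IPv6, which may help when comparing the two responses.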
Hey @jdepoix, apologies if I was being vague. I think my issue might be related to #60 rather than this one. I'm not actually sure it was going over IPv6 before, since I didn't have any global LB set up; it was a light project. I think forcing the function to go through a reserved static IP stopped YouTube from rate limiting the shared machine my Cloud Function was running on. Apologies for having you spin your wheels.
Thanks for clarifying, @adongu.
I also have the same problem on AWS EC2. To add some context, the requests initially worked fine on AWS. After a few thousand requests (at about 1/sec), I started getting
Thanks @jdepoix for your diligence in fixing this - I can take over running tests from the server!
Hi @cramdoulfa, thanks for the information. Could you upload the HTML which is returned by calling
Here it is @jdepoix for
Thanks for the additional information, @cramdoulfa! This seems a bit odd though, as the information this module requires is actually being returned by your request. Are you sure the module was still failing to retrieve this video at the time you made the requests? Maybe a rate limit had reset in the meantime. Did you check this before executing the curl request?
Hmm, very good point: the package is actually working again now! I will start a batch of queries and report back if it starts blocking again.
I had this issue myself as well. It's due to YouTube blocking your IP. I switched on a VPN and everything worked as expected.
@jacksonw765 yeah, that's what I was guessing. It would be great, though, if I could see what HTML YouTube returns after they have blocked you, so that I can add a proper error message to this module.
@jdepoix here is a sample HTML page for a video with available transcripts when the API seems to be blocked:
Slight sidetrack, but I'm curious, @jacksonw765: do you use a commercial VPN, or did you configure one yourself with OpenVPN?
@cramdoulfa huh, that seems really odd. Once again, the HTML seems to contain all the information this module needs to retrieve the transcripts. The exception you got was a
BTW, you can try this out yourself by doing the following:
import requests
from youtube_transcript_api._transcripts import TranscriptListFetcher

# Paste the HTML YouTube returned here (or read it from a file)
html = '''HTML as string or load it from file'''
video_id = '<video_id>'

# Should print a dict describing the available transcripts;
# an exception means the captions data could not be extracted
print(TranscriptListFetcher(requests.Session())._extract_captions_json(html, video_id))
If this returns a dict with data about the transcripts without throwing an exception, it should work fine. This is also the only place where
Oops, sorry for the false alarm - it seems that the API had indeed been unblocked in the meantime! Thanks for the code snippet; I will verify next time it happens and notify you if I find an HTML page for which
@cramdoulfa no worries, thanks for putting in the time trying to resolve this! 😊👍
I use PIA
OK, I think this is the right one this time. The page actually says 'We have been receiving large amounts of requests from your network.'
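Since the block page contains recognizable text, a client could check for it before trying to parse the captions data and fail with a clear message. A minimal sketch, assuming the sentence quoted above appears verbatim in the returned HTML; `RateLimitedError` and `raise_if_rate_limited` are hypothetical names, not the module's actual API.

```python
class RateLimitedError(Exception):
    """Hypothetical exception: YouTube served its 'too many requests'
    page instead of the normal video page."""


# Assumption: the opening of the sentence quoted in this thread. Matching a
# prefix avoids depending on the exact wording at the end of the message.
RATE_LIMIT_MARKER = "we have been receiving large amounts of requests"


def raise_if_rate_limited(html: str) -> None:
    """Raise RateLimitedError if the HTML looks like YouTube's block page."""
    if RATE_LIMIT_MARKER in html.lower():
        raise RateLimitedError(
            "YouTube is receiving too many requests from this IP address; "
            "wait for the block to lift or switch to a different IP."
        )
```

Matching on page text is brittle: if YouTube rewords the page or splits the sentence across HTML tags, the check silently stops firing, so it works best as an extra hint next to the normal parsing error.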
Perfect, that's exactly what I was looking for, @cramdoulfa! Thank you very much! 👍 The other thing that remains interesting is the IPv4 vs. IPv6 question raised above. It would be great if you could try executing an IPv4 and an IPv6 request next time you run into the rate limit and upload the results here. The responses uploaded so far contradict each other a bit, and the people who posted them have unfortunately stopped replying.
Also, @cramdoulfa, could you guess how long the rate limit persists until it is reset, or was that inconsistent for you?
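Because nobody here knows how long a block lasts, one option is to wait and retry with exponentially growing delays. A minimal sketch, assuming `fetch` wraps your own transcript call and `is_rate_limited` recognizes whatever exception your installed version raises for blocked requests; `retry_with_backoff` is a hypothetical helper, and the 60-second base delay is an arbitrary guess, not a measured reset time.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fetch: Callable[[], T],
    is_rate_limited: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); after each rate-limit error wait base_delay, then
    2 * base_delay, 4 * base_delay, ... before retrying.
    Any other exception is re-raised immediately."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception as exc:
            # Give up on unrelated errors, or once the last attempt failed.
            if not is_rate_limited(exc) or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter exists so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.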
Great!
Good luck with the thesis!
In As this issue is kinda all over the place now, with different things being reported (most of them most likely due to rate limits, which now produce a more descriptive error message), I will close it for now. If individual issues come up again, feel free to open a new issue with a title specific to that problem.
@jdepoix Is it possible to use an IP rotation service like https://scrapingant.com with this module?
@mgoldenbe I haven't tried it yet, but in theory it should work. It would be great if you could report back what your experience has been in case you actually try it out! 😊
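If a rotation service exposes ordinary HTTP(S) proxy endpoints, cycling through them is mostly bookkeeping. A minimal sketch, assuming the proxy URLs below are placeholders you replace and that your installed version of the module accepts a requests-style `proxies` dict (check its README before relying on this); `rotating_proxies` is a hypothetical helper.

```python
import itertools
from typing import Dict, Iterator, List


def rotating_proxies(proxy_urls: List[str]) -> Iterator[Dict[str, str]]:
    """Return an endless iterator of requests-style proxies dicts that
    cycles through proxy_urls in order."""
    if not proxy_urls:
        raise ValueError("need at least one proxy URL")
    # requests picks a proxy by URL scheme, so map both schemes to the same URL.
    return ({"http": url, "https": url} for url in itertools.cycle(proxy_urls))


# Placeholder endpoints; replace them with the ones your provider gives you.
proxies = rotating_proxies([
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
])
```

Each request would then take `next(proxies)`, so consecutive requests leave from different addresses.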
@cramdoulfa Have you been able to determine the approximate maximum request frequency that does not get you blocked?
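The thread does not establish a safe rate (one report above mentions being blocked after a few thousand requests at about one per second), so any limit is a guess. Whatever interval you settle on, a small client-side throttle keeps requests from bunching up. A minimal sketch; `Throttle` is a hypothetical helper and `min_interval` is yours to choose.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Ensure at least min_interval seconds pass between calls to wait()."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous wait() call

    def wait(self) -> None:
        """Block until min_interval has elapsed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` right before each transcript request spaces the requests at least `min_interval` seconds apart. The injectable clock and sleep only exist to make the class testable.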
Could not retrieve a transcript for the video https://www.youtube.com/watch?v=98TQv5IAtY8! This is most likely caused by: The video is no longer available
It works on my local computer (Windows 10), but when I try to use it on Ubuntu 20.04 (DigitalOcean Droplet) I get this error!
I assume the error is caused by the sender's IP address.
I had a similar problem using youtube-dl on my droplet, and when I tried using "--force-ipv4" with youtube-dl, it worked. Is there a similar solution for this module?
Code
from youtube_transcript_api import YouTubeTranscriptApi

YouTubeTranscriptApi.get_transcript("98TQv5IAtY8", languages=['en'])