fix(cfscrape): Fix for challenge where part of formula is stored in a div #5462
Conversation
The user agent randomization, header order, and query string param order are going to be a problem if you are aiming for a general solution. |
I have removed the unused import. Could you explain your issues with user agent randomization on class initialization? It uses a random user agent for each instance of the class instead of sharing the same one across all of them. I am also not sure of the importance of header order when it comes to making a request; to my knowledge there is nothing enforcing a specific order of headers in a request, though I could be terribly wrong. |
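(To make the behaviour in question concrete, here is a simplified sketch; the class and list below are stand-ins, not cfscrape's actual code. Each instance picks its own user agent at construction time instead of all instances sharing one.)

```python
# Simplified stand-in for "randomization on class initialization":
# every scraper instance chooses its own user agent in __init__.
import random

DEFAULT_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) ...",
]


class Scraper:  # hypothetical stand-in class
    def __init__(self):
        self.user_agent = random.choice(DEFAULT_USER_AGENTS)
        self.headers = {"User-Agent": self.user_agent}
```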
Hi, I am planning on switching to https://github.com/SickChill/cloudscraper with js2py after some testing. It is possible many of these fixes are already there |
@miigotu There are issues with js2py not being spec compliant, requiring excessive polyfills (it lacks https://tc39.github.io/ecma262/#sec-additional-ecmascript-features-for-web-browsers), being slow, and having security issues. It was dropped by cloudflare-scrape because of those security issues, which the author notes himself. I personally will not be maintaining a version that uses js2py. There is also the issue of cloudscraper.py unnecessarily playing with SSL ciphers and lowering security, i.e. preventing the use of TLSv1.3 and causing CAPTCHA on systems with OpenSSL versions prior to OpenSSL 1.1.1. |
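(As a quick way to check the OpenSSL/TLS point above, a small sketch, not part of this PR: it shows which OpenSSL build your Python is linked against and whether TLS 1.3 is available.)

```python
# Check the linked OpenSSL build and TLS 1.3 availability; the CAPTCHA problem
# described above is reported for OpenSSL versions prior to 1.1.1.
import ssl

print(ssl.OPENSSL_VERSION)                 # e.g. "OpenSSL 1.1.1d  10 Sep 2019"
print(getattr(ssl, "HAS_TLSv1_3", False))  # True when TLS 1.3 is supported
```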
@CodyWoolaver Those problems are due to Cloudflare's BIC (Browser Integrity Check). The HTTP spec doesn't define any header order or letter casing, but Cloudflare, being what it is, has been able to profile requests and filter based on those attributes. This has been seen time and time again in countless bug reports. |
I commented on the other issue, and I'm open to other options. We need to work out the drama surrounding all this, it seems. If there are issues with js2py, maybe we can improve js2py? If we must continue with a node-reliant method, even just for the time being, that is OK also. Whatever version is used, I will be maintaining a fork here. |
I've already fixed it once: PiotrDabkowski/Js2Py#157. I hope that makes sense as the reason why I don't want to maintain a version that uses it. |
Includes latest challenge update: Anorov/cloudflare-scrape#234
a3380ce to 722ad62
New cfscrape version has been built - I am going to pull in that change and test things out. They bumped the version from 1.9.7 to 2.0.0, so I am not sure if there are breaking changes. |
@CodyWoolaver There aren't any breaking changes in the public API. Node support did change: Anorov/cloudflare-scrape#241. If you guys run into this issue: Anorov/cloudflare-scrape#235 |
I am seeing some issues around the headers again. Same issue as before when it came to passing in additional headers, Accept in particular - I will dig deeper tonight. The issues are not around Cloudflare, but around other websites requiring specific header formats that are not being passed in. I don't have all the details yet. |
@CodyWoolaver Have you identified any problems? The master branch has been updated to contain a fix to avoid CAPTCHA, and the fix has also been released. Note that whether you experience the problem the recent release resolves is believed to depend on your CPU. |
Seems to be intermittent and I have not nailed down the full effects. I can assume my CPU is not the problem, though I am using an … The standard requests work fine; it is just the interactions with the Deluge web API. |
@CodyWoolaver would you have a small test case to test with? |
If you are receiving a CAPTCHA from Cloudflare, the recent release fixes a case that is seemingly only experienced by some users, depending on OpenSSL version and CPU. Otherwise, yes, CPU would be irrelevant. I'll look over the … |
My test case within SickChill is to manually search for an episode. I receive a notification informing me the episode has been found; however, in my logs I receive this message:
The message is not very helpful, and I have a terrible SickRage setup for debugging. |
@CodyWoolaver All I need is the exact headers that should normally be sent. I should be able to debug it with that information. Also any other request specific information such as payload, query params, etc. would be helpful but not necessary if the headers are known to be the problem. |
Setting a reminder for myself tonight to get back to you - Can't test it from work :( |
Ah, okay, thx. I'm multi-tasking at the moment but a little later I will experiment. I really think that I should be able to determine the problem if it resides with the library without any additional information. If @lukele doesn't beat me to it. 😃 |
Does this document the request? https://deluge.readthedocs.io/en/develop/devguide/how-to/curl-jsonrpc.html |
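(For reference, the request documented at that link boils down to a JSON-RPC POST against the Deluge Web UI's /json endpoint. A minimal sketch with requests follows; the host, port 8112, and the default "deluge" password are assumptions about a default install, not details from this thread.)

```python
# Minimal Deluge Web UI JSON-RPC login, assuming localhost:8112 and the default password.
import requests

session = requests.Session()
response = session.post(
    "http://localhost:8112/json",
    json={"method": "auth.login", "params": ["deluge"], "id": 1},
    headers={"Accept": "application/json"},  # explicit Accept, since that header came up earlier
)
print(response.json())  # expect something like {"result": true, "error": null, "id": 1}
```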
@CodyWoolaver does this happen with a specific search provider/episode? |
@CodyWoolaver Do you have validate certificate enabled in SickChill's Deluge settings? I wonder if that might be causing the problems. Also, are you running Deluge on localhost or on another server? From the code it also looks like running SickChill with debug logs on should quickly help identify the issue, since it prints the data being sent and the host for the connection, which is a good starting point. |
I did find one issue but I haven't finished looking into this yet: Anorov/cloudflare-scrape#247
You can see how it's affected here: https://github.com/SickChill/SickChill/blob/175df89cabf295358b8caa2d96dafadb2347622d/sickbeard/helpers.py#L1373-L1378
If you want to see whether this is the problem or not, try something like this:
```python
from collections import OrderedDict

import cfscrape
import requests


def make_session():
    session = requests.Session()
    # Preserve header order by wrapping the defaults in an OrderedDict before cfscrape takes over
    session.headers = OrderedDict(session.headers)
    session.headers.update({'User-Agent': USER_AGENT, 'Accept-Encoding': 'gzip,deflate'})  # USER_AGENT: SickChill's constant
    session = cfscrape.create_scraper(sess=session)
    return session
```
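(As an aside, on CPython 3.7+ a plain dict already preserves insertion order, so the explicit OrderedDict mainly matters on older interpreters; the experiment above is simply meant to rule header reordering in or out as the cause.)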
Is only the order affected, or are the headers themselves affected as well? If it is the former, that should be no problem, since Cloudflare is not the issue in this case from what I understand. |
Yeah, it's the former, but I'm just throwing it out there... It does have something to do with the headers. 😃 I just found another issue. This will override the HTTPS adapter and undo the fix to avoid CAPTCHA. I can't think of a good solution for this at the moment... :/ |
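(To make the override concrete, a small sketch of the mounting behaviour being described; cfscrape's internal adapter class name is not shown, and the HTTPAdapter call is only an example of a later mount.)

```python
# Sketch of the conflict: create_scraper() mounts cfscrape's own HTTPS adapter,
# and a later session.mount("https://", ...) replaces it, undoing that fix.
import cfscrape
from requests.adapters import HTTPAdapter

scraper = cfscrape.create_scraper()
print(type(scraper.get_adapter("https://")))  # cfscrape's adapter

scraper.mount("https://", HTTPAdapter(max_retries=3))
print(type(scraper.get_adapter("https://")))  # plain HTTPAdapter: the earlier fix is gone
```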
I was worried something like this would happen. Unfortunately, the way … |
The only problem with that is that they have to modify a dependency. I don't think they mind in this case, but it's probably not preferred. |
Unrelated to any cfscrape issues, this error could happen if the connection to a Deluge host, established during …, is no longer active. |
I've completed my review of … |
I'm starting to feel bad about polluting this PR... lol... Sorry about that. If you want to open an issue over at cloudflare-scrape's repo, that's cool. |
Original credit to https://github.com/lukele - all I have done is carry on conversations; he did the heavy lifting.
This is not a full version and has not been released by any official cfscrape repo. I am creating this PR to alleviate the numerous pain points that others are experiencing. I have been running this version on my own setup (with a Deluge web API as a search provider) for the last several weeks with zero issues.
I am open to comments and suggestions; however, my knowledge of cfscrape is actually quite limited.
Fixes #5185
Fixes #5420
Fixes #5423
Proposed changes in this pull request:
Pulled in partial changes from "Fix for challenge where part of formula is stored in a div" (Anorov/cloudflare-scrape#206) that satisfy our requirements for SickChill; a rough sketch of that challenge shape follows the checklist below
PR is based on the DEVELOP branch
Don't send big changes all at once. Split up big PRs into multiple smaller PRs that are easier to manage and review
Read contribution guide
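(For context on the challenge in the title: newer Cloudflare IUAM pages moved part of the arithmetic out of the challenge script and into a hidden div that the script looks up by id, so a scraper has to pull that div's contents out of the page HTML before it can evaluate the formula. The snippet below is only a rough illustration of that extraction step, with assumed regexes and names; it is not the code pulled in from Anorov/cloudflare-scrape#206.)

```python
# Rough illustration only: extract the hidden-div operand that the challenge
# script references via document.getElementById(...). Regexes and names are
# assumptions, not this PR's actual implementation.
import re


def extract_div_operand(page_html, challenge_js):
    # Find the element id the challenge script asks for, e.g. getElementById('cf-dn-abc123')
    match = re.search(r"getElementById\('([^']+)'\)", challenge_js)
    if not match:
        return None
    div_id = match.group(1)
    # Pull that div's inner text out of the page so it can be fed into the formula
    div = re.search(
        r'<div[^>]+id="{}"[^>]*>([^<]*)</div>'.format(re.escape(div_id)),
        page_html,
    )
    return div.group(1) if div else None
```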
cc'ing individuals related to ongoing conversations in #5423
@pro-src @mitch71h @lukele @WebSpider