-
Notifications
You must be signed in to change notification settings - Fork 459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for challenge where part of formula is stored in a div #206
Conversation
From Anorov/cloudflare-scrape#206 Related issues: inorichi/tachiyomi-extensions#951
Cloudflare changed something again, python traceback
|
Cloudflare seems to have changed their code yet again, and these modifications are working again. |
@@ -61,6 +64,19 @@ def is_cloudflare_challenge(self, resp): | |||
) | |||
|
|||
def request(self, method, url, *args, **kwargs): | |||
self.headers = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right here you are overwriting the existing headers that are present. For example, after sticking a debugger in here above this line. self.headers
already exists and has contents.
> /app/sickrage/lib/cfscrape/__init__.py(70)request()
69 self.headers = (
---> 70 OrderedDict(
71 [
ipdb> self.headers
{'Connection': 'keep-alive', u'Content-Type': u'application/json', u'Accept-Encoding': u'gzip,deflate', 'Accept': '*/*', u'User-Agent': u'SickChill.CE.1/(Linux; 4.18.20-unRAID; 469f26e1-6018-11e9-9ad7-0242ac110002)'}
For my request to be successful, I need the Content-Type
header to remain. Other than that - the commit works well, hopefully we can get this in soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, could you explain this change? It also changes some other headers that are needed, I also assume that, for the servers that respect it, we don't want to limit them to English. The 'Accept' also may varry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CodyWoolaver These headers seemed to be working best with Cloudflare, but I agree that breaking support for custom headers doesn't make sense. Can you share a site where you have problems with my current version.
The following gist contains a version which tries to restore support for custom headers by only using best working headers as long as cloudflare has not been bypassed:
https://gist.github.com/lukele/ce188004545192c0d92064e85138f0ab
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you could set those headers for specifically interacting with cloudflare and preserve the interactions with the intended site. That or you could merge in the specific headers you have picked and that way the intended headers could still make it through.
The site that this caused issues for was actually a local install of deluge and their webapi. There is a hard requirement to have Content-Type when interacting with it (no idea why). The client accessing it is https://github.com/SickChill/SickChill/tree/master/lib/cfscrape and uses a copy of cfscrape.
Let me know if I can help in any way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you could set those headers for specifically interacting with cloudflare and preserve the interactions with the intended site.
That's basically what I'm doing in the gist. The only problem is that Cloudflare did require for example the user-agent not to change between requests, so merging might be more problematic. Did you try the gist by any chance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense - to preserve the User-Agent as it is a variable that should not frequently change. Is there any risk with picking a couple of headers and making them stay that way?
For example force the header values of User-Agent Connection Upgrade-Insecure-Requests
to be consistent and do kwargs['headers'].update(<contents>)
and that way the user agent, and other important headers remain and we can still pass in custom headers.
I have not had a chance to test it out yet, I won't be in a position to do so for several hours. I can get back to you then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also like to mention, that I completely removed lines 67-79 of this PR in the code running at home and was able to get through without a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any more movement on this @lukele? The proposed fix you provided did work, however I still believe we should not be completely ignoring the headers if they are passed in. Having a set of headers that we require to go to cloudflare makes sense, but it should be an update on the object, not a replacement of.
@Anorov do you have any input on this? I know some people are looking for a more prominent fix, rather than trying to cherry-pick this pr :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CodyWoolaver sorry for the late reply. Best have a look at cloudscraper (https://github.com/VeNoMouS/cloudscraper). It‘s based on cfscrape but more actively maintained for now. It also includes some good fixes regarding the sometimes presented captcha challenge.
Since the status of the project is currently unclear and the new cloudscraper library retains the almost identical API, implements these changes and adds some improvements, I'm closing this pull request. |
we're all moving to that I take it. |
Seems other who have had this issue have moved to this... https://github.com/VeNoMouS/cloudscraper I don't know if it can be implemented (or helps!). |
Sorry, have been away for a while. @lukele @CodyWoolaver Would you consider this PR ready to merge with respect to the current Cloudflare IUAM challenge? |
I would not, based on the issues I pointed out about hard setting the headers. |
@Anorov since @pro-src version (#233 (comment)) will include the fix to the sometimes appearing captcha challenge (which seems to be related to ssl cipher suites) it's best to wait for their PR. That version might also address @CodyWoolaver header problems. |
@lukele I'm working on it now. 😄 |
@pro-src interesting, I never had the |
Nope, that wasn't me... I am operating under the assumption that |
Ah ok, so maybe my openssl version ( |
@lukele I've reviewed requests usage of urllib3 and the only thing that we might want to do is to ensure that openssl always prefers TLSv1.3 ciphers. Unless you're using an older version of openssl, I can tell you that openssl will prioritize TLSv1.3 ciphers when they're present in the list ciphers. It automatically adjusts the ordering of the provided ciphers depending on whether or not TLSv1.3 ciphers are present in the list. They are present due to request's usage of urllib3 but openssl might not being doing that in older versions. I'm very sure that's the issue with openssl versions prior to Edit: By list I mean the cipher list control string: https://www.openssl.org/docs/manmaster/man3/SSL_CTX_set_cipher_list.html @Anorov How do you think we should handle this? We could simply advice people to update their version of openssl if they are encountering a CAPTCHA. If you think we should handle this for the user, should this be addressed in the new PR or would it better to break it up into smaller PRs? Unless recommended to do otherwise, I won't be including anything to alter the SSL in the new PR. |
@pro-src just a thought here, since upgrading OpenSSL is probably not always easily possible. From my testing it looked like only the request receiving the cloudflare challenge required a different set of ciphers. The solution could be submitted using any library and thus using any configuration. So I wonder if it were feasible to have the following workflow: 1.) run a get/post call without special header modification (except for the user agent. Would solve @CodyWoolaver‘s problem I think) |
FYI I assume what cloudflare does with regard to captchas shown based on different TLS version/settings is that it compares TLS behavior with actual HTTP headers, and when the TLS behavior does not match the HTTP headers, the session is more likely to trigger additional protections like a captcha. |
Thanks for the link. Seems to be exactly that. |
@pro-src To add to this, in my testing I have used a consistent UA for Safari on macOS Mojave 10.14.4 with the exact header order (as displayed in tcpdump) and only by trying different cipher options I was able to not be presented with the captcha challenge. I wonder however how cloudflare behave's if the request is coming from a UA which is close to a known one, but not exactly matching. By the way, is there a setting for MITM checking which is configured on your site? Yours was one of the few sites I tested with, were I was always presented with a captcha challenge during testing when using a non-patched cfscrape |
@lukele There is a setting for everything imaginable. Cloudflare's API is comprehensive. I would have you update your openssl and try with many different UA's to see how general the checks actually are. I would load each of the UA's from https://github.com/codemanki/cloudscraper/blob/master/lib/browsers.json and try them. I haven't got a CAPTCHA yet with any of the UA's on python or nodejs and nodejs compiles with it's own openssl. |
@pro-src
That's exactly what I did when trying to figure out which cipher suite works for most configurations :-D |
@lukele Here's the workaround: codemanki/cloudscraper#212 urllib3 is already doing essentially what I've done for nodejs and is why I mentioned how openssl prioritizes the ciphers differently in older versions as the most likely cause. This all bulls down to the signature algorithm extension of the clientHello being checked. If you use TLSv1.1 which doesn't have that extension then you bypass it altogether but ofc. we don't want to do that. Thus the workaround for older versions is going to be to prioritize TLSv1.3 ciphers, it sure looks like that's the conclusion @VeNoMouS came to as well since he didn't remove the transport adapter: VeNoMouS/cloudscraper@7d3731c |
@lukele I'd read this issue: https://github.com/VeNoMouS/cloudscraper/issues/29 and this one which is a little earlier codemanki/cloudscraper#211 |
@pro-src funny enough, we are back to the offending cipher I found causing the original problem in the first place: https://github.com/VeNoMouS/cloudflare-scrape-js2py/issues/28#issuecomment-487330732 But so to understand, what is your proposed solution for older openssl versions? |
@lukele Except that's not a single cipher.
See https://www.openssl.org/docs/man1.0.2/man1/ciphers.html
My preferred solution is to have people update to secure versions of openssl 😄 https://github.com/openssl/openssl/blob/master/ssl/ssl_ciph.c#L1357 I haven't confirmed it yet, I need to look at sources for older versions of openssl. You're welcome to help do that. If the workaround already works, that's great but I still want to confirm the problem exactly. I'm pretty sure that the workaround for older versions of openssl is going to be to prioritize TLSv1.3 ciphers. In terms of code, you'd want to subclass request's transport adapter to set the necessary SSL options. I might not finish my PR today but if not then I promise it will be ready sometime tomorrow: https://github.com/pro-src/cloudflare-scrape/tree/WIP |
I'm in favor of just asking people to update their OpenSSL versions. Ideally we would detect outdated versions and raise an exception explaining why they need to update it and how they can do so. I'm not sure if it's better to raise the exception only upon receiving a captcha while using an outdated version of OpenSSL or to just check the version when the module is imported and raise the exception then. |
if ssl.OPENSSL_VERSION_NUMBER < 0x10101000:
# etc.. |
Closing this pull request since @pro-src's pull request (#234) seems to fix all the issues addressed in this pull request and more. |
No description provided.