-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate network stack away from requests? #9824
Comments
I like the basic idea here, and I think aiohttp might be the better option between the two for our usecases I think the bigger concern/thought I have is whether it makes sense for us to, well, make the entire codebase async or to keep it sync TBH. Both of these should also have sync variants, but the async API has significantly greater benefits AFAICT. |
I don’t think aiohttp has a sync interface, but it’s pretty easy to make async functions sync anyway. |
Also, when combined with its dependencies it's significantly smaller than aiohttp. Sizes:
So the size of the pip wheel would change as follows:
That's a pretty significant size change. So in summary, +1 from me, I prefer httpx as it doesn't use chardet and it's significantly smaller than the alternatives. @pradyunsg I'm not sure what you mean by "full async". I'd go async for all network activity because, well why not? But outside of that, what's to gain? I doubt we could easily run backends async, and everything else is either CPU-bound or file operations, which I don't think async does anything with. But as @uranusjr says, wrapping a chunk of async code in a sync function shouldn't be hard. |
Hi, Before dropping requests. There is still hope, see the ongoing conversation at psf/requests#5797 at least for python 3. Could be an easy fix. |
@Ousret That thread is exactly what prompted me to open this discussion, and I am not too hopeful reading requests maintainers' response on the issue. |
We have a very similar discussion happening in Apache Software Foundation https://issues.apache.org/jira/browse/LEGAL-572 right now. The Unfortunately chardet maintainers also refused to change the licence several times: However we went a bit different route. @ashb from Apache Airflow proposed a PR : psf/requests#5797 where chardet is replaced (in a very soft way - including fallback to So far the requests maintainers were not willing to merge that requests, quoting 'feature-freeze' (see the discussion in the PR) but maybe if enough people and projects will join the request and they will see the critical mass, they might change their mind. We have another, alternative proposal if they don't. We could jointly fork the requests library. The idea is:
This is a lo of hassle, but it seems doable, and easy to semi-automate, and I am willing to spear-head the effort if we go that route. |
TBH, I don't want pip to get involved in the debate over requests and chardet. I consider that a decision for the requests maintainers, and not one we (pip) should attempt to influence. So my vote remains to switch to Regarding going "full async", I've been doing some investigation recently into asyncio, and one potential issue with going that route is that the fast-deps code cannot be used with async http - specifically because the |
Yea, and loop.run_in_executor can be used to wrap blocking I/O operstions: https://docs.python.org/3/library/asyncio-eventloop.html#executing-code-in-thread-or-process-pools |
That's the wrong way around. What that code needs to do is essentially equivalent to def read(self, n): If the HTTP code is async, it needs to be awaited, so read needs to be async, but it can't be. We can of course use But this level of detail can wait until someone actually wants to work on going full-async. For the purposes of this issue, I think we can go to |
@potiuk I do want to point out that this is a gross misrepresentation of the discussions. We literally - under the terms of the LGPL license - can not change it. The original author is completely unreachable and thus unable to agree to changing the license. We refuse to change the license because we would be violating the terms of the LGPL by doing so, not because we ideologically agree with it more than anything else. Further, the original author (if I remember correctly) licensed the project under LGPL because large swaths of it (the tables of statistics in particular) were taken from Mozilla's LGPL'd universalchardet project that chardet was implemented to mimic in Python. The legal tangles are too deep for anyone to be bothered to deal with. |
Thanks for the clarification @sigmavirus24. Can I ask that the debates over licensing not spill over to the pip tracker - it sounds like chardet and requests are already getting enough traffic over this, and it helps no-one to start another extended and unproductive debate here. The facts are:
|
Also, for what it's worth, I used to frequently talk to Donald about moving to urllib3. I think that should be an option on the table as well |
Sure. This is no problem for me. I won't comment here any longer, I just noticed similar debate in PIP and thought it might raise the severity of the problem. |
At 153KB and no dependencies, I agree we should consider it. The main benefit of httpx is that they claim compatibility with the requests API - which may or may not be important (if we're considering going to async, maybe not as much as I thought). I honestly have no idea how complex our needs are here. |
FWIW, we already depend on urllib3, through requests. |
One feature that the urllib3 (and by virtue requests) handles that HTTPX doesn't is the SecureTransport TLS implementation for macOS+old OpenSSL. If you were to switch to pure urllib3 the biggest hurdle would likely be translating the auto-magic configuration of things like CA certificates, proxies, and TLS verification that Requests handles via environment/system variables. (edited to remove the "well actually" joke) |
👍 to what @sethmlarson said, and also I want to bring up that historically pip has been urllib3's main customer—it helps that our requirements, commitment to our users, and values have always aligned. We've worked with @dstufft, @pradyunsg, and other pip collaborators directly on many occasions to implement functionality to make pip's life easier (even if it was being used from behind requests). We've discussed switching pip to urllib3 directly on many occasions too, but for one reason or another it hasn't happened (mostly lack of urgency). If that is something that pip is interested in doing now, our team is happy to support the transition and continue working with pip to make sure the community's needs are met. :) |
@sethmlarson Not sure what you’re trying to say with the PePy link so I’ll skip that. Aside from CA configuration, pip also makes use of requests’s authentication and custom scheme transport interface. Like the CA thing, they are simply conveniences and nothing technically challenging, but someone will need to re-implement them if we’re going to use urllib3. |
I want to point out that the reason for asking Chardet to change their license seems to be nothing but a misunderstanding. The best reason I could find is this;
This misunderstands the obligations that the LGPL has, it does NOT require you to do anything. It absolutely does not require you to create or provide the ability to swap out the LGPL library. What you likely misunderstood is that LGPL requires the user to be technically able (assuming they have competentce) to make changes to an LGPL library and use that changed one instead. It doesn't mean you need to help them do it, just that you should not obstruct them from accomplishing that goal. The fact that users can download the LGPL library on github, can alter it and can reproduce your build steps to create a binary is enough. Asking projects to give up their chosen license for this reason is really not needed and creates unneeded worry and strive. |
You analysis applies for most people, when a library depends on chardet. But pip is redistributing chardet as a part of pip, and pip is distributed under the MIT license, which is the problem. I know it is still possible to argue that having LGPL code in a pip distribution is OK, but none of the pip maintainers is in the position to contribute on this if someone decides to bring this to court. So unless someone is willing to be that contributor to protect pip from liability, I don't think it's a good idea for both maintainers and users to have the risk if we can avoid it. |
We could also patch our internal requests to use https://github.com/Ousret/charset_normalizer instead? It has a compatible API, and seems to be intentionally developed to be a drop-in replacement. |
Just a comment here - i think we are rather close to convince requests maintainers about replacing chardet with MIT-based charset-normalizer . We've done a series of test and fixed a number of problems pointed out by the maintainers psf/requests#5797 and if we mange to do that, the whole chardet/LGPL problem might be solved automatically (charset-normalizer is MIT licensed and has no other dependencies any more after the fixes we've done together with @Ousret - the author of charset-normalizer. Not asking for support here, just wanted to let you know that this is a possible solution that might materialize soon |
Why do you think that is a problem? If you have a reason to think this is a problem, please explain your reasoning.
Who are you afraid of sueing you? The original author can not be found. Only the copyright owner can sue you or order others to sue you. |
Yep. The PR above works out of the box and is heavily tested so you can for sure use charset-normalizer |
I find it sad to hear that the knowledge about LGPL and licenses is such that people fear using them. For exactly the purpose they are meant: as a component of your application that you don't intent to fork but simply use from upstream. This is the purpose of the LGPL license and the only requirement you have is to make clear to users where they can download the sources of this library and naturally the license and they should be able to replace it. Which means competent users should have the opportunity to replace it and run the result on their own hardware. This is really quite a low bar, it just means you can't have a closed-source build system for instance, or ship it on hardware that refuses to run apps that are not signed by you guys. |
The main reason for Apache Software Foundation of not allowing LGPL as a mandatory dependency is that LGPL imposes limitations not on ASF itself and not on the projects but on the users of our projects. The commercial users of projects are limited because they & for example cannot redistribute our software further, repackaging it in any way they want (LGPL forbids packaging the software in single binary in this case for example without releasing your own code that you added). There was an extensive discussion here for example https://issues.apache.org/jira/browse/LEGAL-572 and the current conclusion of the ASF is that LGPL software cannot be mandatory when you release ASF software (it can be optional though). Not that this is not Licencing Limitation of the Apache 2.0 licence. This is a Policy of ASF. A deliberate and conscious decision that we do not want to limit our commercial users in the ways they want to use our software. This is a recognition that we care for the needs of the whole community - including commercial users and stakeholders, which gives those users and stakeholders a firm assurance that they can use our software without limits that might limit their business models. Not sure if that is of a concern for the PIP maintainers but i do not think it is a matter of licence compliance or legality, but more about whether similar concerns are of an importance for them. Not all projects have the same concerns and policies, they can happily live with the LGPL dependencies. But some other projects and organisations might make different choices (like ASF did). |
It is good to hear that there is a consciousness with regards to the problems not being legal problems. The problems are created by choice, by the age old difference between the BSD-style of licenses and the copyleft style. LGPL was designed to be a bridge between those two. Allowing interoperability. It is legally possible for you to use it. In that difference of opinion it is honestly just that, an opinion and you are free to have one, as do we all. The ideas from the Apache foundation sound political to my personal ears. The issue you refer is that a company user would mistakenly alter not PIP but the upstream library (which they would have to download the sources from github and compile themselves instead of just linking to the one you ship) and then add changes they refuse to open source to that exact LGPL library and redistribute it! (most companies don't redistribute the software they build). This is not a real world problem being solved. The PIP project can choose to reject LGPL. That is their choice, not some rule or legal issue. Lets be honest about this. Copyleft is a choice that ensures software stays open and free, the ASF idea you quoted is doing the opposite: to allow companies to take it and close it and not become part of the community of contributors. You ask others to write software for the benefit of companies that don't intent to contribute back. That is the entire point, to help companies that don't intend to contribute back. You can work like that, but please don't ask others to. Please stop pressuring projects under false pretenses to change license. |
Can I request that people refrain from turning this issue into a licensing debate, please? The pip maintainers are aware of the discussions, and of the varying opinions. Thanks to people who have commented, but we don't want to get involved in a licensing argument, on either side of the issue. Pip will ultimately make a choice over whether to switch from requests. That choice will be made based on a number of factors, of which licensing is only one. And it will be a choice, not the requirement of any sort of license. This issue is for the general question of how and whether we make that switch, and I'd prefer it if we focused on other aspects of the discussion from now on. On a purely personal note, I will say that the licensing discussion has simply made me more inclined to switch - not because of any legal reasons, not even because of questions of what our users might want to do with pip, but simply because I'm sick of these sort of debates flaring up, and if not depending on LGPL software will get rid of at least one reason for them occurring, then that's a win in my book. |
@pfmoore i hope you did not see it as a pressure any more. Not intended if so. I understood that from earlier discussion (and perfectly agree with) that it is entirely up to PIP maintainers to decide. I just (i hope I was clear about it) just stated the reasons ASF made their choice. I am not an 'evangelist' of that approach and I am very far from telling others that they are doing wrong, not following it. Just wanted to explain ASF reasoning. There are many other approaches (starting from GNU/FSF or recent ElasticSearch/Mongo etc.). I am not judging nor trying to tell 'ASF is better' or 'worse'. Just wanted to explain the reasoning. Sorry if you take it as preaching or dragging in. Just wanted to be helpful in making the reasoning clear ans showing that this might - hopefully - change soon. No more comments from me on this - it is up to PIP maintainers to decide their approach :( . One thing left, i am happy to let.you know here if/when requests migrate away of chardet. |
As a (hopeful) bookend to the licensing discussion, requests has now switched from |
Given that the licensing issue isn’t a sticking point, I’m thinking about the next steps here:
|
Closing this out since this is unlikely to happen at this point. |
I had a couple of conversations regarding Requests that makes me think about replacing it with something else. While Requests is easily the most battle-tested and mature (pure Python) HTTP client implementation available, it does comes with issues.
The most problematic part is licensing—one of its dependencies, chardet, is licensed under LGPL. This means pip’s MIT licensing is considered ambiguous (and potentially invalid) by some since it vendors chardet.
The other is project maintenance. HTTP is an ever-changing landscape these days, and Requests, being a mature lirbary that priorities stability, is slow to adapt new features, such as HTTP/2 (not to mention HTTP/3!) and coroutines (async/await).
Historically, pip does not really have a choice since requests is the only viable option that supports Python 2.7. But now we’re 3.6+ we have a lot more choices, e.g. aiohttp and httpx. I am not sure whether either of them is better than requests in either regards (especially they have a lot more dependencies than requests), but I feel this is a good timing to throw out this thought.
The text was updated successfully, but these errors were encountered: