Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate network stack away from requests? #9824

Closed
uranusjr opened this issue Apr 24, 2021 · 32 comments
Closed

Migrate network stack away from requests? #9824

uranusjr opened this issue Apr 24, 2021 · 32 comments

Comments

@uranusjr
Copy link
Member

uranusjr commented Apr 24, 2021

I had a couple of conversations regarding Requests that makes me think about replacing it with something else. While Requests is easily the most battle-tested and mature (pure Python) HTTP client implementation available, it does comes with issues.

The most problematic part is licensing—one of its dependencies, chardet, is licensed under LGPL. This means pip’s MIT licensing is considered ambiguous (and potentially invalid) by some since it vendors chardet.

The other is project maintenance. HTTP is an ever-changing landscape these days, and Requests, being a mature lirbary that priorities stability, is slow to adapt new features, such as HTTP/2 (not to mention HTTP/3!) and coroutines (async/await).

Historically, pip does not really have a choice since requests is the only viable option that supports Python 2.7. But now we’re 3.6+ we have a lot more choices, e.g. aiohttp and httpx. I am not sure whether either of them is better than requests in either regards (especially they have a lot more dependencies than requests), but I feel this is a good timing to throw out this thought.

@pradyunsg
Copy link
Member

I like the basic idea here, and I think aiohttp might be the better option between the two for our usecases

I think the bigger concern/thought I have is whether it makes sense for us to, well, make the entire codebase async or to keep it sync TBH. Both of these should also have sync variants, but the async API has significantly greater benefits AFAICT.

@uranusjr
Copy link
Member Author

uranusjr commented Apr 24, 2021

I don’t think aiohttp has a sync interface, but it’s pretty easy to make async functions sync anyway.

@pfmoore
Copy link
Member

pfmoore commented Apr 24, 2021

httpx has a similar interface to requests, AIUI, but I'm not sure how important that is (we only use a small subset of the requests API). It also has both async and sync variants. And aiohttp depends on chardet, so wouldn't actually solve the original problem 🙁

Also, when combined with its dependencies it's significantly smaller than aiohttp.

Sizes:

  • requests: 597KB
  • httpx: 272KB
  • aiohttp: 1122KB

So the size of the pip wheel would change as follows:

  • requests: 100%
  • httpx: 78%
  • aiohttp: 134%

That's a pretty significant size change.

So in summary, +1 from me, I prefer httpx as it doesn't use chardet and it's significantly smaller than the alternatives.

@pradyunsg I'm not sure what you mean by "full async". I'd go async for all network activity because, well why not? But outside of that, what's to gain? I doubt we could easily run backends async, and everything else is either CPU-bound or file operations, which I don't think async does anything with.

But as @uranusjr says, wrapping a chunk of async code in a sync function shouldn't be hard.

@Ousret
Copy link

Ousret commented May 10, 2021

Hi,

Before dropping requests. There is still hope, see the ongoing conversation at psf/requests#5797 at least for python 3. Could be an easy fix.

@uranusjr
Copy link
Member Author

@Ousret That thread is exactly what prompted me to open this discussion, and I am not too hopeful reading requests maintainers' response on the issue.

@potiuk
Copy link
Contributor

potiuk commented May 10, 2021

We have a very similar discussion happening in Apache Software Foundation https://issues.apache.org/jira/browse/LEGAL-572 right now. The chardet transitive dependency has been brought to our attention as something that is not compliant with the Apache Software Foundation policies and there are ~ 50 projects of the ASF that will have to remove the requests dependency.

Unfortunately chardet maintainers also refused to change the licence several times:

However we went a bit different route. @ashb from Apache Airflow proposed a PR : psf/requests#5797 where chardet is replaced (in a very soft way - including fallback to chardet if installed) with MIT-licensed charset-normalizer.

So far the requests maintainers were not willing to merge that requests, quoting 'feature-freeze' (see the discussion in the PR) but maybe if enough people and projects will join the request and they will see the critical mass, they might change their mind.

We have another, alternative proposal if they don't. We could jointly fork the requests library.

The idea is:

  • we create an organisation in GitHub (we can even name it "requests-NonL-GPL" or similar).
  • we fork requests there and semi-automatically apply the PR from Ash every time they release - and we release our own version right after (also release it to PyPI with requests-non-lgpl name )
  • we fork there also any library that we need that uses requests and modify it to use our fork (and release to PyPI) again with modified name. Or we ask the maintainers of the library to switch to our non-lgpl depending version of requests. We can semi-automatically release those libraries when they are released.
  • we ask all the ASF projects and anyone else who have problem with LGPL chardet to switch to use those non-lgpl depending libraries

This is a lo of hassle, but it seems doable, and easy to semi-automate, and I am willing to spear-head the effort if we go that route.

@pfmoore
Copy link
Member

pfmoore commented May 10, 2021

TBH, I don't want pip to get involved in the debate over requests and chardet. I consider that a decision for the requests maintainers, and not one we (pip) should attempt to influence.

So my vote remains to switch to httpx, but I don't feel that pip should treat this as a matter of significant urgency, just something that we should do in due course. The license question has been around for years now, any sudden urgency is IMO artificial "because we just noticed it".

Regarding going "full async", I've been doing some investigation recently into asyncio, and one potential issue with going that route is that the fast-deps code cannot be used with async http - specifically because the LazyZipOverHTTP class implements a file object's read() function via http, and read() cannot be async, as that's not how the file protocol is defined. However, I'm not an async expert here and I'd love to be proved wrong on this point.

@pradyunsg
Copy link
Member

read() cannot be async

Yea, and loop.run_in_executor can be used to wrap blocking I/O operstions: https://docs.python.org/3/library/asyncio-eventloop.html#executing-code-in-thread-or-process-pools

@pfmoore
Copy link
Member

pfmoore commented May 10, 2021

Yea, and loop.run_in_executor can be used to wrap blocking I/O operstions

That's the wrong way around. What that code needs to do is essentially equivalent to

def read(self, n):
# Download some bytes over HTTP
return self.buffer[current:current+n]

If the HTTP code is async, it needs to be awaited, so read needs to be async, but it can't be. We can of course use asyncio.run (here), but that section notes "This function cannot be called when another asyncio event loop is running in the same thread", so it'll be tricky to get this right. Not impossible, but certainly tricky.

But this level of detail can wait until someone actually wants to work on going full-async. For the purposes of this issue, I think we can go to httpx initially, and look at async later if we want to.

@sigmavirus24
Copy link
Member

Unfortunately chardet maintainers also refused to change the licence several times:

@potiuk I do want to point out that this is a gross misrepresentation of the discussions. We literally - under the terms of the LGPL license - can not change it. The original author is completely unreachable and thus unable to agree to changing the license. We refuse to change the license because we would be violating the terms of the LGPL by doing so, not because we ideologically agree with it more than anything else. Further, the original author (if I remember correctly) licensed the project under LGPL because large swaths of it (the tables of statistics in particular) were taken from Mozilla's LGPL'd universalchardet project that chardet was implemented to mimic in Python. The legal tangles are too deep for anyone to be bothered to deal with.

@pfmoore
Copy link
Member

pfmoore commented May 10, 2021

Thanks for the clarification @sigmavirus24.

Can I ask that the debates over licensing not spill over to the pip tracker - it sounds like chardet and requests are already getting enough traffic over this, and it helps no-one to start another extended and unproductive debate here.

The facts are:

  1. The situation regarding the chardet license is only one of the reasons we're having this discussion about changing our HTTP library.
  2. Nothing will change overnight, so "we're looking at the question" is the only answer anyone will get for the immediate future.
  3. Getting us entangled in licensing debates will only slow progress on any change, as people will get burned out and find other, more enjoyable, things to spend their time on.

@sigmavirus24
Copy link
Member

Also, for what it's worth, I used to frequently talk to Donald about moving to urllib3. I think that should be an option on the table as well

@potiuk
Copy link
Contributor

potiuk commented May 10, 2021

Can I ask that the debates over licensing not spill over to the pip tracker - it sounds like chardet and requests are already getting enough traffic over this, and it helps no-one to start another extended and unproductive debate here.

Sure. This is no problem for me. I won't comment here any longer, I just noticed similar debate in PIP and thought it might raise the severity of the problem.

@pfmoore
Copy link
Member

pfmoore commented May 10, 2021

Also, for what it's worth, I used to frequently talk to Donald about moving to urllib3. I think that should be an option on the table as well

At 153KB and no dependencies, I agree we should consider it. The main benefit of httpx is that they claim compatibility with the requests API - which may or may not be important (if we're considering going to async, maybe not as much as I thought). I honestly have no idea how complex our needs are here.

@pradyunsg
Copy link
Member

FWIW, we already depend on urllib3, through requests.

@sethmlarson
Copy link
Contributor

sethmlarson commented May 10, 2021

One feature that the urllib3 (and by virtue requests) handles that HTTPX doesn't is the SecureTransport TLS implementation for macOS+old OpenSSL. If you were to switch to pure urllib3 the biggest hurdle would likely be translating the auto-magic configuration of things like CA certificates, proxies, and TLS verification that Requests handles via environment/system variables.

(edited to remove the "well actually" joke)

@shazow
Copy link

shazow commented May 10, 2021

👍 to what @sethmlarson said, and also I want to bring up that historically pip has been urllib3's main customer—it helps that our requirements, commitment to our users, and values have always aligned.

We've worked with @dstufft, @pradyunsg, and other pip collaborators directly on many occasions to implement functionality to make pip's life easier (even if it was being used from behind requests).

We've discussed switching pip to urllib3 directly on many occasions too, but for one reason or another it hasn't happened (mostly lack of urgency).

If that is something that pip is interested in doing now, our team is happy to support the transition and continue working with pip to make sure the community's needs are met. :)

@uranusjr
Copy link
Member Author

@sethmlarson Not sure what you’re trying to say with the PePy link so I’ll skip that.

Aside from CA configuration, pip also makes use of requests’s authentication and custom scheme transport interface. Like the CA thing, they are simply conveniences and nothing technically challenging, but someone will need to re-implement them if we’re going to use urllib3.

@zander
Copy link

zander commented Jun 4, 2021

I want to point out that the reason for asking Chardet to change their license seems to be nothing but a misunderstanding.

The best reason I could find is this;

chardet/chardet#36 (comment)

LGPL makes it challenging for including chardet (or any of the many packages that transitively depend on chardet such as requests) inside of a binary due to the restriction that users need to be able to have the ability to swap out the LGPL library for another of their choosing, something infeasible/very tricky for a binary distribution.

This misunderstands the obligations that the LGPL has, it does NOT require you to do anything. It absolutely does not require you to create or provide the ability to swap out the LGPL library.

What you likely misunderstood is that LGPL requires the user to be technically able (assuming they have competentce) to make changes to an LGPL library and use that changed one instead. It doesn't mean you need to help them do it, just that you should not obstruct them from accomplishing that goal.

The fact that users can download the LGPL library on github, can alter it and can reproduce your build steps to create a binary is enough.

Asking projects to give up their chosen license for this reason is really not needed and creates unneeded worry and strive.

@uranusjr
Copy link
Member Author

uranusjr commented Jun 4, 2021

You analysis applies for most people, when a library depends on chardet. But pip is redistributing chardet as a part of pip, and pip is distributed under the MIT license, which is the problem.

I know it is still possible to argue that having LGPL code in a pip distribution is OK, but none of the pip maintainers is in the position to contribute on this if someone decides to bring this to court. So unless someone is willing to be that contributor to protect pip from liability, I don't think it's a good idea for both maintainers and users to have the risk if we can avoid it.

@pradyunsg
Copy link
Member

We could also patch our internal requests to use https://github.com/Ousret/charset_normalizer instead? It has a compatible API, and seems to be intentionally developed to be a drop-in replacement.

@potiuk
Copy link
Contributor

potiuk commented Jun 4, 2021

Just a comment here - i think we are rather close to convince requests maintainers about replacing chardet with MIT-based charset-normalizer . We've done a series of test and fixed a number of problems pointed out by the maintainers psf/requests#5797 and if we mange to do that, the whole chardet/LGPL problem might be solved automatically (charset-normalizer is MIT licensed and has no other dependencies any more after the fixes we've done together with @Ousret - the author of charset-normalizer.

Not asking for support here, just wanted to let you know that this is a possible solution that might materialize soon

@zander
Copy link

zander commented Jun 4, 2021

But pip is redistributing chardet as a part of pip, and pip is distributed under the MIT license, which is the problem.

Why do you think that is a problem?
It happens all the time, Simple example: The Qt libraries are LGPL, look at the enormous amount of opensource apps that are shipped with it that are not the same license.

If you have a reason to think this is a problem, please explain your reasoning.

I don't think it's a good idea for both maintainers and users to have the risk if we can avoid it.

Who are you afraid of sueing you? The original author can not be found. Only the copyright owner can sue you or order others to sue you.

@potiuk
Copy link
Contributor

potiuk commented Jun 4, 2021

We could also patch our internal requests to use https://github.com/Ousret/charset_normalizer instead? It has a compatible API, and seems to be intentionally developed to be a drop-in replacement.

Yep. The PR above works out of the box and is heavily tested so you can for sure use charset-normalizer

@zander
Copy link

zander commented Jun 4, 2021

I find it sad to hear that the knowledge about LGPL and licenses is such that people fear using them. For exactly the purpose they are meant: as a component of your application that you don't intent to fork but simply use from upstream.

This is the purpose of the LGPL license and the only requirement you have is to make clear to users where they can download the sources of this library and naturally the license and they should be able to replace it. Which means competent users should have the opportunity to replace it and run the result on their own hardware.

This is really quite a low bar, it just means you can't have a closed-source build system for instance, or ship it on hardware that refuses to run apps that are not signed by you guys.

@potiuk
Copy link
Contributor

potiuk commented Jun 4, 2021

But pip is redistributing chardet as a part of pip, and pip is distributed under the MIT license, which is the problem.

Why do you think that is a problem?
It happens all the time, Simple example: The Qt libraries are LGPL, look at the enormous amount of opensource apps that are shipped with it that are not the same license.

If you have a reason to think this is a problem, please explain your reasoning.

I don't think it's a good idea for both maintainers and users to have the risk if we can avoid it.

Who are you afraid of sueing you? The original author can not be found. Only the copyright owner can sue you or order others to sue you.

The main reason for Apache Software Foundation of not allowing LGPL as a mandatory dependency is that LGPL imposes limitations not on ASF itself and not on the projects but on the users of our projects. The commercial users of projects are limited because they & for example cannot redistribute our software further, repackaging it in any way they want (LGPL forbids packaging the software in single binary in this case for example without releasing your own code that you added). There was an extensive discussion here for example https://issues.apache.org/jira/browse/LEGAL-572 and the current conclusion of the ASF is that LGPL software cannot be mandatory when you release ASF software (it can be optional though).

Not that this is not Licencing Limitation of the Apache 2.0 licence. This is a Policy of ASF. A deliberate and conscious decision that we do not want to limit our commercial users in the ways they want to use our software. This is a recognition that we care for the needs of the whole community - including commercial users and stakeholders, which gives those users and stakeholders a firm assurance that they can use our software without limits that might limit their business models.

Not sure if that is of a concern for the PIP maintainers but i do not think it is a matter of licence compliance or legality, but more about whether similar concerns are of an importance for them.

Not all projects have the same concerns and policies, they can happily live with the LGPL dependencies. But some other projects and organisations might make different choices (like ASF did).

@zander
Copy link

zander commented Jun 4, 2021

It is good to hear that there is a consciousness with regards to the problems not being legal problems. The problems are created by choice, by the age old difference between the BSD-style of licenses and the copyleft style. LGPL was designed to be a bridge between those two. Allowing interoperability. It is legally possible for you to use it.

In that difference of opinion it is honestly just that, an opinion and you are free to have one, as do we all.

The ideas from the Apache foundation sound political to my personal ears. The issue you refer is that a company user would mistakenly alter not PIP but the upstream library (which they would have to download the sources from github and compile themselves instead of just linking to the one you ship) and then add changes they refuse to open source to that exact LGPL library and redistribute it! (most companies don't redistribute the software they build).

This is not a real world problem being solved.

The PIP project can choose to reject LGPL. That is their choice, not some rule or legal issue. Lets be honest about this.

Copyleft is a choice that ensures software stays open and free, the ASF idea you quoted is doing the opposite: to allow companies to take it and close it and not become part of the community of contributors. You ask others to write software for the benefit of companies that don't intent to contribute back. That is the entire point, to help companies that don't intend to contribute back.

You can work like that, but please don't ask others to. Please stop pressuring projects under false pretenses to change license.

@pfmoore
Copy link
Member

pfmoore commented Jun 4, 2021

Can I request that people refrain from turning this issue into a licensing debate, please? The pip maintainers are aware of the discussions, and of the varying opinions. Thanks to people who have commented, but we don't want to get involved in a licensing argument, on either side of the issue.

Pip will ultimately make a choice over whether to switch from requests. That choice will be made based on a number of factors, of which licensing is only one. And it will be a choice, not the requirement of any sort of license. This issue is for the general question of how and whether we make that switch, and I'd prefer it if we focused on other aspects of the discussion from now on.

On a purely personal note, I will say that the licensing discussion has simply made me more inclined to switch - not because of any legal reasons, not even because of questions of what our users might want to do with pip, but simply because I'm sick of these sort of debates flaring up, and if not depending on LGPL software will get rid of at least one reason for them occurring, then that's a win in my book.

@potiuk
Copy link
Contributor

potiuk commented Jun 4, 2021

@pfmoore i hope you did not see it as a pressure any more. Not intended if so.

I understood that from earlier discussion (and perfectly agree with) that it is entirely up to PIP maintainers to decide. I just (i hope I was clear about it) just stated the reasons ASF made their choice. I am not an 'evangelist' of that approach and I am very far from telling others that they are doing wrong, not following it. Just wanted to explain ASF reasoning. There are many other approaches (starting from GNU/FSF or recent ElasticSearch/Mongo etc.). I am not judging nor trying to tell 'ASF is better' or 'worse'. Just wanted to explain the reasoning.

Sorry if you take it as preaching or dragging in. Just wanted to be helpful in making the reasoning clear ans showing that this might - hopefully - change soon.

No more comments from me on this - it is up to PIP maintainers to decide their approach :( . One thing left, i am happy to let.you know here if/when requests migrate away of chardet.

@domdfcoding
Copy link
Contributor

As a (hopeful) bookend to the licensing discussion, requests has now switched from chardet to charset_normaliser (see psf/requests#5797)

@pradyunsg
Copy link
Member

Given that the licensing issue isn’t a sticking point, I’m thinking about the next steps here:

  • Do we still care? The requests project is under the PSF umbrella now, and I imagine that is a good soft-indicator that it’s likely not going to be killed off / left unmaintained.
  • Have we checked with httpx folks about how they feel about the constraints of being vendored in pip?
  • Do we know what the complexity of migrating from requests to pure urllib3 would be?

@pradyunsg
Copy link
Member

Closing this out since this is unlikely to happen at this point.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 16, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants