Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove distutils.util usage from packaging.tags #349

Closed
wants to merge 5 commits into from

Conversation

uranusjr
Copy link
Member

@uranusjr uranusjr commented Nov 7, 2020

I basically copied distutils.util.get_host_platform(), run Black on it, and make the code slightly more modern.

There are several possible refactoring available, e.g. move the Windows and Linux logic out of the function to short-circuit some logic, and use function arguments to avoid os.uname() being called repeatedly. But each of these adds chance for things to break, and I feel it’s better to first include an implementation that guarentees no backward incompatibility before we try to improve it.

Fix #327.

@mattip
Copy link
Contributor

mattip commented Dec 11, 2020

This is in case PEP 632 to deprecate distutils goes through?

@uranusjr
Copy link
Member Author

The main reason is because distutils is a poor source to get this information, its internals being undocumented and may break without warning (it already did for some people). See #327 (comment).

@puetzk
Copy link

puetzk commented Dec 11, 2020

My interest started because distutils has already made a breaking change, and in Python >= 3.8 pip no longer works in the vcvarsall "Developer Prompt for Visual Studio XXX" environment - it tries to install wheels matching the MSVC target architecture, rather than the python.exe into which it's installing things. See https://bugs.python.org/issue38989 for my report of this end-symptom, and pypa/pip#8649 for someone else's.

@zooba has indicated, both on python.org and on github that he considers distutils.util.get_host_platform() a private function that should not be used for this purpose. Of course, he's also the author of PEP 632, so it's probably not unrelated to propsing deprecation and removal in general.

@zooba
Copy link

zooba commented Jan 4, 2021

@zooba has indicated, both on python.org and on github that he considers distutils.util.get_host_platform() a private function that should not be used for this purpose. Of course, he's also the author of PEP 632, so it's probably not unrelated to propsing deprecation and removal in general.

Yeah, the recommendation to not use the function is because it's going to be deprecated soon. Distutils has poor platform detection, so anything else will be an improvement, and honestly I like packaging.tags as a location for it, since then other tools can ask "which platform tag should I build for" (since e.g. manylinux has different kinds of rules from win32). We're progressively moving packaging stuff out of the stdlib anyway, so that it can be divorced from the language version.

Base automatically changed from master to main January 21, 2021 19:20
@zooba
Copy link

zooba commented Jan 28, 2021

@uranusjr How can we help this along? People keep being bitten by pip looking at the cross-compile target rather than the current platform.

@uranusjr
Copy link
Member Author

I've done all I could, someone needs to review and merge the PR.

import sysconfig
import _osx_support

osname, release, machine = _osx_support.get_platform_osx(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only query is whether this should rely on _osx_support (which I assume is a core CPython module?) or reproduce whatever it's doing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, it’s a helper module in stdlib. The function is quite complicated and I’d very much keep referencing the stdlib if possible. Not sure who to ask about whether this is a good idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the equivalent public API for this is sysconfig.get_platform(), so it should be an easy swap. It might even make a suitable fallback for all cases after you've gotten past the initial tests. I suspect you'll still want your own version of it for backporting fixes/changes, but if the stdlib functionality is fine for this bit then you may as well use as much as you can/like.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, I only compared to the version of sysconfig in master... it might be that there are already improvements that aren't in 3.9, and so you'll still need the logic you've got here. But the _osx_support reference can't really be any worse in earlier versions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git blame the everything that changed in that function during the recent 5 years are about Windows (ARM support), AIX improvements, and dropping BSD/OS support, so it’s probably safe. I’ll switch that API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the same also applies to _aix_support (prior to 3.9 it’s just {osname}-{release}-{machine}, and was expanded to support PEP 425), so I replaced that as well.

_osx_support.get_platform_osx() only uses three configs:

* MACOSX_DEPLOYMENT_TARGET
* CFLAGS
* _OSX_SUPPORT_INITIAL_CFLAGS

All of these should be identical from both sources.
Black-ify, and replace %-formatting with str.format()
@uranusjr
Copy link
Member Author

uranusjr commented Feb 4, 2021

On further inspection, _get_host_platform is almost identical to sysconfig.get_platform (and the two have been updated simultaneously for a long while), except an ever so slight difference:

sysconfig: https://github.com/python/cpython/blob/master/Lib/sysconfig.py#L656-L671
distutils: https://github.com/python/cpython/blob/master/Lib/distutils/util.py#L38-L54

In distutils, the _PYTHON_HOST_PLATFORM override happens before if os.name != "posix" or not hasattr(os, 'uname'), but sysconfig puts it after the condition. git blame tells me this has been the case since the environ was first introduced in 2012: python/cpython@1abe1c5. Is there a reason to this? Does it cause functional differences? If not, the entire function can simply be replaced.

@brettcannon
Copy link
Member

If the override isn't applied first, then applying it differently on various platforms seems odd, so I would go with the sysconfig approach for consistency.

@puetzk
Copy link

puetzk commented Feb 5, 2021

I see nothing in https://bugs.python.org/issue14330 indicating was intentional either. It seems to have been originally intended as an internal flag for cpython's own build system? But now seems to have a bit broader scope (e.g. dh-python's pybuild sets it, and debian documents it as part of cross-building extensions: https://wiki.debian.org/Python/MultiArch, and google found NumPy build scripts for iOS using it).

It is potentially a functional difference: sysconfig only honors _PYTHON_HOST_PLATFORM for posix systems with os.uname, distutils honors it for everything except nt. At least for CPython though, there don't actually seem to be any other options besides nt and posix. So it seems this should make little difference in practice - if it's os.name is nt both ignore it, if it's posix they both honor it, unless there really is a still-supported posix system where HAVE_UNAME is not defined, and thus os.uname is missing.

Which seems unlikely; uname dates all the way back to SVr4 and was standardized by POSIX.1-1988, though BSD didn't get it until 4.4BSD in 1994. The autoconf check and posixmodule.c here do in fact seem to date all the way back to pre-alpha-1.0.0: python/cpython@c39de5f. Back in 1992 it might have made sense to care about 4.3BSD :-) But I don't think there are any early-90s unices still supported by CPython 3.x, so should be moot.

The documentation suggests java is a valid 3rd option, presumably intended for the maybe-happening-someday jython3? All the other historical options seem to have been dropped in between python 2 and 3. But if it's java cross-compiling is kind of nonsensical. And the unfinished 2016 attempt at jython3 had its own sysconfig.py that deleted _PYTHON_HOST_PLATFORM anyway. Same would presumably go for any other non-CPython implementation, their sysconfig.py would be tweaked to their needs anyway.

@puetzk
Copy link

puetzk commented Feb 5, 2021

Epistemic status: reasonably comfortable with unix history, but no personal knowledge history here, just did doing some software archaeology to see if I could find anything interesting. @doko42 (the original author of _PYTHON_HOST_PLATFORM) seems to still be active in with an occasional cpython/setuptools review, so linking just in case he happens to have have made the sysconfig and distutils implementations different on purpose and (even more unlikely) still remembers why :-)

@uranusjr
Copy link
Member Author

uranusjr commented Feb 5, 2021

I’ve filed #396 so we have something to merge if we decide the two implementations are indeed equivalent.

@ronaldoussoren
Copy link
Contributor

I haven't fully read up on this PR yet, but I do wonder if it wouldn't be better to move this functionality into the stdlib module sysconfig (with a clear and documented interface).

The function is closely related to other functionality in sysconfig (such as sysconfig.get_platform). This would also solve the issue you're having with relying on _osx_support.

BTW. _osx_support was introduced to centralise some functions that are used by both distutils and sysconfig.

@mattip
Copy link
Contributor

mattip commented Feb 5, 2021

Pinging @fangerer for a GraalPython perspective: I don't know if they change os.uname ?

@uranusjr
Copy link
Member Author

uranusjr commented Feb 5, 2021

@ronaldoussoren There is already a sysconfig.get_platform, but the behaviour is (very slightly) different. The main point to decide here is whether we should

  • Have our own implementation that exactly replicates distutils.util.get_host_platform.
  • Use sysconfig.get_platform as-is.
  • Change sysconfig.get_platform to match the behaviour of distutils.util.get_host_platform.

@ronaldoussoren
Copy link
Contributor

The 4th option is to add sysconfig.get_host_platform if the difference in behaviour is important.

The sysconfig module was extracted from distutils and as such has a similar problem in that its specification is not very clear. I'd be careful with changing the behaviour of sysconfig.get_platform, there's a clear risk that this breaks some other project.

@puetzk
Copy link

puetzk commented Feb 5, 2021

graalpython appears to identify itself as os.name = 'posix', by implementing a builtin module named posix, including an implementation of os.uname.

So it would seem to be another one where the sysconfig and distutils implementation difference would not matter.

@doko42
Copy link

doko42 commented Feb 5, 2021

Relying on the os module would break cross-building extensions. I didn't look into cross build recently, so I don't know if it's currently working.

The idea is to run the build python (running on your build machine) with the sysconfig module from your host/target machine, assuming that the sysconfig/sysconfig_data modules provide all information for the host/target. That cannot work, if people start relying on the builtin os module.

@puetzk
Copy link

puetzk commented Feb 5, 2021

@uranusjr had some discussions bout these different sources (before this PR was opened) at https://discuss.python.org/t/pep-425-platform-tag/5157/11

Just to link some more (probably stale) context, which many of the folks commenting have presumably seen, but I just ran across.

@brettcannon
Copy link
Member

This might be one of those situations where we won't know the ramifications until we ask people widely to test or we do a release and wait to see what the complaints are.

@pradyunsg
Copy link
Member

we do a release and wait to see what the complaints are.

I'm thinking let's do this but with #396. The behavior change feels like it has a low risk of breakage, with most people affected knowing their nuances better than us -- but I have no data/reasoning behind "why" beyond intuition TBH.

Relying on the os module would break cross-building extensions.

No? This is only used when the user doesn't provide a specific platform that they're generating these tags for. See the stack "down" from https://github.com/uranusjr/packaging/blob/no-distutils/packaging/tags.py#L275.

@uranusjr
Copy link
Member Author

uranusjr commented Feb 6, 2021

Note that the current implementation in the main branch already does not work particularly well in cross-building scenarios, since it relies on sysconfig values obtained internally. The experience can probably be made better if those values can ba specified manually by the user, but that’s a completely different feature request and should not be discussed with the topic at hand here (replacing distutils).

@brettcannon
Copy link
Member

@pradyunsg your feeling is my feeling as well. No one has any good feeling as to why anything is the way it is, so I say go with the simpler solution. 😄

@uranusjr
Copy link
Member Author

This has been replaced by #396.

@uranusjr uranusjr closed this Apr 24, 2021
@uranusjr uranusjr deleted the no-distutils branch December 10, 2021 18:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tags.cpython_tags returns an incorrect value when run from a 32-bit VS developer prompt on a 64-bit Windows
8 participants