improve conda variant handling #448
Conversation
Does this handle more than one feature?
@beckermr in a way, yes, because in one step we're sorting based on the number of `track_features`. But in the global minimization step we don't take into account how many there are. On the other hand, this entire "track_feature" mechanism is a bit of a weird artifact, and maybe at some point we can come up with a revamped, better scheme...
I do think track features are very useful since they sit above version-number ordering. Whatever we replace them with needs this same property, I think.
fixes #447
Unfortunately this patch didn't work as well as I had hoped (especially in combination with the timestamp maximization, which shuffled the order of packages a bit differently) :)

The biggest "problem" is that we do not have proper metadata, since "track_features" is not exported in many cases. For example, I might try to install numpy, and there is one … As far as I understand, we have two options:
Another (third) option would be to figure out how to globally add metadata on variants: we could have a global entry in the repodata with information about what variants exist and what the default choice should be, and use that. With that information we might be able to make decisions faster, as we could sort the dependencies straight away without searching for potential track features. @mlschroe if you have some insights into how we could best achieve this, it would be greatly appreciated!
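To make the idea concrete, here is a purely hypothetical sketch of what such a global variant entry might carry; none of these fields exist in today's repodata, and all names are made up:

```c
/* Hypothetical: the information a global "variants" entry in the
 * repodata could expose. Nothing like this exists today. */
typedef struct {
    const char *name;           /* variant dimension, e.g. "python_impl" */
    const char **choices;       /* e.g. { "cpython", "pypy", NULL } */
    const char *default_choice; /* the recommended variant, e.g. "cpython" */
} VariantEntry;

/* With a table of VariantEntry values available up front, candidate
 * packages could be sorted toward the default choice immediately,
 * without first hunting for track_features in their dependencies. */
```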
How does conda do this? Do they run the solver with each variant fixed and then choose?
They add these clauses for the SAT-solver minimization, I guess: … I am not yet sure how to properly achieve this with libsolv, but gonna keep trying :)
Yeah I have no idea what that code does.
As for libsolv / rpm, the way I understand it is: for a variant they produce two distinct packages that both have the same "provides". E.g. we'd have …
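A minimal sketch of that setup with libsolv's C API — two solvables that both provide the same name and version; the concrete names and versions here are made up for illustration:

```c
#include <solv/pool.h>
#include <solv/repo.h>

/* Two distinct numpy variants that both "provide" numpy = 1.20.0,
 * so either one can satisfy a plain "numpy" dependency. */
static Id add_variant(Repo *repo, const char *name, const char *evr)
{
    Pool *pool = repo->pool;
    Id p = repo_add_solvable(repo);
    Solvable *s = pool_id2solvable(pool, p);
    s->name = pool_str2id(pool, name, 1);
    s->evr  = pool_str2id(pool, evr, 1);
    s->arch = ARCH_NOARCH;
    /* both variants provide "numpy = 1.20.0" */
    Id prov = pool_rel2id(pool, pool_str2id(pool, "numpy", 1),
                          pool_str2id(pool, "1.20.0", 1), REL_EQ, 1);
    s->provides = repo_addid_dep(repo, s->provides, prov, 0);
    return p;
}

int main(void)
{
    Pool *pool = pool_create();
    Repo *repo = repo_create(pool, "example");
    /* e.g. a CPython and a PyPy build of the same numpy version */
    add_variant(repo, "numpy-cpython", "1.20.0");
    add_variant(repo, "numpy-pypy",    "1.20.0");
    pool_createwhatprovides(pool);
    pool_free(pool);
    return 0;
}
```

The key point is that both solvables satisfy a plain `numpy` dependency; which one wins then becomes a matter of ordering policy.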
So the numpy package would also have to be a "proper variant", and one of the variants might be the "recommended" variant that would (hopefully) be chosen. My problem is that I don't have this information on the numpy package itself (at the first level I don't know which one is recommended; only by inspecting its first-level dependencies can I get to that information).
Sorry for ignoring you, the last few days were a bit too packed with other work. I'll try to look at this next week.
@mlschroe no worries, would be great to get your input! I think we have two problems that are slightly related.

**1. Select the better variant right away**

When we have a package like numpy, we have ~5 variants that currently all look equal to libsolv (we are relying on build-string or Id comparison, so it's almost a random selection). The variants are basically for python=3.6=cpython, python=3.7=cpython, python=3.9=cpython, ..., and python=3.7=pypy. So first, it would be good to select the variant that has the highest dependencies. My idea was to look at the lower and upper bound of the dependency selectors, and thus to sort to the top the variant that has …

**2. Try other branches if we end up with a track feature**

This one is probably harder; conda does a global optimisation to "globally minimise the number of track features in the environment". So if we end up with a solution that contains a package with an attached track feature, it would be nice to have a way to figure out whether there is another branch where we wouldn't end up with a track-feature package. However, if we continue with the example of numpy, it's a bit tough to figure this out straight away (also because of the way the metadata is currently arranged in the conda-forge channel). For the numpy-1.20-pypy package we have a list of dependencies that looks like …
However, the package that is down-weighted by the track feature is … There are two ways we could change the metadata to make the problem "easier":
However, if we do not change the metadata (which will take time ...), I was thinking that we could intelligently explore alternative branches with libsolv. I checked, and for large environments an exhaustive search seemed very slow. However, we could note that we have selected a …
Just to give a quick update here: I have some experimental code to extract the lower and upper bound from dependency strings like `>=3.8,…`: https://github.com/wolfv/libsolv/blob/a7ad64b4181a7e5c9515efc37deb6f2dd79e02b4/src/conda.c#L683

Also, in conda-forge a repodata change was merged so that we can now determine from the "first-order" dependencies whether a dependency has a track_feature (e.g. numpy depends on …). Still very interested in feedback :)
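For reference, a rough standalone sketch of what such bound extraction could look like — this is not the linked conda.c implementation; the parsing here is deliberately naive and only handles `>=X,<Y`-style selectors:

```c
#include <stdio.h>
#include <string.h>

/* Naive extraction of a lower and upper bound from a conda-style
 * version selector such as ">=3.8,<3.9". Real selectors are richer
 * ("!=", "~=", "|", build strings, ...), so this is only a sketch. */
static int extract_bounds(const char *spec, char *lo, char *hi, size_t n)
{
    lo[0] = hi[0] = '\0';
    const char *p = spec;
    while (p && *p) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        if (len >= 2 && p[0] == '>' && p[1] == '=')
            snprintf(lo, n, "%.*s", (int)(len - 2), p + 2);   /* ">=X" */
        else if (len >= 1 && p[0] == '<' && p[1] != '=')
            snprintf(hi, n, "%.*s", (int)(len - 1), p + 1);   /* "<Y" */
        p = comma ? comma + 1 : NULL;
    }
    return lo[0] != '\0' && hi[0] != '\0';
}

int main(void)
{
    char lo[32], hi[32];
    if (extract_bounds(">=3.8,<3.9", lo, hi, sizeof lo))
        printf("lower=%s upper=%s\n", lo, hi);  /* lower=3.8 upper=3.9 */
    return 0;
}
```

With bounds like these in hand, equal-looking variants can be sorted so the one whose dependency bounds are highest wins, as described in problem 1 above.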
We did this for python, but I suspect there may be other features where this is not done.
I think for most other features the pinning will be more "direct". For python it was quite indirect, via python_abi and an explicit dependency on pypy3.7 etc.
I hope so! We will need to keep this in mind when making features in the future though. We also may need to fix up some of the mpi ones for mpich.
Regarding track features: IMHO the SAT-wise cleanest implementation would be to add new "trackfeature" rules that disallow the installation of any package that has a track feature (except for the features already installed, I guess). This makes the solver abort when it needs a new tracked feature, which can then be added to the allowed list. The point of doing this is that it makes the solver backtrack if it needs to install a new tracked feature. I.e. in your case, it will go and choose the non-pypy variant if it comes to the pypy dependency. This is pretty cheap to implement and somewhat simulates the track_feature minimization of conda.
Thanks for the note! We use track features to set the global priority of different solutions. Disallowing if the feature is not there is closer to the deprecated conda behavior that we do not use. Is there a way to keep a running total of how many features are currently found for the solution and force the solver to backtrack if this increases? The only trick here would be to allow solutions with a non-zero number of features if it cannot find any solution with zero features.
Thanks for getting back to us @mlschroe :)
The disallow is just an internal SAT-solver mechanism to make it backtrack. It will automatically enable the track feature if this is the only option.
(Basically like how the automatic uninstall works if SOLVER_FLAG_ALLOW_UNINSTALL is set.)
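What mlschroe describes would live inside the solver as new rule types. A coarser version of the same bias can be approximated from the outside with libsolv's existing SOLVER_DISFAVOR job type; a sketch under the assumption that a `has_track_feature()` check exists (it is hypothetical here, standing in for however track features end up stored on the solvable):

```c
#include <solv/pool.h>
#include <solv/solver.h>
#include <solv/queue.h>

/* Hypothetical predicate: does solvable p carry a track feature?
 * How this is looked up depends on how the conda repodata is mapped
 * into libsolv attributes, so it is left abstract here. */
extern int has_track_feature(Pool *pool, Id p);

/* Add a SOLVER_DISFAVOR job for every track-feature package.
 * Unlike a hard disallow, disfavoring still lets the solver pick
 * such a package when no feature-free solution exists. */
static void disfavor_track_features(Pool *pool, Queue *job)
{
    Id p;
    FOR_POOL_SOLVABLES(p)
        if (has_track_feature(pool, p))
            queue_push2(job, SOLVER_DISFAVOR | SOLVER_SOLVABLE, p);
}
```

Note this does not implement the running-total backtracking asked about above; it only biases the solver away from track-feature packages while keeping them as a fallback.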
I am closing this PR in favor of #457. The new one is simpler and builds more on top of existing stuff. Still would love to get some feedback on how to write better-integrated C code :) This improves package resolutions quite a bit in several cases. E.g. …
There are a couple of formatting issues here, but I am looking for some early feedback on this PR:
```
scipy-1.6.3-py37h29e03ee_0
scipy-1.6.3-py37ha768fb6_0
```
So far this seems to be working quite well. I am just wondering if I am on the right path with this?