Conda resolver needs to know about all dependencies for a job #3299
From Gitter:

> That said, I think we have to make sure that the correct combination of packages is selected at install time as well; not sure if we can use …
After some thinking I'm afraid this is not an easy job, given that for the most part the resolver system is designed around series of single packages with no dependencies between these packages. Maybe the solution is to use metapackages aggressively for this. For Conda we can certainly build an additional layer to deal with dependencies between packages, but if we think about containerized dependencies -- they have to be metapackages anyway, right? Do you have any thoughts on this @bgruening @jmchilton @nsoranzo?
@mvdbeek a few months ago I had a long discussion with @jmchilton about this. @jmchilton do you remember where we had it? What I can recall is that I also proposed single dependencies for Conda, solving this by using metadata packages, simply because this is easier to map to Docker/Singularity containers. After some disagreement about how many metapackages are really needed, we agreed to implement some on-the-fly container generation. So what is possible now is that Galaxy takes a list of multiple Conda dependencies and creates a container on the fly (the name is a normalized hash of the requirements) and uses this. Such a container could also be created by the TS and pushed as a BioContainer at some point. What I also wanted to point out is that we so far have no concept of how to really pin versions, à la a requirements.txt file. Only such a file would be really reproducible, and we should at least capture this information - or these metadata packages should more or less consist of such a strictly pinned requirements.txt file.
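A rough sketch of that "normalized hash of the requirements" naming idea - the function and the `env-` prefix here are made up for illustration, not what Galaxy or the TS actually use:

```python
import hashlib

def hashed_environment_name(requirements):
    """Derive a stable container/environment name from a strictly pinned
    requirement set (illustrative scheme, not Galaxy's real one)."""
    # Normalize: lowercase "name=version" pins, sorted, so the same set of
    # requirements always hashes to the same name regardless of order.
    pins = sorted("%s=%s" % (name.lower(), version) for name, version in requirements)
    digest = hashlib.sha256("\n".join(pins).encode("utf-8")).hexdigest()
    return "env-%s" % digest[:12]

# A strictly pinned requirement set, in the spirit of a requirements.txt:
print(hashed_environment_name([("numpy", "1.11.3"), ("lumpy-sv", "0.2.13")]))
```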
I kind of like the …
So right now the dependency resolvers work this way:
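In sketch form, something like this (names are illustrative, not the actual Galaxy code):

```python
def resolve_requirements(requirements, resolvers):
    """Current behavior (sketch): requirements are resolved one at a time,
    so no resolver ever sees the full set of a job's requirements."""
    dependencies = []
    for requirement in requirements:
        for resolver in resolvers:  # first resolver that matches wins
            dependency = resolver.resolve(requirement)
            if dependency is not None:
                dependencies.append(dependency)
                break
    return dependencies
```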
I guess we want to switch it up to be:
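Roughly (again a sketch; `resolve_all` is the hypothetical new hook):

```python
def resolve_requirements(requirements, resolvers):
    """Proposed behavior (sketch): let each resolver try the whole set
    first, so one of them (Conda) can solve for a compatible combination."""
    for resolver in resolvers:
        if hasattr(resolver, "resolve_all"):
            dependencies = resolver.resolve_all(requirements)
            if dependencies:
                return dependencies
    # Fall back to the old one-at-a-time loop for anything not handled above.
    dependencies = []
    for requirement in requirements:
        for resolver in resolvers:
            dependency = resolver.resolve(requirement)
            if dependency is not None:
                dependencies.append(dependency)
                break
    return dependencies
```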
And then we can add a …
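Presumably something like a `resolve_all` hook on the Conda resolver itself. A sketch of what that could look like - the helper below shells out to `conda create`, and every name in it is illustrative rather than the real API:

```python
import subprocess

def conda_resolve_all(requirements, conda_exe="conda", env_prefix="/tmp/envs"):
    """Sketch of a joint Conda resolution step (hypothetical, not Galaxy's API).

    Builds ONE environment containing every requirement so conda's solver can
    pick mutually compatible builds; returns None on failure so the caller
    can fall back to per-requirement resolution.
    """
    targets = ["%s=%s" % (name, version) for name, version in requirements]
    env_path = "%s/%s" % (env_prefix, "__".join(t.replace("=", "@") for t in targets))
    result = subprocess.run(
        [conda_exe, "create", "--yes", "--quiet", "--prefix", env_path] + targets,
        capture_output=True,
    )
    return env_path if result.returncode == 0 else None
```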
Will this work, and will it work with cached dependencies? I guess this is the first step - after we have the abstraction in place we can augment it with the requirements pinning. I'm less excited about that, but it wouldn't hurt anything I guess.
I think it could work, yes. And we'd also have to think about the installation. I think that doing …
I was imagining the Conda resolver would just fall back if it couldn't do everything all at once - that is why the other for loop isn't in an else. I guess what I'm saying is that I think it is fine to just solve this problem for the subset of cases where all the requirements are in Conda and not matched anywhere else (for instance, the Tool Shed). This subset will be "most" tools, and for this subset I feel like we are respecting the dependency resolver conf while still allowing resolution across multiple dependencies at once. I think for this subset we also don't have to worry about combinations of subpackages and such. Combining Tool Shed and Conda resolution is a recipe for disaster. Maybe to make this more explicit we could adjust the above pseudocode as:
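Something like this (same illustrative names as in the sketches above):

```python
def resolve_requirements(requirements, resolvers, conda_resolver):
    """Adjusted sketch: only hand the whole set to Conda when every
    requirement is something Conda itself can resolve (i.e. nothing that
    another resolver, such as the Tool Shed, would claim instead)."""
    if all(conda_resolver.can_resolve(r) for r in requirements):
        dependencies = conda_resolver.resolve_all(requirements)
        if dependencies:
            return dependencies
    # Deliberately NOT in an `else`: if joint resolution is skipped or
    # fails, fall through to the classic one-at-a-time loop.
    dependencies = []
    for requirement in requirements:
        for resolver in resolvers:
            dependency = resolver.resolve(requirement)
            if dependency is not None:
                dependencies.append(dependency)
                break
    return dependencies
```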
Fair enough - I think that would be good progress indeed. We still have a lot of tools around with dependencies specifying both the Conda and the Tool Shed name, but if we get into this special sort of conflict we can just adjust or remove the Tool Shed dependency.
Going from pseudo to real code, we are speaking about …
I think that would be too late in `DependencyManager.dependency_shell_commands()`; we'll already need this at install time. I think this should be somewhere in the `resolve()` function (or we do a better separation between the resolve and install functionality).
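For instance, a separation along these lines (purely illustrative, not the actual Galaxy interface):

```python
from abc import ABC, abstractmethod

class DependencyResolver(ABC):
    """Illustrative split of resolve vs. install (not Galaxy's real API)."""

    @abstractmethod
    def resolve(self, requirements):
        """Decide how the requirements would be satisfied, without touching
        the file system; return a resolution description, or None."""

    @abstractmethod
    def install(self, resolution):
        """Materialize a previous resolution (create envs, link files, ...)."""
```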
This is a somewhat substantial reversal of the typical way the Conda resolver works. Everything related to caching, copying, linking, and building environments on the fly should now only apply if both of the following conditions are met: (1) there is more than one requirement tag in a tool, and (2) not all of them are resolvable exactly by the Conda resolver. For recipes that don't meet both of these criteria - the normal case going forward, I would suspect - Galaxy will just look for a hashed environment built for all the requirements at once whenever the requirements are installed. In the new 90% case, such environments should be much less buggy for two primary reasons:

- galaxyproject#3299 is solved - in other words, Conda is deferred to, and if packages have potential conflicts, Conda can choose the right combination of build specifiers to resolve things correctly.
- Environments are no longer built on a per-job basis - this means file system problems related to linking and copying aren't really an issue, and complexity related to caching can be safely ignored.

My guess is we should rewrite all the Conda docs to make the other use case seem like a corner case - because hopefully it is one now. This commit includes a test tool that wouldn't work without the rewrite, I believe.
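Stated as code, the gate described in the commit message might look like this (sketch; `resolves_exactly` is a hypothetical helper):

```python
def needs_legacy_resolution(requirements, conda_resolver):
    """Sketch: the legacy caching/copying/linking machinery only applies
    when a tool has multiple requirements AND at least one of them is not
    exactly resolvable by Conda; otherwise one hashed, pre-built
    environment covering all requirements at once is used."""
    multiple = len(requirements) > 1
    all_conda = all(conda_resolver.resolves_exactly(r) for r in requirements)
    return multiple and not all_conda
```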
This has been addressed with #3391.
Hi,

```
planemo test --conda_auto_init --conda_auto_install --conda_dependency_resolution --conda_prefix /tmp/mc --galaxy_branch dev ipo4xcmsSet.xml
```

Here are my requirements, and here is what I get: planemo.out
I think that the primary issue is this. Help 🙏
@lecorguille You probably need a more recent version of …
Hi! Me again. Today is update day: galaxy==release_17.01 and planemo==1.37.0. But it's also a failure day 😢
@bgruening Can you confirm that my only choice is to build a Conda metapackage?
@lecorguille Was /tmp/mc2 set up with Miniconda 2? You might want to try with Miniconda 3 - we've discovered other seemingly unrelated problems solved by switching to Miniconda 3, I guess. This probably won't work - but I had really hoped we had solved this, and we had tried some older packages.
@jmchilton I let Planemo/Galaxy handle the Conda installation.
It is rebuilt already. Can you try whether you get build 1 and whether one of these two errors goes away?
Ok
@lecorguille just said that it is working now. We were able to fix it on the Conda side. Everything is fine here :)
The Conda resolver needs to know about all of a job's dependencies in order to make decisions about which packages are compatible. We're currently passing the dependencies in one by one, but packages that have alternative builds (such as py27 or py35) may need to be selected based on other packages. Not doing so will (infrequently) result in errors like this:
This is for a Python 2.7 dependency when using Miniconda 3.
Using all dependencies at once (`conda create -n test_lumpy lumpy-sv numpy`) works fine.