Reverse order of pin and rebuild #712

Closed
CJ-Wright opened this issue Feb 15, 2019 · 27 comments

Comments

@CJ-Wright (Member)

Currently, when we move a pin we then rebuild the packages in the graph which depend directly on the newly pinned package. However, this can produce breakage. For example, if we were to pin a new, ABI-incompatible version of boost under our current workflow, we'd move the boost pin and then rebuild the packages in the graph. But any new package (or any package that got a rerender for some other reason) would hit an incompatibility: it would get some packages built against the new version of boost and some built against the old, since the migration is only partially done.

A potential new approach would be to rebuild the affected chunk of the graph first, producing a bunch of new builds against the new pin, and only then move the pin in the official pinning feedstock, so that new packages (and packages which depend on the rebuilt packages) could take advantage of the new pin.

@CJ-Wright (Member Author)

Is conda smart enough to know not to use the new pin since some of the other dependencies haven't been built with it yet?

@mariusvniekerk (Member)

could we have two pin files?

@beckermr (Member)

@CJ-Wright I don't think conda will generically know which pinned version was used and produce consistent environments.

One case I can think of that will work is if bumping a pin produces a bumped runtime dependency via a run_export. With the new runtime dependency at a higher version, conda will produce consistent environments.

As you stated above, the only way this will work is if we bump the global pinning after the rebuild is done, or at least close enough to done.

I bet there is an edge case I am missing here. :(

@mariusvniekerk We would need at least two pin files. Each time we bump a pin, we generate a migration from the current global pinning to the next one. If we have more than one such migration happening (because two different pins are bumped), then we would have to maintain multiple, possibly overlapping, migrations simultaneously.

I think a local solution might be best here.

  1. Instead of having a list of global pinning files somewhere and technology that tells smithy which one to use for each repo, we instead have a single repo where a pinning bump is proposed via PR and then accepted via a merge to master. This could be in the form of a yaml file with the new pinning listed.
  2. Upon finding a new pinning file in master, the migration bot would start to generate PRs against feedstocks in topological order, etc. To make sure the right pins are used, it would create or append to the conda_build_config.yaml in the recipe folder of the feedstock with the new pin (see the sketch after this list).
  3. Once each of the PRs from step 2 is merged, the bot would generate a final PR to the global pinnings repo to bump the global pinning and the version number.
  4. Finally, we need some mechanism to remove the old parts of the conda_build_config.yaml. This could be the bot issuing a bunch of PRs to delete the relevant pin from the file, or we could somehow write some metadata about what to delete that conda-smithy can access and act on during a rerender.
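
A minimal sketch of the pin injection from step 2, assuming a hypothetical helper name and repo layout (this is not an existing bot API). Appending at the text level keeps any selector comments in the migration yaml intact, which a naive PyYAML round-trip would drop:

    import os

    def add_migration_pins(feedstock_dir, migration_yaml_text):
        # hypothetical helper: append the migration's pins to the
        # feedstock's recipe-local pinning file, as step 2 describes
        cbc_path = os.path.join(feedstock_dir, "recipe", "conda_build_config.yaml")
        with open(cbc_path, "a") as f:
            f.write("\n# pins added by the migration bot\n")
            f.write(migration_yaml_text)

    # e.g. add_migration_pins("boost-feedstock", "boost:\n- '1.70'\n")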

@CJ-Wright (Member Author)

If we have more than one such migration happening (because two different pins are bumped), then we would have to maintain multiple, possibly overlapping, migrations simultaneously.

Yes, this is most likely something we'll need to manage.

I like having conda-smithy handle the temp pin, since the correct thing happens without any additional work for the maintainer or CI. It could be possible to tell smithy to inspect the file, compare it to the global pinning file and then remove the pins which are in the global pinning file, leaving the rest.
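
A minimal sketch of that comparison, assuming plain PyYAML and hypothetical file paths. Note yaml.safe_load discards the comment-based selectors, so this only covers pins without selectors, which is part of why the general case is hard:

    import yaml

    def prune_local_pins(local_path, global_path):
        # drop any local pin whose value now matches the global pinning,
        # leaving only the pins that still differ
        with open(local_path) as f:
            local = yaml.safe_load(f) or {}
        with open(global_path) as f:
            global_pins = yaml.safe_load(f) or {}
        remaining = {k: v for k, v in local.items() if global_pins.get(k) != v}
        with open(local_path, "w") as f:
            yaml.safe_dump(remaining, f, default_flow_style=False)
        return remaining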

@CJ-Wright (Member Author)

We could even include information for the bot in the initial PR (e.g., the version of the migration, which top-level packages there are, etc.).

This won't work as well when we need to go play with the internal bits (e.g., change the platform information), but that's OK.

@beckermr (Member) commented Feb 17, 2019

So conda-smithy can only remove the pin once the migration is done and the global pinning is adjusted.

Managing this bit of global state seems annoying.

I wonder if we can simply leave the local pinnings file instead of cleaning it up. The next time we need to adjust them, we will come through with the bot anyway. In the meantime they can sit there and not mess with anything.

@CJ-Wright (Member Author)

To some extent we already handle that global state, since we have the pinning feedstock and it is a dep of conda-smithy. I don't know if there is a way to compose pinning files. I wonder if there are things in the pinning file which a feedstock would care about but would not fall under a migration's purview.

@beckermr (Member)

I did some coding against this issue last night for my own edification.

A few things:

  1. conda-build has APIs to combine/compose pinning files (which are allowed to have selectors). It goes like this:

    import glob

    from conda_build.config import Config
    from conda_build.variants import (
        parse_config_file, OrderedDict, combine_specs)

    config = Config()
    specs = OrderedDict()

    specs['global_pins'] = parse_config_file('global_pins.yaml', config)

    # later files clobber earlier ones, so migrations override the globals
    files = glob.glob('migrations/*.yaml')
    for f in files:
        specs[f] = parse_config_file(f, config)

    combined_spec = combine_specs(specs, log_output=config.verbose)

  2. As stated in the docs, the ordering of the files matters because files later in the sequence will clobber the earlier ones. Files passed at the command line via conda build -m pinning.yaml have the highest priority. Right now conda-smithy takes the global pinning and renders it to platform- and/or python-specific pinning files. So if we want to override that, we will have to put the local pinnings last, either at the command line or by combining them with the global ones in code like the above.

  3. The conda_build_config.yaml file in the recipe dir has lower priority than any file passed at the command line.

@beckermr (Member) commented Feb 18, 2019

Given the above, I have a revised suggestion for how to manage the migrations.

Changes to existing tools and/or new repos:

  • add code to conda-smithy to look at some location in each feedstock (e.g., .local_pins.yaml) for extra pins. If such a file is found, then it gets added last when the feedstock is rerendered, overwriting any global pins. These updated pins will then appear in the normal CI configuration files generated by conda-smithy. (See the sketch at the end of this comment.)
  • create a new repo for proposed pinning changes, maybe called conda-forge-pinning-migrations or something

Migration procedure:

  1. Set up a repo where pinning migrations are proposed via PRs. Once accepted, they are pushed to master.
  2. When a pinning migration hits master, a bot triggers a rerendering. It goes through the CF graph in topological order. For any node that needs to be rerendered, the yaml corresponding to the migration is added to the .local_pins.yaml file. (Any old migrations no longer active are removed at this time as well.) Then the feedstock is rerendered, which adds the new local pins to the CI scripts. Finally, the bot makes a PR etc.
  3. The bot above continues doing step 2 until greater than X% of the nodes on the graph that could be rerendered have been. We could force this to be 100%, but we may never finish the migrations that way.
  4. Once the bot is done, it makes a PR on the global pinnings feedstock with the updates from the migrations repo and deletes those pinnings from the repo in step 1.

I don't know how to accomplish step 4 yet. The combination of selectors and yaml makes coding an update complicated. We could store the pinning migrations as pure diffs, but this gets fragile when there is more than one.
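
For the conda-smithy change in the first bullet above, a minimal sketch reusing the conda-build APIs shown earlier (the function name and file locations are assumptions, not existing conda-smithy code):

    import os
    from collections import OrderedDict

    from conda_build.config import Config
    from conda_build.variants import parse_config_file, combine_specs

    def load_pins_for_rerender(feedstock_dir, global_pinning_path):
        # parse the global pinning first and the local pins last, so the
        # local entries clobber the global ones during the rerender
        config = Config()
        specs = OrderedDict()
        specs["global"] = parse_config_file(global_pinning_path, config)
        local_path = os.path.join(feedstock_dir, ".local_pins.yaml")
        if os.path.exists(local_path):
            specs["local"] = parse_config_file(local_path, config)
        return combine_specs(specs, log_output=config.verbose)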

@beckermr (Member)

For step 2 above, we could simply have the existing bot pull the migrations repo each time it runs. No need to trigger it or make a new bot.

@beckermr (Member) commented Feb 18, 2019

One option for step 4 would be to refactor the global pinnings file into a set of files, where each file has a name corresponding to the top-level key and contains the top-level key and its contents.

For example, this file

foo: 2
bar:
- 3  # [osx]
- 4 # [linux]

would become two files

  1. foo.yaml:

     foo: 2

  2. bar.yaml:

     bar:
     - 3  # [osx]
     - 4  # [linux]

Then the bot could simply do a wholesale replacement of the file.

This works for simple pinnings, but won't work for more complicated things. Edit: conda-build appears to handle pin_run_as_build sections correctly. IDK what it does with zip_keys.
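
A minimal sketch of the split (hypothetical helper). Splitting at the text level keeps the comment-based selectors like # [osx] intact; it assumes top-level keys start at column zero, so it shares the simple-pinnings caveat above:

    import re

    def split_pinning_file(path):
        # group the file's lines by top-level key, then write each group
        # to a file named after that key
        with open(path) as f:
            lines = f.readlines()
        chunks, key, chunk = {}, None, []
        for line in lines:
            m = re.match(r"^(\w[\w.-]*):", line)
            if m:  # a new top-level key starts here
                if key is not None:
                    chunks[key] = chunk
                key, chunk = m.group(1), []
            if key is not None:
                chunk.append(line)
        if key is not None:
            chunks[key] = chunk
        for key, chunk in chunks.items():
            with open(key + ".yaml", "w") as f:
                f.writelines(chunk)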

@beckermr (Member)

Ack. We can use git to do step 4. We accept migrations as PRs onto the global pinnings file. This updated file is the one that is used by the bot. Then we simply reapply the diffs from the merged PR onto the feedstock pinning repo.

@beckermr (Member)

Well, reordering commits might cause issues. The separate files might be more robust. Hrmmmmmm.

@CJ-Wright (Member Author) commented Feb 18, 2019

@beckermr ❤️ this is awesome!

@mariusvniekerk (Member) commented Feb 18, 2019

So my general view is that we have the global pinnings file and then smaller local ones, and we compare the migrator's local variant with the global one to see if things are greater/already compatible on a per-value basis. If not, we delete that pinning migrator file, as it is now no longer relevant. This can be done as a variant preprocessing phase in conda-build-config.

The other important part is additive (non-replacing) migrations. For openssl we may want to continue building against 1.0 for a while. Thus we need a way to combine pins additively instead of just replacing them.

    # global.yml
    openssl:
    - 1.0

    # migrator
    # combine_mode: add
    openssl:
    - 1.1

    # result
    openssl:
    - 1.0
    - 1.1

Given that this additive behavior doesn't exist presently, it might be nice to have a library to merge variant files (which would also be needed for finalizing migrations, as @beckermr pointed out).
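
A minimal sketch of such an additive merge on already-parsed variant dicts (the combine_mode flag is hypothetical; nothing in conda-build reads it today):

    def merge_variants(global_pins, migrator_pins, combine_mode="replace"):
        # replace by default, matching conda-build's current clobbering
        # behavior; in "add" mode, extend the global list instead
        merged = {k: list(v) for k, v in global_pins.items()}
        for key, values in migrator_pins.items():
            if combine_mode == "add" and key in merged:
                merged[key] += [v for v in values if v not in merged[key]]
            else:
                merged[key] = list(values)
        return merged

    # merge_variants({"openssl": ["1.0"]}, {"openssl": ["1.1"]}, "add")
    # -> {"openssl": ["1.0", "1.1"]}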

@beckermr (Member)

@mariusvniekerk Writing code by hand to support the comparisons above scared me off because of the selectors that conda-build allows and how they can depend on the local environment. For straight dictionaries, I don't see a problem with your proposal. I guess there is code to do this already?

FWIW, we don't actually need to delete old migrations ever. It is nice, but not required since conda-build will just replace with the same value. Further, if we are deleting old migrations under the assumption that higher package versions are more recent, this removes the ability to downgrade pins if needed.

@beckermr (Member)

If we replace whole sections of the files by splitting the global pinnings into a bunch of small files, then we can add pins easily like this

old:

    openssl:
    - 1.0

proposed:

    openssl:
    - 1.0
    - 1.1

Then the final merge into the global pinnings just copies the contents of the proposed openssl pinnings to the openssl pinnings file in the final repo.

I am not trying to argue too strongly for this since being able to combine variant files with selectors is a nice thing to have, but replacing whole files is a simpler solution to maintain.

@CJ-Wright (Member Author)

Does that mean we're going to have an explosion in the build matrix?

@beckermr (Member)

Not following, @CJ-Wright. If someone adds more than one version, then yes, we would double the number of packages in the subgraph that depends on the extra version. Also, conda-smithy removes pins that don't apply to a given recipe, so they don't propagate everywhere.

@beckermr (Member) commented Feb 18, 2019

Thinking over this now, I think we use git to manage this. Here is how it would work. Instead of accepting a snippet of yaml to be used as the migration, we actually force people to make a change to the global pinnings file. This new global pinnings file is then what goes into the .local_pins.yaml file.

The proposed migrations repo would have a few rules associated with it to make our lives better as well.

  1. We only accept PRs with the commits squashed. This would associate each migration with a single commit and diff.
  2. For a given migration, we only use the changes up to and including the commit that has the relevant pinning updates. This forces a single global pinning file to be used for a given subgraph that is being migrated. This seems required for consistency.
  3. When the migrations are done, we only merge the commits from the proposed migrations to the global pinnings feedstock in the same order they were merged from the proposed migrations repo.

These rules would keep things orderly and hopefully functional. It also has the advantage that we could, in principle, bump more than one pin or make some other big change all at once, potentially saving CI time.

@beckermr (Member)

This scheme also allows us to figure out whether a node needs to be migrated in a robust way. Instead of having to code conditions by hand, we can ask conda-smithy to rerender the feedstock with the new local pinnings. If the CI pinning files are different, we make a PR; otherwise we move on. We would want to cache information about which nodes need migration to avoid having to do this more than once.
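
A minimal sketch of that check (the rerender callable is a stand-in, e.g. shelling out to conda-smithy rerender; .ci_support is where conda-smithy writes the rendered CI pinning files):

    import glob
    import os

    def needs_migration(feedstock_dir, rerender):
        # snapshot the rendered CI pinning files, rerender with the new
        # local pins, then report whether anything actually changed
        pattern = os.path.join(feedstock_dir, ".ci_support", "*.yaml")
        before = {p: open(p).read() for p in glob.glob(pattern)}
        rerender(feedstock_dir)
        after = glob.glob(pattern)
        if set(before) != set(after):
            return True
        return any(open(p).read() != before[p] for p in after)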

@beckermr (Member)

This scheme also avoids us having to diff or combine pinning files by hand. We simply apply the local file after the global one and let conda-build do all of the work as it already does.

@beckermr (Member)

We may also want to block pinning migrations on each other. If a node needs more than one pinning migration, the earlier commit has to be finished first.

@CJ-Wright (Member Author)

I'm not certain how this blocking would work.

@beckermr (Member)

I think the bot looks for a previous migration PR it knows it made on a node. If it finds one, then the two migrations are blocked and the other one has to finish first.

@beckermr (Member)

One way to do this would be to store the commit hash or migration version in a comment in the local pinnings file. If the bot finds the local pinnings' version in the list of current migrations, it would know to block.
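
A minimal sketch of that marker (the comment format is an assumption):

    MARKER = "# migration_id:"

    def read_migration_id(local_pins_path):
        # return the migration commit hash/version stamped into the file,
        # or None if the file carries no marker
        with open(local_pins_path) as f:
            for line in f:
                if line.startswith(MARKER):
                    return line[len(MARKER):].strip()
        return None

    def is_blocked(local_pins_path, active_migration_ids):
        # a feedstock is blocked if its local pins belong to a migration
        # that is still in the current list
        current = read_migration_id(local_pins_path)
        return current is not None and current in active_migration_ids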

@CJ-Wright (Member Author) commented Mar 6, 2019

The discussion of this has moved to a formal CFEP, conda-forge/cfep#13, so I'm going to close this; please continue the discussion there.
