Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow arbitrary keys in meta.yaml #874

Closed
wants to merge 8 commits into from
Closed

Conversation

groutr
Copy link
Contributor

@groutr groutr commented Apr 8, 2016

Changes conda-build's approach to arbitrary keys. Before, conda-build was very strict about what was in the meta.yaml. If there were any perceived errors with meta.yaml, conda-build would abort.

Now, conda-build will only enforce exactly two keys. These two keys are absolutely required (as stated in the documentation of conda-build): package/name and package/version. These two keys make a valid conda recipe. Every other key is essentially, optional. Conda-build will only act on keys it recognizes. The check recipe functionality has been modified to look for keys that are close to recognized keys in the case of mispellings.

This allows users to add arbitrary keys and data to their meta.yaml. There are still officially recognized keys, but conda-build will ignore any keys it doesn't recognize.

The initial changes have been started in this PR. This is also a PR for community feedback.

@groutr groutr added pending::discussion contains some ongoing discussion that needs to be resolved prior to proceeding enhancement labels Apr 8, 2016
@teoliphant teoliphant added the in_progress [deprecated] use milestones/project boards label Apr 8, 2016
@msarahan
Copy link
Contributor

msarahan commented Apr 8, 2016

We have discussed this internally, and I think it is a good idea. I need to play with it and make sure the suggestions don't go horribly wrong and confuse people, but the suggestions are generally a good idea.

This opens up meta.yaml to be much more extensible. This was possible before by abusing the extras section.

@stuarteberg @jakirkham any opinions here?

@groutr
Copy link
Contributor Author

groutr commented Apr 8, 2016

Also, one nice way we can version the meta.yaml format is as follows:

Example:

format_version = '2.0'
FIELDS = {'package/name', 'package/version', ...}
V2_ADDED = {'build/amazing_new_key', 'source/exotic_vcs'}
V2_DEPRECATED = {'build/fluff', 'about/abandoned'}
RECOGNIZED_FIELDS = (FIELDS | V2_ADDED) - V2_DEPRECATED

@jakirkham
Copy link
Member

Haven't looked too close yet. Just a few thoughts though while I think about it more.

Should we not require the about data? The only reason I think about this is partially inspired by our discussion today. If we let users not add it, maybe we find ourselves needing to re-add lots of that information later.

Might be worth taking a look at how Jupyter notebooks are versioned.

Also, @groutr, just trying to follow what you are saying here. Is this meant to go in every recipe or is this going to live in some central location like conda or conda-build?

@stuarteberg
Copy link
Contributor

I guess this is reasonable, but I'm not sure I understand the motivation for it. I like the idea of giving the user a suggestion when they misspell a key, but I don't understand when it would be useful to have user-defined keys outside of the extra section.

This was possible before by abusing the extras section.

I wouldn't consider this to be "abuse" per se... after all, that's exactly what the extra section was for, right? (I suppose a name like custom or package_specific might be better than the somewhat vague name extra)

In any case, if you have a use-case in mind for these changes, then I've got no objections. I toyed with your PR for a few minutes. I made a few minor changes so I could actually see this idea in action, which I've now submitted as a PR-onto-this-PR in case it's useful: groutr#1.

We should keep in mind that, once we make this change, adding new "standard" keys to meta.yaml will have the potential to break recipes in the wild. For example, if a user defines a field like build/channel (with who-knows-what meaning), and we later decide to define our own field named build/channel then suddenly the user's recipes won't mean what they meant previously.

Furthermore, if we define a new field that is too similar to channel, e.g. channels, then the user's recipes will stop working. They'll get an error like:

In meta.yaml: build/channel is not a recognized key, but it's similar to a standard key.
Perhaps you meant build/channels?

I don't know how often we'll end up stomping on people's custom-defined keys with new additions to meta.yaml, but this problem would be avoided entirely if we just force everybody to use the extra section.

Again, if you have a particular use-case that is motivating this change, then I guess the above concern is no big deal. But if there's no particular use-case in mind, I'd err on the side of caution (and guaranteed backwards compatibility).

@jakirkham
Copy link
Member

In any case, if you have a use-case in mind for these changes, then I've got no objections.

I'm not sure of all the use cases for this, but one could be at conda-forge. We currently add a recipe-maintainers section that list all the GitHub handles of contributors. This is in the extras section. Though there has been discussion from time to time of adding additional metadata from time to (organization, urls, etc.). Having a proper recipe key at the top level instead of having a bunch of recipe-* fields would be nice.

@stuarteberg
Copy link
Contributor

We currently add a recipe-maintainers section that list all the GitHub handles of contributors.

Soon, conda-forge might be official enough that perhaps fields they find useful should just become "standard" anyway. But in that specific case, I wonder if maybe the about section is a more appropriate place.

Perhaps another case where user-defined fields would be nice is something like this example of mine for constructing a package that needs to download multiple source repositories:

https://github.com/stuarteberg/conda-multisrc-example

(More discussion here.)

But still, the nice thing about that example is that the reader can plainly see that the stuff in extra is supported through custom behavior in build.sh, not through conda-build itself. If those were top-level fields, it wouldn't be so obvious that conda-build was ignoring them.

@groutr
Copy link
Contributor Author

groutr commented Apr 8, 2016

@jakirkham, an about section could be required.

@stuarteberg The extra section was a compromise because people were really wanting to add their own data to a recipe, but couldn't. But stuffing all arbitrary data under one section doesn't really help in describing the data that could live there.

There is a change in conda-build in the design phase that will allow downloading from multiple sources. source is envisioned to be a list of dictionaries. This will be another discussion, when I get a PR up.

source:
  - url: <url1>
    md5: <md51>
  - url: <url2>
    md5: <md52>

Conda-build could be more verbose about what values are coming from where. However, the intent of this change is to allow arbitrary data in a meta.yaml (ie, things that wouldn't affect the operation of conda-build anyway). This arbitrary data could be used by tools built around conda-build.

@jakirkham
Copy link
Member

Soon, conda-forge might be official enough that perhaps fields they find useful should just become "standard" anyway.

Not sure on this one. Though if others want to use this tag for different meanings that is certainly their choice.

In the case of the recipe-maintainers, it has a very specific meaning in conda-forge. Namely, this provides someone commit privileges on recipes they maintain and control over the CIs used to build it.

But in that specific case, I wonder if maybe the about section is a more appropriate place.

If you want to wade into that discussion, I can certainly point you the right issue.

@stuarteberg
Copy link
Contributor

@groutr:

  1. So the above-mentioned backwards-compatibility issue is not a concern?
  2. If you can describe the use-case you have in mind for this change, that would be nice.

However, the intent of this change is to allow arbitrary data in a meta.yaml.

I don't quite grok that statement. What else goes into meta.yaml besides "data"?

@stuarteberg
Copy link
Contributor

@jakirkham:

this provides someone commit privileges on recipes they maintain

Ah, interesting. Thanks for the clarification.

@stuarteberg
Copy link
Contributor

@groutr:

There is a change in conda-build in the design phase that will allow downloading from multiple sources.

Nice! Once there is a PR for this, I'd appreciate a ping. So would @timsnyder, presumably.

@jakirkham
Copy link
Member

There is a change in conda-build in the design phase that will allow downloading from multiple sources. source is envisioned to be a list of dictionaries

Having multiple sources would be very awesome. I know a few cases that I encountered recently where this would help. Couple of questions.

  1. How would these be structured relative to the working directory?
  2. Where would the build start relative to these sources? In other words, if I ran pwd in build.sh what would it say?
  3. Would it be possible to specify a structure?
  4. Could one specify a build order?

To give these questions context, consider this recipe for protobuf where I need to checkout code for testing in a particular structure.

@stuarteberg
Copy link
Contributor

Having multiple sources would be very awesome. ... Couple of questions.

I have similar questions. Maybe we can save that discussion for the PR that will be used to land that change. I'm sure @groutr will ping interested parties :-)

@jakirkham
Copy link
Member

Sorry, I let my excitement get the best of me. 😄

Please ping me when that comes out too, @groutr.

@groutr
Copy link
Contributor Author

groutr commented Apr 8, 2016

@stuarteberg backwards compatibility is a big concern. I don't think I grasp your thoughts how this would break compatibility. It's not really about adding new standard keys, but about letting other users have things other than standard keys. My versioning comment, poorly worded as it was, is an observation of how making the fields as they are in my PR, they could be versioned easily. By set unions and differences, we can calculate the compatibility target really easily. A versioned format gives us a clearer picture of how far back a recipe is.

@jakirkham
Copy link
Member

Would it make sense to have an nbconvert like mechanism for conda recipes? Just thinking out loud here.

@groutr
Copy link
Contributor Author

groutr commented Apr 8, 2016

@jakirkham Those are very insightful questions. Thanks for the tip about protobuf. I'll definitely factor that into my thought process.
And when I submit the PR, I'll ping both you and @stuarteberg.
Maybe we can meet on the conda-build gitter channel and discuss ideas sometime next week.

@stuarteberg
Copy link
Contributor

I don't think I grasp your thoughts how this would break compatibility.

If we allow users to define their own keys, then they might be surprised if, at some point in the future, we decide to add a new standard key to meta.yaml that accidentally clashes with their own user-defined name.

Depending on exactly what the new standard key is used for within conda-build, the error they see might be confusing. Perhaps the odds of such name clashes are are low, but they are slightly higher due to the fact that even imperfect matches can be detected. (In my example above, I supposed that a user defined a new channel field, and that we later decide to add channel to the official schema, with a specific behavior associated with it.)

If we don't consider imperfect matches to be an error, then I guess this isn't an issue. I think imperfect matches should be considered errors, as seen in my PR. But I suppose a really loud warning would suffice, too.

I didn't mean this as an outright objection, just a potential concern. If you think it's no big deal, that's good enough for me. :-) Just trying to give thoughtful feedback here...

@groutr
Copy link
Contributor Author

groutr commented Apr 8, 2016

@jakirkham a tool to automate the updating of recipes from one version of a format to another I think is a great idea 👍

@groutr
Copy link
Contributor Author

groutr commented Apr 8, 2016

@stuarteberg Ok. I think I understand your reasoning. The best answer I've got is make it loud and clear, though documentation and release notes, what changes are being made to keys.
A key won't be used by conda-build unless it matches exactly a standard key, which is the current behavior. I prefer the really loud warning. Maybe we can add something like a -Wall flag to conda-build when it check to consider warnings as errors. That way a user has a way to know if they are picking something too close to a standard key.
I appreciate the feedback. What to do in this instance is definitely something that needs to be figured out. I think this discussion has be pretty productive so far.

metadata: Better error message in check_fields()
@msarahan
Copy link
Contributor

msarahan commented May 5, 2016

@groutr where does this sit?

@jakirkham
Copy link
Member

As another use case, @stuarteberg, please take a look at these MSYS2 packages from @mingwandroid. Here is an example. If/when these enter conda-forge, we would have a bit of a mess here and it would be good to avoid that if possible.

@msarahan msarahan added this to the 2.0 milestone May 6, 2016
@mingwandroid
Copy link
Contributor

I agree with @stuarteberg that it is good that conda-build is oblivious to extras and that having it implies special handling in build.sh / build.bat. Maybe a better name like recipe-specific could be used though?

@jakirkham it won't be difficult for me to rearrange things according to the decisions reached. AFAIR I actually decided to use extras after reading about it in a discussion you had with @stuarteberg.

The MSYS2 yaml keys and support script could be formalized into a generic archive repackaging facility in conda-build, if people wanted such a thing. It isn't MSYS2 specific at all, in which case the keys would become a part of conda-build's yaml format.

@groutr
Copy link
Contributor Author

groutr commented May 9, 2016

@msarahan finishing this is still on my shortlist. I'm finishing my rebuild of GDAL. I think that along with merging this, we would be wise to version the meta.yaml (versioning the official set of keys). If there needs to be CEP, I can write one up.

@groutr
Copy link
Contributor Author

groutr commented May 11, 2016

@stuarteberg Regarding your comment (#874 (comment))
Two possible solutions:

  • Treat keys like namespaces. A root level key reserves all possible keys below it. Since package is an official key, then package/* is also reserved. That reduces this PR to the ability to define your own root level keys.
  • Have a set of proposed keys that will warn users of keys under consideration for being standardized. The keys won't be used by conda-build, but checked the same as standard keys. We could even tie proposed keys to a relevant github issue thread and encourage the user's participation in the discussion.

Are these solutions satisfactory? I favor the second solution. The first solution may actually make collisions more likely.

@stuarteberg
Copy link
Contributor

As usual, I'm a big ol' pessimist... 😩

I'm still unclear on why the extra section is not already sufficient for most users, but if we're moving forward with this idea, then I guess I prefer option 1. At least it provides some constraints to help us detect silly mistakes in meta.yaml.

That reduces this PR to the ability to define your own root level keys.

That could be a good thing. Suppose a novice writes this recipe:

# Package section
  package:
      name: mypackage
      version: 1.0

# Build section
  build:
    number: 0
    string: py27_0

# Requirements section
    requirements:
      build:
        - python 2.7

      run:
        - python 2.7

# Test section
      test:
        requirements:
          - pytest

The author has made multiple mistakes (at least 3), but if I understand the current proposal, it will be difficult for conda-build to automatically detect them and provide a useful error message. Under certain circumstances, I think the package build might even succeed without any errors at all!

Have a set of proposed keys that will warn users of keys under consideration

I guess that sounds nice, but it implies changes to the conda-build development/release process, and corresponding awareness (and cooperation) from all conda-build developers. Will the benefits of allowing custom keys be so great that it warrants administrative overhead like this? I dunno, if I were BDFL of the conda project, I'd direct my troops to spend their energy elsewhere.

But just to reiterate: My comments above are meant as a mild objection, not a raging protest. If you guys have discussed this internally and you think the pros outweigh the cons, I'm not going to complain. (Thanks for seeking for my input in the first place.)


try:
_ = _meta['about']['home']
_ = _meta['about']['license_family']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check for license_family or just license? Personally, I would vote for the latter.

@msarahan msarahan removed this from the 2.0 milestone Aug 31, 2016
@kenodegard kenodegard added type::feature request for a new feature or capability and removed type::enhancement labels Jan 19, 2022
@github-actions
Copy link

Hi there, thank you for your contribution!

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed automatically if no further activity occurs.

If you would like this pull request to remain open please:

  1. Rebase and verify the changes still work
  2. Leave a comment with the current status

NOTE: If this pull request was closed prematurely, please leave a comment.

Thanks!

1 similar comment
@github-actions
Copy link

Hi there, thank you for your contribution!

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed automatically if no further activity occurs.

If you would like this pull request to remain open please:

  1. Rebase and verify the changes still work
  2. Leave a comment with the current status

NOTE: If this pull request was closed prematurely, please leave a comment.

Thanks!

@github-actions github-actions bot added the stale [bot] marked as stale due to inactivity label Jul 27, 2023
@github-actions github-actions bot added the stale::closed [bot] closed after being marked as stale label Aug 26, 2023
@github-actions github-actions bot closed this Aug 26, 2023
@github-actions github-actions bot added the locked [bot] locked due to inactivity label Aug 25, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 25, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
in_progress [deprecated] use milestones/project boards locked [bot] locked due to inactivity pending::discussion contains some ongoing discussion that needs to be resolved prior to proceeding stale::closed [bot] closed after being marked as stale stale [bot] marked as stale due to inactivity type::feature request for a new feature or capability
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

7 participants