A new YAML based format for "conda-build" files #54

Merged: 13 commits, Oct 30, 2023
add cep 20.1 for only the YAML format
wolfv committed Jun 2, 2023
commit 6242c1af14c99756997e729f71be973bf3fcb334
166 changes: 166 additions & 0 deletions cep-20.1.md
# A new recipe format – part 1

<table>
<tr><td> Title </td><td> A new recipe format </td></tr>
<tr><td> Status </td><td> Proposed</td></tr>
<tr><td> Author(s) </td><td> Wolf Vollprecht &lt;wolf@prefix.dev&gt;</td></tr>
<tr><td> Created </td><td> May 23, 2023</td></tr>
<tr><td> Updated </td><td> May 23, 2023</td></tr>
<tr><td> Discussion </td><td> </td></tr>
<tr><td> Implementation </td><td>https://github.com/prefix-dev/rattler-build</td></tr>
</table>

## Abstract

We propose a new recipe format that is heavily inspired by conda-build. The main change is a pure YAML format without arbitrary Jinja or comments with semantic meaning.

## Motivation

The conda-build format has grown quite complex over the years. Unfortunately, it has never been formally specified, and over time it has gained features that make it hard to parse as straightforward YAML.

Contributor

I think one really important motivating factor that isn't mentioned in the motivation section is making these files easier to parse and modify (i.e. more machine-readable). I know that there is currently a desire to be able to do this, but the old format holds us back. It's very hard to parse, and if I'm not mistaken, multiple attempts have already been made to do this.

I know it's mentioned below, but it might be worth emphasizing here briefly too.


+1000

This CEP introduces a subset of the conda-build format that allows for fast parsing and building of recipes.

### History

A discussion was started on what a new recipe spec could or should look like. The fragments of this discussion can be found here: https://github.com/mamba-org/conda-specs/blob/master/proposed_specs/recipe.md
The reasons for a new spec are:

- Make it easier to parse ("pure YAML"). conda-build uses a mix of comments and Jinja to achieve a great deal of flexibility, but this makes recipes hard to parse programmatically.
- Iron out some inconsistencies around multiple outputs (`build` vs. `build/script` and more).
- Remove any need for recursive parsing and solving.

Contributor

The recursive parsing and solving was mainly to support things like inheriting specs, like run_constrained, from explicitly specified dependencies. Is there any kind of explicit spec for whether/how packages add/impose dependencies on sections outside of the one where they are specified?

What about the pin_compatible jinja? I'm getting the sense that it will be cut in this spec. I'm not sure whether you could impose limits on it to avoid the need to be recursive. It is very useful, so I hope it can stay somehow.

Contributor Author

In boa (and rattler-build) we have "deferred" evaluation for the pins, so that still works. We just require that the name of the pin is available, in order to build the proper topologically sorted build-graph :)

A dependency list in rattler-build contains an "enum" of type:

https://github.com/prefix-dev/rattler-build/blob/665c981a392a6ebeba5842eee1d968b8753b2dcd/src/render/dependency_list.rs#L22-L27

Indeed, pin_subpackage is not yet implemented but quite trivial to achieve.

Member

Sometimes the package name isn't known, for example with cross-compilation or for the compilers themselves.

Contributor Author

@jakirkham in these cases you can infer those variables from the build-time config though (or variant config) so no recursion needed :)


## Major differences with conda-build

- No full Jinja2 support: no conditionals or `{% set ... %}` statements, only string interpolation. Variables can be set in the top-level `context` section, which is valid YAML.
- A string that starts with a Jinja interpolation must be quoted, e.g. `- "{{ version }}"`, in order to be valid YAML.
- Selectors use a YAML dictionary style (vs. comments in conda-build), e.g. `- sel(osx): somepkg` instead of `- somepkg # [osx]`.

## Selectors

Selectors in the new spec take the following format:

`sel(unix): selected_value`

@skupr-anaconda, Jun 7, 2023

How about a simplified variant:
(unix): selected_value
(win and arm64)
((osx or linux) and aarch64)
(something == "test")

Contributor Author

Looks like this is also valid YAML:

[Screenshot, 2023-06-07 14:49]

However, it would effectively special case all dictionary keys that start with a round bracket. I think that would be OK, but less explicit than the sel(...) operator.

Member

@jakirkham, Jun 9, 2023

Not to bikeshed, but what about something like is(...) so is(unix)?


This is a valid YAML dictionary. Selector contents are simple boolean expressions and follow Python syntax. The following selectors are all valid:

```
win and arm64
(osx or linux) and aarch64
something == "test"
```
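As a rough illustration of how such expressions can be evaluated, here is a Python sketch. It is not the rattler-build implementation (selectors there are evaluated with a Jinja expression engine); `eval_selector` is a hypothetical helper, and treating undefined names (such as `win` on Linux) as `False` is an assumption.

```python
import ast

# Syntax allowed inside sel(...): boolean operators, comparisons, names,
# literal constants and plain function calls (for helpers such as cmp).
_ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.Name, ast.Load, ast.Constant, ast.Call,
)


def eval_selector(expr: str, variables: dict) -> bool:
    """Evaluate a selector expression such as "win and arm64".

    Names are looked up in `variables`; names that are not defined there
    evaluate to False (an assumption, convenient for platform flags).
    """
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            raise ValueError(f"unsupported selector syntax: {type(node).__name__}")
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    namespace = {name: variables.get(name, False) for name in names}
    code = compile(tree, "<selector>", "eval")
    # No builtins are exposed; only the collected names are visible.
    return bool(eval(code, {"__builtins__": {}}, namespace))


flags = {"linux": True, "aarch64": True, "something": "test"}
print(eval_selector("win and arm64", flags))               # False
print(eval_selector("(osx or linux) and aarch64", flags))  # True
print(eval_selector('something == "test"', flags))         # True
```

Rejecting attribute access, subscripts and other syntax up front keeps selectors limited to the simple boolean expressions the spec describes.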

### The cmp function for variant selection

Furthermore, there is a special `cmp` function that can be used to check the selected variant's version against a version specification. The `cmp` function looks like the following:

```
cmp(python, "3.6")
cmp(python, ">=3.6")
cmp(python, ">=3.8,<3.10")
etc
```

Member

I wonder if we have to use `cmp` here? We're evaluating the boolean expressions and it might be easier to understand with this:

Suggested change: replace `cmp(python, "3.6")` with `if: python >= "3.6"`

Where it would then look like:

    requirements:
      build:
        - name: dataclasses
          if: python >=3.6 and python <3.10
        - cryptography

Contributor Author

The problem is that operator overloading might be harder to implement than the `cmp` function. With the `cmp` function we'd know that the left-hand side is the variant specifier and the right-hand side is a version number. We could basically do `python >= Version("3.6")`, but that is even trickier.

Contributor Author

Or maybe, if we know that the left-hand side is a VariantSpecifier, we could overload the comparison with strings... but yeah, we would need to figure out whether that's easy to implement with the Jinja expression system(s).

@remkade, Jun 7, 2023

Personally I also find the `if:` style quite a bit more intuitive. It's similar to how Ansible does its conditionals, which should help new people transition more easily.

You could also do something like `unless: python > 3.10`, which makes negated conditionals a bit cleaner.

+1 on `if`, as it is consistent with CI providers as well.

Member

@jakirkham, Jun 9, 2023

One caveat with `cmp` specifically is that some languages use `cmp(x, 3)` to mean:

- `cmp(x, 3) > 0` # x > 3
- `cmp(x, 3) == 0` # x == 3
- `cmp(x, 3) < 0` # x < 3

Not to bikeshed, but maybe the critiques in the comments above (and the caveat around `cmp`) can be resolved with a different name? Some options:

- `within(...)`
- `contains(...)`
- `includes(...)`

Same syntax, but maybe this already implies the conditional case evaluating to true? We could also use a different tense (contains -> contained) if preferable.

This can be used in a selector like so:

```
requirements:
  build:
    - sel(cmp(python, ">=3.6,<3.10")): dataclasses
```

This functionality generalizes and replaces the previous special variables such as `py2k`, `py3k`, `py36`, `py37`, and works just as well for NumPy, Ruby, R, or any other variant that might be of interest in the future.
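A boolean `cmp` could be sketched in Python as below. The assumptions are ours, not the spec's: the caller passes the already-resolved variant version (e.g. `"3.8"` for `python`), versions are purely numeric and dotted, a comma means "and", and a spec without an operator is a component-wise prefix match, so that `cmp(python, "2")` (used in the next section) selects Python 2.7. Real conda version ordering also handles letters, local versions and epochs, which this sketch ignores.

```python
def _parse(version: str) -> tuple:
    # Assumption: purely numeric dotted versions such as "3.10" or "1.21.6".
    return tuple(int(part) for part in version.split("."))


def _pad(a: tuple, b: tuple) -> tuple:
    # Pad with zeros so that (3, 6) and (3, 6, 0) compare as equal.
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def _satisfies(version: tuple, constraint: str) -> bool:
    operators = {
        ">=": lambda v, b: v >= b, "<=": lambda v, b: v <= b,
        "==": lambda v, b: v == b, "!=": lambda v, b: v != b,
        ">": lambda v, b: v > b, "<": lambda v, b: v < b,
    }
    # Two-character operators are tried first so ">=" is not read as ">".
    for op, compare in operators.items():
        if constraint.startswith(op):
            v, b = _pad(version, _parse(constraint[len(op):]))
            return compare(v, b)
    # No operator: a component-wise prefix match, so "3" matches 3.6 and 3.10.
    prefix = _parse(constraint)
    return version[: len(prefix)] == prefix


def cmp(variant_version: str, spec: str) -> bool:
    """True if the selected variant version satisfies every comma-separated
    constraint in `spec` (a comma means "and")."""
    version = _parse(variant_version)
    return all(_satisfies(version, c.strip()) for c in spec.split(","))


print(cmp("3.8", ">=3.6,<3.10"))   # True
print(cmp("3.10", ">=3.6,<3.10"))  # False
print(cmp("2.7", "2"))             # True
```

Comparing integer tuples rather than strings is what makes `3.10` sort after `3.6`; a plain string comparison would get this wrong.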

### Preprocessing selectors

You can add selectors to any item, and the selector is evaluated in
a preprocessing stage. If a selector evaluates to `true`, the item is
flattened into the parent element. If a selector evaluates to `false`,
the item is removed.

```yaml
source:
  - sel(not win):
      url: http://path/to/unix/source
  - sel(win):
      url: http://path/to/windows/source
```

Because the selector is a valid Jinja expression, complicated logic
is possible:

```yaml
source:
  - sel(win):
      url: http://path/to/windows/source
  - sel(unix and cmp(python, "2")):
      url: http://path/to/python2/unix/source
  - sel(unix and cmp(python, "3")):
      url: http://path/to/python3/unix/source
```

Lists are automatically "merged" upwards, so it is possible to group multiple items under a single selector:

```yaml
test:
  commands:
    - sel(unix):
        - test -d ${PREFIX}/include/xtensor
        - test -f ${PREFIX}/include/xtensor/xarray.hpp
        - test -f ${PREFIX}/lib/cmake/xtensor/xtensorConfig.cmake
        - test -f ${PREFIX}/lib/cmake/xtensor/xtensorConfigVersion.cmake
    - sel(win):
        - if not exist %LIBRARY_PREFIX%\include\xtensor\xarray.hpp (exit 1)
        - if not exist %LIBRARY_PREFIX%\lib\cmake\xtensor\xtensorConfig.cmake (exit 1)
        - if not exist %LIBRARY_PREFIX%\lib\cmake\xtensor\xtensorConfigVersion.cmake (exit 1)

# On unix this is rendered to:
test:
  commands:
    - test -d ${PREFIX}/include/xtensor
    - test -f ${PREFIX}/include/xtensor/xarray.hpp
    - test -f ${PREFIX}/lib/cmake/xtensor/xtensorConfig.cmake
    - test -f ${PREFIX}/lib/cmake/xtensor/xtensorConfigVersion.cmake
```
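The preprocessing pass can be sketched as a recursive walk over the already-parsed YAML tree (plain dicts, lists and scalars). This illustrative Python version assumes that selectors only appear as single-key mappings inside lists, as in the examples above. `preprocess` is a hypothetical name, and the `evaluate` callable stands in for a real selector evaluator.

```python
from typing import Any, Callable, Optional, Tuple


def _as_selector(item: Any) -> Optional[Tuple[str, Any]]:
    """Return (expression, value) if `item` is a one-key `sel(...)` mapping."""
    if isinstance(item, dict) and len(item) == 1:
        ((key, value),) = item.items()
        if isinstance(key, str) and key.startswith("sel(") and key.endswith(")"):
            return key[len("sel("):-1], value
    return None


def preprocess(node: Any, evaluate: Callable[[str], bool]) -> Any:
    """Apply selectors to an already-parsed YAML tree.

    A list item {"sel(expr)": value} is removed when expr is false; when it is
    true, value takes the item's place, and a list value is spliced into the
    parent list ("merged upwards").
    """
    if isinstance(node, dict):
        return {key: preprocess(value, evaluate) for key, value in node.items()}
    if isinstance(node, list):
        result = []
        for item in node:
            selector = _as_selector(item)
            if selector is None:
                result.append(preprocess(item, evaluate))
                continue
            expr, value = selector
            if evaluate(expr):
                value = preprocess(value, evaluate)
                if isinstance(value, list):
                    result.extend(value)
                else:
                    result.append(value)
        return result
    return node


recipe = {"test": {"commands": [
    {"sel(unix)": ["test -d ${PREFIX}/include/xtensor"]},
    {"sel(win)": [r"if not exist %LIBRARY_PREFIX%\include\xtensor\xarray.hpp (exit 1)"]},
]}}
print(preprocess(recipe, lambda expr: expr == "unix"))
# {'test': {'commands': ['test -d ${PREFIX}/include/xtensor']}}
```

False items are dropped outright rather than replaced with null. As a result, the rendered recipe is plain YAML with no selector keys left in it.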

## Templating with Jinja

The spec supports simple Jinja templating in the `recipe.yaml` file.

Member

It would be worth elaborating what "simple" means in this context, what wouldn't be covered by this?

Would this also work with MiniJinja for example?

Contributor Author

Simple means:

- Nothing that could go between `{% ... %}`: no `for` loops, no `set`, no `if`/`else`, etc.
- Everything that can go between `{{ ... }}` is still supported, including the pipe operator, so that `{{ version | lower }}` and similar expressions work. They can be assigned to helper variables inside the context, e.g.:

      context:
        version: "0.1.2.POST1234"
        lower_version: "{{ version | lower }}"  # evaluates to 0.1.2.post1234

- Inside the selectors, the Jinja expression language is used, which only knows a few primitive types (str, int, float, ...).

This works fantastically with MiniJinja (which is what we use in rattler-build). However, MiniJinja would support much more than what we are currently supporting.

Contributor

Is there a way to preserve the meta.yaml filename and cause conda-build or rattler-build to detect the format from its contents?

Member

Honestly since we are doing a bit of a redesign here, think it is worth asking whether we still want Jinja to play a role here. In other words, if we were not already using Jinja today, how would we have solved these use cases? How do others approach YAML templating? Could we think of pure YAML solutions?


I agree, what does jinja accomplish that we can't do with the other new features?

Contributor Author

Well, other new features would also have to be implemented, and I am not sure we would necessarily end up with something simpler.
I think there is a future where the entire build tool (and the recipe as well) looks a bit different, and wolfi-os / melange have a nice approach: https://github.com/wolfi-dev/os/blob/main/libidn2.yaml

But I am not sure how to proceed here - we could either go down that road right now or build something that is more reasonable for people to port existing recipes to and then iterate in a next step on the recipe format (when we have the necessary infrastructure nicely implemented in rattler-build).


You can set up Jinja variables in the context YAML section:

```yaml
context:
  name: "test"
  version: "5.1.2"
  major_version: "{{ version.split('.')[0] }}"
```

Later in your `recipe.yaml` you can use these values in string interpolation
with Jinja. For example:

```yaml
source:
  url: https://github.com/mamba-org/{{ name }}/v{{ version }}.tar.gz
```
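Resolving the context and interpolating strings can be sketched as follows. The sketch makes two assumptions: context entries are resolved in order, so each may reference earlier ones (as `major_version` does above), and `render`/`resolve_context` are hypothetical helpers. A real implementation would use a Jinja engine (the discussion mentions MiniJinja in rattler-build) rather than Python's `eval`.

```python
import re
from typing import Dict

# One `{{ ... }}` interpolation; the expression is captured without padding.
_INTERPOLATION = re.compile(r"\{\{\s*(.*?)\s*\}\}")


def render(template: str, variables: Dict[str, str]) -> str:
    """Replace each `{{ expr }}` in `template` with the value of `expr`.

    Simplification: `expr` is evaluated as a Python expression without
    builtins. That coincides with Jinja for simple cases like `name` or
    `version.split('.')[0]`, but filters such as `| lower` are not handled,
    and eval() must never be applied to untrusted recipes.
    """
    def substitute(match):
        return str(eval(match.group(1), {"__builtins__": {}}, dict(variables)))

    return _INTERPOLATION.sub(substitute, template)


def resolve_context(context: Dict[str, str]) -> Dict[str, str]:
    """Resolve context entries top to bottom, so an entry may refer to the
    entries defined before it (dicts keep insertion order)."""
    resolved: Dict[str, str] = {}
    for key, value in context.items():
        resolved[key] = render(str(value), resolved)
    return resolved


context = resolve_context({
    "name": "test",
    "version": "5.1.2",
    "major_version": "{{ version.split('.')[0] }}",
})
print(context["major_version"])  # 5
print(render("https://github.com/mamba-org/{{ name }}/v{{ version }}.tar.gz", context))
# https://github.com/mamba-org/test/v5.1.2.tar.gz
```

Because the context is itself plain YAML, it can be resolved once, before the rest of the recipe is rendered.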

Jinja has built-in support for some common string manipulations.

In the new spec, complex Jinja is completely disallowed, because the goal is for the recipe to be valid YAML at all times.
Therefore, do not use `{% if ... %}` or similar Jinja constructs, which produce invalid YAML.
Furthermore, a value that starts with double curly braces must be quoted, like so:

```yaml
package:
  name: {{ name }} # WRONG: invalid yaml
  name: "{{ name }}" # correct
```
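The quoting rule exists because, in YAML, a `{` at the start of a plain value opens a flow mapping, so `{{ name }}` is read as a nested mapping rather than as a string. A linter could flag such lines with a simple heuristic like the hypothetical `find_unquoted_jinja` below. It works line by line and does not understand block scalars or multi-line flow collections, so it is a sketch rather than a validator.

```python
import re
from typing import List

# Optional indentation, optional "- " list markers, an optional "key: ",
# and then a value that begins directly with "{{" (i.e. not quoted).
_UNQUOTED_JINJA = re.compile(r"^\s*(?:-\s+)*(?:[^\s:#][^:#]*:\s+)?\{\{")


def find_unquoted_jinja(text: str) -> List[int]:
    """Return the 1-based line numbers whose value starts with an unquoted `{{`."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if _UNQUOTED_JINJA.match(line)
    ]


recipe_text = """package:
  name: {{ name }}
  version: "{{ version }}"
requirements:
  build:
    - {{ compiler('cxx') }}
    - "{{ compiler('c') }}"
source:
  url: https://example.com/{{ name }}.tar.gz
"""
print(find_unquoted_jinja(recipe_text))  # [2, 6]
```

Interpolation in the middle of a value, as in the `url` line, is not flagged: only a value that begins with `{{` needs quotes.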

Some Jinja functions remain valid, but defining them is out of the scope of this spec. As an example,
the `compiler` Jinja function will still be available; the main difference is that the expression
must be quoted.

```yaml
requirements:
  build:
    - "{{ compiler('cxx') }}"
```