Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rhel10: move os package sets to yaml files #1104

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ondrejbudai
Copy link
Member

This is my RFC PR for moving package sets outside Go into yaml.

The code is of a PoC quality. The purpose is to start the conversation. However, I believe that after minor changes (converting all RHEL versions, or adding code to limit the change to RHEL 10), it could be actually deployed to production (this would be a yolo thing, though).

All terms are not final. I used spec because it's not used anywhere else in the Image Builder stack so there should be no confusion.

What I did

Image spec format

I introduced a new yaml-based format called image spec. Spec is a simple format for defining image properties. It's yaml-based, supports inheritance, and deduplication across multiple distro versions.

A quick snippet showing rhel 10's qcow2:

includes:
  - ../common.yaml
  - ../../centos-10/generic/qcow2.yaml
spec:
  packages:
    - subscription-manager-cockpit

I hope that this snippet is self-explaining, RHEL 10 qcow2 is composed of a common RHEL 10 spec, a Centos 10 qcow2 spec, and an extra package.

The whole specification is in specs/README.md.

Conversion of Centos Stream 10 and RHEL 10 and to image spec

Os package sets of both CS10 and RHEL 10 are now fully converted to image specs. There should not be any functional changes. The image specs are embedded using go:embed, so everything feels the same.

I might need help from @achilleas-k to verify that nothing really changed, the generated manifests have different UUIDs for reasons that I don't understand.

Overriding embedded image specs

Embedded image specs can be fully overriden. Just pass the spec directory via OSBUILD_IMAGES_IMAGE_SPECS_DIR, and the library won't use the internal bits.

Next steps

We can start moving more and more stuff into yaml. I think that the next piece could be base partition tables:

spec:
  packages:
    - "@core"
  exclude_packages:
    - wifi-drivers
  base_partition_table:
    type: gpt
    partitions:
      - size: "10 GiB"
      - mountpoint: /
      # and more

The ultimate goal is that the Go code doesn't know anything about what's e.g. an ami. The yaml files drive the available image types. There are some gaps in the current spec definition for this to happen, but we can iterate on them.

@achilleas-k achilleas-k self-requested a review December 12, 2024 10:51
@achilleas-k
Copy link
Member

I might need help from @achilleas-k to verify that nothing really changed, the generated manifests have different UUIDs for reasons that I don't understand.

Have you tried setting OSBUILD_TESTING_RNG_SEED=1?

@ondrejbudai
Copy link
Member Author

OSBUILD_TESTING_RNG_SEED=1

Thanks! That did the trick for centos, it turns out that the x86_64 image installer was missing grub2-pc, so I added a arch-specific file for it. :)

For RHEL, it's a bit of a mess, because the package set merging works slightly differently that in images, so the arrays are not sorted in the same way. Didn't we invent something for this for otk?

Copy link
Contributor

@mvo5 mvo5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this a lot! Some tiny ideas inline and maybe worth brainstorming on some naming)

@@ -284,6 +290,47 @@ func (t *ImageType) Manifest(bp *blueprint.Blueprint,
staticPackageSets[name] = getter(t)
}

var itSpec struct {
Spec struct {
Packages []string `yaml:"packages"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

Packages struct {
   Include []string
   Exclude []string
}

it would be a deeper nesting but also more symmetric? Plus it seems most image types have some excludes (even tar!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify: the current merge rules and their implementation merge lists just for top-level keys, so moving Include and Exclude under Packages would break everything. However, we can alter the merge rules:

Dictionaries:
If both values are dictionaries, perform a recursive merge:
For each key:
If the key exists in both, merge the corresponding values.
If the key exists in only one, use that value.

Lists:
If both values are lists, merge them:
Append items from the second list to the first.
Deduplicate items, maintaining all unique entries.

Scalars (Strings, Numbers, Booleans, Null):
If both values are scalars, the value from the second structure overrides the first.

Type Mismatch:
If the types of values for a given key differ, the merge is not allowed. This results in an error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is: are these rules a good default? Are there easy to reason about? I have no good answers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, sorry, we can start with the exiting top-level logic, right now the format is internal and as long as we do not have external image defintions we can always refactor. So +1 with starting from that (even though I think we should get something better soon(ish)).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation is no longer generic. Instead, fields are being merged manually. We can definitely move back to a generic merge later, but for now it seems like a manual merge is less controversial.

Copy link
Member Author

@ondrejbudai ondrejbudai Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh... and this is now basically implemented :)

Packages struct {
   Include []string
   Exclude []string
}

:)

pkg/spec/spec.go Outdated Show resolved Hide resolved
specs/README.md Outdated
@@ -0,0 +1,58 @@
# Image specs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no strong opinion but maybe it is worth brainstorming the name here a bit. Spec sounds like rpm-spec to me but maybe that is just me. Alternative ideas for the dirname

  • imgtypes
  • types
  • images
  • image_definitions
  • imgdefs
  • imgspecs (if we want to keep specs)
    ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to definitions for now :)

@supakeen
Copy link
Member

Very cool and very necessary. As for naming, specifications or definitions both sounds fine.

You've only taken the RHEL packages here but in the Fedora package sets we have a "lot" of conditionals such as only enabling certain packages for certain versions. How would we handle this with the YAML files here?

@ondrejbudai
Copy link
Member Author

I think I'm leaning towards the term "image definitions". We always refered to the go files as definitions, so I think we can continue using that term, just switch the "driver" to yaml. So I think I would rename the specs directory to definitions, and the spec package to defparser.

You've only taken the RHEL packages here but in the Fedora package sets we have a "lot" of conditionals such as only enabling certain packages for certain versions. How would we handle this with the YAML files here?

I think that Fedora would benefit from less deduplication than RHEL. So, one directory == one version, no sharing between them. I think that our long-term dream was/is to extract these yamls into a separate repository, and use the traditional branching model.

So for now, I suggest we just have three dirs: fedora-40, fedora-41 and fedora-rawhide (or 42, dunno).

Later, when they are in a separate repo (and maybe shipped in a separate RPM?), we can either teach images how to load them from system, or keep it simple and just embed them using submodules:

[submodule "fedora-40"]
        path = definitions/fedora-40
        url = https://pagure.io/fedora-kickstarts.git
        branch = f40
[submodule "fedora-41"]
        path = definitions/fedora-41
        url = https://pagure.io/fedora-kickstarts.git
        branch = f41
[submodule "fedora-rawhide"]
        path = definitions/fedora-rawhide
        url = https://pagure.io/fedora-kickstarts.git
        branch = main

Note that for CentOS/RHEL, I think I would keep the deduplication (as it exists in the go code) as much as possible. We know that it works for us, and for RHEL we are basically the maintainers. We can always reconsider this later.

I don't like that I'm basically proposing different models for Fedora and RHEL, but I think it sorta makes sense given the differences between these two distros.

@cgwalters
Copy link
Contributor

cgwalters commented Dec 16, 2024

I'm a bit surprised there's no apparent comparisons or references to other extant image format definitions already used in the Fedora-derivative ecosystem? There are many:

  • https://github.com/minimization/content-resolver-input/ which many of us are forced to maintain (I think it has implicit includes?)
  • kickstart (also has file includes, package excludes, etc.)
  • rpm-ostree's tree file (also quite similar)
  • comps is also related
  • Finally of course also of high importance is spec files which many people are forced to understand and maintain anyways
  • Edit: Oh yeah and the Kiwi XML https://pagure.io/fedora-kiwi-descriptions (not my favorite, but already exists and is used for better or worse)

The thing that drove me towards spec files (a conclusion which honestly I didn't want to get to) is documented here https://gitlab.com/fedora/bootc/base-images-experimental/-/blob/main/README-build-rationale.md?ref_type=heads#key-aspect-reusing-spec-files

That whole file has also a lot of related discussion.

Certainly getting packages out of code into a declarative file format is something that makes obvious sense to me, but introducing a new package-list-in-YAML/toml/XML/specfile that isn't one of the above seems like it needs some rationale. Every single time someone has introduced one it has just been a new file format only understood by one tool.

Picking just one is impossible, but my personal inclination is to try to have more tools accept multiple formats - that was the idea behind coreos/rpm-ostree#5119 (and I also have no problem having it accept blueprints too, but honestly they're pretty verbose for this use case).

@achilleas-k
Copy link
Member

Certainly getting packages out of code into a declarative file format is something that makes obvious sense to me, but introducing a new package-list-in-YAML/toml/XML/specfile that isn't one of the above seems like it needs some rationale.

The rationale is that this is the first step to externalising our image definitions. These are currently defined as a combination of:

Now there's clearly a lot of history (and technical debt) here. Some of the things in these configuration structures can be simplified, squashed together, etc, but the fact is that they're our current state and our starting point for this process.

Regardless, what we expect to end up with is some transformation (simplification) of all of these into an internal structure that's useful for us, for everything we need.
otk was our fist big attempt at this (see for example everything under otk.define here https://github.com/osbuild/otk/blob/bf22601c1557cf2ff19638c6f2a6e328210dad18/example/centos/centos-9-aarch64-ami.yaml).

I don't think something like kickstart (or any of the other examples in your list, at least the ones I'm familiar with) is going to cut it. Or if it can, then we'd need to start parsing kickstarts to make partition tables, or image configs, or os customizations, and that's not only a lot of work, it's also kinda pointless. We're not going to be working with an internal representation of a kickstart.

What we're doing here is taking code structures and pulling them out (and simplifying, either during the extraction of after the fact). We're not creating an end-user-facing configuration format, just making our lives easier in a way that will hopefully make it easier for contributors (and hopefully distro maintainers) to modify image configurations without needing to navigate (or write) code and without needing to recompile go binaries.

Oh and XML is out of the question 😄

@cgwalters
Copy link
Contributor

Of course, I think this is an improvement and obviously none of my comments should be construed as trying to block anything. But at the same time, we can and should think about how the evolution of this fits in with at least some of the other ones. In the course of needing to succeed at doing my job I have had to interact with (or at least think about) relatively recently with every single one of the above things, and this would just add a new one to the mix.

@achilleas-k
Copy link
Member

In the course of needing to succeed at doing my job I have had to interact with (or at least think about) relatively recently with every single one of the above things, and this would just add a new one to the mix.

I understand where you're coming from. There's a long-running struggle to figure out the target audience for each configuration format and level. The configuration formats you mentioned are, in my opinion, serving similar purposes to our blueprints and blueprints are definitely the weakest of all of those.

I think a conceptual problem here is that every one of the configuration formats you mentioned span the whole range of use cases from lightly customizing a base image configuration to distro/spin image configuration from scratch, and we always kept those two ends separate. A big part of the value of image builder is that (for example) selecting an ami as your image type includes a lot of decisions for how RHEL should be run on aws. That's the stuff we have in code and that's what we want to extract in these image definitions. What you get with blueprints is enough knobs and switches to modify that base ami to fit your needs, but not so many that you can turn it into an Azure image, or worse, make it unusable in AWS.

Now, you're right, the other configs are serving a similar (or sometimes the same) purpose as what we're doing here: they let you define an image configuration pretty much from scratch, with some useful abstractions. But a lot of what we're defining here is coupled with how osbuild works (to varying degrees). Pulling in a configuration format from another project doesn't really serve any purpose. We could write a parser that reads a kickstart's partitioning directives, or a Kiwi <partitions></partitions> block, to instantiate a disk.PartitionTable (which in turn gets converted to an osbuild pipeline that creates a disk image), but who would that be for? What's the value in that when we can write some yaml that matches disk.PartitionTable for our internal image definitions (e.g. "rhel-9/ami-pt.yaml is the default partition table for for RHEL 9 AMIs"), which lets us and certain types of users modify those base image types at runtime, and provide blueprints for everyone else? And what happens when we need to encode configurations that aren't covered by another config format? I'm not saying we have unique image building requirements that no other build tool has come across, but we're talking about our internally-defined image configurations here. It needs to cover everything we can do to differentiate one image from another. More importantly, these image definitions will be required for populating the internal image configurations of image builder. In other words, without a rhel-9/ami-pt.yaml there can be no RHEL 9 AMI; there is no autopart or similar without them.

@ondrejbudai ondrejbudai force-pushed the specs branch 2 times, most recently from 064c380 to 6ef99c4 Compare January 15, 2025 15:26
definition is a simple format for defining image properties. It's
yaml-based, supports inheritance, and deduplication across multiple
distro versions. This commit defines the format, and adds a library for
parsing it. Additionally, there's a def-flatten command that allows
people to flatten definition files.
@ondrejbudai
Copy link
Member Author

ondrejbudai commented Jan 15, 2025

I just pushed a second iteration of this work. Breaking changes:

  1. The name is now definitions instead of specs. We are used to call the code that's definining the image types as definitions, so let's continue using it even in the yaml implementation.
  2. The yaml format changed slightly. includes is now from, and spec is now def. I'm happy to hear different suggestions.
  3. The directory layout changed. Previously, every distro version has its top-level directory (fedora-42), now it's a two-level structure (fedora/42). This gives us more flexibility.
  4. The merging algorithm is no longer generic. Instead, it needs to be handled manually. We can go back to the generic one in the future if we figure out that's a better way.
  5. You can no longer include one file twice (not even if the same file is included by two different files). We can relax this rule later, but for now, this is the simplest way to implement cycle-protection.
  6. Multiple package sets are now supported:
    def:
      packages
        os:
          include:
            - "@core"
          exclude:
            - wifi-firmwares

Otherwise, I refactored the code quite a lot, and covered the definition package with tests.

At this point, I basically consider the definition package ready to be merged, but we need to decide whether to convert all distribution in one huge PR, or whether we want to do this gradually. I have no strong opinions.

This commit moves all os pipeline packages to the new definition format.
There should be no functional changes in images.
Copy link
Member

@achilleas-k achilleas-k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look at the code too closely, I assume it does what it says in the commit and docs :)

I have some comments on the README part and one question about the overall approach:

Would it be easier if each package set was in a separate file instead of bundling them?
This would mean pushing all files one level deeper and making the image type a directory: rhel/10.2/x86_64/ami/build.yaml and rhel/10.2/x86_64/ami/os.yaml
And then we could do generic bits like:
rhel/10.2/x86_64/generic/build.yaml or even rhel/10.2/build.yaml for packages that are in every RHEL 10.2 buildroot.

# Image definituons
This directory contains definitions of image types that this library can build.

> Currently, only packages are defined in YAML. The rest is still in Go.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using > ... block quotes for emphasis might look nice here, but it will vary based on the markdown renderer (some will put quote marks around the block to make it clear it's a quote, for example). I think it's misleading and it's more important to have proper semantic markup for various reasons (like screen readers).

If we want emphasis here, I'd go with bold or italics... or both.

Comment on lines +11 to +14
- firstly, it tries to open `rhel/10.2/x86_64/ami.yaml`
- if it doesn't exist, it tries to open `rhel/10.2/generic/ami.yaml`
- if it doesn't exist, it tries to open `rhel/X.Y/x86_64/ami.yaml` or `rhel/X.Y/generic/ami.yaml`, with X.Y being the closest older version to 10.2 (10.1 will be prefered over 10.0)
- if it doesn't exist, the build fails
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid repetition, I'd rephrase the part above this list to describe that we're going to list a series of paths that are checked in order until one is found.

For example:

To find the definition for a specific image configuration (distribution, architecture, image type combination), the definition loader will search for the most suitable definition by starting with the most accurate path and widening the search to more generic paths at every step.

For example, to find the definition for RHEL 10.2, x86_64, AMI, the loader will check the following paths in order and load the first file it finds:

  • rhel/10.2/x86_64/ami.yaml
  • rhel/10.2/generic/ami.yaml
  • rhel/<X.Y>/x86_64/ami.yaml, where X.Y is any major.minor version older than 10.2 that exists in that path. If there are multiple, the closest one is chosen (i.e. 10.0 is chosen over 9.10 if both exist).
  • rhel/<X.Y>/generic/ami.yaml (see previous point for explanation of X.Y).
    ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it really fall back to a previous major version? eg. no matching 10.Y entries so it uses 9.6. I think the X should stay fixed on the major release and it should fail if there is no X.0 or higher directories found.

Also, I'd slightly prefer a well-formed filename instead of a directory tree. It's easier to explore and work with manually, or compare file contents. eg. rhel-10.2-x86_64-ami.yaml

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it really fall back to a previous major version? eg. no matching 10.Y entries so it uses 9.6. I think the X should stay fixed on the major release and it should fail if there is no X.0 or higher directories found.

I think the code in the PR does fall back to lower major versions, so maybe that's something to discuss.

Also, I'd slightly prefer a well-formed filename instead of a directory tree. It's easier to explore and work with manually, or compare file contents. eg. rhel-10.2-x86_64-ami.yaml

That's funny, I made the exact opposite suggestion (adding one more level) to my main comment above :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing a bit of motivation as to why we want to do this fallback dance in the first place. It seems some of it could be managed more explicitly with symlinks in the repository and it would make it (to me) easier to reason about what its doing.

Is it purely to avoid us creating those possible symlinks?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main benefit I could see is when a new release appears but there aren't any new/specific definitions for it like when Fedora branches from rawhide.

Comment on lines +37 to +38
##### `exclude`
List of packages that must not be in the depsolved transaction.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth explaining what the purpose of the excludes are? I don't know if we have it documented anywhere for newcomers.

For example:

Excludes are useful for avoiding installing unwanted packages that might be weak dependencies of a selected package. Adding an exclude which is a hard dependency of an included package will cause the build to fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, what can the name include? Just the package name? The name-version? Full NEVRA? This should be specified to prevent future confusion (or attempts to use something it doesn't support).

`def` is the core of the definition file. It contains following keys:

#### `packages`
Map of package sets installed in the image type. The keys are currently defined in Go code. The values are structs, each one represents inputs for one DNF transaction.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we mention what the keys mean? That they're keywords for the purpose of each package set, e.g. build is for the build root and os is for the image itself. Other keys exist for different image types.

Copy link
Contributor

@bcl bcl Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, as well as examples. Reading this I have no idea what it really means or what it should look like. I expect that will be clearer when I get to the yaml part, but coming at it as a new users it leaves me feeling uneasy.

I would change the name, packages is incorrect and confusing. I'd call it package-sets instead. Or maybe just sets to be less verbose.

@achilleas-k
Copy link
Member

For RHEL, it's a bit of a mess, because the package set merging works slightly differently that in images, so the arrays are not sorted in the same way.

If you generate manifests without depsolving go run ./cmd/gen-manifests --packages=false you can verify (very very quickly) if the package selection is correct. The packages in the curl source options will be sorted by checksum which helps with diffing.

#### `packages`
Map of package sets installed in the image type. The keys are currently defined in Go code. The values are structs, each one represents inputs for one DNF transaction.

When merging package sets, the list of includes and excludes are simply appended together.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes it sound like the merged set is includes + exclues, which doesn't make sense. Maybe something like:
The merged package set contains all the the member set's includes appended together, and all of the excludes appended.

// Definition is a struct that represents a single image definition.
type Definition struct {
// Packages is a map of package sets installed inside the image.
// The accepted keys are currently defined by image types.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defined where? And an example would help.


When merging package sets, the list of includes and excludes are simply appended together.

The struct has following keys:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of struct say package set, since we're talking about the definition, not the code.


The includes are processed using [DFS postordering](https://en.wikipedia.org/wiki/Depth-first_search#Vertex_orderings).

### `def`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this exist? I'd put the package sets at the top instead of them being the single key under def.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is likely because @ondrejbudai considers the inheritance/merging story to be separate from the definitions in the file.

For that I understand the choice and quite like it. Definition files contain definitions and things-to-mix-together-with-those-definitions.

I would prefer to rename def to definition here; there's no real need to be terse.

Copy link
Member

@supakeen supakeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start to moving some things which are currently literal in the code to be ... actual literals in a config format which I like.

I've commented on one or two things; mostly I don't see the immediate need for these automatic fallbacks yet and something about the file format.

I'd like to see a definition where we exclude a certain package only for RHEL < 9.6 (or include a package only in higher versions) or some of the other conditional package sets being handled so I have a better idea of how the merging is going to work in practice.

Comment on lines +11 to +14
- firstly, it tries to open `rhel/10.2/x86_64/ami.yaml`
- if it doesn't exist, it tries to open `rhel/10.2/generic/ami.yaml`
- if it doesn't exist, it tries to open `rhel/X.Y/x86_64/ami.yaml` or `rhel/X.Y/generic/ami.yaml`, with X.Y being the closest older version to 10.2 (10.1 will be prefered over 10.0)
- if it doesn't exist, the build fails
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing a bit of motivation as to why we want to do this fallback dance in the first place. It seems some of it could be managed more explicitly with symlinks in the repository and it would make it (to me) easier to reason about what its doing.

Is it purely to avoid us creating those possible symlinks?


The includes are processed using [DFS postordering](https://en.wikipedia.org/wiki/Depth-first_search#Vertex_orderings).

### `def`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is likely because @ondrejbudai considers the inheritance/merging story to be separate from the definitions in the file.

For that I understand the choice and quite like it. Definition files contain definitions and things-to-mix-together-with-those-definitions.

I would prefer to rename def to definition here; there's no real need to be terse.

Copy link
Contributor

@mvo5 mvo5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine from a (not too deep) read, thanks for this! What is holding it in "draft"? It seems pretty complete and already quite useful?

return nil, fmt.Errorf("failed to decode file %s: %w", filepath, err)
}

// close the file as soon as possible to avoid having too many open files
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nitpick) I think extracting a small helper may make this a bit more straightforward:
(and maybe definitions.File could become DefFile or Data, DataFile, FileData or something - but that is s super minor complaint):

diff --git a/pkg/definition/definition.go b/pkg/definition/definition.go
index d3e8b2cf5..4f35acc72 100644
--- a/pkg/definition/definition.go
+++ b/pkg/definition/definition.go
@@ -143,6 +143,25 @@ type configTraverser struct {
        seen map[string]bool
 }
 
+func decodeFile(dir fs.FS, filepath string) (*File, error) {
+       file, err := dir.Open(filepath)
+       if err != nil {
+               return nil, err
+       }
+       defer file.Close()
+
+       yamlDecoder := yaml.NewDecoder(file)
+       yamlDecoder.KnownFields(true)
+
+       var f File
+       err = yamlDecoder.Decode(&f)
+       if err != nil {
+               return nil, fmt.Errorf("failed to decode file %s: %w", filepath, err)
+       }
+
+       return &f, nil
+}
+
 // traverse processes the given file and all its includes (the top-level from key) and returns a list of definitions
 // in DFS post-order.
 func (c *configTraverser) traverse(dir fs.FS, filepath string) ([]Definition, error) {
@@ -156,30 +175,11 @@ func (c *configTraverser) traverse(dir fs.FS, filepath string) ([]Definition, er
        }
        c.seen[filepath] = true
 
-       file, err := dir.Open(filepath)
+       f, err := decodeFile(dir, filepath)
        if err != nil {
                return nil, err
        }
 
-       defer func() {
-               if file != nil {
-                       file.Close()
-               }
-       }()
-
-       yamlDecoder := yaml.NewDecoder(file)
-       yamlDecoder.KnownFields(true)
-
-       var f File
-       err = yamlDecoder.Decode(&f)
-       if err != nil {
-               return nil, fmt.Errorf("failed to decode file %s: %w", filepath, err)
-       }
-
-       // close the file as soon as possible to avoid having too many open files
-       file.Close()
-       file = nil
-
        var allDefs []Definition
 
        for _, include := range f.From {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants