Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rhel10: move os package sets to yaml files #1104

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ondrejbudai
Copy link
Member

This is my RFC PR for moving package sets outside Go into yaml.

The code is of a PoC quality. The purpose is to start the conversation. However, I believe that after minor changes (converting all RHEL versions, or adding code to limit the change to RHEL 10), it could be actually deployed to production (this would be a yolo thing, though).

All terms are not final. I used spec because it's not used anywhere else in the Image Builder stack so there should be no confusion.

What I did

Image spec format

I introduced a new yaml-based format called image spec. Spec is a simple format for defining image properties. It's yaml-based, supports inheritance, and deduplication across multiple distro versions.

A quick snippet showing rhel 10's qcow2:

includes:
  - ../common.yaml
  - ../../centos-10/generic/qcow2.yaml
spec:
  packages:
    - subscription-manager-cockpit

I hope that this snippet is self-explaining, RHEL 10 qcow2 is composed of a common RHEL 10 spec, a Centos 10 qcow2 spec, and an extra package.

The whole specification is in specs/README.md.

Conversion of Centos Stream 10 and RHEL 10 and to image spec

Os package sets of both CS10 and RHEL 10 are now fully converted to image specs. There should not be any functional changes. The image specs are embedded using go:embed, so everything feels the same.

I might need help from @achilleas-k to verify that nothing really changed, the generated manifests have different UUIDs for reasons that I don't understand.

Overriding embedded image specs

Embedded image specs can be fully overriden. Just pass the spec directory via OSBUILD_IMAGES_IMAGE_SPECS_DIR, and the library won't use the internal bits.

Next steps

We can start moving more and more stuff into yaml. I think that the next piece could be base partition tables:

spec:
  packages:
    - "@core"
  exclude_packages:
    - wifi-drivers
  base_partition_table:
    type: gpt
    partitions:
      - size: "10 GiB"
      - mountpoint: /
      # and more

The ultimate goal is that the Go code doesn't know anything about what's e.g. an ami. The yaml files drive the available image types. There are some gaps in the current spec definition for this to happen, but we can iterate on them.

@achilleas-k achilleas-k self-requested a review December 12, 2024 10:51
@achilleas-k
Copy link
Member

I might need help from @achilleas-k to verify that nothing really changed, the generated manifests have different UUIDs for reasons that I don't understand.

Have you tried setting OSBUILD_TESTING_RNG_SEED=1?

spec is a simple format for defining image properties. It's yaml-based,
supports inheritance, and deduplication across multiple distro versions.
This commit defines the format, and adds a library for parsing it.
Additionally, there's a spec-flatten command that allows people to
flatten spec files.
This commit moves all os pipeline packages in the new spec format.
There should be no functional changes in images.
People don't like the packages that we picked for individual types? They
can now create their own spec directories, and pass them via
OSBUILD_IMAGES_IMAGE_SPECS_DIR.

This obviously should not be an environment variable, but this is the
easiest first step that we can make.
@ondrejbudai
Copy link
Member Author

OSBUILD_TESTING_RNG_SEED=1

Thanks! That did the trick for centos, it turns out that the x86_64 image installer was missing grub2-pc, so I added a arch-specific file for it. :)

For RHEL, it's a bit of a mess, because the package set merging works slightly differently that in images, so the arrays are not sorted in the same way. Didn't we invent something for this for otk?

Copy link
Contributor

@mvo5 mvo5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this a lot! Some tiny ideas inline and maybe worth brainstorming on some naming)

@@ -284,6 +290,47 @@ func (t *ImageType) Manifest(bp *blueprint.Blueprint,
staticPackageSets[name] = getter(t)
}

var itSpec struct {
Spec struct {
Packages []string `yaml:"packages"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

Packages struct {
   Include []string
   Exclude []string
}

it would be a deeper nesting but also more symmetric? Plus it seems most image types have some excludes (even tar!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify: the current merge rules and their implementation merge lists just for top-level keys, so moving Include and Exclude under Packages would break everything. However, we can alter the merge rules:

Dictionaries:
If both values are dictionaries, perform a recursive merge:
For each key:
If the key exists in both, merge the corresponding values.
If the key exists in only one, use that value.

Lists:
If both values are lists, merge them:
Append items from the second list to the first.
Deduplicate items, maintaining all unique entries.

Scalars (Strings, Numbers, Booleans, Null):
If both values are scalars, the value from the second structure overrides the first.

Type Mismatch:
If the types of values for a given key differ, the merge is not allowed. This results in an error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is: are these rules a good default? Are there easy to reason about? I have no good answers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, sorry, we can start with the exiting top-level logic, right now the format is internal and as long as we do not have external image defintions we can always refactor. So +1 with starting from that (even though I think we should get something better soon(ish)).

}

func fileExists(dir fs.FS, filepath string) bool {
f, err := dir.Open(filepath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a small comment there that the more common _, err := os.Stat(filepath); return err == nil will not work because it operates on a fs.FS? // fs.FS has no Stat() so we actually need to test-open the file (or something)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what I was thinking about, it shouldn't be a problem to use Stat. It's still very much prone to TOCTOU, but I cannot see how to solve this without having all files open, and I guess it's not a huge deal.

@@ -0,0 +1,58 @@
# Image specs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no strong opinion but maybe it is worth brainstorming the name here a bit. Spec sounds like rpm-spec to me but maybe that is just me. Alternative ideas for the dirname

  • imgtypes
  • types
  • images
  • image_definitions
  • imgdefs
  • imgspecs (if we want to keep specs)
    ...

@supakeen
Copy link
Member

Very cool and very necessary. As for naming, specifications or definitions both sounds fine.

You've only taken the RHEL packages here but in the Fedora package sets we have a "lot" of conditionals such as only enabling certain packages for certain versions. How would we handle this with the YAML files here?

@ondrejbudai
Copy link
Member Author

I think I'm leaning towards the term "image definitions". We always refered to the go files as definitions, so I think we can continue using that term, just switch the "driver" to yaml. So I think I would rename the specs directory to definitions, and the spec package to defparser.

You've only taken the RHEL packages here but in the Fedora package sets we have a "lot" of conditionals such as only enabling certain packages for certain versions. How would we handle this with the YAML files here?

I think that Fedora would benefit from less deduplication than RHEL. So, one directory == one version, no sharing between them. I think that our long-term dream was/is to extract these yamls into a separate repository, and use the traditional branching model.

So for now, I suggest we just have three dirs: fedora-40, fedora-41 and fedora-rawhide (or 42, dunno).

Later, when they are in a separate repo (and maybe shipped in a separate RPM?), we can either teach images how to load them from system, or keep it simple and just embed them using submodules:

[submodule "fedora-40"]
        path = definitions/fedora-40
        url = https://pagure.io/fedora-kickstarts.git
        branch = f40
[submodule "fedora-41"]
        path = definitions/fedora-41
        url = https://pagure.io/fedora-kickstarts.git
        branch = f41
[submodule "fedora-rawhide"]
        path = definitions/fedora-rawhide
        url = https://pagure.io/fedora-kickstarts.git
        branch = main

Note that for CentOS/RHEL, I think I would keep the deduplication (as it exists in the go code) as much as possible. We know that it works for us, and for RHEL we are basically the maintainers. We can always reconsider this later.

I don't like that I'm basically proposing different models for Fedora and RHEL, but I think it sorta makes sense given the differences between these two distros.

@cgwalters
Copy link
Contributor

cgwalters commented Dec 16, 2024

I'm a bit surprised there's no apparent comparisons or references to other extant image format definitions already used in the Fedora-derivative ecosystem? There are many:

  • https://github.com/minimization/content-resolver-input/ which many of us are forced to maintain (I think it has implicit includes?)
  • kickstart (also has file includes, package excludes, etc.)
  • rpm-ostree's tree file (also quite similar)
  • comps is also related
  • Finally of course also of high importance is spec files which many people are forced to understand and maintain anyways
  • Edit: Oh yeah and the Kiwi XML https://pagure.io/fedora-kiwi-descriptions (not my favorite, but already exists and is used for better or worse)

The thing that drove me towards spec files (a conclusion which honestly I didn't want to get to) is documented here https://gitlab.com/fedora/bootc/base-images-experimental/-/blob/main/README-build-rationale.md?ref_type=heads#key-aspect-reusing-spec-files

That whole file has also a lot of related discussion.

Certainly getting packages out of code into a declarative file format is something that makes obvious sense to me, but introducing a new package-list-in-YAML/toml/XML/specfile that isn't one of the above seems like it needs some rationale. Every single time someone has introduced one it has just been a new file format only understood by one tool.

Picking just one is impossible, but my personal inclination is to try to have more tools accept multiple formats - that was the idea behind coreos/rpm-ostree#5119 (and I also have no problem having it accept blueprints too, but honestly they're pretty verbose for this use case).

@achilleas-k
Copy link
Member

Certainly getting packages out of code into a declarative file format is something that makes obvious sense to me, but introducing a new package-list-in-YAML/toml/XML/specfile that isn't one of the above seems like it needs some rationale.

The rationale is that this is the first step to externalising our image definitions. These are currently defined as a combination of:

Now there's clearly a lot of history (and technical debt) here. Some of the things in these configuration structures can be simplified, squashed together, etc, but the fact is that they're our current state and our starting point for this process.

Regardless, what we expect to end up with is some transformation (simplification) of all of these into an internal structure that's useful for us, for everything we need.
otk was our fist big attempt at this (see for example everything under otk.define here https://github.com/osbuild/otk/blob/bf22601c1557cf2ff19638c6f2a6e328210dad18/example/centos/centos-9-aarch64-ami.yaml).

I don't think something like kickstart (or any of the other examples in your list, at least the ones I'm familiar with) is going to cut it. Or if it can, then we'd need to start parsing kickstarts to make partition tables, or image configs, or os customizations, and that's not only a lot of work, it's also kinda pointless. We're not going to be working with an internal representation of a kickstart.

What we're doing here is taking code structures and pulling them out (and simplifying, either during the extraction of after the fact). We're not creating an end-user-facing configuration format, just making our lives easier in a way that will hopefully make it easier for contributors (and hopefully distro maintainers) to modify image configurations without needing to navigate (or write) code and without needing to recompile go binaries.

Oh and XML is out of the question 😄

@cgwalters
Copy link
Contributor

Of course, I think this is an improvement and obviously none of my comments should be construed as trying to block anything. But at the same time, we can and should think about how the evolution of this fits in with at least some of the other ones. In the course of needing to succeed at doing my job I have had to interact with (or at least think about) relatively recently with every single one of the above things, and this would just add a new one to the mix.

@achilleas-k
Copy link
Member

In the course of needing to succeed at doing my job I have had to interact with (or at least think about) relatively recently with every single one of the above things, and this would just add a new one to the mix.

I understand where you're coming from. There's a long-running struggle to figure out the target audience for each configuration format and level. The configuration formats you mentioned are, in my opinion, serving similar purposes to our blueprints and blueprints are definitely the weakest of all of those.

I think a conceptual problem here is that every one of the configuration formats you mentioned span the whole range of use cases from lightly customizing a base image configuration to distro/spin image configuration from scratch, and we always kept those two ends separate. A big part of the value of image builder is that (for example) selecting an ami as your image type includes a lot of decisions for how RHEL should be run on aws. That's the stuff we have in code and that's what we want to extract in these image definitions. What you get with blueprints is enough knobs and switches to modify that base ami to fit your needs, but not so many that you can turn it into an Azure image, or worse, make it unusable in AWS.

Now, you're right, the other configs are serving a similar (or sometimes the same) purpose as what we're doing here: they let you define an image configuration pretty much from scratch, with some useful abstractions. But a lot of what we're defining here is coupled with how osbuild works (to varying degrees). Pulling in a configuration format from another project doesn't really serve any purpose. We could write a parser that reads a kickstart's partitioning directives, or a Kiwi <partitions></partitions> block, to instantiate a disk.PartitionTable (which in turn gets converted to an osbuild pipeline that creates a disk image), but who would that be for? What's the value in that when we can write some yaml that matches disk.PartitionTable for our internal image definitions (e.g. "rhel-9/ami-pt.yaml is the default partition table for for RHEL 9 AMIs"), which lets us and certain types of users modify those base image types at runtime, and provide blueprints for everyone else? And what happens when we need to encode configurations that aren't covered by another config format? I'm not saying we have unique image building requirements that no other build tool has come across, but we're talking about our internally-defined image configurations here. It needs to cover everything we can do to differentiate one image from another. More importantly, these image definitions will be required for populating the internal image configurations of image builder. In other words, without a rhel-9/ami-pt.yaml there can be no RHEL 9 AMI; there is no autopart or similar without them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants