Check support for prerelease qualifiers #6523

Closed
pchila opened this issue Jan 13, 2025 · 10 comments · May be fixed by #6540
Labels
Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

Comments

@pchila
Member

pchila commented Jan 13, 2025

For 9.0 release Elastic Agent needs to be packaged using prerelease qualifiers, like:

  • beta
  • rc1, rc2

The goal of this issue is to check that the relevant packages can be generated correctly using mage package, specifically:

  • File names contain the prerelease qualifiers
  • version command output correctly prints the specified qualifiers
  • Agent can be upgraded from/to a version containing prerelease qualifiers
  • Agent can enroll in Fleet when version contains prerelease qualifiers
  • Unified release pipelines work for versions with prerelease qualifiers
  • Release candidate can be re-released as final 9.0 version

Notes:

  • These qualifiers will be used in staging, so there should be no overlap with SNAPSHOT or IAR releases
  • At the time of writing we can create a beta version using
    AGENT_PACKAGE_VERSION="9.0.0-beta" BEAT_VERSION="9.0.0-SNAPSHOT" VERSION_QUALIFIER="beta" EXTERNAL=true mage package
    We still need to check whether the same can be achieved by specifying a manifest for the dependencies (that is how the unified release pipelines are invoked)
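As a rough illustration of the first checklist item, the sketch below shows how a prerelease qualifier and the SNAPSHOT suffix could combine into a version string and an archive name. The helpers packageVersion and archiveName are hypothetical and are not taken from the elastic-agent magefile; the file name shape is an assumption modeled on the elastic-agent archive names.

```go
package main

import "fmt"

// packageVersion is a hypothetical helper (not part of the elastic-agent build code).
// It appends an optional prerelease qualifier and an optional -SNAPSHOT suffix
// to a core version such as "9.0.0".
func packageVersion(core, qualifier string, snapshot bool) string {
	v := core
	if qualifier != "" {
		v += "-" + qualifier
	}
	if snapshot {
		v += "-SNAPSHOT"
	}
	return v
}

// archiveName builds a file name in the assumed shape
// elastic-agent-<version>-<os>-<arch>.<ext>.
func archiveName(version, goos, arch, ext string) string {
	return fmt.Sprintf("elastic-agent-%s-%s-%s.%s", version, goos, arch, ext)
}

func main() {
	// Staging-style names: qualifier present, no SNAPSHOT suffix.
	fmt.Println(archiveName(packageVersion("9.0.0", "beta", false), "linux", "x86_64", "tar.gz"))
	fmt.Println(archiveName(packageVersion("9.0.0", "rc1", false), "windows", "x86_64", "zip"))
	// The combination the notes say should not occur: qualifier plus SNAPSHOT.
	fmt.Println(archiveName(packageVersion("9.0.0", "beta", true), "linux", "x86_64", "tar.gz"))
}
```

The third call shows the overlapping case (qualifier plus SNAPSHOT) that the notes above say should be avoided.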
@pchila pchila added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Jan 13, 2025
@pchila pchila self-assigned this Jan 13, 2025
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@cmacknz
Member

cmacknz commented Jan 13, 2025

We need to make sure upgrades work with this, which probably requires us to know the artifacts URL structure.

Release candidate can be re-released as final 9.0 version

It would be good to confirm how far we need to take this. The independent agent release infrastructure can create new packages using previously compiled binaries. The Elastic Agent package job shouldn't actually be compiling anything, but it still qualifies as a new build that would have to be tested.

We need to confirm whether re-packaging binaries that were not recompiled is good enough to satisfy this. If it is, we probably meet it by triggering the package job again while reusing previous DRA artifacts.

@pchila
Member Author

pchila commented Jan 16, 2025

Ran a test with a SNAPSHOT manifest (a staging manifest for 9.0.0 is not available at this time):

MANIFEST_URL="https://snapshots.elastic.co/9.0.0-7ab33a91/manifest-9.0.0-SNAPSHOT.json" AGENT_DROP_PATH=build/elastic-agent-drop AGENT_PACKAGE_VERSION="9.0.0-beta" BEAT_VERSION="9.0.0-SNAPSHOT" VERSION_QUALIFIER="alpha1" EXTERNAL=true mage -v clean downloadManifest package

Building the docker image variants service and cloud fails with:

Error: multiple failures: failed building elastic-agent type=docker for platform=linux/amd64 : error copying files for docker variant "service": failed to copy from build/elastic-agent-drop/archives/linux-x86_64.tar.gz/agentbeat-9.0.0-alpha1-SNAPSHOT-linux-x86_64.tar.gz to build/package/elastic-agent-service/elastic-agent-linux-amd64.docker/docker-build/beat/data/cloud_downloads/agentbeat-9.0.0-alpha1-SNAPSHOT-linux-x86_64.tar.gz: copy failed: cannot stat source file build/elastic-agent-drop/archives/linux-x86_64.tar.gz/agentbeat-9.0.0-alpha1-SNAPSHOT-linux-x86_64.tar.gz: stat build/elastic-agent-drop/archives/linux-x86_64.tar.gz/agentbeat-9.0.0-alpha1-SNAPSHOT-linux-x86_64.tar.gz: no such file or directory
failed building elastic-agent type=docker for platform=linux/amd64 : error copying files for docker variant "cloud": failed to copy from build/elastic-agent-drop/archives/linux-x86_64.tar.gz/agentbeat-9.0.0-alpha1-SNAPSHOT-linux-x86_64.tar.gz to build/package/elastic-agent-cloud/elastic-agent-linux-amd64.docker/docker-build/beat/data/cloud_downloads/agentbeat-9.0.0-alpha1-SNAPSHOT-linux-x86_64.tar.gz: copy failed: cannot stat source file build/elastic-agent-drop/archives/linux-x86_64.tar.gz/agentbeat-9.0.0-alpha1-SNAPSHOT-linux-x86_64.tar.gz: stat build/elastic-agent-drop/archives/linux-x86_64.tar.gz/agentbeat-9.0.0-alpha1-SNAPSHOT-linux-x86_64.tar.gz: no such file or directory

It seems that when a SNAPSHOT manifest is used for retrieving dependencies, the SNAPSHOT flag is forced to true, so these package specs generate the wrong archive name:

source: '{{.AgentDropPath}}/archives/{{.GOOS}}-{{.AgentArchName}}.tar.gz/agentbeat-{{ beat_version }}{{if .Snapshot}}-SNAPSHOT{{end}}-{{.GOOS}}-{{.AgentArchName}}.tar.gz'

source: '{{.AgentDropPath}}/archives/{{.GOOS}}-{{.AgentArchName}}.tar.gz/connectors-{{ beat_version }}{{if .Snapshot}}-SNAPSHOT{{end}}.zip'
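To make the naming problem concrete, here is a minimal sketch that expands the first spec line with Go's text/template, once with Snapshot unset and once with it forced to true. It assumes the spec is rendered with text/template; the spec struct and the render helper are hypothetical, and the beat_version function simply returns a fixed string standing in for whatever the real template function returns.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// spec holds only the fields referenced by the quoted package spec line.
// It is a hypothetical stand-in for the real packaging parameters.
type spec struct {
	AgentDropPath string
	GOOS          string
	AgentArchName string
	Snapshot      bool
}

// source is the agentbeat spec line quoted above, copied verbatim.
const source = `{{.AgentDropPath}}/archives/{{.GOOS}}-{{.AgentArchName}}.tar.gz/agentbeat-{{ beat_version }}{{if .Snapshot}}-SNAPSHOT{{end}}-{{.GOOS}}-{{.AgentArchName}}.tar.gz`

// render expands the template. beatVersion is what the beat_version template
// function returns in this sketch (an assumption, not the real implementation).
func render(s spec, beatVersion string) (string, error) {
	tmpl, err := template.New("source").
		Funcs(template.FuncMap{"beat_version": func() string { return beatVersion }}).
		Parse(source)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, s); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	s := spec{AgentDropPath: "build/elastic-agent-drop", GOOS: "linux", AgentArchName: "x86_64"}
	for _, snapshot := range []bool{false, true} {
		s.Snapshot = snapshot
		path, err := render(s, "9.0.0-alpha1")
		if err != nil {
			panic(err)
		}
		fmt.Printf("Snapshot=%v -> %s\n", snapshot, path)
	}
}
```

With Snapshot forced to true, the rendered path ends in agentbeat-9.0.0-alpha1-SNAPSHOT-linux-x86_64.tar.gz, which is the file the error message reports as missing.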

The code that forces the BEAT_VERSION and SNAPSHOT flags from the manifest is here (it was added for the Independent Agent Release flow in PR #4885):

elastic-agent/magefile.go

Lines 1321 to 1327 in ed8e351

// When getting the packageVersion from snapshot we should also update the env of SNAPSHOT=true which is
// something that we use as an implicit parameter to various functions
if parsedVersion.IsSnapshot() {
	os.Setenv(snapshotEnv, "true")
	mage.Snapshot = true
}
os.Setenv("BEAT_VERSION", parsedVersion.CoreVersion())
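The effect of that snippet can be sketched on plain values instead of environment variables. This is only an approximation: buildSettings and applyManifestVersion are hypothetical names, and the parsing below assumes the core version is the part before the first "-" and that a snapshot version ends in SNAPSHOT. The real code relies on parsedVersion.IsSnapshot() and parsedVersion.CoreVersion(), whose exact behavior is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// buildSettings is a hypothetical stand-in for the BEAT_VERSION/SNAPSHOT env vars
// and the mage.Snapshot global that the quoted snippet mutates.
type buildSettings struct {
	BeatVersion string
	Snapshot    bool
}

// applyManifestVersion mirrors the quoted logic: a manifest version ending in
// -SNAPSHOT forces Snapshot to true, and BeatVersion is always overwritten
// with the core version. The string handling is an assumption for this sketch.
func applyManifestVersion(manifestVersion string, s buildSettings) buildSettings {
	withoutBuild, _, _ := strings.Cut(manifestVersion, "+") // drop build metadata, if any
	core, prerelease, _ := strings.Cut(withoutBuild, "-")
	if strings.HasSuffix(prerelease, "SNAPSHOT") {
		s.Snapshot = true
	}
	s.BeatVersion = core
	return s
}

func main() {
	// The caller asked for a qualified, non-snapshot package...
	before := buildSettings{BeatVersion: "9.0.0-SNAPSHOT", Snapshot: false}
	// ...but the SNAPSHOT manifest version flips the flag anyway.
	after := applyManifestVersion("9.0.0-SNAPSHOT", before)
	fmt.Printf("BEAT_VERSION=%s SNAPSHOT=%v\n", after.BeatVersion, after.Snapshot)
}
```

Note that the flag is only ever set to true, never reset, so a SNAPSHOT manifest overrides whatever the caller requested.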

The other artifacts get the wrong version suffix, -beta-SNAPSHOT, as a side effect of the SNAPSHOT flag:

➜  elastic-agent git:(main) ✗ tree -L 1 build/distributions
build/distributions
├── elastic-agent-9.0.0-beta-SNAPSHOT-aarch64.rpm
├── elastic-agent-9.0.0-beta-SNAPSHOT-aarch64.rpm.sha512
├── elastic-agent-9.0.0-beta-SNAPSHOT-amd64.deb
├── elastic-agent-9.0.0-beta-SNAPSHOT-amd64.deb.sha512
├── elastic-agent-9.0.0-beta-SNAPSHOT-arm64.deb
├── elastic-agent-9.0.0-beta-SNAPSHOT-arm64.deb.sha512
├── elastic-agent-9.0.0-beta-SNAPSHOT-darwin-aarch64.tar.gz
├── elastic-agent-9.0.0-beta-SNAPSHOT-darwin-aarch64.tar.gz.sha512
├── elastic-agent-9.0.0-beta-SNAPSHOT-darwin-x86_64.tar.gz
├── elastic-agent-9.0.0-beta-SNAPSHOT-darwin-x86_64.tar.gz.sha512
├── elastic-agent-9.0.0-beta-SNAPSHOT-linux-amd64.docker.tar.gz
├── elastic-agent-9.0.0-beta-SNAPSHOT-linux-amd64.docker.tar.gz.sha512
├── elastic-agent-9.0.0-beta-SNAPSHOT-linux-arm64.tar.gz
├── elastic-agent-9.0.0-beta-SNAPSHOT-linux-arm64.tar.gz.sha512
├── elastic-agent-9.0.0-beta-SNAPSHOT-linux-x86_64.tar.gz
├── elastic-agent-9.0.0-beta-SNAPSHOT-linux-x86_64.tar.gz.sha512
├── elastic-agent-9.0.0-beta-SNAPSHOT-windows-x86_64.zip
├── elastic-agent-9.0.0-beta-SNAPSHOT-windows-x86_64.zip.sha512
├── elastic-agent-9.0.0-beta-SNAPSHOT-x86_64.rpm
├── elastic-agent-9.0.0-beta-SNAPSHOT-x86_64.rpm.sha512
├── elastic-agent-complete-9.0.0-beta-SNAPSHOT-linux-amd64.docker.tar.gz
├── elastic-agent-complete-9.0.0-beta-SNAPSHOT-linux-amd64.docker.tar.gz.sha512
├── elastic-agent-complete-wolfi-9.0.0-beta-SNAPSHOT-linux-amd64.docker.tar.gz
├── elastic-agent-complete-wolfi-9.0.0-beta-SNAPSHOT-linux-amd64.docker.tar.gz.sha512
├── elastic-agent-ubi-9.0.0-beta-SNAPSHOT-linux-amd64.docker.tar.gz
├── elastic-agent-ubi-9.0.0-beta-SNAPSHOT-linux-amd64.docker.tar.gz.sha512
├── elastic-agent-wolfi-9.0.0-beta-SNAPSHOT-linux-amd64.docker.tar.gz
└── elastic-agent-wolfi-9.0.0-beta-SNAPSHOT-linux-amd64.docker.tar.gz.sha512

0 directories, 28 files
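A listing like this can be checked mechanically. The sketch below flags file names that either lack the expected version or carry a -SNAPSHOT suffix after it. The misnamed helper is hypothetical and only does substring matching on names; it does not read the build/distributions directory.

```go
package main

import (
	"fmt"
	"strings"
)

// misnamed returns the names that do not contain "-<version>-" or that contain
// "-<version>-SNAPSHOT". It is a hypothetical check, not part of the build.
func misnamed(names []string, version string) []string {
	var bad []string
	for _, n := range names {
		hasVersion := strings.Contains(n, "-"+version+"-")
		hasSnapshot := strings.Contains(n, "-"+version+"-SNAPSHOT")
		if !hasVersion || hasSnapshot {
			bad = append(bad, n)
		}
	}
	return bad
}

func main() {
	names := []string{
		"elastic-agent-9.0.0-beta-SNAPSHOT-amd64.deb", // from the listing above
		"elastic-agent-9.0.0-beta-amd64.deb",          // the name a staging build would be expected to produce
	}
	for _, n := range misnamed(names, "9.0.0-beta") {
		fmt.Println("unexpected name:", n)
	}
}
```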

I would like to retest this with a non-SNAPSHOT manifest to see whether the side effect is still present. If it is, we need to find a workaround that does not break IAR but still allows packaging the agent with a VERSION_QUALIFIER.

/cc @dwhyrock @cmacknz

@dwhyrock
Collaborator

I believe that the VERSION_QUALIFIER is not intended to be used (or at least it's not a requirement) for SNAPSHOT builds. It's only for Staging builds.

Also, it's worth noting that the Independent Agent Release will never have to support VERSION_QUALIFIER either. Those builds will always be from a version that has already been released, and there's no reason to have to do an IAR build from a major version pre-release like 9.0.0-alpha1.

I hope that helps simplify the logic for you.

@pchila
Member Author

pchila commented Jan 16, 2025

@dwhyrock
Agreed, VERSION_QUALIFIER, SNAPSHOT and IAR releases should have no intersection. However, some code added for IAR is being executed and has side effects. I will check whether we can remove the side effect without breaking IAR (hopefully).

I am currently testing with a SNAPSHOT manifest because there's no staging manifest yet for 9.0.0 😓

I am wondering why we force SNAPSHOT=true for the agent packaging if the manifest version contains -SNAPSHOT...
I am assuming that it's because of the 2 lines in packages.yml, but maybe there are other reasons...

@dwhyrock
Collaborator

I am wondering why we force SNAPSHOT=true for the agent packaging if the manifest version contains -SNAPSHOT...
I am assuming that it's because of the 2 lines in packages.yml, but maybe there are other reasons...

I believe it's for a couple reasons. 1) We don't want to "mix" artifacts (Snapshot vs Staging), and 2) I believe it was a way to quickly determine that the packaging is a Snapshot build. I think there are places where we skip for IAR if it's a Snapshot build.

It may not be the cleanest way to do this (and perhaps it's a bit heavy-handed?). I'd be happy to do some IAR builds to help test any change you make. Just let me know.

@pchila pchila linked a pull request Jan 17, 2025 that will close this issue
8 tasks
@pchila
Member Author

pchila commented Jan 17, 2025

Created a draft PR that allows testing packaging the way the unified release invokes it, but with a SNAPSHOT version of the dependencies:

MANIFEST_URL="https://snapshots.elastic.co/9.0.0-7ab33a91/manifest-9.0.0-SNAPSHOT.json" AGENT_DROP_PATH=build/elastic-agent-drop AGENT_PACKAGE_VERSION="9.0.0-alpha1" BEAT_VERSION="9.0.0-SNAPSHOT" VERSION_QUALIFIER="alpha1" PLATFORMS="linux/amd64" mage -v clean downloadManifest package

However, at the moment those modifications break the dev packaging, so the PR will stay in draft until that is fixed as well.

@dwhyrock how time-consuming would it be to test the IAR on my PR branch?

@dwhyrock
Collaborator

I'm going to create a draft PR in unified-release so that it calls your branch. I'll post back when I have some sort of result.

@dwhyrock
Collaborator

@pchila I was able to execute your branch via the Independent Agent Release Staging pipeline, but only after getting help figuring out a workaround to use your branch from your forked elastic-agent repo. There is currently no way to specify a fork branch when triggering another pipeline. Thanks for the help, @brianseeders!

New "Fork Trampoline" Checkout Method

To get it to work, I created an elastic-agent branch that essentially does a post-checkout hook that adds a remote repo (your fork), and then checks out your branch. Here's the branch/PR that does that: #6546

(As of now, I don't think it should necessarily be merged, but it can be useful for this testing.)

I also have a branch in the unified-release repo that tells the IAR Staging pipeline to trigger the elastic-agent-package pipeline with my elastic-agent branch, but also pass in two env vars that effectively point to your forked repo: https://github.com/elastic/unified-release/pull/1823

Current Bug

Currently, it looks like there's an error in the packaging part, where it is looking for a connectors package with the IAR version (8.17.0+build202501172033) and not the previously-released version (8.17.0).

https://buildkite.com/elastic/elastic-agent-package/builds/3837#01947602-c562-445b-9559-af7eec694117/6-55

I believe most Agent dependencies should be using artifacts with the previously-released version, and only the final Elastic Agent Package artifacts should contain the new IAR version.
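The intended split between the two versions can be sketched as follows: dependencies are looked up by the previously released version, while only the final agent package keeps the IAR build suffix. The helpers dependencyVersion and connectorsArchive are hypothetical, and the connectors-<version>.zip shape is taken from the package spec quoted earlier in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// dependencyVersion drops the build metadata (everything from the first "+"),
// turning an IAR version into the previously released version.
// This is a hypothetical helper, not the fix applied in the packaging code.
func dependencyVersion(agentVersion string) string {
	released, _, _ := strings.Cut(agentVersion, "+")
	return released
}

// connectorsArchive builds the dependency file name in the shape used by the
// quoted package spec, without a SNAPSHOT suffix.
func connectorsArchive(agentVersion string) string {
	return "connectors-" + dependencyVersion(agentVersion) + ".zip"
}

func main() {
	iar := "8.17.0+build202501172033"
	fmt.Println("agent package version:", iar)
	fmt.Println("dependency lookup:    ", connectorsArchive(iar))
}
```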

There are daily IAR Staging builds for 8.16 and 8.17 if you need to look at them.

For Subsequent Testing of IAR Staging With Your Branch

Here is the IAR Staging pipeline that can be run (on my unified-release branch) that will use the HEAD of your forked branch when triggering the elastic-agent-package pipeline: https://buildkite.com/elastic/independent-agent-release-staging/builds?branch=dougw-test-pchila-version-qual-agent-branch

So, whenever you have an update pushed in your forked branch, you should be able to hit "New Build" from the above link and it will use your latest changes.

There is a holiday for US folks on Monday, so I will be out, but you can run new builds from that pipeline I just linked to exercise the IAR Staging build. No parameters/env vars needed, and it won't affect any release if you run it as-is.

@pchila
Member Author

pchila commented Jan 20, 2025

Since the current code on main is able to package versions with pre-release qualifiers, I will hold off on #6540 for now to reduce risk for the 9.0.0 release.

In its current form on main, packaging is difficult to test: the manifest and the agent package must share the same non-SNAPSHOT version, and using a different dependency version has odd side effects. Even so, it is safer not to change the packaging unless it is absolutely necessary, and for 9.0.0 it is not.

Thanks @cmacknz @dwhyrock for your feedback and your checks 👍

4 participants