Add an integration test to emulate an air gapped agent upgrade #3403

Closed
Tracked by #2176
cmacknz opened this issue Sep 12, 2023 · 3 comments · Fixed by #3724

@cmacknz
Member

cmacknz commented Sep 12, 2023

We recently introduced a bug that broke the upgrade process for agents deployed in air gapped environments: #3368. One root cause for this is that we have no automated testing for air gapped deployments, largely because doing this for the entire stack in an automated manner would be challenging considering the automation itself needs network access to run the test.

While we likely can't create a true air gapped environment, we should be able to emulate one. The Elastic Agent only makes four outgoing network connections to known hosts, only two of which are likely to cause unique problems in an air gapped environment:

  1. A connection to Fleet Server. This connection must exist in all deployments, and whether it is on an external network or internal network makes no difference to the Elastic Agent. For air gapped environments the majority of the difference is in the network setup and infrastructure.
  2. A connection to the selected output, usually Elasticsearch. This connection must exist in all deployments, and is in the same category as Fleet Server above.
  3. A connection to the Binary Download location that is only used during upgrades. This defaults to https://artifacts.elastic.co/downloads/ and can be customized in the agent policy.
  4. A connection to the GPG fallback URL, which cannot be changed at the time of writing and defaults to https://artifacts.elastic.co/GPG-KEY-elastic-agent.

defaultUpgradeFallbackPGP = "https://artifacts.elastic.co/GPG-KEY-elastic-agent"

For the purposes of an air gapped upgrade, we only need to care about connections 3 and 4, which default to the public artifacts.elastic.co URL. We should be able to write a test that maintains network connectivity but makes requests to this host fail, so that the upgrade succeeds only if the connection in 3 is changed to a reachable alternative (ideally a local file server) and the failure of the connection in 4 is ignored as described in #3368.

The scope of this issue is to implement this test and prove that it adequately simulates an air gapped upgrade by verifying that the test fails until the binary download location is changed, and that the fallback GPG URL request error is ignored.

@cmacknz cmacknz added the Team:Elastic-Agent Label for the Agent team label Sep 12, 2023
@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@jlind23
Contributor

jlind23 commented Sep 26, 2023

@michalpristas assigning this to you for next sprint as your gpg test is strongly related.

@cmacknz
Member Author

cmacknz commented Oct 10, 2023

We should actually block *.elastic.co entirely and not just artifacts.elastic.co, since the exact hostname varies depending on what we are doing. artifacts is the prefix for production releases; in tests it is likely to be snapshots.elastic.co, for example.
