Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] - PR pytests are failing, possibly because repo env vars are not available #2783

Closed
tylergraff opened this issue Oct 21, 2024 · 12 comments · Fixed by #2790
Closed

[BUG] - PR pytests are failing, possibly because repo env vars are not available #2783

tylergraff opened this issue Oct 21, 2024 · 12 comments · Fixed by #2790
Assignees
Labels
area: CI/CD 👷🏽‍♀️ needs: triage 🚦 Someone needs to have a look at this issue and triage type: bug 🐛 Something isn't working

Comments

@tylergraff
Copy link
Contributor

Describe the bug

pytests are failing for at least several PRs. Logs indicate that an exception is being raised by many tests due to various env vars not being set:

==================================== ERRORS ====================================
_ ERROR at setup of test_set_config_from_environment_variables[nebari_config_options1] _
Traceback (most recent call last):
  File "/usr/share/miniconda/envs/nebari-dev/lib/python3.10/site-packages/_pytest/runner.py", line 342, in from_call
  
  <snip>
  
    File "/usr/share/miniconda/envs/nebari-dev/lib/python3.10/os.py", line 680, in __getitem__
    raise KeyError(key) from None
KeyError: 'AWS_ACCESS_KEY_ID'

See full logs here:
https://github.com/nebari-dev/nebari/actions/runs/11386526924/job/31851899812?pr=2752

Expected behavior

PR Pytests should pass.

OS and architecture in which you are running Nebari

This is an issue with the existing github Nebari test pipeline.

How to Reproduce the problem?

Example failing PRs:
#2752
#2730

Command output

No response

Versions and dependencies used.

No response

Compute environment

None

Integrations

No response

Anything else?

No response

@viniciusdc
Copy link
Contributor

Thanks for raising the issue, @tylergraff . Indeed, this has come to our attention since last week. We are not entirely sure why this started happening now. My main guess is that this was related to our change in branches and some internal global environment variables from the nebari-dev org.

This is quite troublesome since it blocks development in any given PR upstream that alters any files under src. In the ideal world, these falling tests (likely affecting all providers, at least AWS and GCP so far) should not need credentials at all and only do so due to the test init part of the unit tests under it.

The best-case scenario would be to decouple such checks completely from contests since we already have a separate workflow. However, a more likely fix will be to mock the cloud API requests that require credentials using pytest.mock.

Another quick fix for any affected PR right now would be to copy the vault block that we have under our cloud provider tests workflow and have a similar step in these falling tests to populate the env vars with the missing credentials.

@tylergraff
Copy link
Contributor Author

@viniciusdc the vault block fix - it seems this should be a separate PR which gets merged to fix the tests. Do you agree?

@joneszc
Copy link
Contributor

joneszc commented Oct 22, 2024

@tylergraff @viniciusdc
I created a new Fork on nebari-dev main as opposed to using the MetroStar/nebari, which is forked on the previous default develop branch.
Unfortunately, my new test PR#2788, forked on main branch, still fails the same tests

I'm curious to see if this commit from last month is causing an error to be reported

@viniciusdc
Copy link
Contributor

Re-opening as the issue still persists, though the work done in the PR above, helped clear some related matters.

@viniciusdc viniciusdc reopened this Oct 23, 2024
@github-project-automation github-project-automation bot moved this from Done 💪🏾 to In progress 🏗 in 🪴 Nebari Project Management Oct 23, 2024
@joneszc
Copy link
Contributor

joneszc commented Oct 23, 2024

Hello @viniciusdc @marcelovilla

I'm wondering, is there a reason this commit did not include .github/workflows/test.yaml in the Rename default branch to main and get rid of develop as a target update ?

image

The develop branch appears to be included still as a target for Pytest tests.

@marcelovilla
Copy link
Member

Hey @joneszc. No particular reason other than I probably missed that one. Happy to open another PR removing it.

@joneszc
Copy link
Contributor

joneszc commented Oct 23, 2024

Hey @joneszc. No particular reason other than I probably missed that one. Happy to open another PR removing it.

@marcelovilla
it could be coincidence, but we are seeing PRs fail specifically those Pytest tests from that workflow. We are troubleshooting whether the Pytest failures are related to renaming develop branch to main

Will you have time to add the PR today for removing develop from test.yaml branches ?
Thanks

@marcelovilla
Copy link
Member

@joneszc I've opened #2792. Would appreciate your review there.

@marcelovilla
Copy link
Member

@joneszc now that the PR was merged, are you still seeing the Pytest failures?

@dcmcand
Copy link
Contributor

dcmcand commented Oct 24, 2024

Looking into this, I don't think the reason for the pytest failures is anything to do with branches or env vars. Here is what I found.

I wanted to make sure this wasn't due to the PR coming from a forked repo since secrets are not shared in PR's from a forked repo. To test this made a branch in the nebari repo and merged @joneszc 's PR (#2788) into my branch. Result was CI still failed. This eliminated the forked repo as the cause.

I wanted to eliminate the possibility that there was something to do with the secrets not being available for some reason. I could not see a reason they wouldn't be, but more importantly, I could not see a way that the secrets were actually being used in the unit test workflow. Regardless, I explicitly made them available in e37bc5c. The result was the error changed, but the workflow still failed. Now instead of a key error it said the provided token was invalid. This told me 2 important things. First, the action had access to secrets. Second the action was not using the secrets when it passed. This led me to believe that this actually represented actual failing unit tests, and not a problem with the pipeline.

I next wanted to investigate how the other tests were passing, and looking at tests/tests_unit/conftest.py I saw that there were pytest fixtures for other assets such as instances, kubernetes versions, etc. This PR introduces a new AWS resource check, a KMS key check, but does not add the pytest mock for unit tests. This is why tests were previously passing, but now are failing. The relevant fixture is at

In summary, I believe that if @joneszc adds the relevant mock to the pytest fixture his code will pass and this issue can be closed.
I did a quick POC at #2794 and you can see tests are passing. Obviously @joneszc will want to be a real implementation with good test coverage, but this at least shows that this will resolve the issue.
This looks like the issue for the other failing PR's as well

@viniciusdc viniciusdc added this to the 2024.9.2 milestone Oct 24, 2024
@viniciusdc
Copy link
Contributor

viniciusdc commented Oct 24, 2024

I next wanted to investigate how the other tests were passing, and looking at tests/tests_unit/conftest.py I saw that there were pytest fixtures for other assets such as instances, kubernetes versions, etc. This PR introduces a new AWS resource check, a KMS key check, but does not add the pytest mock for unit tests. This is why tests were previously passing but are now failing. The relevant fixture is at

Nice, that's great! It explains why we only saw the fails when a change was made to the provider checks. So, this needs to be documented in the unit-tests file or the provider checks so that we are more aware of this in the future. Thanks a lot for the summary and for having a look at this @dcmcand 🚀

@dcmcand
Copy link
Contributor

dcmcand commented Oct 28, 2024

The suggested changes resolved this issue. Closing

@dcmcand dcmcand closed this as completed Oct 28, 2024
@github-project-automation github-project-automation bot moved this from In progress 🏗 to Done 💪🏾 in 🪴 Nebari Project Management Oct 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: CI/CD 👷🏽‍♀️ needs: triage 🚦 Someone needs to have a look at this issue and triage type: bug 🐛 Something isn't working
Projects
Development

Successfully merging a pull request may close this issue.

5 participants