Use self-hosted runner #84

Merged
merged 8 commits into from
Mar 19, 2024

Conversation

juanep97
Owner

@juanep97 juanep97 commented Mar 18, 2024

This PR changes the CI to use the self-hosted runners provided by the VHEGA group.

The self-hosted runners are ephemeral and provisioned on demand, just like the GitHub-hosted runners, but they provide the astrometry index files, removing the need to download them every time. Downloading them was the main reason tests were previously failing. The runners also provide slightly more RAM and CPU, which makes the tests faster.

This solves #19, finally and probably once and for all.

@juanep97 juanep97 marked this pull request as draft March 18, 2024 11:44
@juanep97 juanep97 marked this pull request as ready for review March 18, 2024 15:09
Comment on lines 74 to 79
- name: Download test data
  env:
    TEST_DATA_PASSWORD: ${{ secrets.test_data_password }}
  run: |
    export TESTDATA_MD5SUM=`grep 'TESTDATA_MD5SUM' ./tests/conftest.py | awk -F"'" '{print $2}' | tr -d '\n'`
    wget --post-data "pass=$TEST_DATA_PASSWORD" "https://vhega.iaa.es/iop4/iop4testdata.tar.gz?md5sum=$TESTDATA_MD5SUM" -O $HOME/iop4testdata.tar.gz
Collaborator

We could even avoid this step in the self-hosted runner, no?

Owner Author

Yes, we can provide the test dataset to the runners the same way as the astrometry index files.

Owner Author

> We could even avoid this step in the self-hosted runner, no?

By providing the test dataset we could save ~1 min of download, and if we provided it already uncompressed, ~2 min more (~3 min total). But this would require modifying the tests themselves, since the unpacking is done there. We can do this in a different PR.
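
A minimal sketch of how the download could be skipped when a runner already provides the archive. It assumes that the `TESTDATA_MD5SUM` value in `tests/conftest.py` is the MD5 of the tarball, which this PR does not confirm. The function names and the idea of a pre-provisioned copy at `$HOME/iop4testdata.tar.gz` are hypothetical.

```shell
# Sketch only: function names and the cached-archive location are hypothetical.

# Extract the checksum with the same pipeline the workflow step uses.
expected_md5() {
    grep 'TESTDATA_MD5SUM' "$1" | awk -F"'" '{print $2}' | tr -d '\n'
}

# Return 0 only if the archive exists and its MD5 matches the expected value.
archive_is_current() {
    local archive="$1" expected="$2" actual
    [ -f "$archive" ] || return 1
    actual=$(md5sum "$archive" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Example use inside the workflow step (not executed here):
#   md5=$(expected_md5 ./tests/conftest.py)
#   if ! archive_is_current "$HOME/iop4testdata.tar.gz" "$md5"; then
#       ... run the existing wget command ...
#   fi
```

This only avoids the download. Skipping the unpacking as well would still need the changes to the tests mentioned above.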

Collaborator

+1, this can be left untouched

@juanep97
Owner Author

Mar 19 11:16:07 garm-nvf7a1AWg8GC systemd[1]: actions.runner.juanep97-iop4.garm-nvf7a1AWg8GC.service: Failed with result 'oom-kill'.

This happens inside the runner. We could increase the host memory a bit or reduce the parallelization in the tests. Maybe this was the reason it was not working on the GitHub runners when the astrometry index files were correctly mounted.

@morcuended
Collaborator

> Mar 19 11:16:07 garm-nvf7a1AWg8GC systemd[1]: actions.runner.juanep97-iop4.garm-nvf7a1AWg8GC.service: Failed with result 'oom-kill'.
>
> This happens inside the runner. We could increase the host memory a bit or reduce the parallelization in the tests. Maybe this was the reason it was not working on the GitHub runners when the astrometry index files were correctly mounted.

Do you have an estimate of how much memory was used? You could reduce the parallelization a bit.

@juanep97
Owner Author

> > Mar 19 11:16:07 garm-nvf7a1AWg8GC systemd[1]: actions.runner.juanep97-iop4.garm-nvf7a1AWg8GC.service: Failed with result 'oom-kill'.
> >
> > This happens inside the runner. We could increase the host memory a bit or reduce the parallelization in the tests. Maybe this was the reason it was not working on the GitHub runners when the astrometry index files were correctly mounted.
>
> Do you have an estimate of how much memory was used? You could reduce the parallelization a bit.

Almost 2 GB. The host had 2 GB, so the runner cannot use the full 2 GB, and the OOM killer killed it when it tried to use more.
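
One way to reduce the parallelization is to derive the number of test workers from the memory available on the runner. The sketch below is only an illustration: the helper names and the 700 MB per-worker budget are hypothetical, and it assumes the tests are parallelized with something like the `-n` option of pytest-xdist, which this PR does not state.

```shell
# Sketch only: the per-worker budget is a guess, not a measured value.

# Print how many workers fit in the given memory (both in MB), never less than 1.
workers_for_memory() {
    local avail_mb="$1" per_worker_mb="$2"
    local n=$(( avail_mb / per_worker_mb ))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# Read MemAvailable from /proc/meminfo (Linux only) and convert kB to MB.
available_mb() {
    awk '/^MemAvailable:/ {print int($2 / 1024)}' /proc/meminfo
}

# Example (not executed here), assuming pytest-xdist is in use:
#   pytest -n "$(workers_for_memory "$(available_mb)" 700)"
```

With these assumed numbers, a 2048 MB budget gives 2 workers, and anything below 700 MB falls back to a single worker.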

@rlopezcoto
Copy link
Collaborator

👏 👏 👏

@juanep97 juanep97 changed the title from "test self-hosted runner" to "Use self-hosted runner" Mar 19, 2024
@juanep97
Owner Author

juanep97 commented Mar 19, 2024

Mar 19 14:14:14 tinyvm1 kernel: [ 9630.258394] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/lxc.payload.garm-HlMWiI7rNFen/system.slice/actions.runner.juanep97-iop4.garm-HlMWiI7rNFen.service,task=pytest,pid=18016,uid=1001000
Mar 19 14:14:14 tinyvm1 kernel: [ 9630.258417] Out of memory: Killed process 18016 (pytest) total-vm:5020352kB, anon-rss:2385292kB, file-rss:0kB, shmem-rss:4kB, UID:1001000 pgtables:8744kB oom_score_adj:500

Weird, now it is the host that is killing processes in the LXC containers. Let me investigate.
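
For reference, the sizes in the kernel message can be converted to GiB to compare them with the host memory. This is a minimal sketch with a hypothetical helper name. It only reformats numbers already present in the log line quoted above.

```shell
# Extract a "<field>:<N>kB" value from an OOM log line and print it in GiB.
oom_field_gib() {
    local field="$1" line="$2"
    printf '%s\n' "$line" |
        sed -n "s/.*${field}:\([0-9][0-9]*\)kB.*/\1/p" |
        awk '{printf "%.2f\n", $1 / 1048576}'
}

line='Out of memory: Killed process 18016 (pytest) total-vm:5020352kB, anon-rss:2385292kB, file-rss:0kB, shmem-rss:4kB, UID:1001000 pgtables:8744kB oom_score_adj:500'

oom_field_gib anon-rss "$line"   # prints 2.27 (resident anonymous memory of pytest)
oom_field_gib total-vm "$line"   # prints 4.79 (virtual address space, not physical use)
```

The `anon-rss` value is the one that reflects physical memory actually held by the killed `pytest` process.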

@@ -22,6 +22,7 @@ env:


jobs:

static-code-checks:
Collaborator

Currently the static checks step does nothing but set up Python; it could be used for formatting and linting (unrelated to this PR).

@juanep97
Owner Author

Tests passing, I think it can be merged now.

@morcuended morcuended merged commit 1949d86 into main Mar 19, 2024
3 checks passed
@morcuended morcuended deleted the test_selfhosted branch March 19, 2024 17:06
juanep97 added a commit that referenced this pull request Mar 19, 2024
morcuended pushed a commit that referenced this pull request Mar 19, 2024
3 participants