Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitHub CI actions for the test can not work because of the private dataset and the need for astrometry index files. #19

Closed
juanep97 opened this issue Sep 13, 2023 · 12 comments · Fixed by #29 or #84
Assignees
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@juanep97
Copy link
Owner

It is currently impossible to run all tests in GitHub CI Actions because a) it needs a currently private dataset (~22MB) and b) astrometry index files to test the astrometric calibration (~36GB, although it can probably be reduced to less than half of it).

Suggestions?

@morcuended
Copy link
Collaborator

morcuended commented Sep 14, 2023

Solution for a) could be to download the private dataset in the CI before the tests (as done in lstchain https://github.com/cta-observatory/cta-lstchain/blob/a8058123c2ab44e7d19dbc44e8de83a1158e562b/.github/workflows/ci.yml#L82-L87).

For b) I see no easy workaround to the usage of ~10 GB files. Maybe implement specific tests that do not make use of the entire astrometry index files, say to reduce the astrometry test files to ~10-100 MB (if possible)

@juanep97
Copy link
Owner Author

While we think or fix this to reduce all data needed down to the limits by GitHub, we might think of implementing a 'test' api endpoint in iop4api, that checkouts to a given branch (in the official repo for safety) and runs the test, returning the result. Then the CI action will be sending the branch name and testing the result.

@morcuended
Copy link
Collaborator

While we think or fix this to reduce all data needed down to the limits by GitHub, we might think of implementing a 'test' api endpoint in iop4api, that checkouts to a given branch (in the official repo for safety) and runs the test, returning the result. Then the CI action will be sending the branch name and testing the result.

but how do you deal with astrometry index files?

What if GH action connects to one of our machines (with the astrometry files already downloaded) and then communicates the result back to the CI action (?)

@juanep97
Copy link
Owner Author

juanep97 commented Oct 5, 2023

Yes, that is what I meant. Implementing a simple api endpoint, that runs the test in our servers, either directly or in a container. It does not need to be part of IOP4 itself if we don't want to.

@juanep97
Copy link
Owner Author

juanep97 commented Oct 5, 2023

Although, since the client side will be part of this repo (in .github/workflows), we might as well include the server side. I propose adding a 'view_ci_test_client` to iop4api that receives the branch and commit, runs the tests in a container for that specific version, and returns the result when it has finished.

@morcuended
Copy link
Collaborator

ah, good!

Although, since the client side will be part of this repo (in .github/workflows), we might as well include the server side. I propose adding a 'view_ci_test_client` to iop4api that receives the branch and commit, runs the tests in a container for that specific version, and returns the result when it has finished.

+1

@juanep97 juanep97 added enhancement New feature or request help wanted Extra attention is needed labels Oct 5, 2023
@morcuended
Copy link
Collaborator

morcuended commented Oct 9, 2023

I'm testing this approach at https://github.com/morcuended/iop4/pull/1. It basically consists of:

  • Dockerfile to be run in the server (getting the branch of the PR, installing that IOP4 version and running the tests)
    • It has to mount the directory where the astrometry files are located in the server
  • Github action to connect to the server
    • I still have to properly configure the VPN connection to access our server

After discussing this, accessing our server through VPN is not the best approach here. I'll follow @juanep97 suggestion:

Although, since the client side will be part of this repo (in .github/workflows), we might as well include the server side. I propose adding a 'view_ci_test_client` to iop4api that receives the branch and commit, runs the tests in a container for that specific version, and returns the result when it has finished.

@juanep97
Copy link
Owner Author

juanep97 commented Oct 9, 2023

So to summarize, let's build something like

iop4api/views/test.py

# iop4lib config
import iop4lib
iop4conf = iop4lib.Config(config_db=False)

# django imports
from django.http import JsonResponse, HttpResponseBadRequest

# iop4lib imports

# other imports

#logging
import logging
logger = logging.getLogger(__name__)

def test(request):
    # run tests in the docker here, return the result as below
    return JsonResponse({'data': 'test'})

iop4api/urls.py

urlpatterns = [
    path('', views.index, name='index'),
    path('api/test/', views.test, name='test'), # make the view accessible in some url 
    path('api/login/', views.login_view, name='login_view'),

Then it can be queried from the CI cli as

$ curl -i -s -H "Content-type: application/json" -H "Accept: application/json" 'http://localhost:8000/iop4/api/test/'
HTTP/1.1 200 OK
Date: Mon, 09 Oct 2023 16:34:42 GMT
Server: WSGIServer/0.2 CPython/3.10.9
Content-Type: application/json
X-Frame-Options: DENY
Content-Length: 16
X-Content-Type-Options: nosniff
Referrer-Policy: same-origin
Cross-Origin-Opener-Policy: same-origin

{"data": "test"}%   

Omitting the -i option will return only the result, not the whole HTTP header. Authentication can be as simple as setting a github secret with some token and sending it in the request, then comparing it with a value that can be stored in a ignored file in the repo or somewhere else.

@juanep97
Copy link
Owner Author

juanep97 commented Oct 14, 2023

After some thought, I see other possible two solutions, but I have not made any of them work yet:

Let's give it some thought or try to see if any of them also work. I think both of them will require less coding / maintenance.

@juanep97 juanep97 linked a pull request Oct 15, 2023 that will close this issue
@juanep97
Copy link
Owner Author

It seems that for the moment, PR #29 can solve the issue.

It mounts the astrometry cache as a remote directory, so the astrometry module does not try to download all the index files (5200 series, scales 0-4, ~34 GB, which was making tests fail). Instead it "only" tries to access and downloads ~11 GB (using cache). This is enough for tests to pass.

This ~11GB can probably be reduced by using less scales in the tests, but I see no reason to bother, ~15min is acceptable, I think.

If in the future we need to use more astrometry index files or a larger dataset, going over the limits or taking too much time, we might think of implementing the self hosted action runner solution inside isolated vms.

@juanep97
Copy link
Owner Author

@juanep97
Copy link
Owner Author

juanep97 commented Jan 16, 2024

The problem seems to be alleviated -but not solved- in #53 by using httpdirfs in a single thread and not running the tests for python 3.10.9 and 3.11 in parallel.

@juanep97 juanep97 linked a pull request Mar 19, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
2 participants